If you feel you’re having déjà vu, this is possibly the third time that I’ve written about not using GitHub, or more specifically, not using Git. I’ve wrestled with this problem for several years now, and finally, I’m just done trying to use GitHub to host my own projects. Fans of Git frequently ask me to explain why I can’t use Git, or insist that I’m wrong about Git, but I don’t think I’m wrong, and I don’t think I’m being unreasonable. Git simply isn’t the best version control system for software development, not even close. I’m going to spell out my reasoning here, hopefully once and for all.
Git, Subversion, and Me.
Okay, I’ve spilled the beans so to speak. I’m going to be talking about Git with reference to Subversion. Why? Well because the very first version control system that I was introduced to was Subversion, and I’ve used it for many years. I want to be clear however, that I’m not really here to advocate for using subversion.
You see, as much as I can admit that Git does have strengths, I can also admit that Subversion has weaknesses. The simple fact, however, is that for me the strengths of Subversion outweigh its weaknesses, and the opposite is true for Git.
Having “grown up” on subversion, my development habits make use of subversion features, which git simply doesn’t have. Therefore, as I talk about my reasons for being unable to use git, and the advantages of subversion, I do ask that you please keep this one thing in mind: I’m not trying to hate on one product, or promote the other, I’m simply evaluating them from my own personal experience. That said, I really don’t like git much.
It’s no longer there at “git-scm.com”, but there was once a statement that referred to CVS, SVN and Perforce as “antiquated” version control systems. This statement appeared soon after the release of Git, and I expect (though I know nothing for sure here) that it was written by Linus (Linus Torvalds, author of Git) himself.
Just about everything on the site talked very negatively about earlier version control systems, and Linus is not exactly known for having any significant level of “tact” when he talks about products or companies that he doesn’t like. What is listed there on git-scm.com now is, of course, a list of the tool’s features and a more measured comparison with older version control systems. So let’s look at a few of these.
Git was originally written by Linus Torvalds (among others) in order to manage contributions to the Linux Kernel source code. This is going to become quite relevant when I begin talking about the weaknesses of Git, but for now, let’s consider some of its strengths.
First among them, Git is “distributed.” Fans of Git talk about it being a “distributed SCM” as though this is some stroke of genius in its design which fundamentally makes Git superior to everything else. Of course, this is not a fundamental statement of truth, so let’s consider what it really means.
Why was git designed to be a distributed system in the first place? Well, we’d have to ask Linus I guess, but he doesn’t answer my calls, so I’ll speculate some. It’s possible that the idea for using a distributed SCM was to lower hosting costs that might be associated with hosting a “central” repository on a server. This isn’t overly likely, I think that there was money floating around the Linux project at the time that git was developed, but it is one possible motivation.
The more probable answer is that a distributed SCM lets you work “old school”, as though you are still passing around copies of the source code on floppy disks. Suppose for a moment that I’ve just written a new feature into the code, and I hand that code over to Linus as a submission for inclusion into the main project. In this scenario, git allows Linus to compare my copy of the source with his own “main” copy, and to pull in my changes.
It also means that if you want to make a new copy of the source code, all you need do, is make a local copy. This is a fast, efficient means of creating essentially a new branch of the code. I can see how this would work well for the Linux Kernel Project.
A follow-on from the distributed nature of Git is that you can make “local commits”. Now I really have to hand it to Git here: when comparing with Subversion, local commits are a really nice feature, and I like them a lot. I really wish Subversion had this, but alas, it currently does not. As a consequence of being able to do local commits, and of Git being distributed, you can work just about anywhere – even offline, say, on a plane for example. Okay, nice.
Git is fast – yes, but no, but … well – Okay, the fastest way to create a new branch of some source code project is to just make a copy of whatever directory you have that source code in. Local copies therefore are fast, and Git being distributed does indeed make it fast in this regard.
So, we have Distributed, meaning Fast, Easy Branching and Local Commits; it all sounds great. Now let’s get to the fun bit where I pound on Git a little bit.
Well, the biggest weakness of Git is that it was designed for the Linux Kernel project. This one is going to take a lot more explaining, which I’ll do later in this post, but to put it briefly here… Git was designed for managing a monolithic source code project, in which the majority of the files it would handle would be Assembler or C source code files. It was not designed for managing source code with dependencies, nor was it designed to handle files other than plain text source code. It can do some of these things, to some degree, but it does them poorly. I’ll expound on this further in a moment, but now let’s briefly look at some other weaknesses.
In my opinion, the second largest weakness of Git is that it’s distributed.
Yes, its biggest strength is also one of its biggest weaknesses. First of all, let me talk about the often-used reasoning for it being a benefit that Git is distributed, that being “You can work anywhere, even on a plane!” Well, generally people don’t actually use Git as a distributed system. Oh, I’m sure there are some that do, and if you’re reading this, good for you – but the majority of development uses of Git involve hosting a central repository.
It could be GitHub, GitLab, or some other provider, but generally these days a git repository is hosted somewhere centrally. This may be less true in the open source development world, but certainly if you’ve any experience as a commercial developer using git, you’ll be familiar with pulling the “develop” branch, creating a local feature branch, making changes to it, committing, then pushing them back to the central repository, before handling pull requests.
I’ll let you in on a little known secret – if you’re using just about any other version control system, you can pull a copy of the central repository before you go board a plane, work on that code during your flight, and commit it back when you’ve landed… you can even make local copies of that code should you wish to. In truth however, unless you’re flying first class, I cannot fathom how anyone can work on source code while on a flight. How do you even make a laptop comfortably fit between you and the seat in front? Much less concentrate on code with the distractions of a flight! More power to you if you can do this, but I sure can’t, I’ve tried. I am reliably assured by a friend that it is possible, but it simply never worked out for me.
What may be more relevant is that you can indeed work offline using a git repository. This is just as true with most other source code management tools though, and if you’re going to share your code with others, at some point you’re going to have to connect to a public repository. Anyway, the fact is, I can make a copy of even a subversion repository, locally, just as fast as with a git repository, and work on it just fine.
Subversion does litter the project directory with hidden “.svn” directories containing the version information, which I know irritates some, particularly if you still use older diff tools for example, but with modern diff tools being able to simply ignore these, it’s really no longer an issue.
Furthermore, the “branching” benefits of Git come with many disadvantages. Again, this is more a concern for commercial software houses (don’t worry, I have gripes for open source also, which I’ll come to in a moment), but essentially the commercial software development industry seems to have finally come around to the fact that managing lots of branches is unnecessary and cumbersome.
While it’s certainly not ubiquitous, it’s now generally understood that a strategy of keeping branch lives short and simply ‘tagging’ releases is simpler to employ than maintaining multiple release branches and long-lived feature branches. Of course, the policies of different software houses vary wildly, and they are generally quite slow to adopt new strategies, but things are moving this way. Why? Well, because the longer a branch exists, the further it diverges from the “develop” branch, that is, the branch that is considered to be the current source of truth.
Prior to the existence of Git, we Subversion users had adopted a “trunk based development” strategy already – it was common practice to create a trunk directory, a branches directory, and a tags directory within your Subversion repository, and to use ‘trunk’ as the source of truth. It might be the latest bleeding-edge code, or the current stable code; you could decide depending on what fit your company best, but we had this. I’ve been witnessing more and more companies reverting to this strategy using their Git hosted code, as the complications of Git branching become ever more apparent. Even git-flow, an entire third-party add-on for Git designed to manage complex branching strategies, is now falling out of favor, and is beginning to be recognized as bad practice.
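As a rough illustration of that trunk, branches and tags convention, here is a minimal Python sketch. It only creates plain directories on disk, not a real Subversion repository, and the project name and version number are made up for the example. In Subversion itself a tag would normally be made as a copy of trunk with svn copy; the sketch models only the naming.

```python
import tempfile
from pathlib import Path

# Conventional top-level directories in a Subversion repository layout.
STANDARD_LAYOUT = ("trunk", "branches", "tags")


def create_layout(root: Path) -> list[str]:
    """Create trunk/, branches/ and tags/ under root and return their names."""
    for name in STANDARD_LAYOUT:
        (root / name).mkdir(parents=True, exist_ok=True)
    return sorted(p.name for p in root.iterdir() if p.is_dir())


def tag_release(root: Path, version: str) -> Path:
    """Model tagging: a release is just a named directory under tags/."""
    tag_dir = root / "tags" / version
    tag_dir.mkdir(parents=True, exist_ok=False)
    return tag_dir


with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / "myproject"  # hypothetical project name
    layout = create_layout(project)
    release = tag_release(project, "1.0.0")  # hypothetical version
    release_relative = release.relative_to(project).as_posix()
```

The point of the layout is that ‘trunk’ stays the single source of truth, while a release is recorded by name under ‘tags’ rather than kept alive as a long-lived branch.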
So Git is fast? The website has some measurements for Git performance vs Subversion. I don’t really want to get into how unfairly skewed those measures are, much beyond asking these questions: How old are they? Which version of Subversion was used? Are the comparisons based on irrelevant tests, such as local branching vs remote branching?
There are several more questions, but frankly, they don’t matter. I’ll just admit it, in many situations Subversion is slower, and I don’t care to spend the time to determine all of the finer points of where it is. My point is that they’re different systems with entirely different designs, such that these comparisons are quite unreasonable. For instance, I could say that Subversion is faster at remote branching… because Git doesn’t do remote branching. It’s comparing oranges to apples. What matters is that Subversion is actually “sufficient” in performance. However, performance does matter, and here is where Git really fails hard…
Developer productivity. Git is quite complex to learn and use – such that often, if someone talks negatively about git, they’re told in a quite condescending tone “You just have to learn how to use it.” – I’ve certainly had this experience. Well let me say this in no uncertain terms: I’m a software developer by trade, and by hobby. I have, as most software developers have, dealt with complex systems many times throughout my career, I’m not afraid to get technical. I have spent the time to learn git, and I still don’t like it. So where does that leave you?
The highest praise that I can give to a software tool is that it does what it should and is easy to use. Any tool which is not easy to use is not a useful tool – it’s that simple. Consider the difficulties with the complex components of a Git based workflow: understanding how to rebase, how to perform three-way merges, how to merge complex branching structures, how to correctly organize and maintain commit history, and so on. All of these complexities just cost time. If you’ve ever been on the business end of a bad merge, you’ll understand the pain I’m describing here.
Another weakness of Git, out of the box at least, is that it is very slow when handling large files. This is quite a well known and understood problem for Git, so I’ll spare the detail; you can go look it up for yourself if you’d like to know more. There are now ways to work around the limitations of Git – but the workaround is a part of my biggest complaint with Git, which I haven’t yet even got to!
My weaknesses section here is already quite long, and I’ve not yet gotten to the most relevant weaknesses of Git for me. Of course, since I’m here explaining why I won’t be using Git any longer, I’m going to have more to say against it than in its favor. Let’s take a little breather from the rant by looking at some more strengths of Git.
The True Strengths of Git.
So here it is – an opinion much more than an unbiased fact: the true strength of Git is GitHub. I know, there are other services which host Git repositories now, and I’m not even certain that GitHub was the first, though it may have been. The point here is that GitHub adds a huge amount of value to Git. It’s ironic that what I believe to be Git’s biggest true strength is a centralized hosting service, but there you have it.
Being able to…
- Create / Alter / Delete a repository at the click of a button.
- To make projects available to the public, with changes reflected in an instant.
- To manage user rights to pull and push source.
- Host pull-request discussions on changes.
- Manage merges in a convenient web UI.
- Trigger actions to run docker containers for CI builds and processes.
- Manage project “tickets” and “issues” all in a clean interface.
- Many More….
The level of convenience introduced by GitHub really tore down the barrier for access to a version control system. GitHub is truly a huge value, and I’m going to miss it for my subversion hosted projects, but I’m still not going to use git – more in a moment. If you want my honest opinion, the name Linus Torvalds may well have had weight in making git popular, but the real driving force behind the popularity of Git has been GitHub.
Earlier version control systems were, and still are available on hosted services, but GitHub just managed to hit that sweet spot of making it all usable with a clean interface. It’s a crying shame that GitHub didn’t branch out to support other version control systems! In fact, go check out svnhub.com briefly, have a giggle at the message there, then come back here.
GitHub fairly recently announced that they’ll be deprecating Subversion support. What? Didn’t I just say that they didn’t support Subversion? Well yes, and I stand by it. GitHub did provide a bridging service which allowed you to pull Git repositories using a Subversion client. The reason is simple: Subversion was wildly popular prior to Git. The reasons that Git’s own website previously had disparaging remarks against Subversion, that it shows performance comparisons with Subversion specifically to this day, and that GitHub offered this Subversion bridge, are all clear. Subversion was the competition.
Now I said clearly above, I’m not really here to praise Subversion, it has its weaknesses, but it’s still worth noticing negativity aimed at Subversion by those with a vested interest in promoting Git. Subversion solves many of the weaknesses of Git that I’ve yet to discuss, and has the added advantage of having solved those problems before Git even created them for us.
The True Weaknesses of Git.
The biggest weakness of Git is that it doesn’t have any real means of handling dependencies. Wait! Before you start talking about Sub-Modules and Sub-Trees and whatever else you’re going to throw at me, I am aware of them, and I’m discounting them as options for handling dependencies. I have my reasons and will explain them in a moment, so give me a chance to make my argument first.
Okay, with that out of the way, let’s say it again. Git doesn’t have any real means of handling dependencies. You see, it was designed originally, as I said, for that monolithic dependency-free project, the Linux Kernel. Barring some technical exceptions such as the BIOS and boot-loader, an operating system kernel is the first thing that gets loaded when a computer is turned on, so essentially it has no dependencies.
It’s true that on a source code level, a project such as Linux could be broken down into smaller pieces, and each of those be considered dependencies, but they aren’t true dependencies, they’re just sub-components of the main project. In the case of the Linux Kernel, its copy-left license means that any other source code which even touches the Linux Kernel code automatically falls under the same license. Thus, even if a third party library were used, it would immediately become open source, and could be freely copied into the source code directories of the Linux Kernel source. Git was designed originally for working with this monolithic source code project, and thus had no support for dependencies at all.
Git was released without any sub-modules or sub-trees features, back in April of 2005. It grew in popularity relatively quickly, which was to be expected with a name like Linus Torvalds behind it. At some point in the following two years, it was identified that some ability to pull in dependencies would be useful, and sub-modules were added by version 1.5.3, released in September 2007. Sub-modules had some limitations however, such that sub-trees were introduced in version 1.7.11 in June of 2012 to mitigate them. The trouble is, neither option actually solves the problem of using dependencies in Git.
I don’t believe they can either; I think it’s a fundamental design flaw in Git itself. I may be wrong of course, and it’s not like I’ve gone poking around the Git source or anything, but I think that the limitations on how Git repository URLs currently work may be tightly woven into the Git source, such that this problem can’t be fixed. Now, I will get to exactly how sub-trees and sub-modules are broken in a moment, but first, I want to discuss why this is a problem and the consequences of it.
I’ve worked for several employers over the past decade ( I’ve had a bit of a bumpy ride ), that use git or some other version control system with the exact same problems, with regards to dependencies. As I’ve said, I’ll come back to explain how each of sub-modules and sub-trees fail, but for the sake of this argument, cast those features aside for a moment, and just accept my proposition that git has no features to support dependencies, and let me demonstrate with a thought experiment.
The thought experiment.
I’d like you to do something, as a thought experiment (or do it for real if you like), a task that with Subversion I used to simply take for granted. Go to a popular Git hosting site such as GitHub or GitLab, whichever you prefer, find a reasonably sized project, pull a copy, open it in your IDE of choice, and build it… Now, did it build? Okay, there may be some that did, but in the vast majority of cases the answer will be no. There will be missing dependencies. The README.MD file for the project will likely have some statement such as “Install Composer” or “Install NPM” or “Homebrew” or “NuGet”, or “Bundler”, “Ant”, “Gradle”, “BowerPhp”, “Poet”… I could go on all day long.
There are countless third party tools out there for managing source code dependencies, you just can’t avoid them anymore – well, with Git you can’t. Worse still, each of those dependency managers might require you to install something else, say a python interpreter, or the Java runtime, or some other dependency.
So this brings me to the real consequence of this problem of dependencies with Git. All I cared about when I began this project of discovery, was that I was writing a blog and wanted to share code with my readers. I wanted to be able to update that code as and when it needed updating, using a version control system, and have my readers be able to access and use that code. This is essentially an open source purpose.
What I didn’t want, is for my readers to have to install a dependency manager. GitHub seemed like the right choice – you didn’t even have to install a git client because you could click on the little “download zip” button to get a copy of the source all nicely bundled in a compressed file. Perfect! – Or so I thought. The dependency problems meant that I started experimenting with Sub Modules and Sub Trees, trying to find a solution that would allow a reader to simply “download everything” in one action. The solution did not exist.
Before I move on to finally discuss the problems with Sub Modules and Sub Trees, there is another, related problem with Git. As I mentioned above, and encouraged you to research for yourself, Git has limitations when it comes to file sizes, and even repository sizes. This led to the development of “Git LFS” (Large File Storage), as well as other extensions. I also mentioned that I did not like this solution, and this is why: in order to use Git LFS you must install it independently. That is yet another thing for my readers, or users of your open source project, to have to install before they can use your code or other resources. This is a theme that continues with the “Recursive Dependency” problem that I’ll briefly discuss below. You can work around it if you install a third party tool such as “GIL”, but your consumers will have to install this same tool also…
Why Sub-Modules and Sub-Trees fail.
Okay, I’ve promised more than once to explain why neither Sub Modules nor Sub Trees solve the problem of dependencies. It’s not that either feature is not useful, they certainly are; it’s that they are incomplete. Worse, and this is what makes it hard to explain exactly how they are incomplete, between them they solve most, though not all, of the problems of dependency management. The problems are that A) You can’t use both features at the same time, so you must decide between their limitations, and B) Neither solves the “Recursive Dependency” problem.
So let’s discuss this “Recursive Dependency” problem first, and I do find myself struggling to describe it, so here’s my best shot…
Git repository URLs are pinned to the root of the target repository, even when using them to pull in ‘external’ repositories. Sub modules and Sub trees therefore must pull in the entire repository (barring the sparse checkout feature I’ll discuss in a moment). Suppose you have two dependencies then, which both depend on a third dependency. The first two each pull in their own copy of the third. As independent repositories, each is configured with relative paths to the third within itself. There is no way to have both of the first two dependencies share a single copy of the third without breaking their own relative paths when they are each pulled as independent projects.
If that all sounds a little difficult to wrap your head around, it’s because it is. Talking about problems of recursion is difficult to articulate to begin with. Consider re-reading it, and if that doesn’t help, consider trying to set the scenario up for yourself and you’ll see what I mean.
Why is it a problem that dependencies can’t be shared among other dependencies? Well it means that you end up with multiple copies of dependencies, and worse, because of additional problems with sub-modules and sub-trees, which I’ll discuss shortly, those copies could be different versions of those dependencies.
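To make the scenario concrete, here is a small Python model of it. It does not touch Git at all: it assumes each repository simply nests its dependencies inside its own directory, which is the situation described above, and the repository names (app, lib_a, lib_b, core) are invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Repo:
    """A hypothetical repository that nests each dependency inside itself."""
    name: str
    deps: list["Repo"] = field(default_factory=list)


def checkout_paths(repo: Repo, prefix: str = "") -> list[str]:
    """List every directory created when repo and its nested deps are pulled."""
    path = f"{prefix}{repo.name}"
    paths = [path]
    for dep in repo.deps:
        paths.extend(checkout_paths(dep, prefix=f"{path}/"))
    return paths


# Diamond shape: app depends on lib_a and lib_b, and both depend on core.
core = Repo("core")
lib_a = Repo("lib_a", [core])
lib_b = Repo("lib_b", [core])
app = Repo("app", [lib_a, lib_b])

paths = checkout_paths(app)
# Every path ending in /core is a separate copy of the shared dependency.
core_copies = [p for p in paths if p.endswith("/core")]
```

In this model the shared dependency lands on disk twice, once under each of the two libraries, and nothing forces the two copies to be the same version.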
Sub Modules are essentially a pointer to a specific commit in some other repository. When Git pulls the parent, it also pulls the sub module – at least, the current version of Git does, but back when I started having problems with dependency management in Git, this was not the case. I had consumers of my code reaching out to ask why the code would not compile, because they’d not issued the command to pull the sub modules. Worse, at that time, the available UI clients for Git did not understand sub modules either, this HAD to be done on the command line.
Of course, I was able to put instructions into the readme file, but still I felt cheated that anyone wanting to use my code had to follow additional command-line steps to be able to get to the point that they could build it. In my earlier subversion hosted projects, a single “Checkout” (which is a pull in Git terminology) would check out everything that was needed.
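For reference, the extra steps in question look roughly like the following. This is a hedged sketch that only builds the command lines as Python lists; the URL and directory name are placeholders, and nothing is executed unless you call run() yourself on a machine with Git installed.

```python
import subprocess


def clone_with_submodules(url: str, dest: str) -> list[str]:
    """Command that clones a repository and its sub modules in one step."""
    return ["git", "clone", "--recurse-submodules", url, dest]


def init_submodules(dest: str) -> list[str]:
    """Command that fetches sub modules for a clone made without them."""
    return ["git", "-C", dest, "submodule", "update", "--init", "--recursive"]


def run(command: list[str]) -> None:
    """Execute a command, raising CalledProcessError if it fails."""
    subprocess.run(command, check=True)


# Placeholder URL and directory, for illustration only.
clone_cmd = clone_with_submodules("https://example.com/myrepo.git", "myrepo")
fix_cmd = init_submodules("myrepo")
```

Either form works from the command line, but both are things a consumer has to know to do, which is exactly the complaint.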
The problems did not end here either. The UI clients did not understand sub modules at that time, and so, if I wanted to push changes back to a sub module I’d have to do it manually, at the command line. As I’ve said, I’m not afraid of that command line, but this was something I could do trivially in the subversion UI, and it doesn’t stop there either.
I mentioned above that a sub-module is a pointer to a “SPECIFIC REVISION” in another repository. Locking to a revision makes sense if you are using a third-party repository, because when the author of that repository makes changes, your code is not affected by those changes. This means that when breaking changes are made to a dependency, your code does not break, your code is insulated from changes.
When you’re using your own repositories as dependencies however, and you make bug fixes, it would be nice to let those projects which depend on the altered code update to the latest copy, in order to benefit from the bug fixes. With Git sub modules, the only way to do this is to update the reference in the parent repository, such that the sub module pointer now points at the new revision.
This is a task that must be done manually for every repository that depends on the altered repository. At the time I was developing both a unit testing framework, and a runtime library – all of my personal projects depended on both of these, so if I altered either, I’d have to manually alter all of my projects (dozens) to take advantage of that change.
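The bookkeeping can be modeled in a few lines of Python. This is only a sketch of the idea, not of Git’s internals: the project names and revision identifiers are hypothetical, and bump() stands in for the manual step of moving the pointer and committing it in each parent.

```python
from dataclasses import dataclass


@dataclass
class Parent:
    """A hypothetical project that pins a library at one revision."""
    name: str
    pinned: str


def stale_parents(parents: list[Parent], latest: str) -> list[str]:
    """Names of the projects still pointing at an older library revision."""
    return [p.name for p in parents if p.pinned != latest]


def bump(parent: Parent, latest: str) -> None:
    """Stand-in for the manual step: move the pointer in one parent."""
    parent.pinned = latest


# Three projects pinned to the same old revision of a shared library.
projects = [Parent("blog_demo", "a1"), Parent("game", "a1"), Parent("tools", "a1")]
latest_revision = "b2"  # the library has just received a bug fix

before = stale_parents(projects, latest_revision)
for project in projects:  # one manual update per dependent project
    bump(project, latest_revision)
after = stale_parents(projects, latest_revision)
```

With three projects that is three updates; with dozens of projects, it is dozens.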
The most bitter pill to swallow, of course, is that Subversion had already solved this problem. Subversion URLs are flexible, such that if you want to lock to a specific revision you can do so. For instance, if you have a repository at https://example.com/myrepo and you want the 400th revision, you’d use https://example.com/myrepo@400 to lock the external to that revision. If you wanted to always get the latest, simply omit the @400 suffix and you’d get the latest.
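A short Python sketch of that URL convention follows. It assumes the “URL@REV local-dir” form of an externals definition line, and the local directory name libs/myrepo is invented for the example; treat it as an illustration of the idea rather than a complete parser for every externals format.

```python
from typing import Optional


def external_line(url: str, local_dir: str, revision: Optional[int] = None) -> str:
    """Build one externals line, pinned to a revision or tracking the latest."""
    target = url if revision is None else f"{url}@{revision}"
    return f"{target} {local_dir}"


def split_revision(target: str) -> tuple[str, Optional[int]]:
    """Split 'URL@REV' into (URL, REV); REV is None when no revision is given."""
    url, sep, rev = target.rpartition("@")
    if sep and rev.isdigit():
        return url, int(rev)
    return target, None


pinned = external_line("https://example.com/myrepo", "libs/myrepo", 400)
latest = external_line("https://example.com/myrepo", "libs/myrepo")
parsed_pinned = split_revision("https://example.com/myrepo@400")
parsed_latest = split_revision("https://example.com/myrepo")
```

The same dependency can therefore be locked or left floating by changing a single suffix, with no change to where it lands in the working copy.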
The short description of sub trees is that they are a copy of the external repository, copied into the parent. Unlike sub modules, a sub tree can be set up to track the head revision of another repository, finally giving that option. However, being a copy and without having their own git information node, they are harder to push back to. Still not impossible, and still easy enough for anyone brave enough to type a few commands into a command prompt, but nonetheless, they are more work for the repository maintainer.
Sub trees ALMOST solve the dependency problem, and if a good client would make them just a little easier to work with, well they’d solve most of the dependency problem – BUT – they are still subject to the recursive dependency problem.
Again, given that I come from a subversion background, consider this scenario. You’ve written some library and pulled it in as a dependency of your project. You encounter a nasty little bug that forces you to step through the code in the debugger, and you’ve found that the problem is in the library. You spot it, you stomp on it, you save your change. Now, you want to push that change back to the library – with git you’re off to the command line to run the series of commands, which, if you’ve studied and memorized the command line switches you know how to do. If you’ve not studied, hopefully you’ve kept a cheat sheet laying around, and can dutifully copy the commands. With subversion, you either issue a single command on the command line, or, with one of the mature and well established clients, click a button and the change goes in. One more button, and all of your repositories are updated with the change, and all with their paths configured such that they compile out of the box.
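As a rough side-by-side of that scenario, the sketch below builds the command lines for each tool as Python lists without running them. The prefix, remote name, branch and commit message are all placeholders, and the exact commands needed will depend on how the sub tree or external was set up, so read it as an approximation.

```python
def git_subtree_fix(prefix: str, remote: str, branch: str, message: str) -> list[list[str]]:
    """Commands to commit a fix in the parent, then push it back to the library."""
    return [
        ["git", "commit", "-am", message],
        ["git", "subtree", "push", f"--prefix={prefix}", remote, branch],
    ]


def svn_external_fix(external_dir: str, message: str) -> list[list[str]]:
    """Command to commit a fix from inside an external's working copy."""
    return [["svn", "commit", "-m", message, external_dir]]


# Placeholder names, for illustration only.
git_steps = git_subtree_fix("libs/mylib", "mylib-remote", "main", "Fix bug in mylib")
svn_steps = svn_external_fix("libs/mylib", "Fix bug in mylib")
```

Neither list is long, but the Git one requires remembering which prefix, remote and branch belong to which dependency.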
Sparse Check-Out and Symlinks
I’ve had it suggested to me that sparse check-outs might help to resolve the recursive dependency problem. Sparse check-out is a feature that allows you to check out only a part of a repository, and even lets you pin sub-directories. Unfortunately, the way that sparse check-outs pin directories still leaves it challenging to configure dependencies to resolve the recursive dependency problem. I’ve experimented with them, but they still leave work for the consumer to perform, and thus far I’ve been unable to configure dependencies to work the way that I need them to.
There’s still a chance that I’m misunderstanding the feature; it’s been around for some time but is still new to me, and it’s still labeled as an experimental feature of Git, subject to potential future change.
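For anyone who wants to try the feature, a sparse check-out can be set up roughly as follows. The sketch only builds the command lines as Python lists and does not run them; the URL and directory names are placeholders, and because the feature is experimental the exact options may differ between Git versions.

```python
def sparse_clone(url: str, dest: str) -> list[str]:
    """Command to clone without file contents and with sparse check-out enabled."""
    return ["git", "clone", "--filter=blob:none", "--sparse", url, dest]


def sparse_set(dest: str, directories: list[str]) -> list[str]:
    """Command to limit the working copy to the given directories."""
    return ["git", "-C", dest, "sparse-checkout", "set", *directories]


# Placeholder URL and paths, for illustration only.
clone_cmd = sparse_clone("https://example.com/myrepo.git", "myrepo")
set_cmd = sparse_set("myrepo", ["src/core", "docs"])
```

Note that both commands are run by whoever is consuming the repository, which is the work left for the consumer that I mentioned.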
Similarly, it’s been suggested to me that I try using symbolic links to solve the problem. This I haven’t tried yet, but I’m skeptical that symbolic links can work in a cross-platform way. I will try them, for the sake of understanding, but here’s where I’m at….
Summing it up
Git has left us for more than 15 years with this problem of managing dependencies, which from my perspective, it caused. Subversion before it, had far more flexible URLs and could easily be used to configure dependencies such that each repository could either stand alone, or be configured as a dependency with relative paths. It could lock to a revision, or track the latest.
During its life-time, Git has tried to solve the problem, first with Sub Modules, then Sub Trees, and now Sparse, and yet, it still fails. Even if sparse check-outs, with or without the addition of symbolic links, does eventually solve this problem – Why?!
With so many other issues compounded on top of the dependency problems, why should I continue going out of my way to try to work with this tool? I have a tool that works already.
Each time friends, fans of Git, or anyone else tries to help me solve this problem, it ultimately fails. At this point, trying to make it work just feels like a desperate reach. I know you want to help me, and that you feel I’m missing out without Git in my life, but honestly I’m not. I’m happy with my Subversion setup as it is now.
Subversion is My Answer.
I am done fighting with my version control tool to do things that should be trivial. The fans of Git still would like to convince me to use it, and that it’s amazing and awesome, and distributed! Whatever, you do you. I need a tool that meets my very simple dependency requirements, while being easy to use for me, and those consuming my code. Git is NOT it.
I have written my own internal management system for subversion. Using the Apache foundation builds, an apache http server, and some custom code, I’m able to use my system to create, edit, and drop repositories. I’m able to embed them into my blog for others to download easily, you don’t even need a subversion client to use them – just click “Download Zip” to get the source in a zip file. You’ll get the source, and all dependencies, so long as those dependencies are my own repositories.
Since migrating my projects back to subversion, I’ve been able to setup a complete build infrastructure to automatically build code and automate inline documentation. My repositories are rapidly becoming easier to manage, and share code between, and I finally feel I can make progress. I still use Git for my employer, but for my own purposes, it’s been exorcised from my workflow entirely.
I’m also considering making parts of my home-brew tooling available to you. If you’re interested, just keep an eye on my blog or social media.
Thanks for reading!