Monday, January 3, 2011


I just had a eureka moment when it comes to working with Git. I recently moved from Mercurial to Git and it took me a while before I realized that I've been working in an SVN mindset with Mercurial this whole time.
Let me explain.
I now see that it all comes down to a simple methodological decision: create a branch for every meaningful change. I didn't believe this at first when I read this idea a few months ago. Why should I create a branch for every small modification? Changes to the code-base are mostly serial, occurring one after the other and often depending on each other, so won't all this branching just create more merging headaches?

Now I see that it won't. In fact, it reduces merging. The problem was that I was thinking of merging in SVN terms - taking unrelated deltas of code and applying them to the same branch. That indeed is a pain in the ass. But it's not what merging means in Git. In fact, it's a shame that Git even uses the same word - merge. Linus should have come up with a different term. So, yes, what SVN people call "merge" almost never happens in Git, even with all these extra branches being created.

There is a simple fact that just has to sink in when you move from SVN to Git: unless you actively harm Git's history (e.g. by cherry-picking revisions), merges always work.* Please digest this fact for a moment. Assume that it's true and try to consider how it should change your working habits.

Let's take an example of a typical coding scenario. You work on a new feature, and after coding for a while you notice you need to do some refactoring or introduce a new common function. In the end you decide to abort the new feature - maybe it didn't pan out as you'd hoped. However, you do want to keep that little bit of refactoring that you did - that's just a good common piece of code that's unrelated to the feature you were working on and you don't want to throw it away along with the abandoned feature.

The wrong way to do this is to do the refactoring on the same branch as the feature. If you do this then you later have to cherry-pick that delta of code and apply it to your "trunk". But that circumvents Git's history. It's like taking a patch of code from an unrelated branch and applying it. It may work and it might not.
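Here's a rough sketch of what I mean by circumventing history. It builds a throwaway repository, does the refactoring on the feature branch, and cherry-picks it onto the trunk. The branch names ("trunk", "feature"), file names and commit messages are all made up for the demo. The point is that the cherry-picked commit is a brand new commit with a different ID, so Git records no link between it and the original.

```shell
set -e
repo=$(mktemp -d)                        # throwaway repository for the demo
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/trunk   # call the initial branch "trunk" (assumed name)
git config user.email "demo@example.com"
git config user.name "Demo"

echo "base" > app.txt
git add app.txt
git commit -q -m "Initial commit"

# The "wrong way": feature work and refactoring share one branch
git checkout -q -b feature
echo "feature work" > feature.txt
git add feature.txt
git commit -q -m "Start feature"
echo "shared helper" > helper.txt
git add helper.txt
git commit -q -m "Extract shared helper"
original=$(git rev-parse HEAD)

# Rescue the refactoring by copying it onto the trunk
git checkout -q trunk
git cherry-pick "$original" > /dev/null
copy=$(git rev-parse HEAD)

echo "original: $original"
echo "copy:     $copy"
```

The two IDs differ, and the trunk does not contain the original commit. If the feature branch is ever merged later, Git sees the same change arriving twice from two unrelated commits.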

The right way to do this is to open a new branch for the refactoring effort, starting from the trunk rather than from the feature, and merge it into the feature branch. This is an instantaneous merge in Git - a single click that takes less time than a commit in SVN.

Now, if you want to keep the feature, you just keep working on it and merge it into the trunk in the end. If you want to throw it away, you just merge the refactoring branch back to the trunk and continue from there. In fact, as long as nothing else has landed on the trunk in the meantime, both of these alternatives are trivial "fast forward" operations in Git. In other words, no real merging (in the sense you know from SVN, where changes from multiple sources are applied to the same file) is taking place. Git is just moving a "pointer" from one commit to another. It doesn't have to reconcile any file contents!
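To make this concrete, here is a minimal sketch of the whole scenario in a throwaway repository, ending with the "throw the feature away" alternative. The branch names ("trunk", "feature", "refactoring"), file names and commit messages are assumptions for the demo, not anything Git requires.

```shell
set -e
repo=$(mktemp -d)                        # throwaway repository for the demo
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/trunk   # call the initial branch "trunk" (assumed name)
git config user.email "demo@example.com"
git config user.name "Demo"

echo "base" > app.txt
git add app.txt
git commit -q -m "Initial commit"

# Start the feature
git checkout -q -b feature
echo "feature work" > feature.txt
git add feature.txt
git commit -q -m "Start feature"

# The refactoring gets its own branch, started from the trunk
git checkout -q -b refactoring trunk
echo "shared helper" > helper.txt
git add helper.txt
git commit -q -m "Extract shared helper"

# Bring the refactoring into the feature branch
git checkout -q feature
git merge -q --no-edit refactoring

# Abandon the feature: the trunk takes only the refactoring
git checkout -q trunk
git merge -q --ff-only refactoring
```

The `--ff-only` flag makes Git refuse to do anything other than move the pointer, so the last command succeeding is itself the proof that this was a fast forward. Afterwards the trunk and the refactoring branch point at the same commit, and the feature's files never reach the trunk.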

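I know, the 2nd diagram may look more complicated, but there are really just two tiny steps: creating a refactoring branch and merging it into the feature branch. Assuming your branches are called "trunk" and "feature", it's just a few lines:
git checkout -b refactoring trunk
# ...commit the refactoring here...
git checkout feature
git merge refactoring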
And you don't have to commit anything after the merge. The key difference here is that the correct way of doing things lets Git keep track of the common ancestor of each commit. It then knows how to merge the refactoring into the feature branch because it can go back and see their common history - which is the secret sauce that makes Git merges so delicious. 
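If you want to see the common ancestor for yourself, `git merge-base` prints it. The sketch below sets up the same two branches in a throwaway repository and asks Git where they diverged. As before, the branch names, file names and commit messages are made up for the demo.

```shell
set -e
repo=$(mktemp -d)                        # throwaway repository for the demo
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/trunk   # call the initial branch "trunk" (assumed name)
git config user.email "demo@example.com"
git config user.name "Demo"

echo "base" > app.txt
git add app.txt
git commit -q -m "Initial commit"
base=$(git rev-parse HEAD)               # remember where both branches start

git checkout -q -b feature
echo "feature work" > feature.txt
git add feature.txt
git commit -q -m "Start feature"

git checkout -q -b refactoring trunk
echo "shared helper" > helper.txt
git add helper.txt
git commit -q -m "Extract shared helper"

# Ask Git for the commit both branches grew from
ancestor=$(git merge-base feature refactoring)
echo "common ancestor: $ancestor"
```

The ancestor is the initial commit, which is the reference point Git compares both branches against when it merges them.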

This is nice because it also lets you switch between tasks quite easily. Each has its own branch, after all. And at the end of the day you can pick and choose which branches you want to merge into the "trunk" and which to stow away for another day or toss out the window. And it just works.

So I learned my lesson. You never know where a given task is going to take you. You often find yourself fixing stuff "along the way" and then having to keep track of these fixes in case you stop working on the original task. By creating a branch for every meaningful change that stands on its own you allow yourself to experiment with different ways of solving a problem and you never think twice about doing some refactoring even in the middle of a risky experimental feature.

* Well, merges nearly always work. If you change the exact same line in multiple branches, Git isn't going to try to mash them together. It's just going to tell you to figure it out yourself. Which is great - it's what you want it to do. It takes no chances. When it merges something without complaint, you know it didn't have to guess about overlapping edits (it's still worth running your tests, since changes that never touch the same lines can still disagree with each other). In my experience this has held up even when moving functions around between files and within files - almost any sort of change you can think of.
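Here's a small sketch of that refusal, again in a throwaway repository with made-up branch and file names. Two branches change the same line in different ways, and the merge stops instead of picking a winner.

```shell
set -e
repo=$(mktemp -d)                        # throwaway repository for the demo
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/trunk   # call the initial branch "trunk" (assumed name)
git config user.email "demo@example.com"
git config user.name "Demo"

echo "greeting = hello" > config.txt
git add config.txt
git commit -q -m "Initial commit"

# Two branches edit the very same line
git checkout -q -b topic-a
echo "greeting = hi" > config.txt
git commit -q -am "Say hi"

git checkout -q -b topic-b trunk
echo "greeting = hey" > config.txt
git commit -q -am "Say hey"

# Git refuses to guess: the merge exits non-zero and leaves markers in the file
if git merge -q --no-edit topic-a > /dev/null 2>&1; then
  result="merged"
else
  result="conflict"
fi
echo "$result"
git diff --name-only --diff-filter=U     # lists the files still in conflict
```

At this point config.txt contains both versions between conflict markers, and it's up to you to edit it and commit the result.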

1 comment:

Assaf said...

A neat trick, if you're working against a branch that's tracked on a remote repository, you can work optimistically and only create a topic branch once you decide to switch to something else:
