Git Standards

Wednesday, April 18, 2018

Note: At my old job, I helped write up our Git standards, but I lost access to that document when I changed jobs. To keep track of my thoughts on Git in perpetuity, I am posting them publicly here.

Git is the democracy of programming: it is the worst tool for version control, except all those other tools that have been tried from time to time. Among other problems, Git’s command names are obscure (what is a “rebase”?) and non-orthogonal (how is “resetting” different from “checking out”?), and Git generally impedes the creation of an accurate mental model, yet without understanding its underlying directed acyclic graph, one can’t move from beginner to intermediate user. Still, it is a necessary tool in every developer’s toolbox, and following good Git practices leads to smoother, more productive development. This guide assumes you already know how to use Git and discusses some of the higher-level issues around standards for collaboration.

If that doesn't fix it, git.txt contains the phone number of a friend of mine who understands git. Just wait through a few minutes of 'It's really pretty simple, just think of branches as...' and eventually you'll learn the commands that will fix everything. — XKCD on Git

Why use Git?

Before talking more about how to use Git per se, it’s useful to review why it is we bother with such a bad tool in the first place, so that we can tell when mastering Git’s subtleties is time well spent and when it is mere yak shaving.

There are three major reasons to use Git:

It allows us to easily synchronize changes with other users and machines.
It gives us greater depth of understanding of why and how code was written.
It allows us to act boldly during development.

The first reason is fairly straightforward. Anyone who has ever had to manually reconcile “group project (final) 2 Saturday (no, really final!).docx” can tell you that just emailing files around is not a good way to keep track of files. However, even a shared drive, Dropbox, or live coding tool is not as good as Git because Git allows for the diffing and merging of simultaneously developed code. Git lets you and your teammates work on the same codebase without stepping on each other’s toes. Git also has the advantage of syncing to a build server without much hassle.

Next, a fine-grained Git history is a debugger’s best friend. With an atomic commit history (see below), it becomes easy to tell why and how a bug was introduced, or whether some seemingly useless code really is obsolete or is still needed to fix an obscure corner case. Just use the git blame command to travel back in time until you find the true source of a line of code (which is often hidden behind superficial changes to style or syntax that you must dig beneath), and the answer to your question will become clear. A good code review process also builds up understanding among colleagues in real time. Those reviewing your pull request will come away from it with a general sense of what changes are happening to the codebase and why, which makes the project less dependent on one person’s know-how. Those being reviewed will improve from reading the perspectives of others on their code.

Finally, the most fun reason to embrace Git is that it lets you code fearlessly. I have seen and practiced coding without version control. It’s always a bit trepidatious because you never know when you’ll accidentally change something important without realizing it and then lose the ability to get back to a working state. With Git you can confidently try out an idea without fear of creating an unrecoverable state or losing incremental progress. Just take frequent snapshots on your branch and you can always recover, no matter how crazy the idea you try out turns out to be. And if an idea is possibly a good one, but now isn’t a good time to pursue it, one can always just commit it to a branch and push it off to the server for future rediscovery and completion.

Configuring Git

Before diving into development, it’s worth taking the time to set up one’s Git configuration. The file ~/.gitconfig has a variety of useful settings. You will probably want to set up colors for git diff (pro-tip: set a background color, so that you can see whitespace changes to line endings). You may also want to set up auto-pruning of old branches, auto-rebasing of pulled branches, and aliases for frequently used commands. Most important, be sure to set a global ignore file. Git reads from ~/.config/git/ignore by default, but this can be configured. Make sure that Git ignores .DS_Store files if you use a Mac, .pyc files if you program in Python, and in general any files that you don’t want or need preserved in a repo, such as hidden files and files created by IDEs and editors.

Another helpful tool to set up for Git is tab completion. If you use Homebrew and Bash on Mac, run brew install git and then add source "$(brew --prefix)/etc/bash_completion" to your Bash profile. This will let you type git chec⇥ ma⇥ and have it tab complete to git checkout master or any other sub-command or branch.

Creating a repo

When a repo is created, the first commit should be something small so that it is easier to rebase off of the first commit later if necessary. Typically, this should be a .gitignore file for the project (which should be set to ignore password files, compilation artifacts, and other detritus not covered in a user’s global ignore file) and a README.md. Open source projects should also include a LICENSE file.

README.md should explain briefly what the repo is for and how to install it. Installation instructions should be brief, typically something like “Install Python and Postgres then run ./setup.sh” or “Install Docker and run docker-compose -f local.yml up”. If there are complicated installation instructions with many small steps, these should be abstracted and put into an installer script.

At this point in a project, it may also be necessary to check in a large amount of code at once, e.g. copied from an old project or autogenerated by a tool. Since it is not feasible to review large blobs of code like this, this should be done as early in the life of a project as possible and then not again. Code imported from another project should have a note in the commit message explaining where it came from, so the code archeologists can trace it in the future.

Early in a project’s life, there may be no production system that uses the repo, so there will be no deployment process, only local development. In this case, the goal should be to get to the point that a beta system can be established and connected to master as soon as possible, in order to work out the inevitable deployment process bugs. This step will make it easier to share work with non-developer colleagues and shake out bugs before they can threaten delivery dates.

In general, for a new project, the highest priority should be to get the repo to a working state, after which normal development using atomic commits (see below) can be practiced.

Git branching basics

Development should be done by branching off of and merging into the master branch. Pushing directly to the master branch should be discouraged as a practice (see code reviews, below). Force pushing the master branch should be disabled in GitHub or GitLab’s branch protection settings.

In the typical case, a developer will begin a branch in order to work on a specific feature with the expectation that development will take less than a week (usually only a day or two). Branches should be named like developer-name/feature-ticket. For example, if Alice Smith opens a branch to work on ticket SUN-234 for fixing horizontal scrolling on mobile, it might be called as/mobile-scroll-SUN-234. If Bob opens a quick branch to improve query time without a ticket in the ticketing system, it might be called bob/query-optimization. The reason to use developer names first is that it makes it easy to tab complete to the branch when checking it out and easy to tell which developer is working on the branch. The reason to end with the ticket number is that ticket numbers are valuable to know but hard to type. The reason to have a description in addition to a ticket number is that ticket numbers are hard to remember. If it is anticipated that multiple developers will work on a branch, it can be prefixed with feature/ instead of a developer’s name.
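As a rough illustration, the naming convention above can be checked mechanically. The sketch below is only one possible reading of it: the regular expression, the parse_branch helper, and the assumption that tickets always look like SUN-234 (capital letters, a hyphen, digits) are mine, not part of any standard tool.

```python
import re

# Hypothetical pattern for "developer-name/feature-ticket".
# Assumption: ticket numbers look like SUN-234 (capitals, hyphen, digits).
BRANCH_RE = re.compile(
    r"(?P<owner>[a-z][a-z0-9]*)/"                  # developer name, or "feature"
    r"(?P<description>[a-z0-9]+(?:-[a-z0-9]+)*)"   # short hyphenated description
    r"(?:-(?P<ticket>[A-Z]+-\d+))?"                # optional ticket at the end
)


def parse_branch(name):
    """Split a branch name into owner, description, and ticket.

    Returns None when the name does not follow the convention.
    """
    match = BRANCH_RE.fullmatch(name)
    if match is None:
        return None
    return match.groupdict()


if __name__ == "__main__":
    for branch in ("as/mobile-scroll-SUN-234", "bob/query-optimization", "Fix Stuff"):
        print(branch, "->", parse_branch(branch))
```

A check like this could live in a pre-push hook or a CI step, but whether that is worth the ceremony depends on the team.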

It may be necessary to have a separate branch named “production” if it is not possible or desirable to deploy frequently, but the preference should be that master is deployed frequently or even automatically.

There are more complicated systems for branching like Git Flow that have a separate “develop” branch and distinguish between “features” and “hotfixes”. These systems are fine if they fit your use case, but if a project releases code quickly, the level of complexity involved in Git Flow is unnecessary.

Ideally, branches should be short-lived. One strategy to allow merging a branch before a large feature is fully completed is to create a feature flag that ships the new code to the production system in an inactive state.
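For readers who have not used one, a feature flag can be as simple as a conditional around the new code path. The following is a minimal sketch, assuming flags are read from environment variables named FEATURE_<NAME>; that naming scheme and the flag_enabled and search functions are hypothetical, and real projects often use a settings file or a flag service instead.

```python
import os


def flag_enabled(name, environ=None):
    """Return True when the (hypothetical) FEATURE_<NAME> variable is truthy."""
    env = os.environ if environ is None else environ
    value = env.get("FEATURE_" + name.upper(), "")
    return value.strip().lower() in {"1", "true", "yes", "on"}


def search(query, environ=None):
    """Route to the unfinished code path only when its flag is switched on."""
    if flag_enabled("new_search", environ):
        return "new:" + query   # merged to master, but dormant in production
    return "old:" + query       # existing behavior stays the default


if __name__ == "__main__":
    print(search("git", {}))
    print(search("git", {"FEATURE_NEW_SEARCH": "true"}))
```

The point is that the half-finished branch can be merged early and often, while production keeps running the old path until the flag is flipped.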

Commits

Merge branch 'asdfasjkfdlas/alkdjf' into sdkjfls-final — XKCD on Git Commit

A good commit is “atomic”: it takes the repo from one working state to another working state by making the smallest change necessary for the feature or bug fix. Making your commits atomic is a boon to later developers (possibly including yourself), because it lets you understand exactly how and why a change was made. It also makes reviewing code much more tractable. For example, if you need to apply a code formatting tool, move the file to a new location, and add a bug fix, try to make these three separate commits. Combining them into one commit will make it difficult for reviewers to tell what really changed, and it will confuse future code historians.

It is not always easy to make a change atomically. In those cases, you may need to make several smaller commits that constitute part of a change and then “rebase” to “squash” them into a few atomic commits. (GitHub also has a setting letting you squash all of the commits in a branch as part of merging.)

Avoid blindly committing all changes in your repo. You can use a GUI or git diff to see what has changed since your last commit, or use git add -p and Git will interactively walk you through each chunk of code changed and stage it. This practice makes it less likely that you will erroneously commit junk files, files containing passwords, or debugging code.

A good commit message begins with a short subject line. The subject line must be short because GitHub and other tools will truncate long messages. The subject line should consist of “[Ticket#] Area: Imperative command”; for example, “SUN-345 CSS: Fix float overflow” or “#789 Backend: Improve query performance”. Ticket numbers should come first because they’re all about the same length, so this makes them easy to skim visually. The prefix before the colon lets the reader quickly know what area of the repo was affected. The imperative mood is used for changes because it is concise.
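The subject line format lends itself to a quick automated check. Here is a sketch of one, under a few assumptions of my own: the ticket is optional, tickets look like either SUN-345 or #789, and 72 characters is the cutoff for “short” (pick whatever limit your tools actually truncate at). A regular expression can’t tell whether a verb is imperative, so this only checks shape and capitalization.

```python
import re

MAX_SUBJECT_LEN = 72  # assumed limit, not a number from this guide

# "[Ticket#] Area: Imperative command", with the ticket treated as optional.
SUBJECT_RE = re.compile(
    r"(?:(?P<ticket>[A-Z]+-\d+|#\d+) )?"   # SUN-345 or #789, then a space
    r"(?P<area>[A-Za-z][\w/-]*): "         # area of the repo, then ": "
    r"(?P<summary>[A-Z].*[^.\s])"          # capitalized, no trailing period
)


def check_subject(subject):
    """Return a list of problems found in a commit subject line."""
    problems = []
    if len(subject) > MAX_SUBJECT_LEN:
        problems.append("subject is too long")
    if SUBJECT_RE.fullmatch(subject) is None:
        problems.append("subject is not '[Ticket#] Area: Imperative command'")
    return problems


if __name__ == "__main__":
    for line in ("SUN-345 CSS: Fix float overflow", "fixed stuff"):
        print(repr(line), check_subject(line))
```

An empty list means the subject passed; anything else can be printed back to the committer from a commit-msg hook.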

Some commits will need further details spelled out in the body of the message. It is useful to think of commit messages as another layer of comments embedded in your code. Normal comments should spell out the “why” rather than the “what.” Take this notorious example of a bad comment:

// Increase i by 1
i++;

Contrast it with this good example:

// We need to track the index here so we can report it in the final output
i++;

A bad code comment explains the “what”. This is bad because it is redundant (the code itself should be clear about what it’s doing) and if the code changes, the comment may become inaccurate. A good code comment explains the “why”. This adds information not otherwise visible and is more likely to aid future developers.

Git commit messages should address “which” part of the code has been changed and “what” the change is in their subject line. Unlike a comment in code, a commit message cannot become outdated because it will always report what the purpose of the code was at the time of its creation. If helpful, further explanation about the “why” of a change should go in the body of the commit. For example, the body is a good place to put small details like links to more information about the bug the code is working around or the algorithm the code implements. That said, most commits may need no body at all, and truly helpful “why” comments should be made inline in the code.

Reviews

Make it a habit to review all changes through a pull request (GitLab uses the term “merge request”), no matter how small the change. Bugs like to sneak in when you make a small change that couldn’t possibly have any negative effects. If a change really is small, then it shouldn’t be a burden to review it. If minimizing time to release is important, then it’s even more important to ensure that the release is correct. GitHub and other sites have settings to require an approving review before allowing a merge to master. Turn this on.

Large changes are hard to review. Break large changes into many smaller atomic changes to help reviewers understand what’s being done.

It is normal to feel emotional when your code is being reviewed, particularly when working with a new teammate. Reviewers should try to accommodate this by using language that invites response, such as, “Is there a reason you’re doing Y here instead of X? I think X has advantage Z, but there may be something I’m missing.” Try to make it clear as a reviewer that you are trying to improve the code, not criticize its author. Authors should remember that it is expected that a reviewer may have a lot of feedback and corrections, and this is no reflection on their skill level. Because the author has been focusing on getting the branch to work, it is very likely that there may be an easier way to write the code that is only obvious in hindsight or some stylistic changes that would make the code easier to read. Both authors and reviewers should try to practice Egoless Programming by giving and receiving advice in a spirit of openness and concern for the code and the feelings of others.

Often new reviewers aren’t sure where to start. One way to approach a review is by using a rubric to think about the code from a number of different angles:

Is this change atomic? Does it make the minimum viable change to get to a working state?
Does this change follow the DRY (Don’t Repeat Yourself) principle? Can it be simplified by abstracting out repeated patterns?
Is the code readable? Will future maintainers be able to tell what the code is doing and why?
Is this code safe? Does it defend against unexpected inputs or unusual conditions? (E.g., for CSS, will it work on screens with unusual proportions or when a list is unusually short or long?)
Does this code follow the project’s coding standards? Does it pass all tests and linting?

The last question can and should be automated using a continuous integration (CI) system like Jenkins, Travis CI, CircleCI, or GitLab Runner to automatically test branches. Most languages now have code formatters so that code can be automatically indented and spaced correctly, such as Prettier for JavaScript and CSS. Use these so that you don’t have to take time in a review to enforce these standards.

Sometimes a developer wants feedback on an unfinished branch. In those cases, open a pull request with the title “WIP:” for “work in progress”. When reviewing a WIP, resist the impulse to comment on small stylistic details. Focus on the big picture and think about whether the approach taken is likely to be optimal. Those looking for feedback can help reviewers by explaining in words what has been done so far, what the plan is for the future, and what areas of the plan present uncertainty.

For example, a front-end designer might open a WIP with a comment like,

I think I can make the CSS easier to extend by changing to BEM naming. I’ve rewritten the stylesheets for the landing page and search page, but this has broken some of the other pages. I’m not sure how to handle nested objects like the search field and it seems like the more info box should be the same on the different pages but one is implemented with floats and the other is flexbox, so I don’t know if it will really be possible to consolidate them.

Reviewers who give feedback should focus on addressing the author’s concerns, not correcting minor stylistic issues with the CSS that will be solved in a later polishing commit.

Sometimes a branch might be used for demonstrating some proof-of-concept code with no expectation that it will ever be merged. In that case, open the pull request with the title “DNM” for “Do Not Merge”.

Merging/rebasing

Once the code has been positively reviewed and automated tests have passed, the author of the code should be the one to perform the actual merge operation. The reason for this is that inevitably, when someone else merges the code, there will be one more small thing that will jump out to the author just as the merge is about to happen. Having the author merge solves this problem and provides a simple convention as to whose job it is to do the merging.

Merges must never join histories that have diverged. Some quick background. Most commits in Git have exactly one parent commit. However, a merge commit may have multiple parents. When those parents come from lines of development that have diverged, it can become very difficult to tell when a change was introduced or even what the final state of the code is. For this reason, all merges should be able to be “fast forwarded” (although they need not be done as a “fast forward merge”).
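If “able to be fast forwarded” feels abstract, it helps to model the commit graph directly. The sketch below is a toy, not how Git is implemented: commits are single letters, the parents dictionary is made up for the example, and the rule it encodes is simply that a branch can be fast forwarded onto master when master’s tip is already an ancestor of the branch’s tip.

```python
def ancestors(parents, commit):
    """Return the commit plus everything reachable through parent links."""
    seen = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def can_fast_forward(parents, target_tip, branch_tip):
    """True when target_tip is an ancestor of (or equal to) branch_tip."""
    return target_tip in ancestors(parents, branch_tip)


# Toy history: master is A-B-C; a feature branch D-E started from B.
parents = {"A": [], "B": ["A"], "C": ["B"], "D": ["B"], "E": ["D"]}

# Rebasing the feature onto C produces new commits, here called D2 and E2.
parents.update({"D2": ["C"], "E2": ["D2"]})

if __name__ == "__main__":
    print(can_fast_forward(parents, "C", "E"))    # branch diverged from master
    print(can_fast_forward(parents, "C", "E2"))   # rebased branch
```

In the toy history the original branch fails the check because C is not among E’s ancestors, while the rebased branch passes, which is the situation the rule above asks for before merging.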

To enforce fast-forwardable merges, in GitHub, uncheck “Allow merge commits” or in GitLab, choose “Merge commit with semi-linear history” in settings (GitLab still records a merge commit with this option, but only when the branch could have been fast forwarded). Once these settings are in place, a branch can only be merged when it applies cleanly on top of the current master. In GitHub, use “Squash and merge” to combine all commits on a branch into one. This should be done when individual commits were just checkpoints for the developer and not needed for an atomic history. Use “Rebase and merge” when all commits on a branch are atomic and should be preserved as part of a repo’s permanent history.

It is common to read the advice that one should never force push a branch. This advice is outdated. It used to be that force pushing could potentially harm the master branch, but now the master branch can be protected from force pushing in the settings of GitHub and GitLab, which means it is safe to use force pushing in one’s feature branches.

That’s it. After the branch is merged into master, delete the remote branch, so that members of the project can tell what’s under active development and what’s not. Even after “deleting” the branch, the commits that made up your old branch will still be available for inspection in the merge request, so nothing is lost by deleting it.

Further resources

I retain my copyright on the other posts on this blog, but this post is licensed under the Creative Commons CC-BY 4.0 license and may be shared with attribution. Feel free to fork the contents of this post on GitHub.

The Ethically-Trained Programmer