Note: At my old job, I helped write up our Git standards, but I lost access to that document when I changed jobs. To keep track of my thoughts on Git in perpetuity, I am posting them publicly here.
Git is the democracy of programming: it is the worst tool for version control, except all those other tools that have been tried from time to time. Among other problems, Git’s command names are obtuse (what is a “rebase”?) and non-orthogonal (how is “resetting” different from “checking out”?), and Git generally impedes the creation of an accurate mental model, but without understanding its underlying directed acyclic graph, one can’t move from beginner to intermediate user. Still, it is a necessary tool in every developer’s toolbox, and following good Git practices leads to smoother, more productive development. This guide assumes you already know how to use Git and discusses some of the higher-level issues around standards for collaboration.
My friend and former colleague Jason Goldstein has a great article up about the problems with Python’s asyncio framework.
For what it’s worth, when I was at PBS, a different coworker and I tried to do a test project to learn how to write asynchronous code. We wrote scripts in both Python 3 and Go that would go onto GitHub, get a list of users on our project, and download their personal repo information concurrently. When we finished, we compared the apps to see the strengths and weaknesses of the languages.
Both apps ended up working (although the Python app cheated in a few ways, for example by ignoring paginated responses), but I found the Go app to be easier to write than the Python app, even though it was significantly more verbose. One of the biggest problems for the Python app was just finding documentation that I could understand and apply. In Go, the main problem was that you’re writing the concurrency scaffolding yourself, so it’s easy to write a spaghetti mess if you let yourself. In Python you more often run into the problem that doing something concurrently is a pain, so you do it in a blocking manner even when you shouldn’t. For example, really you should be collecting links asynchronously and adding new links to a queue as you go, but it turns out to be easier to do things one at a time, even if that’s less efficient.
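For the curious, here is a minimal sketch of the "collect links asynchronously and add new links to a queue as you go" pattern using asyncio. It is not the script we wrote: the FAKE_PAGES table, fetch_links, worker, and crawl are all hypothetical names, and the network is faked with a short sleep so the example runs anywhere.

```python
import asyncio

# Hypothetical link graph standing in for real HTTP responses:
# each "page" maps to the links found on it.
FAKE_PAGES = {
    "/users": ["/users/alice", "/users/bob"],
    "/users/alice": ["/repos/alice/site"],
    "/users/bob": ["/repos/bob/tool", "/repos/alice/site"],
    "/repos/alice/site": [],
    "/repos/bob/tool": [],
}


async def fetch_links(url):
    """Pretend to download a page and return the links on it."""
    await asyncio.sleep(0.01)  # stands in for network latency
    return FAKE_PAGES.get(url, [])


async def worker(queue, seen):
    """Pull URLs off the queue and enqueue any links not seen before."""
    while True:
        url = await queue.get()
        try:
            for link in await fetch_links(url):
                if link not in seen:
                    seen.add(link)
                    queue.put_nowait(link)
        finally:
            # Mark the URL as handled only after its links are queued,
            # so queue.join() cannot return early.
            queue.task_done()


async def crawl(start, num_workers=3):
    """Visit every page reachable from start and return the set of URLs."""
    queue = asyncio.Queue()
    seen = {start}
    queue.put_nowait(start)
    workers = [asyncio.create_task(worker(queue, seen)) for _ in range(num_workers)]
    await queue.join()  # returns once every queued URL has been processed
    for task in workers:
        task.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return seen


if __name__ == "__main__":
    print(sorted(asyncio.run(crawl("/users"))))
```

The queue is what lets newly discovered links be picked up by whichever worker is free, instead of walking the pages one at a time. A real version would also need to handle errors, rate limits, and the paginated responses mentioned above.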
Hugo is a great static site generator written in Go. I use it for this blog. Its advantages are that it’s very fast, very easy to set up, and very flexible, but its disadvantage is that it doesn’t have the mature community support that Jekyll has. One example of that is that Hugo has no particular recommended route for managing a static asset pipeline. In this post, I’d like to explain how my personal pipeline works to see if it can help other Hugo users.
At its Worldwide Developers Conference on June 5, Apple announced that one of the tentpole features of macOS High Sierra will be anti-ad tracking technology:
Intelligent Tracking Prevention in Safari uses machine learning to identify and remove the tracking data that advertisers employ to follow users’ web activity.
At first glance, this may seem to be bad for Google and other online advertisers. However, that perception is mistaken.
Camille Fournier has a great article on How do Individual Contributors Get Stuck?, but I’d like to focus today on the opposite question: How do programmers get into flow?
My theory is that there are three different idealized programmer personality types, each with its own strengths and weaknesses. Before you can get into flow, you have to know what kind of programmer you are, so that you know what kinds of things put you into the flow. Not everyone has the same strengths, and that’s okay. In fact, for a good team the more diverse your strengths the better! If everyone on the team has the same strengths, that means they have the same weaknesses and the company may have major blindspots as a result. So what are some strengths and weaknesses of different kinds of programmers?
Programmers who just want to get things done
Some programmers are happiest when they can see concrete results of their work. Sure, the code they end up producing has some rough edges, but they made something you can see and touch today. These programmers get frustrated when their work has no visible output and seems like it’s just rearranging code that already works.
- Pros: Work fast
- Cons: Work can be sloppy and hard to extend
- Bored when: Refactoring or stuck on a hard problem
- Best use: Getting the prototype 90% done in one week (the other 90% will take a team of a dozen six months)
Programmers who just want to solve hard problems
For some programmers, the harder the problem is, the harder it is to resist. Sure, they were supposed to change all the borders from light grey to dark grey two weeks ago, but they got a little distracted by implementing a ground up rewrite of the rendering engine in OCaml and assembly language, so that will have to wait.
- Pros: Will solve a problem you never thought possible to solve
- Cons: What they produce can be hard for anyone else to grok
- Bored when: Tediously gluing bits of code together
- Best use: Creating the core algorithm that no one else on the team understands but gives your company the edge
- Spirit Languages: Haskell, Clojure
Programmers who just want to create the perfect thing
Some programmers are obsessed with beautiful code. They can’t believe that the code base uses CamelCase and snake_case in the same project. They’re personally offended that a method call runs in O(N²) time instead of O(N log N). They’ve started a secret branch of the repo to fix all the whitespace problems and rewrite the API, but they have to do it on their lunch break because the client just wants to ship a new feature.
- Pros: Create maintainable, performant code
- Cons: Prone to bikeshedding and wheelspinning redesigns; suffer from fear of a blank canvas
- Bored when: The inherited codebase is ugly but there’s no time to fix it
- Best use: Refactoring work done by the other kinds of programmers; bug hunts
- Spirit Languages: Python, Go
Of course, in reality, no one is really just one kind of programmer. We all mix and match different personality elements on different days. But it could be helpful to know yourself and know what kind of programmer you tend to be the closest to in order to utilize your talents best.
So… what kind of programmer are you?
The go generate command was added in Go 1.4, “to automate the running of tools to generate source code before compilation.”
If you write code for a living, there’s a chance that at some point in your career, someone will ask you to code something a little deceitful – if not outright unethical.
This happened to me back in the year 2000. And it’s something I’ll never be able to forget.
As developers, we are often one of the last lines of defense against potentially dangerous and unethical practices.
The more software continues to take over every aspect of our lives, the more important it will be for us to take a stand and ensure that our ethics are ever-present in our code.
Since that day, I always try to think twice about the effects of my code before I write it. I hope that you will too.
I think most software engineers want their software to help people and are very concerned by the idea that it might hurt someone. For example, a developer I know is working on a scholarship application form, and he expressed anxiety that a bug in his code might cause someone to lose out on thousands of dollars in tuition money. Consider how much more concerned we should be when hurting people is not an unfortunate bug but the goal of the software in question.
The title of this blog is a bit of ha ha only serious, but given the state of the world at present, now is the time to really think as an industry about the serious part of it.
Maciej Cegłowski has been calling on developers at large tech firms like Google and Facebook to think seriously about how their behavioral data could be used to hurt people, and asking employees to push for better protection of users’ privacy.
I think asking these questions is very important. It’s also very important as individuals to remember that you have a choice. If your company asks you to do something that hurts people, you can say no. I can’t promise you won’t lose your job, but making sure that you are helping people and not hurting them is more important.
You can find yourself another job.
You can’t find yourself another soul.
In the words of Solzhenitsyn:
You can resolve to live your life with integrity. Let your credo be this: Let the lie come into the world, let it even triumph. But not through me.
Addendum (Nov. 18): Please consider reading and signing the Manifesto for Responsible Software Development.
Everyone knows that debugging is twice as hard as writing a program in the first place. So if you’re as clever as you can be when you write it, how will you ever debug it?
The other day at work I was trying to write a test using VCR, a popular Ruby tool (with ports to many other languages) that records and plays back HTTP calls to external services, so you can test your client without actually pinging the external server more than once. I ran into a problem, however. No matter what I did, VCR refused to record my test. Was it because I was testing a POST request instead of GET? Was it because I was using a file attachment? Were there bad file permissions on the cassette directory? Was it not properly hooking into my client’s HTTP library? I spent the better part of a Friday trying to solve the question of why VCR wasn’t recording my request and response.
On Monday, I sat down and quickly found the problem: I had reorganized some of the code before writing the test and it was failing before a request could even be sent due to a missing import statement (thanks to Ruby for not making that a fatal error). The bug was embarrassingly easy to fix once I started to look for it. The problem was that I spent Friday looking for the wrong bug because I didn’t really understand how VCR works, so I couldn’t tell correct operation from incorrect operation.
Once at another job, we thought about using Django-polymorphic. Our use case was that we had base articles with certain basic features like title, body, and description plus specialized features like lead images or tags depending on the site. Using Django-polymorphic, we could make a query against the base article class and it would return a list of article sub-class instances based on each article’s type. Best of all, it did it all “magically”: no need to specify anything extra when you run the query, the correct results will automatically be returned by the ORM!
Within days of rolling out Django-polymorphic, a bug cropped up on our sites. We couldn’t trace the issue down, but we were convinced it must be a bug in Django-polymorphic. The magical wizard had betrayed us! Finally, in desperation, we ripped Django-polymorphic out, only to find… the problem was actually unrelated.
When we patched the problem, though, we didn’t put Django-polymorphic back. Why? Because we realized that as long as we were using it but didn’t understand it, we would end up blaming all our hard to understand bugs on it, whether the magic inside of Django-polymorphic was the real cause or not. Magic was just too unpredictable. It was better to stick with something less elegant but better understood.
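To make "less elegant but better understood" concrete, here is a rough plain-Python sketch of the explicit alternative: a lookup table that decides which article class each row becomes. This is not Django-polymorphic's API and not our actual code. The class names, the ARTICLE_TYPES registry, and load_article are all hypothetical, and plain dicts stand in for database rows.

```python
from dataclasses import dataclass


@dataclass
class Article:
    title: str
    body: str
    description: str


@dataclass
class ImageArticle(Article):
    lead_image: str = ""


@dataclass
class TaggedArticle(Article):
    tags: tuple = ()


# Explicit registry: the only place that decides which class a row becomes.
ARTICLE_TYPES = {
    "base": Article,
    "image": ImageArticle,
    "tagged": TaggedArticle,
}


def load_article(row):
    """Turn a plain dict (standing in for a database row) into the right class."""
    fields = dict(row)  # copy so the caller's dict is not modified
    kind = fields.pop("type", "base")
    try:
        cls = ARTICLE_TYPES[kind]
    except KeyError:
        raise ValueError(f"unknown article type: {kind!r}") from None
    return cls(**fields)


if __name__ == "__main__":
    rows = [
        {"type": "base", "title": "Plain", "body": "...", "description": "..."},
        {"type": "image", "title": "Pretty", "body": "...", "description": "...",
         "lead_image": "hero.png"},
    ]
    for article in map(load_article, rows):
        print(type(article).__name__, article.title)
```

There is more typing here than with a magical query, but when something goes wrong there is exactly one short function to read, and an unknown type fails loudly instead of silently.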
Any sufficiently advanced technology is indistinguishable from magic.
If you don’t know how a technology works, it seems magical to you. The flip-side of Clarke’s quote is one from Joel Spolsky:
All non-trivial abstractions, to some degree, are leaky.
Abstractions fail. Sometimes a little, sometimes a lot. There’s leakage. Things go wrong. It happens all over the place when you have abstractions.
Magic is wonderful when it works, but very few abstractions can be trusted not to break down occasionally. That’s not to argue that you can’t use abstractions. A modern computer system is nothing but a wobbling tower of poorly understood abstractions. It’s just that when you aren’t aware of the abstractions your system is built on and something goes wrong, the best you can do is make like Lucy Lawless and say
A wizard did it.
Using “a wizard did it” as a plot contrivance is considered a bad way of writing fiction because the audience never knows what the stakes are. If the heroes get into trouble could a wizard just randomly save them? Or if they are about to win, could a wizard just randomly defeat them? The existence of wizards removes all narrative tension because the universe has no certainties.
Krupal Shah wrote a good summation of my thoughts on abstraction:
I don’t think that libraries and frameworks should provide the abstraction in such way that hides the details of the underlying system. Rather, they should provide the abstraction in a way that people can actually understand the underlying system. Too much abstraction becomes a barrier for most people, especially beginners to actually understand what the system does. It makes them think more about doing things rather than going deep and understanding things. That’s not a good thing. We want people to actually understand the details of stuff under the hood, not to learn the ways of using things.
People are willing to put up with magic in TV and movies as long as there seem to be rules that guide what the wizards can or cannot do. We’re in the middle of a genre boom for world building fiction (which probably goes too far as it tries to fulfill some sort of psychic need for order in early twenty-first century America) in part because people like learning about a system that can be fully understood and mastered.
Julia Evans just gave a talk at Strange Loop where she argues that instead of sitting around and waiting for a wizard to do it, you can be the wizard!
I used to think to debug things, I had to be really really smart. I thought I had to stare at the code, and think really hard, and magically intuit what the bug was.
It turns out that this isn’t true at all! If you have the right tools, fixing bugs can be really easy. Sometimes with just a little more information, you can figure out what’s happening without being a genius at all.
By mastering debugging tools that let you go deeper into your stack, you can reverse engineer the magic behind whatever is happening in your program. You can turn magic back into sufficiently advanced technology. You can follow the leaky abstractions back to the source pipe.
Steve Jobs said,
Everything around you that you call life was made up by people that were no smarter than you and you can change it, you can influence it, you can build your own things that other people can use.
That’s not really true, but in the world of software, it’s close enough to true. One of the real turning points in programming is when you look at some framework or magical tool and say, “I could have made that too. I just didn’t have the time or opportunity.” Once you realize that you could have made any one of the tools you use (but you didn’t because no one has that kind of time), you won’t be afraid anymore to look behind the curtain and find out who the wizard really is:
This is a nice note about one programmer’s evolution on the path to automating all the things. He evolves from just keeping a text file with some shell commands in it to using Ansible and Invoke. It’s worth reading both as a retrospective and as a prompt to think about ways that more things can be automated.
There was also a good talk at PyCon this year about how there’s still low hanging fruit in the automation world waiting for us to pluck it:
In my experience, the hardest part of automating something is often just realizing that it’s possible at all. For example, traditionally when developing a webpage, one would make some changes, save the file, run a command to rebuild the site, go to the browser, and press refresh to see the changes. This was not an onerous workflow, but today, with browsersync and related technologies, we can automate it and speed up development.
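To show how little code the "notice a save and rebuild" step needs, here is a minimal polling sketch. It is not how browsersync works internally, and it does not refresh the browser. The names snapshot, changed_paths, and watch are hypothetical, and rebuild is whatever function you would otherwise run by hand.

```python
import os
import time


def snapshot(paths):
    """Map each existing path to its last-modified time in nanoseconds."""
    return {p: os.stat(p).st_mtime_ns for p in paths if os.path.exists(p)}


def changed_paths(before, after):
    """Return paths that were added, removed, or modified between snapshots."""
    return sorted(
        p for p in before.keys() | after.keys() if before.get(p) != after.get(p)
    )


def watch(paths, rebuild, interval=0.5, max_polls=None):
    """Call rebuild(changed) whenever a watched file changes.

    max_polls bounds the loop (handy for tests); None means run until interrupted.
    """
    previous = snapshot(paths)
    polls = 0
    while max_polls is None or polls < max_polls:
        time.sleep(interval)
        current = snapshot(paths)
        changed = changed_paths(previous, current)
        if changed:
            rebuild(changed)
        previous = current
        polls += 1


if __name__ == "__main__":
    # Hypothetical usage: watch two source files and "rebuild" by printing.
    watch(["index.html", "style.css"], lambda changed: print("rebuild:", changed))
```

Polling modification times is crude compared with the file-system notifications that real tools use, but it is easy to understand and removes the "run a command to rebuild" step from the loop.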
Another difficulty is actually taking the time to do the automation. After all, we have real work that needs to be done on a deadline, so why faff about with pointless tooling changes? Confusing the matter further, there is a similar set of changes that really are a waste of time: moving from one technology to an equivalent technology that is marginally better, for example from Apache to Nginx or from MySQL to PostgreSQL. Those changes can and should be put on the back burner because they are only going to provide incrementally better performance or ease of use. But automating is an ongoing force multiplier. It makes your future development process faster, so it should be done as soon as feasible so that you can start reaping the rewards. The trouble is that it looks like the kind of thing that should be prioritized for whenever you get around to it.
In conclusion, XKCD: