A Wizard Did It

Monday, September 19, 2016

Everyone knows that debugging is twice as hard as writing a program in the first place. So if you’re as clever as you can be when you write it, how will you ever debug it?

— Brian W. Kernighan

The other day at work I was trying to write a test using VCR, a popular Ruby tool (with ports to many other languages) that records and plays back HTTP calls to external services, so you can test your client without actually pinging the external server more than once. I ran into a problem, however. No matter what I did, VCR refused to record my test. Was it because I was testing a POST request instead of GET? Was it because I was using a file attachment? Were there bad file permissions on the cassette directory? Was it not properly hooking into my client’s HTTP library? I spent the better part of a Friday trying to solve the question of why VCR wasn’t recording my request and response.

On Monday, I sat down quickly found the problem: I had reorganized some of the code before writing the test and it was failing before a request could even be sent due to a missing import statement (thanks to Ruby for not making that a fatal error). The bug was embarrassingly easy to fix once I started to look for it. The problem was that I spent Friday looking for the wrong bug because I didn’t really understand how VCR works, so I couldn’t tell correct operation from incorrect operation.

Once at another job, we thought about using Django-polymorphic. Our use case was that we had base articles with certain basic features like title, body, and description plus specialized features like lead images or tags depending on the site. Using Django-polymorphic, we could make a query against the base article class and it would return a list of article sub-class instances based on each article’s type. Best of all, it did it all “magically”: no need to specify anything extra when you run the query, the correct results will automatically be returned by the ORM!

Within days of rolling out Django-polymorphic, a bug cropped up on our sites. We couldn’t trace the issue down, but we were convinced it must be a bug in Django-polymorphic. The magical wizard had betrayed us! Finally, in desperation, we ripped Django-polymorphic out, only to find… the problem was actually unrelated.

When we patched the problem, though, we didn’t put Django-polymorphic back. Why? Because we realized that as long as we were using it but didn’t understand it, we would end up blaming all our hard to understand bugs on it, whether the magic inside of Django-polymorphic was the real cause or not. Magic was just too unpredictable. It was better to stick with something less elegant but better understood.

Any sufficiently advanced technology is indistinguishable from magic.

— Arthur C. Clarke

If you don’t know how a technology works, it seems magical to you. The flip-side of Clarke’s quote is one from Joel Spolsky:

All non-trivial abstractions, to some degree, are leaky.
Abstractions fail. Sometimes a little, sometimes a lot. There’s leakage. Things go wrong. It happens all over the place when you have abstractions.

Magic is wonderful when it works, but very few abstractions can be trusted not to break down occasionally. That’s not to argue that you can’t use abstractions. A modern computer system is nothing but a wobbling tower of poorly understood abstractions. It’s just when you aren’t aware of the abstractions your system is built on and something goes wrong, the best you can do is make like Lucy Lawless and say

A wizard did it.

Using “a wizard did it” as a plot contrivance is considered a bad way of writing fiction because the audience never knows what the stakes are. If the heroes get into trouble could a wizard just randomly save them? Or if they are about to win, could a wizard just randomly defeat them? The existence of wizards removes all narrative tension because the universe has no certainties.

Krupal Shah wrote a good summation of my thoughts on abstraction:

I don’t think that libraries and frameworks should provide the abstraction in such way that hides the details of the underlying system. Rather, they should provide the abstraction in a way that people can actually understand the underlying system. Too much abstraction becomes a barrier for most people, especially beginners to actually understand what the system does. It makes them think more about doing things rather than going deep and understanding things. That’s not a good thing. We want people to actually understand the details of stuff under the hood, not to learn the ways of using things.

People are willing to put up with magic in TV and movies as long as there seem to be rules that guide what the wizards can or cannot do. We’re in the middle of genre boom for world building fiction (which probably goes too far as it tries to fulfill some sort of psychic need for order in early twenty-first century America) in part because people like learning about system that’s can be fully understood and mastered.

Julia Evans just gave a talk at Strange Loop where she argues instead of sitting around and waiting for a wizard to do it, you can be the wizard!

I used to think to debug things, I had to be really really smart. I thought I had to stare at the code, and think really hard, and magically intuit what the bug was.
It turns out that this isn’t true at all! If you have the right tools, fixing bugs can be really easy. Sometimes with just a little more information, you can figure out what’s happening without being a genius at all.

By mastering debugging tools the let you go deeper into your stack, you can reverse engineer the magic behind whatever is happening your program. You can turn magic back into sufficiently advanced technology. You can follow the leaky abstractions back to the source pipe.

Steve Jobs said,

Everything around you that you call life was made up by people that were no smarter than you and you can change it, you can influence it, you can build your own things that other people can use.

That’s not really true, but in the world of software, it’s close enough to true. One of the real turning points in programming is when you look at some framework or magical tool and say, “I could have made that too. I just didn’t have the time or opportunity.” Once you realize that you could have made any one of the tools you use (but you didn’t because no one has that kind of time), you won’t be afraid anymore to look behind the curtain and find out who the wizard really is:

Automate It!

Friday, June 17, 2016

Curtis Lassam - Automation for the People

This is a nice note about one programmer’s evolution on the path to automating all the things. He evolves from just keeping a text file with some shell commands in it to using Ansible and Invoke. It’s worth reading both as a retrospective and as prompt to think about ways that more things can be automated.

There was also a good talk at PyCon this year about how there’s still low hanging fruit in the automation world waiting for us to pluck it:

Alex Gaynor - The cobbler’s children have no shoes, or building better tools for ourselves (slides)

In my experience, the hardest part of automating something is often just realizing that it’s possible at all. For example, traditionally when developing a webpage, one would make some changes, save the file, run a command to rebuild the site, go to the browser, and press refresh to see the changes. This was not an onerous workflow, but today, with browsersync and related technologies, we can automate it and speed up development.

Another difficulty is to actually take the time to do the automation. After all, we have real work that needs to be done on a deadline, so why faff about with pointless tooling changes? Confusing the matter further is that there are a similar set of changes that really are a waste of time: moving from one technology to an equivalent technology that is marginally better, for example from Apache to Nginx or from MySQL to PostgreSQL. Those changes can and should be put on the back burner because they are only going to provide incrementally better performance or ease of use. But automating is an on-going force multiplier. Automation should be done as soon as feasible so that you can start reaping the rewards of automation sooner rather than later. Automating makes your future development process faster, so it should be prioritized for sooner rather than later, but it looks like the kind of thing that should be prioritized for whenever you get around to it.

In conclusion, XKCD:

[<img src="/post/img/2016-06-17-is_it_worth_the_time.png" alt=“Is It Worth the Time?” title=“Don’t forget the time you spend finding the chart to look up what you save. And the time spent reading this reminder about the time spent. And the time trying to figure out if either of those actually make sense. Remember, every second counts toward your life total, including these right now.” width=“571”

]xkcd

June link round up

Friday, June 10, 2016

Here are few quick links to things I’ve been thinking about recently:

Jason Goldstein - The Dark Side of Bespoke Cover Stories (And How We Tried to Solve It)
My former colleague at The Atlantic, Jason, writes about how they do custom templates for feature stories:
To solve this problem, we made a small tool called Enhancements.
The way web frameworks (i.e., Django) render a page is by taking a context (information from the database, like the article’s text, authors, and images) and applying it to a template, producing HTML. Enhancements provides a way to modify the context just before the page renders. This means:
1. The version of the article we keep in the database is clean. If you take the customizations away it looks like any other big story, complete with large art, but none of the bespoke styles.
2. On the web, for as long as we choose to maintain it, I can do any transformations I want.
That’s given us the freedom to layer headlines over photos, make custom pullquotes that fit the tone of the piece, tweak caption styles, and other small design changes that help capture the tone of the story.
Lately, I’ve really been getting religion around the idea that the database should always be kept pristine, and if you need to do some sort of denormalization or a translation layer, you should store it in memcache or otherwise away from the rest of your data.
Ashley Nelson-Hornstein - On Heroes
Tells the fascinating story of Annie Easley’s career at NASA.
Code Switch - Coding While Black: Hacking The Future Of The Tech Industry
“I was actually in a meeting — a very important meeting,” he begins. “And I get a call from my resident director and says you need to leave your meeting now and you need to come down to the Atlanta Check Cashing outlet on Forsyth Street.”
One of his Code Start students had tried to cash his monthly $500 stipend, but the clerk suspected the postal money order was fake. She took his identification and told him to call the police on himself.
When Sampson arrived, the 19-year-old was in handcuffs in the back of a police car. And when Sampson spoke up for his student, officers immediately began to grill him about the money order. “And so they were like, ‘Do you have the receipt?’ At first I was like, I don’t have to prove that I purchased something. But here I am pulling out a receipt on Forsyth Street in downtown Atlanta, showing these officers that I purchased all of these money orders, so I was a little uncomfortable doing that.”
After 30 minutes of back and forth, the student was released from police custody. He got his money order and ID back, but the incident shook him so much that he dropped out of the program.
Ugh.

Review: Geek Sublime

Tuesday, May 10, 2016

Vikram Chandra’s Geek Sublime: The Beauty of Code, the Code of Beauty is an audaciously weird book. It seems to be in part:

A meditation on the conflict between Hackers and Painters and Dabblers and Blowhards
An explanation of the basic mechanisms of digital computers in terms of logic gates and booleans
A sociological history of programming as a profession, especially with respect to the exclusion of women
A memoir of author’s childhood in India and the writing of his first novel
A reflection on the effect of colonialism on contemporary Indians
An explanation of rasa theory in Anandavardhana and Abhinavagupta

It succeeds marvelously at several of these goals, and it mostly also succeeds at another goal:

An attempt to sell to a Western audience a book written in a traditional Indian style instead of the contemporary Western style of introduction, thesis, support, conclusion

Python Packaging

Friday, May 6, 2016

EuroPython 2015 - Less Known Packaging Features and Tricks (Slides)

This is a nice presentation because it presents strong opinions about how to do packaging in Python. In a way, what those opinions are do not matter. What matters is that someone has thought about the tradeoffs involved and come to a decision, so that you don’t have to spend more of your own time thinking about those tradeoffs.

Ionel Mărieș has also done a series of blog posts on packaging in Python, including The problem with packaging in Python:

In a way, packaging in Python is a victim of bad habits — complex conventions, and feature bloat. It’s hard to simplify things, because all the historical baggage people want to carry around. […] Concretely what I want is […] one way to do it. Not one clear way, cause we document the hell out of it, but one, and only one, way to do it.

Preach it.

Packaging is a hard problem in any language and made more so because all modern languages are expected to interface with C somehow or another and C has terrible build systems that always break unless you babysit them. That said, Python does a worse than average job at it:

Python itself is highly portable, but because so many Python packages rely on C extensions, in practice Python is as difficult to get installed as C due to DLL hell.
There are really only three places one would like to install a Python package: globally, user locally, or in a specific folder. Pip, however, is hostile to all but the choice of global installation, so users have to use virtual environments to get around this limitation.
Installation uses both files of pure metadata (MANIFEST.in) and files of arbitrary code (setup.py), giving one the disadvantages of both.
One wants soft requirements for libraries and hard requirements for apps, so there are two incompatible means of specifying requirements (requirements.txt and setup.py), but the hard requirements file uses the soft requirements list unless you remember to use --no-deps when running it.
&c. &c. &c.

More Thoughts on Machines That Think

Wednesday, April 13, 2016

Two quick links with follow-up on the last post:

Julia Evans - Looking inside machine learning black boxes
I talked to someone at a conference a while ago who worked on automated trading systems, and we were talking about how machine learning approaches can be really scary because you fundamentally don’t know whether the ML is doing a thing because it’s smart and correct and better than you, or because there’s a bug in the data.
He said that they don’t use machine learning in their production systems (they don’t trust it). But they DO use machine learning! Their approach was to
- have experts hand-build a model
- have the machine learning team train a model, and show it to the experts
- the expert says “oh, yes, I see the model is doing something smart there! I will build that in to my hand-built system”
I don’t know if the this is the best thing to do, but I thought it was very interesting.
This is an interesting way to address the problem that AI’s can’t be improved because they are black boxes.
Public Books - Justice for “Data Janitors”
The emergence of the digital microwork industry to tend artificial intelligence shows how labor displacement generates new kinds of work. As technology enterprises attempt to expand the scope of culture they mediate, they have had to grapple with new kinds of language, images, sounds, and sensor data. These are the kinds of data that flood Facebook, YouTube, and mobile phones—data that digital microworkers are then called on to process and classify. Such microworkers might support algorithms by generating “training data” to teach algorithms to pattern-match like a human in a certain domain. They might also simply process large volumes of cultural data to prepare it to be processed in other ways. These cultural data workers sit at computer terminals, transcribing small audio clips, putting unstructured text into structured database fields, and “content moderating” dick pics and beheadings out of your Facebook feed and Google advertisements.
Computers do not wield the cultural fluencies necessary to interpret this kind of material; but people do. This is the hidden labor that enables companies like Google to develop products around AI, machine learning, and big data. The New York Times calls this “janitor work,” labeling it the hurdle, rather than the enabling condition, of our big data futures. The second machine age doesn’t like to admit it needs help.
So maybe there will still be jobs in our post-AI future, they’ll just be the equivalent of this for desk work:

AlphaGo and Our Dystopian AI Future

Saturday, April 9, 2016

a cool thing to remember is that whenever someone says "A.I." what they're really talking about is "a computer program someone wrote"
— Allison Parrish | @aparrish@friend.camp (@aparrish) March 26, 2016

Mankind has been defeated in Go. This may turn out to be a big deal. I had been an AI skeptic before, but now I am worried that AI might be for real.

Lockhart told me that his “heart really sank” at the news of AlphaGo’s success. Go, he said, was supposed to be the “the one game computers can’t beat humans at. It’s the one.”

— New Yorker - In the Age of Google DeepMind, Do the Young Go Prodigies of Asia Have a Future?

The Paperwork Explosion

Thursday, March 31, 2016

Machines should work; people should think.

— The Paperwork Explosion, a trippy marketing film for IBM by Jim Henson

C.f. David Graeber - Of Flying Cars and the Declining Rate of Profit:

If we do not notice that we live in a bureaucratic society, that is because bureaucratic norms and practices have become so all-pervasive that we cannot see them, or, worse, cannot imagine doing things any other way.
Computers have played a crucial role in this narrowing of our social imaginations. Just as the invention of new forms of industrial automation in the eighteenth and nineteenth centuries had the paradoxical effect of turning more and more of the world’s population into full-time industrial workers, so has all the software designed to save us from administrative responsibilities turned us into part- or full-time administrators. In the same way that university professors seem to feel it is inevitable they will spend more of their time managing grants, so affluent housewives simply accept that they will spend weeks every year filling out forty-page online forms to get their children into grade schools. We all spend increasing amounts of time punching passwords into our phones to manage bank and credit accounts and learning how to perform jobs once performed by travel agents, brokers, and accountants.

Graeber further elaborates this theme in his book The Utopia of Rules: On Technology, Stupidity, and the Secret Joys of Bureaucracy. The book is really just a collection of essays and doesn’t totally hold together, but it’s worth a read.

Ideally, the amount of bureaucracy in the world pre- and post- computer should have been the same but completed more quickly in the computerized world. In reality, however, computers made it practical to centralize the management of things that had been handled informally before. Theoretically, this is good because one innovation in the center can be effortlessly distributed to the periphery, but this benefit comes with the Hayekian cost that the periphery is closer to the ground truth than the center, and there may not be sufficient institutional incentives to transmit the information to the center effectively. The result is a blockage that the center tries to solve by mandating that an ever increasing number of reports be sent to it: a paperwork explosion.

Share memory by communicating

Sunday, February 28, 2016

If you’ve learned about the Go programming language at all, you’ve probably come across the koan, “Don’t communicate by sharing memory; share memory by communicating.” It’s a snappy little bit of chiasmus, but what does it actually mean? The natural inclination is to say, “It means ‘channels good; mutexes bad.’ ” Certainly, that’s not too far off the mark as a first order approximation of its meaning. But it’s actually a bit deeper than that.

What Happens Next Will Amaze You

Friday, September 25, 2015

Source: idlewords.com