Python Packaging

YouTube embed

EuroPython 2015 - Less Known Packaging Features and Tricks (Slides)

This is a nice presentation because it lays out strong opinions about how to do packaging in Python. In a way, what those opinions are does not matter. What matters is that someone has thought about the tradeoffs involved and come to a decision, so that you don’t have to spend more of your own time thinking about those tradeoffs.

Ionel Mărieș has also done a series of blog posts on packaging in Python, including The problem with packaging in Python:

In a way, packaging in Python is a victim of bad habits — complex conventions, and feature bloat. It’s hard to simplify things, because of all the historical baggage people want to carry around. […] Concretely what I want is […] one way to do it. Not one clear way, cause we document the hell out of it, but one, and only one, way to do it.

Preach it.

Packaging is a hard problem in any language, and it is made harder because all modern languages are expected to interface with C in one way or another, and C has terrible build systems that always break unless you babysit them. That said, Python does a worse-than-average job of it:

  • Python itself is highly portable, but because so many Python packages rely on C extensions, in practice Python is as difficult to get installed as C due to DLL hell.
  • There are really only three places one would like to install a Python package: globally, per-user, or in a specific folder. Pip, however, is hostile to all but global installation, so users have to fall back on virtual environments to get around this limitation (see the sketch after this list).
  • Installation uses both files of pure metadata (MANIFEST.in) and files of arbitrary code (setup.py), giving one the disadvantages of both.
  • One wants soft requirements for libraries and hard requirements for apps, so there are two incompatible means of specifying requirements (setup.py for the soft ones, requirements.txt for the hard ones), but installing from the hard requirements file still pulls in the soft requirements of each dependency unless you remember to pass --no-deps.
  • &c. &c. &c.
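
To make the second and fourth complaints concrete, here is a hedged sketch of the usual incantations (package names and paths are placeholders, not recommendations):

    # Where pip can put a package (the first is its default and often wants sudo):
    pip install requests                     # global site-packages
    pip install --user requests              # per-user site-packages
    pip install --target ./vendor requests   # a specific folder (poorly supported)

    # The usual escape hatch: a virtual environment with its own pip
    python3 -m venv .venv && .venv/bin/pip install requests

    # Hard-pinned app requirements, installed without letting pip chase the
    # soft requirements declared in each dependency's setup.py:
    pip install --no-deps -r requirements.txt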

More Thoughts on Machines That Think

Two quick links with follow-up on the last post:

  • Julia Evans - Looking inside machine learning black boxes

    I talked to someone at a conference a while ago who worked on automated trading systems, and we were talking about how machine learning approaches can be really scary because you fundamentally don’t know whether the ML is doing a thing because it’s smart and correct and better than you, or because there’s a bug in the data.

    He said that they don’t use machine learning in their production systems (they don’t trust it). But they DO use machine learning! Their approach was to

    • have experts hand-build a model
    • have the machine learning team train a model, and show it to the experts
    • the expert says “oh, yes, I see the model is doing something smart there! I will build that in to my hand-built system”

    I don’t know if this is the best thing to do, but I thought it was very interesting.

    This is an interesting way to address the problem that AIs can’t be improved because they are black boxes.

  • Public Books - Justice for “Data Janitors”

    The emergence of the digital microwork industry to tend artificial intelligence shows how labor displacement generates new kinds of work. As technology enterprises attempt to expand the scope of culture they mediate, they have had to grapple with new kinds of language, images, sounds, and sensor data. These are the kinds of data that flood Facebook, YouTube, and mobile phones—data that digital microworkers are then called on to process and classify. Such microworkers might support algorithms by generating “training data” to teach algorithms to pattern-match like a human in a certain domain. They might also simply process large volumes of cultural data to prepare it to be processed in other ways. These cultural data workers sit at computer terminals, transcribing small audio clips, putting unstructured text into structured database fields, and “content moderating” dick pics and beheadings out of your Facebook feed and Google advertisements.

    Computers do not wield the cultural fluencies necessary to interpret this kind of material; but people do. This is the hidden labor that enables companies like Google to develop products around AI, machine learning, and big data. The New York Times calls this “janitor work,” labeling it the hurdle, rather than the enabling condition, of our big data futures. The second machine age doesn’t like to admit it needs help.

    So maybe there will still be jobs in our post-AI future; they’ll just be the desk-work equivalent of this:

    YouTube embed

AlphaGo and Our Dystopian AI Future

Mankind has been defeated in Go. This may turn out to be a big deal. I had been an AI skeptic before, but now I am worried that AI might be for real.

Lockhart told me that his “heart really sank” at the news of AlphaGo’s success. Go, he said, was supposed to be “the one game computers can’t beat humans at. It’s the one.”

New Yorker - In the Age of Google DeepMind, Do the Young Go Prodigies of Asia Have a Future?

Read more…

The Paperwork Explosion

YouTube embed

Machines should work; people should think.

The Paperwork Explosion, a trippy marketing film for IBM by Jim Henson

Cf. David Graeber - Of Flying Cars and the Declining Rate of Profit:

If we do not notice that we live in a bureaucratic society, that is because bureaucratic norms and practices have become so all-pervasive that we cannot see them, or, worse, cannot imagine doing things any other way.

Computers have played a crucial role in this narrowing of our social imaginations. Just as the invention of new forms of industrial automation in the eighteenth and nineteenth centuries had the paradoxical effect of turning more and more of the world’s population into full-time industrial workers, so has all the software designed to save us from administrative responsibilities turned us into part- or full-time administrators. In the same way that university professors seem to feel it is inevitable they will spend more of their time managing grants, so affluent housewives simply accept that they will spend weeks every year filling out forty-page online forms to get their children into grade schools. We all spend increasing amounts of time punching passwords into our phones to manage bank and credit accounts and learning how to perform jobs once performed by travel agents, brokers, and accountants.

Graeber further elaborates this theme in his book The Utopia of Rules: On Technology, Stupidity, and the Secret Joys of Bureaucracy. The book is really just a collection of essays and doesn’t totally hold together, but it’s worth a read.

Ideally, the amount of bureaucracy in the world pre- and post-computer should have been the same, just completed more quickly in the computerized world. In reality, however, computers made it practical to centralize the management of things that had been handled informally before. Theoretically, this is good because one innovation in the center can be effortlessly distributed to the periphery, but this benefit comes with the Hayekian cost that the periphery is closer to the ground truth than the center, and there may not be sufficient institutional incentives to transmit the information to the center effectively. The result is a blockage that the center tries to solve by mandating that an ever-increasing number of reports be sent to it: a paperwork explosion.

Share memory by communicating

If you’ve learned about the Go programming language at all, you’ve probably come across the koan, “Don’t communicate by sharing memory; share memory by communicating.” It’s a snappy little bit of chiasmus, but what does it actually mean? The natural inclination is to say, “It means ‘channels good; mutexes bad.’ ” Certainly, that’s not too far off the mark as a first order approximation of its meaning. But it’s actually a bit deeper than that.

Read more…

What Happens Next Will Amaze You

Maciej Cegłowski - What Happens Next Will Amaze You

Another great presentation by Maciej Cegłowski. This one is interesting because he has six concrete legal proposals for the internet:

  1. Right To Download
  2. Right To Delete
  3. Limits on Behavioral Data Collection
  4. Right to Go Offline
  5. Ban on Third-Party Advertising
  6. Privacy Promises

I think these ideas are great, and politicians should start trying to implement them in law.

(A seventh proposal, needed only in the US: sales tax law should apply uniformly to online stores, since they no longer need the weird, special tax break they enjoy thanks to unenforceable collection rules.)

Also worth thinking about is his section on the importance of not giving up hope:

It’s easy to get really depressed at all this. It’s important that we not let ourselves lose heart.

If you’re over a certain age, you’ll remember what it was like when every place in the world was full of cigarette smoke. Airplanes, cafes, trains, private offices, bars, even your doctor’s waiting room—it all smelled like an ashtray. Today we live in a world where you can go for weeks without smelling a cigarette if you don’t care to.

The people in 1973 were no more happy to live in that smoky world than we would be, but changing it seemed unachievable. Big Tobacco was a seemingly invincible opponent. Getting the right to breathe clean air back required a combination of social pressure, legal action, activism, regulation, and patience.

It took a long time to establish that environmental smoke exposure was harmful, and even longer to translate this into law and policy. We had to believe in our capacity to make these changes happen for a long time before we could enjoy the results.

I use this analogy because the harmful aspects of surveillance have a long gestation period, just like the harmful effects of smoking, and reformers face the same kind of well-funded resistance. That doesn’t mean we can’t win. But it does mean we have to fight.

Pessimism is a kind of a luxury enjoyed by those who know that they won’t be hurt as deeply by the entrenchment of the unacceptable status quo. Let’s not give up on the internet yet.

Source: idlewords.com

How to scrape an old PHP (or whatever) site with wget for use in Nginx

If you’re like me, in your youth you once made websites with PHP that have uncool URLs like /index.php?seemed-like-a-good-idea=at-the-time. Well, time has passed and now you want to stop using Apache, MySQL, and PHP on your LAMP server, but you also don’t want to just drop your old website entirely off the face of the internet. How can you migrate your old pages to Nginx?

The simple solution is to use wget. It’s easy to install on pretty much any platform. (On OS X, try installing it with homebrew.) But there are a few subtleties to using it. You want to keep your ugly old URLs with ? in them working, even though you don’t want them to be dynamically created from a database any more. You also want to make sure Nginx serves your pages with the proper mime-type of text/html because if the mime-type is set incorrectly, browsers will end up downloading your pages instead of displaying them.

Here’s what to do.

First, use FTP or whatever to copy the existing site onto your local machine. (These are old sites, right? So you don’t have version control do you? 😓) This step is to ensure you have all the images, CSS files, and other assets that were no doubt haphazardly scattered throughout your old site.
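
If the old server allows SSH, rsync is a convenient way to do this. A minimal sketch, with a hypothetical hostname and paths:

    # Mirror the old document root into a local folder
    # (the trailing slashes matter to rsync)
    rsync -avz user@example.com:/var/www/oldsite/ ./oldsite/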

Next, go through and delete all the *.php files and any hidden .whatever files, so Nginx doesn’t end up accidentally serving up a file that contains your old Yahoo mail password from ten years ago or something because it seemed like a good idea to save your password in plaintext at the time.
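
Something along these lines works; it’s a sketch, so run the find commands without -delete first to confirm what they would remove:

    # Delete the PHP sources...
    find . -type f -name '*.php' -delete
    # ...and any hidden dotfiles
    find . -type f -name '.*' -delete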

Now, cd into the same directory as your copy of the files on the server and use this command with wget to add scraped copies of your dynamic pages:

    wget \
         --recursive \
         --no-clobber \
         --page-requisites \
         --html-extension \
         --domains example.com \
         --no-parent \
             www.example.com/foobar/index.php/

Here’s what the flags mean:

  • --recursive is so you scrape all the pages that can be reached from the page you start with.
  • --no-clobber means you won’t replace the files you just fetched off the server.
  • --page-requisites is somewhat redundant, but it will fetch any asset files you may have missed in your copy from the server.
  • --html-extension is a bit of a wrinkle: it saves all the files it fetches with a .html extension. This is so that Nginx will know to serve your pages with the correct mimetype.
  • --domains example.com and --no-parent restrict the crawl to the portion of the site you actually want. In this case, the root of example.com would be left alone. Your case may be different.
  • The final argument is the address of the page to start fetching from.

wget will save these pages with two wrinkles that you’ll need to tell Nginx about. First, as mentioned, Nginx needs to know to ignore the .html on the end of the file names. Second, you’ll need to be able to serve up URLs with ? in them. To do both of those things, add this directive to the server block for your new thingie: try_files $uri $uri/index.html $request_uri.html =404;. try_files tells Nginx to try each of the listed files, in order, when serving a URL:

  • $uri is the plain URL (e.g. for your CSS/JS/image assets).
  • $uri/index.html serves up index pages, which wget creates whenever a URL ends in a slash.
  • $request_uri.html serves up files with ? in the middle and the final .html that wget appended.
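
To make the mapping concrete, here is a hedged sketch of how a few requests resolve, assuming the scraped files sit directly under the configured root (the URLs are illustrative):

    # Plain asset: matched by $uri
    curl -I http://www.example.com/css/style.css
    # Trailing slash: matched by $uri/index.html, the index page wget saved
    curl -I http://www.example.com/foobar/index.php/
    # Query string: matched by $request_uri.html, i.e. the file wget saved as
    # 'index.php?seemed-like-a-good-idea=at-the-time.html'
    curl -I 'http://www.example.com/index.php?seemed-like-a-good-idea=at-the-time'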

Here’s a minimally complete Nginx configuration example:

    # A standalone nginx.conf also needs an (empty) events block to load.
    events {}

    http {

        server {
            server_name www.example.com example.com;

            listen 80;

            # Directory holding your scraped copy of the old site
            root /path/to/sites/example;
            error_page 404 403 /404.html;
            # Plain files first, then wget's index pages, then the ?-URLs
            # that wget saved with .html appended
            try_files $uri $uri/index.html $request_uri.html =404;
        }
    }

See the Nginx HTTP server boilerplate configs project for a complete example. (Note that this example assumes you have a 404.html to serve up for missing pages.)

Programming is pure logic

Programming is pure logic. It won’t work if there’s an inconsistency in the logic. The errors aren’t produced inside the system. They are produced from the outside. If the system doesn’t work, it’s definitely your fault.

The funny thing is that every programmer thinks his logic will work when they finish coding a program. It never does, but at that moment, everyone believes there’s no error in the logic they have written, and confidently hits the “Enter” key.

— Satoru Iwata in Hobo Mainichi with Shigesato Itoi and Shigeru Miyamoto

Source: 1101.com

What Do We Save When We Save the Internet?

Think about regret as if it were sin. Some regrets are mild, but acute. The regret associated with choosing the wrong supermarket checkout lane, or buying an outfit that you notice goes on sale the next week—these seem woeful. They chafe, but their pains are pin pricks that soon subside. These are venial regrets.

Regret is more severe when it steeps in sorrow rather than in misadventure, when it becomes chronic—mortal rather than venial. But counter-intuitively, mortal regrets are less noticeable than venial ones, because they burn slow and low instead of hot and fast: the regret of overwork and its deleterious effects on family. The regret of sloth in the face of opportunity. The regret of acquiescence to one’s own temperament.

Mortal regrets are tender, and touched inadvertently they explode with affective shrapnel. Venial regrets shout, “alas!” but mortal regrets whisper, “if only.”

Ian Bogost - What Do We Save When We Save the Internet?

Source: The Atlantic