How to scrape an old PHP (or whatever) site with wget for use in Nginx

Wednesday, September 9, 2015

If you’re like me, in your youth you once made websites with PHP that have uncool URLs like /index.php?seemed-like-a-good-idea=at-the-time. Well, time has passed and now you want to stop using Apache, MySQL, and PHP on your LAMP server, but you also don’t want to just drop your old website entirely off the face of the internet. How can you migrate your old pages to Nginx?

The simple solution is to use wget. It’s easy to install on pretty much any platform. (On OS X, try installing it with homebrew.) But there are a few subtleties to using it. You want to keep your ugly old URLs with ? in them working, even though you don’t want them to be dynamically created from a database any more. You also want to make sure Nginx serves your pages with the proper mime-type of text/html because if the mime-type is set incorrectly, browsers will end up downloading your pages instead of displaying them.

Here’s what to do.

First, use FTP or whatever to copy the existing site onto your local machine. (These are old sites, right? So you don’t have version control do you? 😓) This step is to ensure you have all the images, CSS files, and other assets that were no doubt haphazardly scattered throughout your old site.

Next, go through and delete all the *.php files and any hidden .whatever files, so Nginx doesn’t end up accidentally serving up a file that contains your old Yahoo mail password from ten years ago or something because it seemed like a good idea to save your password in plaintext at the time.

Now, cd into the same directory as your copy of the files on the server and use this command with wget to add scraped copies of your dynamic pages:

    wget \
         --recursive \
         --no-clobber \
         --page-requisites \
         --html-extension \
         --domains example.com \
         --no-parent \
             www.example.com/foobar/index.php/

Here’s what the flags mean:

--recursive is so you scrape all the pages that can be reached from the page you start with.
--no-clobber means you won’t replace the files you just fetched off the server.
--page-requisites is somewhat redundant, but it will fetch any asset files you may have missed in your copy from the server.
--html-extension is a bit of a wrinkle: it saves all the files it fetches with a .html extension. This is so that Nginx will know to serve your pages with the correct mimetype.
--domains example.com and --no-parent are so you only scrape a portion of the site that you want to scrape. In this case, the root of example.com would be left alone. Your case may be different.
The final argument is the address of the page to start fetching from.

wget will save these pages with two wrinkles that you’ll need to tell Nginx about. First, as mentioned, Nginx needs to know to ignore the .html on the end of the file names. Second, you’ll need to be able to serve up URLs with ? in the file name. To do both of those things, in Nginx, add this directive to the server block for your new thingie try_files $uri $uri/index.html $request_uri.html =404;. try_files tells Nginx to try multiple files when serving a URL in the order specified. $uri is the plain URL (e.g. for your CSS/JS/image assets), $uri/index.html serves up index pages, which wget will create whenever a URL ends in a slash. $request_uri.html serves up files including ? in the middle with a final .html as was appended by wget.

Here’s a minimally complete Nginx configuration example:

    http {

        server {
            server_name www.example.com;
            server_name  example.com;

            listen 80;

            root /path/to/sites/example;
            error_page 404 403 /404.html;
            try_files $uri $uri/index.html $request_uri.html =404;
        }
    }

See the Nginx HTTP server boilerplate configs project for a complete example. (Note that this example assumes you have a 404.html to serve up for missing pages.)

Programming is pure logic

Tuesday, July 14, 2015

Programming is pure logic. It won’t work if there’s an inconsistency in the logic. The errors aren’t produced inside the system. They are produced from the outside. If the system doesn’t work, it’s definitely your fault.
The funny thing is that every programmer thinks his logic will work when they finish coding a program. It never does, but at that moment, everyone believes there’s no error in the logic they have written, and confidently hits the “Enter” key.

— Satoru Iwata in Hobo Mainichi with Shigesato Itoi and Shigeru Miyamoto

Source: 1101.com

Ocarina of Time Speedrun

Wednesday, October 1, 2014

Cosmo - Zelda: Ocarina of Time Speedrun

I usually don’t go much in for speed runs of Ocarina of Time, since they’re so glitch based, but this one does a really great job of explaining how the glitches work and the science behind their discovery, so it’s really entertaining.

Source: youtube.com

What Do We Save When We Save the Internet?

Monday, May 26, 2014

Think about regret as if it were sin. Some regrets are mild, but acute. The regret associated with choosing the wrong supermarket checkout lane, or buying an outfit that you notice goes on sale the next week—these seem woeful. They chafe, but their pains are pin pricks that soon subside. These are venial regrets.
Regret is more severe when it steeps in sorrow rather than in misadventure, when it becomes chronic—mortal rather than venial. But counter-intuitively, mortal regrets are less noticeable than venial ones, because they burn slow and low instead of hot and fast: the regret of overwork and its deleterious effects on family. The regret of sloth in the face of opportunity. The regret of acquiescence to one’s own temperament.
Mortal regrets are tender, and touched inadvertently they explode with affective shrapnel. Venial regrets shout, “alas!” but mortal regrets whisper, “if only.”

— Ian Bogost - What Do We Save When We Save the Internet?

Source: The Atlantic

On using Go channels like Python generators

Sunday, January 5, 2014

I really like Go, and I really like Python, so it’s natural that I apply patterns I’ve learned in Python to Go… even when it’s maybe not appropriate.

For example, in Python, it’s common to use generators to process a stream of data. Take this simple (unrealistic, inefficient) odd number generator:

>>> def odds(from_, to):
...     for n in range(from_, to):
...         if n % 2:
...             yield n
... 
>>> print(*odds(0, 10), sep=", ")
1, 3, 5, 7, 9

How would you do something like that in Go? The natural approach is to use channels. Go’s channels are a language primitive that allows you to send values from one light-weight green-ish thread (“go-routine”) to another. Many other languages would make these into methods in a standard library threading package, but since they’re language primitives in Go, they get used more often, even in simple programs.

Here is an incorrect first attempt at simulating generators using channels:

package main

import "fmt"

func Odds(from, to int, c chan int) {
    if from%2 == 0 {
        from += 1
    }
    for i := from; i < to; i += 2 {
        c <- i
    }
}

func main() {
    c := make(chan int)
    go Odds(0, 10, c)
    for v := range c {
        fmt.Println(v)
    }
}

This program ends with fatal error: all goroutines are asleep - deadlock! because the loop in for v := range c never quits. It never quits because the channel is never closed. An even worse version of this would be one where go Odds(0, 10, c) was just Odds(0, 10, c). That ends with a deadlock before even printing anything.

What is a non-terrible version of this code? We should move more of the responsibility into the Odds function. The Odds function should be in charge of creating the channel, spawning a go-routine, and closing the channel when the go-routine is done, so that the caller can be freed of that responsibility.

Here’s what that looks like:

package main

import "fmt"

func Odds(from, to int) <-chan int {
    c := make(chan int)
    go func() {
        defer close(c)
        if from%2 == 0 {
            from += 1
        }
        for i := from; i < to; i += 2 {
            c <- i
        }
    }()
    return c
}

func main() {
    for v := range Odds(0, 10) {
        fmt.Println(v)
    }
}

There are a couple of thing to notice here. First, the caller doesn’t need to use the go statement because the Odds function now has an anonymous function inside of it that it uses to spawn a go-routine. (This use of an anonymous function looks a bit like Javascript to me.)

Second, the caller doesn’t need to create its own channel to give to our pseudo-generator. Not only that, since the type of the channel is <-chan int, it’s perfectly clear that the channel is only for receiving ints, not for sending, and this is enforced at compile time.

This leads to the most important point: that it doesn’t deadlock at the end! That’s because the channel gets closed. Generally speaking, in Go the one doing the sending on a channel should also be the one to do the closing of the channel when it’s done sending. (Double closing a channel causes a runtime panic.) Since the Odds function returns a receive-only channel, the compiler actually enforces that the close cannot be done in the main function.

But notice how it gets closed, with defer close(c). In Python, the with statement is used for resource management things like remembering to close files and sockets. In Go, defer is used to ensure that a certain command happens whenever a certain function ends (whether in a normal return or an exception-like panic). As a convention, it’s a good idea to defer the close of something as soon as it’s applicable.

So far so good, but what about this scenario:

func main() {
    for v := range Odds(0, 10) {
        fmt.Println(v)
        if v > 5 {
            fmt.Println("Tired of counting all day.")
            return
        }
    }
}

The channel returned by Odds wasn’t exhausted, so it was never closed. Since the function returning was main, this isn’t a big deal in this case, but this is actually a somewhat subtle bug that can affect otherwise correct-looking code. It’s a memory leak, or more specifically a go-routine leak. Python and Go are both garbage collected, but there’s a subtle difference here. In Python if I write gen = odds(0, 10) and then next(gen) the resulting generator will be cleaned up shortly after gen goes out of scope. Similarly, in Go the channel created by c := Odds(0, 10) will be garbage collected once all references to it go away. Go-routines, however, only go “out of scope” when they return or panic, since it can have side-effects independent of whether anything in the main thread has a reference to it. (E.g.)

The standard way to prevent go-routine leaks is through the use of “quit channels.” However, this looks like it’s just going to re-introduce a lot of the complexity that we got rid from the first/incorrect example. We need to make up a new channel to pass around, figure out what value to send as a quit signal, etc.

A quit channel is more complicated than just letting a go-routine leak or run to exhaustion, but there are a few tricks we can take advantage of to make it less annoying:

package main

import "fmt"

func Odds(from, to int, quit <-chan struct{}) <-chan int {
    c := make(chan int)
    go func() {
        defer close(c)
        if from%2 == 0 {
            from += 1
        }
        for i := from; i < to; i += 2 {
            select {
            case <-quit:
                return
            case c <- i:
            }
        }
    }()
    return c
}

func main() {
    quit := make(chan struct{})
    defer close(quit)
    for v := range Odds(0, 10, quit) {
        fmt.Println(v)
        if v > 5 {
            fmt.Println("Tired of counting all day.")
            return
        }
    }
}

First of all, what is <-chan struct{}? type struct {} is a type for hanging methods off of with no associated data. chan struct{} is a channel that passes no data. You could also write chan bool but that has two potential values, true or false. struct{} lets your readers know, “There’s no data being passed here.” The only reason for the channel to exist is to synchronize things. No other information is being sent. The arrow in front of <-chan struct{} further specifies that the caller is the one who will be in charge of the quit channel, not the function.

Next, select (not to be confused with switch, which also has cases) is a statement in Go that lets you send or receive from multiple channels at once. If multiple channels are ready simultaneously, the runtime will pseudo-randomly select just one to use, otherwise the first channel to be ready will fire. In this case, it means “either send the value of i on the output channel or return if the quit channel has something.”

When you close a channel, from then on anyone listening to that channel receives the blank value for that type (0 for int, "" for string, etc.) Instead of using close, you could also send an empty struct:

func main() {
    quit := make(chan struct{})
    for v := range Odds(0, 10, quit) {
        fmt.Println(v)
        if v > 5 {
            fmt.Println("Tired of counting all day.")
            quit <- struct{}{}
        }
    }
}

close(quit) is less ugly than quit <- struct{}{}, but the real reason I think it’s better to use close gets at my next couple of points.

Why didn’t I put the make for the quit channel into Odds like I did for the chan int in the second example? It certainly could be done, but fundamentally a caller knows better than the function itself when the go-routine should quit, so the caller should have the responsibility. This lets us do shortcuts like this:

func main() {
    for v := range Odds(0, 10, nil) {
        fmt.Println(v)
    }
}

Since the caller knew the channel would be drained to exhaustion, it didn’t need to make a quit channel at all. In Go, nil channel never sends, but there’s nothing illegal or panic-inducing about listening to a nil channel, so Odds works just fine.

Furthermore, we can reuse the same quit channel for multiple functions. If we were sending the quit signal with quit <- struct{}{}, we’d have to send one message for each go-routine that reuses the same quit channel and needs to be closed. However because closed channels are always “sending” their blank value, we can stop all of the go-routines that are listening to the same quit channel by just doing close(quit) once.

Put it all together and this approach allows for more advanced (and Python generator-like) usages of channels. The second Project Euler problem is “By considering the terms in the Fibonacci sequence whose values do not exceed four million, find the sum of the even-valued terms.” Here is an over-engineered solution using Go’s channels:

package main

import "fmt"

func sendOrFail(n int64, out chan<- int64, quit <-chan struct{}) bool {
    select {
    case out <- n:
        return true
    case <-quit:
        return false
    }
}

func FibGen(quit <-chan struct{}) <-chan int64 {
    out := make(chan int64)
    go func() {
        defer close(out)
        var last, current int64 = 1, 2

        if ok := sendOrFail(last, out, quit); !ok {
            return
        }
        for {
            if ok := sendOrFail(current, out, quit); !ok {
                return
            }
            last, current = current, last+current
        }
    }()
    return out
}

func Evens(in <-chan int64, quit <-chan struct{}) <-chan int64 {
    out := make(chan int64)
    go func() {
        defer close(out)
        for n := range in {
            if n%2 == 0 {
                if ok := sendOrFail(n, out, quit); !ok {
                    return
                }
            }
        }
    }()
    return out
}

func TakeUntil(max int64, in <-chan int64, quit <-chan struct{}) <-chan int64 {
    out := make(chan int64)
    go func() {
        defer close(out)
        for n := range in {
            if n > max {
                return
            }
            if ok := sendOrFail(n, out, quit); !ok {
                return
            }
        }
    }()
    return out
}

func main() {
    var total int64

    quit := make(chan struct{})
    defer close(quit)
    for n := range Evens(TakeUntil(4000000, FibGen(quit), quit), quit) {
        total += n
    }
    fmt.Println("Total:", total)
}

Compare to similar Python code:

>>> def fibgen():
...     last, current = 1, 2
...     yield last
...     while True:
...         yield current
...         last, current = current, last + current
... 
>>> def take_until(max, gen):
...    for n in gen:
...       if n > max:
...           return
...       yield n
... 
>>> def evens(gen):
...    for n in gen:
...       if n%2 == 0:
...           yield n
... 
>>> sum(evens(take_until(4000000, fibgen())))
4613732

The Go version isn’t as simple as a basic approach using Python generators. It’s 73 lines versus only 20 lines of Python. (15 lines just for all of those closing }.) Still, Go has certain advantages, like being concurrent and type safe. In real code, you could imagine making and processing multiple network calls concurrently using techniques like this. The lowest level generator opens a socket to get some data from somewhere on the internet, then it pipes it results out to a chain of listeners that transform the data. In a case like that, Go’s easy concurrency could be an important advantage over Python.

The Future of Programming

Wednesday, July 31, 2013

Bret Victor - The Future of Programming

“Hey Steve, it’s Marvin. Your cousin, Marvin Wozniak. You know that new information paradigm you’ve been looking for? Well listen to this!”

Via GvR

Source: vimeo.com

In theory, it’s simple.

Tuesday, April 30, 2013

Source: apelad.blogspot.com

Toward what end, toward what end?

Friday, February 22, 2013

Every reader should ask himself periodically “Toward what end, toward what end?” – but do not ask it too often lest you pass up the fun of programming for the constipation of bittersweet philosophy.

— Harold Abelson and Gerald Jay Sussman with Julie Sussman, Structure and Interpretation of Computer Program (via Peter Hosey)

Source: boredzo.org

Google Go: The Good, the Bad, and the Meh

Monday, February 11, 2013

“Go is not meant to innovate programming theory. It’s meant to innovate programming practice.” — Samuel Tesla

Normally, I just blog about… uh, stuff?… but I feel like writing about the Go programming language (or “Golang”) today, so instead today’s topic is computer stuff. For the record, the language I’ve programmed the most in has been Python, so that’s the perspective I’m analyzing it from.

The Good

One of the best things about Go is that it was clearly designed for working on large projects in a team with a modern version control system.

For example, one of the “controversial"† aspects of Go is that it uses case to specify visibility. Instead of having a system of public, private, and shared class members, visibility is at the level of the package. Lowercase names like var a are visible only from the current package, but uppercase names like var A are publicly visible, and so on for types, functions, struct members, methods, etc. It is blisteringly obvious in hindsight that enforcing visibility at the package level instead of the class level is the right way to do it. Who cares if the other class poking around in your class’s internals is your subclass or whatever? The question is whether it’s part of the same package, in which case, you can be expected to keep the classes in sync since you wrote them both. If it’s in a different package, then the other person wants to know what’s in the public facing "API” of the class. In Python something similar is done with the convention of appending _ to the front of names that are meant to be private, but Go’s convention is simpler and easier. It does take a little bit of time getting used to the idea that in name.Name the first thing is definitely a package name and the second thing may or may not be a type (unlike Python, where classes are usually the only thing with CamelCase names), but once you adjust, it’s fine and leads to better API design. When you write something, if you want someone importing your package to use it, use uppercase. If you want them to leave it alone, lowercase. Done.

† Controversial with [Internet controversialists]. People who complain about this are complainers. Features like this are useful because they let you know who likes to complain about things for no good reason. Go has a lot of features like this. [Edit: Based on comments, I have changed the word “idiots” to “Internet controversialists” in reflection if the fact that you can be a smart person and still love debating the color of the bike shed.]

Another example of Go’s modernity is the decision to make the directory the fundamental unit of packaging. While you can just compile and run an individual file using go run filename.go, Go encourages you to set the environmental variable $GOPATH and then put your projects into $GOPATH/src/projectname/. When you run go install project, it will compile that directory and put the executable into $GOPATH/bin/. Why is this useful? It leads to a couple of nice advantages. First, since it’s done by directory, you can make each package you write its own git (or whatever) repository. Second, if your project file gets too long, you can break it up into multiple files that automatically get stitched into one at compile time. Not a big deal, but nice. Third, there’s no need for something like Python’s virtualenv. If you need to have different versions of the same library installed simultaneously for two different projects, just switch between two different $GOPATHs. Fourth, there’s no question of Python’s relative import headaches. Imports are always either rooted in $GOPATH/src/ or in the built-ins ($GOROOT/src/pkg/).

Fifth, this leads to the convention of using domain names as directory names. So, if you have a github account, you’d store your local stuff in $GOPATH/src/github.com/username/project/. Now, the genius of this is that the go tool has the ability to download from most (all?) modern VCS systems. So, if you run go get github.com/userA/project/ it will use git to download the project from github.com and put it into the right place. Even better, if that project contains the line import "bitbucket.com/userB/library", the go tool can also download and install that library in the right place. So, in one stroke, Go has its own elegant solution to packaging and a modern, decentralized version of CPAN or PyPI: the VCS you were already using!

Deployment in Go is very easy too, since everything compiles down to a single binary. It’s practical to cross-compile things, so that you don’t have to install anything Go-related on your production server except the one binary you compiled on your development machine. That greatly simplifies the issue of library management.

The go tool in general can’t be praised enough. It compiles quickly (sorry XKCD fans), and it comes with a couple of awesome features. First, go fmt. Gofmt is also “controversial” with [Internet controversialists]. It’s great. It automatically makes your code line up with the Go standard. Some people complain that the standard doesn’t put the braces where they want them. Those people are why we need gofmt. Gofmt basically does for Go what syntactic whitespace does for Python: it makes sure that the average code you run into is written to be readable.

Second, go doc. Godoc autogenerates documentation based on the comments in your code. Since Go is a compiled language, I tend to treat it as a replacement for using help(whatever) at the REPL in Python. It works well. It only tells you about exported things (things with Uppercase names), but that’s all you should need to know if the API works. It has a couple of modes for giving you the docs, the most helpful of which is a local webserver that gives you the docs for everything you have installed in your current $GOPATH.

Third, go test. It’s a built in testing and benchmarking framework. It’s not mind blowing, but it works.

At this point, I suppose I should say something about the Go language itself. What I like about Go is that it is very readable. Let’s say I want to declare a pointer to a variable length array (what Python calls a list and Go calls a slice) of pointers to FooType objects:

var foo *[]*FooType

It reads very simply, left-to-right: “Variable ‘foo’ is a pointer to a 'slice’ of pointers to FooType objects.” From the perspective of languages in the C-Java continuum, this is backwards, since the type follows the variable name, but once you get used to it it makes much more sense than vice versa. Functions also read very naturally:

func Foo(x int, y *Bar) error { ... }

“Function foo takes x, an int, and y, a pointer to a Bar, and returns an error (or a nil of type error).”

Of course, the two big pains of static typing are that sometimes you have to write out the type stuff even when it’s obvious what it has to be and that sometimes you just can’t easily do what you want to do. Fortunately, Go has basic type inference, so the first problem is not as big of problem. The := operator is used when you want the compiler to just figure out the type stuff for itself. So, in err := Foo(1, &Bar{}), the compiler knows that I want err to be of type error, since that’s the type that function Foo returns. Similarly, the second problem can be routed around by using the universal type interface {} and the runtime reflection tools. Throw in a broad enough interface and the compiler will usually just shut up and let you shoot yourself in the foot if you really need to. (But see also my complaint about generics in the “Bad” section below.)

Go also has a simpler solution to the whole nonlocal headache in Python. Since Python doesn’t have a way to declare a variable apart from giving it a value with =, there’s a question of what you want to do in the case of a closure: how do you know which variables to initialize and which not? The Python solution of nonlocal is inelegant, but Go’s closures are straightforward. Adding := would be a great simplification for Python 4000, but based on what I’ve read in the mailing lists, it’ll never happen.

Here’s a complete Go solution to Paul Graham’s Accumulator Generator Challenge:

package main

import "fmt"

func accgen(n int) func(int) int {
    return func(i int) int {
        n += i
        return n
    }
}

func main() {
    f := accgen(0)
    fmt.Println(f(1), f(1), f(1)) //Prints 1, 2, 3
}

The static typing adds a little noise, but it makes sense if you read it out loud: “Function accgen takes an int n and returns a function that takes an int and returns an int.” It can be confusing, but no more so than the concept of an accumulator in the first place.

Incidentally, another nice feature of Go is the Go Playground linked to above. It’s a combination of pastebin and a sandbox, and it helps ease the pain of not having a REPL. If you wonder, “Will this work in Go?” you can write a quick proof of concept on the Playground, run it (on Google’s server, which means it works on an iPad or whatever), then press “Format” to pretty it up and “Share” to generate a link to it.

In general, I really like the attitude that Go’s designers have towards the type system. For example, they straightened out the confusing mess of longs and shorts that had accumulated over thirty years with straightforward names: int8, int16, int32, and int64. There are also unsigned variants (uint8, etc.). By itself, int is 32-bits on most systems now, but they plan to change it to be 64-bits in future versions. Go also makes it easy to use the type system to do rudimentary sanity checks. For example, you can easily define a type like this:

type celsius float64

And then you can write your functions so that they only take celsius and not fahrenheit temperatures.

Or if you want a simple options bit-flag, you can do something like this:

package main

import "fmt"

type BFlag uint8

const (
    Grapes BFlag = 1 << iota
    Apples
    Bananas
    Mangos
)

func OrderBreakfast(flag BFlag) {
    if flag&Apples != 0 {
        fmt.Println("Apples it is!")
    } else {
        fmt.Println("All we had was apples.")
    }
}

func main() {
    f := Grapes | Apples | Bananas
    OrderBreakfast(f)
}

iota is a semi-magical variable for use in constant declarations that goes up by one each time. Since the other fruits are in the same constant declaration block but don’t have anything declared, Go automatically gives them the same type and assignment as the first constant (BFlag and 1 << iota), so Grapes is 0001, Apples is 0010, Bananas is 0100, and Mangos is 1000. The bit flag is simple enough, but since BFlag is its own type, you don’t have worry about someone accidentally sending a number to OrderBreakfast that’s just the result of adding up the tabs or something.

The designers of Go agree with the collective experience of the last twenty years of programming that there are three basic data types a modern language needs to provide as built-ins: Unicode strings, variable length arrays (called “slices” in Go), and hash tables (called “maps”). Languages that don’t provide those types at a syntax level cannot be called modern anymore. (And what’s up with all the languages that claim all you need are linked lists? I’m sorry, this is not 1958, and you are not John McCarthy.) Go strings are UTF-8 because Go was designed by the guys who invented UTF-8, so why not?

The designers of Go also seem to have come to the conclusion that object oriented programming had some good ideas but was mostly overblown. So, they don’t provide inheritance. (!) But before you freak out, however, they do provide embedding and interfaces, which make it possible to do most of the stuff you would want to do with inheritance anyway. I think at this point, it’s well accepted that multiple inheritance was mostly a headache for little payoff, so Go just takes that insight to the next level. If you just want to copy the behavior of a class with a few overrides, it’s pretty simple.

Here’s an example that illustrates what I mean:

package main

import "fmt"

type FooBaz interface {
    Foo()
    Baz()
}

First I declare the FooBaz interface. Something is a FooBaz if it can Foo() and Baz(). Note that this is basically statically compiled duck-typing. All you need to do is have these methods and you’re in. There’s no need to declare that you implement FooBaz beyond just implementing it.

type A struct{}

func (a A) Foo() {
    fmt.Println("I'm an A! Foo.")
}

func (a A) Baz() {
    fmt.Println("I'm an A! Baz.")
}

Next I declare the A type. Just for simplicity, I make it an empty struct. Then I give A two methods, Foo and Baz. The way you add a method to a type is like a function declaration, except that before the function name (method name), you put the type in parentheses as well as a name for the receiver (like this or self in other languages). Unlike Python’s convention of always using self, idiomatic Go doesn’t always use the same receiver name, but instead the receiver name tends to be based on the type’s name. In this case, my receiver name is a, but I never use it (and type A is an empty struct anyway, so I can’t modify its instances), so it doesn’t really matter. Because methods are treated basically like functions with the privilege of overloading the type of the first argument (Go doesn’t allow other forms of overloading), Go lets you put method names wherever you want, as long as its in the same package as the type its the method of.

type B struct {
    A
}

func (b B) Baz() {
    fmt.Println("I'm a B! Baz.")
    fmt.Println("I'm also an A!?")
    b.A.Baz()
}

Here’s where it starts to get interesting. B is a struct with an A embedded in it. That means when you call a B object with .Foo that call gets passed into the A that the B is holding, since there is no B.Foo method. On the other hand, since there is a B.Baz method, it calls that when you do .Baz() on the object. The Baz method then calls the A in itself (like a superclass call in inheritance languages). Of course, this isn’t required, but I just do it to show that you can.

func FooAndBaz(f FooBaz) {
    fmt.Println("Calling!")
    f.Foo()
    f.Baz()
}

func main() {
    var a A
    var b B
    FooAndBaz(a)
    FooAndBaz(b)
}

Finally, we have a function that takes anything that fulfills the FooBaz interface, so I send it a and b which are of types A and B. The final output is:

Calling!
I'm an A! Foo.
I'm an A! Baz.
Calling!
I'm an A! Foo.
I'm a B! Baz.
I'm also an A!?
I'm an A! Baz.

This doesn’t have all the complexity and nuances of a true inheritance system… And that’s a good thing! If you’re just trying to override a couple of methods, it’s easy to do and easy to understand. If you want the nuance and complexity of a traditional OO inheritance system, you can specify exactly what you want it to do yourself.

Interfaces are one of the key features of Go. As I said before, it’s basically compile time duck typing. As long as you have walk and quack methods with the right signatures, good enough.

The final good thing to mention about Go is the concurrency. It works like you thought concurrency should work before you learned about concurrency. It’s nice. You don’t have to think about it too much. If you want to run a function concurrently, just do go function(arguments). If you want to communicate with the function, you can use channels, which are synchronous by default, meaning they pause execution until both sides are ready.

A simple example:

package main

import "fmt"

func printer(ch chan int, done chan struct{}) {
    for i := range ch {
        fmt.Println(i)
    }
    done <- struct{}{}
}

func main() {
    ch := make(chan int)
    done := make(chan struct{})
    go printer(ch, done)
    ch <- 1
    ch <- 2
    ch <- 3
    close(ch)
    <-done
}

The printer function prints the values it receives from a channel of ints. The main function spawns the printer as its own goroutine (green thread, basically) then sends it some work. There’s some nuances to this (for example, in this case, 2 won’t finish sending until the printer is ready for it), but again, it basically works like you think it should. It makes it fairly simple to get things go concurrently and/or in parallel without ripping your hair out. Since execution halts when the main function is complete, I threw in the done channel as a quick and dirty way to make sure that everything gets printed before the program ends. You could also use a sync.WaitGroup in a more realistic example.

The Bad

Those are the things I like about Go. What do I dislike?

This is just a small thing, but for some reason, Go doesn’t suppose the use of a #! on the starting line, which means you can’t just use uncompiled files as utility scripts.

In general, I think that Go tends not to be very DRY. Go is meant to make performance issues pretty obvious, so it doesn’t have a lot of sugar for things. Where this especially shows up is in the lack of generics. Idiomatic Go code has a couple of different tricks for using interfaces so that you don’t need generics… but sometimes you just have to bite the bullet and copy-and-paste a function three times so you have three differently typed versions of it. If you’re trying to make your own data structure, you can just use interface {} as a universal object type, but then you lose compile time type safety. If you want to make a universal sum function, on the other hand, there’s no really good way to do it. Probably the easiest thing is to add a header like type MyNum int32 at the top, write your functions to take MyNums, and then if you later decide to use a float instead, at least you don’t have to rewrite the type of all of your functions, but that’s not as convenient as real generics.

Along the same lines, Go can be a little too low level in spots. For example, while it has a string type and all of the same ways of manipulating strings as Python, in Go (like in really old versions of Python), these are functions in the strings package instead of just being methods on string objects. I guess they do this for performance/memory reasons,‡ but it seems chintzy to me.

‡ People sometimes complain that a basic “Hello World!” program in Go is too large (like 2MB or something, IIRC), but that’s because it statically compiles in the fmt package, which has Unicode support. If all strings required the equivalent of the strings package, Hello World would be even bigger.

Finally, Go is still a new language, which sometimes means that library you need hasn’t been implemented yet or it’s still too immature for practical use. Fortunately, this problem is rapidly solving itself.

The Meh

What am I passionately neutral about in Go?

Some people get very worked up about how Go “doesn’t have exceptions.” First of all, this is untrue. Go has panic, which works almost exactly like an exception. However, idiomatic Go code isn’t supposed to use panic for expected exceptions. Instead you’re supposed to return an error-type object along with your normal return type using multiple return values. In practice, Go’s error handling system is basically the same as Java’s. panic is like an unchecked exception: divide by zero in the wrong place and the whole stack blows up on you. error objects are like checked exceptions: if something you call returns an error you either have to ignore the error entirely (bad), do something about it (good), or pass the buck to someone else (neutral). The only difference is that these response are respectively written as

//Ignore
value, _ = package.MightReturnAnError()

//Deal with it
value, err = package.MightReturnAnError()
if err != nil {
    do something 
    ...
}

//Pass the buck
value, err = package.MightReturnAnError()
if err != nil { return nil, err }

instead of being written in terms of try and catch or except. I do wish the compiler was a little stricter though. It’s perfectly legal to write

fmt.Println("Hello")

even though Println returns an integer and an error. I sort of wish they made you explicitly silence the error by writing _, _ = fmt.Println("Hello") instead. Then again, maybe that would be a pain? On top of that, since basically everything returns errors of the interface type error, it can sometimes be a little hard to figure out exactly which concrete errors something will return. For example, the documentation on fmt.Println says “It returns the number of bytes written and any write error encountered.” OK, but which exact errors should I be prepared to deal with?

That brings me to my next bit of neutralness: the Go compiler has no warnings, only errors. Now, obviously, this is The Right Thing to Do. We all know that a compiler warning ought to be treated like an error, and ignoring warnings is wrong, and it’s better for the compiler to just force you to fix it than for it to enable you to do the wrong thing. …But sometimes you really want to do the wrong thing anyway. The other day there was a big stink online about how the Go compiler won’t let you have any unused variables or unused imports. Technically speaking, this isn’t true. If you just throw in some _, the compiler will shut up and let you import stuff you don’t use or create variables and never use them. On the other hand, it is kind of annoying having to shut up the compiler when you’re just hacking things out. On the third hand though, the temptation to do things the wrong way is exactly why the compiler shouldn’t let you do things the wrong way “just for now.” So, in the end (on my fourth hand), I’m pretty neutral about it.

Finally, Go doesn’t have operator overloading, function/method overloading, or keyword arguments. You can see why—those features are bad for performance and they lead to “clever” code that does bad things—but you also sometimes miss them. It’s a tradeoff, and I see why they made the choice they did, but I also see the value of the other choice.

Conclusion

Go is rad. It’s not my everyday language (that’s still Python), but it’s definitely a fun language to use, and one that would be great for doing big projects in. If you’re interested in learning about Go, I recommend doing the tour then reading the spec while you put test programs into the Playground. The spec is very short and quite readable, so it’s great way to learn Go.

Iwata Asks - Producers of the Wii U Miiverse

Tuesday, November 13, 2012

Mizuki: When you hear the phrase "real life social graph," you might think it includes all real human relationships, but if you really think about it, real life doesn't only include actual acquaintances. That's because you also interact with strangers a lot in the real world.
Kondo: Yes. Also—especially on America's social networks—I feel like there is an idea that you should have one personality. But the reality is different.
Mizuki: Yeah, that's right!
Iwata: An American friend of mine said, "On the Internet, people are always watching, so you have to be consistent about what kind of a person you are, so it's difficult to let loose even in private time". In that respect, we all live in a visible world because of how common social networks have become. While that world may be convenient, it can also be constraining in ways that didn't exist before.
Mizuki: I understand. You punch in bland comments like "Happy Birthday!" without a thought. (laughs)
Kondo: Right, right! (laughs) But I think that such variety—having a self who interacts with friends and a self within video games—is good.
Iwata: I suppose a lot of people hit the OFF switch when they don't have to interact with real life acquaintances and assume different personalities.
Mizuki: For that reason, there is a trend among social networks that you must use your real name. The Miiverse, however, doesn't specify that. You can simply display your Mii nickname.
Iwata: While it's extremely rare, communities can break down because of rude people taking advantage of anonymity. Using real names is a way to put the brakes on that, but if you are tied together by empathy, I feel like real names become less important. I think it is important to be tied together by shared sentiment that way.
Mizuki: Yes.

Source: iwataasks.nintendo.com