On using Go channels like Python generators

Sunday, January 5, 2014

I really like Go, and I really like Python, so it’s natural that I apply patterns I’ve learned in Python to Go… even when it’s maybe not appropriate.

For example, in Python, it’s common to use generators to process a stream of data. Take this simple (unrealistic, inefficient) odd number generator:

>>> def odds(from_, to):
...     for n in range(from_, to):
...         if n % 2:
...             yield n
... 
>>> print(*odds(0, 10), sep=", ")
1, 3, 5, 7, 9

How would you do something like that in Go? The natural approach is to use channels. Go’s channels are a language primitive that allows you to send values from one light-weight green-ish thread (“go-routine”) to another. Many other languages would make these into methods in a standard library threading package, but since they’re language primitives in Go, they get used more often, even in simple programs.

Here is an incorrect first attempt at simulating generators using channels:

package main

import "fmt"

func Odds(from, to int, c chan int) {
    if from%2 == 0 {
        from += 1
    }
    for i := from; i < to; i += 2 {
        c <- i
    }
}

func main() {
    c := make(chan int)
    go Odds(0, 10, c)
    for v := range c {
        fmt.Println(v)
    }
}

This program ends with fatal error: all goroutines are asleep - deadlock! because the loop in for v := range c never quits. It never quits because the channel is never closed. An even worse version of this would be one where go Odds(0, 10, c) was just Odds(0, 10, c). That ends with a deadlock before even printing anything.

What is a non-terrible version of this code? We should move more of the responsibility into the Odds function. The Odds function should be in charge of creating the channel, spawning a go-routine, and closing the channel when the go-routine is done, so that the caller can be freed of that responsibility.

Here’s what that looks like:

package main

import "fmt"

func Odds(from, to int) <-chan int {
    c := make(chan int)
    go func() {
        defer close(c)
        if from%2 == 0 {
            from += 1
        }
        for i := from; i < to; i += 2 {
            c <- i
        }
    }()
    return c
}

func main() {
    for v := range Odds(0, 10) {
        fmt.Println(v)
    }
}

There are a couple of thing to notice here. First, the caller doesn’t need to use the go statement because the Odds function now has an anonymous function inside of it that it uses to spawn a go-routine. (This use of an anonymous function looks a bit like Javascript to me.)

Second, the caller doesn’t need to create its own channel to give to our pseudo-generator. Not only that, since the type of the channel is <-chan int, it’s perfectly clear that the channel is only for receiving ints, not for sending, and this is enforced at compile time.

This leads to the most important point: that it doesn’t deadlock at the end! That’s because the channel gets closed. Generally speaking, in Go the one doing the sending on a channel should also be the one to do the closing of the channel when it’s done sending. (Double closing a channel causes a runtime panic.) Since the Odds function returns a receive-only channel, the compiler actually enforces that the close cannot be done in the main function.

But notice how it gets closed, with defer close(c). In Python, the with statement is used for resource management things like remembering to close files and sockets. In Go, defer is used to ensure that a certain command happens whenever a certain function ends (whether in a normal return or an exception-like panic). As a convention, it’s a good idea to defer the close of something as soon as it’s applicable.

So far so good, but what about this scenario:

func main() {
    for v := range Odds(0, 10) {
        fmt.Println(v)
        if v > 5 {
            fmt.Println("Tired of counting all day.")
            return
        }
    }
}

The channel returned by Odds wasn’t exhausted, so it was never closed. Since the function returning was main, this isn’t a big deal in this case, but this is actually a somewhat subtle bug that can affect otherwise correct-looking code. It’s a memory leak, or more specifically a go-routine leak. Python and Go are both garbage collected, but there’s a subtle difference here. In Python if I write gen = odds(0, 10) and then next(gen) the resulting generator will be cleaned up shortly after gen goes out of scope. Similarly, in Go the channel created by c := Odds(0, 10) will be garbage collected once all references to it go away. Go-routines, however, only go “out of scope” when they return or panic, since it can have side-effects independent of whether anything in the main thread has a reference to it. (E.g.)

The standard way to prevent go-routine leaks is through the use of “quit channels.” However, this looks like it’s just going to re-introduce a lot of the complexity that we got rid from the first/incorrect example. We need to make up a new channel to pass around, figure out what value to send as a quit signal, etc.

A quit channel is more complicated than just letting a go-routine leak or run to exhaustion, but there are a few tricks we can take advantage of to make it less annoying:

package main

import "fmt"

func Odds(from, to int, quit <-chan struct{}) <-chan int {
    c := make(chan int)
    go func() {
        defer close(c)
        if from%2 == 0 {
            from += 1
        }
        for i := from; i < to; i += 2 {
            select {
            case <-quit:
                return
            case c <- i:
            }
        }
    }()
    return c
}

func main() {
    quit := make(chan struct{})
    defer close(quit)
    for v := range Odds(0, 10, quit) {
        fmt.Println(v)
        if v > 5 {
            fmt.Println("Tired of counting all day.")
            return
        }
    }
}

First of all, what is <-chan struct{}? type struct {} is a type for hanging methods off of with no associated data. chan struct{} is a channel that passes no data. You could also write chan bool but that has two potential values, true or false. struct{} lets your readers know, “There’s no data being passed here.” The only reason for the channel to exist is to synchronize things. No other information is being sent. The arrow in front of <-chan struct{} further specifies that the caller is the one who will be in charge of the quit channel, not the function.

Next, select (not to be confused with switch, which also has cases) is a statement in Go that lets you send or receive from multiple channels at once. If multiple channels are ready simultaneously, the runtime will pseudo-randomly select just one to use, otherwise the first channel to be ready will fire. In this case, it means “either send the value of i on the output channel or return if the quit channel has something.”

When you close a channel, from then on anyone listening to that channel receives the blank value for that type (0 for int, "" for string, etc.) Instead of using close, you could also send an empty struct:

func main() {
    quit := make(chan struct{})
    for v := range Odds(0, 10, quit) {
        fmt.Println(v)
        if v > 5 {
            fmt.Println("Tired of counting all day.")
            quit <- struct{}{}
        }
    }
}

close(quit) is less ugly than quit <- struct{}{}, but the real reason I think it’s better to use close gets at my next couple of points.

Why didn’t I put the make for the quit channel into Odds like I did for the chan int in the second example? It certainly could be done, but fundamentally a caller knows better than the function itself when the go-routine should quit, so the caller should have the responsibility. This lets us do shortcuts like this:

func main() {
    for v := range Odds(0, 10, nil) {
        fmt.Println(v)
    }
}

Since the caller knew the channel would be drained to exhaustion, it didn’t need to make a quit channel at all. In Go, nil channel never sends, but there’s nothing illegal or panic-inducing about listening to a nil channel, so Odds works just fine.

Furthermore, we can reuse the same quit channel for multiple functions. If we were sending the quit signal with quit <- struct{}{}, we’d have to send one message for each go-routine that reuses the same quit channel and needs to be closed. However because closed channels are always “sending” their blank value, we can stop all of the go-routines that are listening to the same quit channel by just doing close(quit) once.

Put it all together and this approach allows for more advanced (and Python generator-like) usages of channels. The second Project Euler problem is “By considering the terms in the Fibonacci sequence whose values do not exceed four million, find the sum of the even-valued terms.” Here is an over-engineered solution using Go’s channels:

package main

import "fmt"

func sendOrFail(n int64, out chan<- int64, quit <-chan struct{}) bool {
    select {
    case out <- n:
        return true
    case <-quit:
        return false
    }
}

func FibGen(quit <-chan struct{}) <-chan int64 {
    out := make(chan int64)
    go func() {
        defer close(out)
        var last, current int64 = 1, 2

        if ok := sendOrFail(last, out, quit); !ok {
            return
        }
        for {
            if ok := sendOrFail(current, out, quit); !ok {
                return
            }
            last, current = current, last+current
        }
    }()
    return out
}

func Evens(in <-chan int64, quit <-chan struct{}) <-chan int64 {
    out := make(chan int64)
    go func() {
        defer close(out)
        for n := range in {
            if n%2 == 0 {
                if ok := sendOrFail(n, out, quit); !ok {
                    return
                }
            }
        }
    }()
    return out
}

func TakeUntil(max int64, in <-chan int64, quit <-chan struct{}) <-chan int64 {
    out := make(chan int64)
    go func() {
        defer close(out)
        for n := range in {
            if n > max {
                return
            }
            if ok := sendOrFail(n, out, quit); !ok {
                return
            }
        }
    }()
    return out
}

func main() {
    var total int64

    quit := make(chan struct{})
    defer close(quit)
    for n := range Evens(TakeUntil(4000000, FibGen(quit), quit), quit) {
        total += n
    }
    fmt.Println("Total:", total)
}

Compare to similar Python code:

>>> def fibgen():
...     last, current = 1, 2
...     yield last
...     while True:
...         yield current
...         last, current = current, last + current
... 
>>> def take_until(max, gen):
...    for n in gen:
...       if n > max:
...           return
...       yield n
... 
>>> def evens(gen):
...    for n in gen:
...       if n%2 == 0:
...           yield n
... 
>>> sum(evens(take_until(4000000, fibgen())))
4613732

The Go version isn’t as simple as a basic approach using Python generators. It’s 73 lines versus only 20 lines of Python. (15 lines just for all of those closing }.) Still, Go has certain advantages, like being concurrent and type safe. In real code, you could imagine making and processing multiple network calls concurrently using techniques like this. The lowest level generator opens a socket to get some data from somewhere on the internet, then it pipes it results out to a chain of listeners that transform the data. In a case like that, Go’s easy concurrency could be an important advantage over Python.

The Ethically-Trained Programmer