Creating Domain Specific Error Helpers in Go With errors.As

Friday, August 28, 2020

The other day, I was reading a website about some historical documents when I saw that it had an error message at the top:

[Screenshot from the site: a WordPress page with an error message at the top]

Some quick searching online for the error message revealed that it was caused by a mismatch between the site’s versions of PHP and WordPress. Older versions of WordPress had a bug in the switch statement of a certain localization component, and later versions of PHP print a warning about this bug directly into the HTML sent to the end user. When I came back to the site a few days later, it had been fixed.

The ultimate reason I saw the error message as a random reader is that PHP has too many ways to deal with errors:

Built-in PHP functions, and therefore any PHP project, have a whole range of error handling mechanisms: errors, warnings, returning error values, and exceptions. At every point, calling code needs to know which system will be used to handle errors.

PHP often chooses to send warnings like this right out to the end user because it doesn’t trust operators to actually read their Apache logs. Such a practice would be very out of place in the Go programming language.

In Go, errors are values. While it’s true that Go has both errors and panics, which might make Go seem theoretically equivalent to Java with its checked and unchecked exceptions, there are also important differences.

First, because errors are normal values instead of a special form of control flow, they have the same flexibility (and inflexibility) as other values. As Rob Pike explains,

Values can be programmed, and since errors are values, errors can be programmed.

Second, because the error interface is used pervasively instead of concrete error types, it is possible to change underlying error implementations without having to change the types returned all the way up a call chain. With checked exceptions, if today my function can only throw a FileException, then tomorrow I cannot start throwing a URLException without breaking any callers that depend on my function having only one possible exception type. (The lead architect of C# cited this problem as one of his reasons for not adding checked exceptions to that language.) In Go, the use of the simple error interface everywhere prevents callers from becoming too dependent on the exact type of errors that a function returns.

Go is a statically typed language, but the pervasive use of the error interface allows for dynamic type introspection at runtime. The dynamic nature of errors can lead to problems if misused, but overall it has allowed a number of community experiments in annotating errors, culminating in the inclusion of the errors.As function in the Go 1.13 standard library in September 2019. The docs for errors.As explain:

func As(err error, target interface{}) bool
As finds the first error in err’s chain that matches target, and if so, sets target to that error value and returns true. Otherwise, it returns false.
The chain consists of err itself followed by the sequence of errors obtained by repeatedly calling Unwrap.

And it provides an example:

if _, err := os.Open("non-existing"); err != nil {
    var pathError *os.PathError
    if errors.As(err, &pathError) {
        fmt.Println("Failed at path:", pathError.Path)
    } else {
        fmt.Println(err)
    }
}

So while a particular value may be statically typed as error, it may also contain a more useful type dynamically available in its Unwrap chain for consumers to programmatically introspect. This gives Go the ability to create inheritance trees for errors without the baggage of an actual classical object inheritance system.

Since errors.As was announced last year, I have created two libraries for working with errors that use it: exitcode and resperr. It may be useful for me to explain the philosophy behind them and how to use them here, since I think they could inspire similar projects in other domains.

First, let me explain package exitcode. When you run a process on a Unix-like system, it has an “exit code”. Zero indicates that the program ran successfully, and any other code indicates a failure. There have been various attempts to standardize general-purpose exit codes, but none have stuck. Most programs either use only 0 and 1 or have a custom set of codes. For example, curl defines 25 as “upload failed” and 47 as “too many redirects”, and so forth on up to 96, “QUIC connection error”.

The exitcode package is a simple library to help you write a CLI in Go that returns a proper exit code. Of course, the simplest helper would just be a function that returns 0 if the error is nil and 1 if it is non-nil, but we can do more than that, thanks to errors.As.

Package exitcode documents a Coder interface extension to error:

type Coder interface {
    error
    ExitCode() int
}

This lets you define an error type and provide a custom exit code to associate with your error. exitcode.Get is defined to return 0 for nil, return 1 for unknown errors, and use errors.As to search through the Unwrap chain of errors for anything implementing Coder. If it finds one, it returns that error’s custom exit code.

To make it more convenient, package exitcode also has a helper function called exitcode.Set(error, int) error which wraps an error in an unexported Coder implementation, so that you can easily set a custom exit code without having to define your own custom error type.

So, for example, if you were rewriting curl in Go, you might create an http.Client with a CheckRedirect policy that returns exitcode.Set(err, 47) if it sees that a request has been redirected too many times. Other error handlers in the chain between the redirect checker and the one-line main function can just pass the error along without being aware that it has a custom exit code associated with it. At the top level, the CLI can bottom out in a call to exitcode.Exit(error), which is a convenience function for os.Exit(exitcode.Get(error)).

Package exitcode is a simple example of what is possible by treating errors as dynamically typed values, but package resperr takes it further. To understand the thinking behind resperr, I first need to talk about a blog post called Failure is your Domain by Ben Johnson (no relation). The post builds on Rob Pike’s Error handling in Upspin by documenting a philosophy for dealing with errors. Johnson writes that

The tricky part about errors is that they need to be different things to different consumers of them. In any given system, we have at least 3 consumer roles—the application, the end user, & the operator.

The article is worth reading in full to think about how these roles interact, but suffice it to say that, to serve all three roles, Johnson defines a struct containing separate fields meant for the application, the end user, and the operator:

// Error defines a standard application error.
type Error struct {
    // Machine-readable error code.
    Code    string

    // Human-readable message.
    Message string

    // Logical operation and nested error.
    Op      string
    Err     error
}

Core to Johnson’s proposal is a set of application error codes, which he argues ought to be worked out for the specific domain of an application. In his case, they look like this:

// Application error codes.
const (
    ECONFLICT   = "conflict"   // action cannot be performed
    EINTERNAL   = "internal"   // internal error
    EINVALID    = "invalid"    // validation failed
    ENOTFOUND   = "not_found"  // entity does not exist
)

Johnson’s article predates errors.As, so in it he explains how to dig through an error chain manually to retrieve machine codes and user messages from error interface values, instead of relying on the existence of the errors.As mechanism.

One last quote from the article:

Error handling is a critical piece of your application design and is complicated by the variety of different consumer roles that require error information. By considering error codes, error messages, and logical stack traces in our design we can fulfill the needs of each consumer. By integrating our Error into our domain, we give all parts of our application a common language to communicate about when unexpected things happen.

Failure to think clearly about the separate roles of the application, the end user, and the operator in dealing with errors is exactly what led old PHP applications to dump potentially dangerous error messages about database failures or application bugs into the final HTML sent to end users instead of logging them for operators. Those systems were built without thinking about the difference between the information an operator needs to debug an overloaded server and the information needed by a website reader (or a website attacker!).

Johnson’s article was very influential on my thinking as I was building a web application with a Go HTTP JSON backend. As I worked on it over a series of months, I realized two things: first, that my failure domain was just the set of HTTP status codes, and second, that in a majority of cases (but not quite all), my user message was a restatement of the status code. I wrote package resperr with these realizations in mind.

Package resperr defines two interfaces to extend errors: one for HTTP status codes and another for user messages.

type StatusCoder interface {
    error
    StatusCode() int
}

type UserMessenger interface {
    error
    UserMessage() string
}

This is similar to package exitcode with its Coder interface, but an important difference is the relationship between the two interfaces. The HTTP status codes have default user messages associated with them already, which are the “reason phrases” of RFC 7231. Go provides the http.StatusText(int) string function to look up the status text from a status code. Putting these together, the docstring for resperr.UserMessage(error) string looks like this:

UserMessage returns the user message associated with an error. If no message is found, it checks StatusCode and returns that message. Because the default status is 500, the default message is "Internal Server Error". If err is nil, it returns "".

Finally, let’s look at a short demonstration of how resperr could be used to write an HTTP JSON API.

First, we need to write a short helper function to send errors to our logging system for capture while also returning them to end users:

func replyError(w http.ResponseWriter, r *http.Request, err error) {
    logError(w, r, err)
    code := resperr.StatusCode(err)
    msg := resperr.UserMessage(err)
    replyJSON(w, r, code, struct {
        Status  int    `json:"status"`
        Message string `json:"message"`
    }{
        code,
        msg,
    })
}

Then in our handler, we call the helper any time something goes wrong:

func myHandler(w http.ResponseWriter, r *http.Request) {
    // ... check user permissions...
    if err := hasPermissions(r); err != nil {
        replyError(w, r, err)
        return
    }
    // ...validate request...
    n, err := getItemNoFromRequest(r)
    if err != nil {
        replyError(w, r, err)
        return
    }
    // ...get the data ...
    item, err := getItemByNumber(n)
    if err != nil {
        replyError(w, r, err)
        return
    }
    replyJSON(w, r, http.StatusOK, item)
}

In the functions that the handler is calling, we can set appropriate errors, like a 404 Not Found for item not found while falling back to 500 Internal Server Error for unexpected errors:

func getItemByNumber(n int) (*Item, error) {
    item, err := dbCall("...")
    // errors.Is also matches sql.ErrNoRows if dbCall wraps it
    if errors.Is(err, sql.ErrNoRows) {
        // this is an anticipated 404
        return nil, resperr.New(
            http.StatusNotFound,
            "%d not found", n)
    }
    if err != nil {
        // this is an unexpected 500!
        return nil, err
    }
    // ...
    return item, nil
}

Similarly, hasPermissions can return 403 Forbidden and getItemNoFromRequest can return 400 Bad Request as needed. But for 400 Bad Request, we may want a more extensive user message:

func getItemNoFromRequest(r *http.Request) (int, error) {
    ns := r.URL.Query().Get("n")
    if ns == "" {
        return 0, resperr.WithUserMessage(
            resperr.New(
                http.StatusBadRequest,
                "missing ?n= in query"),
            "Please enter a number.")
    }
    n, err := strconv.Atoi(ns)
    if err != nil {
        return 0, resperr.WithCodeAndMessage(
            err, http.StatusBadRequest,
            "Input is not a number.")
    }
    return n, nil
}

In real code, getItemNoFromRequest would probably just be part of the handler unless multiple routes needed the same query handling.

So that’s how I made package exitcode and package resperr, but the great thing about this pattern is that it’s widely applicable. You could make your own package (perhaps in an afternoon?) for gRPC errors, FTP errors, SMTP errors, LDAP errors, CORBA errors, or even SOAP errors. If your application has its own set of error conditions, you could make up custom error codes just for your application, as Upspin does and Ben Johnson recommends.

One thing that package resperr doesn’t handle yet is redirects, because I’ve only been using it for a JSON API. Someone using it for a traditional server-side rendered HTML web application might want to add that functionality.

The key is that errors.As makes it easy to create error systems that work for your particular applications, users, and operators without being straitjacketed by the language into a one-size-fits-all approach that inadvertently exposes users to the internal operations of your system. Don’t let your end users be distracted by irrelevant warning messages. Handle errors properly by thinking about their roles and domain within your application.

The Ethically-Trained Programmer