Error Handling in Elixir Libraries

There was an interesting discussion yesterday on the Elixir Slack about how libraries should handle errors. This is a more thought-through and elaborate expression of my views on the matter. In this post, I’ll present an idealised version of what I think a public API for functions that may produce errors should look like.

Errors vs Exceptions

In most applications we can distinguish two kinds of situations where some error occurs:

actionable errors - i.e. expected errors. When those happen we need to handle them gracefully, so that our application can continue working correctly. A good example of this is invalid data provided by the user or data not present in the database. In general we want to handle this kind of error by using tuple return values ({:ok, value} | {:error, reason}) - the consumer can match on the value and act on the result.
fatal errors - errors that place our application in an undefined state - violate some system invariant, or make us reach a point of no return in some other way. A good example of this is a required config file not present or a network failing in the middle of transmitting a packet. With that kind of error we usually want to crash and lean on the supervision system to recover - we handle all errors of this kind in a unified way. That’s exactly what the let it crash philosophy is about - it’s not about letting our application burn in case of any errors, but about avoiding excessive code for handling errors that can’t be handled gracefully. You can read more about what “let it crash” means in the excellent article by Fred Hebert of “Learn You Some Erlang for Great Good” fame - “The Zen of Erlang”.

Now comes the tough realisation - when we’re writing libraries, we often don’t know if something is an actionable or a fatal error - it’s not up to our library to decide, but up to the consumer. We could see this with the absent config file example - an absent file is hardly a fatal error, but an absent config file definitely is. Therefore, we need to design a flexible API that allows the user of the library to handle the errors the way they need to. The rest of the article focuses exactly on that.
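As a minimal sketch of that point, the same {:error, :enoent} result from File.read/1 can be treated as actionable by one caller and as fatal by another. The ConfigExample module and its function names are hypothetical, made up only for illustration; File.read/1 is the only real library call here.

```elixir
defmodule ConfigExample do
  # Hypothetical consumer code, not part of any real library.

  # An optional file: a missing file is an actionable error, so we fall back
  # to a default. Any other error reason matches no clause and crashes.
  def read_optional(path, default) do
    case File.read(path) do
      {:ok, contents} -> contents
      {:error, :enoent} -> default
    end
  end

  # A required config file: a missing file is fatal for this caller, so the
  # failed match raises a MatchError and we let it crash.
  def read_required(path) do
    {:ok, contents} = File.read(path)
    contents
  end
end
```

The library function (File.read/1 in this sketch) reports the problem in the same way both times; only the caller knows which situation it is in.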

Ok/Error tuples

We already mentioned those. They are extremely common in both Elixir and Erlang libraries - and there’s a reason for that! They are, in a way, an Elixir version of the Either or Result types present in many statically-typed languages. I will dare to say that functions returning ok/error tuples should be the main interface of libraries. Why?

It’s easy to pattern match on them in a case expression and act accordingly in case the reported error is of the “actionable” type.
It’s equally easy to convert the tuple return value into a crash if the reported error is, for our application, of the “fatal” type: e.g.

{:ok, value} = YourLibrary.may_fail(foo, bar, baz)

An important thing to consider here is to make not only the entire return value easily matchable, but also the error reasons. It might happen that one kind of error is “actionable”, while another one is “fatal”. An easy way to achieve that is to return atoms, or tuples with atoms and variables, instead of strings. A string is the least structured type one can imagine, and it’s generally very hard to do something with it. Because of that, instead of returning {:error, "unknown value: :foo"} one would prefer to return {:error, {:unknown, :foo}}.
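To illustrate, here is a rough sketch of a consumer that treats one structured reason as actionable and lets every other reason crash. ReasonExample, lookup/1 and lookup_or_default/2 are hypothetical names invented for this example.

```elixir
defmodule ReasonExample do
  # Hypothetical library function returning structured error reasons.
  def lookup(:foo), do: {:ok, 1}
  def lookup(key) when is_atom(key), do: {:error, {:unknown, key}}
  def lookup(_other), do: {:error, :invalid_key}

  # Hypothetical consumer: an unknown key is actionable here, so we use a
  # default; any other reason is treated as fatal.
  def lookup_or_default(key, default) do
    case lookup(key) do
      {:ok, value} -> value
      {:error, {:unknown, _key}} -> default
      # {:error, :invalid_key} matches no clause and raises a CaseClauseError
    end
  end
end
```

With string reasons, the second clause would have to compare or parse message text, which is exactly the kind of fragile code structured reasons avoid.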

A common approach, found in Erlang libraries, to returning easily-matchable errors while still providing nice messages is for the module to export an additional format_error/1 function. We pass it the reason from the {:error, reason} tuple and it formats the reason as a nice string. Continuing with the previous example, we would implement a function:

def format_error({:unknown, value}),
  do: "unknown value: #{inspect value}"
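A sketch of how a consumer might use such a function follows. The FormatExample module, parse/1 and describe/1 are hypothetical; only the format_error/1 clause mirrors the one shown above.

```elixir
defmodule FormatExample do
  # Hypothetical library code.
  def parse(value) when value in [:a, :b], do: {:ok, value}
  def parse(value), do: {:error, {:unknown, value}}

  def format_error({:unknown, value}),
    do: "unknown value: #{inspect(value)}"

  # Hypothetical consumer code: match on the structured reason first,
  # and turn it into text only at the point where a message is needed.
  def describe(value) do
    case parse(value) do
      {:ok, parsed} -> "parsed #{inspect(parsed)}"
      {:error, reason} -> "could not parse: " <> format_error(reason)
    end
  end
end
```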

Deeply recursive situations

I can almost hear some people ready to explode with things like: “Ok/error tuples are tedious in recursive functions” or “The constant wrapping and unwrapping is slow!”. And indeed, I fully agree that the wrapping and unwrapping can become tedious and that there are situations, especially in tight loops, where this could have a significant impact on performance.

Fortunately, Elixir has a feature that can help us here - catch and throw. We can wrap our call in a try/catch expression and use throw from our deeply nested code to signal errors. This gives us the convenience of non-local error reporting, while maintaining the nice ok/error tuple interface in our public functions.

Let’s see a practical example - the Ecto.UUID.dump/1 function:

def dump(<< a1, a2, a3, a4, a5, a6, a7, a8, ?-,
            b1, b2, b3, b4, ?-,
            c1, c2, c3, c4, ?-,
            d1, d2, d3, d4, ?-,
            e1, e2, e3, e4, e5, e6, e7, e8, e9, e10, e11, e12 >>) do
  try do
    << d(a1)::4, d(a2)::4, d(a3)::4, d(a4)::4,
       d(a5)::4, d(a6)::4, d(a7)::4, d(a8)::4,
       d(b1)::4, d(b2)::4, d(b3)::4, d(b4)::4,
       d(c1)::4, d(c2)::4, d(c3)::4, d(c4)::4,
       d(d1)::4, d(d2)::4, d(d3)::4, d(d4)::4,
       d(e1)::4, d(e2)::4, d(e3)::4, d(e4)::4,
       d(e5)::4, d(e6)::4, d(e7)::4, d(e8)::4,
       d(e9)::4, d(e10)::4, d(e11)::4, d(e12)::4 >>
  catch
    :error -> :error
  else
    binary ->
      {:ok, binary}
  end
end
def dump(_), do: :error

defp d(?0), do: 0
# many more cases
defp d(_),  do: throw(:error)

Each of the d/1 calls could fail - handling this through wrapping and unwrapping tuples would be a nightmare - using throws gives us an elegant and efficient solution. Beware! When using throws, it’s important to be specific about the throws we’re catching and to contain them locally, usually within one module. This helps avoid accidental errors and lowers the complexity of understanding non-local returns.
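The same pattern in a smaller, self-contained form might look like the sketch below. HexExample and its functions are hypothetical and much simpler than the Ecto code; the point is only that the throw is specific (:invalid_digit) and never escapes the module.

```elixir
defmodule HexExample do
  # Hypothetical module illustrating a throw contained inside one module.

  # The public function keeps the ok/error shape; the throw never escapes it.
  def decode(string) when is_binary(string) do
    try do
      # Convert every byte of the string to its hex digit value.
      for <<char <- string>>, do: digit(char)
    catch
      # Catch only the specific value thrown by digit/1.
      :invalid_digit -> :error
    else
      digits -> {:ok, digits}
    end
  end

  defp digit(char) when char in ?0..?9, do: char - ?0
  defp digit(char) when char in ?a..?f, do: char - ?a + 10
  defp digit(_char), do: throw(:invalid_digit)
end
```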

Bang functions

Have you noticed that, until this moment, I have mentioned neither exceptions nor the bang functions (foo! vs foo)? That’s because I consider them to be an “additional” API a library can offer. They replace pattern matching on the return value and the MatchErrors that would be raised when a {:ok, value} = ... match, which I proposed a couple of paragraphs above, fails.

Provided we have already defined the format_error/1 function mentioned above, adding a “bang” version of a function to our public interface is fairly easy:

def foo!(arg1, arg2) do
  case foo(arg1, arg2) do
    {:ok, value} -> value
    {:error, reason} -> raise format_error(reason)
  end
end

We could also consider defining a custom exception struct for our library and instead use raise YourLibrary.Error, format_error(reason), or pass the raw reason to the exception struct and format it only in the exception’s message/1 callback. This keeps the exception struct easily pattern-matchable, should such a need arise.

defmodule YourLibrary.Error do
  defexception [:reason]

  def exception(reason),
    do: %__MODULE__{reason: reason}

  def message(%__MODULE__{reason: reason}),
    do: YourLibrary.format_error(reason)
end
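To show why keeping the raw reason in the struct is useful, here is a sketch of a caller rescuing such an exception and still matching on its reason. BangExample, fetch/1, fetch!/1 and BangConsumer are hypothetical names; the exception module follows the same shape as the one above.

```elixir
defmodule BangExample do
  # Hypothetical library, following the exception shape shown above.
  defmodule Error do
    defexception [:reason]

    def exception(reason), do: %__MODULE__{reason: reason}

    def message(%__MODULE__{reason: reason}),
      do: BangExample.format_error(reason)
  end

  def fetch(1), do: {:ok, "one"}
  def fetch(value), do: {:error, {:not_one, value}}

  def fetch!(arg) do
    case fetch(arg) do
      {:ok, value} -> value
      {:error, reason} -> raise Error, reason
    end
  end

  def format_error({:not_one, value}), do: "#{inspect(value)} is not one"
end

defmodule BangConsumer do
  # Hypothetical consumer: the rescued struct still carries the structured
  # reason, and Exception.message/1 gives the formatted text.
  def run(arg) do
    try do
      {:ok, BangExample.fetch!(arg)}
    rescue
      error in BangExample.Error ->
        {:rescued, error.reason, Exception.message(error)}
    end
  end
end
```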

Of course, that kind of unwrapping of the errors in every function is rather repetitive - I agree with that. But there is something you need to consider when devising solutions for this repetition: is the code you’re writing right now going to be read or written more often? Especially in a library, the code will be read much more often than it will be written - and some repetition in things like that, while a bit more laborious to write, can make the code much easier to understand when reading.

Ok/error unwrapping macro

So, you still want to avoid that duplication? Let’s see what is, in my opinion, the least bad way to do this. It’s important to use a macro here instead of a function, and here’s why. The unwrapping function is often called in a tail position. Because of Erlang’s tail call optimisation, that call removes the calling function from the stacktrace, so when we reach the actual raise, the stacktrace no longer includes the function where the error really happened (!), only the unwrapping function. That’s not really helpful for the user. A macro solves this issue, because its body is expanded inline in the calling function.

defmacrop unwrap_or_raise(call) do
  quote do
    case unquote(call) do
      {:ok, value} -> value
      {:error, reason} -> raise YourLibrary.Error, reason
    end
  end
end

# we can now implement foo! in terms of unwrap_or_raise and foo
def foo!(arg1, arg2) do
  unwrap_or_raise foo(arg1, arg2)
end

This still leaves a bit of repetition that could be further removed with some more complex macros, but I think this strikes a good balance between clear code and code that is not overly verbose.

Composing ok/error tuples in with

One issue you may notice with the ok/error tuples and the format_error function is: what happens in with pipelines, when you combine functions from different modules? How do you decide which format_error function to call?

That’s a valid concern, though for many use cases, I would say that handling this in the end-user code by wrapping groups of library functions is acceptable. Nonetheless, if you’re concerned about this, there are basically two paths you could take:

return errors in the shape {:error, {__MODULE__, reason}} - this tags the error with the module name, where you can find the formatting function. This is the approach taken, for example, by yecc, leex and rebar3.
return exception structs instead of errors. For example, in the code above, we would return {:error, YourLibrary.Error.exception(reason)} and later use the exception struct to raise, so our unwrapping function would have a clause: {:error, exception} -> raise exception. In this case the responsibility of the formatting function is taken over by the message/1 callback of the exception module. This is the approach taken, for example, by postgrex and db_connection.
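The first option could be sketched roughly as follows. TaggedParser, TaggedChecker and TaggedPipeline are hypothetical modules invented for this example; the only convention taken from the text is the {:error, {__MODULE__, reason}} shape paired with a format_error/1 function in the tagged module.

```elixir
defmodule TaggedParser do
  # Hypothetical module: tags its error reasons with its own module name.
  def parse(string) do
    case Integer.parse(string) do
      {int, ""} -> {:ok, int}
      _ -> {:error, {__MODULE__, {:not_a_number, string}}}
    end
  end

  def format_error({:not_a_number, string}),
    do: "#{inspect(string)} is not a number"
end

defmodule TaggedChecker do
  # A second hypothetical module with its own reasons and formatter.
  def check(int) when int > 0, do: {:ok, int}
  def check(int), do: {:error, {__MODULE__, {:not_positive, int}}}

  def format_error({:not_positive, int}),
    do: "#{int} is not positive"
end

defmodule TaggedPipeline do
  def run(string) do
    with {:ok, int} <- TaggedParser.parse(string),
         {:ok, int} <- TaggedChecker.check(int) do
      {:ok, int}
    else
      # The module tag tells us whose format_error/1 to call.
      {:error, {module, reason}} -> {:error, module.format_error(reason)}
    end
  end
end
```

A single else clause is enough here, because the tag carries the information about which formatter applies.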

Is this needed, or which approach is better? I’m not sure there’s one good answer that fits all. I leave this decision to you.

Full example

Let’s see what an example library, following all the rules specified above, would look like:

defmodule YourLibrary do
  defmodule Error do
    defexception [:reason]

    def exception(reason),
      do: %__MODULE__{reason: reason}

    def message(%__MODULE__{reason: reason}),
      do: YourLibrary.format_error(reason)
  end

  def get_one(1),
    do: {:ok, "one"}
  def get_one(value),
    do: {:error, {:not_one, value}}

  def get_one!(arg) do
    case get_one(arg) do
      {:ok, value} -> value
      {:error, reason} -> raise Error, reason
    end
  end

  def format_error({:not_one, value}),
    do: "#{inspect value} is not one"
end

Of course, this silly example omits typespecs and documentation, both of which are required for a good library, but I think it serves as a good example of the ideas presented in this post.

Comment from Andrea Leopardi

Posted on February 10, 2017

Out of the options that Michał outlined, I think the most flexible and Elixir-y way of handling errors is returning {:error, exception} tuples. This has multiple benefits in my opinion.

The returned exception can be formatted in a uniform way with Exception.message/1, whatever library this exception comes from. This eliminates the need for various __MODULE__ “hacks” in order to know which format_error/1 function to call.

It plays great with with: in a with pipeline, you can either have just one else clause that matches {:error, exception} and treat exception uniformly (for example, Logger.error(Exception.message(exception))), or handle errors from each library specifically.

with {:ok, tokens} <- Redix.command(:redix, ~w(SMEMBERS tokens)),
     {:ok, _} <- Postgrex.query(:pg, "SELECT * FROM users", []) do
  :ok
else
  {:error, %struct{} = exception} when struct in [Redix.Error, Redix.Connection.Error] ->
    Logger.error "Redis error: #{Exception.message(exception)}"
    :error
  {:error, %Postgrex.Error{} = exception} ->
    Logger.error "Postgres error: #{Exception.message(exception)}"
    :error
end
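The uniform variant, with a single else clause, might be sketched as below. CacheError, StoreError and UniformExample are hypothetical stand-ins, so that the example runs without Redix or Postgrex; only Exception.message/1 is a real library call.

```elixir
defmodule CacheError do
  # Hypothetical exception standing in for a Redis client error.
  defexception [:reason]

  def message(%__MODULE__{reason: reason}),
    do: "cache failed: #{inspect(reason)}"
end

defmodule StoreError do
  # Hypothetical exception standing in for a database client error.
  defexception [:reason]

  def message(%__MODULE__{reason: reason}),
    do: "store failed: #{inspect(reason)}"
end

defmodule UniformExample do
  # Hypothetical functions returning {:error, exception} tuples.
  def cache(:hit), do: {:ok, :cached}
  def cache(_other), do: {:error, %CacheError{reason: :timeout}}

  def store(:ok), do: {:ok, :stored}
  def store(_other), do: {:error, %StoreError{reason: :timeout}}

  def run(cache_arg, store_arg) do
    with {:ok, _} <- cache(cache_arg),
         {:ok, _} <- store(store_arg) do
      :ok
    else
      # One clause for every exception struct, whichever module it came from.
      {:error, %{__exception__: true} = exception} ->
        {:error, Exception.message(exception)}
    end
  end
end
```

Both errors carry the reason :timeout, yet the struct still says which side it came from.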

Since exceptions are just structs, they can still be documented publicly alongside their fields. This way, you can still have something like a :reason field in your exception that allows users of your library to pattern match on specific reasons (for example, %Redix.Connection.Error{reason: :timeout}) like Michał mentioned in the article.

All in all, this pattern requires a bit more typing than the {:error, atom} approach, but it has a few advantages, like a uniform interface and disambiguation (think of {:error, :timeout} in the with example above, would it come from Redis or Postgres?).