My OTP 21 Highlights

OTP 21 is out! 🎉 In this post I’m going to list things that, I think, will matter the most for Elixir users.

Faster compiler

The compiler is about 10-20% faster. There are lots of factors that contribute to that result - the BEAM emulator is faster itself, the compiler received some performance improvements and the file system was completely overhauled to use NIFs and dirty schedulers instead of port drivers.

The results? For example, compiling ecto with all dependencies from a clean directory (just after rm -rf _build) takes about 10 seconds on OTP 21, while it takes about 12 seconds on OTP 20.3. This might not seem like much, but it’s a noticeable difference, especially when running tests on even bigger projects: after all, the test modules are compiled each time the suite is executed. Taking ecto as an example one more time, on OTP 20.3 the test suite completes in 6.5s and on OTP 21.0 in 5.6s.

New callback in GenServer: handle_continue/2

If you’ve written some GenServers, I’m pretty sure you have faced the “lazy initialization” problem: a situation where you want to return from the init/1 callback to unblock start_link/1 and allow the parent supervisor to start the following children, yet you still want to execute some potentially long-running code before accepting the first message. Using a 0 timeout or sending yourself a message were common solutions to this problem, but they are both prone to race conditions where some other message arrives first (which is very easy for a named server that is restarting).
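For reference, the “send yourself a message” workaround looks roughly like the sketch below. The module name OldStyleServer, the :finish_init message and the :ready? call are made up for illustration. For a server started without a name the message sent from init/1 is always first in the mailbox, but a named server that is restarting can already have other messages queued ahead of it.

```elixir
defmodule OldStyleServer do
  use GenServer

  def start_link(arg), do: GenServer.start_link(__MODULE__, arg)

  @impl true
  def init(arg) do
    # Queue the slow part as a regular message. It is handled after
    # anything that is already waiting in the mailbox.
    send(self(), :finish_init)
    {:ok, %{arg: arg, ready: false}}
  end

  @impl true
  def handle_info(:finish_init, state) do
    # Stand-in for the long-running initialization (hypothetical).
    {:noreply, %{state | ready: true}}
  end

  @impl true
  def handle_call(:ready?, _from, state) do
    {:reply, state.ready, state}
  end
end
```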

With the new handle_continue/2 callback the solution is very clean and simple:

defmodule Server do
  use GenServer

  def init(arg) do
    state = execute_sync_initialization(arg)
    {:ok, state, {:continue, :init}}
  end

  def handle_continue(:init, state) do
    state = execute_async_initialization(state)
    {:noreply, state}
  end
end

This will do just what you wanted from the beginning - execute the async initialization code before accepting any message, but allow the parent supervisor to continue execution. No race conditions possible - “it just works” ™️!
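To see the ordering in action, here is a small self-contained sketch. The module name LazyServer, the :load step and the steps/1 helper are invented for this example, and Process.sleep/1 stands in for real work. It assumes OTP 21 and an Elixir version whose GenServer supports the {:continue, term} return value.

```elixir
defmodule LazyServer do
  use GenServer

  def start_link(arg), do: GenServer.start_link(__MODULE__, arg)

  # Returns the recorded steps in the order they happened.
  def steps(pid), do: GenServer.call(pid, :steps)

  @impl true
  def init(arg) do
    # Cheap, synchronous part: start_link/1 returns as soon as this does.
    {:ok, [{:init, arg}], {:continue, :load}}
  end

  @impl true
  def handle_continue(:load, steps) do
    # Simulated slow work. It runs before any message is taken
    # from the mailbox.
    Process.sleep(50)
    {:noreply, [:loaded | steps]}
  end

  @impl true
  def handle_call(:steps, _from, steps) do
    {:reply, Enum.reverse(steps), steps}
  end
end

{:ok, pid} = LazyServer.start_link(:config)
LazyServer.steps(pid)
```

Even though the call is sent right after start_link/1 returns, the reply already contains the :loaded step, because the continue callback is executed first.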

Optimised creation of small maps

We use maps a lot in our Elixir programs. This means we also create a lot of maps.

The Erlang compiler and runtime optimise “literals” to reside in constant memory - creating such data is basically only a pointer assignment. Additionally this data is excluded from garbage collection and is not copied when sent to other processes in messages. Unfortunately, for maps, this only works when all the keys and values are known at compile-time - a relatively rare occurrence.

Early this year I contributed a change to the compiler that extends this optimisation for maps. It takes advantage of the fact that small maps (up to 32 keys) are effectively implemented as two tuples sorted in key order - one for keys and one for values. This means that if we know all the keys at compile-time (and this is very frequent and true for all structs), we can allocate the “keys” tuple as a literal and take advantage of all the optimisations literals benefit from in general. As a result, structs (and all maps in general) created with literal %Foo{...} constructors will need only about half the memory and will be much faster to create.

As a benchmark, let’s use the following program:

defmodule Test do
  fields = for i <- 1..15, do: :"field#{i}"
  defstruct fields

  value = quote(do: value)
  field_values = Enum.zip(fields, Stream.cycle([value]))

  def new(unquote(value)) do
    %__MODULE__{unquote_splicing(field_values)}
  end
end

Benchee.run(%{"new" => fn -> Test.new(1) end}, time: 30)

On OTP 20.3, on my machine [1], I get 2.8M iterations per second, while on OTP 21.0 I get 3.5M iterations per second. This obviously also incorporates some other optimisation work in this release, but I’m convinced the bulk of the speed-up comes from this one. Additionally, when we use :erts_debug.size_shared/1 to measure memory (accounting for structural sharing resulting from the optimised key literals), we notice that a map constructed with Test.new/1 is much smaller:

:erts_debug.size_shared(Test.new(1))

On OTP 20.3 it returns 36 words and on OTP 21 it returns just 19 words (on a 64-bit machine a word is 8 bytes).
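To put those word counts into bytes, here is a minimal sketch. The module name WordSize is made up for illustration, and :erlang.system_info(:wordsize) reports the word size of the running VM. With 8-byte words, 36 words come to 288 bytes and 19 words to 152 bytes. The 17-word difference is also consistent with a keys tuple of 16 elements (the 15 fields plus __struct__), assuming a tuple costs one header word plus one word per element.

```elixir
defmodule WordSize do
  # Hypothetical helper: converts a size in VM words to bytes.
  def to_bytes(words) when is_integer(words) and words >= 0 do
    words * :erlang.system_info(:wordsize)
  end
end

# Size in bytes of an arbitrary term, counting shared subterms once.
WordSize.to_bytes(:erts_debug.size_shared(%{foo: 1, bar: 2}))
```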

This might also mean some other indirect performance gains - for example when comparing structs, the keys tuple will be pointer-compared skipping the entire comparison routine.

The following benchmark:

map1 = Test.new(1)
map2 = Test.new(1)
Benchee.run(%{"compare" => fn -> map1 == map2 end}, memory_time: 0.1, time: 30)

On OTP 20.3 this results in 4.3M iterations per second, while on OTP 21.0 I get 5.5M iterations per second - over a 25% speed-up in such a basic operation!

New guard functions for maps

Another contribution of mine, which I managed to squeeze in just before the code freeze for the OTP 21 release, introduces two new guard functions - map_get/2 and is_map_key/2. Their functionality is equivalent to :maps.get/2 and :maps.is_key/2 respectively. But because they are allowed in guards, they enable one of the long-requested features - an is_struct/2 guard function. Bear in mind, we’re not yet exactly sure how those will be exposed in Elixir, but it should allow for code at least similar to this:

def process(pattern)
    when is_binary(pattern) or is_struct(pattern, Regex) do
  do_something_with_pattern(pattern)
end

This does not give new capabilities (since it was always possible to do such matching in function bodies), but it should make Elixir more expressive.
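Until something like that is available, a similar check can be hand-rolled by calling the Erlang guard BIFs directly. The sketch below is only an illustration: the PatternGuards module, the is_struct_of/2 guard and the kind/1 function are hypothetical names, and the code assumes it is compiled on OTP 21 or later so that :erlang.is_map_key/2 and :erlang.map_get/2 are accepted in guards. Note that both BIFs take the key first and the map second.

```elixir
defmodule PatternGuards do
  # Hypothetical hand-rolled struct check built from the two new guard BIFs.
  defguard is_struct_of(term, module)
           when is_map(term) and :erlang.is_map_key(:__struct__, term) and
                  :erlang.map_get(:__struct__, term) == module

  def kind(pattern) when is_binary(pattern), do: :binary
  def kind(pattern) when is_struct_of(pattern, Regex), do: :regex
  def kind(_other), do: :unknown
end

PatternGuards.kind(~r/foo/)
```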

Those functions are also available in ETS match specs, which means it’s now possible to extract just one field from a map stored in an ETS table, for example:

iex> table = :ets.new(:test, [])
iex> :ets.insert(table, {:key, %{foo: 1, bar: 2}})
iex> ms = [{{:key, :"$1"}, [], [{:map_get, :foo, :"$1"}]}]
iex> :ets.select(table, ms)
[1]

This should help in optimising some programs that store large maps in ETS tables and need just one field - they no longer need to copy the entire map into the process just to extract this field.
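Both functions can be combined in a single match spec. The following sketch uses invented table contents and a hypothetical AgeLookup module: is_map_key in the guard part skips rows whose map has no :age key, and map_get in the body returns only that field together with the row key.

```elixir
defmodule AgeLookup do
  def demo do
    table = :ets.new(:people, [:set])

    # Hypothetical example data: the second row has no :age key.
    :ets.insert(table, [
      {:alice, %{name: "Alice", age: 30}},
      {:bob, %{name: "Bob"}}
    ])

    match_spec = [
      {
        # Head: bind the row key to $1 and the stored map to $2.
        {:"$1", :"$2"},
        # Guard: keep only rows whose map has an :age key.
        [{:is_map_key, :age, :"$2"}],
        # Body: build a {key, age} tuple (double braces construct a tuple).
        [{{:"$1", {:map_get, :age, :"$2"}}}]
      }
    ]

    result = :ets.select(table, match_spec)
    :ets.delete(table)
    result
  end
end

AgeLookup.demo()
```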

New file implementation and enhanced IO scalability

As I already mentioned in the compiler section, the file system received a rewrite and now uses NIFs and dirty schedulers instead of port drivers. This dramatically increased performance in some situations - by as much as 3 times.

Additionally, the way the VM polls for new IO events (for example a TCP socket signalling it has data to read) was overhauled as well. This should have a positive impact on large, concurrent servers. It would be interesting to see some benchmarks of Phoenix applications comparing performance on OTP 20 and 21 - I haven’t seen any so far.

And many more…

Those are just some of the changes that I’ve picked out - but there are many more that will improve our ecosystem. To learn more about this latest release, see the full release notes. I also recommend taking a look at the post on the OTP team’s blog by Lukas Larsson talking about the highlights from the perspective of the OTP team.

And there’s already some awesome stuff scheduled for OTP 22!


  1. My machine is:

    Operating System: macOS
    CPU Information: Intel(R) Core(TM) i7-4770HQ CPU @ 2.20GHz
    Number of Available Cores: 8
    Available memory: 16 GB
    Elixir 1.7.0-dev
    Erlang 21.0 (or 20.3)
    

Where is my comment box!?

I don't do traditional comments, but you're welcome to send me an email to michal at muskala dot eu and I'll publish it at the bottom of the article as a comment.