Jeff Kreeftmeijer for AppSignal

Posted on • Originally published at blog.appsignal.com

Hot Code Reloading in Elixir

Thanks to its roots in Erlang, Elixir is often praised for its robustness and reliability. The ultimate example of this is its ability to upgrade an application without having to restart it.

Hot code reloading is one of Erlang's most impressive features. It's sometimes compared to replacing the wheels on a moving car, and it's majestically demonstrated with phone calls and hot bug fixes in Erlang: The Movie. But how does it work under the hood?

In this edition of Elixir Alchemy, we'll dive into hot code reloading to see how Erlang's code server handles seamless code upgrades in Elixir. To understand how all this works, we'll start at the module level and work our way up. Let's get started!

Upgrading Modules

The first part of the magic of hot code reloading is the ability of Erlang's code server to keep more than one version of a module running at the same time. This allows existing processes to carry on without being restarted or having their running code changed.

To illustrate this, let's look at an example of a module named Counter. As the name implies, it counts up from 0 using a recursive function that sleeps for a second, prints the current number and calls itself with the number incremented by 1.

defmodule Counter do
  def count(n) do
    :timer.sleep(1000)
    IO.puts("- #{inspect(self())}: #{n}")
    count(n + 1)
  end
end

After starting IEx ($ iex -S mix), we spawn a separate process that runs the counter loop. We pass the module (Counter), the function name (:count), and the arguments ([0]) to the spawn/3 function.

iex(1)> spawn(Counter, :count, [0])
#PID<0.107.0>
- #PID<0.107.0>: 0
- #PID<0.107.0>: 1
- #PID<0.107.0>: 2
- #PID<0.107.0>: 3
…
iex(2)>

While keeping the counter running, we update the counter module to increment the number by 2 instead of 1 in lib/counter.ex. After that's done and the file is saved, we recompile the module in the IEx session.

…
- #PID<0.107.0>: 2
- #PID<0.107.0>: 3
iex(2)> r Counter
{:reloaded, Counter, [Counter]}
- #PID<0.107.0>: 4
- #PID<0.107.0>: 5
- #PID<0.107.0>: 6
…
iex(3)>

The module gets recompiled, but the existing counter still increments by 1, meaning the old version of the code is still running in this process. If we spawn a new process that runs a counter, it increments by 2, which shows that it's running the new version.

…
- #PID<0.107.0>: 4
- #PID<0.107.0>: 5
- #PID<0.107.0>: 6
iex(3)> spawn(Counter, :count, [0])
#PID<0.114.0>
- #PID<0.107.0>: 7
- #PID<0.114.0>: 0
- #PID<0.107.0>: 8
- #PID<0.114.0>: 2
- #PID<0.107.0>: 9
- #PID<0.114.0>: 4
…
iex(4)>

This example shows Erlang's code server in action. By keeping the old version of the module present, the first process (#PID<0.107.0>) continues running as it did before, but newly spawned processes (#PID<0.114.0>) automatically use the new version.

The Erlang Code Server

Erlang's code server handles loading compiled code in a running system. At any one time, the code server can keep two versions of a module in memory. When a module is loaded, it becomes the current version of that module. If a previous version of that module was already present, it's marked old.

Both the current and the old version of a module can run at the same time, but the exported functions from the old version are replaced by the ones from the new version. This ensures that every external (fully qualified) function call ends up in the current version of the module.

If a process is already running when a new version of a module is loaded, it will linger on the old version, and all of its local function calls will be handled by the module's old version.
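To make the current and old versions visible, here's a minimal sketch that asks the runtime what it's holding on to. The Lingerer module and the variable names are made up for this illustration, and the module source is kept in a string so that one script can compile two versions of it. The calls to :erlang.check_old_code/1, :erlang.check_process_code/2 and :code.soft_purge/1 are regular Erlang functions.

```elixir
# Sketch only: Lingerer is a hypothetical module. Its source lives in a string
# so that one script can compile two versions of it.
source = fn version ->
  """
  defmodule Lingerer do
    def version, do: #{version}

    # The local call to loop/0 keeps the process in the version it started in.
    def loop do
      receive do
        {:ping, from} ->
          send(from, :pong)
          loop()

        :stop ->
          :ok
      end
    end
  end
  """
end

# Silence the "redefining module" warning for the second compile.
Code.compiler_options(ignore_module_conflict: true)

Code.compile_string(source.(1))
pid = spawn(Lingerer, :loop, [])

# Wait for a reply, so we know the process is running version 1.
send(pid, {:ping, self()})

receive do
  :pong -> :ok
after
  1_000 -> raise "no reply from the looping process"
end

old_code_before_reload = :erlang.check_old_code(Lingerer)

# Loading a second version marks the first one as old.
Code.compile_string(source.(2))

observations = %{
  # apply/3 avoids a compile-time reference to a module defined at runtime.
  current_version: apply(Lingerer, :version, []),
  old_code_kept: :erlang.check_old_code(Lingerer),
  process_on_old_code: :erlang.check_process_code(pid, Lingerer),
  # soft_purge/1 refuses to drop old code while a process still runs it.
  purged_while_running: :code.soft_purge(Lingerer)
}

# Stop the lingering process and wait until it is really gone.
ref = Process.monitor(pid)
send(pid, :stop)

receive do
  {:DOWN, ^ref, :process, ^pid, _reason} -> :ok
end

purged_after_exit = :code.soft_purge(Lingerer)

IO.inspect(old_code_before_reload, label: "old code before reload")
IO.inspect(observations, label: "after reload")
IO.inspect(purged_after_exit, label: "purged after the process exited")
```

The sketch reloads the module only once on purpose. The code server has room for just one old version, so what happens to lingering processes on a second reload depends on how the old code gets purged.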

Hot Reloading GenServers

Let's take this a step further by turning our example into a GenServer. Like the Counter module in the previous example, the CountServer counts up by incrementing its state every second.

defmodule CountServer do
  use GenServer

  def start_link do
    GenServer.start_link(__MODULE__, 0)
  end

  def init(state) do
    Process.send_after(self(), :increment, 1000)
    {:ok, state}
  end

  def handle_info(:increment, n) do
    incremented = n + 1
    IO.puts("- #{inspect(self())}: #{incremented}")

    Process.send_after(self(), :increment, 1000)

    {:noreply, incremented}
  end
end

Since the CountServer is a GenServer, we don't need to start it in a separate process manually. Instead, we call CountServer.start_link/0 in a new IEx session to start the counter.

iex(1)> {:ok, pid} = CountServer.start_link()
{:ok, #PID<0.130.0>}
- #PID<0.130.0>: 1
- #PID<0.130.0>: 2
- #PID<0.130.0>: 3
…
iex(2)>

Let's try updating it like we did in the last example. We update the CountServer to increment by 2 instead of 1. Then, in the running IEx session, we recompile the module.

…
- #PID<0.130.0>: 2
- #PID<0.130.0>: 3
iex(2)> r CountServer
{:reloaded, CountServer, [CountServer]}
- #PID<0.130.0>: 5
- #PID<0.130.0>: 7
- #PID<0.130.0>: 9
…
iex(3)>

This time, the running GenServer did update. After recompiling the module, the counter started incrementing by 2 instead of 1, without being restarted and without us starting a new counter.

Local and External Function Calls

The first example ran a recursive function in a spawned process, while the second ran a GenServer, which keeps its state in a process of its own.

As we learned while deconstructing GenServers, the generic server loop and the callback module are separate pieces of code. In the second example, the GenServer process ran that loop and kept the state, and it called out to the CountServer module to handle each message.

This difference is important for code reloading. Local function calls, like in the first example, in which a function calls another function in its own module without a module prefix, stay on the version of the module the process is already running. External function calls, like the GenServer process calling out to the CountServer module, are always done on the current version of the module.

This explains why the process in the first example kept running the old code, while the second one switched over as soon as the new module was loaded.
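To see the difference between the two kinds of calls outside of IEx, here's a minimal sketch. The Looper module, its functions and the ask helper are made up for this illustration, and the module source is kept in a string so that a single script can compile two versions. One loop recurses with a local call, the other with a fully qualified call through __MODULE__.

```elixir
# Sketch only: Looper is a hypothetical module, compiled twice from a string.
source = fn version ->
  """
  defmodule Looper do
    def version, do: #{version}

    # Local call: the process stays in the version it is already running.
    def local_loop do
      receive do
        {:version, from} -> send(from, {:version, self(), version()})
      end

      local_loop()
    end

    # Fully qualified call: every iteration re-enters the current version.
    def remote_loop do
      receive do
        {:version, from} -> send(from, {:version, self(), version()})
      end

      __MODULE__.remote_loop()
    end
  end
  """
end

# Ask a looping process which version of the code answered.
ask = fn pid ->
  send(pid, {:version, self()})

  receive do
    {:version, ^pid, version} -> version
  after
    1_000 -> :timeout
  end
end

# Silence the "redefining module" warning for the second compile.
Code.compiler_options(ignore_module_conflict: true)
Code.compile_string(source.(1))

local = spawn(Looper, :local_loop, [])
remote = spawn(Looper, :remote_loop, [])

# Asking once makes sure both processes are running version 1 before the reload.
before_reload = {ask.(local), ask.(remote)}

Code.compile_string(source.(2))

# Ask twice, because a process only switches versions on its next
# fully qualified call.
local_after = [ask.(local), ask.(local)]
remote_after = [ask.(remote), ask.(remote)]

IO.inspect(before_reload, label: "before reload")
IO.inspect(local_after, label: "local loop after reload")
IO.inspect(remote_after, label: "fully qualified loop after reload")
```

Since the local loop never leaves the version it started in, it keeps answering 1. The fully qualified loop switches the next time it calls __MODULE__.remote_loop/0, so its last answer is 2. Its first answer after the reload can still come from the old code, because the process was already waiting inside it.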

Transforming State

Although the state in the GenServer example got transformed correctly by the reloaded version of the CountServer module, there's one more scenario to look at. What happens when the new version of the implementation requires a different state?

As an example, let's say we need our CountServer to only produce even numbers from now onward. Our current implementation increments by 2 every second, so most of the work is already done.

However, if we have a process running that still increments by 1, we run the risk of upgrading at the wrong second, causing it to produce odd numbers instead. We need to make sure to update the state when we upgrade the module to the new version.

Elixir's GenServer module has a callback named code_change/3, which is used for updating the state in the event of a code change.

defmodule CountServer do
  use GenServer

  # ...

  def code_change(_old_vsn, state, _extra) when rem(state, 2) == 1 do
    {:ok, state - 1}
  end

  def code_change(_old_vsn, state, _extra) do
    {:ok, state}
  end
end

This example implements the code_change/3 callback. If the state is an odd number, it will subtract 1 from the current state, making it an even number.

The _old_vsn argument holds the module's old version that we're upgrading from. It can be used to upgrade from a specific version, and the _extra argument can be used for extra arguments while upgrading. For brevity, both of these are ignored here.
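As a rough sketch of what upgrading from a specific version could look like, the module below matches on the old version in the first argument. The VersionedCounter name and the version strings "1" and "2" are assumptions made for this illustration. With :sys.change_code/4, the old version is simply whatever is passed as the third argument.

```elixir
defmodule VersionedCounter do
  use GenServer

  # Hypothetical version tag for this module; Elixir stores it as the
  # module's vsn attribute.
  @vsn "2"

  def init(n), do: {:ok, n}

  # Coming from the hypothetical version "1", which could leave an odd
  # number in the state: round it down to an even number.
  def code_change("1", n, _extra) when rem(n, 2) == 1, do: {:ok, n - 1}

  # Coming from any other version: keep the state as it is.
  def code_change(_old_vsn, n, _extra), do: {:ok, n}
end

# The callback is a plain function, so we can try it without a process.
from_v1 = VersionedCounter.code_change("1", 7, [])
from_other = VersionedCounter.code_change("2", 7, [])

IO.inspect(from_v1, label: "upgrading from version 1")
IO.inspect(from_other, label: "upgrading from another version")
```

Matching on the version keeps each state migration next to the version it migrates from, so states that are already in the new format are left alone.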

To invoke the code_change/3 callback, we have to explicitly change the code for a process. To do that, we temporarily suspend the process, run the code change and resume it again.

…
- #PID<0.130.0>: 7
- #PID<0.130.0>: 9
iex(3)> :sys.suspend(pid)
:ok
iex(4)> r CountServer
{:reloaded, CountServer, [CountServer]}
iex(5)> :sys.change_code(pid, CountServer, nil, [])
:ok
iex(6)> :sys.resume(pid)
:ok
- #PID<0.130.0>: 10
- #PID<0.130.0>: 12
- #PID<0.130.0>: 14
…
iex(7)>

NOTE: When upgrading through a release, this is done automatically for each module in your app. There's no need to explicitly call :sys.change_code/4 outside of IEx.
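The suspend, reload, change code and resume steps can also be wrapped in a small helper. The sketch below is one way to do that in a script. The EvenCounter module, the :bump call and the upgrade function are hypothetical, and the module source is kept in a string so both versions can be compiled from the same file.

```elixir
# Sketch only: EvenCounter is a hypothetical GenServer, compiled twice from a
# string. The step it adds on each :bump is baked in at compile time.
source = fn step ->
  """
  defmodule EvenCounter do
    use GenServer

    def init(n), do: {:ok, n}

    # Each :bump adds the step this version was compiled with.
    def handle_call(:bump, _from, n), do: {:reply, n + #{step}, n + #{step}}

    # Round an odd counter down to an even number during the upgrade.
    def code_change(_old_vsn, n, _extra) when rem(n, 2) == 1, do: {:ok, n - 1}
    def code_change(_old_vsn, n, _extra), do: {:ok, n}
  end
  """
end

# Suspend the process, load the new code, transform the state, and resume.
upgrade = fn pid, module, new_source ->
  :ok = :sys.suspend(pid)
  Code.compile_string(new_source)
  :ok = :sys.change_code(pid, module, nil, [])
  :ok = :sys.resume(pid)
end

# Silence the "redefining module" warning for the second compile.
Code.compiler_options(ignore_module_conflict: true)
Code.compile_string(source.(1))

{:ok, pid} = GenServer.start_link(EvenCounter, 0)

# Three bumps with a step of 1 leave an odd number in the state.
for _ <- 1..3, do: GenServer.call(pid, :bump)
state_before = :sys.get_state(pid)

upgrade.(pid, EvenCounter, source.(2))

state_after = :sys.get_state(pid)
next_value = GenServer.call(pid, :bump)

IO.inspect(state_before, label: "state before the upgrade")
IO.inspect(state_after, label: "state after the upgrade")
IO.inspect(next_value, label: "first bump on the new version")
```

Suspending first means no message is handled between loading the new code and transforming the state, so the new version never sees an odd number.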

Backward Compatibility

Having external function calls executed on the new version of the module allows for a gradual switch to the new versions of the modules in your app. Because even processes that linger on an old version call out to the current version when they make external function calls, they don't spawn even more lingering processes. However, mixing old and new code in a running system can cause problems when the new code is not backward compatible with the old modules.

Let's improve our counter a bit. Instead of hardcoding the added value, we'll allow it to be passed as an argument.

defmodule CountServer do
  use GenServer

  def start_link do
    GenServer.start_link(__MODULE__, 1)
  end

  def init(state) do
    send(self(), {:increment, 1})
    {:ok, state}
  end

  def handle_info({:increment, value}, state) do
    new_state = state + value
    IO.puts(new_state)
    Process.send_after(self(), {:increment, 1}, 1000)

    {:noreply, new_state}
  end

  # ...
end

This example allows passing a value to increment by. We've also made sure to update both messages in init/1 and handle_info/2 to make sure they use the new format.

- #PID<0.130.0>: 12
- #PID<0.130.0>: 14
iex(2)> r CountServer
{:reloaded, CountServer, [CountServer]}
iex(3)>
15:09:01.313 [error] GenServer #PID<0.130.0> terminating
** (FunctionClauseError) no function clause matching in CountServer.handle_info/2
    (odd) lib/count_server.ex:13: CountServer.handle_info(:increment, 14)
    (stdlib) gen_server.erl:637: :gen_server.try_dispatch/4
    (stdlib) gen_server.erl:711: :gen_server.handle_msg/6
    (stdlib) proc_lib.erl:249: :proc_lib.init_p_do_apply/3
Last message: :increment
State: 14
** (EXIT from #PID<0.128.0>) shell process exited with reason: an exception was raised:
    ** (FunctionClauseError) no function clause matching in CountServer.handle_info/2
        (odd) lib/count_server.ex:13: CountServer.handle_info(:increment, 14)
        (stdlib) gen_server.erl:637: :gen_server.try_dispatch/4
        (stdlib) gen_server.erl:711: :gen_server.handle_msg/6
        (stdlib) proc_lib.erl:249: :proc_lib.init_p_do_apply/3

Interactive Elixir (1.7.2) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)>

This example results in a FunctionClauseError, because the message that the old version of the code scheduled is still a bare :increment. Because we removed the clause that accepted it, the CountServer crashes, as there's no handle_info/2 clause left to handle this message.

Instead, we should have kept a clause that accepts the :increment message. That keeps the new version backward compatible with the previous one and allows for a clean upgrade.

defmodule CountServer do
  use GenServer

  # ...

  def handle_info(:increment, n) do
    handle_info({:increment, 2}, n)
  end

  def handle_info({:increment, value}, n) do
    incremented = n + value
    IO.puts("- #{inspect(self())}: #{incremented}")

    Process.send_after(self(), {:increment, 2}, 1000)

    {:noreply, incremented}
  end

  # ...
end

A First Look at Code Reloading

The logic required for hot code reloading is often already present, but abstracted away. For instance, using GenServer in a module defines a default code_change/3 callback that returns the state unchanged.

In this episode, we made observations from the perspective of the module being upgraded. In a future episode, we'll look at upgrading whole applications, production releases and Phoenix applications. Subscribe to Elixir Alchemy to get the next episode delivered straight to your inbox.
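A quick way to check this is to call the callback on a module that doesn't define it. PlainServer is a made-up name for this sketch, and it only implements init/1.

```elixir
defmodule PlainServer do
  use GenServer

  # The only callback we write ourselves. The default code_change/3 comes
  # from `use GenServer`.
  def init(state), do: {:ok, state}
end

# The default callback hands the state back untouched.
result = PlainServer.code_change(nil, %{count: 3}, [])
IO.inspect(result)
# prints {:ok, %{count: 3}}
```

Defining our own code_change/3, like we did for the CountServer, overrides this default.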

Top comments (5)

rhymes

Thank you Jeff, all this stuff is amazing. Love that the final version is more functional, passing the state in the recursive function.

I'll have to learn Elixir at some point.

You inadvertently reminded me of NGINX's own upgrade functionality, in which it spawns a new master process which in turn spawns a new set of workers and then tells the old ones to finish up what they are doing and then die.

Another thing you reminded me is that Go from 1.11 supports loading third party modules in different versions in the same application/binary. I guess someone will come up with hot module reloading for that feature too creating a pluggable system.

It's awesome that all of this is "free" in Elixir/Erlang.

deleteme deleteme

Bookmarking to digest this more later after work. Hot reloading is one of those things that's magic about Elixir, you hear what it does and you're like: "What I can do that?!"

Also, if you need Elixir app monitoring and performance metrics, AppSignal is the best I've used hands down. It's really easy to set up with Phoenix projects.

 
rhymes

Well. Honestly, it all sounds like “we can nail this screw with an iron.” Sure we can.

Ah ah sure ;-) I was spitballing ideas just for the sake of them. It makes way more sense to build the server on top of Elixir if you need Erlang VM's features

edh_developer

This put me in mind of the Windows Installer service, oddly enough. DLL updates work in much the same way, allowing running applications to keep using the old version of the DLL, while any new processes will load the new one.

rhymes

I wouldn't consider it stillborn but right now it's pretty hard if not impossible yes.

Go doesn't have a "runtime VM" nor the ability to load multiple versions of the same plugin.

But, let's enumerate a few possible useful "legos" to build such functionality:

So, a giant maybe, you could ship a server with multiple binaries handled by a different goroutine each, all monitored by the supervisor tree and then tell the supervisor tree it needs to tell the goroutine to die (with channels) when done and spawn another one with the new shipped binary.

Does it make sense? It's definitely not the same thing but if the app is organized in this way it might come a little closer to what Elixir and Erlang have.