Introduction
In today's world, building software systems that can gracefully handle failures and keep running without interruption is crucial. Elixir, a powerful and fault-tolerant programming language, offers a range of strategies for managing processes when they encounter issues. These process restart strategies, :permanent, :temporary, and :transient, play a pivotal role in ensuring system reliability and resilience. In this guide, we'll explore the concepts and best practices behind them, equipping you with the knowledge to design robust and dependable software systems in Elixir.
When do you use which option?
There are three values for the :restart option:
- Use :permanent when:
  - The process is critical for the system's overall operation, and its failure would severely impact the system's functionality.
  - You want the supervisor to restart it whenever it terminates, whatever the exit reason, to maintain system availability.
  - Examples might include database connections, core components, or critical services.
- Use :temporary when:
  - The process is not essential for the system's continuous operation, and its failure can be tolerated without significant disruption.
  - You never want it restarted automatically, whatever the exit reason. When a temporary child exits, the supervisor also discards its child specification.
  - Examples might include non-essential background tasks, loggers, or metrics collectors.
- Use :transient when:
  - The process is expected to finish its work and exit on its own. It should be restarted only if it terminates abnormally, that is, with an exit reason other than :normal, :shutdown, or {:shutdown, term}.
  - You want a crash to trigger a restart, while a clean exit is treated as "work done" and left alone.
  - Examples might include worker processes in a job processing system that exit normally once their job completes.
The :restart option configures how a supervised process, such as a GenServer, is recovered after it terminates. It only controls when the supervisor restarts that particular child. Restarting related processes together is a different concern, handled by the supervisor's :strategy (:one_for_all or :rest_for_one instead of :one_for_one). In summary, the choice of :restart value depends on how critical the process is to your system and on whether it is expected to exit on its own. Careful consideration of these factors helps you design robust and resilient systems. Such systems recover from failures effectively while avoiding unnecessary restarts for non-critical components.
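Whenever a child exits, the supervisor makes one decision: restart it or not. A :permanent child is always restarted. A :temporary child never is. A :transient child is restarted only if its exit reason is something other than :normal, :shutdown, or {:shutdown, term}.

The sketch below encodes that rule as a plain function so it can be read and tested in isolation. RestartRule and restart?/2 are hypothetical names used only for illustration. They are not part of Elixir's Supervisor API, which applies this logic internally.

```elixir
# Hypothetical helper (not part of Elixir): mirrors the decision a Supervisor
# makes when a child exits with `reason`, given the child's :restart value.
defmodule RestartRule do
  # :permanent children are always restarted, whatever the reason.
  def restart?(:permanent, _reason), do: true

  # :temporary children are never restarted.
  def restart?(:temporary, _reason), do: false

  # :transient children are restarted only after an abnormal exit.
  def restart?(:transient, :normal), do: false
  def restart?(:transient, :shutdown), do: false
  def restart?(:transient, {:shutdown, _term}), do: false
  def restart?(:transient, _abnormal_reason), do: true
end

# A child killed with Process.exit(pid, :kill) exits with reason :killed.
IO.inspect(RestartRule.restart?(:transient, :killed))   # prints true
IO.inspect(RestartRule.restart?(:transient, :shutdown)) # prints false
```
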
Using the :restart Option
To use these restart strategies in Elixir, you typically work within a supervision tree, a hierarchical structure used to manage and supervise processes. Here's how you can apply the different restart strategies (:permanent, :temporary, and :transient):
- Creating a Supervisor:
  - First, you need to create a supervisor using the Supervisor module. Supervisor.start_link/2 starts it with a list of child specifications and options such as :strategy and :name.
- Adding Child Processes:
  - You then add child processes (which can include GenServer processes) to the supervisor's supervision tree by listing their child specifications. A child specification can be written as a map, or built and adjusted with Supervisor.child_spec/2.
- Defining Restart Strategies:
  - In the child specification, you specify the :restart strategy. You can set it to :permanent, :temporary, or :transient, depending on how you want the process to be restarted in case of failures. If you leave it out, it defaults to :permanent.
Here's a code example of how you might configure a supervisor with different restart strategies:
defmodule Dummy.Application do
  # See https://hexdocs.pm/elixir/Application.html
  # for more information on OTP Applications
  @moduledoc false

  use Application

  @impl true
  def start(_type, _args) do
    children = [
      # Each map is a full child specification; only :restart differs.
      %{
        id: Dummy.Permanent,
        start: {Dummy.Permanent, :start_link, [[]]},
        restart: :permanent,
        type: :worker
      },
      %{
        id: Dummy.Temporary,
        start: {Dummy.Temporary, :start_link, [[]]},
        restart: :temporary,
        type: :worker
      },
      %{
        id: Dummy.Transient,
        start: {Dummy.Transient, :start_link, [[]]},
        restart: :transient,
        type: :worker
      }
    ]

    # See https://hexdocs.pm/elixir/Supervisor.html
    # for other strategies and supported options
    opts = [
      strategy: :one_for_one,
      name: Dummy.Supervisor
    ]

    Supervisor.start_link(children, opts)
  end
end
Dummy.Permanent:
defmodule Dummy.Permanent do
  use GenServer

  def start_link(_) do
    GenServer.start_link(__MODULE__, :ok)
  end

  @impl true
  def init(:ok) do
    {:ok, :initial_state}
  end
end
Dummy.Temporary:
defmodule Dummy.Temporary do
  use GenServer

  def start_link(_) do
    GenServer.start_link(__MODULE__, :ok)
  end

  @impl true
  def init(:ok) do
    {:ok, :initial_state}
  end
end
Dummy.Transient:
defmodule Dummy.Transient do
  use GenServer

  def start_link(_) do
    GenServer.start_link(__MODULE__, :ok)
  end

  @impl true
  def init(:ok) do
    {:ok, :initial_state}
  end
end
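The three workers above are identical apart from their names, and the restart value lives in hand-written maps. There is a shorter alternative that relies only on documented behavior:

- `use GenServer` accepts child spec options such as :restart.
- Supervisor.child_spec/2 overrides fields of an existing spec.

This is a minimal sketch of that approach. Sketch.TransientWorker and the :temporary_copy id are hypothetical names chosen for this example.

```elixir
defmodule Sketch.TransientWorker do
  # Options given to `use GenServer` are merged into the generated
  # child_spec/1, so this worker defaults to restart: :transient.
  use GenServer, restart: :transient

  def start_link(arg), do: GenServer.start_link(__MODULE__, arg)

  @impl true
  def init(arg), do: {:ok, arg}
end

# The generated spec: id, start MFA, and restart: :transient.
IO.inspect(Sketch.TransientWorker.child_spec([]))

# Supervisor.child_spec/2 overrides fields of an existing spec. A second copy
# of the same module needs its own id, since ids must be unique per supervisor.
temporary_copy =
  Supervisor.child_spec({Sketch.TransientWorker, []},
    id: :temporary_copy,
    restart: :temporary
  )

IO.inspect(temporary_copy)
```

Both specs could be listed side by side in the children passed to Supervisor.start_link/2, because their ids differ.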
Let's start IEx and get the app running:
$ iex -S mix
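In the session below, we first check the children of Dummy.Supervisor: the supervisor has started, and so have its three children. Next, we take the pid of the Temporary child and pipe it to Process.exit/2 with the :kill reason. Checking Dummy.Supervisor again, only Permanent and Transient remain. Note that specs also dropped from 3 to 2. When a :temporary child exits, the supervisor does not restart it and removes its child specification altogether.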
$ iex -S mix
iex> Supervisor.count_children(Dummy.Supervisor)
%{active: 3, workers: 3, supervisors: 0, specs: 3}
iex> Supervisor.which_children(Dummy.Supervisor)
[
{Dummy.Permanent, #PID<0.129.0>, :worker, [Dummy.Permanent]},
{Dummy.Temporary, #PID<0.128.0>, :worker, [Dummy.Temporary]},
{Dummy.Transient, #PID<0.127.0>, :worker, [Dummy.Transient]}
]
iex> pid("0.128.0") |> Process.exit(:kill)
true
iex> Supervisor.count_children(Dummy.Supervisor)
%{active: 2, workers: 2, supervisors: 0, specs: 2}
iex> Supervisor.which_children(Dummy.Supervisor)
[
{Dummy.Permanent, #PID<0.129.0>, :worker, [Dummy.Permanent]},
{Dummy.Transient, #PID<0.127.0>, :worker, [Dummy.Transient]}
]
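You can also reproduce this first experiment without a Mix project. The script below, a sketch under a few assumptions, starts a supervisor directly:

- Sketch.Worker and the :permanent and :temporary ids are hypothetical names.
- The same worker module is started twice via Supervisor.child_spec/2, with different ids.
- The supervisor handles the exit asynchronously, so the script polls count_children/1 for up to about a second instead of reading it right away.

```elixir
defmodule Sketch.Worker do
  use GenServer

  def start_link(_arg), do: GenServer.start_link(__MODULE__, :ok)

  @impl true
  def init(:ok), do: {:ok, :initial_state}
end

# One module, two children: ids must differ, and only :restart changes.
children = [
  Supervisor.child_spec({Sketch.Worker, []}, id: :permanent, restart: :permanent),
  Supervisor.child_spec({Sketch.Worker, []}, id: :temporary, restart: :temporary)
]

{:ok, sup} = Supervisor.start_link(children, strategy: :one_for_one)
IO.inspect(Supervisor.count_children(sup), label: "before")

# which_children/1 returns {id, pid, type, modules} tuples.
{:temporary, temp_pid, :worker, _modules} =
  List.keyfind(Supervisor.which_children(sup), :temporary, 0)

Process.exit(temp_pid, :kill)

# Poll (10 ms steps, at most 100 tries) until the temporary spec is removed.
after_kill =
  Stream.repeatedly(fn ->
    Process.sleep(10)
    Supervisor.count_children(sup)
  end)
  |> Stream.take(100)
  |> Enum.find(&(&1.specs == 1))

IO.inspect(after_kill, label: "after")
```

As in the IEx session, specs drops by one: the temporary child is gone for good, while the permanent one keeps running.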
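The next session is a bit bigger, but bear with me. This time I play with the Transient process:

- When I kill it with :kill, it comes back with a new pid, because :killed is an abnormal exit reason.
- Calling Process.exit/2 with :normal leaves it running with the same pid. A process that is not trapping exits ignores an exit signal with reason :normal, so it never stopped in the first place.
- When the reason is :shutdown or {:shutdown, term}, the process stays down. The supervisor keeps its child specification and shows its pid as :undefined, so we can restart the process manually using Supervisor.restart_child/2.

If I stop the Temporary process and then try to restart it manually, I get {:error, :not_found}, because its specification is gone. One important thing here: Supervisor.restart_child/2 takes the child ID (here Dummy.Transient), not a pid.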
iex> Supervisor.count_children(Dummy.Supervisor)
%{active: 3, workers: 3, supervisors: 0, specs: 3}
iex> Supervisor.which_children(Dummy.Supervisor)
[
{Dummy.Permanent, #PID<0.129.0>, :worker, [Dummy.Permanent]},
{Dummy.Temporary, #PID<0.128.0>, :worker, [Dummy.Temporary]},
{Dummy.Transient, #PID<0.127.0>, :worker, [Dummy.Transient]}
]
iex> pid("0.127.0") |> Process.exit(:kill)
true
iex> Supervisor.count_children(Dummy.Supervisor)
%{active: 3, workers: 3, supervisors: 0, specs: 3}
iex> Supervisor.which_children(Dummy.Supervisor)
[
{Dummy.Permanent, #PID<0.129.0>, :worker, [Dummy.Permanent]},
{Dummy.Temporary, #PID<0.128.0>, :worker, [Dummy.Temporary]},
{Dummy.Transient, #PID<0.145.0>, :worker, [Dummy.Transient]}
]
iex> pid("0.145.0") |> Process.exit(:normal)
true
iex> Supervisor.count_children(Dummy.Supervisor)
%{active: 3, workers: 3, supervisors: 0, specs: 3}
iex> Supervisor.which_children(Dummy.Supervisor)
[
{Dummy.Permanent, #PID<0.129.0>, :worker, [Dummy.Permanent]},
{Dummy.Temporary, #PID<0.128.0>, :worker, [Dummy.Temporary]},
{Dummy.Transient, #PID<0.145.0>, :worker, [Dummy.Transient]}
]
iex> pid("0.145.0") |> Process.exit(:shutdown)
true
iex> Supervisor.count_children(Dummy.Supervisor)
%{active: 2, workers: 3, supervisors: 0, specs: 3}
iex> Supervisor.which_children(Dummy.Supervisor)
[
{Dummy.Permanent, #PID<0.129.0>, :worker, [Dummy.Permanent]},
{Dummy.Temporary, #PID<0.128.0>, :worker, [Dummy.Temporary]},
{Dummy.Transient, :undefined, :worker, [Dummy.Transient]}
]
iex> Supervisor.restart_child(Dummy.Supervisor, Dummy.Transient)
{:ok, #PID<0.146.0>}
iex> Supervisor.count_children(Dummy.Supervisor)
%{active: 3, workers: 3, supervisors: 0, specs: 3}
iex> Supervisor.which_children(Dummy.Supervisor)
[
{Dummy.Permanent, #PID<0.129.0>, :worker, [Dummy.Permanent]},
{Dummy.Temporary, #PID<0.128.0>, :worker, [Dummy.Temporary]},
{Dummy.Transient, #PID<0.146.0>, :worker, [Dummy.Transient]}
]
iex> pid("0.128.0") |> Process.exit(:shutdown)
true
iex> Supervisor.which_children(Dummy.Supervisor)
[
{Dummy.Permanent, #PID<0.129.0>, :worker, [Dummy.Permanent]},
{Dummy.Transient, #PID<0.146.0>, :worker, [Dummy.Transient]}
]
iex> Supervisor.restart_child(Dummy.Supervisor, Dummy.Temporary)
{:error, :not_found}
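Process.exit(pid, :normal) never stops a process that isn't trapping exits. To watch a :transient child actually exit normally, you can use GenServer.stop/2 instead. It makes the server itself terminate with the given reason and returns once it has terminated.

The sketch below uses a hypothetical Sketch.Transient module. It stops the child with :normal and waits for the supervisor to mark its pid :undefined. Then it restarts the child by id with Supervisor.restart_child/2.

```elixir
defmodule Sketch.Transient do
  use GenServer, restart: :transient

  def start_link(_arg), do: GenServer.start_link(__MODULE__, :ok)

  @impl true
  def init(:ok), do: {:ok, :initial_state}
end

{:ok, sup} = Supervisor.start_link([Sketch.Transient], strategy: :one_for_one)
[{Sketch.Transient, pid, :worker, _modules}] = Supervisor.which_children(sup)

# GenServer.stop/2 makes the server exit with the given reason and
# returns :ok once it has terminated, unlike Process.exit(pid, :normal).
:ok = GenServer.stop(pid, :normal)

# Poll until the supervisor has recorded the normal exit: the spec is kept,
# but the pid becomes :undefined because nothing was restarted.
stopped_children =
  Stream.repeatedly(fn ->
    Process.sleep(10)
    Supervisor.which_children(sup)
  end)
  |> Stream.take(100)
  |> Enum.find(&match?([{Sketch.Transient, :undefined, :worker, _}], &1))

IO.inspect(stopped_children, label: "after normal stop")

# The spec still exists, so a manual restart by child id works.
{:ok, new_pid} = Supervisor.restart_child(sup, Sketch.Transient)
IO.inspect(new_pid, label: "restarted")
```
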
Last but not least, the Permanent process. Whatever reason it exits with, even :shutdown as in the session below, the supervisor always brings it back with a new pid.
iex> Supervisor.count_children(Dummy.Supervisor)
%{active: 3, workers: 3, supervisors: 0, specs: 3}
iex> Supervisor.which_children(Dummy.Supervisor)
[
{Dummy.Permanent, #PID<0.129.0>, :worker, [Dummy.Permanent]},
{Dummy.Temporary, #PID<0.128.0>, :worker, [Dummy.Temporary]},
{Dummy.Transient, #PID<0.127.0>, :worker, [Dummy.Transient]}
]
iex> pid("0.129.0") |> Process.exit(:shutdown)
true
iex> Supervisor.which_children(Dummy.Supervisor)
[
{Dummy.Permanent, #PID<0.145.0>, :worker, [Dummy.Permanent]},
{Dummy.Temporary, #PID<0.128.0>, :worker, [Dummy.Temporary]},
{Dummy.Transient, #PID<0.127.0>, :worker, [Dummy.Transient]}
]
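The same check for a :permanent child can be written as a standalone sketch. Sketch.Permanent is a hypothetical module that relies on the default restart: :permanent generated by `use GenServer`. The script records the child's pid and sends it :shutdown. It then polls which_children/1 until a different live pid appears.

```elixir
defmodule Sketch.Permanent do
  # Without options, `use GenServer` generates a spec with restart: :permanent.
  use GenServer

  def start_link(_arg), do: GenServer.start_link(__MODULE__, :ok)

  @impl true
  def init(:ok), do: {:ok, :initial_state}
end

{:ok, sup} = Supervisor.start_link([Sketch.Permanent], strategy: :one_for_one)

# Looks up the current pid of the single child.
current_pid = fn ->
  [{Sketch.Permanent, pid, :worker, _modules}] = Supervisor.which_children(sup)
  pid
end

old_pid = current_pid.()
Process.exit(old_pid, :shutdown)

# Poll until the supervisor reports a live pid different from the old one.
new_pid =
  Stream.repeatedly(fn ->
    Process.sleep(10)
    current_pid.()
  end)
  |> Stream.take(100)
  |> Enum.find(&(is_pid(&1) and &1 != old_pid))

IO.inspect({old_pid, new_pid}, label: "old and new pid")
```
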
Conclusion
In conclusion, process restart strategies in Elixir are indispensable tools for crafting software systems that provide uninterrupted service. By choosing :permanent, :temporary, or :transient for each child, we decide which exits the supervisor repairs automatically and which ones it leaves alone. These strategies empower us to build resilient systems that recover gracefully from failures, ensuring a smoother and more reliable experience for end users. As you continue to explore the world of Elixir, keep these restart strategies in your toolkit.