lazy_static is one of the foundational crates in the Rust ecosystem.
It lets us use static variables without an explicit initialization call.
I used it many times without giving its performance implications much thought. Putting it inside a deeply nested loop got me worried about whether all that lazy-static magic has some hidden cost.
The crate's docs explain the mechanics behind the lazy_static! macro as:
The Deref implementation uses a hidden static variable that is guarded by an atomic check on each access.
That sounds innocuous enough, but I still have questions:
- Is there any noticeable performance cost incurred by the atomic check on each access?
- If lazy_static is used in a sub-module, will it be re-initialized on every call to a function from that module?
- Is it any slower than initializing a variable manually and passing it to other functions as a parameter?
Without understanding the implementation details of lazy_static, I figured it would be easier to benchmark it than to dig through its source code.
How to run
- grab project's source:
git clone https://github.com/rimutaka/empirical.git
- benchmarks:
cargo +nightly bench
- tests:
cargo +nightly test --benches
Results
$ cargo +nightly bench
test bad_rust_local ... bench: 40,608 ns/iter (+/- 9,239)
test lazy_static_backref ... bench: 27 ns/iter (+/- 1)
test lazy_static_external_mod ... bench: 27 ns/iter (+/- 0)
test lazy_static_inner ... bench: 27 ns/iter (+/- 1)
test lazy_static_local ... bench: 27 ns/iter (+/- 5)
test lazy_static_reinit ... bench: 26 ns/iter (+/- 1)
test once_cell_lazy ... bench: 26 ns/iter (+/- 2)
test vanilla_rust_local ... bench: 27 ns/iter (+/- 0)
The results looked pretty neat. The only outlier was a piece of bad code I put in the benches intentionally to set the baseline.
TL;DR: lazy_static! is fine, but once_cell may be better for new projects.
Benchmarks in detail
bad_rust_local()
This bench does something obviously stupid - it recompiles the regex within the loop.
b.iter(|| {
let compiled_regex = regex::Regex::new(LONG_REGEX).unwrap(); // <-- don't place this inside a loop
let is_match = compiled_regex.is_match(TEST_EMAIL);
test::black_box(is_match);
});
With 40,608 ns/iter it gives us the baseline for the recompilation cost.
vanilla_rust_local()
There was NO noticeable performance cost incurred by the atomic check on each access. The vanilla_rust_local bench compiled the regex once and took exactly the same 27 ns/iter as the benches using lazy_static.
let compiled_regex = regex::Regex::new(LONG_REGEX).unwrap(); // <-- compiled once only
b.iter(|| {
let is_match = compiled_regex.is_match(TEST_EMAIL);
test::black_box(is_match);
});
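The difference between the two benches above comes down to where the expensive value is built. Here is a minimal sketch of the same comparison that runs on the stable channel with std only. The build_lookup function is a hypothetical stand-in for regex::Regex::new, and the Instant-based timing is only a rough indication, not a replacement for the nightly bencher.

```rust
use std::collections::HashSet;
use std::hint::black_box;
use std::time::Instant;

// Hypothetical stand-in for an expensive constructor such as regex::Regex::new.
fn build_lookup() -> HashSet<u32> {
    (0..10_000).collect()
}

// Mirrors bad_rust_local: the expensive value is rebuilt on every iteration.
fn rebuild_every_iteration(iters: u32) -> u32 {
    let mut hits = 0;
    for i in 0..iters {
        let lookup = build_lookup();
        if lookup.contains(&black_box(i)) {
            hits += 1;
        }
    }
    hits
}

// Mirrors vanilla_rust_local: the expensive value is built once, outside the loop.
fn build_once(iters: u32) -> u32 {
    let lookup = build_lookup();
    let mut hits = 0;
    for i in 0..iters {
        if lookup.contains(&black_box(i)) {
            hits += 1;
        }
    }
    hits
}

fn main() {
    let iters = 200;

    let started = Instant::now();
    let slow_hits = rebuild_every_iteration(iters);
    let slow = started.elapsed();

    let started = Instant::now();
    let fast_hits = build_once(iters);
    let fast = started.elapsed();

    // Both variants compute the same answer; only the cost differs.
    assert_eq!(slow_hits, fast_hits);
    println!("rebuilt in loop: {slow:?}, built once: {fast:?}");
}
```

Both functions return the same result, so any timing gap is attributable to the repeated construction alone.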
lazy_static_external_mod(), lazy_static_inner(), lazy_static_local(), lazy_static_backref()
These benches relied on lazy_static, with the only difference being where it was declared:
- lazy_static_local: at the root level
- lazy_static_inner: at a sub-module level (same file)
- lazy_static_external_mod: at a module placed in a separate file
- lazy_static_backref: at the root level, used in a sub-module
The lazy_static declarations were identical in all cases:
lazy_static! {
pub(crate) static ref COMPILED_REGEX: regex::Regex = regex::Regex::new(LONG_REGEX).unwrap();
}
The placement of the lazy_static! { ... } declaration made no difference:
- The static variable was initialized once only
- All these benches took ~27 ns/iter each.
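One way to convince yourself that module placement does not trigger re-initialization is to count how often the initializer runs. The sketch below is not the benchmark code: it swaps lazy_static for std::sync::OnceLock so that it runs without any dependencies, and the module names, EXPENSIVE and INIT_COUNT are all made up for illustration.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

// Counts how many times the initializer closure actually runs.
static INIT_COUNT: AtomicUsize = AtomicUsize::new(0);

mod inner {
    use super::*;

    // Hypothetical stand-in for the compiled regex, declared inside a sub-module.
    static EXPENSIVE: OnceLock<Vec<u64>> = OnceLock::new();

    pub fn expensive() -> &'static Vec<u64> {
        EXPENSIVE.get_or_init(|| {
            INIT_COUNT.fetch_add(1, Ordering::SeqCst);
            (0..1_000).collect()
        })
    }
}

mod other {
    // Uses the static that lives in a sibling module.
    pub fn sum() -> u64 {
        super::inner::expensive().iter().sum()
    }
}

fn main() {
    // Access the value repeatedly, both via another module and directly.
    for _ in 0..3 {
        assert_eq!(other::sum(), 499_500);
    }
    assert_eq!(inner::expensive().len(), 1_000);

    println!(
        "initializer ran {} time(s)",
        INIT_COUNT.load(Ordering::SeqCst)
    );
}
```

The counter stays at one no matter which module the access comes from, which matches what the benches suggest for lazy_static.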
lazy_static_reinit()
It is possible to initialize the static variable before or after its first use by calling
lazy_static::initialize(&STATIC_VAR_NAME);
There was no additional performance cost for calling initialize for the first time or any number of times after that.
This is in line with the documentation that states:
Takes a shared reference to a lazy static and initializes it if it has not been already.
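The "initializes it if it has not been already" behaviour can be sketched with std alone. This is an illustration of the idea, not the lazy_static implementation: the initialize function below is a hypothetical counterpart of lazy_static::initialize built on std::sync::OnceLock, and CONFIG, BUILDS and build are invented names.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

static CONFIG: OnceLock<String> = OnceLock::new();

// Counts how many times the value is actually built.
static BUILDS: AtomicUsize = AtomicUsize::new(0);

// Hypothetical expensive constructor.
fn build() -> String {
    BUILDS.fetch_add(1, Ordering::SeqCst);
    "ready".to_string()
}

// Hypothetical counterpart of lazy_static::initialize:
// forces the value into existence and discards the reference.
fn initialize() {
    let _ = CONFIG.get_or_init(build);
}

// Regular access path: builds the value only if nobody has done so yet.
fn config() -> &'static str {
    CONFIG.get_or_init(build).as_str()
}

fn main() {
    println!("initialized before: {}", CONFIG.get().is_some());

    initialize(); // eager initialization before the first use
    println!("initialized after: {}", CONFIG.get().is_some());

    let value = config(); // first real use, no rebuild
    initialize(); // calling it again after use is a no-op

    println!(
        "value = {value}, builds = {}",
        BUILDS.load(Ordering::SeqCst)
    );
}
```

Whether initialize runs before the first use, after it, or many times over, the constructor is invoked only once.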
once_cell
once_cell is just as elegant as lazy_static! and is about 20% faster in a 32-thread test. It performed on par with lazy_static! in my basic tests.
The static declaration is a single line of code:
static COMPILED_REGEX_ONCE_CELL: once_cell::sync::Lazy<regex::Regex> =
once_cell::sync::Lazy::new(|| regex::Regex::new(LONG_REGEX).unwrap());
and the usage of the static variable is exactly the same as with lazy_static!:
b.iter(|| {
let is_match = COMPILED_REGEX_ONCE_CELL.is_match(TEST_EMAIL);
test::black_box(is_match);
});
There is an RFC to merge once_cell into std::lazy, making it part of the standard library. It may be a more future-proof choice if you are starting a new project.
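For readers on a recent toolchain: the standard library now ships std::sync::LazyLock (stable since Rust 1.80), which closely mirrors once_cell::sync::Lazy. The sketch below assumes Rust 1.80 or later and uses a made-up ALLOWED_DOMAINS list as a stand-in for the compiled regex, so it is not part of the benchmarks in this post.

```rust
use std::sync::LazyLock;

// Hypothetical stand-in for the compiled regex: built on first access only.
static ALLOWED_DOMAINS: LazyLock<Vec<&'static str>> =
    LazyLock::new(|| vec!["example.com", "example.org"]);

// Illustrative helper: checks the part after the last '@' against the list.
fn is_allowed(email: &str) -> bool {
    match email.rsplit_once('@') {
        Some((_, domain)) => ALLOWED_DOMAINS.iter().any(|d| *d == domain),
        None => false,
    }
}

fn main() {
    // The static is used like a plain value, same as with once_cell or lazy_static!.
    println!("name@example.com: {}", is_allowed("name@example.com"));
    println!("Hello world!: {}", is_allowed("Hello world!"));
}
```

The declaration has the same shape as the once_cell one above, so migrating between the two is mostly a matter of changing the type path.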
lazy_static alternatives that DO NOT work
Declaring a static variable
pub(crate) static STATIC_REGEX: regex::Regex = regex::Regex::new(LONG_REGEX).unwrap();
ERROR: calls in statics are limited to constant functions, tuple structs and tuple variants rustc E0015
Declaring a const function
const fn static_regex() -> regex::Regex {
regex::Regex::new(LONG_REGEX).unwrap()
}
ERROR: calls in constant functions are limited to constant functions, tuple structs and tuple variants rustc E0015
lazy_static in depth
This section goes deep into the source code of lazy_static to really understand how it works.
The demo code in this project is split into several parts:
- benches: the source for the benchmarks in this post
- examples/expansion_base.rs: a minimal implementation to get expanded code from the lazy_static! macro
- examples/expanded.rs: the expanded code generated by the lazy_static! macro from expansion_base.rs
- src/main.rs: a self-contained implementation based on the expanded code
Your IDE will be unhappy with some parts of the code if you are on the stable channel. Switch to/from nightly with these commands to get rid of the IDE warnings:
rustup default nightly
rustup default stable
Macro's code
Running cargo expand --example expansion_base outputs the code generated for the crate. You can find the full output in the examples/expanded.rs file.
In short, it looks something like this:
#![feature(prelude_import)]
#[prelude_import]
use std::prelude::rust_2021::*;
#[macro_use]
extern crate std;
extern crate lazy_static;
use lazy_static::lazy_static;
#[allow(missing_copy_implementations)]
#[allow(non_camel_case_types)]
#[allow(dead_code)]
struct COMPILED_REGEX {
__private_field: (),
}
#[doc(hidden)]
static COMPILED_REGEX: COMPILED_REGEX = COMPILED_REGEX {
__private_field: (),
};
impl ::lazy_static::__Deref for COMPILED_REGEX {
type Target = regex::Regex;
fn deref(&self) -> &regex::Regex {
#[inline(always)]
fn __static_ref_initialize() -> regex::Regex {
regex::Regex::new(".*").unwrap()
}
#[inline(always)]
fn __stability() -> &'static regex::Regex {
static LAZY: ::lazy_static::lazy::Lazy<regex::Regex> = ::lazy_static::lazy::Lazy::INIT;
LAZY.get(__static_ref_initialize)
}
__stability()
}
}
impl ::lazy_static::LazyStatic for COMPILED_REGEX {
fn initialize(lazy: &Self) {
let _ = &**lazy;
}
}
fn main() {
let _x = COMPILED_REGEX.is_match("abc");
}
The snippet above depends on some functions provided by lazy_static, and that's where most of the magic happens.
I distilled it to a simpler version that does not have lazy_static as a dependency at all. Skim through it and go to the explanations that follow:
use std::cell::Cell;
use std::ops::Deref;
use std::sync::Once;
struct Lazy<T: Sync>(Cell<Option<T>>, Once);
unsafe impl<T: Sync> Sync for Lazy<T> {}
struct CompiledRegex {
__private_field: (),
}
static COMPILED_REGEX: CompiledRegex = CompiledRegex {
__private_field: (),
};
impl Deref for CompiledRegex {
type Target = regex::Regex;
fn deref(&self) -> &regex::Regex {
static LAZY: Lazy<regex::Regex> = Lazy(Cell::new(None), Once::new());
LAZY.1.call_once(|| {
LAZY.0.set(Some(regex::Regex::new(LONG_REGEX).unwrap()));
});
unsafe {
match *LAZY.0.as_ptr() {
Some(ref x) => x,
None => {
panic!("attempted to dereference an uninitialized lazy static. This is a bug");
}
}
}
}
}
There are a few key features in the last snippet to pay attention to:
- The generic struct Lazy<T> is what actually holds the compiled regex. struct CompiledRegex is an empty handle, instantiated as static COMPILED_REGEX, that gives access to it.
- That long chain is unravelled inside impl Deref for CompiledRegex to give us the compiled regex as a static variable
- std::sync::Once::call_once() is used to initialize the regex once only
- match *LAZY.0.as_ptr() gets us the initialized regex from deep inside the chain of structs
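The once-only guarantee of std::sync::Once::call_once() also holds when several threads race to trigger the initialization. The following sketch checks that with a counter. It is a simplified illustration, not the code from src/main.rs, and INIT, RUNS, touch and init_runs_with_threads are hypothetical names.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Once;
use std::thread;

static INIT: Once = Once::new();

// Counts how many times the closure passed to call_once is executed.
static RUNS: AtomicUsize = AtomicUsize::new(0);

// Stand-in for dereferencing the lazy static: every access goes through call_once.
fn touch() {
    INIT.call_once(|| {
        RUNS.fetch_add(1, Ordering::SeqCst);
    });
}

// Spawns `n` threads that all try to initialize, then reports the run count.
fn init_runs_with_threads(n: usize) -> usize {
    let handles: Vec<_> = (0..n).map(|_| thread::spawn(touch)).collect();
    for handle in handles {
        handle.join().unwrap();
    }
    RUNS.load(Ordering::SeqCst)
}

fn main() {
    let runs = init_runs_with_threads(8);
    println!(
        "closure ran {runs} time(s), completed: {}",
        INIT.is_completed()
    );
}
```

Threads that lose the race block until the winning closure finishes, so by the time call_once returns the value is ready to read.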
If the above code is still a bit confusing, look up src/main.rs for the full version with detailed comments.
Running the program with cargo run will execute this demo code from src/main.rs using the snippet above for the lazy-static part:
fn main() {
println!("Program started");
println!("{TEST_EMAIL} is valid: {}", COMPILED_REGEX.is_match(TEST_EMAIL));
println!("{TEST_NOT_EMAIL} is valid: {}", COMPILED_REGEX.is_match(TEST_NOT_EMAIL));
}
and produce this output:
Program started
Derefencing CompiledRegex
CompiledRegex initialized
name@example.com is valid: true
Derefencing CompiledRegex
Hello world! is valid: false
As you can see, COMPILED_REGEX is initialized once only and is dereferenced every time the COMPILED_REGEX variable is used.
Q.E.D.? :)
Top comments (2)
Looks nice.
However, don't use lazy_static today. Use once_cell, as it's going to be integrated into std. Also it seems like it's faster? It also has a wider API, for example, a non-thread-safe lazy.
In addition to that, the perf effects of lazy values are really hard to measure. Will it be performant? Probably yes, until the branch prediction will be unable to predict it for too many branches. Will parameters be faster? Maybe, but they may cause pressure on the register allocator which is very hard to benchmark, and tend to only show in real-world programs.
Thanks Chayim! I updated the post with your comment.