My friend Nathan pinged me today asking me for help optimizing the lexer for some weird language he's building. Well, actually, our conversation started,
Nathan Ringo
Sigh, rly wish CL had &mut references other than using macros instead of functions
phoebe Goldman
i do not wish that
but they’re easy to emulate using clojures or conses as something like Cell
Nathan Ringo
I mean I don't wish they were as pervasive as in rust
But in eg my parser I've got a bunch of state I'd like to be stack allocated, but shared between things
I brainstormed some ways he could accomplish that (the options I proposed were either putting the state in a struct and dynamic-extent
allocating an instance, or putting it in a let
and writing all the functions that operate on it in an inner labels
). But eventually he let slip that he actually wanted to get his lexer to run at 1MiB/s or faster, and stack-allocating its state was one of his thoughts of how he might do it.
Now, not to sound too full of myself, but I consider myself an expert in optimizing Common Lisp code. The first thing I told him to do was to compile with (optimize (speed 3) (safety 1) (debug 0))
. Oddly, the effects were very platform-dependent; on an Intel i7, that saw a speedup of a few percent, but on an Arm A72, it was a few percent slower, because somehow it garbage collected more. Keep in mind that we weren't particularly collecting rigorous data; we were looking for low-hanging fruit, not micro-optimizations.
I asked him to send me the source, and it turned out that all of his data was in CLOS classes. (More specifically, he'd worked out some whacky arcana with the metaobject protocol to make CLOS classes with read-only slots. I've always said "just don't mutate them," but hey, to each their own, I guess.) Now I love CLOS. But it's slow as hell. And it gets way slower as soon as you start messing with the MOP. The strategy I advocate for is, develop your prototype using CLOS, and then, if you need it to be faster, convert it to using defstruct
. So that's what I did.
Conveniently, defstruct
already supports defining structures with read-only slots, so we didn't even have to resort to my laissez-faire "just don't mutate it" attitude. I grepped for defclass
in his project, replaced every instance with a similar defstruct
, and then replaced all the make-instance
calls with structure constructors for good measure. He wasn't using multiple inheritance (or any inheritance at all, actually), so the changes were trivial, and purely syntactic. No changes to logic required.
So I ran his benchmark on my m1 MacBook Air, and the speedup was absurd. For the old version, SBCL's time
reported:
Evaluation took:
3.155 seconds of real time
3.112716 seconds of total run time (2.781656 user, 0.331060 system)
[ Run times consist of 0.358 seconds GC time, and 2.755 seconds non-GC time. ]
98.67% CPU
66 lambdas converted
1,614,459,488 bytes consed
The new version, just replacing CLOS classes with defstruct
structs, showed:
Evaluation took:
0.680 seconds of real time
0.664909 seconds of total run time (0.543139 user, 0.121770 system)
[ Run times consist of 0.125 seconds GC time, and 0.540 seconds non-GC time. ]
97.79% CPU
15 lambdas converted
405,298,304 bytes consed
So obviously, 3.155s is like 4.5x as long as 0.680s. But also, the CLOS version is consing roughly 4x as much to produce the same number of objects (or at least, objects visible to his code). This doesn't particularly surprise me, but I think it may be unexpected to people less familiar with the internals of Common Lisp. The short version is that CLOS is the most powerful object system on the market, and implementing it efficiently is really hard. In particular, supporting dynamic (re-)definition of classes and methods means there's a lot of state, a lot of lookups, and a lot of caching.
I'd like to point out, just in case: the lesson here is not "don't use CLOS because it's slow." CLOS is a fantastic tool, especially for interactive development. The lesson is, write your code with CLOS, and if you're concerned about needing it to be fast, stick to the subset of CLOS that corresponds to defstruct
. Then, once you get to a place of actually caring about performance, try replacing your classes with structs. Start with the classes that you construct many instances of and access a lot; rarely-used classes won't matter.
Now go forth and impress your friends with your ability to make their code run several times faster in 10 minutes of editing!
Top comments (0)