My friend Nathan pinged me today asking me for help optimizing the lexer for some weird language he's building. Well, actually, our conversation started,
Sigh, rly wish CL had &mut references other than using macros instead of functions
i do not wish that
but they’re easy to emulate using clojures or conses as something like Cell
I mean I don't wish they were as pervasive as in rust
But in eg my parser I've got a bunch of state I'd like to be stack allocated, but shared between things
I brainstormed some ways he could accomplish that (the options I proposed were either putting the state in a struct and
dynamic-extent allocating an instance, or putting it in a
let and writing all the functions that operate on it in an inner
labels). But eventually he let slip that he actually wanted to get his lexer to run at 1MiB/s or faster, and stack-allocating its state was one of his thoughts of how he might do it.
Now, not to sound too full of myself, but I consider myself an expert in optimizing Common Lisp code. The first thing I told him to do was to compile with
(optimize (speed 3) (safety 1) (debug 0)). Oddly, the effects were very platform-dependent; on an Intel i7, that saw a speedup of a few percent, but on an Arm A72, it was a few percent slower, because somehow it garbage collected more. Keep in mind that we weren't particularly collecting rigorous data; we were looking for low-hanging fruit, not micro-optimizations.
I asked him to send me the source, and it turned out that all of his data was in CLOS classes. (More specifically, he'd worked out some whacky arcana with the metaobject protocol to make CLOS classes with read-only slots. I've always said "just don't mutate them," but hey, to each their own, I guess.) Now I love CLOS. But it's slow as hell. And it gets way slower as soon as you start messing with the MOP. The strategy I advocate for is, develop your prototype using CLOS, and then, if you need it to be faster, convert it to using
defstruct. So that's what I did.
defstruct already supports defining structures with read-only slots, so we didn't even have to resort to my laissez-faire "just don't mutate it" attitude. I grepped for
defclass in his project, replaced every instance with a similar
defstruct, and then replaced all the
make-instance calls with structure constructors for good measure. He wasn't using multiple inheritance (or any inheritance at all, actually), so the changes were trivial, and purely syntactic. No changes to logic required.
So I ran his benchmark on my m1 MacBook Air, and the speedup was absurd. For the old version, SBCL's
Evaluation took: 3.155 seconds of real time 3.112716 seconds of total run time (2.781656 user, 0.331060 system) [ Run times consist of 0.358 seconds GC time, and 2.755 seconds non-GC time. ] 98.67% CPU 66 lambdas converted 1,614,459,488 bytes consed
The new version, just replacing CLOS classes with
defstruct structs, showed:
Evaluation took: 0.680 seconds of real time 0.664909 seconds of total run time (0.543139 user, 0.121770 system) [ Run times consist of 0.125 seconds GC time, and 0.540 seconds non-GC time. ] 97.79% CPU 15 lambdas converted 405,298,304 bytes consed
So obviously, 3.155s is like 4.5x as long as 0.680s. But also, the CLOS version is consing roughly 4x as much to produce the same number of objects (or at least, objects visible to his code). This doesn't particularly surprise me, but I think it may be unexpected to people less familiar with the internals of Common Lisp. The short version is that CLOS is the most powerful object system on the market, and implementing it efficiently is really hard. In particular, supporting dynamic (re-)definition of classes and methods means there's a lot of state, a lot of lookups, and a lot of caching.
I'd like to point out, just in case: the lesson here is not "don't use CLOS because it's slow." CLOS is a fantastic tool, especially for interactive development. The lesson is, write your code with CLOS, and if you're concerned about needing it to be fast, stick to the subset of CLOS that corresponds to
defstruct. Then, once you get to a place of actually caring about performance, try replacing your classes with structs. Start with the classes that you construct many instances of and access a lot; rarely-used classes won't matter.
Now go forth and impress your friends with your ability to make their code run several times faster in 10 minutes of editing!