Discussion on: Your unfinished pet projects are letdown or a gain?

View post

I developed a program to run through all possible chess moves. Of course this is an "impossible" problem, but I had a few ideas that might make it achievable in my lifetime. So I gave it a try.

I learned soooo many things about running high-throughput and long-running programs. I learned a lot about different concurrency models. I had to create my own custom binary serializer/deserializer to make the most compact representation of data (that I could come up with). I shoved trillions of rows into a table in Postgres. I ran the simulation for about a month and got the first ~ 1 million distinct checkmates.

Based on the size of the data, I believe I could store all possible (distinct) chess moves in about an exabyte of data. However, the compute time it would take to fill that data is less-straightforward to calculate.

Anyway, I learned these lessons specifically:

Agents and actors are better for "chunky" rather than "chatty" work.

Agent- or actor-based models (within the same machine) are slow because of the cost of copying messages.
Agent- or actor-based concurrency models may assume low throughput by default and crash when their mailboxes fill up.
Creating other actors to manage/limit queues drastically reduces performance even further.
The Disruptor concurrency model is better for high throughput scenarios. (I have a great application of it in mind for game programming.)

Postgres is awesome. I could kick myself for not using it sooner.

I did have to disable time-outs, because periodic auto-vacuum (or some other Postgres process) will hang up queries for a minute or more.

Linked-lists are not for hotly-used code paths.

Despite favorable big-o characteristics and usage semantics, linked lists are low-performing as a hot LRU cache.
Because linked list nodes are all over memory, you lose a lot of cpu time fetching-from/writing-to memory. Whereas array elements are all next to each other and are often fetched together.
I ended up having to use a less-accurate circular cache (FIFO rather than oldest out).
This might benefit from replacement with a Bloom filter, if I can figure out which hash functions to use.

I periodically pick up the program and play with it as I think of new ideas. And then it sits dormant for months. Started messing with it a couple years back.