For almost two years now, I've been part of a core team developing a platform for building AI assistants. The assistants built were always focused more on quality instead of quantity. That meant that our requests-per-second would be relatively low, while the assistant's functionalities would be complex and numerous.
Then came Covid-19.
The culture of our company is oriented towards bringing value, and we decided to do what we do best - create an assistant for Covid-19 and donate it to the government.
After around a week of intense work, we were finished with the first version of the assistant. But that assistant (as all we had developed) was intended to bring a lot of value to a small number of concurrent users. But this assistant needed to be like no other.
Our goal: throughput of 250 req/sec, with a latency of under 3 seconds.
Our platform is built from micro-services, and they are written in Python. We were always sticking to Python because, as this great talk suggests, it's mostly a bad idea to go around writing services in different languages. The guideline from the talk - "one programming language for every 4,000 engineers." We have less than 100 engineers!
Our challenge was that we needed to support accepting
250 HTTP req/s almost instantly, and we need to support creating
250 HTTP req/s almost instantly.
We tried developing it with Python but ran into too many problems. It would (kinda) work, but for some reason the async nature of the code made the logs not work properly. Sometimes the code also had problems accessing the cache. Things got weird and hard to debug.
Since we didn't have enough time, we decided to write those components in Go. That turned out to be a brilliant idea. The performance was on the spot, it didn't require too many resources, and it scaled without any problems.
Looking back, I'm pretty happy that we made that call, although it does put additional strain on the developers in terms of the required knowledge.
We made some great architectural designs in regards to consistency and simplicity. But, as it turns out, some of them were not so great in terms of performance. So, let's focus on just one of those designs.
We had this notion that the "assistant's brain" should have all the information about a user when making decisions about what to do next for him. The idea was that it would make the assistant very flexible and would support creating any crazy idea that the product could (but didn't) have.
Unfortunately, when you have a lot of data about a user, that puts a lot of strain on the database (MongoDB) and the messaging system (Kafka). And then, we had to do it 250 times per second. Some data just had to go!
Days of previous planning that allowed us such flexibility was overridden in a couple of hours of reckless, rushed coding. My heart was broken, but it had to be done.
Working on a project like this that is meant for the greater good unified us. It made us work as a hyper-productive team for weeks, with 12h+ workdays and no notion of weekends. We knew what our goal was, and we were charging forward.
A lot of changes were made driven by the needs of the project with disregard to any "self-imposed rules" from before. This injected a lot of creative new approaches that we hadn't thought of before. Some of them are one-time "cheats", but others serve as a great starting point for further improvements.
This experience has greatly improved my understanding of the world and our place in it. The solutions we once coined are elegant, but are not adequate anymore for the needs of our users.