I reinvented the wheel at work and it didn’t go well

#distributedsystemdes

Dave the Dev was so excited about his new task: building a new technology at work. “It will be the coolest and shiniest thing I have ever created in my career,” he thought. It was hard, but Dave managed to pull through and deliver. So what went wrong? Well, the tech didn’t work as expected, and Dave realized later that a correct solution had already been built. Oops.

You can forget about Dave now. That’s exactly what happened to me when I decided to “reinvent” a Distributed Lock.

Let’s see what this task is about. There are 2 requirements:

Mutual Exclusion: at any time, at most 1 server can hold the lock
Frequency: there must be approximately T seconds between 2 consecutive lock acquisitions

The lock is used to give access to a block of code that updates our database.

Coincidentally, when I received the task, I had just finished reading “Redis in Action” by Dr. Josiah L Carlson. The book has a long section on how to implement distributed locking. I felt like a teenager who had just bought his first motorbike and could think of nothing other than getting on the road.

By the way, the book is very good. I recommend it to fellow engineers.

Needless to say, I was determined to make it work. After reviewing what I knew and doing some quick research, I chose Redis: it fit the solution I had learned from the book, and it was already available at my company.

It made sense.

Here is my plan, simplified.

I would set up Redis to store the lock and write all the necessary logic to ensure that only one server could acquire the lock. This satisfies Requirement 1. Readers can imagine the lock being stored as lock_key=a_unique_value.
I would also set the expire time of lock_key to exactly T seconds. This satisfies Requirement 2. It was actually not enough, but that is what I thought at the time.
Additionally, some minor logic would need to be implemented to handle edge cases.
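
To make the plan concrete, here is a minimal sketch of it in Python. It is not the code we shipped. FakeRedis is a hypothetical in-memory stand-in that imitates only one Redis behaviour, SET with the NX and EX options (set the key only if it is absent, and let it expire after a number of seconds). The names try_acquire and set_nx_ex and the hand-driven clock are my own assumptions for illustration.

```python
import uuid


class FakeRedis:
    """Hypothetical in-memory stand-in for Redis `SET key value NX EX ttl`."""

    def __init__(self, clock):
        self._clock = clock   # callable returning the current time in seconds
        self._data = {}       # key -> (value, expires_at)

    def set_nx_ex(self, key, value, ttl_seconds):
        now = self._clock()
        entry = self._data.get(key)
        if entry is not None and entry[1] > now:
            return False      # key has not expired yet: the lock is taken
        self._data[key] = (value, now + ttl_seconds)
        return True


def try_acquire(store, lock_key, ttl_seconds):
    """Return a unique token if this server got the lock, else None."""
    token = str(uuid.uuid4())  # plays the role of a_unique_value
    if store.set_nx_ex(lock_key, token, ttl_seconds):
        return token
    return None


# Simulated run with a clock we move by hand, and T = 10 seconds.
now = [0.0]
store = FakeRedis(clock=lambda: now[0])
T = 10

first = try_acquire(store, "lock_key", T)    # server A at t=0: accepted
second = try_acquire(store, "lock_key", T)   # server B at t=0: rejected
now[0] = 9.9
third = try_acquire(store, "lock_key", T)    # still inside the TTL: rejected
now[0] = 10.0
fourth = try_acquire(store, "lock_key", T)   # TTL has elapsed: accepted
```

In this sketch the expire time only puts a lower bound of T seconds between two acquisitions. When the next one actually happens depends on when some server next calls try_acquire after the key has expired.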

It seemed like a great plan, so the team and I spent about 2 weeks building it into our existing backend. The due date came and I was eager to see the results. Little did I know what lay ahead…

Of course, the result was not what I had hoped for. Yes, only one server was able to acquire the lock, but the frequency was off, very off. My design could only satisfy Requirement 2 in my head, not in real life. It took a total of 3 long weeks to realize that the chosen approach was not delivering the desired outcome.

We thought of rolling back, fixing it, and then re-releasing. But the change was already out there, and rolling it back was not an option since other teams had already released changes based on ours. Even if we could have rolled back, we could not figure out a fix that wouldn’t lead to other issues. It felt like our design was fundamentally wrong.

The frustration was real, and I felt like I had fallen into a deep hole, with no clear way out.

Just when I thought things couldn’t get any worse, it turned out that there was an existing solution that could have saved us all this trouble: ETCD. Had I explored this option earlier, we could have had a correct solution in just 1 week instead of the frustrating weeks we endured. The better way had been there all this time. Always.

ETCD works in this case because it has a built-in primitive called “compare and swap” that can be used to implement a distributed lock. More details are in the document “Using etcd for distributed coordination”. I plan to write about how to use it soon, and I hope you will like it!
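
Until that follow-up exists, here is a rough sketch of the compare-and-swap idea itself. This is not ETCD's API and not necessarily how the final solution would look. FakeKV is a hypothetical in-memory store, and the scheme of storing the time of the last acquisition under a last_run key is my own assumption about one way the primitive could cover both requirements.

```python
import threading


class FakeKV:
    """Hypothetical in-memory store with an atomic compare-and-swap."""

    def __init__(self):
        self._data = {}
        self._mutex = threading.Lock()  # makes compare + write one atomic step

    def get(self, key):
        with self._mutex:
            return self._data.get(key)

    def compare_and_swap(self, key, expected, new):
        """Write `new` only if the current value still equals `expected`."""
        with self._mutex:
            if self._data.get(key) != expected:
                return False
            self._data[key] = new
            return True


def try_acquire(kv, key, now, period):
    """Claim the next run if at least `period` seconds have passed."""
    last = kv.get(key)                  # None means nobody has run yet
    if last is not None and now - last < period:
        return False                    # too early for the next acquisition
    # Of all callers that read the same `last`, only one can win this swap.
    return kv.compare_and_swap(key, last, now)


kv = FakeKV()
T = 10
results = [
    try_acquire(kv, "last_run", now=0, period=T),   # first server wins
    try_acquire(kv, "last_run", now=0, period=T),   # same instant: loses
    try_acquire(kv, "last_run", now=5, period=T),   # too early
    try_acquire(kv, "last_run", now=10, period=T),  # T seconds later: wins
]
print(results)  # [True, False, False, True]

# Two servers that both read "no value yet" and then race on the swap.
race = FakeKV()
winner = race.compare_and_swap("last_run", None, 0)
loser = race.compare_and_swap("last_run", None, 0)
```

The point of the primitive is that the check and the write happen as one step, so two servers that saw the same old value cannot both succeed. The sketch passes the current time in by hand, so it says nothing about how the servers would agree on a clock.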

The project didn’t succeed. I also didn’t succeed technically. But I did grow up a little bit more in my career. Because this whole experience taught me a valuable lesson:

If you are about to reinvent the wheel, think again.

There are often existing solutions and technologies within reach that can save time, effort, and unnecessary complications. It’s important to take a step back, evaluate all available options, ask colleagues, and consider whether there’s already a suitable tool or system in place that can effectively address the problem at hand.

If the steps above don’t work, give yourself a second chance. Forget about the task for a while with desperate measures like procrastinating, taking a walk, having a drink, hitting the gym, or talking to your dog, or cat…

This lesson has now become a guiding principle for me in all future endeavors, and I hope that sharing this experience can help readers avoid similar pitfalls.
