Nočnica Mellifera for Heroku

Posted on Apr 16, 2020

The over-engineering trap

#culture #strategy

how you can fail by making products too good

I’d like to talk about two opposing ideas that almost every software engineer believes. The first is that you should write code well the first time, since it’s much more expensive to re-engineer code later. The second is that you don’t need to solve problems you don’t yet have; e.g. there’s no need to build spam filters into a messaging app with two users. Ignoring this second belief leads to a pitfall called ‘over-engineering,’ an insidious problem that’s harder to define than its opposite, ‘under-engineering.’

In this essay I’ll discuss the dangers of over-engineering and how it can be a huge drain on your project. My brief example at the top covered the best definition I know: designing a system that is more robust than it needs to be, or solves problems you don’t have.

It’s hard to love over-engineered ‘smart’ home devices like this brush that can remind me my hair is a mess. Never in my life have I picked up a hairbrush and thought ‘I wish this came with an app that told me I have dandruff’.

On the other hand, I absolutely love this over-engineered power bank from genius Kennedy Liu. I can imagine a use for every knob, dial, and port.

What causes over-engineering?

Not invented here

A huge factor for many organizations is an unwillingness to use a product that wasn’t created in-house. This was famously an issue at Sony, where leadership refused to leverage the good ideas of their competitors, and instead came up with their own over-complex solutions to solved problems.

Traumatic Memories

I worked at a software company where one weekend the CTO had to put $2,000 in an envelope and send it to Ukraine to get our database back after a hacker exploited SQL injection on our site. Needless to say, code review after that got… harsh. This one (albeit serious) failure caused almost total paralysis, with every new release containing dozens of security measures.

Of course a breach should inspire better security, but unsurprisingly we over-rotated toward total security. Quarterly reviews for the next year contained no new features and few bug fixes, just a long list of security improvements.

Solving problems you don’t have is a lot easier than solving the ones you do

An easy signup process for new users is a tricky thing to engineer. Getting authentication, user experience, and onboarding right are all hard problems! When faced with a userbase that’s failing to grow, it’s often easier to add new features, especially ones your current users are asking for. All well and good, but if every board meeting is about stagnant user growth, new features for existing users are not the problem you need to solve.

Overconfidence

Experienced engineers often build every new microservice so that it can handle 10,000 simultaneous sessions before it has even one user. This is overconfidence: specifically, overconfidence in the desirability of the product you’re selling. When you do this it’s hard to see what’s wrong with it. After all, why not build the best tool for the job the first time? But you don’t know yet if anyone wants this tool, and you’re making it much more expensive to find out the answer. If you spend less time on each release, you can ship more releases, and you have a better chance of making one that will end up succeeding.

Overconfidence pt. 2: I know what the users want

I came into tech from tech support so this one is a biggie. For years I took calls from users, 90% of whom wanted a single feature. While engineers planned epics and assigned points to features, somehow the few features that our customers needed got pushed back and pushed back. Confusing UI that generated at least one user call every single day was repeatedly marked as trivial and never fixed.

How do we avoid it?

Identifying the sources is an excellent start, and some simple steps toward prioritizing bugs correctly may help (I think measuring the number of support cases related to each bug is a great place to begin). But there are also some technical steps to mitigate over-engineering.
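As a rough illustration of counting support cases per bug, here is a minimal Python sketch. The ticket data, bug names, and the `rank_bugs_by_support_load` helper are all hypothetical; a real team would pull this from whatever support tool it already uses.

```python
from collections import Counter

# Hypothetical support log: each entry is (ticket_id, bug_id).
# In practice this would come from your support system's export.
support_cases = [
    (101, "confusing-export-dialog"),
    (102, "login-timeout"),
    (103, "confusing-export-dialog"),
    (104, "typo-in-footer"),
    (105, "confusing-export-dialog"),
    (106, "login-timeout"),
]


def rank_bugs_by_support_load(cases):
    """Return (bug_id, case_count) pairs, most-reported bug first."""
    counts = Counter(bug_id for _ticket_id, bug_id in cases)
    return counts.most_common()


ranked = rank_bugs_by_support_load(support_cases)
for bug_id, case_count in ranked:
    print(f"{bug_id}: {case_count} support cases")
```

A ranking like this won’t capture severity, but it makes it harder to label a bug ‘trivial’ when it generates calls every day.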

Focus on the business, not the tech

Again, my support background is speaking here: the better you can get your engineering team to understand real customer needs, the more you can avoid over-engineering. This of course applies to bug fixing, where it’s best to know exactly what’s troubling the users. But even for new features, a strong model of your current users helps engineers predict what needs to be robust and what can be simplified.

Ephemeralize what you can

Products like Heroku are a strong way to move some responsibilities outside the organization. With Heroku, concerns like ‘how will we add capacity to X service?’ can often become Heroku’s problem, and the answer becomes ‘just scale the service from the Heroku Dashboard.’ A number of maintenance and updating tasks on the underlying container (‘Dyno’ on Heroku) are also no longer your concern. This frees your team to focus on things that will help the user experience, rather than trying to over-engineer the platform component.

Do not become experts at everything

When you move production containers onto Heroku Dynos, you are saying that your team will not become experts at hosting containers. This can feel like admitting defeat. Surely you can develop the expertise and run your platform yourself. And you can! But I will say that when I look at even a simplified troubleshooting guide for orchestrating containers yourself, I… get a headache.

When you spend time improving your platform hosting, you are over-engineering a part of your product that does not directly improve the customer experience.

Platform-as-a-service is a way out of over-engineering

Every engineer on your team who fundamentally understands what your customers want, and what will make them use more of your product, is a huge benefit to your team. When engineers understand business needs, over-engineering becomes much less of a problem. When you have engineers who are experts in how to host your services, they almost by definition cannot be focused on your customers’ needs. If you free those engineers and let them work on problems that directly affect your customers, your product can stay more dynamic and be a better fit for customers.

So how does PaaS get us there? Two significant ways:

Freeing you of worry about platforms

I hear over and over again from Heroku’s customers: by offloading tasks of platform maintenance, patching, and updating your servers, you’re no longer wasting time trying to become better and better at running servers. Hopefully on small teams no one has to maintain the platform full time, and that means all developers can have some contact with actual customer needs.

Flexibility to reinforce success

It is hard to know the future. That means when you’re developing new products and services, it’s hard to know which will succeed. Further, when something does succeed it’s hard to know how it’s going to be used. This is a primary cause of over-engineering, and the flexibility of PaaS can help you avoid it. The pattern looks something like this:

Services A, B, and C are all developed with minimal concern for performance
Service B is a huge hit, and starts suffering performance drops
By adding instances via Heroku’s horizontal scaling, the company meets demand
Now the engineering team has time to improve service B’s performance

The advantage here is that instead of needing to optimize three services, we only had to work on the one we knew was a success, and only after we knew which areas were weakest.
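The ‘scale first, optimize later’ pattern above can be sketched with some back-of-the-envelope arithmetic. This is only an illustration: the load figures, the per-instance capacity, and the `instances_needed` helper are made-up assumptions, not measurements or a Heroku API.

```python
import math


def instances_needed(requests_per_second, capacity_per_instance):
    """Smallest instance count whose combined capacity covers the load."""
    return max(1, math.ceil(requests_per_second / capacity_per_instance))


# Hypothetical observed load (requests per second) after launch.
observed_load = {"A": 5, "B": 900, "C": 12}

# Assumed capacity of one unoptimized instance (requests per second).
capacity_per_instance = 100

# Step 1: meet demand right away by adding instances.
plan = {
    name: instances_needed(load, capacity_per_instance)
    for name, load in observed_load.items()
}

# Step 2: only services that needed extra instances are worth optimizing.
needs_optimizing = [name for name, count in plan.items() if count > 1]

print(plan)
print(needs_optimizing)
```

With these made-up numbers only service B needs more than one instance, so it is the only one that earns engineering time for performance work.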

If you’ve faced over-engineering in your own role, or think this is an unreasonable concern, let me know in the comments!

Top comments (1)

Mihail Malo • Apr 16 '20

For me, it's the fear of overengineering that causes not-invented-here :D
Vendor products always try to support way too many usecases, so it's difficult to estimate what will take longer - writing just the bit you need right now vs taking on a dependency and adapting it to your needs.