Embrace simple tech stacks and code generation in DevOps and data engineering

#devops #dataengineering #sitereliabilityengineering #operations

DevOps, data engineering, and other platform engineering teams must recognize that the choices they make with regard to their tech stacks have far-reaching effects on the rest of the organization. While adding a tool to the tech stack may boost the productivity of the platform engineering team, it can reduce the overall productivity of the organization. This follows from the law of leaky abstractions, which holds that no abstraction can completely hide the underlying technologies from the engineers who use it. Platform engineers' sense of productivity must shift from building increasingly complex tech stacks to iterating faster on simple ones using code generation tools such as templating engines and LLMs.

The benefits and 'leakiness' of abstractions

Over the past decades, computer engineers have generally achieved greater productivity through higher levels of abstraction. Operating systems provided useful tools like process schedulers and freed programmers from having to worry about hardware specifics. Many high-level programming languages freed programmers from manually allocating and freeing memory. Libraries and frameworks further allow programmers to interact with databases, distributed systems, and countless other components without having to worry about low-level implementation details. Fred Brooks's 1975 book The Mythical Man-Month states: "Programming productivity may be increased as much as five times when a suitable high-level language is used." DevOps engineers, data engineers, and other platform engineers are no different: they too can achieve greater productivity through more layers of abstraction.

In theory, abstractions are supposed to hide low-level implementation details from us and allow us to focus on solving high-level problems. Reality is messier. Though process scheduling has been abstracted away by the operating system, you still need at least a basic understanding of scheduling to make an informed choice between multithreading, multiprocessing, and asyncio for your application. Though memory allocation has been abstracted away by your programming language, you still need a basic understanding of garbage collection to avoid memory leaks, such as objects that stay reachable from a long-lived cache and therefore can never be collected. Generally speaking, you can only make informed engineering choices if you have at least a basic understanding of all the layers your application is built on. Joel Spolsky popularized this idea in his 2002 article "The Law of Leaky Abstractions", summarized as "All non-trivial abstractions, to some degree, are leaky." In other words, there is no perfect or 'non-leaky' abstraction that hides all underlying details so completely that an engineer using it never has to worry about them. Sooner or later, some underlying detail will have a significant impact on the performance or correctness of your program.

How leaky abstractions affect platform engineers

Platform engineers such as DevOps and data engineers must understand, and empathize with, the fact that the implementation engineers they serve want to make informed decisions and build performant, bug-free applications. Implementation engineers can only achieve this goal by understanding the underlying layers of the platform they're building on. In other words, the more layers platform engineers build, the more layers implementation engineers have to learn. Therefore, platform engineers must consciously limit the number of tools and layers of abstraction they introduce for the good of the organization as a whole, even if doing so keeps their own team from reaching peak productivity.

The most common objection to this reasoning goes something like "web engineers shouldn't have to know any DevOps" or "data scientists shouldn't have to know any data engineering", and that these teams should simply submit tickets when they need help. I believe that in most organizations this is a short-sighted approach that's bad for everyone, for the following reasons:

It's bad for the platform engineering teams because the implementation engineers now can't make a single decision without asking the platform teams first, drowning everyone in meetings.
It's bad for the implementation engineers because they lose the ability to make informed decisions and to debug their own issues. Code written by web engineers and data scientists won't be able to take full advantage of the underlying technologies, and any assumption they make about those technologies may blow up in their faces.
It's bad for the organization as a whole because it creates a culture of "not my problem", "throw it over the fence", and "we must have ten meetings before we can make a decision", reducing everyone's feelings of trust, productivity, and satisfaction.

I'd like to acknowledge that some organizations do have needs complex enough to justify this degree of specialization, even at the cost of overall productivity. However, unless you have clear evidence that your own organization is such a behemoth, assume that it only needs a simple platform until proven otherwise.

A starting point for conversations: 1-2 abstractions over the minimum

Discovering the truth about your organization's platform needs starts with affirming that platform choices affect everyone. All business needs must be made explicit, and the voices of all engineering teams must be heard, before arriving at the best path forward.

Use this starting point: platform engineers should introduce only one or two layers of abstraction in their platform architecture beyond the minimum that implementation engineers must know to use the platform.

Let's dissect this statement:

Implementation engineers have to know how to use the outermost interface of the platform. Learning how to use the platform is a non-negotiable part of their jobs.
If the sets of knowledge needed to use the platform and to develop the platform are basically the same (a difference of zero layers of abstraction), then the platform engineers are leaving productivity on the table. There will almost certainly be a tool that could boost their productivity by introducing an additional layer of abstraction without making it too hard for implementation engineers to learn the basics.
On the other hand, if the platform engineers introduce three or more layers of abstraction over the minimum needed to use the platform, then that could become too much for most implementation engineers to learn in addition to their own jobs.
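The rule dissected above can be expressed as a small check. This is a minimal Python sketch, not a formal method: it assumes a platform can be described as a hypothetical set of named layers, and it counts "layers over the minimum" as a simple set difference, which glosses over how real layers stack and overlap. The `Platform` class, `assess` function, and tool names are all illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Platform:
    """Hypothetical description of a platform as sets of named layers.

    required_to_use: layers every implementation engineer must know.
    used_to_build:   layers the platform team builds with.
    """

    required_to_use: frozenset[str]
    used_to_build: frozenset[str]

    def extra_layers(self) -> int:
        # Layers the platform team relies on beyond the minimum needed to use it.
        return len(self.used_to_build - self.required_to_use)


def assess(platform: Platform) -> str:
    """Map the layer difference onto the 1-2 layer starting point."""
    extra = platform.extra_layers()
    if extra == 0:
        return "likely leaving productivity on the table"
    if extra <= 2:
        return "within the 1-2 layer starting point"
    return "may be too much for implementation engineers to learn"


if __name__ == "__main__":
    # Hypothetical example: application teams ship containers through CI.
    minimum = frozenset({"docker", "ci pipeline"})
    simple = Platform(minimum, minimum | {"terraform"})
    heavy = Platform(minimum, minimum | {"terraform", "kubernetes", "helm", "service mesh"})
    print(assess(simple))  # within the 1-2 layer starting point
    print(assess(heavy))   # may be too much for implementation engineers to learn
```

The thresholds are only a conversation starter; a clear business need, or teams willing to learn more, can justify moving past them.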

Use 1-2 layers over the minimum as a starting point for your design decisions and conversations. Only add complexity if there is a clear business need for it, or if all the implementation engineering teams are willing to invest extra time in learning a more complex but more productive tech stack. The goal is to reach a point where the platform engineers get big productivity gains while implementation engineers can still understand the platform well enough to innovate and debug mostly on their own.

How to keep platform engineers engaged?

Limiting DevOps engineers and data engineers to one or two layers of abstraction over the minimum can be good for the entire organization, but it can leave these engineers feeling dissatisfied. The best engineers like to feel productive, and moving up in layers of abstraction is generally how computer engineers increase their productivity. The brightest platform engineers will see ways to improve their own productivity with more layers of abstraction but won't be able to act on these insights. How can we keep the best platform engineers from feeling bored and dissatisfied with a tech stack that average implementation engineers can learn on the side?

I believe that the future of platform engineering (in DevOps, data, site reliability engineering, analytics, and everywhere else) lies in building platforms that are simple enough for implementation engineers to understand, and then iterating on them faster with code generation tools such as templating engines and AI large language models (LLMs). Anyone can use LLMs, but the best platform engineers will be challenged to figure out how to use them while maintaining code quality, consistency, and security. Requiring fewer specialized tools will widen the pool of job candidates, making it easier to find the best ones. Platform engineering repositories will become more democratic too, with engineers of all levels able to contribute code. The best platform engineers will simply be able to use code generation to contribute code 5x-10x faster.
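As a minimal sketch of the templating side of code generation, assuming Python's standard-library `string.Template`: one small, reviewed template renders a CI job per service, so adding a service means adding one entry rather than adopting another tool. The job layout, the `make` command, the `services/` path, and the service names are hypothetical; the exact schema depends on your CI system.

```python
from string import Template

# Hypothetical per-service CI job. "$service-tests" works because Template
# placeholders end at the first character that can't be part of an identifier.
JOB_TEMPLATE = Template(
    """\
$service-tests:
  image: $image
  script:
    - make -C services/$service test
"""
)


def render_jobs(services: dict[str, str]) -> str:
    """Render one job per service from a {service_name: image} mapping.

    Template.substitute raises KeyError if any placeholder has no value,
    so a missing field fails loudly instead of producing a broken file.
    """
    return "\n".join(
        JOB_TEMPLATE.substitute(service=name, image=image)
        for name, image in sorted(services.items())
    )


if __name__ == "__main__":
    print(render_jobs({"billing": "python:3.12", "search": "python:3.12"}))
```

An LLM could help draft or extend such a template, but the rendering step itself stays deterministic and reviewable, which is one way to keep generated code consistent.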

Let's look at the relevant factors from Stack Overflow's survey "What makes developers happy at work" and how platform engineers can still satisfy them under this new paradigm:

Strong sense of productivity: Platform engineers will feel more productive if they can solve more problems without getting bogged down in meetings while also feeling that they're empowering their coworkers rather than being bottlenecks for them.
Many growth opportunities: Platform engineers will be challenged to use code generation and AI tools while maintaining code quality, leaving practically unlimited room for growth.
Visible, direct impact: Producing output faster will likely lead to more visible and direct impact as compared to trying to build up more layers of abstraction.
Able to solve problems my way: Platform engineers will be encouraged to explore new ways to solve problems with existing tools as well as to pursue as many new code generation approaches as they want.
Positive, healthy work relationships: Platform engineers will be able to speak a common language with each other and other engineering teams, hopefully feeling more connected and included rather than siloed.
