At the beginning of November a client asked me to join their Cloud Panel and talk on the topic of cloud transformation. You can find the slides on Slideshare.
This article is based on that presentation. So, let's talk about cloud native development and its impact on organisations. Most of my clients are large insurance or financial companies. They are considering a migration to the cloud and they are asking for help. The discussions are based on either FUD (fear, uncertainty and doubt) or snake oil.
It is not surprising that the truth lies in between.
This is the first of a two-part series. Each part introduces things we learnt while moving companies into the public cloud. I focus on culture and organisation. Not because tech is boring, but because most discussions focus on technology and architecture without ever touching the more social aspects.
Keep in mind that what worked for me and for my clients may not work for you. Context matters.
This first article probes the questions around the organisation and effective collaboration. The follow up text looks at the mythical platform team and its implications.
Before we go into the details let's talk about why we migrate into the cloud.
In a nutshell, it's all about efficiency.
We need to be efficient because we do not know what our customers want. Nobody can specify in detail what is needed. Nobody can predict the future, least of all our clients. They don't even know what they want until they see it.
That means the only way to build the right products is to implement our ideas as fast as possible and to iterate on them, improving our products step by step.
This leads to the conclusion that our businesses are only as efficient as our IT is. We can no longer treat our IT as a cost centre. We have to move IT into the heart of our organisation if we want to be and stay competitive.
And this is where the public cloud enters the game.
The cloud allows us to focus on the essentials. We use SaaS where possible. We do not build our own load-balancers or host a SQL database ourselves. We replace hand-crafted assets with cloud products. For example, we use Google’s Cloud SQL instead of running our own PostgreSQL instance. This reduces complexity and allows us to put more energy into our products. We are more efficient.
Marie Kondo is a Japanese organising consultant. She specialises in tidying up and reducing superfluous clutter. We can do the same to our IT. There are many strategies for moving our IT to the cloud. The following four approaches are pretty common:
Lift-and-Shift: We take an asset and host it more or less 1:1 in the cloud, for example by taking a monolithic JEE application and moving it to Google Compute Engine VMs. We get rid of the underlying operations components and machines, but we do not benefit from other cloud capabilities.
Re-architect: The prime example in every microservice book. We take an existing asset, such as a monolithic JEE application, and redesign it from the ground up, for example by replacing it with a series of new cloud-native microservices. We can use all cloud capabilities, because we are rebuilding and redesigning everything.
Retire: My favourite. We identify assets and processes that we and our customers no longer need. We remove these assets.
Replace: Remember efficiency? "Replace" is all about efficiency. We stop taking care of something ourselves and use a SaaS offering instead. One example is replacing a self-hosted Kafka with a managed version such as Amazon MSK.
The effort and efficiency of each approach depend on the strategy for moving into the cloud. “Lift-and-Shift” might be the best approach if the goal is to replace a datacenter. If we want to reduce complexity and use SaaS as much as possible, then “Replace” would be the appropriate approach.
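To make this rule of thumb a little more tangible, here is a minimal sketch in Python of how such a decision could be written down per asset. Everything in it is an assumption for illustration: the `Asset` fields, the `pick_strategy` function and especially the order of the checks are hypothetical and reflect a goal of reducing complexity. A team whose goal is to close a datacenter quickly would order them differently.

```python
from dataclasses import dataclass
from enum import Enum


class Strategy(Enum):
    LIFT_AND_SHIFT = "lift-and-shift"
    RE_ARCHITECT = "re-architect"
    RETIRE = "retire"
    REPLACE = "replace"


@dataclass
class Asset:
    # Hypothetical attributes, chosen only for this illustration.
    name: str
    still_needed: bool              # do we or our customers still use it?
    managed_alternative: bool       # is there a SaaS or managed product for it?
    needs_cloud_capabilities: bool  # should it use more than plain hosting?


def pick_strategy(asset: Asset) -> Strategy:
    """Assumed ordering: remove first, then buy, then rebuild, then move."""
    if not asset.still_needed:
        return Strategy.RETIRE
    if asset.managed_alternative:
        return Strategy.REPLACE
    if asset.needs_cloud_capabilities:
        return Strategy.RE_ARCHITECT
    return Strategy.LIFT_AND_SHIFT


# Example inventory; the asset names are made up.
inventory = [
    Asset("old-report-export", False, False, False),
    Asset("self-hosted-kafka", True, True, False),
    Asset("jee-monolith", True, False, True),
    Asset("batch-scheduler", True, False, False),
]

plan = {asset.name: pick_strategy(asset) for asset in inventory}
```

The value of writing the rules down like this is less the code itself and more the conversation it forces: everybody can see why an asset ends up in a given bucket, and can challenge the ordering.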
Whichever approach we choose, we will end up with a hybrid architecture. We build some assets for the cloud and some assets will stay on-premise, at least for some time.
We can draw two conclusions from this fact:
Firstly, we will have more complexity, at least temporarily. The original datacenter is still around. Maybe smaller and with fewer assets, but still a burden. Operations has to support the original environments and the new cloud environment. This increases effort and cost and we must take this into account from the start.
Secondly, the cloud-hosted assets usually depend on the on-premise assets. More often than not, the cloud-hosted assets need changes to the existing on-premise assets. Firewalls need to be changed, APIs need to be exposed or extended. And so on. This dependency leads to the first potential cultural and organisational trap.
The fact that we have two areas that can move at different speeds led to something called two-speed-IT, which we'll discuss next.
The idea of a two-speed-IT is not new. It has been around since circa 2014.
McKinsey describes the goal of a two-speed-IT as follows: "A two-speed IT architecture will help companies develop their customer-facing capabilities at high speed while decoupling legacy systems for which release cycles of new functionality stay at a slower pace."
The underlying premise is that you can run your organisation in two different ways. One shiny, great and new. And the other rusty, dusty and old. I will not delve into all the reasons why this is problematic. I will concentrate on the organisational part. But to give you a picture, two-speed-IT is like attaching extra rooms to your house because you cannot be bothered to clean up. Not a very sustainable approach, in my eyes.
Let's go back to the softer, non-technical aspects. With two-speed-IT the language around the transformation changes in an interesting way.
The cloud-assets are usually associated with a modern and lean technology stack. We use Go, Node.js and Docker. Development follows an agile method, such as Shape Up. We speak of forward-leaning teams. We use "Speed Boats" as metaphors for teams working on these cloud-products.
On the other side of the fence lies the on-premise country. Here are the technologies of days gone by: CORBA, COBOL, SOAP and EBCDIC. The process is heavy-weight, maybe even a waterfall with one or two releases per year. We speak of slow-moving tankers, with no ability to either change or react quickly. We even call this "legacy".
Why is this problematic?
As we have seen, the cloud-products usually need access or even changes to the existing assets. That means we need collaboration between the different areas of engineering. Also, let’s not forget the expertise of the people working on these systems. Documentation is outdated the moment it is written. The only way to understand systems is to have the human experts available.
Things become difficult if the "on-premise-people" are not part of the cloud-transformation.
If people feel left behind and sidelined, then we don’t get collaboration. Instead we get resentment. People may not be willing to help as much as we need them to. Or, in the worst case, people may end up sabotaging the cloud-transformation, either knowingly or, more often, through negligence. Why should someone support our efforts if that person is going to be replaced by our project?
The solution to this dilemma is pretty straightforward. First we need to realise that nobody actually means to do harm or a bad job. "Assume best intent" is often the best way to operate. With this in place, we see that the root of our problem lies in fear.
Fear of being obsolete.
Fear of being left behind.
Fear of losing a job or importance.
We have to get rid of that unfounded fear.
Transparency and communication are key to removing fear.
Bring everybody on board. Mix cloud-product teams with on-premise experts into one end-to-end team. We retrain the staff, offer courses for people willing to learn. We create new roles and positions for our new engineering culture. We offer people a perspective for growing.
And we need to be transparent. We should communicate our rationale for the cloud transformation in clear terms. If we want to get rid of our self-hosted datacenter, then what is the plan for the people operating that datacenter now? How will they be retrained and up-skilled? Who hires the new skills we need? And so on. If we tackle these difficult topics openly, then we stop fear and gossip in their tracks.
If you want to silence the doubters and fear-mongers, delivery is the only option. Only working software in production will prove that the cloud journey is possible. But, one may ask, even if we bring everybody together and work on this, how can we bring an entire company into the cloud?
Well, one takes one step at a time.
Instead of trying to jump onto the mountain, we take the scenic route and enjoy the journey. We do not need to go all-in-serverless in the first couple of months. We can decide step-by-step what our realistic target actually is. Let's consider the following illustration.
We want to be opportunistic in some areas but fully cloud-native in others. Again, transparency is key. Everybody should understand why we move some areas to the cloud while others stay where they are.
I cannot stress this enough. We must find a thin slice of the business that proves the technology and, especially, the new way of collaborating. The people involved will form a bond of trust and cooperation that will act as a radiator in our organisation. The thin slice should be something that adds to our area of business. Not a technical spike, not a proof-of-concept. Rather something essential. Only then will people feel committed and get involved.
Moving into the cloud involves architecture and technology, but also organisation and culture. The concrete approach does not change the implications. Whether we lift-and-shift, re-architect, retire or replace, we will end up with a hybrid landscape of new and pre-existing assets.
Two-speed-IT was brought up as a concept around 2014 but has lost its footing in the last couple of years. Reality has caught up with the ideas. Organisations have seen the downsides and implications, some of which I mentioned in this text.
People are key.
Engineers will learn new technology and new architectures anyway. But learning to trust, to work together, to collaborate is so much harder. Especially during a pandemic, when you cannot go around the corner and grab a cup of tea.
Allowing people to take part and to get involved helps to build bridges. We do not want any walls in our organisation. Not on a social level and not on a communication level. Software development is a team effort and teams need to trust each other.
The next article examines the questions around the concrete teams, such as which skills are needed and how we can scale this in a reasonable way.