Zach Gover for DuploCloud

Originally published at duplocloud.com

Off-the-shelf Cloud Platforms vs DIY with Infrastructure-as-code

In this blog post, we compare two prevalent approaches to cloud infrastructure management. The first is what we broadly classify as Infrastructure-as-code, where engineers use a programming or scripting language to build a set of scripts that achieve the desired topology on a cloud platform. Terraform, CloudFormation, Chef, Puppet and Ansible are some popular examples. This technology consists of a language for writing scripts plus a controller that runs them. Once satisfied with the result, the user saves the scripts in a code repository. Subsequently, if a change is needed, the files are edited and the same process is repeated.
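At its core, the controller in this write-run-save cycle compares the topology the scripts describe with what already exists in the cloud. Here is a minimal sketch of that comparison, assuming a simplified model where each resource is a name mapped to a dictionary of settings. The `plan` function and the resource names are hypothetical and are not the API of Terraform or any other tool:

```python
from typing import Dict, List

# Hypothetical, simplified resource model: resource name -> settings.
Topology = Dict[str, Dict[str, str]]


def plan(desired: Topology, current: Topology) -> Dict[str, List[str]]:
    """Compare the desired topology (from the scripts) with the current cloud
    state and list what a controller would create, update or delete."""
    create = sorted(name for name in desired if name not in current)
    delete = sorted(name for name in current if name not in desired)
    update = sorted(
        name for name in desired
        if name in current and desired[name] != current[name]
    )
    return {"create": create, "update": update, "delete": delete}


if __name__ == "__main__":
    desired = {
        "vpc-main": {"cidr": "10.0.0.0/16"},
        "bucket-logs": {"versioning": "on"},
    }
    current = {
        "vpc-main": {"cidr": "10.0.0.0/16"},
        "bucket-logs": {"versioning": "off"},
        "old-queue": {"fifo": "no"},
    }
    print(plan(desired, current))
    # {'create': [], 'update': ['bucket-logs'], 'delete': ['old-queue']}
```

Real tools do a much richer version of this (dependency ordering, provider calls, state locking); the sketch only shows the diff-then-apply idea that every run of the scripts goes through.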

The second category is a “Cloud Orchestrator” or “Platform”. This is typically a thin abstraction over native cloud APIs. It interfaces with the user as a web service: the user connects to the service (via UI or API) and builds the cloud topology within that web service itself. The orchestrator applies the topology and saves it in its own database, so the user does not need to explicitly save the configuration. When an update is needed, the user logs in to the system again and makes the changes.

For smaller-scale use cases, a platform may be too heavy. But at scale, the Infrastructure-as-code approach morphs into an in-house platform. In this blog we argue that for larger-scale infrastructure, a better strategy is to use an off-the-shelf platform that can be enhanced with Infrastructure-as-code scripts when customization is required. Mega-scale data centers like Facebook and Netflix are a different ball game, and nothing in this blog applies to them.

“Long Running Context”

The fundamental value that a platform-based approach provides is what we call a “long running context”. People typically call it a “project” or a “tenant”. A context could map to, say, an application or an environment such as demo, test, prod or a developer sandbox. When making updates to the topology, the user always operates in this context. The platform saves the updates in its own database within this context before applying them to the cloud, so one is always guaranteed that what is present in this database is what is applied to the cloud.
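The save-then-apply behavior of a long running context can be sketched as a small in-memory model. This is a toy illustration, not DuploCloud's implementation: the `Platform` class, its dictionary "database" and the `apply_to_cloud` callback are hypothetical, and the callback is assumed to either fully apply a change or raise.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

# Hypothetical callback signature: (tenant, resource name, settings) -> None.
# Assumed to either fully apply the change or raise an exception.
CloudApply = Callable[[str, str, dict], None]


@dataclass
class Platform:
    """Toy model of a platform that keeps one long-running context per tenant."""

    apply_to_cloud: CloudApply
    # The platform's own database: tenant -> resource name -> settings.
    db: Dict[str, Dict[str, dict]] = field(default_factory=dict)

    def update(self, tenant: str, resource: str, settings: dict) -> None:
        context = self.db.setdefault(tenant, {})
        previous: Optional[dict] = context.get(resource)
        # Record the change in the tenant's context first ...
        context[resource] = dict(settings)
        try:
            # ... then push the same change to the cloud.
            self.apply_to_cloud(tenant, resource, settings)
        except Exception:
            # Undo the database write so it never claims something the cloud lacks.
            if previous is None:
                del context[resource]
            else:
                context[resource] = previous
            raise

    def view(self, tenant: str) -> Dict[str, dict]:
        """What a user sees when logging back in to a context days later."""
        return {name: dict(s) for name, s in self.db.get(tenant, {}).items()}


if __name__ == "__main__":
    applied = []
    platform = Platform(apply_to_cloud=lambda t, r, s: applied.append((t, r, s)))
    platform.update("demo", "web-service", {"replicas": 2})
    print(platform.view("demo"))  # {'web-service': {'replicas': 2}}
```

The rollback on failure is one simple way to keep the "database equals cloud" guarantee in a toy model; a production system would more likely retry until the cloud converges.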

In the Infrastructure-as-code approach, such a context is not provided natively and is left to the user. Typically this translates to something like “which scripts need to be run for which context”, or perhaps a “folder” in the code base that represents the configuration for a given “tenant” or “project”. Defining the context as a collection of code is harder because many of the scripts might be shared across tenants. So most likely it comes down to the developers' understanding of the code base.
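Why a folder-per-tenant convention is hard to reason about once scripts are shared can be illustrated with a minimal sketch. It assumes a hypothetical repository layout in which each tenant folder lists the shared modules it includes; editing one shared module silently changes the topology of every tenant that uses it.

```python
from typing import Dict, List, Set

# Hypothetical repository layout: each tenant folder lists the shared
# modules its scripts include.
TENANT_MODULES: Dict[str, List[str]] = {
    "tenants/demo": ["modules/network", "modules/web"],
    "tenants/test": ["modules/network", "modules/web", "modules/queue"],
    "tenants/prod": ["modules/network", "modules/web", "modules/queue"],
}


def affected_tenants(changed_module: str, layout: Dict[str, List[str]]) -> Set[str]:
    """Return every tenant folder whose topology changes when a shared module is edited."""
    return {tenant for tenant, modules in layout.items() if changed_module in modules}


if __name__ == "__main__":
    print(sorted(affected_tenants("modules/queue", TENANT_MODULES)))
    # ['tenants/prod', 'tenants/test']
```

In a real code base this mapping is rarely written down explicitly; it lives in the developers' heads, which is exactly the gap a platform's explicit context fills.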

A platform is a more declarative approach to the problem. It requires little or no coding, because the system generates the code from the user's intent without requiring knowledge of low-level implementation details. With scripting, by contrast, any change requires a good understanding of the code base, especially when operating at scale. A user can log back in to the same context a few days later and continue where they left off without having to dig deep into the code to understand what was done before.
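The difference between stating intent and writing the implementation can be sketched as follows. Assuming a hypothetical intent format with a service name, image, port and a `public` flag, a platform could expand it into the lower-level resources the user would otherwise script by hand. The resource types below are illustrative placeholders, not any specific cloud's API.

```python
from typing import Dict, List


def expand_intent(intent: Dict) -> List[Dict]:
    """Expand a high-level intent into the low-level resources a platform
    might generate on the user's behalf (hypothetical resource types)."""
    name = intent["name"]
    resources = [
        {"type": "security_group", "name": f"{name}-sg",
         "ingress_ports": [intent["port"]]},
        {"type": "container_service", "name": name,
         "image": intent["image"], "replicas": intent.get("replicas", 1)},
    ]
    if intent.get("public", False):
        # Only public services get a load balancer in front of them.
        resources.append({"type": "load_balancer", "name": f"{name}-lb",
                          "target": name, "port": intent["port"]})
    return resources


if __name__ == "__main__":
    for resource in expand_intent(
        {"name": "web", "image": "nginx:1.25", "port": 80, "public": True}
    ):
        print(resource["type"], resource["name"])
```

The user only states the four or five fields of intent; the knowledge of which resources to create, and how they reference each other, lives in the platform.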

Diff between the code base and what is applied to the cloud

The second fundamental difference is that Infrastructure-as-code is a multi-step process (write the script, run it and merge it into the repo), while with a platform it is a one-step process (log in to the context and make the change). With scripting, an engineer might update a script and run it to make a topology change, but forget or postpone committing it to the repository. Meanwhile, another engineer makes changes to the code base for their own part of the topology and merges them. Because many pieces of code are shared between the two use cases, the first engineer may find themselves in a conflict. Even if it is resolved by merging the code, they end up in a situation where what was run in the cloud is not what is in the repo. So they have to rerun the merged code to validate it, notwithstanding the possibility of causing a regression. To avoid that risk, the script now needs to be tested in a QA environment.
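The divergence described above can be made concrete with a small drift check. This is a minimal sketch under stated assumptions: configurations are plain dictionaries, "applied" is whatever the first engineer ran from their local branch, and "repo" is the merged result. The function names and values are hypothetical.

```python
import hashlib
import json
from typing import Dict, Tuple


def fingerprint(config: Dict) -> str:
    """Stable hash of a configuration (keys sorted so ordering does not matter)."""
    canonical = json.dumps(config, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def drifted_keys(applied: Dict, repo: Dict) -> Dict[str, Tuple]:
    """Top-level keys whose value differs between what was last applied
    to the cloud and what the merged repository now contains."""
    keys = set(applied) | set(repo)
    return {k: (applied.get(k), repo.get(k)) for k in sorted(keys)
            if applied.get(k) != repo.get(k)}


if __name__ == "__main__":
    # Engineer A ran this from a local branch before committing.
    applied = {"web_replicas": 3, "queue": "standard"}
    # After merging engineer B's change, the repo holds a different topology.
    repo = {"web_replicas": 3, "queue": "fifo"}
    print(fingerprint(applied) == fingerprint(repo))  # False
    print(drifted_keys(applied, repo))  # {'queue': ('standard', 'fifo')}
```

Any non-empty result means the cloud no longer matches the repository and the merged code has to be rerun (and tested) to bring them back in line.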

All the “other” stuff

Scripting tools enable deployments, but there is much more to running the infrastructure for cloud software. We need an application provisioning mechanism, a way to collect and segregate logs and metrics per application, health monitoring and alerting, an audit trail, and an authentication system to manage user access to the infrastructure. Several tools are available to solve these individual problems, but they need to be put together and integrated into an application context. Kubernetes, Splunk, CloudWatch, SignalFx, Sentry, ELK and OAuth providers are all examples of such tools. But one needs a coherent “platform” to bring all of this together to operate at a reasonable scale. This brings us to our next point.
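Whatever logging tool is chosen, something has to attach an application context to each record so that logs can be split per application. A minimal sketch of that glue, assuming each record carries a hypothetical `tenant` field set at collection time:

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def segregate(records: Iterable[Dict]) -> Dict[str, List[str]]:
    """Group raw log records by the application context (tenant) they belong to."""
    by_context: Dict[str, List[str]] = defaultdict(list)
    for record in records:
        # Records collected without a context tag land in a catch-all bucket.
        by_context[record.get("tenant", "untagged")].append(record["message"])
    return dict(by_context)


if __name__ == "__main__":
    logs = [
        {"tenant": "demo", "message": "started"},
        {"tenant": "prod", "message": "healthy"},
        {"message": "orphan line"},
    ]
    print(segregate(logs))
```

The same context key would need to flow into metrics, alerts and access control as well; keeping it consistent across separate tools is the integration work a platform takes on.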

“Lots” of Infrastructure-as-code is basically a home-grown cloud platform

When talking to many engineers, we hear the argument that Infrastructure-as-code combined with bash scripts, or even regular programming languages like Go, Java and Python, provides all the hooks needed to overcome the above challenges. Plus, they are indeed reusing a lot of pieces like Kubernetes, Jenkins etc. Of course I agree: it's “code”, so one can build anything. But effectively you might be building the same kind of platform yourself. Why not start from an existing platform and add customization through scripts? Again, the definition of “lots” against an ROI is business specific. If you have plenty of money or time to market, or a mega-scale use case like Facebook or Netflix, then it's a different context.

The second argument we have heard is that Infrastructure-as-code is the most flexible option and allows for deep customization, while with a platform one might have to wait for the vendor to add support. I think that as technology progresses to the point where cars drive themselves, platforms are getting much better and have strong machine-generation techniques that satisfy most, if not all, use cases. Plus, a good platform does not block one from customizing the parts beyond its scope through scripting tools. A well-designed platform should provide the right hooks to consume scripts written outside the platform itself. Thus, IMO, this argument does not justify building a code base for the majority of tasks, which are standard.
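One way a platform could "provide the right hooks" is an extension point where user scripts post-process the configuration the platform generates. The sketch below is a hypothetical design, not DuploCloud's API: `HookRegistry` and the hook signature (tenant name and generated config in, adjusted config out) are assumptions made for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical hook signature: (tenant, generated config) -> adjusted config.
Hook = Callable[[str, Dict], Dict]


class HookRegistry:
    """Toy extension point: user scripts registered here run after the
    platform generates its configuration and may adjust it."""

    def __init__(self) -> None:
        self._hooks: List[Hook] = []

    def register(self, hook: Hook) -> None:
        self._hooks.append(hook)

    def run(self, tenant: str, generated: Dict) -> Dict:
        config = dict(generated)  # shallow copy; the platform's output is left untouched
        for hook in self._hooks:
            # Each hook receives the output of the previous one, in registration order.
            config = hook(tenant, config)
        return config


def add_waf_for_prod(tenant: str, config: Dict) -> Dict:
    """Example customization that the platform itself does not cover."""
    return {**config, "waf": "enabled"} if tenant == "prod" else config


if __name__ == "__main__":
    registry = HookRegistry()
    registry.register(add_waf_for_prod)
    print(registry.run("prod", {"replicas": 2}))  # {'replicas': 2, 'waf': 'enabled'}
```

The platform keeps handling the standard parts, while the custom rule lives in a small script that the platform calls rather than in a parallel code base.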

“There is no platform that fits our needs”

This is also a common argument. If true, it would mean there is demand for a good platform that solves a prevalent problem in a huge market, which from our own perspective is all goodness. Through DuploCloud, we believe we have built a good platform that addresses the majority of use cases while providing the ability to integrate policies created and managed outside the system.

“The San Mateo Line!”

A hidden argument for building home-grown platforms is that it is a very cool project, especially if the engineers come from a systems background. We live in the Bay Area and see a very interesting trend (not a generalization!) when talking to customers north and south of San Mateo. When we talk to infrastructure engineers at companies to the south, we find that they have a stronger urge to build platforms in-house, and they are quite clear that they are building a “platform” for their respective organizations rather than “scripting”, as they would call it. Customization is the common argument against off-the-shelf tools. Hybrid cloud and on-premise are very important use cases. Open-source components like K8s, Consul etc. are commonly used, hence the assertion that the wheel is not being reinvented. Yet the size of the team and the time allocated to the solution are substantial. I, with my biases, am unable to see the ROI or the customization argument. In some cases, the focus on building the platform overshadows the core business product that the company is supposed to sell.

North of San Mateo, we find mostly cloud-native applications as opposed to hybrid ones. The core talent is full stack, and the nature of the business is SaaS. The applications use so many native cloud services (S3, DynamoDB, SQS, SNS) that it is hard to be hybrid. These teams are happy to hand the container to AWS ECS via API/UI to deploy it. They find no joy in either deploying or learning about Kubernetes. Hence, in-house customization is less common and less deep.

How many times, and by how many people, will the same code be written to achieve the same use case? Time-to-market will eventually prevail.

The post Off-the-shelf Cloud Platforms vs DIY with Infrastructure-as-code appeared first on DuploCloud.