Dr Werner Vogels famously said, "No server is easier to manage than no server."
As CTO of Amazon, and so one of the people pushing the move to serverless, you'd expect him to hold this view.
And in many ways he is right. But this isn't the end of the story.
Serverless or a server is a platform to run software on. It's worth remembering what the purpose of software is:
- Whatever it is your customers are paying you for
- Unique value, differentiation
- Making your product automated, scalable, 24x7
And serverless can help with all of these:
Whatever it is your customers are paying you for - that can appear more quickly, and in a more responsive way, than with servers
The unique value and differentiation of the incredible software you are making should shine through - so let SaaS do the mundane stuff, leaving you more resource for specialisation
Making your product automated, scalable, 24x7 - this is one of the real "batteries included" features of SaaS
To manage a server you need to be an expert at these things:
- Simplicity in complex systems
- Instrumentation and visibility
- Graceful degradation
Being an expert at these things is server operations. Without them, your software isn't going to perform on a server.
But remember, serverless does away with servers - and part of that is doing away with operational issues.
A great way of understanding the advantage of serverless is to see how it can take over the work of people doing server operations:
- Scalability - serverless does this automatically
- Resilience - the cloud infrastructure has resilience
- Availability - cloud is 24x7
- Maintainability - cloud infrastructure has good maintenance features
- Simplicity in complex systems - cloud products are simple and reliable
- Instrumentation and visibility - basic metrics are available
- Graceful degradation - can be implemented
Of course, the serverless providers have teams of people doing all this for you. And thanks to economies of scale, they can afford to hire the best people.
Serverless isn't magic, it's specialisation.
Instead of you having to hire server operations people, this bit of work is supplied by the serverless provider.
However, if we dig a little deeper, serverless often still needs operations people:
Scalability - serverless does this automatically, but within limits that must be understood. For example, AWS S3 allows around 3,500 PUT/POST/DELETE requests per second per key prefix within a bucket. It's important to know the limits and understand the alternatives.
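One common way of working within the S3 request-rate limit is to spread writes across many key prefixes, since each prefix gets its own limit. A minimal sketch of the idea (the key names and shard count here are illustrative assumptions, not an AWS recommendation for your workload):

```python
import hashlib

def partitioned_key(raw_key: str, shards: int = 16) -> str:
    """Prepend a short hash-derived prefix so writes spread across
    S3 key prefixes, each of which has its own request-rate limit."""
    shard = int(hashlib.md5(raw_key.encode()).hexdigest(), 16) % shards
    return f"{shard:02x}/{raw_key}"

# The prefix is deterministic per key, so reads can recompute it,
# e.g. "logs/2017-02-28.json" becomes "<2-hex-shard>/logs/2017-02-28.json"
```

With 16 shards the theoretical write ceiling becomes 16 times higher, at the cost of keys no longer listing in their natural order.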
Resilience - the cloud infrastructure has resilience, usually as some form of "provision". For example, with AWS DynamoDB there is a provisioned-capacity mode (where you pay for fixed throughput) and an "on demand" mode (billed per request); the difference must be understood and the right one set up.
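To make the DynamoDB choice concrete, here is a sketch of the two billing modes expressed as the parameters boto3's `create_table` would accept - the table and attribute names are illustrative:

```python
# Shared table definition (names are illustrative).
common = {
    "TableName": "orders",
    "KeySchema": [{"AttributeName": "order_id", "KeyType": "HASH"}],
    "AttributeDefinitions": [
        {"AttributeName": "order_id", "AttributeType": "S"}
    ],
}

# Provisioned mode: you choose (and pay for) fixed capacity up front.
provisioned = {
    **common,
    "BillingMode": "PROVISIONED",
    "ProvisionedThroughput": {
        "ReadCapacityUnits": 5,
        "WriteCapacityUnits": 5,
    },
}

# On-demand mode: capacity scales automatically, billed per request.
on_demand = {**common, "BillingMode": "PAY_PER_REQUEST"}
```

Provisioned capacity is cheaper for steady, predictable traffic; on-demand absorbs spiky traffic without throttling, at a higher per-request price. That trade-off is exactly the kind of decision an operations person still has to make.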
Availability - cloud is 24x7, except for very unusual events. For example, in 2017 there was a major outage of AWS S3. Even AWS itself was caught out, as the service had previously been so reliable - its own status dashboard depended on S3 and struggled to report the problem.
Maintainability - cloud infrastructure has good maintenance features, the use of which has to be planned. For example, if AWS API Gateway is part of your stack, changes to the API have to be understood as a threat to stability and planned ahead.
Simplicity in complex systems - cloud products are simple and reliable, but emergent behaviour is still there. It's easy to add a microservice, and another, and another, until there are many of them, and in a microservices architecture the services call each other. So, to do some numbers: if each service has one input and one output and there are 10 of them, they can be wired into a single loop in 9 factorial, or 362,880, distinct ways. In the real world most of these combinations would be unsuitable, but the possibility of misconnecting things is still there. And when something like this does occur, it's down to you to find it - in production.
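The arithmetic above can be checked directly: with n single-input, single-output services wired into one loop, fixing one service and permuting the rest gives (n - 1)! distinct wirings.

```python
from math import factorial

def distinct_loops(n: int) -> int:
    """Number of distinct ways to wire n single-input, single-output
    services into one directed loop: fix one service, permute the rest."""
    return factorial(n - 1)

print(distinct_loops(10))  # 362880
```

And the count explodes from there - at 12 services it is already nearly 40 million.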
Instrumentation and visibility - basic metrics are available, but application-specific metrics have to be added. Going on from the example above with 10 cross-connected microservices: imagine how much easier they are to fix if application-level logging is implemented for transactions. Companies like Honeycomb have built a business supplying this extra, necessary level of traceability for applications implemented on serverless or PaaS systems - systems which show emergent behaviour but do not monitor everything out of the box.
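A minimal sketch of the idea: every log line carries a correlation ID that follows a transaction across services, so cross-connected calls can be stitched back together afterwards. The service names and field layout here are illustrative assumptions, not any particular vendor's format.

```python
import json
import uuid

def new_trace_id() -> str:
    """A fresh correlation ID, minted once at the edge of the system."""
    return uuid.uuid4().hex

def log_event(trace_id: str, service: str, event: str, **fields) -> str:
    """Emit one structured log line; the shared trace_id lets a tool
    reassemble the path a single transaction took across services."""
    record = {"trace_id": trace_id, "service": service, "event": event, **fields}
    line = json.dumps(record)
    print(line)
    return line

trace = new_trace_id()
log_event(trace, "checkout", "order_received", order_id="A123")
log_event(trace, "payments", "charge_attempted", order_id="A123")
```

The crucial operational discipline is that the ID is generated once and passed along on every downstream call - that is application work the platform will not do for you.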
Graceful degradation - can be implemented, and this can be difficult. The ideal form of graceful degradation is that valued customers get to use the part that is still working while the less important traffic is discarded. Maybe someone better informed than me can suggest a pattern for doing this within serverless.
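One possible sketch of such a pattern is priority load shedding: under pressure, keep serving high-tier customers and discard lower-priority traffic first. The tier names, thresholds, and the load signal below are all illustrative assumptions.

```python
# Lower number = higher priority; unknown tiers rank below "free".
PRIORITY = {"enterprise": 0, "standard": 1, "free": 2}

def should_serve(tier: str, load: float) -> bool:
    """Decide whether to serve a request given its customer tier and
    a 0.0-1.0 utilisation estimate for the system."""
    if load < 0.8:
        return True                          # normal operation: serve everyone
    if load < 0.95:
        return PRIORITY.get(tier, 3) <= 1    # under pressure: shed free tier
    return PRIORITY.get(tier, 3) == 0        # crisis: enterprise only
```

In a serverless setting the check could run at the front door (an authoriser or the first function in the chain), with the load signal derived from queue depth or concurrency metrics.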
In summary: operational problems don't go away because you are running serverless!