Brian Tarbox for AWS Heroes

Posted on Jan 2, 2023

Time To Rethink Cattle vs. Pets (serverless)

#serverless #aws

AWS’s re:Invent conference always involves a large set of announcements, followed by weeks of experts, myself included, trying to figure out what it all means. Large debates often arise in the process, and this year is no exception.

One of the most interesting debates is about the definition of serverless. It used to be that serverless meant Lambda … after all Lambdas don’t have servers (we’re going to ignore the reductionist argument that eventually everything runs on something). Over the years the definition of serverless was expanded to include things like S3 or Aurora, where we knew there were computers someplace but we didn’t have to pay attention to them. This year, as AWS announced that almost all of their services had at least the option to run without you, the user, owning responsibility for the underlying hardware, the debate expanded to ask whether “serverless” has any meaning at all.

This whole discussion has been colored by the metaphor of “Cattle vs. Pets”, and I now think that base metaphor has to change.

Pets vs. cattle was based on the notion that we care about our pets: we love them, name them, and grieve for them when they die. Conversely, so the metaphor goes, we don’t name individual cattle and we don’t care when they die; we just replace them. For the past ten years or so we’ve just accepted this as both true and as a great metaphor for dedicated EC2 instances vs. AutoScaling Groups (ASG) / Containers / Serverless.

I’m pretty sure that the metaphor wasn’t created by someone who had spent any time on a ranch or a farm. I have, and I’m here to tell you that the death of a single head of cattle is not a non-event. You have to determine what caused the death, find out if more deaths should be expected, and absorb a variety of costs for the lost animal.

By the same token, losing a member of an ASG is also not a non-event. Pretty much the same analysis goes on to see if the failure was a one-off or the sign of a cascading failure.

I think it's time to change the base metaphor. I think a more accurate metaphor is “Pets vs. Cattle vs FastFood”. (I’m serious here, hear me out).

In this metaphor pets continue to map to dedicated EC2 instances, cattle maps to ASGs, and fast-food maps to serverless.

Think about a burger at a fast food restaurant. That burger has a whole lifecycle before it gets to your tray. It began as cattle, got processed into hamburger, got shipped to a restaurant, got put on a grill, got assembled into a burger and then placed on your tray. The key point is that it didn’t become your burger until the very last step. Not only were you unaware of which patty was to become your burger, you in fact could not know. It was just a resource (spare cycles?) until it landed on your tray.

The analogy with serverless is that when you run Aurora or Lambda or S3 you fundamentally do not have access to the resources you will be given until they are in fact assigned to you. Even then, you have no way of knowing which underlying hardware is servicing your system. To me, that should be the new definition of serverless: “a system is serverless to the degree that you do not have knowledge of the hardware/instance running your system.” If you’re running a Lambda you have a certain amount of CPU/memory based on what you requested, but that’s it. If you’re running ECS/EKS with Fargate you have access to your container but not to the host instance.

The shift in thinking is that for cattle/ASGs we don’t care about the particular individual, but we care very much about the collection made up by the individuals. And that caring translates into responsibility for observability, monitoring, troubleshooting and remediation of the set of individuals. If an availability zone runs out of an instance type needed by our ASG, it’s up to us to take action. By contrast, if an AZ ran out of instances for a Lambda (which I’ve never even heard of happening), there really wouldn’t be anything we could do.

In other words, Cattle isn’t the end of the server/serverless spectrum; it’s actually a midpoint.

Once we understand that I think we’ll be in a better position to discuss just what serverless is now and what it might be in the future.

Latest comments (10)

flaviadiaz1 • Feb 2 • Edited

The article challenges the traditional "Pets vs. Cattle" metaphor in serverless computing, proposing a new analogy - "Pets vs. Cattle vs. FastFood." It emphasizes that serverless means users lack knowledge of the underlying hardware until resources are assigned, akin to a fast-food burger's lifecycle. The proposed definition of serverless is tied to this lack of visibility into the underlying infrastructure. The argument positions serverless as a spectrum, with "Cattle" representing a midpoint, reshaping discussions on the responsibilities and observability associated with different computing paradigms. Overall, it offers a compelling perspective on the evolving nature of serverless computing. Do you have knowledge on how to stop my dog from barking at night

Christian Bonzelet • Jan 6 '23

Liked the metaphor very much. I think the whole discussion around the "What is serverless?" and "When is something serverless?" needs such perspectives.

I never worked on a ranch but ate a lot of burgers in my life :) I immediately thought about an additional dimension that could be worth discussing. It is the dimension of "value" and "demands".

Pets, Cattle, Burger - all have their reasons and purpose to exist. Each of them satisfies different needs for different people. What would this be in the cloud world? And is something like Amazon OpenSearch Serverless a "cattle" or a "burger" service? Is this, by the way, a pattern of (natural) evolution?

What I learned out of our discussion within the 𝗔𝗪𝗦 𝗖𝗼𝗺𝗺𝘂𝗻𝗶𝘁𝘆 𝗕𝘂𝗶𝗹𝗱𝗲𝗿 program:
𝘾𝙤𝙣𝙨𝙞𝙙𝙚𝙧𝙞𝙣𝙜 𝙖 𝙨𝙚𝙧𝙫𝙞𝙘𝙚 𝙖𝙨 "𝙨𝙚𝙧𝙫𝙚𝙧𝙡𝙚𝙨𝙨" 𝙝𝙖𝙨 𝙢𝙖𝙣𝙮 𝙥𝙚𝙧𝙨𝙥𝙚𝙘𝙩𝙞𝙫𝙚𝙨 𝙖𝙣𝙙 𝙛𝙡𝙖𝙫𝙤𝙪𝙧𝙨.

I still see value in services that are called "serverless" ™️ but don't fulfill all the requirements we discussed and you highlighted in your post, Vincent Claes. I use an approach to give a service certain badges or labels that better describe what we can expect - each of them providing value in its own.

🏷️ 𝘚𝘊𝘈𝘓𝘌 𝘛𝘖 𝘡𝘌𝘙𝘖
🏷️ 𝘙𝘌𝘋𝘜𝘊𝘌 𝘖𝘞𝘕𝘌𝘙𝘚𝘏𝘐𝘗 / 𝘔𝘈𝘐𝘕𝘛𝘌𝘕𝘈𝘕𝘊𝘌
🏷️ 𝘗𝘈𝘠 𝘈𝘚 𝘠𝘖𝘜 𝘎𝘖
🏷️ 𝘙𝘌𝘋𝘜𝘊𝘌 𝘛𝘐𝘔𝘌-𝘛𝘖-𝘔𝘈𝘙𝘒𝘌𝘛
🏷️ 𝘗𝘙𝘖𝘝𝘐𝘋𝘌𝘚 𝘗𝘙𝘌𝘋𝘐𝘊𝘛𝘈𝘉𝘓𝘌 𝘗𝘌𝘙𝘍𝘖𝘙𝘔𝘈𝘕𝘊𝘌

While native serverless services like AWS Lambda should claim all available badges, serverless-flavoured services like Amazon Redshift or Amazon OpenSearch might only receive a subset of those badges.

Depending on your unique situation this value is sometimes bigger, sometimes smaller. Prioritizing the importance of those badges for your unique situation is very important. It is the same way we handle traditional non-functional requirements in our system design: we evaluate the importance based on a given context. Now we can have a more meaningful discussion and see things from different perspectives.
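To make the idea of prioritizing badges concrete, here is a minimal sketch in Python. The badge names come from the list above; the weights, the `score` helper, and the two example services are purely hypothetical illustrations, not an assessment of any real AWS service.

```python
from enum import Flag, auto


class Badge(Flag):
    """The badges listed above, modelled as combinable flags."""
    SCALE_TO_ZERO = auto()
    REDUCE_OWNERSHIP = auto()
    PAY_AS_YOU_GO = auto()
    REDUCE_TIME_TO_MARKET = auto()
    PREDICTABLE_PERFORMANCE = auto()


def score(badges: Badge, weights: dict) -> int:
    """Sum the weights of every badge the service has earned."""
    return sum(weight for badge, weight in weights.items() if badge in badges)


# Hypothetical priorities for one team's context (higher = more important).
weights = {
    Badge.SCALE_TO_ZERO: 5,
    Badge.REDUCE_OWNERSHIP: 3,
    Badge.PAY_AS_YOU_GO: 2,
    Badge.REDUCE_TIME_TO_MARKET: 2,
    Badge.PREDICTABLE_PERFORMANCE: 1,
}

# A hypothetical service that claims every badge.
all_badges_service = (
    Badge.SCALE_TO_ZERO
    | Badge.REDUCE_OWNERSHIP
    | Badge.PAY_AS_YOU_GO
    | Badge.REDUCE_TIME_TO_MARKET
    | Badge.PREDICTABLE_PERFORMANCE
)

# A hypothetical "serverless flavoured" service that lacks scale-to-zero.
subset_service = (
    Badge.REDUCE_OWNERSHIP
    | Badge.PAY_AS_YOU_GO
    | Badge.REDUCE_TIME_TO_MARKET
    | Badge.PREDICTABLE_PERFORMANCE
)

full_score = score(all_badges_service, weights)
subset_score = score(subset_service, weights)
print(full_score, subset_score)
```

With a different set of weights the gap between the two services shrinks or grows, which is the point: the same badge set can matter a lot in one context and very little in another.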

𝗜 𝘀𝗲𝗲 𝘃𝗮𝗹𝘂𝗲 𝗶𝗻 𝗔𝗺𝗮𝘇𝗼𝗻 𝗢𝗽𝗲𝗻𝘀𝗲𝗮𝗿𝗰𝗵 𝘀𝗲𝗿𝘃𝗲𝗿𝗹𝗲𝘀𝘀 - which is in the end more important than arguing whether the serverless suffix is appropriate or not. A personal story:

Before this announcement I encouraged teams to think twice before using Amazon OpenSearch in given situations. Why? Because managing nodes, indices and clusters can become an operational burden. Now we have more options to choose from, making characteristics like 🏷️ 𝗥𝗘𝗗𝗨𝗖𝗘 𝗢𝗪𝗡𝗘𝗥𝗦𝗛𝗜𝗣 / 𝗠𝗔𝗜𝗡𝗧𝗘𝗡𝗔𝗡𝗖𝗘 a real thing and enabling teams to focus more on other, more important parts of a workload.

Ben Stegeman • Jan 4 '23

I dig your metaphor, but I think it has a shelf life. It's too meat-centric for our decreasingly-carnivorous world.

How about a vehicle-centric metaphor? "Daily driver vs Rental vs rideshare?"

Then again, with self-driving cars on the horizon, maybe my metaphor also has a shelf life...

Stephen Sennett • Jan 2 '23

To your point that "a system is serverless to the degree that you do not have knowledge of the hardware/instance running your system", I'd take it a step further and say that it also has no consequence for me as a consumer.

With OpenSearch Serverless being the latest example, we're still paying for the abstracted "Compute Units", that is, for the underlying hardware, even while idle. At first, I thought "you can't make OpenSearch serverless, the architecture just doesn't work that way". Thinking about it though, I don't really care about the architecture and whether it's possible.

If AWS made a pricing scheme where I can pay simply for the data that I'm storing (without provisioning) and for each request that I invoke, that'd be fine. If I could have a 20GB index where I make 500 queries a month, I'd be expecting to pay for each of those 500 queries (e.g. $0.01 * 500 = $5.00), plus the storage for the files (e.g. $0.20 * 5GB = $1.00). No charges for idling or indexing.
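The arithmetic in that example can be sketched in a few lines of Python. This is only an illustration of the hypothetical pay-per-use scheme described above, not a real AWS price list; the function name `pay_per_use_cost` is made up, and the inputs are the comment's own example figures (500 queries at $0.01 each, 5GB of storage at $0.20 per GB).

```python
from decimal import Decimal


def pay_per_use_cost(queries, price_per_query, stored_gb, price_per_gb):
    """Monthly bill under a hypothetical scheme with no idle or indexing charges."""
    request_cost = Decimal(queries) * price_per_query
    storage_cost = Decimal(stored_gb) * price_per_gb
    return request_cost + storage_cost


# Example figures taken from the comment above; the prices are illustrative only.
monthly = pay_per_use_cost(
    queries=500,
    price_per_query=Decimal("0.01"),
    stored_gb=5,
    price_per_gb=Decimal("0.20"),
)
print(f"${monthly:.2f}")  # prints $6.00
```

The key property is that a month with zero queries would cost only the storage term, which is the "no consequence while idle" behaviour being asked for.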

How AWS does it behind the scenes, I'd be curious, but practically wouldn't have to care; same as Lambda. As long as it meets my requirements (latency, availability, price per query, etc.), they can call it 'Serverless', and I'll be a happy camper.

Would be interested to hear your reflections on this!

Brian Tarbox • Jan 3 '23

@ssennettau I completely agree with you about "if it has no consequences for me"

It's hard (for me) to think of a master/replica DB as fully serverless because a failure of the master does have consequences for me. Is it "more serverless" than if I was running the DB on my own instances? Sure. Is it less serverless than Lambda? Also, yes.

The main point I'm reaching for is that serverless is a spectrum.

Stephen Sennett • Jan 3 '23

That's a really solid point about the consequences being technical as well as financial. Serverless may have come too far for something to be a binary option of "Serverful" or "Serverless" anymore, where that spectrum gets a lot broader.

Thanks for your thoughts, mate!

David Sol • Jan 2 '23

That is a good point. Yes, there is a "grey area" between traditional architectures and serverless, and yes, having ASGs is there.
But I have to say, I would never consider ASGs serverless. You are automating the management, but the need to define/handle/worry about the servers is still there.

Brian Tarbox • Jan 2 '23

100% agree, ASG is not serverless. They are however seen as at least cattle-like, in that we supposedly don't care if one of them fails. I think this whole area needs a rethink which is what I'm trying to nudge us into. :-)

Asa Gage • Jan 2 '23

You seem to be misunderstanding the key concept of the “Cattle vs Pets” metaphor. It is typically used to describe the configuration management practices of the infrastructure, not so much whether it uses autoscaling. However, if you are using autoscaling, you are likely using cattle-like management practices. Serverless infrastructure can be represented as cattle vs pets as well, depending upon how it is managed. If you create a Lambda using the console you still have a pet. Likewise, you can have a single EC2 instance that is fully managed via infrastructure as code and it would qualify as cattle in a herd of 1. Add autoscaling to it and you have a more resilient herd.

Brian Tarbox • Jan 2 '23

You raise a fair point about autoscaling, but I think you'll agree that it's possible to create a pet via IaC. My point was more about failure modes. Consider Neptune, AWS's "fully managed" graph database, or any other managed master/replica style system. A failure of a node really matters here.

After re:Invent the discussion was about what counts as serverless. I assert that a special EC2 instance, an ASG, a managed master/replica DB and, say, Lambda represent different positions on the spectrum of serverless. I think pets vs. cattle encourages us to think in terms of a dichotomy rather than a spectrum. I think that is more salient than how the resource was created.
