
Why you should never use sticky sessions

George Koniaris · Originally published at gkoniaris.gr · 3 min read

This post was originally published in my blog.

Sticky sessions route incoming requests to a specific server based on their session identifier. We usually find them in applications that keep their state in a non-shareable location, such as the server's memory or hard disk. In this article, we will discuss what sticky sessions are, explain why they are “bad”, and show how to design an application so we can avoid them entirely.

How do sticky sessions work?

Binding a sessionID to a specific server in our infrastructure overrides our default load-balancing strategy. Below, you can see a diagram showing how a user requests a session from our servers. The load balancer routes the request to a random server; in our example, Server 2.

Initial request that is performed for sticky sessions to work

Once the client gets a sessionID, the session is bound to Server 2, and the load balancer forwards every subsequent request to that server.

Sticky sessions default routing policy after getting a sessionID

This pattern is typical of applications that keep their state in local storage, like memory or a hard disk. Because Server 1 has no access to the state created by previous requests, Server 2 is obliged to serve all of them.
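The routing logic above can be sketched in a few lines. This is a minimal simulation (all names are made up, and a dict stands in for the load balancer's session table), not a real load balancer:

```python
import random

# Minimal sketch of sticky-session routing (all names are made up).
SERVERS = ["server-1", "server-2", "server-3"]
session_map = {}  # session_id -> server, kept by the load balancer

def route(session_id=None):
    """New sessions go to a random server (the default load-balancing
    strategy); known sessions stick to the server they were bound to."""
    if session_id is not None and session_id in session_map:
        return session_map[session_id]    # sticky: always the same server
    server = random.choice(SERVERS)       # default strategy
    if session_id is not None:
        session_map[session_id] = server  # bind the session to this server
    return server

# The first request with a sessionID binds it; later requests stick.
first = route("sess-42")
assert all(route("sess-42") == first for _ in range(10))
```

Note that once `session_map` has an entry, the default strategy never runs again for that session, which is exactly the problem discussed below.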

Why should you avoid sticky sessions?

They can make our application go down easily

Imagine that one of our servers goes down. Because of the architecture we chose, even if we keep our application's state in persistent storage such as a hard disk, we won't be able to access that data from other servers. Worse, if we keep the state in the application's memory, we lose all data bound to that server's sessions as soon as it restarts.

Our application doesn’t scale correctly

Another key point is that sticky sessions don't allow our application to scale correctly. When we bind a session to a specific server, we give up the ability to use our other servers' capacity for its subsequent requests. As a result, when a session performs CPU-intensive tasks, we force a single instance of our infrastructure to handle all of its load.

They can be dangerous

Binding sessions to specific servers can also cause security concerns. What if someone creates a session and then performs a series of very CPU-intensive requests against our application? See the example below.

How to exploit sticky sessions to perform DOS attacks

The load balancer forwards every request to the server the session is bound to, concentrating all of the load on a single machine. This is a denial-of-service (DoS) attack, and sticky sessions make it cheaper: in our example, the attacker needs only half the resources they would need without them. That ratio holds only for this specific setup; the bigger the infrastructure, the bigger the saving, and the higher the chances that someone will bother to exploit this “vulnerability” and take our servers down one by one at a considerably lower cost. It also forces us to create extra monitoring rules to recognize this type of attack, because the total load across our servers will not look that high when only one server is under attack.

Is there an alternative?

Yes, there is, and you should probably use it. An application can get rid of sticky sessions by keeping its state in an external service. See the example below.

The correct way to handle your state using a central database service

Each server saves the state of every task or request to a shared resource, like MySQL or Redis, making it available to all the other servers in our infrastructure. This way, the load balancer can distribute requests across all available servers instead of just one, and any server can serve a request no matter which server served the previous one.
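As a sketch of this design (names are made up, and a plain dict stands in for Redis or MySQL), any server can pick up where another left off because the state lives outside the servers:

```python
# Sketch of the shared-state design: a plain dict stands in for Redis/MySQL,
# and each "server" reads and writes session state there, never locally.
shared_store = {}

class Server:
    def __init__(self, name):
        self.name = name

    def handle(self, session_id):
        # Read-modify-write the session state in the shared store.
        count = shared_store.get(session_id, 0) + 1
        shared_store[session_id] = count
        return f"{self.name} served request #{count} for {session_id}"

servers = [Server("server-1"), Server("server-2")]
# The load balancer is free to pick a different server for every request:
for srv in servers * 2:
    srv.handle("sess-42")
assert shared_store["sess-42"] == 4  # state survived across both servers
```

In production the dict would be a network service, so losing any one application server loses no session data.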

If you found this blog post useful, you can subscribe to my newsletter and be the first to know about new posts.

People vector created by pch.vector – www.freepik.com

Discussion

Jebin

How do we handle deployments? Say I have 4 servers. I have frontend assets with prefixed hashes on each server. While deploying one server at a time, say p4 (I cannot upgrade all in one shot, downtime issue), the requests for multiple assets (a.js, b.js, c.js) are sent to all the other servers (p1, p2, p3) which are not upgraded. This will result in 404s. What is the right way to solve this?

George Koniaris (Author)

Hey Jebin,

First of all, it's possible to upgrade all servers at once by building your frontend assets in a new folder and then switching your application to serve the new version on all servers at the same time. You can also solve this problem by using a CDN. If the changes to your frontend are incremental (mainly, not removing files), it shouldn't cause a problem. The final solution would be to use canary deployments. This requires some kind of "stickiness", but you can limit it to frontend URLs only, and it's something you usually do for a limited amount of time, until you make sure that your deployment doesn't break anything.

Does this solution seem correct for your use case?

Jebin

Thanks Mr Koniaris. Upgrading all servers will incur downtime, don't you think? I have docker containers. If I upgrade all servers at once with the new container, there will be a momentary downtime. It is undesirable. A CDN is a viable solution, but I wanted to see if we could solve this without getting there. Also, the changes are not incremental, because webpack builds a whole new set of assets with new hash-prefixed files.

Do you recommend having stickiness only for assets when we don't use a CDN?

George Koniaris (Author)

Hi again,

If you deploy the new docker containers alongside the old ones, keep the new ones idle, and then switch your load balancer over to the new containers, the downtime will be almost zero, if not exactly zero. Keep in mind that this requires extra work to set up correctly, but it also gives you the ability to easily roll back your frontend changes in case something goes awfully wrong, by just switching back to the old docker containers.

Now, if the changes are not incremental and each version contains major changes to the javascript or the UI of the site, for example, you will have to have some kind of "stickiness". However, you don't have to limit the new version to only one server. If you have 4 servers, you can balance the load of each version across 2 and 2 servers. So yes, it would require stickiness, but you can avoid binding a user to only one server.

Jebin

Load balancing is a fair point. Good one. Thanks again Mr. Koniaris

Charles Roth

The article is a good basic intro, thanks -- certainly in the context of how to design a new app.

But a really LARGE app might need a really LARGE session, in which case keeping it in local memory and using sticky sessions can actually make sense. Sessions can still get stored to an external service (e.g. Redis) for failover if a server goes down. So "sticky" is not inherently bad, but one should definitely be aware of the pros and cons.

George Koniaris (Author)

Hi Charles,

That's exactly what I am trying to explain in the article. There is always some "stickiness" when working with multiple server instances. The most common form is deciding which server to use based on the client's location. The problem occurs when you treat server instances as the only place where the session's data is persisted. If you want to stick a request to a specific server for speed, that's OK, but you have to make sure that your application keeps working as expected if that server goes down for some reason. Maybe I should have picked a title containing the "application state" keyword as well.

Charles Roth

Fair enough, and point taken (and made in the original article).

I was just complaining about the "never" part of the title. :-)

George Koniaris (Author)

I now see that the word "never" is a little bit absolute. In any case, your comment is well taken, along with every comment that provides constructive criticism. As I am not a professional writer, I try to express my opinions on these subjects as well as I can, and these kinds of comments make me express them in a better way in the next article.

Charles Roth

You made me chuckle with "little bit absolute". Rather like "very unique". :-)

It's all good, and the article is very informative for people new to the question -- in fact I'm going to send it to one of my new minions, I mean, interns.

Kalinda Pride

Personally, I don't mind the occasional "always" or "never". Exceptions are normal, but an absolute word does a good job of communicating the point... and it also sparks conversations like this about what the exceptions could be!

Every important idea started out as "interesting but wrong", then got better through conversations.

...Well, maybe not every idea. ;)

Kalinda Pride

Thanks for the article, and congrats for making it into The Overflow newsletter where I found you!

"A really LARGE app might need a really LARGE session"... this got me thinking about my own situation. So here's a case study! Now that I've written it, our "sticky session" system is more like a distributed cache, but it bears some similarities. Thoughts?

I work on an app that serves procedurally generated content. When a user makes a request, we put together a video based on their history and request parameters, then we stream the video to them. Many videos get reused, but we put together a never-before-seen video for nearly half of the requests. We save a description of the video to a database so we can re-create it if needed, but we stream the video directly from the server instance that produces it. The streaming is, I believe, a sticky session.

We've considered copying the video to a shared S3 bucket and streaming it from there, but the initial lag from creating and copying the video would be too long. Next, we experimented with switching the source of the stream: start streaming from the server where the video is being created, but copy the video to S3 and switch to streaming from S3 once the video has been copied there. This wasn't actually better. We would have gotten performance and money gains from putting the video into a shared cache, and from freeing up disk space on the source server once the video was done copying, but these gains were fully offset by the increased CPU load from copying the file, the need for more S3 storage, and the extra network usage and complexity that we would have needed to pull this off.

Our setup suffers from the pitfalls of sticky sessions, but we've mitigated them somewhat.

  • When a server goes down, it interrupts all streams from that server. (Duh.) The user can get their video back, because we've saved the video description in our database, but it's kind of a pain. They have to request the same video again (which can take several clicks), then wait several seconds for us to re-create the video from its description.
  • Scaling could be better. When we change the number of servers, existing streams can continue uninterrupted, but any interruption (e.g. a flaky internet connection or redeploying a server) will result in the same annoying experience as a server going down, since the request will be redirected to a new server. Fortunately the app is just for recreational use. No one dies if we interrupt a video stream, and the interruptions from scaling are minor compared to the general flakiness of consumer internet.
  • DOS attacks are a concern, but our load balancing is opaque and random enough to make attacks on a single server unlikely. We balance the load modularly, based on the video description's ID in the database (e.g. multiples of 12 go to server 0, multiples of 12 plus 7 go to server 7). We don't expose the numerical ID, only a reversible hash, so from the point of view of an attacker (who hasn't cracked the hash), each request is assigned at random. If an attacker wanted to concentrate their DOS attack on server 0, they would either have to request the same video over and over (which is easy to counteract once we realize what's going on), or they'd have to know our hashing key and algorithm and know how many servers we have running so they could request the right video description IDs. They could try to request particularly CPU-intensive videos, but it wouldn't help much. The worst video description is only about 10 times as resource-intensive as normal, and the load balancing means that they can't force new video descriptions to be assigned to a particular server.

So, are these true "sticky sessions"? Is the way we've handled the sticky-session problems widely useful, or good for our use case only? (Or could we do better?)
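For concreteness, the modular routing described above might look something like this (the key, IDs, and server count here are all made up):

```python
import hashlib
import hmac

# Hypothetical sketch: route by numeric ID modulo the number of servers,
# but expose only an opaque keyed hash of the ID to clients.
SECRET = b"server-side-key"   # never exposed to clients
NUM_SERVERS = 12

def assign_server(video_id: int) -> int:
    return video_id % NUM_SERVERS

def public_token(video_id: int) -> str:
    # Clients only ever see this token, so without the key they cannot
    # tell which server a given video's request will land on.
    return hmac.new(SECRET, str(video_id).encode(), hashlib.sha256).hexdigest()

assert assign_server(24) == 0 and assign_server(31) == 7
# Nearby IDs map to different servers, and their tokens look unrelated:
assert public_token(24) != public_token(25)
```

An attacker who can't recover the numeric ID from the token can't choose requests that all land on one server, which is the property described above.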

George Koniaris (Author)

Hey sciepsilon,

Thanks for providing such a detailed description! What you describe here is the same use case as having an open web socket connection. Of course, you are not going to renew the connection on every message just to avoid stickiness. Also, you can't have every video saved on every server. I haven't worked with video streaming, but a nice approach would be to have a master server for each video and at least one server acting as a replica.

Based on your description, you are using AWS. I will make an assumption here, correct me if I am wrong. The assumption is that you use EBS to store your videos. So, one thing I would consider doing is having a mechanism to auto-mount a failed server's EBS storage (the place where you store videos) to a healthy server and use that server as the master one for these videos until the old one gets up again. This would require a good amount of DevOps so it's just a random idea by someone that doesn't know the internals of your web application.

Now, I don't consider streaming a video to be a true "sticky session". The real problem would occur if you didn't have a fallback mechanism (even one that takes a minute or two to rebuild the videos) in case of a server failure. I would really like to hear your opinion on this, as it's a special case I had never thought of.

Kalinda Pride

I think you're right - my video-streaming example isn't really a sticky session. And yes, we're using EBS. :)

The "master and replica" idea is an interesting one. It's similar to copying the videos to S3 as they're created, but I'm assuming the replica would receive a copy of the request and generate its video independently, rather than receiving its video from the master. This would definitely increase reliability: when the master goes down or we redeploy to it, the replica can pick up right where it left off. With the right network architecture, I think we could even make the transition invisible, without the user having to make a new request or open a different socket connection.

Of course there's a cost too. Since we're generating each video twice, we would need double the amount of compute power for creating the videos. I don't think the tradeoff would be worth it for us, but it would be a clear win if we were a less error-tolerant application. For us, perhaps a hybrid solution is possible where we spin up replicas only during a deployment or when we're expecting trouble.

We've also taken a couple other steps to improve reliability that I didn't mention in my earlier comment. The biggest one is that we use dedicated servers for creating the videos, with all other tasks (including creating the video description) handled elsewhere. We deploy changes multiple times a day, but we only deploy to the video creation servers every few weeks, when there's a change that directly affects them. That separation, combined with the fact that our service is for recreation rather than, say, open-heart surgeries or high-speed trading, lets us be pretty lax about what happens during a deployment. :-P We also do some content caching with Cloudfront, but I don't think that really affects the issues we've been discussing.

I didn't know it was possible to mount a failed EBS server's storage to another server! I always assumed that a server was a physical machine with its CPU and storage in the same box. I don't think we'll actually do this, but I'd still like to learn more. Can I read about it somewhere?

George Koniaris (Author)

Hi again!

I think that EBS storage can be mounted to another instance if it's first detached from the instance it is currently mounted to. I don't know what happens in the case that the server goes down, but if the failure is not in the EBS volume itself, I think it can be mounted to a new one. Also, you don't have to generate the video on every replica; you can just use scp or rsync to copy the file to the server that needs it. That would double the cost of your EBS storage, but it would greatly reduce the CPU load if you decided to use replicas. I think this is the easiest way to keep replicas of your videos, at the cost of just some extra internal network load (as far as I know, they have great internal networks, so that wouldn't be a problem).

This is the first article that I bumped into, explaining how to mount EBS storage to a server.

devopscube.com/mount-ebs-volume-ec...

In the hypothetical case that you decide to mount the EBS storage to another server, you can also create a clone of the EBS itself, so you can mount the cloned version to the replica server. By doing this, when your master server goes up, the original EBS storage will still be mounted to the master server. Unfortunately, EBS is still priced even if the EBS volume is not mounted to an instance, so you may have to remove the cloned EBS after the problem has been resolved.

Personally, if I had to perform replication I would start by using rsync or scp as they are the easiest way and they don't require any extra devops.

John Mercier

Interesting. I just read this yesterday which mentions these problems. The reasoning behind sticky sessions is explained here:

We noted the example code above is naïve as it doesn’t deal with the fact the session cache is not thread safe. Even this simple example is subject to race conditions that may result in the page count being incorrect. The solution to this problem is to use the traditional thread locks and synchronization features available in Java, but these features are only valid within a single JVM. This means we must ensure that client requests are always directed to a single Tomcat instance, which in turn means that only one Tomcat instance contains the single, authoritative copy of the session state, which can then ensure consistency via thread locks and synchronization. This is achieved with sticky sessions.

I think a lot of these decisions rely on how the application is written and how much performance is needed for a single session. When scaling there is a whole new level of race conditions to consider which could result in rewriting the entire application. I think sticky sessions can be a good balance to rewriting applications as long as a single session will not require scaling past one instance.

George Koniaris (Author)

Hey John,

Thanks for commenting. I would not rewrite a piece of my application if it was already relying on the server's internal state, but I would make sure that I have some kind of rate limit on these endpoints to avoid server overloading. Now, for race conditions, you can use session storage like Redis and perform atomic operations on the values of the session. Have you ever found any case where it's not possible to use an async service, like Redis, for this job?
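To make the atomic-operation idea concrete, here is a minimal sketch of the page-count example from the quote. A lock-guarded dict stands in for Redis; with Redis itself, the INCR command gives you the same atomic read-modify-write guarantee across separate servers:

```python
import threading

# Sketch: the page-count race disappears when the counter lives in a store
# with atomic increments. A lock-guarded dict stands in for Redis here.
_lock = threading.Lock()
store = {}

def incr(key):
    with _lock:  # atomic read-modify-write, like Redis INCR
        store[key] = store.get(key, 0) + 1
        return store[key]

# Eight concurrent "servers" bump the same page counter; no updates are lost.
workers = [
    threading.Thread(target=lambda: [incr("page_count") for _ in range(1000)])
    for _ in range(8)
]
for w in workers:
    w.start()
for w in workers:
    w.join()
assert store["page_count"] == 8000
```

Without the lock (or an atomic store), concurrent read-then-write updates would lose increments, which is exactly the race the quoted passage uses to justify sticky sessions.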

John Mercier

No I think something like that would make sense.

Daniel Barwikowski

Now you have just moved the bottleneck to the "sharable service".
If it goes down, it's not just one server that has problems, but all of them.

JWT solves one part of this issue and at the same time creates another: how do you invalidate a token?

Do you keep a whitelist of tokens? A blacklist?

Cheers,
Daniel

George Koniaris (Author)

Hi Daniel,

Thanks for commenting!! Sharable services can be distributed, like Redis, but I could not show this in the diagrams. What I am trying to explain is that it's not a good idea to keep your application's state locally. I think what you are referring to is not sticky sessions but sessions in general, and what you say is correct. Sticky sessions can, at least in theory, be implemented with JSON web tokens too. Am I missing something?

davidhopkins

Perhaps JWT is getting slightly off-topic but Daniel raises a good point that you are really just kicking the problem down the road.

Yes you can make the shared service distributed but you are then literally back to where you started as your state is shared across multiple boxes.

And Yes of course mature products like Redis, and (distributed) MySQL go to extreme lengths to address your original concerns with things like replication/redundancy but it doesn't change the fact that they're essentially performing the same work as the sticky sessions model you started with.

It's a bit like saying don't store your valuables in a vault in the attic; store them in a safety deposit box at the bank instead. This advice is valid, but the bank is basically just using a bigger vault with lots more people looking after it. Also, it now takes you slightly longer to fetch your valuables when you want them, and you probably have to pay a charge.

George Koniaris (Author)

Hey David,

I see it more like storing a copy of your valuables in multiple vaults (if that were possible in the real world :P), giving you the ability to retrieve them from any of these vaults in case one of them catches fire. Also, if the service where you keep your application's state (for example, your database) goes down, you are in big trouble either way. Of course, this risk is reduced with distributed services. There is no right or wrong way to do something; every approach has its own advantages and disadvantages. Based on my personal experience, it's much easier for a single server to go down than your whole database or cache cluster (on the assumption that if one of your instances goes down, there is at least one instance that can take over as master).