This post was originally published on my blog.
Sticky sessions grant the ability to route incoming requests to a specific server, based on their ...
How do we handle deployments? Say I have 4 servers, each serving frontend assets with hash-prefixed filenames. While deploying one server at a time, say p4 (I cannot upgrade all in one shot because of downtime), requests for the new assets (a.js, b.js, c.js) are also sent to the other servers (p1, p2, p3) that are not yet upgraded. This results in 404s. What is the right way to solve this?
Hey Jebin,
First of all, it's possible to upgrade all servers at once by building your frontend assets in a new folder and then switching your application to serve the new version on all servers at the same time. You can also solve this problem by using a CDN. If the changes to your frontend are incremental (mainly, not removing files), it shouldn't cause a problem. The final solution would be to use canary deployments. This would require some kind of "stickiness", but you can limit it to frontend URLs only, and it's something you usually do for a limited amount of time, until you're sure that your deployment doesn't break anything.
Does this solution seem correct for your use case?
Thanks, Mr. Koniaris. Upgrading all servers at once will incur downtime, don't you think? I run Docker containers, and if I upgrade every server with the new container at the same time, there will be a momentary downtime, which is undesirable. A CDN is a viable solution, but I wanted to see if we can solve this without going there. Also, the changes are not incremental, because webpack builds a whole new set of assets with new hash-prefixed filenames.
Do you recommend having stickiness only for assets when we don't use a CDN?
Hi again,
If you deploy the new Docker containers alongside the old ones, keep the new ones idle, and then switch your load balancer over to the new containers, the downtime will be close to zero, if not exactly zero. Keep in mind that this requires extra work to set up correctly, but it also gives you the ability to easily roll back your frontend changes if something goes badly wrong, by simply switching back to the old containers.
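As a rough sketch of that switch (assuming nginx as the load balancer, with the old containers published on ports 8001-8004 and the new ones on 9001-9004; the file path and ports are my own assumptions, not something from your setup):

```python
# Hypothetical blue/green switch: point nginx at the new containers and reload.
# Assumes the main nginx config includes /etc/nginx/conf.d/app_upstream.conf
# and that the ports below are where the old/new containers are published.
import subprocess

UPSTREAM_FILE = "/etc/nginx/conf.d/app_upstream.conf"

def switch_upstream(ports):
    servers = "\n".join(f"    server 127.0.0.1:{p};" for p in ports)
    with open(UPSTREAM_FILE, "w") as f:
        f.write(f"upstream app_backend {{\n{servers}\n}}\n")
    subprocess.run(["nginx", "-t"], check=True)             # validate the new config first
    subprocess.run(["nginx", "-s", "reload"], check=True)   # graceful reload, no dropped requests

switch_upstream([9001, 9002, 9003, 9004])    # cut over to the new containers
# To roll back: switch_upstream([8001, 8002, 8003, 8004])
```

In-flight requests finish on the old containers while new requests go to the new ones, which is why the cutover itself causes no downtime.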
Now, if the changes are not incremental and each version contains major changes to the JavaScript or the UI of the site, for example, you will have to have some kind of "stickiness". However, you don't have to limit the new version to only one server: if you have 4 servers, you can balance the load of each version across 2 and 2 of them. So yes, it would require stickiness, but you can avoid binding a user to only one server.
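To make the "stick to a version, not to a server" idea concrete, here is a tiny illustration (the server names are made up): the client's version is derived from a stable key such as a user id or a cookie value, but the server within that version's pool is not fixed:

```python
# Illustrative only: pin a client to a *version* (old/new), then balance
# across the servers running that version. Server names are placeholders.
import hashlib
import random

POOLS = {"old": ["p1", "p2"], "new": ["p3", "p4"]}

def pick_server(client_key: str, canary_percent: int = 50) -> str:
    # The version is sticky per client: a stable hash of a cookie or user id...
    digest = int(hashlib.sha256(client_key.encode()).hexdigest(), 16)
    version = "new" if digest % 100 < canary_percent else "old"
    # ...but within that version's pool the client is not bound to one server.
    return random.choice(POOLS[version])

print(pick_server("user-42"))  # always the same version for this user, any server in its pool
```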
Load balancing is a fair point. Good one. Thanks again Mr. Koniaris
The article is a good basic intro, thanks -- certainly in the context of how to design a new app.
But a really LARGE app might need a really LARGE session, in which case keeping it in local memory and using sticky sessions can actually make sense. Sessions can still get stored to an external service (e.g. Redis) for failover if a server goes down. So "sticky" is not inherently bad, but one should definitely be aware of the pros and cons.
Hi Charles,
That's exactly what I am trying to explain in the article. There is always some "stickiness" when working with multiple server instances. The most common is deciding which server to use based on the client's location. The problem occurs when you treat server instances as the only place where session data is persisted. If you want to stick a request to a specific server for speed, that's OK, but you have to make sure that your application keeps working as expected if that server goes down for some reason. Maybe I should have picked a title that also mentions "application state".
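To make that concrete, here's a rough sketch of what Charles describes, local memory for speed with Redis as the fallback source of truth (the key names and TTL are my own assumptions):

```python
# Sketch: keep the session in local memory for speed, but write it through
# to Redis so any other instance can rebuild it if this server goes down.
import json
from typing import Optional

import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)
local_sessions = {}          # in-process cache, lost on restart
SESSION_TTL = 3600           # seconds

def save_session(session_id: str, data: dict) -> None:
    local_sessions[session_id] = data
    r.setex(f"session:{session_id}", SESSION_TTL, json.dumps(data))

def load_session(session_id: str) -> Optional[dict]:
    if session_id in local_sessions:          # fast path: request stuck to this server
        return local_sessions[session_id]
    raw = r.get(f"session:{session_id}")      # fallback: another server, or after a restart
    if raw is None:
        return None
    data = json.loads(raw)
    local_sessions[session_id] = data         # warm the local cache again
    return data
```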
Fair enough, and point taken (and made in the original article).
I was just complaining about the "never" part of the title. :-)
I now see that the word "never" is a little bit absolute. In any case, your comment is well taken, along with every comment that provides constructive criticism. As I am not a professional writer, I try to express my opinions on these subjects as well as I can, and these kinds of comments make me express them in a better way in the next article.
You made me chuckle with "little bit absolute". Rather like "very unique". :-)
It's all good, and the article is very informative for people new to the question -- in fact I'm going to send it to one of my new minions, I mean, interns.
Personally, I don't mind the occasional "always" or "never". Exceptions are normal, but an absolute word does a good job of communicating the point... and it also sparks conversations like this about what the exceptions could be!
Every important idea started out as "interesting but wrong", then got better through conversations.
...Well, maybe not every idea. ;)
Thanks for the article, and congrats for making it into The Overflow newsletter where I found you!
"A really LARGE app might need a really LARGE session"... this got me thinking about my own situation. So here's a case study! Now that I've written it, our "sticky session" system is more like a distributed cache, but it bears some similarities. Thoughts?
I work on an app that serves procedurally generated content. When a user makes a request, we put together a video based on their history and request parameters, then we stream the video to them. Many videos get reused, but we put together a never-before-seen video for nearly half of the requests. We save a description of the video to a database so we can re-create it if needed, but we stream the video directly from the server instance that produces it. The streaming is, I believe, a sticky session.
We've considered copying the video to a shared S3 bucket and streaming it from there, but the initial lag from creating and copying the video would be too long. Next, we experimented with switching the source of the stream: start streaming from the server where the video is being created, but copy the video to S3 and switch to streaming from S3 once the video has been copied there. This wasn't actually better. We would have gotten performance and money gains from putting the video into a shared cache, and from freeing up disk space on the source server once the video was done copying, but these gains were fully offset by the increased CPU load from copying the file, the need for more S3 storage, and the extra network usage and complexity that we would have needed to pull this off.
Our setup suffers from the pitfalls of sticky sessions, but we've mitigated them somewhat.
So, are these true "sticky sessions"? Is the way we've handled the sticky-session problems widely useful, or good for our use case only? (Or could we do better?)
Hey sciepsilon,
Thanks for providing such a detailed description! What you describe here is the same use case as having an open web socket connection. Of course, you are not going to renew the connection on every message just to avoid stickiness. Also, you can't have every video saved on every server. I haven't worked with video streaming, but a nice approach would be to have a master server for each video and at least one other acting as a replica.
Based on your description, you are using AWS. I will make an assumption here, correct me if I am wrong: that you use EBS to store your videos. One thing I would consider is having a mechanism that auto-mounts a failed server's EBS storage (the place where you store videos) to a healthy server and uses that server as the master for those videos until the old one comes back up. This would require a good amount of DevOps work, so it's just a rough idea from someone who doesn't know the internals of your web application.
Now, I don't consider streaming a video to be a true "sticky session". The real problem would occur if you didn't have a fallback mechanism (even one that takes a minute or two to rebuild the videos) in case of a server failure. I would really like to hear your opinion on this, as it's a special case I had never thought of.
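Just to illustrate the kind of fallback I mean (purely a sketch; `load_description` and `render_video` are placeholders for whatever your application already does, and the paths are made up):

```python
# Illustrative fallback: stream the locally generated file if it exists,
# otherwise rebuild it from the description stored in the database. Slower,
# but it survives the loss of the server that originally produced the video.
import os

VIDEO_DIR = "/tmp/videos"  # made-up path

def load_description(video_id: str) -> dict:
    # Placeholder for the database lookup of the saved video description.
    return {"video_id": video_id, "clips": []}

def render_video(description: dict, output_path: str) -> str:
    # Placeholder for the (expensive) procedural video generation step.
    os.makedirs(os.path.dirname(output_path), exist_ok=True)
    with open(output_path, "wb") as f:
        f.write(b"")  # real code would write the rendered video here
    return output_path

def get_video_path(video_id: str) -> str:
    path = os.path.join(VIDEO_DIR, f"{video_id}.mp4")
    if os.path.exists(path):
        return path  # fast path: this instance generated the video itself
    # Fallback: any healthy server can rebuild the video from its description.
    return render_video(load_description(video_id), path)
```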
I think you're right - my video-streaming example isn't really a sticky session. And yes, we're using EBS. :)
The "master and replica" idea is an interesting one. It's similar to copying the videos to S3 as they're created, but I'm assuming the replica would receive a copy of the request and generate its video independently, rather than receiving its video from the master. This would definitely increase reliability: when the master goes down or we redeploy to it, the replica can pick up right where it left off. With the right network architecture, I think we could even make the transition invisible, without the user having to make a new request or open a different socket connection.
Of course there's a cost too. Since we're generating each video twice, we would need double the amount of compute power for creating the videos. I don't think the tradeoff would be worth it for us, but it would be a clear win if we were a less error-tolerant application. For us, perhaps a hybrid solution is possible where we spin up replicas only during a deployment or when we're expecting trouble.
We've also taken a couple other steps to improve reliability that I didn't mention in my earlier comment. The biggest one is that we use dedicated servers for creating the videos, with all other tasks (including creating the video description) handled elsewhere. We deploy changes multiple times a day, but we only deploy to the video creation servers every few weeks, when there's a change that directly affects them. That separation, combined with the fact that our service is for recreation rather than, say, open-heart surgeries or high-speed trading, lets us be pretty lax about what happens during a deployment. :-P We also do some content caching with Cloudfront, but I don't think that really affects the issues we've been discussing.
I didn't know it was possible to mount a failed server's EBS storage to another server! I always assumed that a server was a physical machine with its CPU and storage in the same box. I don't think we'll actually do this, but I'd still like to learn more. Can I read about it somewhere?
Hi again!
I think that an EBS volume can be mounted to another instance if it's first detached from the one it's currently attached to. I don't know exactly what happens when the server goes down, but if the failure is not in the EBS volume itself, I think it can be mounted to a new instance. Also, you don't have to generate the video on every replica; you can just use scp or rsync to copy the file to the server that needs it. Of course, that would double the cost of your EBS storage, but it would greatly reduce the CPU load if you decided to use replicas. I think this is the easiest way to keep replicas of your videos, at the price of some extra internal network traffic (as far as I know, AWS internal networks are very good, so that shouldn't be a problem).
This is the first article I bumped into that explains how to mount an EBS volume to a server:
devopscube.com/mount-ebs-volume-ec...
In the hypothetical case that you decide to mount the EBS storage to another server, you can also create a clone of the volume itself and mount the cloned copy to the replica server. That way, when your master server comes back up, the original volume will still be attached to it. Unfortunately, an EBS volume is still billed even when it's not attached to an instance, so you may want to delete the cloned volume after the problem has been resolved.
Personally, if I had to perform replication, I would start with rsync or scp, as they are the easiest option and don't require any extra DevOps work.
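For example, something as simple as the following could push each finished video to a replica right after it's generated (the hostname and paths are made up, and it assumes SSH key authentication is already set up between the servers):

```python
# Illustration: push a finished video to a replica host with rsync.
import subprocess

REPLICA_HOST = "replica-1.internal"  # placeholder hostname
VIDEO_DIR = "/var/videos"            # placeholder path

def replicate_video(filename: str) -> None:
    src = f"{VIDEO_DIR}/{filename}"
    dest = f"{REPLICA_HOST}:{VIDEO_DIR}/"
    # -a preserves file attributes, -z compresses over the wire, --partial
    # lets an interrupted transfer resume instead of starting over.
    subprocess.run(["rsync", "-az", "--partial", src, dest], check=True)

replicate_video("video-12345.mp4")
```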
Thanks!
Interesting. I just read this yesterday which mentions these problems. The reasoning behind sticky sessions is explained here:
I think a lot of these decisions depend on how the application is written and how much performance a single session needs. When scaling out, there is a whole new level of race conditions to consider, which could mean rewriting the entire application. I think sticky sessions can be a good alternative to rewriting an application, as long as a single session will never need to scale past one instance.
Hey John,
Thanks for commenting. I would not rewrite a piece of my application if it already relied on the server's internal state, but I would make sure I have some kind of rate limit on those endpoints to avoid overloading the server. As for race conditions, you can use a session store like Redis and perform atomic operations on the session's values. Have you ever found a case where it's not possible to use a service like Redis for this job?
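As a small illustration of what I mean by atomic operations (the key and field names here are made up): counters and check-and-set updates on a shared session can be done in Redis without a race.

```python
# Illustration: race-free updates on shared session values in Redis.
import redis

r = redis.Redis(decode_responses=True)
session_key = "session:abc123"

# Atomic increment: two servers doing this concurrently can't lose an update.
r.hincrby(session_key, "items_in_cart", 1)

# Check-and-set with optimistic locking: retry if another server changed the
# session between our read and our write.
def apply_discount_once(key: str) -> None:
    with r.pipeline() as pipe:
        while True:
            try:
                pipe.watch(key)
                if pipe.hget(key, "discount_applied") == "1":
                    pipe.unwatch()
                    return                      # someone already applied it
                pipe.multi()
                pipe.hset(key, "discount_applied", "1")
                pipe.execute()
                return
            except redis.WatchError:
                continue                        # session changed underneath us; retry

apply_discount_once(session_key)
```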
No, I think something like that would make sense.
Now you've just moved the bottleneck to the "sharable service".
If it goes down, it's not just one server that has problems, but all of them.
JWT solves one part of this issue, but at the same time it creates another one:
- How do you invalidate a token?
- Do you keep a whitelist of tokens?
- Do you keep a blacklist?
Cheers,
Daniel
Hi Daniel,
Thanks for commenting! Sharable services like Redis can be distributed, but I could not show this in the diagrams. What I am trying to explain is that it's not a good idea to keep your application's state only locally. I think what you are referring to is not sticky sessions but sessions in general, and what you say is correct. Sticky sessions can, at least in theory, be implemented with JSON web tokens too. Am I missing something?
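On the invalidation question: one common approach, just as a sketch (assuming PyJWT and Redis, not something covered in the article), is to give every token a `jti` claim and keep a denylist of revoked ids until those tokens would have expired anyway:

```python
# Sketch: revoke JWTs with a Redis denylist keyed by the token's jti claim.
import time
import uuid

import jwt     # PyJWT
import redis

SECRET = "change-me"  # placeholder signing key
r = redis.Redis(decode_responses=True)

def issue_token(user_id: str, ttl: int = 3600) -> str:
    now = int(time.time())
    payload = {"sub": user_id, "jti": str(uuid.uuid4()), "iat": now, "exp": now + ttl}
    return jwt.encode(payload, SECRET, algorithm="HS256")

def revoke_token(token: str) -> None:
    claims = jwt.decode(token, SECRET, algorithms=["HS256"])
    remaining = max(claims["exp"] - int(time.time()), 1)
    # Only remember the jti for as long as the token could still be valid.
    r.setex(f"revoked:{claims['jti']}", remaining, "1")

def is_valid(token: str) -> bool:
    try:
        claims = jwt.decode(token, SECRET, algorithms=["HS256"])
    except jwt.InvalidTokenError:
        return False
    return not r.exists(f"revoked:{claims['jti']}")
```

The denylist stays small because entries expire together with the tokens they block; a whitelist works too, but then every token check depends on the store being available.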
Perhaps JWT is getting slightly off-topic, but Daniel raises a good point: you are really just kicking the problem down the road.
Yes, you can make the shared service distributed, but you are then literally back to where you started, as your state is shared across multiple boxes.
And yes, of course mature products like Redis and (distributed) MySQL go to extreme lengths to address your original concerns with things like replication and redundancy, but that doesn't change the fact that they're essentially performing the same work as the sticky-sessions model you started with.
It's a bit like saying: don't store your valuables in a vault in the attic, store them in a safety deposit box at the bank instead. The advice is valid, but the bank is basically just using a bigger vault with lots more people looking after it. Also, it now takes you slightly longer to fetch your valuables when you want them, and you probably have to pay a fee.
Hey David,
I see it more like storing a copy of your valuables in multiple vaults (if that were possible in the real world :P), giving you the ability to retrieve them from any of those vaults in case one of them catches fire. Also, if the service that keeps your application's state goes down (for example, your database), you are in big trouble either way; of course, that risk is reduced with distributed services. There is no right or wrong way to do this, every approach has its own advantages and disadvantages. In my personal experience, it's much more common for an application server to go down than for your whole database or cache cluster to fail (assuming that if one of your instances fails, there is at least one other instance that can take over as master).
Excellent post - Thanks!