ZeeshanAli-0704

Posted on Aug 26, 2023 • Updated on Sep 25, 2023

HTTP Cookie in browser

#javascript

An HTTP cookie is a small piece of data stored in a browser, created either by client-side JavaScript or by a server in response to an HTTP request. The browser can then send that cookie back with requests to the same server and/or let the client-side JavaScript of the webpage access the cookie when a user revisits the page.

A cookie is just one or more pieces of information stored as text strings on your machine. A web server sends the cookie to your browser in an HTTP response. The browser then returns the cookie to the server the next time it requests a page from that server.

Specific cookies are used to identify specific users and improve their web browsing experience. The data stored in a cookie is created by the server upon your connection. This data is labelled with an ID unique to you and your computer. When the cookie is exchanged between your computer and the network server, the server reads the ID and knows what information to specifically serve you.

At client side on Front end:

We can access cookies via the document property cookie.

document.cookie="country=India"

By default, a cookie lasts only for the current browsing session: it is kept while the session is active and expires when the session ends. Such cookies are known as session cookies.

But we can also set the expiry time for cookies using expires or max-age (known as persistent cookies).


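// persistent cookies

// expires: an absolute date, here 24 hours (86400 seconds) from now
var expires = new Date(Date.now() + 86400 * 1000).toUTCString();

document.cookie = "cookieName=cookieValue; expires=" + expires + "; path=/";

// max-age: a lifetime in seconds, relative to now
document.cookie = "cookieName=cookieValue; max-age=86400; path=/";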

Note: A cookie with a negative or zero age expires immediately.
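As a minimal sketch of the two expiry options above, the helper below assembles the string you would assign to document.cookie. The name buildCookie and its options object are made up for this illustration; they are not part of any browser API.

```javascript
// Hypothetical helper (not a browser API): builds the string that
// you would assign to document.cookie.
function buildCookie(name, value, options = {}) {
  // Encode name and value so characters like ";" or "=" cannot break the string.
  let cookie = encodeURIComponent(name) + "=" + encodeURIComponent(value);

  if (options.maxAge !== undefined) {
    cookie += "; max-age=" + options.maxAge; // lifetime in seconds
  }
  if (options.expires instanceof Date) {
    cookie += "; expires=" + options.expires.toUTCString(); // absolute date
  }
  cookie += "; path=" + (options.path || "/");
  return cookie;
}

// In a browser: document.cookie = buildCookie("country", "India", { maxAge: 86400 });
const oneDay = buildCookie("country", "India", { maxAge: 86400 });
const removed = buildCookie("country", "", { maxAge: 0 }); // expires immediately

console.log(oneDay); // country=India; max-age=86400; path=/
```

Setting max-age to 0, as in the second call, is the usual way to delete a cookie from JavaScript.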

At the server side :

The server can also read, set, and modify cookies. It does so with the Set-Cookie response header and the Cookie request header, both explained below.

When a server responds to a client's HTTP request, it returns not only the requested resource but also HTTP headers describing how the server handled that request. One such header is the Set-Cookie response header, which instructs the user agent to store a cookie.

The response may contain as many Set-Cookie headers as needed, one for each cookie that should be set. Below is a sample HTTP response message instructing the client to set three cookies:

HTTP/2 200

Content-Type: text/html; charset=UTF-8

Set-Cookie: nameA=valueA; Domain=example.com; HttpOnly

Set-Cookie: nameB=valueB; Domain=example.com; Max-Age=10

Set-Cookie: nameC=valueC; Domain=example.com; Secure

The server can also read the cookies that the client sends with each HTTP request in the Cookie request header, for example with a Node.js-style request object:

var cookies = request.headers.cookie;

console.log(cookies);
// will give you cookies passed from client side to server 
// side in the request header.
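The Cookie request header arrives as one string of name=value pairs separated by semicolons, so the server usually splits it before use. Here is a rough sketch of that step; parseCookieHeader is a name invented for this example, and real frameworks ship their own parsers.

```javascript
// Hypothetical helper: turns a Cookie request header such as
// "country=India; theme=dark" into a plain object.
function parseCookieHeader(header) {
  const cookies = {};
  if (!header) return cookies; // no Cookie header was sent

  for (const pair of header.split(";")) {
    const index = pair.indexOf("=");
    if (index === -1) continue; // skip malformed pairs without "="

    const name = pair.slice(0, index).trim();
    const value = pair.slice(index + 1).trim();
    try {
      cookies[name] = decodeURIComponent(value);
    } catch {
      cookies[name] = value; // keep the raw value if it is not valid URI encoding
    }
  }
  return cookies;
}

const parsed = parseCookieHeader("country=India; theme=dark");
console.log(parsed.country); // India
```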

Cookie attributes

Domain attribute

Tells a browser which hosts are allowed to receive a cookie. If unspecified, it defaults to the host that set the cookie, excluding its subdomains. If a domain is specified, its subdomains are included.

So, when accessing a cookie using client-side JavaScript, only the cookies that have the same domain as the one in the URL bar are accessible.
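To make the Domain rule concrete, here is a simplified sketch of the check a browser performs when a cookie carries an explicit Domain attribute. The function name domainMatches is an assumption for illustration, and the sketch deliberately ignores IP addresses and public suffixes, which real browsers handle separately.

```javascript
// Hypothetical, simplified domain check for a cookie with an explicit
// Domain attribute. Does not handle IP addresses or public suffixes.
function domainMatches(host, cookieDomain) {
  const h = host.toLowerCase();
  const d = cookieDomain.toLowerCase().replace(/^\./, ""); // a leading dot is ignored

  if (h === d) return true; // exact host match
  return h.endsWith("." + d); // subdomain match at a "." boundary
}

console.log(domainMatches("www.example.com", "example.com")); // true
console.log(domainMatches("badexample.com", "example.com")); // false
```

Note the boundary check: badexample.com ends with the text example.com, but it is a different site, so it must not match.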

Path attribute

The Path attribute specifies the path in the request URL that must be present to access the cookie.
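The Path check is a prefix match that only counts at a "/" boundary. A small sketch of that comparison follows; pathMatches is a made-up name, and the logic is a simplified reading of how browsers compare the request path with the cookie's Path.

```javascript
// Hypothetical helper: does a request path fall under a cookie's Path?
function pathMatches(requestPath, cookiePath) {
  if (requestPath === cookiePath) return true; // identical paths
  if (!requestPath.startsWith(cookiePath)) return false;

  // A prefix only counts when it ends at a "/" boundary.
  return cookiePath.endsWith("/") || requestPath[cookiePath.length] === "/";
}

console.log(pathMatches("/docs/web", "/docs")); // true
console.log(pathMatches("/docsets", "/docs")); // false
```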

Expires attribute

The Expires attribute sets an expiration date when cookies are destroyed. This can come in handy when you are using a cookie to check if the user saw an interstitial ad; you can set the cookie to expire in a month so the ad can show again after a month.

Secure attribute

A cookie with the Secure attribute only sends data to the server over the secure HTTPS protocol and never over the HTTP protocol (except on localhost).

HttpOnly attribute

This attribute, as the name suggests, makes a cookie accessible only to the server. Only the server can set such a cookie, via the Set-Cookie response header. The browser still sends the cookie to the server in the headers of every subsequent request, but client-side JavaScript cannot read it through document.cookie.

This can partially help secure cookies with sensitive information.

Because JavaScript cannot read a cookie set with the HttpOnly flag, an injected script cannot steal it to hijack a session. The flag should be set for any surrogate identifier, including session cookies.

SameSite attribute

Used to control whether a cookie should be sent in cross-site requests (e.g., if Site B sends a request to Site A). It takes one of three values: Strict, Lax, or None.
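Putting the attributes together, a server-side helper that assembles a Set-Cookie value might look roughly like this. The name serializeSetCookie and the shape of its attrs object are assumptions for this sketch, not a real library API.

```javascript
// Hypothetical helper: builds the value of one Set-Cookie response header.
function serializeSetCookie(name, value, attrs = {}) {
  const parts = [name + "=" + value];

  if (attrs.domain) parts.push("Domain=" + attrs.domain);
  if (attrs.path) parts.push("Path=" + attrs.path);
  if (attrs.maxAge !== undefined) parts.push("Max-Age=" + attrs.maxAge);
  if (attrs.secure) parts.push("Secure"); // HTTPS only
  if (attrs.httpOnly) parts.push("HttpOnly"); // hidden from document.cookie
  if (attrs.sameSite) parts.push("SameSite=" + attrs.sameSite);

  return parts.join("; ");
}

const sessionCookie = serializeSetCookie("sid", "abc123", {
  path: "/",
  secure: true,
  httpOnly: true,
  sameSite: "Lax",
});

console.log(sessionCookie); // sid=abc123; Path=/; Secure; HttpOnly; SameSite=Lax
```

With Node's built-in http module, the resulting string would be passed to response.setHeader("Set-Cookie", ...), using an array when several cookies are set at once.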

Web Storage APIs

Also, with the modern web, we got the new Web Storage APIs (localStorage and sessionStorage) for client-side storage, which allow browsers to store client-side data in the form of key-value pairs.

Caching Locations
A web cache can be shared or private depending upon the location where it exists. Here is the list of different caching locations:

Browser Cache
Proxy Cache
Reverse Proxy Cache

1) Browser Cache

You might have noticed that when you click the back button in your browser it takes less time to load the page than the time that it took during the first load; this is the browser cache in play. Browser cache is the most common location for caching and browsers usually reserve some space for it.

A browser cache is limited to just one user and unlike other caches, it can store the “private” responses. More on it later.

2) Proxy Cache
Unlike a browser cache, which serves a single user, a proxy cache may serve hundreds of different users accessing the same content. Proxy caches are usually implemented at a broader level, for example by ISPs or other independent entities.

3) Reverse Proxy Cache
A reverse proxy cache, or surrogate cache, is implemented close to the origin servers by the server administrators in order to reduce the load on those servers. This is unlike proxy caches, which are implemented by ISPs and similar entities to reduce bandwidth usage in a network.

Although you can control the reverse proxy caches (since they are implemented by you on your server), you cannot avoid or control browser and proxy caches. And if your website is not configured to use these caches properly, its content will still be cached using whatever defaults are set on these caches.

Caching Headers

So, how do we control the web cache? Whenever the server emits a response, it is accompanied by HTTP headers that guide the caches on whether and how to cache this response. The content provider is the one that has to make sure to return proper HTTP headers to instruct the caches on how to cache the content.

Introduction

Expires
Pragma
Cache-Control
private
public
no-store
no-cache
max-age: seconds
s-maxage: seconds
must-revalidate
proxy-revalidate
Mixing Values
Validators
ETag
Last-Modified

Expires

Before HTTP/1.1 and the introduction of Cache-Control, there was the Expires header, which is simply a timestamp telling the caches until when some content should be considered fresh. Its value is an absolute expiry date, which has to be in GMT. Below is a sample header:

Expires: Mon, 13 Mar 2017 12:22:00 GMT

It should be noted that the date should not be more than a year in the future, and if the date format is wrong, the content will be considered stale. Also, the clock on the cache has to be in sync with the clock on the server; otherwise, the desired results might not be achieved.

Although the Expires header is still valid and is widely supported by caches, preference should be given to its HTTP/1.1 successor, Cache-Control.
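The Expires rules above (absolute date, invalid format means stale) can be sketched as a small freshness check. The name isFreshByExpires is invented here, and the sketch relies on Date.parse accepting the HTTP date format, which it does in Node.js and current browsers.

```javascript
// Hypothetical helper: is a cached response still fresh according to
// its Expires header? "now" is a timestamp in milliseconds.
function isFreshByExpires(expiresHeader, now = Date.now()) {
  const expiresAt = Date.parse(expiresHeader);

  // A date that cannot be parsed is treated as already expired.
  if (Number.isNaN(expiresAt)) return false;

  return now < expiresAt;
}

const expiresValue = "Mon, 13 Mar 2017 12:22:00 GMT";
const before = Date.UTC(2017, 2, 13, 12, 0, 0); // 12:00:00 GMT, same day
const after = Date.UTC(2017, 2, 13, 13, 0, 0); // 13:00:00 GMT, same day

console.log(isFreshByExpires(expiresValue, before)); // true
console.log(isFreshByExpires(expiresValue, after)); // false
```

The comparison uses the cache's own clock, which is why clock skew between cache and server can produce surprising results.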

Pragma

Another one from the old, pre-HTTP/1.1 days is Pragma. Everything that it could do is now possible using the Cache-Control header described below. However, one thing I would like to point out is that you might see Pragma: no-cache being used here and there in responses, in hopes of stopping the response from being cached. It might not necessarily work, as the HTTP specification defines it only for request headers and does not specify its behavior in response headers. The Cache-Control header should be used to control caching instead.

Cache-Control

Cache-Control specifies how long and in what manner the content should be cached. This header and its family of directives were introduced in HTTP/1.1 to overcome the limitations of the Expires header.

The value of the Cache-Control header is composite, i.e. it can contain multiple comma-separated directives. Let’s look at the possible values that this header may contain.
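Since the header value is a comma-separated list, it helps to see it split into individual directives. The sketch below does that; parseCacheControl is a hypothetical name, and the parser is intentionally naive (it does not handle commas inside quoted values).

```javascript
// Hypothetical, naive parser: "max-age=3600, public" becomes
// { "max-age": "3600", public: true }.
function parseCacheControl(header) {
  const directives = {};

  for (const part of (header || "").split(",")) {
    const [rawName, rawValue] = part.trim().split("=");
    if (!rawName) continue; // skip empty segments

    const name = rawName.toLowerCase(); // directive names are case-insensitive
    directives[name] =
      rawValue === undefined ? true : rawValue.replace(/^"|"$/g, "");
  }
  return directives;
}

const directives = parseCacheControl("max-age=3600, public, must-revalidate");
console.log(directives["max-age"]); // 3600
console.log(directives.public); // true
```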

private

Setting the cache to private means that the content will not be cached by any of the proxies; it will only be cached by the client (i.e. the browser).

Cache-Control: private

Having said that, don’t let it fool you into thinking that setting this header will make your data any more secure; you still have to use HTTPS (TLS) for that purpose.

public

If set to public, apart from being cached by the client, the content can also be cached by the proxies, which serve many other users.

Cache-Control: public

no-store

no-store specifies that the content is not to be cached by any of the caches.

Cache-Control: no-store

no-cache

no-cache indicates that the content can be stored in a cache, but the cached copy must be revalidated with the server (using ETag, for example) before being served. That is, there is still a request to the server, but only for validation; the content is not downloaded again if the cached copy is still valid.

Cache-Control: max-age=3600, no-cache, public

max-age: seconds

max-age specifies the number of seconds for which the content is considered fresh. For example, if the Cache-Control header looks like below:

Cache-Control: max-age=3600, public

it would mean that the content is publicly cacheable and will be considered stale after 60 minutes.

s-maxage: seconds

In s-maxage, the s- prefix stands for shared. This directive specifically targets shared caches. Like max-age, it takes the number of seconds for which the content is to be cached. If present, it overrides the max-age directive and the Expires header for shared caches.

Cache-Control: s-maxage=3600, public
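The precedence between s-maxage, max-age, and Expires can be sketched as a single function. Everything named here (freshnessLifetime, the lowercase headers object, the isSharedCache flag) is an assumption made for the example; real caches also apply heuristics that this sketch leaves out.

```javascript
// Hypothetical helper: how many seconds is a response fresh for?
// "headers" is a plain object with lowercase header names.
function freshnessLifetime(headers, isSharedCache) {
  const cc = (headers["cache-control"] || "").toLowerCase();
  const sMaxAge = /(?:^|,)\s*s-maxage=(\d+)/.exec(cc);
  const maxAge = /(?:^|,)\s*max-age=(\d+)/.exec(cc);

  // 1. s-maxage wins, but only for shared caches.
  if (isSharedCache && sMaxAge) return Number(sMaxAge[1]);
  // 2. max-age applies to every cache.
  if (maxAge) return Number(maxAge[1]);
  // 3. Fall back to Expires, measured from the response's Date header.
  if (headers.expires && headers.date) {
    const seconds = (Date.parse(headers.expires) - Date.parse(headers.date)) / 1000;
    return Number.isNaN(seconds) ? 0 : Math.max(0, seconds);
  }
  return 0; // nothing says the response is fresh
}

const response = { "cache-control": "max-age=60, s-maxage=600" };
console.log(freshnessLifetime(response, true)); // 600 (proxy)
console.log(freshnessLifetime(response, false)); // 60 (browser)
```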

must-revalidate

Sometimes, if there are network problems and the content cannot be retrieved from the server, a cache may serve stale content without validation. must-revalidate avoids that. If this directive is present, stale content cannot be served in any case, and the data must be revalidated with the server before being served.

Cache-Control: max-age=3600, public, must-revalidate

proxy-revalidate

proxy-revalidate is similar to must-revalidate, but it applies only to shared or proxy caches. In other words, proxy-revalidate is to must-revalidate as s-maxage is to max-age. But why did they not call it s-revalidate? I have no idea; if you have any clue, please leave a comment below.

Mixing Values
You can combine these directives in different ways to achieve different caching behaviors; however, public and private are mutually exclusive, and combining no-store with no-cache is redundant.

If you specify both no-store and no-cache, no-store will be given precedence over no-cache.

; If both are specified
Cache-Control: no-store, no-cache

; the following is what takes effect
Cache-Control: no-store

For private/public: by default, a response to an unauthenticated request is considered public, and a response to an authenticated request (one carrying an Authorization header) is considered private.
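The precedence of no-store over no-cache described above can be expressed as a small decision function. The name cachePolicy and its three return labels are invented for this sketch; they only illustrate the order in which a cache would look at the directives.

```javascript
// Hypothetical helper: what may a cache do with a response, judging
// only by no-store and no-cache?
function cachePolicy(cacheControl) {
  const names = cacheControl
    .toLowerCase()
    .split(",")
    .map((part) => part.trim().split("=")[0]); // keep directive names only

  if (names.includes("no-store")) return "do-not-store"; // strongest, checked first
  if (names.includes("no-cache")) return "store-but-revalidate";
  return "store";
}

console.log(cachePolicy("no-store, no-cache")); // do-not-store
console.log(cachePolicy("no-cache, public")); // store-but-revalidate
console.log(cachePolicy("max-age=3600, public")); // store
```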

Validators

Up until now we have only discussed how the content is cached and how long the cached content is to be considered fresh, but not how the client validates it with the server. Below we discuss the headers used for this purpose.

ETag

ETag, or “entity tag”, was introduced in the HTTP/1.1 specification. An ETag is just a unique identifier that the server attaches to a version of a resource. The client later uses this ETag to make conditional HTTP requests stating "give me this resource if its ETag is not the same as the ETag that I have", and the content is downloaded only if the ETags do not match.

The method by which an ETag is generated is not specified by HTTP; usually some collision-resistant hash function is used to assign an ETag to each version of a resource. There are two types of ETags: strong and weak.

ETag: "j82j8232ha7sdh0q2882" - Strong ETag

ETag: W/"j82j8232ha7sdh0q2882" - Weak ETag (prefixed with W/)

A strong ETag means that two representations of a resource are exactly the same, with no difference between them at all, while a weak ETag means that two representations, although not strictly identical, can be considered equivalent. Weak ETags might be useful for dynamic content, for example.
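Since HTTP leaves ETag generation to the server, here is one possible approach, sketched with Node's built-in crypto module: hash the response body and wrap the result in quotes. The function name makeETag, the choice of SHA-256, and the 32-character truncation are all assumptions for illustration, not what any particular server does.

```javascript
const crypto = require("crypto");

// Hypothetical helper: derives an ETag from the response body.
// Identical bodies give identical tags; any change gives a new tag.
function makeETag(body, weak = false) {
  const hash = crypto
    .createHash("sha256")
    .update(body)
    .digest("hex")
    .slice(0, 32); // shortened only to keep the header compact

  const tag = '"' + hash + '"'; // ETag values are quoted strings
  return weak ? "W/" + tag : tag; // weak tags carry the W/ prefix
}

const strongTag = makeETag("<h1>Hello</h1>");
const weakTag = makeETag("<h1>Hello</h1>", true);
console.log(strongTag === makeETag("<h1>Hello</h1>")); // true
```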

Now you know what ETags are, but how does the browser make this request? It makes a request to the server, sending the ETag it has in the If-None-Match header.

Consider this scenario: you open a web page that loads a logo image with a caching period of 60 seconds and an ETag of abc123xyz. After about 30 minutes you reload the page. The browser notices that the logo, which was fresh for 60 seconds, is now stale; it triggers a request to the server, sending the ETag of the stale logo image in the If-None-Match header:

If-None-Match: "abc123xyz"

The server then compares this ETag with the ETag of the current version of the resource. If the two ETags match, the server sends back a 304 Not Modified response, which tells the client that the copy it has is still good; the copy is then considered fresh for another 60 seconds. If the ETags do not match, the logo has likely changed, and the server sends the new logo, which the client uses to replace the stale one.
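The server side of this exchange can be sketched as follows. The function respondToConditionalGet and the shape of its arguments are hypothetical; the comparison ignores the W/ prefix (a weak comparison) and splits on commas, which is a simplification.

```javascript
// Hypothetical handler: decides between 304 and 200 for a GET request.
// "resource" is assumed to look like { etag: '"abc123xyz"', body: "..." }.
function respondToConditionalGet(requestHeaders, resource) {
  const ifNoneMatch = requestHeaders["if-none-match"];
  const normalize = (tag) => tag.trim().replace(/^W\//, ""); // drop weak prefix

  if (ifNoneMatch) {
    // The header may list several ETags, or "*" to match anything.
    const candidates = ifNoneMatch.split(",").map(normalize);
    if (candidates.includes("*") || candidates.includes(normalize(resource.etag))) {
      // The client's copy is still good: no body is sent.
      return { status: 304, headers: { ETag: resource.etag }, body: null };
    }
  }
  // No match (or no validator): send the full, current content.
  return { status: 200, headers: { ETag: resource.etag }, body: resource.body };
}

const logo = { etag: '"abc123xyz"', body: "logo-bytes" };
const revalidated = respondToConditionalGet({ "if-none-match": '"abc123xyz"' }, logo);
console.log(revalidated.status); // 304
```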

Last-Modified

The server might include the Last-Modified header, indicating the date and time at which some content was last modified.

Last-Modified: Wed, 15 Mar 2017 12:30:26 GMT

When the content gets stale, the client makes a conditional request to the server, sending the last modified date that it has in a header called If-Modified-Since. If the content has not been modified since that date, the server responds with 304 Not Modified and the cached content is considered fresh for another n seconds. If the content has been modified, the server sends the new content along with its new Last-Modified date, and it replaces the content that the client has.

If-Modified-Since: Wed, 15 Mar 2017 12:30:26 GMT
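On the server, the date comparison behind If-Modified-Since could look roughly like this. The name isNotModifiedSince is made up for the sketch; a true result corresponds to answering 304 Not Modified.

```javascript
// Hypothetical helper: true means the server can answer 304 Not Modified.
function isNotModifiedSince(ifModifiedSince, lastModified) {
  const since = Date.parse(ifModifiedSince);
  const modified = Date.parse(lastModified);

  // If either date cannot be parsed, play safe and send the full content.
  if (Number.isNaN(since) || Number.isNaN(modified)) return false;

  return modified <= since; // unchanged since the client's copy
}

const clientDate = "Wed, 15 Mar 2017 12:30:26 GMT";
console.log(isNotModifiedSince(clientDate, "Wed, 15 Mar 2017 12:30:26 GMT")); // true
console.log(isNotModifiedSince(clientDate, "Thu, 16 Mar 2017 08:00:00 GMT")); // false
```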

You might be questioning now, what if the cached content has both a Last-Modified date and an ETag assigned to it? Well, in that case the client sends both If-None-Match and If-Modified-Since. The original HTTP/1.1 specification required both conditions to hold before the server could answer 304 Not Modified. The current specification gives precedence to the ETag: a server that receives If-None-Match ignores If-Modified-Since, so the content is downloaded again only if the ETag does not match.
