If you have ever wondered what really happens when we type a URL in our browser and press enter, then you are in luck, because you are about to become a digital surgeon. Join me, let’s dissect the web. Spatula please!
From the foundations of the earth, computers and computing have always been a merry game of numbers. This, as far as I know, is the only thing that is set in stone. Even Thanos cannot change that. So how exactly do we crunch numbers in a way that gives us websites like YouTube and Dev.to?
To understand that, you must first understand that not all computers are the same. If I decide to classify computers based on the type of work that they do, I can say that some of them are SERVERS while others are CLIENTS. A server is a special type of computer that manages resources. A client is a computer that asks a server for access to those resources.
Think of it this way. I work at a large company, and in the office we have only one printer. Whenever I have something to print and go down to the printing room, I will most likely meet someone there. It rarely happens that I am the first one down there. On one particular day, I wanted to print just one page of a document, and when I got there I met someone from advertising who was printing 1000 pages of fliers. Waiting for someone to print 1000 pages before you can print a single page is not a pleasant experience. On another day, I had a squabble with a colleague who claimed she was there before me. The printer easily became a source of conflict in the office. Then the company decided to hire someone to control printer usage. Now whenever I want to print something, I send a request to this printer receptionist, and she handles it and gets my documents to me. If I were a computer, this printer receptionist would be my server and I would be her client. A server manages a resource, just as the printer receptionist manages the office printer.
Great! Now we know what servers are. Each server, like a house, has an address. And this address is a number. All numbers. It looks sort of like this: 184.108.40.206. The people living in that house are the files that you need to access. It could be any type of file, from the video files on YouTube to the pictures on Unsplash. If I want to communicate with a friend via email, I have to know his email address, and if he wants to send a message back to me, he needs to know mine. The same goes for computers. If I want to get pictures from Unsplash, I need to know the address of the Unsplash server, and for Unsplash to reply to me with the pictures I need, the Unsplash server needs to know my computer’s address. Yes, your computer has an address too. This address is known as an IP address, and what I just described is a very basic version of the popular request and response pattern. You are now familiar with a lot of things, well done. But that’s not all. There is a problem with this pattern.
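To see that an address really is just a number, here is a minimal sketch using Python’s standard `ipaddress` module. The function names `address_to_number` and `number_to_address` are my own, made up for illustration. The dotted form is simply a friendlier way of writing one 32-bit number.

```python
import ipaddress


def address_to_number(address):
    """Turn a dotted IPv4 address such as '184.108.40.206' into a single integer."""
    return int(ipaddress.IPv4Address(address))


def number_to_address(number):
    """Turn that integer back into the familiar dotted form."""
    return str(ipaddress.IPv4Address(number))


if __name__ == "__main__":
    sample = "184.108.40.206"  # the sample address used in this article
    as_number = address_to_number(sample)
    print(sample, "is the number", as_number)
    print("and back again:", number_to_address(as_number))
```

Each of the four parts is one byte, which is why none of them can be larger than 255.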
A lot of intelligent computer people found that human beings would rarely remember the addresses of the servers they wanted to use, so they came up with a sort of phonebook for the web. With it, humans can look up, say, the address of Bing or YouTube by name. This phonebook is called the Domain Name System, or DNS. This is the first step. When you type a URL in your browser and press enter, the browser immediately asks DNS what address the domain name in that URL translates to, and then sends a request to that address.
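The phonebook lookup can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `hostname_of`, `resolve` and the `phonebook` dictionary are names I invented, and the address in the usage example is the sample address from this article, not Unsplash’s real one. When a name is not in our little local phonebook, the sketch falls back to `socket.gethostbyname`, which asks the operating system to do a real lookup.

```python
import socket
from urllib.parse import urlsplit


def hostname_of(url):
    """Pull the domain name out of a URL, e.g. 'unsplash.com'."""
    return urlsplit(url).hostname


def resolve(hostname, phonebook=None):
    """Look a name up in a local phonebook first, then fall back to real DNS."""
    if phonebook and hostname in phonebook:
        return phonebook[hostname]
    # Real lookup through the operating system's resolver (needs a network).
    return socket.gethostbyname(hostname)


if __name__ == "__main__":
    # Hypothetical entry: the article's sample address, not a real record.
    phonebook = {"unsplash.com": "184.108.40.206"}
    name = hostname_of("https://unsplash.com/photos")
    print(name, "->", resolve(name, phonebook))
```

Checking a local table before asking the network is roughly what caching does for real browsers: repeat visits skip the lookup.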
If you are still here, congratulations. I feel we are now close enough for me to tell you my life secrets. I might have told a bit of a lie. It’s a very small lie. I may have led you to believe that one server stores the data that the client requests and handles everything involved in sending a response. This is not true. Whenever you interact with the web, you interact with whole systems of computers, and there is a lot of software and hardware engineering magic that happens under the hood. Some of it I will reveal to you now, because we are friends.
This first one may come as a shocker, but stay with me, I promise I won’t lead you astray again. A server can be either software or hardware. When it is hardware, it is a physical computer somewhere, but when it is software, it is a program that helps to “manage and serve” some content. Examples are Apache and Nginx. Another twist to this story is that there are different types of servers. Apache and Nginx are web servers. There are also application servers, database servers, mail servers, and many more. From the name of the server you can already guess what resource it manages. For the scope of this operation, I will only tamper with your knowledge of web servers, application servers and database servers. I will leave your other organs as they are.
Web servers help us manage access to static files. Some websites are static and others are dynamic. Here is the basic distinction between the two: a static site shows you the same thing that anybody else who goes to the site will see. There is no unique feed for you as a user. A dynamic site does the opposite. When you go to a dynamic site, you get content that is user-centric, tailored to just you. Somebody else who goes to that site will not see the same things you see. For example, on Facebook, my timeline is based on the friends I keep. We do not all have the same friends, so we all see different things on our Facebook timelines. The files that are served on a static website are called static files. They do not change from visitor to visitor. A web server handles just that.
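Here is a small, hedged sketch of a web server handing out a static file, using only Python’s standard library (`http.server` and `http.client`). The helper names `start_static_server`, `fetch` and `demo`, and the file `hello.txt`, are assumptions of mine for illustration. It is nothing like a production setup with Apache or Nginx, but it shows the idea: two separate requests for the same file get the same bytes back.

```python
import functools
import http.client
import http.server
import pathlib
import tempfile
import threading


def start_static_server(directory):
    """Serve the files in `directory` over HTTP on a free local port."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=str(directory)
    )
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch(port, path):
    """Play the browser: request `path` and return (status, body)."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    conn.request("GET", path)
    response = conn.getresponse()
    body = response.read()
    conn.close()
    return response.status, body


def demo():
    """Two visitors ask for the same static file and compare what they got."""
    with tempfile.TemporaryDirectory() as folder:
        pathlib.Path(folder, "hello.txt").write_text("same for everyone")
        server = start_static_server(folder)
        port = server.server_address[1]
        try:
            first = fetch(port, "/hello.txt")
            second = fetch(port, "/hello.txt")
        finally:
            server.shutdown()
            server.server_close()
        return first, second


if __name__ == "__main__":
    print(demo())
```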
An application server, on the other hand, helps us apply logic to our user/client requests. For example, if somebody posts something on Facebook, I cannot delete that post because I do not have the permission to. If somehow, by witchcraft, I send a request to Facebook to delete my friend’s post, the application server intercepts the request and checks whether I am the one who made the post in question. It does the necessary extra checks and concludes that it will not allow me to delete my friend’s post. This is the work of application servers. They implement the business logic that makes up dynamic websites. So you can rest assured that YouTube has an application server, since it is a dynamic site. A site can have both a web server and an application server. If the web server receives a request it cannot handle, it typically passes it on to the application server.
A database server, just as the name implies, manages data. This could be user details like usernames, passwords and emails. It could also be data that keeps track of the files a user has uploaded, or even metrics about usage. It could be any type of data. All these servers work together to make websites like YouTube and Unsplash possible.
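The “can I delete this post?” check can be sketched like this. Everything here is hypothetical and invented for illustration: the `Post` class, the function names, and the choice of status codes. It is certainly not how Facebook does it, only the shape of the idea: the rule lives on the server, so witchcraft on the client side does not help.

```python
from dataclasses import dataclass


@dataclass
class Post:
    post_id: int
    author: str
    text: str


def can_delete(post, requesting_user):
    """Business rule: only the author of a post may delete it."""
    return post.author == requesting_user


def handle_delete(posts, post_id, requesting_user):
    """Handle a delete request and return an HTTP-style status code."""
    post = posts.get(post_id)
    if post is None:
        return 404  # no such post
    if not can_delete(post, requesting_user):
        return 403  # forbidden: this is somebody else's post
    del posts[post_id]
    return 204  # deleted, nothing more to send back


if __name__ == "__main__":
    posts = {1: Post(1, "ada", "Hello, web!")}
    print(handle_delete(posts, 1, "bob"))  # bob did not write post 1
    print(handle_delete(posts, 1, "ada"))  # ada did
```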
Now we have entered the second step. When we type a URL and press enter, the request goes to the DNS server and comes back to our browser as an IP address. From our browser, the request then goes out to a web server at that particular address, which may communicate with an application server and even a database before getting a response back to us. If you understand this, you have come a long way, but there is yet another problem to solve.
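The hand-off between the three kinds of servers can be sketched with three plain functions. This is a toy model under loud assumptions: real servers are separate programs, often on separate machines, and the paths, the `STATIC_FILES` and `DATABASE` dictionaries, and the user `ada` are all made up for illustration.

```python
# What the web server owns: files that are the same for everyone.
STATIC_FILES = {"/logo.png": b"<image bytes>"}

# What the database server owns: data about users (hypothetical records).
DATABASE = {"ada": ["sunset.jpg", "forest.jpg"]}


def database_server(username):
    """Return the list of photos stored for a user."""
    return DATABASE.get(username, [])


def application_server(path):
    """Apply logic to a request, consulting the database when needed."""
    prefix = "/photos/"
    if path.startswith(prefix):
        username = path[len(prefix):]
        photos = database_server(username)
        return 200, f"{username} has {len(photos)} photos".encode()
    return 404, b"not found"


def web_server(path):
    """Serve static files directly and pass everything else along."""
    if path in STATIC_FILES:
        return 200, STATIC_FILES[path]
    return application_server(path)


if __name__ == "__main__":
    for path in ("/logo.png", "/photos/ada", "/nope"):
        print(path, "->", web_server(path))
```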
Imagine we have a lot of users, say 10 million. If each of them sends a request to our server, at some point our server will reach its limit and crash. Our service or website will then be seen as unavailable or, even worse, unreliable. In order to prevent this, we replicate our service across many servers, sometimes tens of thousands of them, to handle the load of these requests. We also put another server in front of all these servers, called a load balancer. The work of this load balancer is to direct requests to the servers that it deems the freest. It uses a load balancing algorithm to determine which of our servers are free at a point in time and then forwards the client’s request to one of them. The next step is to secure our pool of servers so that requests can only come in through the load balancer. We do not want anyone directly accessing any of our servers simply because they know the IP address. To improve security further, we encrypt the connection between our servers and our clients using TLS, which many people still call SSL. The server presents a certificate that proves its identity, and the browser and server then agree on keys to encrypt everything they send to each other. This way the communication between our server and client is hard to eavesdrop on, and even if it is intercepted, it is hardly useful to third parties.
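There are several load balancing algorithms. The sketch below uses one common choice, “least connections”, where the freest server is simply the one handling the fewest requests right now. I am not claiming this is what Unsplash or anyone else uses. The `LoadBalancer` class and the server names are hypothetical.

```python
class LoadBalancer:
    """Toy least-connections balancer: send work to the least busy server."""

    def __init__(self, servers):
        # Number of requests each server is currently handling.
        self.active = {name: 0 for name in servers}

    def pick_server(self):
        """Choose the server with the fewest active requests (ties: first listed)."""
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def finish(self, server):
        """Call this when a server has finished handling a request."""
        self.active[server] -= 1


if __name__ == "__main__":
    balancer = LoadBalancer(["server-a", "server-b", "server-c"])
    busy = [balancer.pick_server() for _ in range(3)]
    print("first three requests went to:", busy)
    balancer.finish("server-b")
    print("server-b is free again, next request goes to:", balancer.pick_server())
```

Round robin, which just takes turns, is an even simpler alternative. It ignores how busy each server actually is.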
This is the third and final step as far as this article is concerned. Whenever we send a request to a website like Unsplash, our browser first sends a request to a DNS server. This server responds to our browser with an IP address, and this IP address points to a load balancing server owned by Unsplash. When our request reaches that server, our browser and the server use the site’s SSL/TLS certificate to set up an encrypted connection. The load balancer forwards our request to the freest server it can find in Unsplash’s pool of servers. The web server then reads our request and consults the database or the application server where appropriate in order to generate a response, which is sent back to our browser, and we can then see the Unsplash page. All these processes happen in a split second, and as the years go by it becomes more and more difficult to appreciate the wonder of the internet we have in this age.
What makes the request and response cycle work so smoothly is a family of protocols called TCP/IP. IP takes care of the addressing system and of getting packets to the right address, while TCP makes sure that the packets making up each request and response arrive complete and in the right order. Thank you so much for reading this far, we have come to the end of our operation. Till next time.
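One postscript for the curious: here is a minimal sketch of a single request and response travelling over TCP, with both the server and the client living on your own machine. It uses Python’s standard `socket` module. The function names and the reply text are my own inventions for illustration, and the server reads only one small message, which is fine for a demo but not for real traffic.

```python
import socket
import threading


def serve_once(listener):
    """Accept one client, read its request, and send a response back."""
    conn, _client_address = listener.accept()
    with conn:
        request = conn.recv(1024)  # enough for one small demo message
        conn.sendall(b"response to: " + request)


def request_response(message):
    """Send `message` to a local TCP server and return whatever it replies."""
    # Port 0 asks the operating system for any free port.
    with socket.create_server(("127.0.0.1", 0)) as listener:
        port = listener.getsockname()[1]
        worker = threading.Thread(target=serve_once, args=(listener,))
        worker.start()
        with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
            client.sendall(message)
            chunks = []
            while True:
                chunk = client.recv(1024)
                if not chunk:  # the server closed the connection
                    break
                chunks.append(chunk)
        worker.join()
    return b"".join(chunks)


if __name__ == "__main__":
    print(request_response(b"hello"))
```

The address `127.0.0.1` is the IP part, pointing at your own computer, and the connection that carries the bytes there and back in order is the TCP part.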