This article was originally posted on Breadth, a weekly newsletter aimed at helping you increase your technical (and non-technical) knowledge so you can become a more effective engineer.
TCP / IP - the real MVP of the internet.
Nearly every website/application uses them.
They are literally the fundamental building blocks that power pretty much everything, yet many developers who build on these foundations have no idea how they work.
And you know what? That can be OK - the best part about it being around for so long is that it has proven to be a trustworthy and semi-reliable set of protocols.
You don't have to know in depth how it works, but it is good to have a basic understanding of it.
Let's dive in.
Firstly, let's clear something up.
TCP / IP isn't a single thing - it's two separate protocols. Well, sort of. Really, it's one protocol built on top of the other. IP is the base level protocol. It stands for Internet Protocol.
Aptly put by Cloudflare:
The Internet Protocol (IP) is a protocol, or set of rules, for routing and addressing packets of data so that they can travel across networks and arrive at the correct destination.
In other words, IP is a set of rules that defines how a piece of data (a packet) is sent across the internet.
Now you might be thinking - is that where an "IP Address" comes from? And you'd be right.
IP protocol basically tells computers how to get a packet of data from one IP address to another.
However, it isn't quite that simple.
To illustrate, let's say you wanted to send a letter to your friend Bob.
Now Bob lives on the other side of the country, and your mail system only delivers letters up to 5cm by 5cm.
Unfortunately, the letter you've written is much larger. Hence, you decide to cut up the letter into three parts and mail each piece individually.
When you send your letter, each of the three parts goes with a different mailman.
Mailman 1 - has a few other deliveries to do on the way, so it takes him a bit longer to reach your friend.
Mailman 2 - can take the letter straight there.
Mailman 3 - unfortunately, gets into an accident, so the third part of the letter doesn't make it.
Now Bob has got 2 parts of a 3-part letter. Even worse, Bob isn't the brightest of the lot and can't figure out how to put pieces 1 and 2 back together in the right order. Since part 2 arrived before part 1, he reads them in that order.
So here, IP is what tells you how to get a letter from A -> B. It says that you must use a mailman, you must give the mailman an address, and you must only send small pieces of data at a time. You then have to trust that they will get there on time.
However, as we have just seen, this isn't always that reliable. In fact, it's turned into a bit of a mess.
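To make the "small pieces, each with an address" rule concrete, here is a minimal Python sketch that cuts a message into fixed-size packets, each carrying a destination address. The `Packet` class, the `packetize` function, the 5-byte `max_size` (standing in for the 5cm by 5cm limit) and the example address are all hypothetical; real IP packets carry a much richer header.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    # Hypothetical, heavily simplified packet: real IP headers carry far more fields.
    destination: str
    payload: bytes


def packetize(message: bytes, destination: str, max_size: int) -> list[Packet]:
    """Cut a message into packets of at most max_size bytes, all addressed the same."""
    return [
        Packet(destination, message[i:i + max_size])
        for i in range(0, len(message), max_size)
    ]


# A 15-byte "letter" split into three 5-byte pieces, like the three mailmen above.
packets = packetize(b"Dear Bob, hello", "203.0.113.7", max_size=5)
for p in packets:
    print(p.destination, p.payload)
```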
To illustrate the above example using actual packets (pieces of a letter), imagine you want to send some data from LA -> Melbourne.
One packet might get routed from LA -> LONDON -> MELBOURNE.
Another might go straight from LA -> MELBOURNE.
And the last packet might get dropped along the way.
Now, calling the three packets A, B and C in the order they were sent, a receiving server that reads the packets as they come in will get B -> A and no C.
Basically, it will get rubbish and have no idea what it means.
Because of this, we call IP an unreliable (or "best-effort") protocol. It will try to get your data from A -> B as best it can, but it makes no guarantees about the order or delivery of those packets.
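The LA -> Melbourne example can be simulated in a few lines of Python. This is a toy model, not how routers actually work: each packet gets a made-up travel time for its route (A via London, B direct), or `None` if it is dropped (C), and a naive receiver simply reads packets in the order they arrive. The `routes` values and the `deliver` function are hypothetical.

```python
from typing import Optional

# Toy model of best-effort delivery: a made-up travel time (in hours) per packet,
# or None if the packet is dropped along the way.
routes: dict[str, Optional[int]] = {
    "A": 20,    # LA -> London -> Melbourne: the long way round
    "B": 15,    # LA -> Melbourne: straight there
    "C": None,  # dropped somewhere along the way
}


def deliver(routes: dict[str, Optional[int]]) -> list[str]:
    """Return packet names in arrival order, silently skipping dropped ones."""
    arrived = [(time, name) for name, time in routes.items() if time is not None]
    return [name for time, name in sorted(arrived)]


received = deliver(routes)
print(" -> ".join(received))  # B -> A
```

Nothing in `deliver` tells the receiver that C ever existed, which is exactly the gap TCP fills.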
So how does the internet work at all? How do we make this unreliable protocol, reliable?
TCP (Transmission Control Protocol) is a protocol built on top of the IP protocol (yes, I am aware that's the same as saying ATM machine; no, I am not changing it).
It attempts to make IP reliable (and does a reasonably solid job of it).
So how does TCP/IP work?
Well, let's go back to the letter example.
So to start with, before we even send the data, TCP has us open a connection. We do this to tell the receiver we are going to send some data.
So in our example, instead of just sending the letter with our message, we first send Bob a letter telling him we are going to send a letter. Bit meta, right? We also ask him to send a confirmation that he got this letter.
This tells us a few things. If Bob sends us a confirmation back:
- We know that his address was correct
- We know his mailbox works and he can receive letters.
- We know he has the basic writing skills of a 5-year-old.
Now if Bob doesn’t reply we know:
- He can’t receive letters or his address was wrong.
- We shouldn’t bother sending him our real letter because he probably won’t receive it.
- Or he just doesn’t want to hear from us.
Once Bob has confirmed, we send the real letter, but this time we number each piece (1 of 3, 2 of 3, 3 of 3). This lets Bob put the pieces back in the same order as they were sent. Now it doesn't matter how slow the letters were or in which order they arrived, Bob can still reconstruct the original letter.
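Once every piece carries a number, reassembling the letter is just a sort. A minimal sketch, assuming each piece is a hypothetical `(number, text)` pair and that all pieces have arrived (missing pieces are handled next):

```python
# Pieces arrive in whatever order the mailmen manage; each carries its number.
arrived = [(2, "are you "), (1, "Hi Bob, how "), (3, "doing?")]


def reassemble(pieces: list[tuple[int, str]]) -> str:
    """Sort numbered pieces by their number and join the text back together."""
    return "".join(text for number, text in sorted(pieces))


print(reassemble(arrived))  # Hi Bob, how are you doing?
```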
This solves one problem. We still have to figure out a way to handle the case of missing letters.
To do this, we also tell Bob that every time he receives a piece, he needs to send us a letter back confirming which numbered piece he received.
Now, whenever you send a letter, you start a timer. You know it will take a maximum of 3 days to send a letter to Bob and for him to send one back.
So if after three days you haven't received a letter from Bob confirming he got it, you resend that part of the letter.
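The resend-after-a-timeout rule can be sketched as a small simulation. This is a toy model rather than real TCP: parts are handled one at a time (a "stop-and-wait" simplification; real TCP keeps many segments in flight), time is counted in whole days, the 3-day timeout comes from the story, and the hypothetical `lossy_send` function loses the first attempt at part 3 to mirror the unlucky third mailman.

```python
TIMEOUT_DAYS = 3  # from the story: a round trip to Bob takes at most 3 days


def lossy_send(part: int, attempt: int) -> bool:
    """Hypothetical mail system: only the first attempt at part 3 is lost."""
    return not (part == 3 and attempt == 1)


def send_letter(parts: list[int]) -> tuple[dict[int, int], int]:
    """Send every part, resending any part that is not confirmed in time.

    Returns how many attempts each part needed and the total days
    spent waiting on timeouts.
    """
    attempts: dict[int, int] = {}
    days_waited = 0
    for part in parts:
        attempt = 1
        # Bob only confirms parts he actually receives. No confirmation within
        # TIMEOUT_DAYS means we assume the part was lost and send it again.
        while not lossy_send(part, attempt):
            days_waited += TIMEOUT_DAYS
            attempt += 1
        attempts[part] = attempt
    return attempts, days_waited


attempts, days_waited = send_letter([1, 2, 3])
print(attempts)     # {1: 1, 2: 1, 3: 2}
print(days_waited)  # 3
```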
That's the general gist. Obviously, there is much more going on internally, but for now, you and Bob have a working system for delivering messages.
The first part of any TCP connection is the handshake.
With a TCP connection, this handshake is made up of three distinct parts:
- Syn (short for synchronize)
- Syn-Ack (synchronize-acknowledge)
- Ack (short for acknowledgement)
Because it has three parts, this is often referred to as a three-way handshake.
Essentially, the handshake synchronizes sequence numbers (syn) and exchanges acknowledgments (ack). This lets each side know how to arrange the packets, and whether a packet is missing.
For example, say you open a connection to a server and tell it that your sequence number starts at 123. When it acknowledges with 124, you know the server has received everything up to, but not including, 124, and that 124 is the next number it expects.
The server will also send you its own sequence number, for example, 432. To let the server know you have received everything up to 432, you send an ack with 433.
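The numbers in this example can be traced with a short Python sketch. The `Segment` class is a hypothetical, stripped-down stand-in for a TCP header with only the flags and the two numbers; the starting values 123 and 432 come from the example (real implementations choose their initial sequence numbers unpredictably). Each side acknowledges the other's number plus one because the SYN itself uses up one sequence number.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    # Hypothetical, stripped-down TCP header: just the flags and the two numbers.
    flags: str
    seq: int
    ack: Optional[int] = None


def three_way_handshake(client_isn: int, server_isn: int) -> list[Segment]:
    """Return the three segments exchanged when a connection opens."""
    syn = Segment("SYN", seq=client_isn)
    # The server acknowledges the client's number + 1 and sends its own number.
    syn_ack = Segment("SYN-ACK", seq=server_isn, ack=syn.seq + 1)
    # The client acknowledges the server's number + 1.
    final_ack = Segment("ACK", seq=syn_ack.ack, ack=syn_ack.seq + 1)
    return [syn, syn_ack, final_ack]


for segment in three_way_handshake(client_isn=123, server_isn=432):
    print(segment)
# Segment(flags='SYN', seq=123, ack=None)
# Segment(flags='SYN-ACK', seq=432, ack=124)
# Segment(flags='ACK', seq=124, ack=433)
```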
Now that everyone is on the same page, the transmission of data can begin.
Now the client and server can communicate, using seq numbers and acknowledgments, to transmit packets reliably.
The sequence numbers aren't actually incremented by 1 each time, however; instead, they are incremented by the number of bytes being sent. This adds another level of reliability. Now not only do you know the order of the packets being received, but you can also tell if any bytes are missing.
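Here is a rough sketch of how counting bytes, rather than packets, exposes a gap. Each hypothetical segment is a `(seq, data)` pair where `seq` is the number of its first byte; the receiver tracks the next byte it expects and flags any segment that starts later than that. The `find_gaps` helper is our own illustration and ignores overlapping or duplicate data, which real TCP must also handle.

```python
def find_gaps(segments: list[tuple[int, bytes]], first_seq: int) -> list[tuple[int, int]]:
    """Return (start, end) byte ranges that never arrived, with end exclusive.

    Assumes segments do not overlap (a simplification).
    """
    gaps = []
    expected = first_seq  # the next byte number we expect to see
    for seq, data in sorted(segments):
        if seq > expected:
            gaps.append((expected, seq))
        expected = max(expected, seq + len(data))
    return gaps


# Three 100-byte segments starting at byte 124; the middle one was lost.
received = [(124, b"x" * 100), (324, b"z" * 100)]
print(find_gaps(received, first_seq=124))  # [(224, 324)]
```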
Of course, there are still going to be many issues along the way. Still, there is a reasonably robust framework in place for dealing with and mitigating a lot of these errors.
Most likely, you are already using TCP/IP.
TCP is best used where reliability is needed: where any missing piece of data would have a negative effect on the client's experience.
For example, if you missed 10% of packets when loading a webpage - then chances are you are going to receive a hot mess, and the page will fail to render.
However, there are cases where reliability and error checking aren't really vital. In fact, they can be detrimental.
Gaming and live video (such as video calls) are two areas where TCP isn't actually the best choice. Instead, UDP is a much better option because UDP puts more focus on speed than on reliability. Ensuring every single packet in a video call is received is nowhere near as important as making sure the participants can see and hear each other.
Does it really matter if 5 pixels in the corner are missing for 1 frame? Will anyone really notice? Most likely not. Here speed is king, and a few dropped packets aren't going to be a big deal.
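If you want to see the difference from code, Python's standard `socket` module exposes both protocols. This is a minimal loopback sketch, assuming your machine lets you open sockets on 127.0.0.1; the function names are our own. With TCP (`SOCK_STREAM`) the operating system performs the three-way handshake inside `connect`/`accept` before any data flows, while UDP (`SOCK_DGRAM`) just fires off a datagram with no connection and no delivery guarantee.

```python
import socket
import threading


def tcp_round_trip(message: bytes) -> bytes:
    """Send a message over a loopback TCP connection and return what arrives."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        server.listen(1)
        result = []

        def serve() -> None:
            conn, _ = server.accept()  # returns once the handshake has completed
            with conn:
                chunks = []
                while chunk := conn.recv(1024):  # empty bytes: the client closed
                    chunks.append(chunk)
                result.append(b"".join(chunks))

        thread = threading.Thread(target=serve, daemon=True)
        thread.start()
        # create_connection performs the SYN / SYN-ACK / ACK handshake for us.
        with socket.create_connection(server.getsockname(), timeout=5) as client:
            client.sendall(message)
        thread.join(timeout=5)
        return result[0]


def udp_send(message: bytes) -> bytes:
    """Send a single datagram over loopback UDP: no connection, no handshake."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind(("127.0.0.1", 0))
        receiver.settimeout(5)  # a lost datagram would surface here as a timeout error
        sender.sendto(message, receiver.getsockname())
        data, _ = receiver.recvfrom(1024)
        return data


print(tcp_round_trip(b"hello over TCP"))
print(udp_send(b"hello over UDP"))
```

On loopback both messages will almost certainly arrive; the difference is that only the TCP path would notice and recover if one didn't.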
Hopefully, you now know a bit more about TCP than before you started reading. If you want to dive deeper and not only increase your breadth but also your depth, then check out the below resources 👇
This week's puzzle is based around the Collatz conjecture.
The Collatz conjecture is quite simple. It states that:
Given a number (n), keep applying these rules:
If (n) === 1 👉 stop, you've finished
If (n) is even 👉 the next number is n / 2
If (n) is odd 👉 the next number is 3n + 1
Follow these rules and you will always end up at 1 (well, maybe not always: nobody has proven it, but nobody has found a counterexample either).
The path from n -> 1 is called the sequence. For example:
n = 20
20 -> 10 -> 5 -> 16 -> 8 -> 4 -> 2 -> 1
So if n = 20, the sequence is 8 numbers long.
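The rules above translate almost directly into code. A minimal Python sketch (the `collatz_sequence` name is our own) that builds the sequence for a given n, counting both n and the final 1 as in the example:

```python
def collatz_sequence(n: int) -> list[int]:
    """Apply the Collatz rules from n until reaching 1, keeping every number."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    sequence = [n]
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        sequence.append(n)
    return sequence


print(collatz_sequence(20))       # [20, 10, 5, 16, 8, 4, 2, 1]
print(len(collatz_sequence(20)))  # 8
```

Using integer division (`//`) keeps every value an exact integer, which matters because the numbers can climb quite high before falling back to 1.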
Your challenge is to find the longest sequence where n < 1,000,000.
Please let me know if you come up with a solution by responding to this tweet:
Breadth (@getbreadth): "Our first article is out 💯 This one explains what TCP/IP is and why we use it to power well... nearly everything.
Check it out here 👇
We also have first puzzle at the bottom - this ones to do with the Collatz Conjecture #100DaysOfCode #programming" (25 May 2020)
Alternatively, if you liked this article and want to show some love, the tweet above is the place to do it ☝️
If you want to subscribe to Breadth and get posts like this one straight to your inbox, click here.