Building networks from A to Z - Part 6: Transporting data

#computerscience #beginners #networks

This post is part 6 of the "Building networks from A to Z" series. I strongly recommend reading the five previous parts first if you haven't, because this one relies on concepts they introduce. Thank you!

We are now talking about layer 4 of the OSI model, called Transport. In the TCP/IP model, which is better known among developers, Transport is the 3rd layer, and it matters a lot to programmers, for reasons we will explain later.

This layer gives us the well-known term "port": a number used as a logical address, which lets your system know which program the remote computer wants to talk to.

Two protocols dominate this layer: TCP (Transmission Control Protocol) and UDP (User Datagram Protocol). They differ in important ways, which I will explain later; let's first talk about ports.

Ports are a central notion in every operating system, but they are defined by the network protocols themselves rather than by the OS, so they work the same way whichever OS you use.

Port addressing

Port numbers are assigned by IANA (Internet Assigned Numbers Authority), which also allocates blocks of IP addresses to the regional internet registries, which in turn hand them out to companies around the world. IANA operates under ICANN (Internet Corporation for Assigned Names and Numbers), the organization that coordinates the Internet's worldwide naming and numbering systems. IANA also manages the root zone of the Domain Name System (DNS), which will be introduced in the next part.

Port numbers are split into three ranges, explained below:

Ports 0 to 1023 are the well-known (or system) ports, assigned to specific protocols by IANA. Among them:
- Ports 80 and 443 are reserved for HTTP and HTTPS respectively. They serve the web.
- Port 53 is used by DNS, the name-resolution protocol, to perform queries.
- Ports 25/465 and 143/993 are used for SMTP/SMTPS and IMAP/IMAPS respectively. Both protocols are essential to e-mail.
- Port 21 is used for FTP (File Transfer Protocol) control. For FTP control over SSL/TLS, port 990 is used.
- Port 22 is used by SSH (Secure Shell), a protocol that lets you open a remote shell on a machine running an SSH server.
Ports 1024 to 49151 are user (or registered) ports. IANA also registers them for specific applications, but ordinary programs can use them without special privileges, so conflicts between programs are possible. 8080 is an unofficial but widely used alternative HTTP port, common when developing websites locally, and Minecraft servers use port 25565 by default.
Ports 49152 to 65535 are dynamic (or ephemeral) ports. The OS hands them out automatically when a program opens a connection (a socket), almost always on the client side when connecting to a server. Because they are assigned on the fly and change constantly, you should not pick them manually! Note that the exact range depends on the OS: Linux, for example, uses 32768 to 60999 by default.
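To make these ranges concrete, here is a small Python sketch that classifies a port number. The function name `classify_port` and the labels it returns are my own choices for illustration, following the IANA boundaries listed above (not the OS-specific ephemeral ranges).

```python
def classify_port(port: int) -> str:
    """Return which IANA range a TCP/UDP port number belongs to."""
    if not 0 <= port <= 65535:
        raise ValueError(f"{port} does not fit in 16 bits")
    if port <= 1023:
        return "well-known"
    if port <= 49151:
        return "user"
    return "dynamic"


for p in (22, 443, 8080, 25565, 51000):
    print(p, classify_port(p))
# 22 well-known
# 443 well-known
# 8080 user
# 25565 user
# 51000 dynamic
```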

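As you can see, port numbers are 16-bit values (2^16 - 1 = 65535 is the highest one). Each transport protocol has its own set of ports, though: TCP port 53 and UDP port 53 are two different ports, which is why IANA usually registers a service under the same number for both.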
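You can check that TCP and UDP keep separate port tables with Python's standard `socket` module. This sketch (the helper name is hypothetical) asks the OS for a free TCP port on the loopback interface, then binds a UDP socket to the same number; both binds succeed because they live in different port spaces. It retries a few times in the unlikely case that the UDP port with that number is already taken.

```python
import socket


def bind_tcp_and_udp_same_port(attempts: int = 5) -> int:
    """Bind a TCP and a UDP socket to the same port number and return that number."""
    for _ in range(attempts):
        tcp = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        try:
            tcp.bind(("127.0.0.1", 0))     # port 0: let the OS choose a free TCP port
            port = tcp.getsockname()[1]
            udp.bind(("127.0.0.1", port))  # same number, but in UDP's own port space
            return port                    # both sockets are bound at this point
        except OSError:
            continue                       # that UDP port happened to be busy: try again
        finally:
            tcp.close()
            udp.close()
    raise RuntimeError("no port number was free for both TCP and UDP")


print("TCP and UDP both bound to port", bind_tcp_and_udp_same_port())
```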

Data transport

The core principle of the Transport layer can be pictured as a router inside your computer that hands incoming data to the right program, based on the destination port (this is called demultiplexing):

[Figure: Layer 4 communication: two programs communicating with each other over the local network]

Please note that the "router" shown here is only for demonstration purposes. It does not behave like a Layer 3 router: for example, the source port is not changed when data goes through the outgoing "router". Its only role is to find the program associated with the destination port.
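The "router" from the figure can be sketched as a lookup table keyed by protocol and destination port. The `TransportDemux` class below is a hypothetical toy model, not how any real OS network stack is written; it only illustrates the demultiplexing idea.

```python
from typing import Callable, Dict, Tuple

Handler = Callable[[bytes], str]


class TransportDemux:
    """Toy model of the transport "router": (protocol, port) -> program."""

    def __init__(self) -> None:
        self._table: Dict[Tuple[str, int], Handler] = {}

    def register(self, protocol: str, port: int, handler: Handler) -> None:
        key = (protocol, port)
        if key in self._table:
            raise OSError(f"{protocol} port {port} is already in use")
        self._table[key] = handler

    def deliver(self, protocol: str, dst_port: int, payload: bytes) -> str:
        # Only the protocol and destination port decide which program gets the data.
        handler = self._table.get((protocol, dst_port))
        if handler is None:
            # A real stack would answer with a TCP RST, or usually an ICMP
            # "port unreachable" message for UDP.
            return f"no program listening on {protocol} port {dst_port}"
        return handler(payload)


demux = TransportDemux()
demux.register("tcp", 80, lambda data: f"web server got {len(data)} bytes")
demux.register("udp", 53, lambda data: f"DNS server got {len(data)} bytes")
print(demux.deliver("tcp", 80, b"GET / HTTP/1.1"))  # web server got 14 bytes
print(demux.deliver("tcp", 53, b"query"))           # no program listening on tcp port 53
```

The second call shows again that TCP port 53 and UDP port 53 are different entries.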

Transport reliability and connection

The main difference between UDP and TCP is what we call a "connection": TCP is connection-oriented, UDP is not. At the transport layer, a connection is a mechanism that makes sure both hosts can communicate, by checking that every message that is supposed to arrive actually does. This may sound complicated, so I will explain the whole mechanism with a diagram.

[Figure: Connection establishment: the TCP connection establishment mechanism]

A host that wants to communicate over TCP first sends a "SYN" (synchronize) packet to the other host. On receiving it, the second host replies with a "SYN-ACK" (synchronize-acknowledgement) packet, and the first host answers with a final "ACK" packet. This exchange is called the three-way handshake.
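The three messages of the handshake can be modelled in a few lines of Python. This is a simplified sketch with hypothetical names (`Segment`, `three_way_handshake`); it only tracks the flags plus the sequence and acknowledgement numbers, using the rule that a SYN consumes one sequence number. Real stacks choose the initial sequence numbers unpredictably, which is why the usage example draws them with `random`.

```python
import random
from dataclasses import dataclass
from typing import List, Optional

SEQ_MOD = 2**32  # sequence numbers are 32-bit and wrap around


@dataclass
class Segment:
    flags: str                 # "SYN", "SYN-ACK" or "ACK"
    seq: int                   # sender's sequence number
    ack: Optional[int] = None  # next sequence number expected from the peer


def three_way_handshake(client_isn: int, server_isn: int) -> List[Segment]:
    syn = Segment("SYN", seq=client_isn)
    # A SYN consumes one sequence number, so the server expects client_isn + 1 next.
    syn_ack = Segment("SYN-ACK", seq=server_isn, ack=(client_isn + 1) % SEQ_MOD)
    # The client now uses client_isn + 1 and acknowledges the server's SYN as well.
    ack = Segment("ACK", seq=(client_isn + 1) % SEQ_MOD, ack=(server_isn + 1) % SEQ_MOD)
    return [syn, syn_ack, ack]


for segment in three_way_handshake(random.randrange(SEQ_MOD), random.randrange(SEQ_MOD)):
    print(segment)
```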

This handshake lets both hosts confirm that they can reach each other and agree on the numbers they will use to track the data (described below). The Internet Protocol provides NO delivery guarantee at all, so reliability is entirely TCP's responsibility. Lower layers transmit the bits and detect corrupted frames, but they do not guarantee that IP packets arrive; TCP detects lost packets and retransmits them.

Once the handshake is complete, the program's data is sent over the network.
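In practice you never send SYN or ACK packets yourself: the operating system performs the handshake for you. In the sketch below (standard `socket` and `threading` modules; the helper names are hypothetical), the client's `create_connection()` returns only once the handshake has completed, and `accept()` hands the server a connection that is already established. Application data is only exchanged afterwards.

```python
import socket
import threading


def recv_all(sock: socket.socket) -> bytes:
    """Read until the peer closes its sending side (recv returns b"")."""
    chunks = []
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            break
        chunks.append(chunk)
    return b"".join(chunks)


def serve_once(server: socket.socket) -> None:
    conn, _addr = server.accept()  # an already-established connection
    with conn:
        conn.sendall(recv_all(conn).upper())


def tcp_round_trip(message: bytes) -> bytes:
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))  # let the OS choose a free port
        server.listen(1)
        worker = threading.Thread(target=serve_once, args=(server,))
        worker.start()
        # create_connection() performs the SYN / SYN-ACK / ACK exchange before returning.
        with socket.create_connection(server.getsockname()) as client:
            client.sendall(message)
            client.shutdown(socket.SHUT_WR)  # tell the server we are done sending (FIN)
            reply = recv_all(client)
        worker.join()
    return reply


print(tcp_round_trip(b"hello transport layer"))  # b'HELLO TRANSPORT LAYER'
```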

TCP is also responsible for adapting its sending rate to what the network can handle. This is called "congestion control": the sender gradually increases the amount of data it has in flight as ACKs come back, and cuts it down when it detects signs of congestion. Most classic algorithms treat packet loss as that sign, detected either by a timeout or by duplicate ACKs (the receiver repeating the same ACK number several times means a packet is missing), while Vegas watches the round-trip delay instead. The best-known algorithms are Tahoe, Reno, NewReno and Vegas. As this is a very specific area of networking, I will not detail them in this series, but we can talk about them in the comments.
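To give an idea of what congestion control does, here is a heavily simplified, per-round-trip model of a Reno-style congestion window, counted in segments. This is an illustrative sketch, not the real algorithm: real stacks update the window on every ACK and handle fast recovery in more detail, and the starting threshold of 16 is an arbitrary assumption. Tahoe would differ by restarting from one segment on duplicate ACKs as well.

```python
from typing import Iterable, List


def reno_cwnd_trace(events: Iterable[str], ssthresh: int = 16) -> List[int]:
    """Congestion window (in segments) after each round trip, Reno-style, simplified."""
    cwnd = 1
    trace = [cwnd]
    for event in events:
        if event == "ok":                        # everything sent this round trip was ACKed
            if cwnd < ssthresh:
                cwnd = min(cwnd * 2, ssthresh)   # slow start: doubles every round trip
            else:
                cwnd += 1                        # congestion avoidance: +1 segment per round trip
        elif event == "dupacks":                 # 3 duplicate ACKs: a packet was lost
            ssthresh = max(cwnd // 2, 2)
            cwnd = ssthresh                      # halve the window, skip slow start
        elif event == "timeout":                 # no ACK at all: assume heavy congestion
            ssthresh = max(cwnd // 2, 2)
            cwnd = 1                             # start again from one segment
        else:
            raise ValueError(f"unknown event: {event!r}")
        trace.append(cwnd)
    return trace


events = ["ok"] * 5 + ["dupacks"] + ["ok"] * 2 + ["timeout"] + ["ok"] * 2
print(reno_cwnd_trace(events))
# [1, 2, 4, 8, 16, 17, 8, 9, 10, 1, 2, 4]
```

The sawtooth shape (fast growth, halving on duplicate ACKs, collapse on timeout) is the behaviour the algorithm names above refine in different ways.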

To make sure the data it sends is received, TCP gives every byte a "sequence number", and each packet carries the sequence number of its first byte. Each host picks its starting (initial) sequence number during the handshake. The receiver acknowledges data by sending back the next sequence number it expects, i.e. the packet's sequence number plus its data length. For example, if a packet with sequence number 1000 carries 1500 bytes, the ACK that follows carries the number 2500. The SYN and FIN flags each consume one sequence number, whereas a packet that carries only an ACK consumes none.
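The acknowledgement arithmetic fits in one function. This sketch (hypothetical name `next_ack`) also applies the wraparound real TCP uses, since sequence numbers are stored on 32 bits.

```python
SEQ_MOD = 2**32  # sequence numbers are 32-bit and wrap around


def next_ack(seq: int, data_len: int, syn: bool = False, fin: bool = False) -> int:
    """ACK number a receiver sends after receiving this segment in order."""
    consumed = data_len + int(syn) + int(fin)  # SYN and FIN each count as one
    return (seq + consumed) % SEQ_MOD


print(next_ack(1000, 1500))          # 2500, the example from the text
print(next_ack(1000, 0, syn=True))   # 1001
print(next_ack(SEQ_MOD - 100, 200))  # 100, after wrapping around
```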

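Thanks to this simple arithmetic, a lost packet creates a gap in the sequence numbers: the receiver keeps acknowledging the same number (the next byte it expects), and the sender retransmits the missing packet(s) after a timeout or after receiving several duplicate ACKs.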
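The receiver's side of this can be simulated too. In this sketch (hypothetical function, no wraparound handling), segments are `(sequence number, length)` pairs in arrival order; the receiver buffers out-of-order data and always acknowledges the next byte it expects. With the segment starting at 2000 lost and only retransmitted at the end, the same ACK is repeated, which is exactly the duplicate-ACK signal TCP relies on.

```python
from typing import Dict, Iterable, List, Tuple


def receiver_acks(initial_seq: int, arrivals: Iterable[Tuple[int, int]]) -> List[int]:
    """Cumulative ACK sent after each arriving (seq, length) segment."""
    expected = initial_seq           # next byte we are waiting for
    buffered: Dict[int, int] = {}    # out-of-order segments: seq -> length
    acks = []
    for seq, length in arrivals:
        if seq >= expected:          # ignore data we already have (no wraparound here)
            buffered[seq] = length
        while expected in buffered:  # consume every contiguous segment we now hold
            expected += buffered.pop(expected)
        acks.append(expected)
    return acks


# Segment 2000-2999 is lost, then retransmitted after the others.
arrivals = [(1000, 1000), (3000, 1000), (4000, 1000), (5000, 1000), (2000, 1000)]
print(receiver_acks(1000, arrivals))  # [2000, 2000, 2000, 2000, 6000]
```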

TCP or UDP for my application?

TCP carries the bulk of Transport-layer traffic and is used when the data is crucial to applications (and so to users). But UDP is also very widely used.

Because of its guarantees, some applications use TCP for "signalling", i.e. setting up and controlling the session between two hosts, while UDP carries the data stream in parallel. UDP provides no reliability: datagrams are sent without any ACK or confirmation, and may be lost, duplicated or arrive out of order.
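By contrast, sending a UDP datagram involves no handshake at all. This sketch uses Python's standard `socket` module on the loopback interface (helper name hypothetical): `sendto()` returns immediately and does not tell the sender whether the datagram arrived, which is why the receiver sets a timeout instead of waiting forever.

```python
import socket


def udp_round_trip(message: bytes) -> bytes:
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind(("127.0.0.1", 0))  # let the OS choose a free UDP port
        receiver.settimeout(2.0)         # no delivery guarantee: never wait forever
        # No connection setup: the datagram leaves right away, with no ACK coming back.
        sender.sendto(message, receiver.getsockname())
        data, _sender_addr = receiver.recvfrom(65535)
    return data


print(udp_round_trip(b"frame 1"))  # b'frame 1'
```

On a real network the `recvfrom()` call could time out with `socket.timeout` if the datagram were lost; it is up to the application to decide whether that matters.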

Despite this lack of reliability, UDP is preferred when speed (low latency) matters more than getting every byte through. Example applications include live video streaming and phone calls (paired with a TCP connection for the signalling).

You now know the basics of the two main protocols of the Transport layer; they should no longer be a mystery to you.

In the next article, we will start talking about applications, but first I have to briefly discuss layers 5 and 6, especially SSL/TLS, which is (fortunately) used everywhere.

See you soon :)