Almost every one of us has used a torrent at least once, whether to download games, movies or series, but have you ever wondered what is happening behind the scenes? Why does a download sometimes get stuck at 99%? Why does it use more data than the indicated download size?
Let's try to understand that today -
So what is a torrent?
It is a file sharing protocol that enables efficient and decentralized distribution of files over the internet.
The term 'decentralized distribution' is a very important part of how torrent works, so keep it in mind.
Since we are talking about torrents, let's also understand why they were needed.
Before torrents, the most common way to download a file was a direct download from a central server.
In this model there was one main server and everyone downloaded from it. This led to high costs for the server, and to slow speeds because its bandwidth was split among more and more people.
We will see how torrents solve this problem, but before that let's look at the structure of a torrent file.
A torrent file is a small metadata file that contains information about the file or files being shared through the torrent network.
It serves as a roadmap that the torrent client uses to locate and download the desired files.
There are many details in its metadata, but here is the key information it contains -
- File name and its size
- Piece Size
- Piece Hashes
- Tracker Information
- File Structure
(We'll understand the meaning of each throughout the blog.)
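To make the list above concrete, here is a minimal sketch of reading those fields out of a torrent file. Torrent files are stored in a format called bencoding, and the key names used here (`announce`, `info`, `name`, `length`, `piece length`, `pieces`) are the ones a single-file torrent uses. The tracker URL, the file name and the all-zero hashes are made up for illustration, and `bdecode` and `bstr` are hypothetical helper names, not part of any library.

```python
def bdecode(data: bytes, i: int = 0):
    """Decode one bencoded value starting at index i; return (value, next_index)."""
    kind = data[i:i + 1]
    if kind == b"i":                       # integer: i<digits>e
        end = data.index(b"e", i)
        return int(data[i + 1:end]), end + 1
    if kind == b"l":                       # list: l<items>e
        i += 1
        items = []
        while data[i:i + 1] != b"e":
            item, i = bdecode(data, i)
            items.append(item)
        return items, i + 1
    if kind == b"d":                       # dictionary: d<key><value>...e
        i += 1
        result = {}
        while data[i:i + 1] != b"e":
            key, i = bdecode(data, i)
            value, i = bdecode(data, i)
            result[key] = value
        return result, i + 1
    colon = data.index(b":", i)            # byte string: <length>:<bytes>
    length = int(data[i:colon])
    start = colon + 1
    return data[start:start + length], start + length


def bstr(s: bytes) -> bytes:
    """Bencode a byte string (only used to build the sample below)."""
    return str(len(s)).encode() + b":" + s


# A made-up single-file torrent: 7 MB file, 512 KB pieces, 14 placeholder hashes.
sample = (
    b"d" + bstr(b"announce") + bstr(b"http://tracker.example/announce")
    + bstr(b"info") + b"d"
    + bstr(b"length") + b"i7340032e"
    + bstr(b"name") + bstr(b"movie.mkv")
    + bstr(b"piece length") + b"i524288e"
    + bstr(b"pieces") + bstr(bytes(20 * 14))   # one 20-byte SHA-1 hash per piece
    + b"ee"
)

meta, _ = bdecode(sample)
info = meta[b"info"]
print("Tracker:    ", meta[b"announce"].decode())
print("File name:  ", info[b"name"].decode())
print("File size:  ", info[b"length"])
print("Piece size: ", info[b"piece length"])
print("Piece count:", len(info[b"pieces"]) // 20)
```

A torrent that shares several files stores a `files` list inside `info` instead of a single `length`, which is the "File Structure" item in the list.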
Let's go through the whole journey of a torrent from start to end.
The first person who creates a torrent and shares its content is called the Initial Seeder.
A Seeder is one kind of Peer.
Let's see what Peers are, and along the way we will understand what Seeders are.
Peers are the torrent clients that participate in the torrent network. There are two types -
- Seeders - Clients that have the complete file and upload it to other peers.
- Leechers - Clients that don't have the whole file yet, and that download and upload at the same time.
This can be a little confusing at first, but it will become clear as we follow a torrent's journey.
The Initial Seeder has the whole file. Now let's say a new client (a Leecher) joins the network using the torrent file.
Remember the tracker information in the torrent file's metadata?
This is where it gets used. A tracker is a server whose primary role is to keep track of the peers participating in a specific torrent.
The client sends a GET request to the tracker to get the information it needs to connect to the peers already in the torrent network.
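As a rough sketch of what that GET request can look like, the client appends a few query parameters to the tracker URL from the torrent file. The parameter names below (`info_hash`, `peer_id`, `port`, `uploaded`, `downloaded`, `left`, `compact`) are the standard ones for HTTP trackers. The tracker URL, the hash and the peer id are made-up values, and `build_announce_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode


def build_announce_url(announce: str, info_hash: bytes, peer_id: bytes,
                       port: int, uploaded: int, downloaded: int, left: int) -> str:
    """Build the URL for a tracker GET request (no network access here)."""
    params = {
        "info_hash": info_hash,    # 20-byte hash identifying the torrent
        "peer_id": peer_id,        # 20-byte id this client picked for itself
        "port": port,              # port this client listens on for peers
        "uploaded": uploaded,      # bytes uploaded so far
        "downloaded": downloaded,  # bytes downloaded so far
        "left": left,              # bytes still needed to finish
        "compact": 1,              # ask for the compact peer list format
    }
    # urlencode percent-encodes the raw bytes values for us.
    return announce + "?" + urlencode(params)


url = build_announce_url(
    "http://tracker.example/announce",   # hypothetical tracker
    info_hash=b"\x12" * 20,              # placeholder hash
    peer_id=b"-XX0001-123456789012",     # placeholder peer id (20 bytes)
    port=6881,
    uploaded=0,
    downloaded=0,
    left=7 * 1024 * 1024,
)
print(url)
```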
When the client makes this GET request it sends some data to the tracker, and the tracker's response contains -
- Interval - The number of seconds the client should wait between regular requests to the tracker.
- Peers - The list of peers currently in the network that the client can connect to.
- Failure Message - If present, nothing else will be present in the response.
- Warning Message - Similar to a failure message, but the rest of the response still gets processed.
Why is the GET request sent again and again? Because leechers and seeders keep joining and leaving, and the tracker does not push every change to every client, so each client asks again at a fixed interval to refresh its list. We will soon see why keeping this list up to date is important.
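The rules for these response fields can be sketched in a few lines. This assumes the response has already been decoded into a Python dictionary with string keys. The key names `failure reason`, `warning message`, `interval` and `peers` are the ones trackers use, while `handle_tracker_response` and the sample values are hypothetical.

```python
def handle_tracker_response(response: dict):
    """Return (interval, peers, warning) or raise if the tracker reported a failure."""
    # A failure message means nothing else in the response is usable.
    if "failure reason" in response:
        raise RuntimeError("tracker failure: " + response["failure reason"])

    # A warning is informational; the rest of the response is still processed.
    warning = response.get("warning message")

    interval = response["interval"]       # seconds to wait before asking again
    peers = response.get("peers", [])     # peers we can try to connect to
    return interval, peers, warning


interval, peers, warning = handle_tracker_response({
    "interval": 1800,
    "peers": [("203.0.113.5", 6881)],     # made-up peer address
    "warning message": "tracker is busy",
})
print(interval, peers, warning)
```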
Now that the client has the list of peers (in our case there is only one, the initial seeder), the two of them first establish a TCP connection.
Once this connection is made, a handshake takes place. The handshake validates that -
- The other party can communicate using the BitTorrent protocol.
- It can understand and respond to our messages.
In this handshake both sides send the same kind of message, which includes the hash that identifies the torrent. Each side compares the hash it receives with its own, and if both hashes are the same, the handshake is complete.
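Here is a minimal sketch of that handshake message and the hash comparison, without any real networking. The layout (a length byte, the string `BitTorrent protocol`, 8 reserved bytes, the 20-byte torrent hash, then a 20-byte peer id) is the standard 68-byte handshake. The function names and the sample hash and peer ids are made up for illustration.

```python
PROTOCOL = b"BitTorrent protocol"


def build_handshake(info_hash: bytes, peer_id: bytes) -> bytes:
    """Build the 68-byte handshake message."""
    assert len(info_hash) == 20 and len(peer_id) == 20
    return bytes([len(PROTOCOL)]) + PROTOCOL + bytes(8) + info_hash + peer_id


def check_handshake(message: bytes, expected_info_hash: bytes) -> bool:
    """True if the other side speaks BitTorrent and is sharing the same torrent."""
    if len(message) != 68:
        return False
    if message[0] != len(PROTOCOL) or message[1:20] != PROTOCOL:
        return False                      # not the BitTorrent protocol
    received_hash = message[28:48]        # after 1 + 19 + 8 reserved bytes
    return received_hash == expected_info_hash


our_hash = b"\xab" * 20                   # placeholder torrent hash
theirs = build_handshake(our_hash, b"-XX0001-aaaaaaaaaaaa")
other_torrent = build_handshake(b"\xcd" * 20, b"-XX0001-bbbbbbbbbbbb")

print(check_handshake(theirs, our_hash))         # True
print(check_handshake(other_torrent, our_hash))  # False
```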
Now that the TCP connection is made and the handshake is done, the initial seeder will send the file in the form of pieces.
Let's say I want to share a 7 MB file. If each piece is 512 KB, there will be 14 pieces (7 MB = 7168 KB, and 7168 / 512 = 14).
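The piece arithmetic, and the "Piece Hashes" item from the torrent file, can be sketched like this. Each piece gets a SHA-1 hash that is stored in the torrent file, so a downloader can check every piece it receives. The function names are hypothetical and the file content is just 7 MB of zero bytes standing in for a real file.

```python
import hashlib

PIECE_LENGTH = 512 * 1024                 # 512 KB


def split_into_pieces(data: bytes, piece_length: int) -> list:
    """Cut the file content into fixed-size pieces (the last one may be shorter)."""
    return [data[i:i + piece_length] for i in range(0, len(data), piece_length)]


def verify_piece(piece: bytes, expected_hash: bytes) -> bool:
    """Check a downloaded piece against the hash from the torrent file."""
    return hashlib.sha1(piece).digest() == expected_hash


data = bytes(7 * 1024 * 1024)             # stand-in for a 7 MB file
pieces = split_into_pieces(data, PIECE_LENGTH)
hashes = [hashlib.sha1(p).digest() for p in pieces]

print(len(pieces))                                  # 14
print(verify_piece(pieces[0], hashes[0]))           # True
print(verify_piece(b"corrupted data", hashes[0]))   # False
```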
When we request a piece, we don't request the whole piece at once. Instead, we request a block.
A block is a subpart of a piece, so a piece is made up of a collection of blocks. A block is usually 16 KB, and since our pieces are 512 KB, getting one piece takes 512 / 16 = 32 requests.
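As a sketch of those 32 requests: each request message names the piece index, the offset of the block inside the piece, and the block length. The wire layout used here (a 4-byte length prefix of 13, message id 6, then three 4-byte big-endian integers) is the standard `request` message. `block_requests` is a hypothetical helper name.

```python
import struct

BLOCK_SIZE = 16 * 1024                    # 16 KB


def block_requests(piece_index: int, piece_length: int) -> list:
    """Build one request message per block of the given piece."""
    messages = []
    for begin in range(0, piece_length, BLOCK_SIZE):
        # The last block of a piece may be shorter than BLOCK_SIZE.
        length = min(BLOCK_SIZE, piece_length - begin)
        # <length prefix = 13><id = 6><piece index><offset in piece><block length>
        messages.append(struct.pack(">IBIII", 13, 6, piece_index, begin, length))
    return messages


requests = block_requests(piece_index=0, piece_length=512 * 1024)
print(len(requests))                      # 32
print(struct.unpack(">IBIII", requests[1]))
```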
So now the file is being downloaded to the leecher's PC. Let's say a third person (another Leecher) joins the network while this is happening. The new leecher will send a GET request to the tracker, get the list of Leechers and Seeders, and select peers based on availability, speed and other factors.
We (the first leecher) are still downloading pieces from the initial seeder, right? But we can already send pieces to the new leecher.
We will share the pieces that are completely downloaded on our system. The second leecher will request pieces not only from us but also from the initial seeder, and this chain goes on.
How is this happening? How can a leecher get pieces from multiple sources? Won't they conflict?
No, this is exactly how torrents work. There is no single source for the pieces, so all peers help each other download the whole file. If there are 5000 peers in a network, one peer can share one piece while another shares a different piece, and while you are receiving these pieces you are also sharing the ones you already have with new peers that don't have them yet.
Peers in a torrent swarm control who they upload to through a mechanism called "choking" and "unchoking". A peer (seeder or leecher) can request pieces from another peer only if it is unchoked by that peer. Peers regularly exchange information about which pieces they have and which they need. This information helps each peer decide which pieces to request and from which peers.
Peers request pieces from each other based on which pieces they need and which pieces the other peers have. A client normally avoids requesting the same piece from multiple sources at the same time. When a peer successfully downloads a piece, it can then share that piece with other peers, increasing the overall availability of that piece in the swarm.
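The idea above can be sketched as a small planning function: for every piece we still need, pick exactly one peer that has it and has unchoked us. Asking for the least available pieces first and spreading requests evenly across peers are assumptions made for this sketch, and real clients use more elaborate strategies. All names and the sample swarm are made up.

```python
def plan_requests(num_pieces: int, have: set, peers: dict) -> dict:
    """Map each piece we need to a single peer we are allowed to ask for it."""
    # Count how many peers hold each piece we are missing.
    availability = {p: 0 for p in range(num_pieces) if p not in have}
    for info in peers.values():
        for piece in info["pieces"]:
            if piece in availability:
                availability[piece] += 1

    plan = {}
    load = {name: 0 for name in peers}    # requests assigned to each peer so far
    # Assumption: ask for the least available pieces first.
    for piece in sorted(availability, key=lambda p: (availability[p], p)):
        candidates = [name for name, info in peers.items()
                      if info["unchoked"] and piece in info["pieces"]]
        if candidates:
            chosen = min(candidates, key=lambda name: (load[name], name))
            plan[piece] = chosen          # one source per piece, no duplicates
            load[chosen] += 1
    return plan


swarm = {
    "seeder":  {"pieces": {0, 1, 2, 3}, "unchoked": True},
    "leecher": {"pieces": {1},          "unchoked": True},
    "busy":    {"pieces": {0, 1, 2, 3}, "unchoked": False},  # has choked us
}
plan = plan_requests(num_pieces=4, have={0}, peers=swarm)
print(plan)
```

In this made-up swarm, pieces 2 and 3 can only come from the seeder, while piece 1 goes to the other leecher. The peer that has choked us is never asked, even though it has everything.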
You must have noticed that when a torrent finishes downloading, the client shows 'Seeding'. It means you are no longer a leecher. You are now a seeder, because you have the whole file and can share it with other leechers. This is also the reason a torrent download uses more data than the size it shows: you are not only receiving data, you are also uploading data to other people, just as they do for you.
This also helps us understand why a torrent download sometimes gets stuck at 99% and never completes. It happens when there is no seeder in the network and no leecher has the piece that is needed to complete the download.
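The stuck-at-99% situation can be sketched as a simple set calculation: the pieces we are missing, minus the pieces anyone else in the network has. The function name and the sample numbers are made up, reusing the 14-piece file from earlier.

```python
def unavailable_pieces(num_pieces: int, have: set, peer_pieces: list) -> list:
    """Pieces we still need that no connected peer can give us."""
    missing = set(range(num_pieces)) - have
    reachable = set()
    for pieces in peer_pieces:
        reachable |= pieces               # everything at least one peer holds
    return sorted(missing - reachable)


# We have 13 of 14 pieces and no seeder is online.
# The remaining leechers are all missing piece 13 too.
ours = set(range(13))
others = [set(range(13)), {3, 4, 5}]
print(unavailable_pieces(14, ours, others))   # [13]
```

As long as that list is not empty, the download cannot finish, no matter how fast the other peers are.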
Just to be clear -
A "torrent" is the metadata file that describes the shared content (which is why you see it as the .torrent file extension), while "BitTorrent" is the protocol and technology used to share the files described by that metadata.
I know I have not covered the whole process of how torrents work, because BitTorrent is a very deep and complex protocol. This is more or less the tip of the iceberg, and I will add more parts as I learn more about it.
Thanks for reading so far 😁