In recent years, privacy has become one of the fundamentals of security and information technology. The Onion Router (Tor) Project can help us achieve what many users have been asking for in terms of an assurance of online anonymity. Tor is a global network of computers run by volunteers to provide online anonymity to anyone who needs it.
This article will explain how Tor can help us research and develop tools for the online anonymity and privacy of its users while they're surfing the internet. Tor does this by setting up virtual circuits between the various nodes that make up the Tor network. We will also look at how Tor works from an anonymity point of view, stopping websites from tracking you.
Let's get started!
The internet is arguably the largest source of mass surveillance in the world, yet it can also be one of the safest ways to send anonymous messages. Most internet users stick with default applications and settings, which makes it possible to track, log, and analyze almost all of their communications. This is exemplified by the data collection carried out by large companies, which aim to gain an economic advantage from their users' data.
There are different types of anonymous browsing. Browsing through a single proxy offers a level of anonymity at the network level.
Another widely used system for anonymization is the use of VPNs to send traffic. Like Tor, a VPN sends your traffic through another computer, in this case a server run by the VPN provider. The difference is that there is no anonymization between your computer and the VPN provider: the provider sees both your IP address and the sites you visit. In Tor, the "exit node" is the one that actually fetches your data (for example, the website you are trying to view anonymously), but it is much more difficult to track the user and discover their origin address.
All this requires the use of programs that aim to hide the user's identity. Perhaps the biggest anonymization device in use at the moment is Tor. This system facilitates anonymous communication by routing the messages on the Tor network through other computers.
Thanks to the Tor network, we can connect with a high degree of anonymity: the connection is encrypted, and each node along the route sees a different source IP address, so the destination never sees our own.
Tor is a network of virtual tunnels that protect you or your corporation from being placed at a specific location in the network. The objective of this network is to change the traditional routing mode, which we all use, so as to maintain the anonymity and privacy of our data.
Tor provides anonymity by routing all your packets in an encrypted way through a complex web of repeaters. These communicate with each other to help transport your messages to the right destination, without anyone knowing who made the request or actually sent it.
From a privacy point of view, Tor has two distinct purposes:
- Hiding the locations of users who are browsing the web: Your computer can be traced through its IP address. Tor prevents this by making your traffic appear to come from a Tor relay rather than from your own address.
- Encrypting your browsing traffic: Tor encrypts your browsing traffic in layers using a technique called onion routing and relays it alongside other users' traffic, which hides your IP address from the websites you visit. It also hides the traffic from your ISP, which can see that you're connected to the Tor network but cannot determine which sites you are accessing through it. This is a good moment to briefly mention DNS: by default, domain names are resolved by DNS servers provided by our ISP. If we have access to the configuration of our router, we can change the DNS servers we use and opt for a DNS service that offers extras such as anonymity or protection against fraudulent or potentially dangerous destinations.
Now that we've understood the purpose of Tor networking, let's look at how it works.
The Tor network is based on the principle of onion routing. This means that a connection goes through several encrypted layers, and the router at each layer only knows what is essential to perform the work at that layer.
When you connect to the Tor network, the following process occurs:
- The client downloads a list of all available Tor relays and selects three: one guard node, one middle or relay node, and one exit node.
- When you then send information through the Tor network to the internet, it's first encrypted so that only the exit relay can see which website you're requesting. From a privacy point of view, the exit node can see this data in the network packets it forwards, but in most cases it cannot tell who sent them.
- Then, this already encrypted layer is encrypted again so that only the middle relay knows that it should be sent to the exit relay. Finally, this doubly encrypted layer is encrypted once more so that only the guard relay can see who the middle relay is.
Figure 1 – Onion routing connection flow between the client and server
All this encryption is done before the network traffic leaves your computer, which means the following for us:
- Anyone monitoring your internet connection can only see you exchanging encrypted information with the guard relay.
- The guard relay only knows your IP address and who the middle relay is.
- The middle relay only knows the guard relay and the exit relay, but not who you are or what website you're requesting.
- The exit node knows what you're requesting off the internet, as well as who the middle relay is, but not who you are or who the guard relay is.
This process completely separates the content you're requesting from anything that can be used to establish your identity.
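The layering described above can be sketched in a few lines of Python. This is a toy model only: the XOR keystream stands in for real encryption and is not secure, and the relay names, keys, and helper functions (`wrap`, `peel`, `build_onion`, `route`) are hypothetical names chosen for illustration, not part of Tor.

```python
import hashlib
import json
from typing import Dict, List, Tuple


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy cipher: XOR data with a SHA-256-derived keystream (NOT secure)."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    # XOR is symmetric, so the same call both encrypts and decrypts.
    return bytes(a ^ b for a, b in zip(data, stream))


def wrap(key: bytes, next_hop: str, payload: bytes) -> bytes:
    """Add one layer that only the holder of `key` can open."""
    layer = json.dumps({"next": next_hop, "data": payload.hex()}).encode()
    return keystream_xor(key, layer)


def peel(key: bytes, cell: bytes) -> Tuple[str, bytes]:
    """Remove one layer, revealing only the next hop and the inner cell."""
    layer = json.loads(keystream_xor(key, cell).decode())
    return layer["next"], bytes.fromhex(layer["data"])


def build_onion(path: List[str], keys: Dict[str, bytes],
                destination: str, request: bytes) -> bytes:
    """Encrypt for the exit relay first, then the middle, then the guard."""
    next_hops = path[1:] + [destination]
    cell = request
    for relay, next_hop in reversed(list(zip(path, next_hops))):
        cell = wrap(keys[relay], next_hop, cell)
    return cell


def route(path: List[str], keys: Dict[str, bytes],
          cell: bytes) -> Tuple[List[Tuple[str, str]], bytes]:
    """Pass the cell along the path, recording what each relay learns."""
    learned = []
    for relay in path:
        next_hop, cell = peel(keys[relay], cell)
        learned.append((relay, next_hop))
    return learned, cell


path = ["guard", "middle", "exit"]
keys = {name: hashlib.sha256(name.encode()).digest() for name in path}
onion = build_onion(path, keys, "example.org", b"GET / HTTP/1.1")
learned, delivered = route(path, keys, onion)
```

In this sketch, each relay learns only the name of the next hop, and the request itself becomes readable only after the exit relay removes the final layer.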
The source code for the Tor Project is available at the project's website at https://www.torproject.org/download/tor/ and the project's GitHub repository at https://github.com/torproject/tor.
So, how does the network work? Let's suppose that we have two computers: computer A and computer B. A wants to send a message to B and makes a connection to a server that contains the addresses of the Tor nodes.
You can see this process in a graphical way on the official Tor website: https://2019.www.torproject.org/about/overview.html.en.
Let's take a look at how this works, step by step:
- The first step is getting a directory listing from the central server.
- After receiving the directory list from this server, our Tor client will connect to a random node through an encrypted connection. This node will connect to another random node through another encrypted connection, and so on until the last node before computer B is reached. This exit node (the penultimate hop of the communication) will make an unencrypted connection to computer B. All Tor nodes are chosen at random and no node can be used twice.
- Using asymmetric encryption, computer A encrypts the message in layers, like the layers of an onion. First, it encrypts the message, together with directions to the destination (computer B), with the public key of the last node on the route so that only that node can decrypt it. This entire package, along with directions to the last node, is then encrypted again so that it can only be decrypted by the penultimate node on the route.
- To avoid third-party analysis of our communications, Tor periodically switches circuits: by default, roughly every 10 minutes, new connections are routed through newly chosen nodes.
- The nodes of the Tor network are public. If we ourselves run a node, we will increase our privacy. Although this sounds contradictory, here is why: if Alice uses the Tor network to connect to Bob, she will need to connect to another Tor node. However, if she also works as a node for Jane or Dave, she will be connecting to other nodes on their behalf as well. Therefore, a third party will not be able to tell whether a connection from Alice was initiated by her as a user or relayed by her as a node.
This makes it harder for a third party to extract information. If Alice were to function as a node for hundreds of users, it would be difficult to single out and spy on her own data.
The layered encryption is repeated for every node on the route. With this, the data package is ready, so it's time to send it. Computer A connects to the first node on the route and sends the packet to it. This node decrypts its layer and follows the instructions it finds there to send the rest of the packet to the next node. That node decrypts its own layer and forwards the remainder, and so on. The data finally arrives at the exit node, which sends the message to its destination.
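The random, non-repeating choice of nodes mentioned in the steps above can be sketched as follows. This is a simplified illustration under stated assumptions: the `Relay` class and `pick_path` function are hypothetical, and the relay list is made up. The real Tor client also weights relays by bandwidth and applies further constraints that are omitted here.

```python
import random
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Relay:
    nickname: str
    flags: FrozenSet[str]  # for example {"Guard"}, {"Exit"} or an empty set


def pick_path(relays: List[Relay], rng=random) -> List[Relay]:
    """Pick three distinct relays: one guard, one middle, and one exit."""
    exits = [r for r in relays if "Exit" in r.flags]
    if not exits:
        raise ValueError("no relay with the Exit flag available")
    exit_relay = rng.choice(exits)

    guards = [r for r in relays if "Guard" in r.flags and r != exit_relay]
    if not guards:
        raise ValueError("no relay with the Guard flag available")
    guard = rng.choice(guards)

    # Any remaining relay may serve as the middle hop; none is used twice.
    middles = [r for r in relays if r not in (guard, exit_relay)]
    if not middles:
        raise ValueError("not enough relays to build a three-hop path")
    middle = rng.choice(middles)

    return [guard, middle, exit_relay]


# Made-up relay list, for illustration only.
sample_relays = [
    Relay("alpha", frozenset({"Guard"})),
    Relay("bravo", frozenset()),
    Relay("charlie", frozenset({"Exit"})),
    Relay("delta", frozenset({"Guard", "Exit"})),
]
path = pick_path(sample_relays, random.Random(0))
```

Passing a seeded `random.Random` instance makes the selection reproducible for experiments; omitting it uses the module-level generator.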
The Tor protocol works by multiplexing multiple circuits over a single node-to-node TLS connection. Each circuit is a path that's created by clients via the Tor network. This path consists of randomly selected nodes. Tor traffic is routed through three nodes by default: Guard, Relay, and Exit. To carry traffic across these relays efficiently, Tor has stream-multiplexing capabilities, which means the following:
- A single Tor circuit can transport multiple TCP connections.
- Each node knows only its predecessor and successor in a circuit; that is, it doesn't know the entire route.
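As a rough illustration of this multiplexing, the sketch below tags each unit of data with a circuit ID and a stream ID and reassembles the interleaved data on the receiving side. The `Cell` type and `demultiplex` function are hypothetical simplifications and do not reflect Tor's actual cell format.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Tuple


class Cell(NamedTuple):
    circ_id: int     # identifies the circuit on this node-to-node connection
    stream_id: int   # identifies one TCP stream inside that circuit
    payload: bytes


def demultiplex(cells: Iterable[Cell]) -> Dict[Tuple[int, int], bytes]:
    """Reassemble each stream from cells that arrive interleaved."""
    streams = defaultdict(bytes)
    for cell in cells:
        streams[(cell.circ_id, cell.stream_id)] += cell.payload
    return dict(streams)


# Two circuits share one connection; circuit 1 carries two TCP streams.
link = [
    Cell(1, 1, b"GET /a"),
    Cell(2, 1, b"hello "),
    Cell(1, 2, b"GET /b"),
    Cell(1, 1, b" HTTP/1.1"),
    Cell(2, 1, b"world"),
]
streams = demultiplex(link)
```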
Next, we'll look at hidden services.
Tor allows a website to hide its IP address from its users. Such sites are called onion services or hidden services.
Hidden services are sites that can only be accessed while connected to Tor because they are hosted within the Tor network itself. The protection of being inside the Tor network also attracts people who set up illegal sites, although onion services are used for legitimate purposes as well.
According to the Tor Project's statistics, there are over 60,000 onion services running at the time of writing: https://metrics.torproject.org/hidserv-dironions-seen.html.
Hidden services provide a mechanism that preserves both the anonymity and the confidentiality of data. However, they sacrifice performance, since it is quite expensive to build the circuits involved between the client and the server. For this reason, hidden services in Tor are slow.
To preserve anonymity on both sides, the user and the onion service that they wish to access must each assemble a complete Tor circuit. For this reason, there will be six nodes between the user and the service provider. This makes the connection slower and explains why onion services generally use very simple and lightweight websites.
Now that you understand the basics of the Tor Project and what hidden services are, let's move on and learn about the main tools we can use to connect to the Tor network.
In this section, you will learn about the main tools that provide anonymity in the Tor network. We'll do this by learning how to connect to the Tor Browser and introducing other tools for controlling our Tor instance.
The easiest way to navigate through the Tor network is to use the Tor Browser, which is a modified version of Firefox that includes extensions such as Torbutton, NoScript, and HTTPS Everywhere.
The Tor Browser is configured to automatically obtain the different routes and servers that we can connect to. In addition to allowing you to browse with a high degree of anonymity, it automatically deletes confidential user data such as cookies and browsing history when you close a browsing session.
To connect to the Tor network, all you need to do is the following:
- Download the Tor Browser Bundle from https://www.torproject.org.
- Unzip it.
- Run the start-tor-browser script in the unzipped directory.
In Debian-based distributions such as Ubuntu and Linux Mint, we can also install it through the torbrowser-launcher package to get the latest version of the browser.
We can install it with the following command:
$ sudo apt install torbrowser-launcher
$ torbrowser-launcher
We can execute torbrowser-launcher to download the Tor Browser and follow the auto installer's instructions.
Once installed and connected successfully, the Tor Browser will launch and point to http://check.torproject.org, which will confirm you are browsing anonymously. If you see something similar to the following, then this means you have successfully configured Tor and can navigate through the internet anonymously:
Figure 2 – Prompt that shows the connection to the Tor Browser was successful
The initial Tor check page not only validates that you are using the Tor network but also displays your current IP address. Remember that you may be exiting the Tor network from an exit node in another country, so some sites will try to serve their pages in the native language of that country.
An interesting feature offered by the Tor Browser is the Use new identity option, which allows us to browse with a different IP. Remember that when you use Tor, your traffic still leaves your own network, but it reaches the internet through an exit relay, and that relay stays the same for the lifetime of a circuit. This means that websites keep seeing the same IP until the circuit changes or you change it with the aforementioned option.
When browsing with the Tor Browser, the IP that websites see is the IP of the last relay we passed through within the Tor network, which stays the same as long as we do not use the option to change IP addresses. Each time a new circuit is built, the path that the packets follow to that last relay changes, which makes tracking a user's data flow very difficult. In addition to this, connection data is only stored for a certain amount of time (less than an hour).
The Tor community develops various projects, some of which can be found at https://2019.www.torproject.org/projects/projects. Let's take a brief look at two of the most popular ones:
- Tails, https://tails.boum.org, is an operating system that you can carry on a USB stick that makes all its connections through Tor, preserving the anonymity of its users.
- Orbot, https://guardianproject.info/apps/orbot, is the official application for Android.
There are several others, the main one being the Tor Browser: https://www.torproject.org/projects/torbrowser.html.en.
None of the intermediate nodes know the origin or destination of the message. They also do not know what position they occupy in the network. These nodes are spread all over the world so that anonymity is achieved. The intermediate nodes are resources donated by anonymous people from all over the world. If we look at the TorMap service, https://tormap.void.gr, we'll see a map showing all these nodes.
Due to the way the Tor network works, not all the nodes that make it up are the same. Depending on its characteristics and configuration, a node can fulfill certain functions:
- Entry nodes (guard relays): These communicate with Tor clients and connect users to the rest of the Tor network. They have generally been in use for a long time and have generous bandwidths.
- Middle nodes (middle relays): These only communicate with other nodes, so their traffic never leaves the Tor network and represents the most comfortable, fast, and secure option for configuring nodes.
- Output nodes (exit relays): These are the endpoints within the Tor network. They take the requests, send them to their recipients, receive their responses, and send them back to the network so that they reach the original requestor. They are usually maintained by institutions and other actors and have the capacity to face the possible legal consequences of what users look up using the Tor network if their connections leave through these nodes.
- Bridge nodes (bridge relays): These are normal relays that are not listed within the Tor directory, which means they can be considerably more difficult to block. We can use bridge relays when our ISP is blocking the use of Tor but we still want to connect to the Tor network. The only difference between normal and bridge relays is that normal relays are listed in a public directory, whereas bridge relays are not. You can get a list of bridge nodes at the following URL: https://bridges.torproject.org. We can access https://bridges.torproject.org/bridges to get random bridge data.
Now that we understand how the Tor network works, let's learn how to install the Tor service on our machines.
One of the ways we can control a Tor instance is through a service that we can install on our machine. The objective of installing this service is to allow us to customize the way in which we can control our instance and send commands to, for example, change our identity when we are surfing anonymously.
Installing the Tor service in Debian/Ubuntu-based distributions is easy – just run the following Terminal commands:
$ sudo apt-get update
$ sudo apt-get install tor
$ sudo /etc/init.d/tor restart
To start the Tor service from a Terminal, enter the following command:
$ sudo service tor start
We can verify that the Tor service has been started correctly with the following command:
$ service tor status
We can also verify that the Tor network works and provides anonymous connectivity. To do this, we can route an application's traffic through Tor using the following proxychains command:
$ proxychains firefox www.whatismyip.com
ProxyChains (https://github.com/haad/proxychains) is a tool with the ability to connect to various proxies through the HTTP(S), SOCKS4, and SOCKS5 protocols. It also has the ability to resolve DNS addresses through the proxy server. By using this application with Tor, it becomes very difficult for others to detect our real IP.
A whois search of that IP address from a Terminal window indicates that the transmission is now leaving a Tor exit node. You can also verify that Tor is working properly by accessing the https://check.torproject.org and https://browserleaks.com/ip services.
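The same check can be scripted from Python. The sketch below assumes that Tor's SOCKS proxy is listening on 127.0.0.1:9050, that the third-party `requests` package is installed with SOCKS support (`pip install requests[socks]`), and that check.torproject.org exposes a JSON endpoint at `/api/ip` with `IsTor` and `IP` fields. Treat these as assumptions to verify in your own environment. The helper names are hypothetical.

```python
import json
from typing import Dict, Optional, Tuple


def tor_proxies(host: str = "127.0.0.1", port: int = 9050) -> Dict[str, str]:
    """Build a proxies mapping for requests that points at Tor's SOCKS port.

    The socks5h scheme asks the proxy to resolve host names, so DNS lookups
    also travel through Tor instead of going to the local resolver.
    """
    url = f"socks5h://{host}:{port}"
    return {"http": url, "https": url}


def parse_check_response(body: str) -> Tuple[bool, Optional[str]]:
    """Extract (is_tor, ip) from the JSON body (assumed field names)."""
    data = json.loads(body)
    return bool(data.get("IsTor")), data.get("IP")


def check_tor(timeout: float = 30.0) -> Tuple[bool, Optional[str]]:
    """Ask the check service, through Tor, which IP address it sees."""
    import requests  # third-party; imported lazily so the helpers above stay stdlib-only

    response = requests.get(
        "https://check.torproject.org/api/ip",  # assumed endpoint
        proxies=tor_proxies(),
        timeout=timeout,
    )
    response.raise_for_status()
    return parse_check_response(response.text)
```

With the Tor service running, calling `check_tor()` would be expected to return `True` together with the address of the current exit relay rather than your own IP.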
You can control the Tor service by configuring the torrc file to enable the ControlPort option. In this way, we can control the Tor service from our Python programs.
By default, the Tor client listens on port 9050 for SOCKS traffic. If we need a special configuration, we can change it in the torrc file. The Tor Project documentation (https://support.torproject.org/tbb/tbb-47/) shows the SOCKS proxy configuration we can establish in the Tor Browser's network settings.
Depending on the Tor configuration, the Tor client will listen on two ports:
- ControlPort 9051: This is the port where Tor will accept the connections and allow the Tor process to be managed using the Tor Control Protocol.
- SocksPort 9050: This is the port on which Tor's SOCKS proxy listens for incoming connections from other applications.
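A quick way to confirm from Python that these two ports are accepting connections is to attempt a TCP connection to each. This is a minimal sketch using only the standard library; the helper names are hypothetical, and the port numbers are the defaults mentioned above.

```python
import socket
from typing import Dict


def is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def tor_port_status(host: str = "127.0.0.1") -> Dict[str, bool]:
    """Check the default SOCKS and control ports of a local Tor instance."""
    return {
        "SocksPort 9050": is_listening(host, 9050),
        "ControlPort 9051": is_listening(host, 9051),
    }
```

Note that a successful connection only shows that something is listening on the port. It does not prove that the listener is Tor.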
Launching the Tor service from the command line is similar to configuring the torrc file in that the same options are passed as arguments:
$ tor --SocksPort 9050 --ControlPort 9051
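Once the control port is enabled, a Python program can ask Tor for a new identity by speaking the Tor Control Protocol directly over a socket. The sketch below assumes that the control port is 9051 and that it accepts either password authentication or no authentication. Cookie authentication and passwords containing quotes or backslashes are not handled. The function names are hypothetical.

```python
import socket


def _read_reply(reader) -> str:
    """Read one control-protocol reply and return its final line.

    Intermediate lines look like "250-..." or "250+..."; the final line of
    a reply has a space after the three-digit status code.
    """
    while True:
        raw = reader.readline()
        if not raw:
            raise ConnectionError("control connection closed unexpectedly")
        line = raw.decode("ascii", "replace").rstrip("\r\n")
        if len(line) >= 4 and line[3] == " ":
            return line


def new_identity(password: str = "", host: str = "127.0.0.1",
                 port: int = 9051, timeout: float = 10.0) -> bool:
    """Authenticate, then send SIGNAL NEWNYM; True if both were accepted."""
    commands = (f'AUTHENTICATE "{password}"', "SIGNAL NEWNYM")
    with socket.create_connection((host, port), timeout=timeout) as sock, \
            sock.makefile("rb") as reader:
        for command in commands:
            sock.sendall(command.encode("ascii") + b"\r\n")
            # Status 250 means success; stop at the first failure.
            if not _read_reply(reader).startswith("250"):
                return False
    return True
```

After a successful call, new connections should use fresh circuits, which is the programmatic counterpart of changing identity in the Tor Browser. Libraries such as stem wrap this protocol in a higher-level API.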
In the following screenshot, we can see the startup process and the different steps that must be taken to initialize Tor to establish a circuit in more detail:
Figure 6 – Initializing Tor to establish a circuit
As we can see, the process of establishing a circuit follows four different phases, as follows:
- In the first phase, the machine tries to connect to the directory server that is responsible – through a non-encrypted link – for providing you with a complete list of nodes that make up the Tor network.
- Next, a handshake with the directory server is attempted and an encrypted directory connection is established.
- In the third step, the network status consensus is loaded and authorization to load certificate keys is provided.
- Finally, information related to the relay descriptors is gathered before the Tor circuit is established.
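These phases can be followed programmatically by scanning Tor's startup log for its bootstrap messages. The sketch below assumes notice-level log lines that contain the text `Bootstrapped N%`, optionally followed by a tag in parentheses. The sample lines are shortened, illustrative examples rather than a captured log, and the function names are hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Matches both "Bootstrapped 5%: ..." and "Bootstrapped 5% (conn): ...".
_BOOTSTRAP = re.compile(r"Bootstrapped (\d+)%(?: \([^)]*\))?: (.*)")


def bootstrap_progress(log_lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (percent, description) for every bootstrap message found."""
    steps = []
    for line in log_lines:
        match = _BOOTSTRAP.search(line)
        if match:
            steps.append((int(match.group(1)), match.group(2).strip()))
    return steps


def is_ready(log_lines: Iterable[str]) -> bool:
    """True once the log reports that bootstrapping reached 100%."""
    return any(percent == 100 for percent, _ in bootstrap_progress(log_lines))


# Illustrative, shortened log lines (an assumption, not real captured output).
sample_log = [
    "[notice] Bootstrapped 5%: Connecting to directory server",
    "[notice] Bootstrapped 15%: Establishing an encrypted directory connection",
    "[notice] Bootstrapped 25%: Loading networkstatus consensus",
    "[notice] Bootstrapped 50%: Loading relay descriptors",
    "[notice] Opening Socks listener on 127.0.0.1:9050",
    "[notice] Bootstrapped 100%: Done",
]
progress = bootstrap_progress(sample_log)
```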
Next, we'll take a look at two different services: ExoneraTor and Nyx.
The ExoneraTor service (https://exonerator.torproject.org) maintains a database of IP addresses that have been part of the Tor network. It offers a service where, by entering an IP address and a date, you can find out if that address has been used as a relay node in the Tor network.
This service can store more than one IP address per relay if a node uses a different IP address to reach the internet than the one it registered with the Tor network. It also records whether a node allowed Tor traffic to exit to the internet.
Nyx (https://nyx.torproject.org) is another interesting project that allows you to gather detailed real-time information about relays, such as their bandwidth usage, event logs, and connections.
Nyx also allows us to view the connections and circuits that have been established from the Tor instance, the instance's options and their configuration, and the content of the torrc file:
Figure 8 – Tor connections and circuits established
The connection data provided by Nyx is similar to the netstat or top commands but is correlated with the information in the Tor relays.
In this article we've looked at how Tor provides online users with a high degree of anonymity, by setting up virtual circuits between the nodes that make up the Tor network. As well as having insight into what Tor is and how it works, the knowledge you now have makes a solid foundation for learning about tools that can help with, for example, automating the process of searching and finding hidden services.
This article is part of José Manuel Ortega's book Mastering Python for Networking and Security, a guide to overcoming security and networking issues using Python scripts and libraries. Check it out now to read more about leveraging Python packages to build a secure network.