The Transporter Of The Web

Imagine you’re watching The Transporter. Frank Martin is a man with a strict set of rules: never change the deal, never open the package, and always deliver on time. But what if Frank had to occasionally peek inside the package, maybe even swap something out?

Intercepting proxy servers tell a similar story about delivering information on the web, with an added twist for those times when mere delivery is not enough. We’ll take a journey through the world of proxies, exploring their core purpose and function, and eventually stepping up to the role of the intercepting proxy: the “Transporter” that opens the package when necessary.

The Basics

Proxy servers generally operate on a simple set of rules. The essence of a proxy is that it is a middleman. When your browser wants to visit a website, it can choose to send its request through a proxy server. This middleman intercepts the request, forwards it to the target server, and then provides a response. Here are some reasons to use a proxy:

  • Privacy and Anonymity: Proxies can hide a client’s IP address / client’s identity. This is an ideal setup for browsing privately or circumventing geographic restrictions.
  • Content Filtering: Need to block access to certain sites? Proxies can filter content. This is useful for school and office networks, where certain websites need to be restricted.
  • Load Balancing and Caching: Some proxies help optimize traffic by caching popular resources and distributing the load, reducing the load on the server and increasing speed.
  • Security and Firewalls: Proxies can act as a layer of security, filtering malicious requests before they reach their destination server.
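
Opting in to a middleman is a client-side setting. Here is a minimal sketch of routing requests through a proxy with Python's standard library; the proxy address (`localhost:8080`) and the helper names are hypothetical, for illustration only.

```python
# Route a client's traffic through a proxy using only the stdlib.
import urllib.request

def proxy_map(proxy_url: str) -> dict:
    # Send both plain-HTTP and HTTPS traffic through the same middleman.
    return {"http": proxy_url, "https": proxy_url}

def proxy_opener(proxy_url: str) -> urllib.request.OpenerDirector:
    # Build an opener whose requests are forwarded via the proxy.
    handler = urllib.request.ProxyHandler(proxy_map(proxy_url))
    return urllib.request.build_opener(handler)

# Usage (requires a proxy actually listening on that port):
# opener = proxy_opener("http://localhost:8080")  # hypothetical local proxy
# opener.open("http://example.com")
```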

Proxies are reliable, and in most cases, they work without you even realizing it. However, proxies do have limitations. What if, just once, we need to break the “no snooping” rule?

The Need for Something More

Proxies work well, but there are times when simply being a messenger isn’t enough. Imagine this: you’re a developer or security tester, and you need to modify a request on the fly to test how your application handles unexpected data. Or you want to modify a header to analyze your application’s response to different user agents. Or maybe you need to strip some sensitive information from the response before it reaches the client.

In the world of standard proxies, these requirements present a problem. Regular proxies are bound by their rules to forward requests and responses as-is, without any modification. To solve this, we need something more sophisticated: **a proxy that can inspect, modify, and even alter the request and response streams**, known as an intercepting proxy.

This is where things get interesting. An intercepting proxy, much like Frank breaking his own rule to “never open the package,” is designed to do just that: open, inspect, and modify requests and responses in transit. This special kind of proxy, often called a man-in-the-middle proxy, isn’t just a simple messenger; it’s a tool for deeper inspection and manipulation.

What Makes an Intercepting Proxy Different?

An intercepting proxy’s capabilities go beyond basic forwarding:

  • Request and Response Manipulation: An intercepting proxy can modify request headers, change body content, or even rewrite responses.
  • Enhanced Debugging and Development: For developers, an intercepting proxy provides invaluable insights by showing exactly what’s inside each request and response or even shadowing the request. This lets you test and troubleshoot in real time, making it much easier to find and fix issues.
  • Security Testing: An intercepting proxy gives testers the ability to spot-check packets for threats. By manipulating requests, testers can probe for vulnerabilities like SQL Injection, Cross-Site Scripting, and other potential threats.

How It Works

So how exactly does an intercepting proxy work? In simple terms, the flow looks like this:

  1. Client Request: When a client (like a browser) wants to reach a server, it sends its request to the intercepting proxy instead.

  2. Interception Point: Here’s the key difference: instead of forwarding the request immediately, the intercepting proxy pauses to inspect the contents. It can see the headers, the body, and more.

  3. Modification (Optional): If needed, the intercepting proxy can modify the request/packets.

  4. Forwarding to the Target Server: After inspection (and any changes), the request proceeds to the destination server.

  5. Response Interception: When the server responds, the intercepting proxy has another chance to inspect or alter the response before it reaches the client, adding an extra layer of control and flexibility.
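
The five steps above can be sketched for plain HTTP with just the standard library. The `intercept_*` hooks and the `X-Inspected-By` header are hypothetical names chosen for illustration; a real proxy would also need connection handling and error recovery.

```python
import http.client

def intercept_request(method, path, headers, body):
    # Steps 2-3: pause, inspect, and optionally modify the request.
    headers = dict(headers)
    headers["X-Inspected-By"] = "intercepting-proxy"  # hypothetical marker
    return method, path, headers, body

def intercept_response(status, headers, body):
    # Step 5: inspect or alter the response before it reaches the client.
    # Here: strip the Server header as a trivial example of rewriting.
    headers = {k: v for k, v in headers.items() if k.lower() != "server"}
    return status, headers, body

def forward(host, method, path, headers=None, body=None):
    # Step 4: after inspection (and any changes), send the request on.
    method, path, headers, body = intercept_request(method, path, headers or {}, body)
    conn = http.client.HTTPConnection(host, timeout=10)
    conn.request(method, path, body=body, headers=headers)
    resp = conn.getresponse()
    result = intercept_response(resp.status, dict(resp.getheaders()), resp.read())
    conn.close()
    return result
```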

Simple, isn't it? With an unencrypted request, everything is easy: the proxy can read the contents of the packet and manipulate it. The story is different when the traffic is encrypted with SSL/TLS, because only the client and the target server can read the contents of the packet; the proxy, as the man in the middle, does not have the "power" to read it.

So we need to make a "little change to the HTTPS flow" so that the proxy gains the "power" to read the packet. First, let's see how the HTTPS flow usually works:

general proxy flow

The proxy does not do a TLS handshake; it just forwards things. After the user types https://example.com in their browser, a TCP connection is made to the proxy and then the TLS handshake begins. The first step of the TLS handshake is the ClientHello (TLS 1.2). Now, we need to decide where to send this ClientHello. Can we discover the endpoint/target server from information inside the ClientHello? The answer, of course, is no.

The browser needs to tell the proxy where to forward the request, and this must happen after the TCP connection is established but before the TLS handshake. This is where the CONNECT method comes into play: the browser sends a request with the CONNECT method and a domain name to the proxy before the TLS handshake. This request contains the endpoint and port in the HOST:PORT format, known as authority-form.

The proxy then establishes a connection to the target server and, if successful, responds with a 2xx (Successful) response. The proxy only replies once it can connect to the target server, and the client initiates the TLS handshake when it receives the 2xx response.
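
A sketch of that CONNECT handling for a regular (non-intercepting) proxy might look like this. The function names are hypothetical, and a production proxy would add error handling, limits, and timeouts.

```python
import socket
import threading

def parse_connect(request_line: str):
    # "CONNECT example.com:443 HTTP/1.1" -> ("example.com", 443),
    # i.e. the authority-form target.
    method, authority, _version = request_line.split()
    if method != "CONNECT":
        raise ValueError("expected a CONNECT request")
    host, _, port = authority.rpartition(":")
    return host, int(port)

def tunnel(client_sock: socket.socket, request_line: str) -> None:
    host, port = parse_connect(request_line)
    # Connect upstream first; only on success do we reply with 2xx.
    upstream = socket.create_connection((host, port), timeout=10)
    client_sock.sendall(b"HTTP/1.1 200 Connection Established\r\n\r\n")
    # From here the proxy relays opaque bytes in both directions;
    # the TLS handshake happens end-to-end between client and server.
    def pipe(src, dst):
        while chunk := src.recv(4096):
            dst.sendall(chunk)
    threading.Thread(target=pipe, args=(upstream, client_sock), daemon=True).start()
    pipe(client_sock, upstream)
```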

So what about intercepting proxies? Here's an overview of how they work:

intercepting proxy flow

The flows are very similar; the difference is that the intercepting proxy performs a TLS handshake with the client itself and thus has the data in plain text. The intercepting proxy does not open the initial TCP connection to the target server after the CONNECT request; it just responds with a 200. It proceeds with the TLS handshake and only "contacts" example.com after it receives the first request (in this case, a GET request).

This kind of flow allows the proxy to read the packets because there are now two TLS handshakes, one between the client and the proxy and one between the proxy and the target server. The proxy can therefore read and modify the request before it is sent to the target server, as well as the response from the target server before it is sent back to the client.
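
Those two TLS sessions can be sketched with Python's `ssl` module. This is an illustrative outline, not a working interceptor: `mitm_connect` and the cert/key paths are hypothetical, and the client must already trust the proxy's CA certificate for the client-side handshake to succeed.

```python
import socket
import ssl

def upstream_context() -> ssl.SSLContext:
    # The proxy -> target-server leg is a normal, verifying TLS client.
    return ssl.create_default_context()

def mitm_connect(client_sock, host, port, proxy_cert, proxy_key):
    # Reply 200 *without* dialing the target yet...
    client_sock.sendall(b"HTTP/1.1 200 Connection Established\r\n\r\n")
    # ...then handshake with the client using the proxy's own certificate,
    # so subsequent requests arrive at the proxy in plain text.
    toward_client = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    toward_client.load_cert_chain(proxy_cert, proxy_key)
    tls_client = toward_client.wrap_socket(client_sock, server_side=True)
    # Only now dial the real server, with its own, separate TLS handshake.
    raw = socket.create_connection((host, port), timeout=10)
    tls_server = upstream_context().wrap_socket(raw, server_hostname=host)
    return tls_client, tls_server
```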

Let’s summarize the differences between a general proxy and an intercepting proxy, side by side:

| Feature | General Proxy | Intercepting Proxy |
| --- | --- | --- |
| Visibility | Often operates invisibly to the user | Often operates visibly, with inspection |
| Security | Limited; acts as a simple gateway | Allows real-time manipulation for testing |
| Debugging Capabilities | Limited to traffic observation | Full access to inspect and troubleshoot data flows |
| Request/Response Forwarding | Forwards directly without modification | Inspects and can modify requests/responses |

Wrapping Up

Now we know that an intercepting proxy is much more than just a "traffic manager". From rule-following proxies to boundary-pushing intercepting proxies, each type plays a unique role in delivering a secure, efficient, and insightful web experience. However, they come with a responsibility: if left unsecured, they can expose sensitive data. They need to be used wisely and securely, especially in production environments, to avoid misuse.
