ik_5

You (probably) do not understand UDP

In my day-to-day work, I find that a few types of IT equipment cannot properly support any transport layer other than TCP, even when their specifications say they do.

I also read rants, such as this, about implementing something in UDP where TCP would have been better.

So this post gives a small idea of what UDP is, and I hope it sheds more light on what is commonly missing in the understanding of this transport protocol.

TLDR

You and I (probably) do not (fully) understand UDP.

TCP and UDP have different uses and should be compared based on what the end application needs, rather than through "you do not have a 3-way handshake so it sucks" style rants.

Most of the time, TCP does not require you to handle anything transport-related at L7 besides the port number (there are very few exceptions to this rule, which this post never touches and blatantly ignores).

UDP is a lightweight transport protocol that forces you to handle any transport logic you need at the application layer (L7), while TCP does most of it for you almost auto-magically (you just handle L7).
This post tries to provide examples of such usage.

Network story

Let's talk about networking (almost) without talking about the transport layer.

First, there has to be some medium to deliver data over (e.g., an RJ45-based Ethernet cable).

Then there is some physical hardware (e.g., Wifi, Ethernet, token-ring [not really], Bluetooth, cellular, modem, FXO/FXS modem, etc.),
A.K.A. the "Physical layer" or Layer #1.

Then there is some sort of firmware and a device driver at the software level. The driver lets operating systems such as Windows/Linux/MacOS/... read from the hardware, and it provides a low-level API that gives user-space programs a unified way to use it.

From Layer 2 up to and including Layer 7, everything is software-based.

Encapsulation runs in the reverse of the numerical layer order: application-level (L7) data is wrapped first, then L6, L5, and so on down to Layer 2, and then the hardware layer passes the data around, with header fields in big-endian (network) byte order.

The "capsules" are ordered so that whatever handles the packet first finds the part it knows how to parse and leaves everything else to the next level in the chain, until finally our program handles the application layer.

Let us network a bit

To figure out how much room the encapsulation leaves us, there is a measuring unit named MTU (Maximum Transmission Unit): the biggest packet that can be sent or received in one piece, counting the payload together with the headers wrapped around it (on standard Ethernet, the value is usually 1500 bytes).
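To make the numbers concrete, here is a rough sketch of the payload budget inside one packet. The header sizes are my assumptions: IPv4 without options (20 bytes), UDP (8 bytes), and TCP without options (20 bytes). Real traffic may carry options, tunnels, or IPv6, which change the result.

```python
# Assumed sizes, in bytes: standard Ethernet MTU, IPv4 header without options,
# UDP header, and TCP header without options.
ETHERNET_MTU = 1500
IPV4_HEADER = 20
UDP_HEADER = 8
TCP_HEADER = 20


def max_payload(mtu: int, transport_header: int, ip_header: int = IPV4_HEADER) -> int:
    """Return how many application bytes fit in one unfragmented packet."""
    return mtu - ip_header - transport_header


print(max_payload(ETHERNET_MTU, UDP_HEADER))  # 1472
print(max_payload(ETHERNET_MTU, TCP_HEADER))  # 1460
```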

If our payload is bigger than the MTU allows, it has to be fragmented into smaller chunks, and each chunk is sent on its own.

Who does the splitting, our software or the transport layer, depends first on our application and only then on the transport layer (when it supports that).

The application can decide to perform fragmentation of the payload without considering the transport layer. HTTP has a nice usage for L7 fragmentation using HTTP range requests, for example.
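As a minimal sketch of application-level fragmentation (not how HTTP range requests work on the wire), the hypothetical `fragment` helper below cuts a payload into numbered chunks. The chunk size of 1472 bytes is an assumption, taken from a 1500-byte MTU minus IPv4 and UDP headers.

```python
from typing import List, Tuple


def fragment(payload: bytes, chunk_size: int) -> List[Tuple[int, bytes]]:
    """Split payload into (index, chunk) pairs of at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        (index, payload[offset:offset + chunk_size])
        for index, offset in enumerate(range(0, len(payload), chunk_size))
    ]


chunks = fragment(b"x" * 3000, 1472)
print([(index, len(chunk)) for index, chunk in chunks])
# [(0, 1472), (1, 1472), (2, 56)]
```

The index travels with each chunk so the receiving side can tell which piece it got, which matters once packets get lost or arrive out of order.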

What did you say?

Let's take a hardware tool named a "hub".
That tool is an array of RJ45 Ethernet ports that differ only by physical position; all data is visible on every port in use at the same time.

On that hub, let's connect a single endpoint (that is, a machine = computer).

Let's use the ARP protocol to ask who owns some IP address, so that the owner returns its hardware address.
While doing that, let's do some tcpdumping magic and sniff our requests.

We will find out very soon that now and then we cannot see our request after it exits our machine.

Even if we add another machine and sniff the traffic using the new machine, the result will be the same.

Sometimes we are losing packets!
Why? Gremlins love bits on the net, that's why!

So some transport layers also handle the loss of data.

But if there are a lot of requests (or heavy traffic, or slower processing, etc.), sometimes the packets do not arrive in the order L7 sent them, so packet #10 arrives before packet #4, for example.

So some transport layers also handle reassembling the packets, based on a sequence number carried inside that layer.
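Reordering by sequence number can be sketched in a few lines. This is only an illustration, with a hypothetical `reassemble` function and a numbering that starts at 0; a real transport layer does this incrementally and with far more bookkeeping.

```python
from typing import Iterable, Tuple


def reassemble(packets: Iterable[Tuple[int, bytes]]) -> bytes:
    """Join (sequence_number, data) pairs in sequence order, failing on gaps."""
    by_seq = dict(packets)  # a duplicated sequence number keeps the last copy
    parts = []
    for expected in range(len(by_seq)):
        if expected not in by_seq:
            raise ValueError(f"missing packet #{expected}")
        parts.append(by_seq[expected])
    return b"".join(parts)


arrived = [(2, b"lo, "), (0, b"he"), (3, b"UDP"), (1, b"l")]
print(reassemble(arrived))  # b'hello, UDP'
```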

And by talking about "some", it is usually about TCP (in our case).

TCP (and) UDP

TCP

TCP is a complex transport protocol that does many things for you (almost automagically), such as:

  1. Opens a communication path to the endpoint, preparing it to accept your data (3-way handshake), and also signals when the entire payload was sent.
  2. Works out for you how to fragment a payload and number the fragments.
  3. Detects timeouts and missing fragments, and sends the missing fragment again (retransmission).
  4. Handles network congestion by managing a sending window.
  5. Can send "ping"-like packets to keep the path and the endpoint of the connection alive.
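The retransmission item in the list above can be illustrated with a toy stop-and-wait loop. This is a simplification I chose for readability: real TCP uses sliding windows and adaptive timers, and `LossyChannel` is a hypothetical in-memory link, not a socket.

```python
from typing import List, Set


class LossyChannel:
    """Hypothetical link that drops the transmissions whose running attempt
    number appears in drop_attempts."""

    def __init__(self, drop_attempts: Set[int]) -> None:
        self.drop_attempts = drop_attempts
        self.attempt = 0
        self.delivered: List[bytes] = []

    def send(self, fragment: bytes) -> bool:
        """Return True when the fragment arrived (think: an ACK came back)."""
        self.attempt += 1
        if self.attempt in self.drop_attempts:
            return False  # lost on the way: the sender's timer would expire
        self.delivered.append(fragment)
        return True


def send_reliably(channel: LossyChannel, fragments: List[bytes], max_tries: int = 5) -> int:
    """Send every fragment, repeating it until acknowledged.

    Returns the total number of transmissions, retransmissions included.
    """
    transmissions = 0
    for fragment in fragments:
        for _ in range(max_tries):
            transmissions += 1
            if channel.send(fragment):
                break
        else:
            raise TimeoutError("peer did not acknowledge in time")
    return transmissions


channel = LossyChannel(drop_attempts={2, 3})
total = send_reliably(channel, [b"a", b"b", b"c"])
print(total, channel.delivered)  # 5 [b'a', b'b', b'c']
```

Three fragments cost five transmissions here, which is the kind of extra traffic the cost list below talks about.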

Sounds amazing, but it comes with a few costs, such as:

  1. Extra traffic (e.g., 3-way handshake, retransmission of fragments, and keep-alive).
  2. If you have network issues, you'll lose your connection to the endpoint once the timers expire.
  3. A bigger header (to carry all those features) leaves less room for payload within the MTU.
  4. Data usually moves slower than over UDP (for example), because so many things are going on that need to be verified.
  5. Your TCP stack implementation handles congestion for you, and if the algorithm it uses is wrong for your needs, you mostly have to suck it up.
  6. When a reply is too slow for your timers (L7/L4), the same fragment is retransmitted many times, even though the endpoint did receive it and was just slow to respond.

UDP

UDP is a simple, lightweight (compared to TCP) transport protocol.
It assumes (mostly) nothing beyond a port number and leaves everything else your application needs to your own business logic.
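Here is a minimal sketch of how little UDP asks for, using Python's standard `socket` module over the loopback interface. The payload and the 2-second timeout are arbitrary choices of mine.

```python
import socket

# Receiver: bind to a free port chosen by the OS on the loopback interface.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
receiver.settimeout(2.0)  # UDP promises no delivery, so never wait forever
address = receiver.getsockname()

# Sender: no connect, no handshake; each sendto() is one independent datagram.
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"hello over UDP", address)

# The receiver learns who sent the datagram only from the datagram itself.
data, peer = receiver.recvfrom(2048)
print(data, peer)

sender.close()
receiver.close()
```

Nothing here retries, orders, or acknowledges anything. If the datagram had been lost, `recvfrom` would simply time out, and deciding what to do next would be up to the application.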

Does it sound bad compared to TCP?
Well, no, because it exists for different uses.

Examples

VPN

Let's say I create a site-to-site VPN, that is, a VPN as infrastructure that provides new network legs rather than just connecting two machines.

If an application builds that VPN on TCP transport, every packet takes a long time to get through.
The VPN never knows when something will be sent, and the extra TCP traffic creates congestion in the network.

So to mitigate some of these issues, things are more relaxed, thanks to UDP*.

Traffic may pass only once every 10 hours, but in theory, there is still a connection that will transport that content and is "alive" and kicking.

*There are some TCP-based VPNs, but they are not suited for site-to-site, to my knowledge.

ssh

Normal ssh is TCP-based.
But there is also a different implementation named mosh.

What happens if your network connection is unstable?
Well, the connection will be lost at some point. No matter how stable the protocol tries to be (even with keep-alive), the server will have disconnected you by the time the client is up again.

For that reason, mosh uses UDP.
Even when your connection was lost for 10 hours (don't you love my magic numbers in this post?), you can still reconnect and continue from the point where it stopped.

Streaming

When using communication protocols (it does not have to be telephony), there is the Real-time Transport Protocol (RTP), which sends chunks of media (audio and/or video, for this example).

That media is broken into "frames", and there is a buffer (the jitter buffer) that collects X amount of media chunks and hands them over to be processed by a device (such as an audio device).

The media is encoded by a codec (encoder-decoder), which stores the data in a specific format instead of its raw form.
The codec defines the number of frames (or a range of it) that must be collected to form proper media.
Some codecs, such as Opus, also have frame variants that change based on bandwidth and/or performance.

When the buffer holds fewer frames than the defined amount, audio (up to a point) takes on a metallic sound (the issue known as jitter), and video stops updating.
Also, each implementation needs to find a sweet spot between waiting for more data and releasing the current buffer, and that trade-off can also cause jitter.
Older frames that arrive late are thrown away if the buffer was already released (otherwise, re-ordering the data would cause a big delay).

And there is no need to collect older frames that arrived late, because that misses the point of streaming; the stream has to continue with the newer frames and ignore the older ones.
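The "late frames are thrown away" behaviour can be sketched with a toy jitter buffer. The class name and the idea of releasing exactly one frame per `pop()` are my assumptions for illustration; a real implementation also waits for a certain buffer depth and works with timestamps, not only sequence numbers.

```python
from typing import Dict, Optional


class JitterBuffer:
    """Toy buffer that releases frames in sequence order and drops latecomers."""

    def __init__(self) -> None:
        self.pending: Dict[int, bytes] = {}
        self.next_seq = 0      # sequence number of the next frame to play
        self.dropped_late = 0  # frames that arrived after their slot was played

    def push(self, seq: int, frame: bytes) -> None:
        """Store an arriving frame, unless its playout slot has already passed."""
        if seq < self.next_seq:
            self.dropped_late += 1
            return
        self.pending[seq] = frame

    def pop(self) -> Optional[bytes]:
        """Release the next frame for playout; None means a gap in the media."""
        frame = self.pending.pop(self.next_seq, None)
        self.next_seq += 1
        return frame


buf = JitterBuffer()
buf.push(0, b"f0")
buf.push(2, b"f2")
played = [buf.pop(), buf.pop()]  # frame 1 has not arrived, so it plays as a gap
buf.push(1, b"f1")               # arrives after its slot: thrown away
played.append(buf.pop())
print(played, buf.dropped_late)  # [b'f0', None, b'f2'] 1
```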

But how can a program know how much arrived in total and how much was lost if L4 does not track any of it?!

There is another "nice" protocol named RTCP (RTP Control Protocol)
that reports every X amount of time how much data (frames) was sent, so now there are also statistics, and with them the ability to change codec variants or even negotiate new codecs based on that information.
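The kind of statistics involved can be sketched from sequence numbers alone. The hypothetical `loss_report` function below is only loosely inspired by what a receiver could report; it is not the RTCP packet format and it ignores sequence number wrap-around.

```python
from typing import Dict, Iterable


def loss_report(received_seqs: Iterable[int]) -> Dict[str, float]:
    """Estimate loss from the sequence numbers seen during one interval."""
    seen = set(received_seqs)
    if not seen:
        raise ValueError("no packets received in this interval")
    expected = max(seen) - min(seen) + 1  # how many should have arrived
    lost = expected - len(seen)
    return {
        "expected": expected,
        "received": len(seen),
        "lost": lost,
        "loss_fraction": lost / expected,
    }


print(loss_report([0, 1, 2, 4, 5, 7, 8, 9]))
# {'expected': 10, 'received': 8, 'lost': 2, 'loss_fraction': 0.2}
```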

Such decisions usually rely on scoring scales such as MOS (Mean Opinion Score), which help to understand media packet loss and its effect over time.

Google's streaming as example

Conclusion?!

Well, to explain UDP properly, I would need several boring blog posts (even more boring than this one), so I decided to explain the way networking works and provide examples of using UDP and why TCP might not be suited for the task.

I hope that by doing so, I was able to explain a bit more why there is a need for UDP and why it is so different from TCP.
It's not about "what L4 protocol is better", but "what L4 protocol suits my needs", and what you have to build yourself on top of it.

About me

I'm writing telecommunication software in my day-to-day work.

My day-to-day work forces me to interact with many networking protocols, and many OSI network layers, including three transport layers:

My work also forces me to deal with a lot of IT-related stuff, such as:

To name a few...
