knut

Posted on Oct 14, 2023 • Edited on Oct 19, 2023

netcrab: a networking tool

#rust #tokio #networking #performance

Before I get to this project known as netcrab, I thought it'd be fun to share some history from Xbox's past... call it the origin story of this tool. Let's go back in time a little bit. The year was 2012 and I had joined the Xbox console operating system team a year or so before. We'd wrapped up working on one of the last major updates for the Xbox 360 and were well underway with the next project, the thing that would eventually release as the Xbox One.

I worked on the networking team, and the architecture of the Xbox One was wildly different than the 360. The Xbox One consisted of three virtual machines: a host, a shared partition (for the system UI, etc.), and an exclusive game partition. They all had to share a single network adapter, and so there was a whole lot of new code for this virtualized networking system to multiplex 3 VMs' worth of network traffic through a single adapter. To make matters even more fun, the host VM, the one that actually had access to the physical network adapters and ran their drivers, was (and still is) relatively lightweight. A lot of networking features are just not there, like, uh, DHCP, IP fragmentation, and more.

All of that to say, back then I was doing a lot of debugging of extremely simple networking things, like whether the box even gets an IP address, did it send a packet, did it receive a packet, why did the firewall reject this, and why did the firewall allow this? I had a need for a simple networking tool I could use as a TCP client, TCP server, UDP listener, or UDP sender.

Well, such a tool already exists, of course, and has existed for a million years. It's called netcat and is well known to Unix people. There were two problems with it for me, though:

I didn't want to deal with license issues integrating it into tooling at work. I'm not sure what the license is, but Microsoft was much more wary about open source projects back then.
That Xbox host VM I mentioned earlier does not run off-the-shelf programs. You have to recompile your program from source for it to work.

So I took matters into my own hands--because I love having tools--and wrote my own replacement called netfeline. It got the job done for me at work, but it had just one problem: it is private to Microsoft, so I can't share it with anyone. And I'm not sure I'd want to; it's not my best work by a long shot. Now we arrive at April 2023 and I'm trying to get better at Rust and finally got the itch to rewrite netcat as an open source project.

Early Choices

Right from the beginning I knew I wanted to try using Rust's async/await functionality, but the current state of async programming in Rust is a bit weird. The language and compiler have support for certain keywords like async, but there's no standard library that provides an async runtime, which is needed to actually execute asynchronous tasks. The Rust async book has a good chapter on the state of the ecosystem.

So I started by using Tokio, a popular async runtime. The docs and samples helped me get a simple outbound TCP connection working. The Rust async book also had a lot of good explanations, both practical and digging into the details of what a runtime does.

When I work on projects, I like to add breadth first before depth. I want to stretch the code as broadly as it needs to go and see minimal functionality across all the features I want before I polish them. I find this helps me sort out all the structural questions I have. As I'll describe, I had to do a lot of stretching and restructuring throughout this project.

To start with, I got a TCP client and server and UDP listener and sender all minimally working. This set up the code to handle four major, simple scenarios: TCP/UDP and listener/sender, all of which have slightly different ways to work with them.

Wrestling with user input

The Tokio library I am using also provides wrappers for stdin and stdout. Unfortunately I found that this didn't work well for me. From Tokio's stdin docs:

This handle is best used for non-interactive uses, such as when a file is piped into the application. For technical reasons, stdin is implemented by using an ordinary blocking read on a separate thread, and it is impossible to cancel that read. This can make shutdown of the runtime hang until the user presses enter.
For interactive uses, it is recommended to spawn a thread dedicated to user input and use blocking IO directly in that thread.

Well, I wanted an interactive mode to work. You should be able to start a server on one end and a client on the other, and if you push a key on your keyboard, the other end should see it pop up. Tokio's stdin implementation had two problems:

if the program was about to exit due to, say, the socket being closed by the peer, it wouldn't exit until you pressed a key. Unacceptable.
if you pushed a key, it didn't get transmitted until you hit Enter. Boooo.

To address the first problem, I ended up having to take the docs' advice and put those blocking reads on my own thread. By "blocking read", I mean my thread calls a read function that will sit there and wait (A.K.A. "block") until there is data available to be read (because the user pressed a key). The first problem exists because the Tokio runtime won't shut down until all the tasks it's waiting on complete, and one of them will be stuck with a blocking read call until the user hits a key. But by putting it on a std::thread, it's not managed by the Tokio runtime, and Rust is happy with tearing it down in the middle of a blocking call at process exit time.
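Here's a minimal sketch of the dedicated-thread idea using only the standard library. It is not netcrab's actual code: the helper name spawn_blocking_reader is made up for illustration, and a std channel stands in for whatever async channel the real program uses to hand data to the runtime.

```rust
use std::io::Read;
use std::sync::mpsc::{self, Receiver};
use std::thread;

/// Spawns a plain OS thread that performs blocking reads on `input` and
/// forwards each chunk over a channel. The channel closes at EOF or on error.
/// (`spawn_blocking_reader` is a hypothetical helper name.)
fn spawn_blocking_reader<R>(mut input: R, chunk_size: usize) -> Receiver<Vec<u8>>
where
    R: Read + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let mut buf = vec![0u8; chunk_size];
        loop {
            match input.read(&mut buf) {
                // EOF or a read error: stop, which drops `tx` and closes the channel.
                Ok(0) | Err(_) => break,
                Ok(n) => {
                    // A send error means the receiver is gone, so stop reading.
                    if tx.send(buf[..n].to_vec()).is_err() {
                        break;
                    }
                }
            }
        }
    });
    rx
}
```

Calling spawn_blocking_reader(std::io::stdin(), 1024) would read the real stdin. Because nothing ever joins the thread, a read that is still blocked does not keep the process alive once main returns.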

For the second problem, I found a useful crate called console. This gives the ability to read one character at a time without the user needing to hit Enter. It has a weird bug on Unix-type systems though, so it currently defaults to the -i stdin-nochar input mode there.

All these arguments

By this time I had already gotten tired of parsing arguments by myself and had looked for something to help with that. I found a really dang good argument parsing library called clap. What makes it so cool is it's largely declarative for common uses. You simply mark up a struct with attributes, and the parser automatically generates the usage and all the argument parsing code.

Here's a snippet of one of the parts of netcrab's args as an example. This lets the user configure the random number generator for producing random byte streams sent to connected sockets. It exposes three arguments that the user could pass: --rsizemin NUM, --rsizemax NUM, and --rvals binary or --rvals ascii.

#[derive(Copy, Clone, PartialEq, Eq, PartialOrd, Ord, Debug, clap::ValueEnum)]
enum RandValueType {
    /// Binary data
    #[value(name = "binary", alias = "b")]
    Binary,

    /// ASCII data
    #[value(name = "ascii", alias = "a")]
    Ascii,
}

#[derive(Args, Clone)]
#[group(required = false, multiple = true)]
struct RandConfig {
    /// Min size for random sends
    #[arg(long = "rsizemin", default_value_t = 1)]
    size_min: usize,

    /// Max size for random sends
    #[arg(long = "rsizemax", default_value_t = 1450)]
    size_max: usize,

    /// Random value selection
    #[arg(long = "rvals", value_enum, default_value_t = RandValueType::Binary)]
    vals: RandValueType,
}

Here are a few tips about clap, for me to remember and for you to maybe learn.

It's not super straightforward what attributes are available to apply. If you have a #[group(...)], you can use attributes that match any of the getters in ArgGroup.
If you have an #[arg(...)] you can use ones from Arg.
A #[command(...)] corresponds to Command.
If you want to use an enum value in args, remember to add the attribute #[derive(clap::ValueEnum)] or else you'll get cryptic compiler errors.
#[command(flatten)] can be applied to pull in all of a struct's fields into the usage while retaining the nested nature in the struct.
If you want to define your own -h argument (and its line in the usage), first turn off the built-in one with #[command(disable_help_flag = true)].
If you are using the "derive" parser like I did, but you want to execute one of the methods on Command, you can call YourArgs::command().whatever().

Customizing your sockets

One of the useful things to do when testing a networking stack is customizing various socket options. If you click on the link you'll see that there are quite a lot of them. Tokio exposes a few to customize on the TcpSocket, TcpStream, and UdpSocket objects, but by no means all of them.

On the other hand, the socket2 crate, upon which Tokio's sockets are built, exposes quite a lot of them from many of the different option groups: joining multicast groups, enabling broadcast, setting TTL, etc. I didn't quite know which all I wanted to expose in command line args, but I wanted to set myself up for success by getting access to the socket2::Socket.

Unfortunately, I didn't see a clear way to convert from a Tokio socket to that underlying socket2 socket. All I could see were FromRawFd and FromRawSocket, which could be joined up with the Tokio socket's AsRawFd/AsRawSocket. Well, I pushed forward with this and committed the following crime:

let socket2 = ManuallyDrop::new(unsafe {
    #[cfg(windows)]
    let s = socket2::Socket::from_raw_socket(socket.as_raw_socket());

    #[cfg(unix)]
    let s = socket2::Socket::from_raw_fd(socket.as_raw_fd());

    s
});

At the point I take the raw FD/socket and create a socket2::Socket on top of it, the socket2::Socket takes ownership of the handle and will close it when it goes out of scope. That would be bad, because it would shut down my original Tokio socket. So I had to work around it with an object that Rust provides called ManuallyDrop, which inhibits running the object's destructor.
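To make the effect of ManuallyDrop concrete, here's a small standalone sketch. OwnedHandle is a made-up stand-in for an owning socket wrapper whose destructor "closes" the handle; it only counts closes, so no real sockets are involved.

```rust
use std::cell::Cell;
use std::mem::ManuallyDrop;
use std::rc::Rc;

/// Hypothetical stand-in for an owning socket wrapper: "closes" its handle on drop.
struct OwnedHandle {
    close_count: Rc<Cell<u32>>,
}

impl Drop for OwnedHandle {
    fn drop(&mut self) {
        self.close_count.set(self.close_count.get() + 1);
    }
}

/// Returns the close count after discarding a normal wrapper, and again after
/// discarding one wrapped in `ManuallyDrop`.
fn closes_observed() -> (u32, u32) {
    let counter = Rc::new(Cell::new(0));
    {
        let _normal = OwnedHandle { close_count: counter.clone() };
    } // The destructor runs here: one close.
    let after_normal = counter.get();
    {
        let _view = ManuallyDrop::new(OwnedHandle { close_count: counter.clone() });
    } // The destructor is suppressed here: still one close.
    (after_normal, counter.get())
}
```

The wrapper inside ManuallyDrop is simply never dropped, which is the point in the socket case: the original owner stays responsible for closing the handle.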

This solution was ugly but worked, and I had to dip into unsafe APIs for the first time in the project, which made me a bit sad. Is it impossible to write a program of any reasonable complexity in Rust without resorting to unsafe calls?

I tried to hint just now that there actually is a clean way to do this. A recurring theme in this project is finding the right tool for the job. The right tool in this case is socket2::SockRef, which lets you call all those socket options on an AsFd/AsSocket without taking ownership or even requiring a mutable reference. No more unsafe calls either. It's exactly what I needed. I stumbled on it like five minutes ago as part of writing this blog.

The moral of the story is: if you find yourself banging your head against the wall fighting the borrow checker or bringing in unsafe code, look a little harder: most mature libraries have fixes for these rough edges already.

Group talk expansion

At some point I thought I was nearly done: I had the main scenarios of inbound and outbound traffic working, and a bunch of extra scenarios like being an echo server, generating random data to send out, sending broadcast and multicast traffic...

I was just about to call it "feature complete", but then I was browsing around that netcat web page and noticed an interesting feature: they called it "broker mode". It allows multiple clients to be connected at the same time and forwards traffic between all of them.

Suddenly I had a vision of a problem I wanted to be able to solve. I do a lot of my work on a laptop in my home, VPN'd into Microsoft corpnet. I have an Xbox devkit and a test PC at home. Sometimes I hit a bug on a device at home and want to have someone at work take a look at it.

Windbg has a feature called debugger remotes. One end, which is actually attached to the thing being debugged, is called the "debugger server". It can open a port and allow a "debugger client" to connect to it and operate it remotely. In the context of my home/work setup, my home PC can connect to work resources but not the other way around (restrictive VPN firewall), so I thought it would be cool if netcrab could support working around that and allowing a debugger client at work to connect to a debugger server at home.

What we need is a listener at work (since it can accept connections from home) that can accept connections from both home and the remote side of the debugger, and a connector at home that connects to both the local debugger session and the work listener. These two forwarders should be able to send traffic between the debugger session and the remote debugger.

More than one connection

There was a big hurdle to making this work: everything in the program up to this point was narrowly focused on only one remote connection. The program structure had no notion of more than one remote endpoint being connected. In order to expand the breadth of scenarios it can cover, I had to rewrite most of the guts of netcrab.

To work on it in phases, I started off bringing up support for just having more than one incoming connection at a time, postponing the feature of traffic forwarding. The code would create a listening socket and call accept on it. With async programming, the accept call is put in a separate task, and I wait until it completes.

Before that rewrite, when the first accept completes because of an incoming connection, that socket is handled until it closes, then the program either issues another accept to handle another client or exits, depending on user preference. With only one TCP connection at a time to manage, I didn't have to think about having multiple tasks for handling different connections at the same time. Life was good.

My first attempt to expand this went very poorly. I already had a function called handle_tcp_stream that created an async "task" object called a "future" that drove the input and output of the socket to completion, so I figured all I needed to do was call handle_tcp_stream on any newly accepted socket and stuff the future into a Vec.

I had the right idea but had not yet found the right tool for the job. Putting these futures in a Vec doesn't work because Rust doesn't let you both modify the list for adding to it and asynchronously modify it for removing completed futures from it. This requires mutably borrowing the Vec twice, and that's disallowed by Rust. By the way, around this time I found this good article about ways of thinking about mutability in Rust.

At some point I got the feeling I was barking up the wrong tree, and so with some searching I stumbled upon the right tool for the job, FuturesUnordered. Let's see:

a set of futures that can complete in any order
can add to it without a mutable borrow (wow)
automatically removes a future from the list when one completes (wow)

Suddenly my original idea was simple: every time I accept a new connection, I stuff it in a FuturesUnordered collection of ongoing connections and just await the next one finishing.

Everything is sinks and streams

Once I had multiple connections at the same time, the next step was to enable forwarding between them, and by the way I can't forget local input and output, which also should go to and from all connections.

Before we get too deep into the router's guts, it's worth explaining about streams and sinks, which feature prominently in Rust async programming and consequently in netcrab. A Stream is basically an asynchronous Iterator. While Iterator::next produces subsequent items synchronously, Stream::next returns a future that asynchronously produces the next item. Just like an Iterator, there are many convenience methods to modify the data as it emerges from the Stream (e.g. map, filter, etc.).

A Sink is the opposite: an object that can receive values and asynchronously handle them. You call Sink::send and await the transmission completing. The Sink is templated with the type of value it accepts. You can call with to add an "adapter" before the sink in order to change the data type the Sink accepts or to intercept and process items before the Sink handles them.

A common thing to do is send all the data from some stream to some other sink. That is done using the send_all method.

Where, in Rust, do sinks and streams come from? Well, anything that implements the AsyncWrite trait can be turned into a Sink by using the FramedWrite helper. Likewise with AsyncRead and FramedRead producing a Stream.

And where do you get an AsyncWrite or AsyncRead from? Well, Tokio provides them in many places. For example, you can call TcpStream::split, and you get one for each direction: writing to or reading from the socket asynchronously.

In practice, it looks a little like this:

// TcpStream::split returns the read half first, then the write half.
let (socket_reader, socket_writer) = tcp_socket.split();
let socket_sink = FramedWrite::new(socket_writer, BytesCodec::new());

// FramedRead yields Result<BytesMut, io::Error>. Call `freeze` on the Ok value to
// convert a BytesMut to a Bytes so it can be easily copied around.
let mut socket_stream = FramedRead::new(socket_reader, BytesCodec::new())
    .map(|result| result.map(BytesMut::freeze));

router_sink.send_all(&mut socket_stream).await;

Another way to get a sink and stream is to use an mpsc channel. You get a sink and stream, either with a fixed limit of data it can carry or unbounded that allocates from the heap as needed. MPSC stands for "Multiple-Producer, Single-Consumer", and so one of the coolest properties is that the sink part can be cloned and you can have many parts of your program all feeding data into the same channel. This was a tool I reached for a lot. Maybe too much, but we'll get to that later.

This isn't strictly about sinks and streams, but I want to talk about Bytes for a second, since it's such a simple and cool object. In its most common case, it's a reference-counted, heap-allocated buffer, so it's cheap to copy around. It has other fancy things like avoiding reference counting for static allocations, but in the context of netcrab, a FramedRead with the BytesCodec produces BytesMut instances (which can be converted to a read-only Bytes cheaply), so all of the channels use them to pass data around without incurring buffer copies everywhere.

An aside: I am a big fan of writing technical blogs because the process of writing makes me think about things to change or improvements to make. Expounding about BytesMut above helped me remember that I had several places where I made a temporary buffer, filled it, and then created a Bytes from it, incurring a buffer copy.

I made a change to instead fill a BytesMut directly, then freeze it, to remove that buffer copy. Unfortunately, profiling didn't show any change.

The router is also sinks and streams

I started conceptualizing a "router". At its core it is:

a single Stream of data from various sources (local I/O and multiple sockets)
a piece of code that examines the source of each chunk of data and decides where to forward it
a collection of Sinks so that it could forward data wherever it should go
a way for the rest of the program to tell the router that a new socket has just connected

I used this blog post to finally get around to learning a tiny bit of Mermaid so I could make this chart. It's neat but does not give you enough control to make diagrams look just like you want.

[Diagram: router architecture]

The input type to the router is SourcedBytes. What is that? It's something I added: a Bytes plus a SocketAddr. The reason I need that is the mpsc Sink can be cloned so multiple things can feed into it, but the Stream side of it doesn't indicate which one of the Sink clones inserted each element; I have to bundle that in myself.
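A rough sketch of that bundling, using only std so it runs anywhere: std::sync::mpsc stands in for the async mpsc channel, Vec<u8> stands in for Bytes, and the field names are my guesses rather than netcrab's real definition.

```rust
use std::net::SocketAddr;
use std::sync::mpsc;

/// A chunk of data tagged with the address of the socket that produced it.
/// (`Vec<u8>` stands in for `Bytes`; the field names are assumptions.)
#[derive(Debug, Clone, PartialEq)]
struct SourcedBytes {
    data: Vec<u8>,
    source: SocketAddr,
}

/// Two producers feed one channel. The receiving side can't tell the sender
/// clones apart, so it recovers each chunk's origin from the bundled address.
fn collect_tagged() -> Vec<SourcedBytes> {
    let (tx, rx) = mpsc::channel::<SourcedBytes>();
    let addr_a: SocketAddr = "127.0.0.1:50001".parse().unwrap();
    let addr_b: SocketAddr = "127.0.0.1:50002".parse().unwrap();

    let tx_b = tx.clone(); // A second producer for the same channel.
    tx.send(SourcedBytes { data: b"from a".to_vec(), source: addr_a }).unwrap();
    tx_b.send(SourcedBytes { data: b"from b".to_vec(), source: addr_b }).unwrap();

    // Drop every sender so that iterating the receiver terminates.
    drop(tx);
    drop(tx_b);
    rx.iter().collect()
}
```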

By the way, the diagram says Sender and Receiver for readability, but I actually used UnboundedSender and UnboundedReceiver because I was OK with spending more memory for higher throughput and to avoid handling errors with the fixed-size channels being full.

Just like the router requires the remote address to accompany each data buffer, it also stores each socket sink in a map indexed by the remote address. The router can now implement some simple forwarding logic:

examine the source address of a data buffer
enumerate all the known socket sinks and send to each one that doesn't have the same remote address

That's it. That's "hub" mode.
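The two steps above can be sketched in a few lines. This is an illustration rather than the real router: FakeSink is a made-up stand-in that just records what would have been written, and forward_hub is a hypothetical name.

```rust
use std::collections::HashMap;
use std::net::SocketAddr;

/// Stand-in for a socket's sink: records the chunks that would have been sent.
type FakeSink = Vec<Vec<u8>>;

/// Hub policy: forward `data` to every known sink except the one it came from.
/// Returns how many sinks received a copy.
fn forward_hub(
    sinks: &mut HashMap<SocketAddr, FakeSink>,
    source: SocketAddr,
    data: &[u8],
) -> usize {
    let mut sent = 0;
    for (addr, sink) in sinks.iter_mut() {
        if *addr != source {
            sink.push(data.to_vec());
            sent += 1;
        }
    }
    sent
}
```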

Revisiting windbg remotes with hub mode

Equipped with hub mode, I tested out my idea. I'll show the spew just from a test all on localhost. This is the point of view of the work PC, not the home PC. I'll also add annotations.

// Listen on port 55001. Use forwarding mode "hub". Squash output
>nc -L *:55001 --fm hub -o none

// Successfully listening on that port.
Listening on [::]:55001, protocol TCP, family IPv6
Listening on 0.0.0.0:55001, protocol TCP, family IPv4

// Incoming connection from the "home" machine's netcrab instance, which is also connected to the debugger server and doing hub mode.
Accepted connection from 192.168.1.150:60667, protocol TCP, family IPv6

// Incoming connection from windbg debugging client, running on this same machine.
Accepted connection from 127.0.0.1:60668, protocol TCP, family IPv4

// Wait, what's this? Another?
Accepted connection from 127.0.0.1:60669, protocol TCP, family IPv4

What I discovered is that windbg (and many other programs) make multiple connections, for some application-specific reason. Whatever the reason, hub mode won't work for them, because traffic from one socket is forwarded to all other sockets. You end up with cross-talk that will surely confuse any application.

[Diagram: hub mode forwarding traffic from one socket to all other sockets]

Introducing "channels" mode

What you really want is something more like a tunnel. Two sockets to remote machines are associated with each other as two ends of a tunnel (or "channel", as I called the feature). Traffic is forwarded between these two endpoints without any cross-talk with other sockets.

[Diagram: channels mode pairing sockets as the two ends of a tunnel]

I already had the code to manage multiple sockets and decide which ones should be forwarded data. For hub mode, I had a broadcast-type policy implemented, and now I needed to add the necessary bookkeeping to use a different forwarding policy. A channel at its core is just a grouping of two remote endpoints, so I created this thing called a ChannelMap.

struct ChannelMap {
    channels: HashMap<SocketAddr, SocketAddr>,
}

When a new socket showed up, the router would try to add it to the channel map by passing in the new SocketAddr. The criteria for selecting the socket at the other end of a new channel are:

the other socket must not be part of a channel already, and
the other socket must be from a different IP address

That second criterion is a bit weird, but without it you can create channels contained within this machine, which aren't useful.

And of course, the router had to choose to consult the channel map instead of using the broadcast policy when in channel mode.
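Here's one way the two selection criteria could look in code. This is a sketch under assumptions, not the actual implementation: the method names are invented, the list of known sockets is passed in as a slice, and I'm assuming the map stores both directions so either end can look up its partner.

```rust
use std::collections::HashMap;
use std::net::SocketAddr;

#[derive(Default)]
struct ChannelMap {
    channels: HashMap<SocketAddr, SocketAddr>,
}

impl ChannelMap {
    /// Tries to pair `new_addr` with one of the `known` sockets.
    /// Returns the chosen partner, or None if no eligible socket exists yet.
    fn add_endpoint(&mut self, new_addr: SocketAddr, known: &[SocketAddr]) -> Option<SocketAddr> {
        let partner = known.iter().copied().find(|other| {
            // Criterion 1: not already part of a channel.
            // Criterion 2: from a different IP address than the new socket.
            !self.channels.contains_key(other) && other.ip() != new_addr.ip()
        })?;
        // Assumption: store both directions so lookups work from either end.
        self.channels.insert(new_addr, partner);
        self.channels.insert(partner, new_addr);
        Some(partner)
    }

    /// Returns the other end of the channel that `addr` belongs to, if any.
    fn peer_of(&self, addr: &SocketAddr) -> Option<SocketAddr> {
        self.channels.get(addr).copied()
    }
}
```

With the addresses from the hub-mode spew, the first local connection would pair with the home machine's socket, and the second local connection would stay unpaired until another remote socket shows up.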

To support the debugger client scenario where you need multiple outbound connections, I added a convenience feature to create multiple outbound connections easily. The user can suffix "xNNN" to the port number, like localhost:55000x13 to create 13 outbound connections to the same host.
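Parsing that suffix might look something like the following. The function name parse_target and the exact rules (for example, rejecting a count of zero) are my assumptions for illustration; netcrab's real parser may behave differently.

```rust
/// Splits a target such as `localhost:55000x13` into (`localhost:55000`, 13).
/// A target without the suffix means a single connection.
/// (`parse_target` is a hypothetical helper, not netcrab's parser.)
fn parse_target(target: &str) -> Option<(String, usize)> {
    // Only look after the last ':' so an 'x' inside the hostname is not misread.
    let colon = target.rfind(':')?;
    let port_part = &target[colon + 1..];
    match port_part.find('x') {
        Some(x_pos) => {
            // Validate both the port and the connection count.
            let _port: u16 = port_part[..x_pos].parse().ok()?;
            let count: usize = port_part[x_pos + 1..].parse().ok()?;
            if count == 0 {
                return None; // Assumption: zero connections is treated as an error.
            }
            Some((target[..colon + 1 + x_pos].to_string(), count))
        }
        None => {
            let _port: u16 = port_part.parse().ok()?;
            Some((target.to_string(), 1))
        }
    }
}
```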

Applications that connect to a channel socket are expecting them to be transparent, meaning if they disconnect, it should disconnect all the way to the "server" end of the socket, so a socket in a channel disconnecting needs to "forward" that disconnection onwards to the other end of the channel. To allow connecting to the channel again, I had to add the ability to automatically reconnect closed outbound connections: the new -r argument, which is the analog of -L (listen again after client disconnection).

With these features, the channels scenario worked smoothly with windbg.

Socket address to route address

The ability to create multiple outgoing connections actually threw a wrinkle into the router. Above I pasted code that used a SocketAddr (the remote address) as a shorthand for a socket identifier, a way to figure out which socket produced an incoming piece of data. That doesn't work if you make multiple outgoing connections to the same remote host. See this spew:

>nc localhost:55000x3
Targets:
3x localhost:55000
    [::1]:55000
    127.0.0.1:55000

Connected from [::1]:50844 to [::1]:55000, protocol TCP, family IPv6
Connected from [::1]:50845 to [::1]:55000, protocol TCP, family IPv6
Connected from [::1]:50846 to [::1]:55000, protocol TCP, family IPv6

Here I'm connecting three times to the same remote host. Notice that the remote address is the same for all of them. If all I'm tracking on each socket is the remote address, how do I tell the difference between any packet originating from the remote address [::1]:55000? Right, I can't. I need to store a tuple of the local and remote addresses to uniquely identify a socket.

Not a big deal. I created a new type and used it in any place a SocketAddr was previously used to uniquely identify a socket.

struct RouteAddr {
    local: SocketAddr,
    peer: SocketAddr,
}

It did mean that now every piece of traffic flowing through the router had, umm, SIXTY FOUR BYTES extra attached to it!? Hold on, I'll be right back.

...four nights later...

Whew, I fixed that. I replaced it with a 2-byte route identifier. Though, as much as I was hoping it would show some improvement, especially when handling small packets, I wasn't able to measure any real difference. Either my laptop is too fast or it doesn't end up mattering.

It did add some complexity, since now I have to maintain a mapping between these short route IDs and the real route address, but I like having an identifier that doesn't also double as the socket address, so I'm going to keep it.
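For the curious, here's roughly what that bookkeeping amounts to. The names RouteId and RouteTable are invented for this sketch, and it skips things a real version needs, like reusing IDs after a socket closes.

```rust
use std::collections::HashMap;
use std::mem::size_of;
use std::net::SocketAddr;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct RouteAddr {
    local: SocketAddr,
    peer: SocketAddr,
}

/// Compact identifier attached to each chunk instead of the full RouteAddr.
type RouteId = u16;

/// Maps short route IDs back to the full addresses. (Hypothetical type.)
#[derive(Default)]
struct RouteTable {
    next_id: RouteId,
    by_id: HashMap<RouteId, RouteAddr>,
}

impl RouteTable {
    /// Registers a route and returns its ID. ID reuse and collision handling
    /// after wrap-around are omitted from this sketch.
    fn register(&mut self, addr: RouteAddr) -> RouteId {
        let id = self.next_id;
        self.next_id = self.next_id.wrapping_add(1);
        self.by_id.insert(id, addr);
        id
    }

    /// Looks up the full address for a route ID.
    fn lookup(&self, id: RouteId) -> Option<&RouteAddr> {
        self.by_id.get(&id)
    }
}

/// Bytes saved per chunk by tagging with a RouteId instead of a RouteAddr.
fn bytes_saved_per_chunk() -> usize {
    size_of::<RouteAddr>() - size_of::<RouteId>()
}
```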

Removing mpsc channel per socket

Going back to the diagram of the router from before, you might have noticed an overabundance of mpsc channels. I wasn't kidding when I said I used them a lot. They were a very convenient tool for creating sinks and streams without fighting Rust too much: every socket got one, the router got one, and local input got one.

The router sends into each socket's channel, and the Stream side of it is sent to the socket using send_all. That's a pipe that only exists to make Rust happy. Each socket exposes a sink, called tokio::net::tcp::WriteHalf. It feels like it should be possible just to bypass the middleman and have the router send directly to the WriteHalf.

So I tried that. And promptly got suplexed by the borrow checker and/or lifetime errors (can't remember exactly what error I hit). I was passing the ReadHalf and WriteHalf to two different futures, and both require mutably borrowing the TcpStream. Just like when I tried to store futures in a Vec, it was never gonna work.

This was yet another case of not having the right tool for the job, which turned out to be TcpStream::into_split, which consumes the TcpStream instance and gives you "owned" versions of the read and write half. These can be passed around freely, since the original TcpStream object they came from has been consumed rather than borrowed. With that, I could remove a nice layer of queuing from the architecture.

Removing a queue of course also reduced memory usage in cases where the socket was producing data at a faster rate than the router could consume it. The "unbounded" version of the mpsc channel allocates extra storage in this case, and, true to its name, memory usage sometimes grew to over 1 GB. Big yikes. Anyway, here's the new diagram.

[Diagram: router architecture without the per-socket mpsc channels]

Oh, and making this change increased throughput about 2x.

Removing mpsc channel for local input

While writing this blog, that mpsc channel for local input also kept bugging me. Surely there has to be a way to remove that one too, right? The reason it is a channel is to have a unified model for all input modes. Each input mode is represented by a stream: the "random data" input mode is an iterator that produces random bytes, wrapped in a stream. The "fixed data" input mode is an iterator that produces the same value, wrapped as a stream. Likewise, reading from stdin came from a stream. So the router code would just do a send_all from the local input stream to its main sink and process local input just like any other socket.

In other words, I've been saying "all local input modes are a stream", but really what I've done is construct a model where I have to force all local input through streams. What if I can find a different commonality that fits more naturally without forcing?

What if I said "all local input is a future that sends to the router sink and ends when the local input is done?" Before, I had the type LocalIoStream that was defined like this:

type LocalIoStream<'a> = Pin<Box<dyn futures::Stream<Item = std::io::Result<Bytes>> + 'a>>;

In short, that's a stream that produces Bytes objects. I tried to change it from a stream to a future, like this:

// A future that represents work that drives the local input to completion. It is used with any `InputMode`, regardless
// of how input is obtained (stdin, random generation, etc.).
type LocalInputDriver = Pin<Box<dyn FusedFuture<Output = std::io::Result<()>>>>;

This is a future (task that completes asynchronously) with no result at the end, just notification that it completed.

It's slightly unfortunate that the type of LocalInputDriver doesn't imply anything about its functionality, like the fact that it's supposed to send data to the router sink. But it's all in the name of performance, so I can live with it.

In practice, creating a local input driver is usually the same as creating a local input stream, just with an added router_sink.send_all(&mut stream) call at the end of the future.

I mean, except for reading from actual stdin, which is done on a separate task with individual router_sink.send() calls, and the future ends when stdin hits EOF.

And with that, here's the final diagram of how netcrab is. Not a lot of cruft to cut out anymore.

[Diagram: final netcrab architecture]

What about UDP?

I may not have said it explicitly, but everything I talked about before with the router was actually in the context of TCP sockets. UDP works a little differently, enough so that I annoyingly couldn't reuse the TcpRouter object and instead had to write almost the same router functionality again.

The main difference is that the TcpStream object implicitly embeds the local and remote addresses, whereas a UdpSocket only includes the local address. Well, you could constrain a UDP socket to have a single implicit destination by calling connect, but it prevents you from receiving from other destinations, so that's no good.

So whereas with TCP you have one separated stream for each remote endpoint and each one can end when a disconnect happens, with UDP you have a small set of locally-bound sockets that never end, and an association with a remote peer can be created at any time when the first packet arrives from it. Also, every UDP send needs to include the destination address.
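The difference is easy to see with the blocking std::net::UdpSocket, which has the same shape as the async one for this purpose. This is a standalone sketch, not router code: recv_and_echo is a made-up helper that learns about a peer only when its first datagram arrives and has to name the destination on every send.

```rust
use std::collections::HashSet;
use std::net::{SocketAddr, UdpSocket};

/// Receives one datagram on `socket`, records the sender as a known peer, and
/// echoes the payload back. Returns the peer and whether it was newly seen.
/// (`recv_and_echo` is a hypothetical helper.)
fn recv_and_echo(
    socket: &UdpSocket,
    peers: &mut HashSet<SocketAddr>,
) -> std::io::Result<(SocketAddr, bool)> {
    let mut buf = [0u8; 1500];
    // There is no per-peer stream: recv_from reports who sent each datagram.
    let (len, peer) = socket.recv_from(&mut buf)?;
    // The association with a peer is created when its first packet arrives.
    let is_new = peers.insert(peer);
    // Every send has to include the destination address.
    socket.send_to(&buf[..len], peer)?;
    Ok((peer, is_new))
}
```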

It's just different enough that all the TCP router code can't be reused. I had to write analogous code for the UDP paths, heavily patterned off of the TCP router. Frustrating.

Listening and connecting

When I first started this project, there were four major "modes":

do_tcp_connect
do_udp_connect
do_tcp_listen
do_udp_listen

In other words, I had a clean separation between scenarios with outbound connections and ones with inbound connections. Once I had the router in place and a much better structure to the code, I could re-examine that. In an outbound connection scenario I basically resolve some hostnames, establish connections, and then throw the resultant streams into the router to manage. For inbound, I create some listening sockets and throw any inbound connection into the router.

Why not have one function that does both inbound and outbound, depending on arguments? If the user asks to listen on some port (-L), then start up the listening sockets. If the user passes hostnames to resolve, then do that. Feed everything into the router. With this change, I now have just do_tcp and do_udp.

Here's an example of a place where this proxy-like feature could come in handy. Say you want to force a certain application's connection to go out of a certain network adapter. Let's say that adapter has the local address 192.168.1.150. You can do netcrab --fm channels -s 192.168.1.150 -L *:55000 target-host:55000. That will set up a channel listening on port 55000 and forwarding to target-host over port 55000 using the local adapter with address 192.168.1.150.

Iterators are too fast

An interesting problem I ran into was that my input modes that involved iterators like rand and fixed produced data so quickly that they stalled all other processing. I don't quite know what policy caused this, but I found one workaround is inserting a yield_now call in every iterator step.

How big a BytesMut do I want?

I'll end with one last topic, which is kind of fun. Earlier I talked about using BytesMut to create a buffer, fill it, then turn it immutable and pass it around. One place this is used is when reading from stdin. The program lets the user choose what size of data to accumulate from stdin before sending it to the network: the "chunk size" or "send size".

In theory I could allocate a single BytesMut with enough space for several chunks and freeze only the most recently filled chunk, then start writing into the next chunk. BytesMut::split lets you do that. It costs CPU to allocate and free memory, so this would reduce the number of allocations. Here's basically what the scratch buffer then looks like.

// Allocate 4 chunks of space at a time.
let alloc_size = chunk_size * 4;
let mut read_buf = BytesMut::with_capacity(alloc_size);

// Manually set the length because we know we're going to fill it.
unsafe { read_buf.set_len(chunk_size) };

// ... Later, when sending a chunk:

// split() makes a Bytes with just the valid length but retains the remaining capacity in the BytesMut.
let next_chunk = read_buf.split().freeze();

// If we've used up all the capacity, allocate a new one.
if read_buf.capacity() == 0 {
    read_buf = BytesMut::with_capacity(alloc_size);
}

unsafe { read_buf.set_len(chunk_size) };

I had this working fine. The memory usage looks big, but I was pushing hundreds of MB/sec read from disk, so that was expected.

Then I noticed a function called reserve. It has an interesting note that it will try to reclaim space from the buffer if it can find a region of the buffer that has no further Bytes objects still alive referring to it.

I thought this was pretty cool. Imagine not having to reallocate new chunks of memory, but instead automatically getting to reuse space you had previously allocated. So I swapped the with_capacity calls above for reserve to see if that trick ever kicked in.

Well, uh, the graph ended up looking like this instead.

So clearly my calling pattern with this buffer is such that I always have some overlapping use of it when it comes time to reserve more space, so it just grows and grows. And of course it didn't come with any perf benefit either, so I had to fall back to using with_capacity, which was just fine.

Conclusion

I wrote a lot. It kind of meandered. It was a tour through a bunch of the internals. It was written as much for me as for you (I fixed at least three things due to writing this). Am I a good Rust programmer now? Absolutely not. Did I learn something in the course of making netcrab? Absolutely yes. And I got a useful tool out of it, too.
