
My Dream Hardware #1: Write Buffer

Joey Hernández (joeyhub) ・7 min read

I often find myself thinking about performance and imagining hardware that might help. Often it is hardware that shouldn't be expensive, because it already exists very cheaply inside other products: the same set of ingredients, but the wrong recipe.

A related question is why some things seem more expensive than they need to be, for example RAID controllers and NAS systems.

The reasons may be complicated, but either way these are things I think are possible that seem to be missing in the real world.

What Is This "Write Buffer"?

A write buffer or write cache is typically used to buffer writes to block storage (such as hard drives).

Block devices are often limited in performance, and write performance in particular can be very sensitive to how the writes are made.

Block devices can often only write in large blocks, such as 4KB or 8KB at a time. In some cases it is even more extreme: the unit may be megabytes. Changing a single byte then means writing thousands or millions of bytes, which is an enormous overhead.
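To put numbers on that, here is a small Python sketch of the arithmetic. It assumes the change starts on a block boundary and that the device must rewrite every block the change touches; the 2MB block size is only an illustrative stand-in for the "megabytes" case.

```python
def bytes_written(change_size: int, block_size: int) -> int:
    """Bytes the device must write to persist a change of change_size bytes.

    Assumes the change starts on a block boundary; every block it touches
    has to be rewritten in full.
    """
    blocks = -(-change_size // block_size)  # ceiling division
    return blocks * block_size


def write_amplification(change_size: int, block_size: int) -> float:
    """Bytes physically written per byte the software actually changed."""
    return bytes_written(change_size, block_size) / change_size


for block_size in (4 * 1024, 8 * 1024, 2 * 1024 * 1024):
    amp = write_amplification(1, block_size)
    print(f"{block_size:>9} byte blocks: a 1-byte change costs {amp:,.0f}x")
```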

You might see a block device rated at 500MB/s, compare that to RAM at, for example, 30GB/s, and conclude that the device is 60 times slower. Because of the block overhead, for certain write patterns (such as many tiny writes) it can work out to a hundred thousand times slower or more. Even if the device does reach 500MB/s, which it might not (500MB/s may only be the theoretical maximum), most of that bandwidth may be spent on overhead.
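The same arithmetic as a rough Python sketch, using the 500MB/s and 30GB/s figures above and assuming, as a deliberate simplification, 1-byte writes, 4KB blocks and no coalescing of writes at all:

```python
MB, GB = 10**6, 10**9


def effective_throughput(raw_bytes_per_s: float, write_size: int, block_size: int) -> float:
    """Useful bytes/s when each write of write_size bytes costs a whole block.

    Assumes write_size <= block_size and that nothing coalesces the writes.
    """
    return raw_bytes_per_s * write_size / block_size


ssd_raw = 500 * MB  # rated block device speed from the text
ram = 30 * GB       # example RAM bandwidth from the text

ssd_small = effective_throughput(ssd_raw, write_size=1, block_size=4096)

print(f"raw ratio:       {ram / ssd_raw:.0f}x")        # 60x
print(f"useful rate:     {ssd_small / 1000:.0f} KB/s")  # 122 KB/s
print(f"effective ratio: {ram / ssd_small:,.0f}x")      # 245,760x
```

The 60 times from raw bandwidth multiplied by the 4096 times from block overhead is where a figure in the hundreds of thousands comes from under these assumptions.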

A write buffer is typically used to optimise writes to a device, reducing overhead and making more efficient use of the device.

A write buffer can deduplicate writes. If there are two writes to the same block in short succession, the write buffer can discard the earlier, out-of-date version and write only the latest one.
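A minimal Python sketch of that deduplication, assuming a buffer keyed by block number. The class and method names are made up for illustration, and the "device" is a stand-in that simply records what reaches it.

```python
class CoalescingWriteBuffer:
    """Toy write buffer keyed by block number: a later write to the same
    block replaces the earlier one, so only the latest version is flushed."""

    def __init__(self, device):
        # Assumed: any object with a write_block(block_no, data) method.
        self.device = device
        self.pending = {}  # block number -> latest data

    def write(self, block_no: int, data: bytes) -> None:
        self.pending[block_no] = data  # discards any older pending copy

    def flush(self) -> None:
        for block_no, data in self.pending.items():
            self.device.write_block(block_no, data)
        self.pending.clear()


class RecordingDevice:
    """Stand-in device that just records every block written to it."""

    def __init__(self):
        self.writes = []

    def write_block(self, block_no, data):
        self.writes.append((block_no, data))


dev = RecordingDevice()
buf = CoalescingWriteBuffer(dev)
buf.write(7, b"v1")
buf.write(7, b"v2")  # supersedes v1 before it ever reaches the device
buf.write(3, b"a")
buf.flush()
print(dev.writes)    # [(7, b'v2'), (3, b'a')]
```

Three writes arrive but only two reach the device.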

A write buffer might also reorder writes, because many devices get closer to their theoretical maximum speed when writes are sequential, or at least when blocks that are close together on the device are also written close together in time.
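Reordering can be sketched as sorting the pending blocks by block number before a flush. The seek-counting cost model below is a deliberately crude assumption (every jump to a non-adjacent block counts as one seek), just enough to show why sorted order helps.

```python
def flush_order(pending: dict[int, bytes]) -> list[tuple[int, bytes]]:
    """Pending writes sorted by block number, so the device sees ascending,
    mostly sequential addresses. Assumes the software doesn't care about the
    order of these particular writes."""
    return sorted(pending.items())


def seeks(order: list[int]) -> int:
    """Crude cost model: count every jump to a non-adjacent block."""
    return sum(1 for a, b in zip(order, order[1:]) if b != a + 1)


arrival = [42, 3, 41, 4, 40, 5]            # order the writes came in
pending = {n: b"x" for n in arrival}
sorted_blocks = [n for n, _ in flush_order(pending)]

print(sorted_blocks)                        # [3, 4, 5, 40, 41, 42]
print(seeks(arrival), seeks(sorted_blocks))  # 5 1
```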

A write buffer may also be used to allow software to resume operation rather than waiting for a slow write.

At a minimum, a write buffer only needs a portion of RAM. However, it is very hard to get the most out of a write buffer without sacrificing safety: anything in RAM, including the write buffer, is lost if the machine loses power or otherwise malfunctions.

Typically, when software asks the system to save something and the system reports that it is saved, the software expects the data to be safely on persistent storage, and to still be there after the machine is powered off and on again.

In effect, a write buffer may lie to the software, saying data is written when it is actually only sitting in RAM, queued to be written. If it instead waits until the buffer is flushed before answering, it imposes a latency penalty on the software.
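The difference between answering after the device write (write-through) and answering immediately (write-back) can be sketched in Python like this. The classes are hypothetical toys; `power_loss()` simply discards whatever is still sitting in volatile RAM.

```python
class Device:
    """Stand-in for persistent storage."""

    def __init__(self):
        self.blocks = {}  # what has actually been persisted

    def write_block(self, n, data):
        self.blocks[n] = data  # imagine this taking milliseconds


class WriteThrough:
    def __init__(self, dev):
        self.dev = dev

    def write(self, n, data):
        self.dev.write_block(n, data)  # caller waits for the slow device
        return "acknowledged"


class WriteBack:
    def __init__(self, dev):
        self.dev, self.pending = dev, {}

    def write(self, n, data):
        self.pending[n] = data  # returns at RAM speed...
        return "acknowledged"   # ...but the data is not on the device yet

    def flush(self):
        for n, data in self.pending.items():
            self.dev.write_block(n, data)
        self.pending.clear()

    def power_loss(self):
        self.pending.clear()    # volatile RAM contents are gone


wt_dev, wb_dev = Device(), Device()
WriteThrough(wt_dev).write(1, b"saved")  # returns only after the device has it
wb = WriteBack(wb_dev)
wb.write(1, b"saved")                    # returns immediately
wb.power_loss()                          # before any flush happened
print(wt_dev.blocks, wb_dev.blocks)      # {1: b'saved'} {}
```

Both writers reported success, but only the write-through copy survived.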

It is also hard to fully deduplicate and reorder while staying safe. If you copy a file and the machine loses power halfway through, you would expect it to fail in a reasonable way: for example, the copy might be only half as long as intended. With reordering, the file might also end up corrupt: rather than missing some amount off the end, it might be missing a piece from the middle or contain a portion of random data.

Database software often uses far more complex write patterns than a copy (which only appends) and relies heavily on write order to ensure that, if interrupted, the database file is still left in a usable state rather than becoming corrupt.
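One approach that keeps some of the benefit without breaking that contract is to let software insert ordering points, often called barriers. The buffer may still deduplicate and sort within each group of writes, but never moves a write across a barrier. A rough Python sketch with made-up names that only computes the order a flush would use; a real implementation would also have to wait until each group is durable before starting the next.

```python
class OrderedWriteBuffer:
    """Write buffer that may coalesce and sort writes within an epoch but
    never moves a write across a barrier. A barrier means "everything before
    this must be durable before anything after it"."""

    def __init__(self):
        self.epochs = [{}]  # list of {block_no: data}, oldest first

    def write(self, block_no: int, data: bytes) -> None:
        self.epochs[-1][block_no] = data  # dedupe only inside the current epoch

    def barrier(self) -> None:
        if self.epochs[-1]:
            self.epochs.append({})

    def flush_plan(self) -> list[tuple[int, bytes]]:
        plan = []
        for epoch in self.epochs:
            plan.extend(sorted(epoch.items()))  # reorder freely within an epoch
        return plan


buf = OrderedWriteBuffer()
buf.write(20, b"row data")
buf.write(10, b"index page")
buf.barrier()               # data must land before the commit record
buf.write(1, b"commit")
print([n for n, _ in buf.flush_plan()])  # [10, 20, 1]
```

Blocks 10 and 20 are sorted relative to each other, but block 1 stays last even though it has the lowest number, because it sits behind the barrier.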

While it's possible to have write buffers using normal system RAM, they're far more constrained if they don't have power loss protection.

Power loss protection for write buffers usually means giving them their own RAM, separate from main RAM and backed by batteries or capacitors. Some also have flash so that, in the worst case, they can save their contents to it before the battery runs out and reload them later.

Optimising writes can be unexpectedly difficult. Sometimes you're forced to be inefficient and write more often than you'd like, because safety comes first: you need to be sure the data is actually saved. You might have to save the same thing many times in quick succession, and if you try to optimise away or skip any of those writes, that data is lost on power loss.

What's the problem then?

The problem is that it's a write buffer.

It's a bit like living in a universe where everyone calls scissors "hair cutters". Cutting hair is only one possible use of scissors. You want to cut some paper, so you point at the hair cutters and ask someone to pass them to you. You're told: sorry, those are hair cutters, you can't use them.

What I really want is the thing that makes a power-loss-safe write buffer possible, to do with as I please.

The Murder Weapon

You could argue such hardware does exist, but I find it isn't as cheap, simple, generally available or flexible as it could be.

What we'd want is "non-volatile RAM", which exists in many flavours, but I've not seen it take off in a way that would be generally useful to me.

Implementations are often designed specifically as write buffers for block devices. I believe it would be more useful to provide power-loss-safe RAM to the software and let it use it however it wants.
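As a rough idea of what "let the software use it however it wants" could look like, here is a Python sketch that memory-maps a region and updates an 8-byte counter in place, with no block-sized rewrite. It assumes the power-loss-safe RAM is exposed as something that can be memory-mapped like a file; the example uses an ordinary temporary file as a stand-in, and `mmap.flush()` (which asks the OS to write the mapping back) may not be the right durability mechanism for real non-volatile RAM. `REGION_SIZE` and the helper names are hypothetical.

```python
import mmap
import os
import struct
import tempfile

REGION_SIZE = 4096  # assumed size of the power-loss-safe region


def open_region(path: str) -> mmap.mmap:
    """Map a file standing in for power-loss-safe RAM."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        if os.fstat(fd).st_size < REGION_SIZE:
            os.ftruncate(fd, REGION_SIZE)
        return mmap.mmap(fd, REGION_SIZE)
    finally:
        os.close(fd)  # the mapping stays valid after the descriptor is closed


def persist_counter(region: mmap.mmap, offset: int, value: int) -> None:
    """Store an 8-byte little-endian counter in place."""
    struct.pack_into("<Q", region, offset, value)
    region.flush()  # msync-style write-back; real NVRAM may need something else


def read_counter(region: mmap.mmap, offset: int) -> int:
    return struct.unpack_from("<Q", region, offset)[0]


path = os.path.join(tempfile.mkdtemp(), "nvram.bin")  # stand-in for the device
region = open_region(path)
persist_counter(region, 0, 12345)
region.close()
print(read_counter(open_region(path), 0))  # 12345
```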

As far as I know, there isn't a common standard for generic power-loss-safe memory either: for example, how it would present itself as a hardware device, how it would be selected, how allocations would be tracked, and how failures would be tracked and reported. In short, plug and play.

Solid state drives today often include capacitor-backed write buffers. Though this reduces the benefit of write buffers elsewhere, it does not eliminate it.

In theory it may be possible to modify firmware or write drivers to access this memory, though it may be ill-advised to hijack write buffers that other hardware intends for its own purposes.

The solution...

I think it would be nice for server motherboards to include power-safe RAM almost as standard, usable however the software (or kernel) wants.

Alternatively, it could be an add-on that's generic and as simple to use as any other generic module, without vendor lock-in or overreaching proprietary protocols.

We'd probably use it as a write cache among other things, but it would be unassuming and we could use it as we like.

Thought Experiment

If you want a cheap write buffer, get two cheap Android phones at $50 each, write your own app that provides a write cache over USB (or exploit something pre-existing), and use them as RAID 1 (mirrored) write caches over USB2 that are:

  • $50 each ($100)
  • 1GB RAM
  • Battery Backed
  • Flash Backed
  • 60MB/s USB2

That might be somewhat expensive and, critically, USB2, which is usually the phone's fastest external bus, is too slow.

The phone is a recipe with many unneeded ingredients. We don't need the screen, case, SoC, radio/wireless, camera, external interfaces, etc. It doesn't need all of the flash memory and it doesn't need all of the battery.

If it's no longer an external device, the USB2 issue is easily solved, as internal buses are already faster. Going through the USB2 port on the phone's motherboard is like turning off the motorway onto a country lane.

Our real price for one power loss safe memory module or two in mirrored mode isn't going to be anywhere near $50 and $100 respectively.

Being very generous, $15 for a single piece could be commercially viable. Considering that server boards and workstations cost considerably more than that and are expected to last for many years, it's a very small cost.

Gain?

It might take a while to catch on. It might require sending samples to database developers, but otherwise, if you release open-source drivers for Linux and make the hardware easily available, they will come.

While its main use would likely be optimising writes to block devices, a piece of power-safe RAM has a great deal of flexibility. I wouldn't advise it, but it could also be abused as a small piece of very fast persistent memory.

While it might not be as fast as main RAM, it would still be much faster than typical SSDs, not only because memory bandwidths of several GB/s are common, but also because it doesn't have the overhead of writing blocks of thousands of bytes when you only need to save a few.

Once software can use it directly, it doesn't have to persist the blocks that result from an operation, only the original operation (or data) that produces those subsequent writes. Its speed also makes reading the data back more viable.
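A small Python sketch of that idea, assuming a hypothetical counter workload: instead of rewriting a 4KB block for every increment, the software appends a 12-byte record describing the operation to power-safe memory (modelled here as a plain `bytearray`) and can rebuild the counters after a crash by replaying the log. The block comparison assumes the worst case, where each increment would otherwise be flushed as its own block.

```python
import struct

BLOCK_SIZE = 4096  # assumed device block size


class OperationLog:
    """Append-only log of small operation records kept in power-safe memory.
    Replaying it rebuilds the state; full blocks can go to the SSD later."""

    RECORD = struct.Struct("<IQ")  # (counter id, increment): 12 bytes

    def __init__(self):
        self.buf = bytearray()

    def append(self, counter_id: int, delta: int) -> None:
        self.buf += self.RECORD.pack(counter_id, delta)

    def replay(self) -> dict[int, int]:
        state: dict[int, int] = {}
        for counter_id, delta in self.RECORD.iter_unpack(bytes(self.buf)):
            state[counter_id] = state.get(counter_id, 0) + delta
        return state


log = OperationLog()
for i in range(1000):
    log.append(i % 4, 1)  # 1000 increments spread over 4 counters

print(len(log.buf))       # 12000 bytes written to power-safe memory
print(1000 * BLOCK_SIZE)  # 4096000 bytes if each increment rewrote a block
print(log.replay())       # {0: 250, 1: 250, 2: 250, 3: 250}
```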

It would fill a small niche, but as I see it the application is there. There's only so much that storage write buffers can achieve, as they're not aware of what the software is doing. That it could be done cheaply also lowers the risk.

This is the kind of thing that probably should have happened ten years ago. Even so, although SSDs pose a lot of competition, their worst-case write rates, even for NVMe, still drop to under two hundred MB/s, versus several GB/s in the worst case for a RAM chip.

It's hard to imagine solid state block devices ever keeping up: as block devices, they will always suffer with small writes (such as high-frequency counters) and are unaware of what's happening in the software. The technology would have to improve dramatically to keep up.

I also suspect many servers have spare PCIe lanes which, unless you have GPUs installed (a somewhat niche application that a lot of people don't need), go to waste.

There are many products out there, but I've never seen one that is cheap, cheerful, small, generic, easy to use and generally available.
