Dmitry Dvoinikov

Posted on Jul 27, 2020 • Edited on Aug 3, 2020

Learning FPGA programming, key points for a software developer (part 2, registered logic)

#fpga #vhdl #hardware #programming

The first part of this article discussed the importance of time in an electrical circuit, of which a VHDL program is a schematic. You may want to read it before proceeding.

In summary, an electrical circuit depends on the coordinated delivery of its input signals before it can produce a valid output.

On the face of it, we could wire any number of circuits together, activate the top-level inputs, and then just wait for the entire maze to flush through and yield stable output after some time.

It could work for some circuits, and when it does work, this approach is the fastest, I guess, because it's basically limited only by the speed of light. But it doesn't work in all cases.

One reason why it doesn't is that circuits can be so twisted that they simply won't stabilize over time, and never provide a valid output.

The simplest such case is a feedback loop, when a circuit's output is directly or indirectly plugged into its input. Under such conditions, the circuit could go into an indeterminate state forever.
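To make the feedback problem concrete, here is a minimal Python sketch. It assumes an idealized inverter with a fixed unit delay whose output is wired straight back to its input; the function name and the delay model are my own illustration, and a real circuit may behave less cleanly than this. In this simplified model the failure to stabilize shows up as an endless oscillation:

```python
def simulate_inverter_loop(initial, steps):
    """Feed an inverter's output back into its own input, one gate delay per step."""
    value = initial
    trace = [value]
    for _ in range(steps):
        value = not value  # after one gate delay the output becomes the new input
        trace.append(value)
    return trace

trace = simulate_inverter_loop(False, 6)
# True when every sample differs from the previous one, i.e. the value never settles
never_settles = all(a != b for a, b in zip(trace, trace[1:]))
print(trace, never_settles)
```

No matter how long you wait, there is no moment at which the output can be called final.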

A combinatorial approach whereby you just wire blocks together thus has its limits. And the problem of stable inputs arriving at a circuit at the same time is still there. How is it solved?

To ensure that the signals that cross boundaries between combinatorial blocks are stable and synchronous, they are held in a buffer between the blocks until all of them arrive and stabilize. This buffer is called a register.

This looks good, but now we have a different problem. How does the register know for how long to hold its input and when to pass it through? It is nothing but a wired circuit itself, after all.

To solve it, a clock is introduced.

A clock is an external source of a periodic pulse, connected to every register in the system with a dedicated line.

All the registers in the system pass their input to their output at the clock tick, thus conceptually at the same time.
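A rough Python model of that behavior, assuming a register is nothing more than an input `d` and an output `q` (the class and the `tick_all` helper are hypothetical names, not anything from a real library). The input may wiggle freely between ticks, but the output only changes when the tick arrives, and all registers change together:

```python
class Register:
    """Holds a value; the output q changes only when the clock ticks."""
    def __init__(self, initial=0):
        self.d = initial  # input, may change at any time between ticks
        self.q = initial  # output, stable between ticks

def tick_all(registers):
    # Sample every input first, then update every output:
    # conceptually all registers switch at the same instant.
    sampled = [r.d for r in registers]
    for r, value in zip(registers, sampled):
        r.q = value

a, b = Register(), Register()
a.d = 1
b.d = 2
a.d = 7                  # the input is still changing before the tick
before = (a.q, b.q)      # outputs have not moved yet
tick_all([a, b])
after = (a.q, b.q)       # only the last value present at the tick gets through
print(before, after)
```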

When we decouple multiple combinatorial circuits with registers, we get a bigger registered or synchronous circuit. There are two important things to observe about it.

First, there is still no guarantee that the clock tick arrives at a moment when the output is stable, because combinatorial circuits generate it asynchronously, with no respect to any clock. A tick that arrives while the output is still settling will still fail to pass the correct value.

To make sure it does not happen, the combinatorial components must play by the clock rules and have their outputs stabilized after the last tick, but before the next. This effectively means two things:

They should only accept registered signals for their inputs. Any input signal not synchronized with the same clock is a risk to stability.
They must be simple enough to propagate the signal in the limited time they have between the clock ticks.
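The second rule boils down to simple arithmetic, sketched below in Python. The gate delays and clock periods are made-up numbers, and the model deliberately ignores register setup and hold times and clock skew, which a real timing analysis would include:

```python
def timing_slack_ps(gate_delays_ps, clock_period_ps):
    """Time left in the clock period after the slowest path settles.

    A negative result means the logic is too slow for this clock.
    """
    return clock_period_ps - sum(gate_delays_ps)

# Hypothetical delays of the gates along the longest path, in picoseconds
path_delays_ps = [1200, 800, 1500]

slack_slow_clock = timing_slack_ps(path_delays_ps, 10_000)  # 10 ns period, 100 MHz
slack_fast_clock = timing_slack_ps(path_delays_ps, 3_000)   # 3 ns period
print(slack_slow_clock, slack_fast_clock)
```

With the slower clock the output has plenty of time to settle; with the faster one the tick arrives before the signal has made it through.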

Second, the earliest moment when a value will appear on the other side of the register is at the start of the next tick, even though it became available long before that.

Therefore a registered circuit makes a pipeline, in which signal values propagate one leg at a time, and not immediately from start to finish as before. And the earliest moment when a sequence of N registered components will produce the final output is N times the tick duration.
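Here is a small Python sketch of that latency, under the assumption that each stage is a pure function followed by a register, with `None` standing for "no valid value yet". The function name and the three example stages are hypothetical:

```python
def run_pipeline(stages, value):
    """Push one value through registered stages; return (result, ticks taken)."""
    regs = [None] * len(stages)  # the register at the output of each stage
    source = value
    ticks = 0
    while regs[-1] is None:
        # Each stage sees what the previous register held BEFORE this tick.
        inputs = [source] + regs[:-1]
        regs = [f(x) if x is not None else None for f, x in zip(stages, inputs)]
        source = None            # the input is presented for one tick only
        ticks += 1
    return regs[-1], ticks

# Three hypothetical combinatorial stages
stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
result, ticks = run_pipeline(stages, 5)
print(result, ticks)
```

The arithmetic itself is trivial, yet the answer only comes out after as many ticks as there are stages.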

From the above pipeline you can see that we get in 300 nanoseconds what we could otherwise get in 3, and if we wait for the output, then at any time 3/4 of the circuit is not doing any productive work. That's a lot of waste, but it is the price to pay for stability.

Now, this was all about hardware.

From the software developer's perspective, there are a few things to notice.

The most important thing is that when you are working with registered signals, you are, figuratively speaking, turning yesterday's values into tomorrow's values.

Any registered signal assignment takes effect only at the next tick, even when observed by the same component.
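A minimal Python model of this rule, loosely mirroring what a signal assignment such as `count <= count + 1` does in a clocked VHDL process. The `Signal` class with its `assign` and `commit` methods is my own illustration, not a real API:

```python
class Signal:
    """A registered signal: assignments become visible only after the tick."""
    def __init__(self, value):
        self.value = value   # what every reader sees during the current tick
        self._next = value   # what has been assigned, visible after the tick

    def assign(self, value):
        self._next = value

    def commit(self):
        # The clock tick: the assigned value becomes the visible one.
        self.value = self._next

count = Signal(0)
observed = []
for _ in range(3):
    count.assign(count.value + 1)  # schedule the new value
    observed.append(count.value)   # the same component still reads the old one
    count.commit()                 # clock tick

print(observed, count.value)
```

Reading the signal right after assigning it returns yesterday's value; the new one is there only after the tick.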

Another important thing is that registered signal exchange between components takes a very significant amount of time.

In software, we assume that calls take no time compared to the processing. We factor our programs freely into processing modules of all kinds, components, classes, functions, along the domain objects, and call one from another without a second thought. But in hardware, it's the opposite: processing happens at the speed of light, but every registered signal exchange adds an extra clock cycle, thus decimating the performance.

Consider the following simplest case of client-server exchange. The client waits for a ready signal from the server, then sends a request and waits for a response.

[Code sample and timing diagram omitted.]

The legend for the timing diagram:

Both the client and the server come out of reset and the server immediately asserts ready.
The client sees that the server is ready and immediately provides the input.
The server sees the input and immediately responds.
The client sees the response and immediately passes it on.

As you can see, despite all the reactions being immediate, the registers only pass the updated signal value at the next tick. Therefore, had you inlined the calculation in the client, it would have been immediate, but when you delegate it to a different registered component, it takes 4 clock cycles.
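The four steps of the legend can be replayed with a rough Python model. This is not the article's VHDL, only a sketch that assumes four registered signals (`ready`, `req`, `resp`, `out`, all names of my choosing) and a simplified protocol in which the request is never withdrawn. Both components read only the current values and write only the next ones:

```python
def simulate_handshake(x, compute, max_ticks=10):
    """Return (result, ticks) for one request over registered signals."""
    cur = {"ready": False, "req": None, "resp": None, "out": None}
    for tick in range(1, max_ticks + 1):
        nxt = dict(cur)
        # Server: asserts ready out of reset, answers any request it sees.
        nxt["ready"] = True
        if cur["req"] is not None:
            nxt["resp"] = compute(cur["req"])
        # Client: sends the request once ready is seen, then passes on the response.
        if cur["ready"] and cur["req"] is None:
            nxt["req"] = x
        if cur["resp"] is not None:
            nxt["out"] = cur["resp"]
        cur = nxt  # clock tick: all registers update together
        if cur["out"] is not None:
            return cur["out"], tick
    return None, max_ticks

result, ticks = simulate_handshake(20, lambda v: v + 1)
print(result, ticks)
```

Calling `compute` directly would give the same result with no ticks at all; the extra cycles come purely from crossing registers.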

Factoring a VHDL program into components should be based on physical composition, not on logical delegation.

Another thing: in software, the basic concept is that code is executed by a thread, a logical entity that follows the code like a vinyl player needle. And then we can start any number of threads playing the same record independently.

In hardware it is once again the opposite. Now it is the values that flow through the program, and it is the front of that wave that is its current state. You can't just run a second wave through the same wires, because the two will interfere with each other. To increase parallelism in a registered circuit, we notice that every register holds the flow like a floodgate. Then it is possible to have multiple waves coming one after another, separated by as little as a single tick, and they will not conflict.
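To illustrate the waves, here is a Python sketch that feeds a new value into a registered pipeline on every tick. As before, each stage is assumed to be a pure function followed by a register, `None` means "nothing valid here", and all names are hypothetical:

```python
def stream_through_pipeline(stages, values):
    """Feed one value per tick; return (outputs, total ticks)."""
    regs = [None] * len(stages)  # the register at the output of each stage
    pending = list(values)
    outputs = []
    ticks = 0
    while len(outputs) < len(values):
        source = pending.pop(0) if pending else None
        # Every stage works on a different wave during the same tick.
        inputs = [source] + regs[:-1]
        regs = [f(x) if x is not None else None for f, x in zip(stages, inputs)]
        ticks += 1
        if regs[-1] is not None:
            outputs.append(regs[-1])
    return outputs, ticks

stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
outputs, ticks = stream_through_pipeline(stages, [1, 2, 3, 4])
print(outputs, ticks)
```

Four values pass through three stages in six ticks rather than twelve: after the first result appears, another one follows on every tick, and all the stages are kept busy.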

Finally, from the most basic coding standpoint, registers are invisible. Just like the flow of time from the first part of this article, they do not appear in the code. And you just have to see them where they are supposed to be, to understand how the circuit behaves. I will address this in more detail in the next part of this article:

Part 3: Code patterns and inferred behavior

Thank you for reading!
