Building a Digital Synthesizer Part 3: Envelopes and Debug Visualizations

ndesmic
I like to make fun web things from scratch. Ideally build-less, framework-less, infrastructure-less and free from the annoyances of my day job.

So far we've been able to make some sounds but we want to evolve those into instruments.

Cleanup

I found out that sampleRate is a property on the AudioWorkletGlobalScope, so you can use globalThis.sampleRate from inside the worklet to get it. No need to pass it as a parameter, so we can remove all of those parameters!

Envelopes

One of the properties of instruments is that they aren't just on/off. If you were to blow into a flute, at first the flute would ramp up to its full volume for a little bit. As the pressure inside releases it will stabilize to its normal volume, and as you stop blowing into it the pressure will subside. We want to recreate this for our key presses, and we can play around with it to make some more interesting sounding things.

When we go through these changes in volume (amplitude) it's called "enveloping." There are 4 distinct phases:

  • Attack (volume goes from 0 to max)
  • Decay (volume settles down from max to normal output)
  • Sustain (volume is normal)
  • Release (volume decreases back to 0)

Different instruments can have different times and amplitudes associated with these values so we want to model that. For this I'm creating a function called envelope in the worklet that once passed all the necessary values can get the wave amplitude at a given time.

The envelope function signature

We need a few parameters, and since there's a lot it's easier to make them into a destructured object rather than positional parameters in my opinion:

const envelope = ({ attackMs, attackAmplitude, decayMs, sustainAmplitude, releaseMs, maxMs }) =>
  (downTime, upTime, time) => {  ... }

This signature is a little complex so let's work it out:

  • attackMs (millisecond duration of the attack phase)
  • attackAmplitude (the peak amplitude at the end of the attack phase)
  • decayMs (millisecond duration of the decay phase)
  • sustainAmplitude (the amplitude of the sustain phase)
  • releaseMs (millisecond duration of the release phase)
  • maxMs (optional, can be used for instruments that do not sustain, like a drum)

These parameters then output a new function that takes some more parameters:

  • downTime (keyDown timestamp in milliseconds)
  • upTime (keyUp timestamp in milliseconds, optional as we might still be pressing the key)
  • time (the current time in milliseconds)

I decided to curry the function like this because I think in the future we might carry around the envelope functions as part of a reusable instrument. We'll see how that goes. The output of the function is the amplitude. As a general rule we want to normalize amplitudes between 0 and 1, so your sustain amplitude is likely going to be less than the 1 we currently use. It'll still work if you don't, but keeping everything in that range will make things easier to normalize later.

Implementation of the envelope

const envelope = ({ attackMs, attackAmplitude, decayMs, sustainAmplitude, releaseMs, maxMs }) =>
    (downTime, upTime, time) => {
        let amplitude = 0;
        if (time >= downTime) {
            const envelopeTime = time - downTime;

            if (!upTime) {
                if (envelopeTime <= attackMs) {
                    amplitude = attackAmplitude * (envelopeTime / attackMs);
                } else if (envelopeTime < (decayMs + attackMs)) {
                    amplitude = attackAmplitude + ((sustainAmplitude - attackAmplitude) * ((envelopeTime - attackMs) / decayMs));
                } else {
                    amplitude = sustainAmplitude;
                }
            } else {
                const timeSinceRelease = time - upTime;
                if (timeSinceRelease < releaseMs) {
                    amplitude = sustainAmplitude + ((0 - sustainAmplitude) * (timeSinceRelease / releaseMs));
                } else {
                    amplitude = 0;
                }
            }

            if (maxMs && envelopeTime > maxMs) {
                amplitude = 0;
            }
        }
        if (amplitude < 0.001) {
            amplitude = 0;
        }
        return amplitude;
    };

Ok so there's a lot going on here. First we check whether the current time is greater than or equal to the key down time. This should always be true, so maybe that check could go away. Next we get envelopeTime. This is the time elapsed inside the envelope, and we're going to use the value several times.

Next we have a branch. If the key has not come back up, we're in the first part with the Attack, Decay, or Sustain phases. If we do have a key up time then we're in the Release phase. So for ADS, we see if we're in the attack phase and if we are we linearly interpolate the value between 0 and attackAmplitude based on how far into the phase we are. If not, we check if we're in the decay phase and again linearly interpolate between attackAmplitude and sustainAmplitude. If not, then we're in sustain and we can just use the sustainAmplitude. If we have a key up time, we want to know how long it's been up for. If it's within the release time we again linearly interpolate the value between sustainAmplitude and 0. Finally, if we're past the release phase then the amplitude is 0.

There's a final check to see if we've exceeded the maxMs. If so, we just stop the sound with amplitude 0 (this could be handled more gracefully). And the final bit checks whether the amplitude is very low, at which point we round it down to 0.
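To sanity-check the math, here is the envelope restated compactly so the snippet runs standalone, plus a few spot checks against a hypothetical instrument. The phase values are made up for illustration; the numbers in the comments follow directly from the linear interpolation formulas above.

```javascript
// Restating the envelope from above so this snippet is self-contained.
const envelope = ({ attackMs, attackAmplitude, decayMs, sustainAmplitude, releaseMs, maxMs }) =>
    (downTime, upTime, time) => {
        let amplitude = 0;
        if (time >= downTime) {
            const envelopeTime = time - downTime;
            if (!upTime) {
                if (envelopeTime <= attackMs) {
                    amplitude = attackAmplitude * (envelopeTime / attackMs);
                } else if (envelopeTime < (decayMs + attackMs)) {
                    amplitude = attackAmplitude + ((sustainAmplitude - attackAmplitude) * ((envelopeTime - attackMs) / decayMs));
                } else {
                    amplitude = sustainAmplitude;
                }
            } else {
                const timeSinceRelease = time - upTime;
                amplitude = timeSinceRelease < releaseMs
                    ? sustainAmplitude + ((0 - sustainAmplitude) * (timeSinceRelease / releaseMs))
                    : 0;
            }
            if (maxMs && envelopeTime > maxMs) amplitude = 0;
        }
        return amplitude < 0.001 ? 0 : amplitude;
    };

// Hypothetical instrument: 100ms attack to 1.0, 10ms decay to 0.8 sustain, 100ms release.
const organ = envelope({ attackMs: 100, attackAmplitude: 1, decayMs: 10, sustainAmplitude: 0.8, releaseMs: 100, maxMs: null });

console.log(organ(0, null, 50));   // mid-attack: 1 * (50 / 100) = 0.5
console.log(organ(0, null, 105));  // mid-decay: 1 + (0.8 - 1) * (5 / 10) ≈ 0.9
console.log(organ(0, null, 500));  // sustain: 0.8
console.log(organ(0, 500, 550));   // mid-release: 0.8 + (0 - 0.8) * (50 / 100) = 0.4
console.log(organ(0, 500, 700));   // past release: 0
```

Because the function is curried, `organ` can be created once and reused for every sample of every note played with that instrument.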

Zero amplitude is important because we're going to use it to essentially garbage collect the notes.

Updating the note events

We need to change how we do events. Since the envelope causes the sound to live past the keyup event we can no longer track them in wc-synth, we need to move the currently playing note source of truth to tone-processor. So now we just get the noteDown and noteUp events.

For noteDown, first there's a deduping guard. We don't want to play the same note twice, as that would just make a really loud version of that note if you mashed the key. If it's a new note we add it to the #playingNotes array. However, an interesting thing can happen when we hit the release phase: the note is still playing, but the user could press the key again, and then we want to ramp the amplitude back up. For this we recycle the existing note in the array, update the down time, and remove the up time. Unfortunately, this leaves us with an annoying defect: if you press the key during the release phase you'll often hear a click, which is the amplitude suddenly halting to restart at zero. I don't have a good way around this yet. There's also a new private property #isSilent. This will be used later, but to give a heads up: since wc-synth will no longer know which notes are playing, we'll need a new way to optimize the audio node disconnect during silence.
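The dedupe/recycle logic described above can be sketched as a plain function over a notes array (the function name and the note shape here are illustrative stand-ins for the class's #playingNotes handling, not the exact code):

```javascript
// Sketch of the noteDown dedupe/recycle logic, operating on a plain array.
// Assumes notes look like { note, downTime, upTime }.
function noteDown(playingNotes, note, timeMs) {
    const existing = playingNotes.find(n => n.note === note);
    if (!existing) {
        // New note: start tracking it.
        playingNotes.push({ note, downTime: timeMs, upTime: null });
    } else if (existing.upTime) {
        // Note is in its release phase: recycle it so the envelope restarts.
        existing.downTime = timeMs;
        existing.upTime = null;
    }
    // Otherwise the note is already held; ignore the repeat (dedupe guard).
    return playingNotes;
}

const notes = [];
noteDown(notes, "A4", 0);
noteDown(notes, "A4", 10);      // key mashed while held: ignored
console.log(notes.length);      // 1
notes[0].upTime = 20;           // pretend the key was released
noteDown(notes, "A4", 50);      // pressed again during release: recycled
console.log(notes[0].downTime); // 50
console.log(notes[0].upTime);   // null
```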

noteUp finds the existing note in the array and updates its upTime value. It could be possible in some scenarios to get a keyUp without a keyDown (like changing window contexts) so we should guard against that.
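A minimal sketch of that guard, again using a plain array as a stand-in for #playingNotes:

```javascript
// Sketch of noteUp: record the release time, silently ignoring keyUps for
// notes we never saw go down (e.g. the window lost focus mid-press).
function noteUp(playingNotes, note, timeMs) {
    const existing = playingNotes.find(n => n.note === note);
    if (existing) {
        existing.upTime = timeMs;
    }
    return playingNotes;
}

const held = [{ note: "C4", downTime: 0, upTime: null }];
noteUp(held, "C4", 250);
console.log(held[0].upTime); // 250
noteUp(held, "G4", 300);     // no matching noteDown: safely ignored
console.log(held.length);    // 1
```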

The times come from the current frame index. To actually get the millisecond time we divide by the sampleRate to get the time in seconds, then multiply by 1000 to get milliseconds.
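As a quick sanity check of that conversion (assuming a 48kHz sample rate; the helper name is just for illustration):

```javascript
// frames / (frames per second) = seconds; * 1000 = milliseconds.
const frameToMs = (frameIndex, sampleRate) => (frameIndex / sampleRate) * 1000;

console.log(frameToMs(48000, 48000)); // 1000 (one second of frames at 48kHz)
console.log(frameToMs(24000, 48000)); // 500
```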

The process function

process(inputs, outputs, parameters) {
    if (this.#isSilent) return true;

    // Omitted

    output.forEach(channel => {
        const notesToRemove = [];
        for (let i = 0; i < channel.length; i++) {
            const time = this.#index / globalThis.sampleRate;
            const timeMs = time * 1000;
            let value = 0;
            for (const note of this.#playingNotes) {
                const amplitude = envelope({
                    attackMs: 100,
                    attackAmplitude: 1,
                    decayMs: 10,
                    sustainAmplitude: 0.8,
                    releaseMs: 100,
                    maxMs: null
                })(note.downTime, note.upTime, timeMs);
                const frequency = this.#baseFrequency * frequencyPowerBase ** noteIndex[note.note];
                if (amplitude === 0 && note.upTime) {
                    notesToRemove.push(note.note);
                }
                value += generatorFunction(frequency, time, amplitude);
            }
            channel[i] = value;
            this.#index++;
        }
        this.#playingNotes = this.#playingNotes.filter(n => !notesToRemove.includes(n.note));
        if (this.#playingNotes.length === 0) {
            this.#isSilent = true;
            this.port.postMessage({ type: "silence" });
            console.log("Silent");
        }
    });
    return true;
}

The first thing we do is get the time in milliseconds like we did above. Then we call our new envelope function with some parameters; you can play around with them to hear how they change the sound.

After getting the frequency and amplitude we check whether the amplitude is zero and the key has been released. If so, that note is done and we can remove it from the array. We need to check for the keyUp, though: since all sounds start at amplitude 0, without that check we'd cancel the note before you could hear it.

Finally we pass the amplitude and frequency into the wave generator function we've selected and add up the waves for the different keys being pressed.

We have one last check which is to see if anything is pressed. If not we can save cycles by not running the process loop. In this case we emit a message to the main thread to disconnect the audio context from the output, essentially putting the tone-processor to sleep until the next key is pressed.

Debugging

As I was building the envelope I had some bugs, and it's quite difficult to tell what's going on with just audio. If you mess something up you're likely to get harsh clicks and pops, but where they occur is more difficult to tell.

Perhaps the first instinct is to use a breakpoint, but with 48,000 samples per second you're going to be pressing the resume button a lot. The next thought is to use console.log to observe the amplitude. Again, with so many samples per second, logging slows everything to a crawl; it's simply too much to handle even if you could make sense of all that data.

In order to get a better sense of the wave envelope we probably want something more visual: the waveform. The way I decided to do this was to start recording audio frames when I press a key; a little while after I lift the key it stops recording and exports the data to the main thread for graphing. Note that this whole thing is hacked together and the performance is horrendous, but it at least lets us get a sense of what's going on.

//wc-synth
async play(note) {
    this.toneNode.connect(this.context.destination);
    this.toneNode.port.postMessage({ type: "startDebugCapture" });
    this.toneNode.port.postMessage({ type: "noteDown", note });
}
async stop(note) {
    setTimeout(() => {
        this.toneNode.port.postMessage({ type: "endDebugCapture" });
    }, 500);
    this.toneNode.port.postMessage({ type: "noteUp", note });
}

I'm adding the new messages here. The setTimeout is because we want to capture the release period, which ends sometime after I release the key. Without the main thread knowing how long that is, I'm just taking a long enough sample and can adjust as needed.

//tone-processor.js
onMessage(e) {
    switch (e.data.type) {
        /* ... */
        case "startDebugCapture": {
            this.#debugFrames = [];
            this.#debug = true;
            console.log("Capturing debug data.");
            break;
        }
        case "endDebugCapture": {
            this.#debug = false;
            this.port.postMessage({ type: "debugInfo", data: this.#debugFrames });
            console.log("Ending debug data.");
            break;
        }
        /* ... */
    }
}

We're just listening for those messages. When we get a startDebugCapture we initialize a new array to hold the sample data and set a boolean so we know we're recording. On endDebugCapture we set #debug back to false and post all the data back to the main thread. There are numerous optimizations we could make here, such as making the data a transferable type so we don't need to copy it all. That requires converting it into an ArrayBuffer and normalizing the frames, then converting them back on the other side, which is overkill for a debug feature (interestingly, this process of transferring blocks of raw numbers is roughly how data is pushed to the actual sound card under the hood).
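For reference, one way the transferable version could look is to pack the frames into a Float32Array and hand its underlying buffer to postMessage's transfer list. This is a sketch of that idea, not what the post's code actually does; conveniently, Float32Array already matches the worklet's sample format, so the packing is just a copy:

```javascript
// Sketch: packing debug frames into a transferable buffer.
const debugFrames = [0, 0.25, 0.5, -0.5];
const packed = new Float32Array(debugFrames);

// Inside the worklet you would then transfer the buffer instead of copying:
// this.port.postMessage({ type: "debugInfo", data: packed.buffer }, [packed.buffer]);
// ...and on the main thread rebuild it with new Float32Array(e.data.data).

// Round-trip to show no data is lost for float32-representable values.
const unpacked = Array.from(new Float32Array(packed.buffer));
console.log(unpacked); // [0, 0.25, 0.5, -0.5]
```

After the transfer, `packed.buffer` becomes unusable on the sending side, which is exactly why it's cheap: ownership moves instead of the data being copied.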

//tone-processor.js

process(inputs, outputs, parameters) {
  /* ... */
  channel[i] = value;
  if (this.#debug) { // <-- new
    this.#debugFrames.push(value);
  }
  this.#index++;
  /* ... */
}

In the process method, right after we've assigned a frame value we also write that value to the #debugFrames array if we're in debug mode. Now we can just listen for the debugInfo event back in the main thread:

//wc-synth.js
async onAudioMessage(e){
    switch(e.data.type){
        case "silence": {
            this.toneNode.disconnect(this.context.destination);
            break;
        }
        case "debugInfo": {
            const svgGraph = document.createElement("wc-svg-line-graph");
            svgGraph.width = 1280;
            svgGraph.height = 720;
            svgGraph.ymax = 1;
            svgGraph.ymin = -1;
            svgGraph.xmin = 0;
            const transformedData = e.data.data.filter((x, i) => (i % 50) === 0).map((x, i) => [i, x]);
            svgGraph.xmax = transformedData.length;
            svgGraph.points = transformedData;
            this.dom.debug.innerHTML = "";
            this.dom.debug.appendChild(svgGraph);
        }
    }
}

wc-svg-line-graph is a custom element I whipped up that can draw simple SVG graphs. Its implementation is beyond the scope of this post, but the attributes are straightforward: it has a width and height as well as min and max values for x and y. The data point format needs to be an array of arrays where each inner array has vertex info (x, y, color, size, shape). Default values are given for things not supplied, but points do need to be at least tuples of x,y coordinates, which is why they are mapped. In this case x is just the frame index, as we care less about the time and more about the shape. The other thing I do is mod the index and only keep every 50th frame. With so many frames, graphing them all is slow and does not help our understanding, so I throw a bunch out (this could also be done at the capture level for better performance, but it's just easier here). This may have issues if the waveform is really dense or the sample is long. If you want roughly one sample per pixel you can mod by the following value: const frameLimit = Math.floor(e.data.data.length / 720); where 720 is the target number of plotted points (it should match the chart's pixel width). You can also change that value to fit more data.
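The keep-every-nth-frame idea generalizes to a small helper (downsample is a hypothetical name; the post does this inline with filter):

```javascript
// Keep every `step`th sample so the graph stays a manageable size.
// This is naive decimation: fine for eyeballing an envelope, but it can
// alias if the waveform is dense relative to the step.
function downsample(frames, targetPoints) {
    const step = Math.max(1, Math.floor(frames.length / targetPoints));
    return frames.filter((_, i) => i % step === 0);
}

const frames = Array.from({ length: 1000 }, (_, i) => i);
const points = downsample(frames, 100);
console.log(points.length); // 100 (step of 10 over 1000 frames)
console.log(points[1]);     // 10 (every 10th frame kept)
```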

Of course this is literally a waveform visualizer so there are more purposes beyond debugging.

I used this to debug some pops I was getting with the audio:

[Image: graph of the captured waveform showing the buggy envelope]

Here we can see that we have an envelope, but the decay part has amplitude 0, a bug of some sort. This sudden period of no sound will cause pops, but it would be very hard to tell that from audio and console alone, especially since decay is a short period. I can also see that even though we captured 500ms of extra audio on the tail, the release phase doesn't seem to appear.

These things weren't too tough to fix once I understood what the problem was. It was just fixing some of the conditions for those phases.

With all that we have properly working enveloping.

You can find the code here: https://github.com/ndesmic/web-synth/tree/v0.3

And the up-to-date demo here: https://gh.ndesmic.com/web-synth/
