A novel way to approach digital audio

Mike Solomon ・ 9 min read

Recently, I've been enjoying working with Phil Freeman's purescript-behaviors to do audio programming. The idea, which I'm developing in a GitHub project called purescript-audio-behaviors, is that you can work with audio signals using functional reactive programming. A basic example looks like this:

sinOsc 440
  + sinOsc 880
  + (if mouseDown then sinOsc 220 else sinOsc 234)

Here, we sum three sine waves, with the third one changing frequency depending on whether the mouse is down.

Treating audio as a temporal stream that can interact with other functions of time, like if the mouse is down, is my favorite way to think about audio programming because that's how I experience sound in the physical world.

In order to make sound with PureScript, I use the amazingly full-featured and fast WebAudio API. In the browser, you can accomplish digital signal processing (DSP) tasks that used to only be possible in dedicated software like Max and SuperCollider, including synthesizing sounds from FFTs. The framework's ergonomics are clunky, however, and working with anything other than a "hello world" example requires extensive bookkeeping and routing of objects.

The reason that WebAudio uses this pattern is that it makes a lot of sense from a signal-flow perspective. One audio unit ingests an array of floats from another unit, processes it, and passes the result along. An object-oriented approach also makes it easier to split audio into parallelizable chunks, which is a must-have to hit the ~3ms deadline for processing a batch of 128 samples at a sample rate of 44100 samples per second.
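To make that deadline concrete, here is the arithmetic as a quick Python sketch (the 128-sample batch and the 44100 Hz sample rate are the figures quoted above):

```python
# WebAudio renders audio in fixed batches of 128 samples, so at a
# sample rate of 44100 samples/second each batch must be computed
# before the next one is due.
BATCH_SIZE = 128       # samples per render batch
SAMPLE_RATE = 44100    # samples per second

deadline_ms = BATCH_SIZE / SAMPLE_RATE * 1000
print(f"{deadline_ms:.2f} ms per batch")  # ~2.90 ms
```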

This article attempts to unify the two approaches to audio: the functional-reactive-programming (hereafter FRP) approach and the object-oriented approach (hereafter OOP). I will first present the two approaches in detail, showing how FRP is great for expressivity and terrible for computational efficiency whereas the object-oriented approach has the opposite qualities. I will then show how the two concepts can be unified using quadratic optimization. Lastly, I will show how to make this process fast thanks to certain shortcuts we can take.

The result will be the ability to use the FRP style (ie sinOsc 440 + sinOsc 880) in WebAudio without the performance penalty. Don't believe me? Read on!

FRP vs OOP - the good, the bad and the ugly

In audio programming, FRP and OOP are polar opposites. FRP is elegant to read and write but impossible for realtime programming, whereas object-oriented programming is lightning fast but awful to read, write, and maintain.

Expressivity

Let's start with an audio graph that adds together five or seven overtones of the pitch 220 depending on whether the mouse is down. The overtones will diminish in intensity, with the first starting at 0.5, the second at 0.25, and so on. Here is the PureScript version:

speaker $
  add
    (map
      (\i -> let n = toNumber i in gain (0.5 / n) (sinosc (220.0 * n)))
      (range 1 (if mouseDown then 5 else 7)))
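As a sanity check on the description above, the overtone series this snippet generates can be sketched numerically in Python (the `overtones` helper is purely illustrative, not part of any library):

```python
# Partial n of the 220 Hz fundamental has frequency 220 * n and
# gain 0.5 / n, so the intensities diminish as described above.
def overtones(n_partials):
    return [(220 * n, 0.5 / n) for n in range(1, n_partials + 1)]

# Mouse down: five partials; mouse up: seven.
print(overtones(5))  # first two entries: (220, 0.5), (440, 0.25)
```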

Here is around 1/5 of the WebAudio version:

var ctx = new (window.AudioContext || window.webkitAudioContext)()
var audio0 = null
var audio1 = null
var audio2 = null
var audio3 = null
var audio4 = null
var audio5 = null
var audio6 = null
var gain = ctx.createGain()
gain.connect(ctx.destination)
function initializeAudio(n) {
    if (!audio0) {
        audio0 = ctx.createOscillator()
    }
    // ....
}
document.body.onmousedown = function () {
    initializeAudio(5)
    // ... more stuff
}
document.body.onmouseup = function () {
    initializeAudio(7)
    // ... more stuff
}

Functional-reactive-programming in PureScript using purescript-behaviors is the clear winner for expressivity, especially as you start to create more intricate structures. Aside from being shorter and easier to follow, it does not require any explicit allocation, deallocation, or connecting of resources.

Computation

On the other hand, imagine that the PureScript version were actually used for computation. Instead of each of the sine waves running in parallel (as WebAudio does in its implementation), all of the computations would be done sequentially in the same function. Furthermore, instead of each computation happening on a block of memory (ie 128 samples per block), we are processing samples one-by-one, leading to a lot of temporary object creation and garbage collection.
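The contrast between the two evaluation styles can be sketched in Python (illustrative only; the real comparison is per-sample FRP evaluation versus WebAudio's native block processing):

```python
# Per-sample style: one function call and one temporary value for
# every single sample -- what naive FRP evaluation does.
def gain_per_sample(sample, g):
    return sample * g

# Block style: one call per 128-sample block, amortizing call and
# allocation overhead -- what WebAudio does internally.
def gain_per_block(block, g):
    return [s * g for s in block]

block = [0.25] * 128
per_sample = [gain_per_sample(s, 0.5) for s in block]
per_block = gain_per_block(block, 0.5)
assert per_sample == per_block  # same output, 128x fewer calls
```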

These slow-downs make the PureScript version, when writing directly to the speaker, execute 1000 times slower than the JS version. Yes, 1000. I tried it. That means that, for all but the most trivial audio projects, there will be stuttering.

The provisional winner

In the audio world, what we care about is the quality of the result. So, even though the WebAudio version is a pain to write, it's the de facto choice.

But what if...

...we could have both the expressivity of FRP and the speed of WebAudio?

To do so, we would need some translation mechanism from the world of FRP to WebAudio primitives. Let's take a simple example to illustrate the problem: adding two sine waves with variable frequency.

gain 0.5 (sinosc (440 + 30 * sin x) + sinosc (441 + 20 * sin x))

Each of these sine waves will have a warble effect due to the varying frequency (from the sin x), with the first warble being faster than the second. Furthermore, the second is slightly higher in pitch, creating another small warble.
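For reference, the two time-varying frequencies can be written out in Python (the function names here are hypothetical, just to show the ranges involved):

```python
import math

# Frequency of the first oscillator: sweeps between 410 and 470 Hz.
def freq_left(x):
    return 440 + 30 * math.sin(x)

# Frequency of the second oscillator: sweeps between 421 and 461 Hz.
def freq_right(x):
    return 441 + 20 * math.sin(x)

# Both oscillators start at their center frequencies at x = 0.
print(freq_left(0), freq_right(0))
```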

Now, imagine that each sine wave in the PureScript code is linked to an OscillatorNode in WebAudio. How, at any time t, do we know which oscillator to link to a given sinosc? One solution would be to give each audio unit a unique identifier, ie:

gain
  "myGain"
  0.5
  (plus
    "myPlus"
    (sinosc "leftSine" (440 + 30 * sin x))
    (sinosc "rightSine" (441 + 20 * sin x)))

While this does the trick, it has several drawbacks:

  • It requires scrupulous accounting to make sure there is no double-naming.
  • It makes authoring plugins challenging because of the potential for naming conflicts across packages.
  • Infix operators are out. We have to use plus instead of +.
  • It is longer to type, which kills one's creative flow.

What would be ideal is if the underlying library could do this accounting for us. Going back to:

gain 0.5 (sinosc (440 + 30 * sin x) + sinosc (441 + 20 * sin x))

We would expect the library to create two oscillators and "just know" which one is which.

Assignment problem

This is a variation on the classic assignment problem. Audio units at the previous time step have to be assigned to audio units at the current time step. To do this, we need to rank all possible assignments based on a cost function and choose the best one. A good cost function should satisfy two criteria:

  1. Audio units with similar values should map to each other. So, for example, if step 1 has two oscillators with frequencies 440 and 880, and step 2 has two oscillators with frequencies 442 and 884, it would be logical that 440 maps to 442 and 880 to 884.
  2. The audio graph should be rearranged as little as possible. So, for example, if there are two filters in a certain order at time t-1 and two comparable filters appear at time t, the order should be preserved.

Of these two constraints, the second one is more important. Violating the first will cause a small glitch or interruption, whereas violating the second will cause a giant pop. Changing the order of audio units (ie swapping two reverbs) is like rearranging space, and the human ear doesn't take kindly to fast changes in space (that's what we hear, for example, during an explosion).

So what type of optimization problem is this? Binary linear optimization is what is traditionally used for the assignment problem, and it can be quite speedy.

Let's revisit the two criteria above to see how we can express them in terms of binary linear optimization.

  1. Audio units with similar values should map to each other: In our sine wave example above, there are four possible mappings and we can only keep two. Let's call them o440_442, o440_884, o880_442, o880_884. So the function to minimize is 2 * o440_442 + 444 * o440_884 + 438 * o880_442 + 4 * o880_884, subject to the constraints o440_442 + o440_884 == 1, o880_442 + o880_884 == 1, and o440_442, o440_884, o880_442, o880_884 >= 0. A linear solver can do this.
  2. The audio graph should be rearranged as little as possible: Imagine two IIR filters with the exact same impulse response but at different points in the signal chain: filta and filtb. The possible mappings are given by filta_a, filta_b, filtb_a, and filtb_b. The outcome depends on two variables; in other words, it is quadratic. The constraint could be formalized like this: filtb_a * filta_b * 1000.0, meaning "if b is moved before a and a is moved after b, then issue a large penalty." Here, the constraint can actually be simplified because there are only two nodes, so b_a does not encode any information that a_b doesn't. However, for systems with many nodes, the quadratic constraint is necessary. Another way to think of it is that, if constraint #1 represents mapping nodes to nodes, constraint #2 represents mapping edges to edges. Edges represent the interaction of two nodes and are thus quadratic.
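Criterion #1 on its own is the classic assignment problem, and the numbers above can be checked with a brute-force Python sketch (a real solver would use the Hungarian algorithm or an LP solver instead of enumerating permutations):

```python
from itertools import permutations

# cost[i][j] = |frequency of old oscillator i - frequency of new oscillator j|
old_freqs = [440, 880]
new_freqs = [442, 884]
cost = [[abs(a - b) for b in new_freqs] for a in old_freqs]
# cost == [[2, 444], [438, 4]] -- the coefficients from criterion #1

# Pick the permutation (assignment) with the smallest total cost.
best = min(permutations(range(len(new_freqs))),
           key=lambda p: sum(cost[i][p[i]] for i in range(len(p))))
print(best)  # (0, 1): 440 -> 442 and 880 -> 884, total cost 6
```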

Quadratic optimization

Fortunately, quadratic optimization problems are possible to solve. CVXOPT, for example, has moderately fast quadratic solving capabilities, and it would be possible to port these to the browser for someone with the patience and willingness to work with WebAssembly. But, luckily for us, that is not necessary here. In the next section, I will show how this particular quadratic problem is solvable using vanilla linear programming via GLPK.

From quadratic to linear

In order to turn the quadratic problem above into a linear one, we have to do a series of manipulations. But first, to set the stage, let's write a bit more pseudocode. I'm going to write it using the same syntax as PuLP, a popular Python linear-programming library. Note that PuLP won't work here because of the quadratic constraint, but the code is easier to follow than CVXOPT.

from pulp import *

# time t-1
"""
m    --|
o440 --|--filt1.1--filt1.2--speaker
o880 --|
"""

# time t
"""
m    --|
o330 --|--filt1.2--reverb--filt1.1--speaker
o970 --|
"""

m = LpVariable("mic", cat="Binary")
o440_o330 = LpVariable("o440_o330", cat="Binary")
o440_o970 = LpVariable("o440_o970", cat="Binary")
o880_o330 = LpVariable("o880_o330", cat="Binary")
o880_o970 = LpVariable("o880_o970", cat="Binary")
filt_12_11 = LpVariable("filt_12_11", cat="Binary")
filt_11_11 = LpVariable("filt_11_11", cat="Binary")
filt_11_12 = LpVariable("filt_11_12", cat="Binary")
filt_12_12 = LpVariable("filt_12_12", cat="Binary")
reverb = LpVariable("reverb", cat="Binary")
speaker = LpVariable("speaker", cat="Binary")

# We attach a large coefficient to represent
# the edge being torn if the filter is moved
# back in time. As explained above, this is quadratic
# because we are multiplying two unknowns.
LCOEF = 100
linkage_constraints = (
    ((o440_o330 * filt_11_11) * LCOEF)
    + ((o440_o970 * filt_11_11) * LCOEF)
    + ((o880_o330 * filt_11_11) * LCOEF)
    + ((o880_o970 * filt_11_11) * LCOEF)
)

OCOEF = 0.05
FCOEF = 0.1

# The constraints for the values of the oscillator are linear
# as they are "node" constraints
osc_constraints = (
    (o440_o330 * (440 - 330) * OCOEF)
    + (o440_o970 * (970 - 440) * OCOEF)
    + (o880_o330 * (880 - 330) * OCOEF)
    + (o880_o970 * (970 - 880) * OCOEF)
)

# The constraints for the values of the filter are linear
# as they are "node" constraints
filt_constraints = (
    (filt_11_11 * 0 * FCOEF)
    + (filt_11_12 * 0.1 * FCOEF)
    + (filt_12_11 * 0.1 * FCOEF)
    + (filt_12_12 * 0 * FCOEF)
)

prob = LpProblem("myProblem", LpMinimize)
prob += linkage_constraints + osc_constraints + filt_constraints
prob += o440_o330 + o440_o970 == 1
prob += o880_o330 + o880_o970 == 1
prob += filt_11_11 + filt_11_12 == 1
prob += filt_12_11 + filt_12_12 == 1
prob += m == 1
prob += reverb == 1
prob += speaker == 1

status = prob.solve()
print(status)

As mentioned, PuLP will parse the code above but crash when solving because the objective is not linear, due to the pesky multiplications like o440_o330 * filt_11_11. So how can we fix this?

The trick is to note that, because we are using mixed-integer linear programming, we can create the following equivalence between nodes and edges iff the variables are integers:

0 <= edge <= 1
0 <= startNode <= 1
0 <= endNode <= 1
edge <= 0.5 * startNode + 0.5 * endNode

Let's think about what this is saying. If neither startNode nor endNode is present, then edge must be 0. If only one of the two nodes is present, edge must still be 0 because it is an integer between 0 and 1 that is at most 0.5, so 0 by definition. Only if both startNode and endNode are present can it possibly be 1. Then, we add an additional constraint summing all possible edge candidates to 1, which guarantees that if a candidate is chosen, its start and end nodes must be present.
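Because everything is binary, this claim can be verified exhaustively. The Python sketch below checks all eight cases of the inequality with the 0.5 coefficients shown above:

```python
from itertools import product

# edge <= 0.5 * startNode + 0.5 * endNode, all variables in {0, 1}
for start, end, edge in product((0, 1), repeat=3):
    feasible = edge <= 0.5 * start + 0.5 * end
    if start == 1 and end == 1:
        assert feasible        # both endpoints present: edge may be 0 or 1
    elif edge == 1:
        assert not feasible    # a dangling edge is never feasible
print("all 8 binary cases behave as described")
```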

Technically, any coefficient from 0.5 up to (but not including) 1 will do. Below is the output from purescript-audio-behaviors that is fed to GLPK to do this calculation. In the output, I choose a coefficient of 0.75 to avoid rounding error. It is long and boring, but the important bit is that all of those constraints are there and are solved correctly. You can see the 0.75 constraints towards the very end. To see the type of mapping it performs, you can check out the unit tests.

Conclusion

In this article, I have shown how using FRP in audio programming is more expressive and elegant than other methods of working with audio. At the same time, FRP has significant disadvantages when it comes to computational speed.

To remedy this problem, it is possible to reconcile an audio graph at time t-1 with an audio graph at time t by optimizing for the least amount of changes possible to morph one graph into another.

To accomplish this morphing, there are two things that need to be considered: minimizing the change of any given audio unit (ie the change of frequency of a sine wave) and minimizing the change in order of units (ie the order in which filters are chained). Of the two, the more important constraint from an auditory perspective is the order in which units are connected.

The change of a single node can be represented by a linear constraint, whereas the change of a connection between nodes (an edge) must be represented by a quadratic constraint. Fortunately, we can do a few tricks to turn the quadratic problem into one solvable by Lagrangian multipliers and LU decomposition over integers.

If you're interested in following or contributing to purescript-audio-behaviors, please visit the repo!
