ndesmic

Posted on Jan 3, 2023 • Edited on Jan 4, 2023

Playing with Image Convolution Filters

#vanillajs #computergraphics #webgl

Last episode, I explored different ways to change a bump map into a normal map and we touched a bit on convolution. I figured it needed a bit more expansion but it's not specifically related to our 3d engine. This technique is how you can programmatically sharpen and blur images as well as detect edges among other things.

Intro to convolution

To re-iterate from last time, convolution is when for each pixel in an image you use a matrix which provides the weight of the surrounding pixels (the kernel), you multiply the pixel values by their position in the kernel and then add them all up to get the final pixel value.

In this figure the pixel we are calculating is highlighted in red. White pixels are 1.0 and black pixels are 0.0 to avoid confusion. We grab the 8 surrounding pixels as well as the pixel itself. Then by position, we multiply each pixel by the kernel and finally, we just add them all up. Nothing too complicated. In this example the specific kernel is called the laplacian which is a type of edge detection. If we apply this to all the edges the result will look like this:

Edge behavior

One thing you might have noticed is that pixels around the edges can be a bit special. When we're at the edge and the kernel expands outside we have different choices we could make:

Value

The simplest thing we can do is simply assign a value to out-of-bounds indices. A good value to try might be 0 or 1 or maybe even 0.5 depending on what you're going for.

Clamp

Clamping is probably the next easiest. We simply constrain the range to be within the bounds. If it goes out-of-bounds we set it to either the min or max value:



function clamp(value, min = Number.MIN_SAFE_INTEGER, max = Number.MAX_SAFE_INTEGER){
    return Math.max(Math.min(value, max), min);
}

Wrapping

Wrapping is where we pretend the pattern repeats indefinitely at the edges. So if we read max + 3 then we'd get 2 and if we read min - 3 and the max was 100 we'd get 97. You may think this is just the modulus :



const wrappedValue = value % max;

Which will work provided two things:

The min value is 0
The index is not negative

The first case is easy to solve:



const range = max - min;
const wrappedValue = min + value % range;

But this will not work for negative indices.

If you look this problem up you'll probably find something similar to:



const wrappedValue = (value % max + max) % max;

This will work for negative indexes but assumes that the min is zero. This is actually fine for most cases in image manipulation so you may use it. But, you'll probably find it doesn't work right if you have a minimum that's not 0.



function wrap(value, min = Number.MIN_SAFE_INTEGER, max = Number.MAX_SAFE_INTEGER){
    const range = max - min;
    return value < min
        ? max - Math.abs(min - value) % range
        : min + (value + range) % range;
}

This was the simplest version I could come up with that handles both. Note that min values are still odd and change the way value is understood. Is it a distance in the range or an absolute value and we discard a bit to get into range? It really depends and in this case it's the latter.

Mirroring

This is similar to wrapping except when you hit the ends you basically reflect off the end instead of wrapping around. You can also think of it like folding back and forth between two endpoints. It's also a tad more complicated like wrapping so you need to be sure you understand what value is.



function mirror(value, min = Number.MIN_SAFE_INTEGER, max = Number.MAX_SAFE_INTEGER){
    const range = max - min;
    const minDistance = Math.abs(min - value);
    const intervalValue = minDistance % range;
    if (value % (max + max) > max) return max - intervalValue //too high (mirrored)
    if (value >= max) return min + intervalValue; //to high (unmirrored)
    if (value < min && minDistance % (range + range) > range) return max - intervalValue; //too low (mirrored)
    if (value <= min) return min + intervalValue; //to low (mirrored)
    return value;
}

I could only figure it out with 5 cases, high mirrored and unmirrored, low mirrored and unmirrored and in-range. The behaviors you get from these functions may not be obvious:



describe("wrap", () => {
    it("should get in range", () => {
        expect(wrap(70, 0, 100)).toEqual(70);
    });
    it("should wrap high", () => {
        expect(wrap(170, 0, 100)).toEqual(70);
    });
    it("should wrap high with non-zero min", () => {
        expect(wrap(170, 10, 100)).toEqual(90);
    });
    it("should wrap low", () => {
        expect(wrap(-7, 0, 100)).toEqual(93);
    });
    it("should wrap low with non-zero min", () => {
        expect(wrap(3, 10, 100)).toEqual(93);
    });
});
describe("mirrorWrap", () => {
    it("should get in range", () => {
        expect(mirrorWrap(70, 0, 100)).toEqual(70);
    });
    it("should mirror high (mirrored)", () => {
        expect(mirrorWrap(170, 0, 100)).toEqual(30);
    });
    it("should mirror high (unmirrored)", () => {
        expect(mirrorWrap(210, 0, 100)).toEqual(10);
    });
    it("should mirror high with non-zero min (mirrored)", () => {
        expect(mirrorWrap(170, 10, 100)).toEqual(30);
    });
    it("should mirror high with non-zero min (unmirrored)", () => {
        expect(mirrorWrap(220, 10, 100)).toEqual(40);
    });
    it("should mirror low (mirrored)", () => {
        expect(mirrorWrap(-7, 0, 100)).toEqual(7);
    });
    it("should mirror low (unmirrored)", () => {
        expect(mirrorWrap(-107, 0, 100)).toEqual(93);
    });
    it("should mirror low with non-zero min (mirrored)", () => {
        expect(mirrorWrap(3, 10, 100)).toEqual(17);
    });
    it("should mirror low with non-zero min (unmirrored)", () => {
        expect(mirrorWrap(-99, 10, 100)).toEqual(81);
    });
});

Positive numbers lose their distance from 0 before deciding when to wrap, kinda like shooting a laser into a mirrored box. There's some initial distance and then it starts bouncing once it enters the box. Number less than the initial distance are always reflected because shooting the laser the opposite direction of the box doesn't make sense.

In the end there are many choices we could make and many more edge cases. It's not necessary to solve all of them for simple indexing (mirroring doesn't need a min value either in this case, we can always start at 0) but it does create an interesting discussion. For example I have not tested these over negative ranges (max < min) but that may be something you need to support as well.

Truncating the kernel

One final way we could account for these is to simply omit those values of the kernel that fall out of bounds. If we sample column -1, then we simply do not add it to the final result. This is easy to implement but happens on a slightly different level before we even try to sample a pixel.

The code (in JS)

Last time we made a function that can do this:



import { clamp, wrap, mirrorWrap } from "./math-helpers.js";

export function readOob(value, min, max, behavior) {
    switch (behavior) {
        case "clamp": return clamp(value, min, max);
        case "wrap": return wrap(value, min, max);
        case "mirror": return mirrorWrap(value, min, max);
        default: (value < min || value > max) ? behavior : value;
    }
}

export function sample(imageData, row, col, oobBehavior = { x: "clamp", y: "clamp" }) {
    const sampleCol = readOob(col, 0, imageData.width, oobBehavior.x);
    const sampleRow = readOob(row, 0, imageData.height, oobBehavior.y);

    const offset = (sampleRow * imageData.width * 4) + (sampleCol * 4);
    return [
        imageData.data[offset + 0] / 255,
        imageData.data[offset + 1] / 255,
        imageData.data[offset + 2] / 255,
        imageData.data[offset + 3] / 255
    ]
}

export function setPx(imageData, row, col, val) {
    col = clamp(col, 0, imageData.width);
    row = clamp(row, 0, imageData.height);
    const offset = (row * imageData.width * 4) + (col * 4);
    return [
        imageData.data[offset + 0] = val[0] * 255,
        imageData.data[offset + 1] = val[1] * 255,
        imageData.data[offset + 2] = val[2] * 255,
        imageData.data[offset + 3] = val[3] * 255
    ]
}

export function convolute(imageData, kernel, oobBehavior = { x: "clamp", y: "clamp" }) {
    const output = new ImageData(imageData.width, imageData.height);
    const kRowMid = (kernel.length - 1) / 2; //kernels should have odd dimensions
    const kColMid = (kernel[0].length - 1) / 2;

    for (let row = 0; row < imageData.height; row++) {
        for (let col = 0; col < imageData.width; col++) {

            const sum = [0,0,0];
            for (let kRow = 0; kRow < kernel.length; kRow++) {
                for (let kCol = 0; kCol < kernel[kRow].length; kCol++) {
                    const sampleRow = row + (-kRowMid + kRow);
                    const sampleCol = col + (-kColMid + kCol);
                    const color = sample(imageData, sampleRow, sampleCol, oobBehavior);
                    sum[0] += color[0] * kernel[kRow][kCol];
                    sum[1] += color[1] * kernel[kRow][kCol];
                    sum[2] += color[2] * kernel[kRow][kCol];
                }
            }

            setPx(output, row, col, [...sum, 1.0]);
        }
    }

    return output;
}

This works on ImageData but it's trivially converted to work on plain 2d arrays (or even more dimensional tensors) too.

Here's an example that shows different edge behavior. It's mostly noticeable because it's low res but even then the behavior is not that important to the overall filter:

We can also look at some common filters:

Blur and sharpen are probably things you've seen in image editors and this is how they work. The important thing to note here is that if you are making your own filters they should probably add up to 0 to prevent the colors from getting lighter and darker (you can also use the principal to create darken and lighten filters (identity kernel of a value greater than 1 or less than 1).

And we can also see a better representation of what those edge detection filters were doing in the normal-map builder.

I used this image of a bird I found from https://mississippiriverdelta.org/:

And here's what it's like run through our filters:

Sobel:

Sobel-Feldman:

Scharr:

Prewitt:

You can tell that these are finding the edges of an image. They share a lot in common with the Laplacian filter which can do edge detection in a single shot (there's no separate dx and dy version) though it's more sensitive to noise.

Using the GPU

Running convolution for even small images can be slow with the JS version. There are ways we could make it more parallelized like threading however our best bet is to use the GPU. I'm choosing WebGL because WebGPU isn't well supported yet. I don't think it's even worth going into the boilerplate for this, if you've seen my other posts about building 3d engines or the shader canvas, it's the same thing of creating a quad, and passing textures and uniforms to the shader. Edge behavior comes in how you setup the texture sampler (though "omit" behavior is harder and I didn't implement it). For the convolution itself you just apply a pixel shader which naively looks like:



#version 300 es
precision highp float;
in vec2 uv;
in float uSize;
in float vSize;
out vec4 glColor;
uniform sampler2D sampler;
uniform mat3 kernel;

void main(){
    vec3 tl = texture(sampler, uv + vec2(-uSize, -vSize)).rgb * kernel[0][0];
    vec3 tm = texture(sampler, uv + vec2(0.0, -vSize)).rgb * kernel[1][0];
    vec3 tr = texture(sampler, uv + vec2(uSize, -vSize)).rgb * kernel[2][0];
    vec3 ml = texture(sampler, uv + vec2(-uSize, 0.0)).rgb * kernel[0][1];
    vec3 mm = texture(sampler, uv + vec2(0.0, 0.0)).rgb * kernel[1][1];
    vec3 mr = texture(sampler, uv + vec2(uSize, 0.0)).rgb * kernel[2][1];
    vec3 bl = texture(sampler, uv + vec2(-uSize, vSize)).rgb * kernel[0][2];
    vec3 bm = texture(sampler, uv + vec2(0.0, vSize)).rgb * kernel[1][2];
    vec3 br = texture(sampler, uv + vec2(uSize, vSize)).rgb * kernel[2][2];
    vec3 sum =    tl + tm + tr + ml + mm + mr + bl + bm + br;

    glColor = vec4(sum, 1.0);
}

This will get you a 3x3 kernel which covers most of them but what about something bigger like a 5x5? Well in that case we need to use a texture.

This takes a slightly different form that simply loading an image texture but is overall similar:



//must be using webgl2!
function createDataTexture(context, data, textureIndex = 1, width = 32, height = 32){
    context.activeTexture(context.TEXTURE0 + textureIndex);
    const texture = context.createTexture();
    context.bindTexture(context.TEXTURE_2D, texture);

    context.texParameteri(context.TEXTURE_2D, context.TEXTURE_WRAP_S, context.CLAMP_TO_EDGE);
    context.texParameteri(context.TEXTURE_2D, context.TEXTURE_WRAP_T, context.CLAMP_TO_EDGE);
    context.texParameteri(context.TEXTURE_2D, context.TEXTURE_MIN_FILTER, context.NEAREST);
    context.texParameteri(context.TEXTURE_2D, context.TEXTURE_MAG_FILTER, context.NEAREST);

    context.texImage2D(context.TEXTURE_2D, 0, context.R32F, width, height, 0, context.RED, context.FLOAT, data);
}

This function takes in data as a Float32Array and assigns the texture to the texture unit 1 by default (because 0 is used for the image). We need to supply a width and height because otherwise we don't know what the array represented. We're using nearest neighbor filtering so that we only grab discrete values. Clamping shouldn't matter but in case clamp is the best since we never should read outside. Finally the last line makes a texture from a Float32Array. The data, and dimensions make sense but the formats might seem a bit odd. We want a texture that is a series of floating point values, we don't need multiple color channels so we can pick R32F as the "internal format." WebGL2 forces you to match the "format" parameter which is given in this table: https://registry.khronos.org/webgl/specs/latest/2.0/#TEXTURE_TYPES_FORMATS_FROM_DOM_ELEMENTS_TABLE. So we need to use RED (not R32F) for the "format", and finally the type must match so we use FLOAT.

Here's the shader code.



//vertex.glsl
#version 300 es
precision highp float;
in vec3 aPosition;
in vec2 aUv;

out vec2 uv;
flat out vec2 imgPixelSize;
flat out vec2 kernelPixelSize;
flat out ivec2 kernelSize;
uniform sampler2D sampler;
uniform sampler2D kernel;
void main(){            
    ivec2 imgSize = textureSize(sampler, 0);
    imgPixelSize = vec2(1.0, 1.0) / vec2(imgSize);
    kernelSize = textureSize(kernel, 0);
    kernelPixelSize = vec2(1.0, 1.0) / vec2(kernelSize);
    gl_Position = vec4(aPosition, 1.0);
    uv = aUv;
}

Here we're mostly just passing things through. However we can get the texture size and how big a pixel is in terms of UVs here since it's the same for all pixels. In order to do so we need to define these values as flat. This tells WebGL to not interpolate the values, they will be exactly the same for all pixels. If you're wondering what this means since a triangle has 3 vertices, one "provoking vertex" is used by the fragment to determine which values it gets, in our case they are the same for all vertices because the texture doesn't change per vertex so this isn't too important. (I learned the "flat" is because this is how we can do flat-shading in a different way than I previously did). And the fragment shader:



//fragment.glsl
#version 300 es
precision highp float;

in vec2 uv;
flat in vec2 imgPixelSize;
flat in ivec2 kernelSize;
flat in vec2 kernelPixelSize;

uniform sampler2D sampler;
uniform sampler2D kernel;
out vec4 glColor;
void main(){
    vec2 kernelMid = vec2((kernelSize - ivec2(1, 1)) / 2);
    vec3 sum = vec3(0.0, 0.0, 0.0);
    for(int x = 0; x < kernelSize[0]; x++){
        for(int y = 0; y < kernelSize[1]; y++){
            vec2 sampleUV = uv + (-kernelMid + vec2(x, y)) * imgPixelSize;
            float kernelValue = texture(kernel,  (vec2(x, y) + vec2(0.5, 0.5)) * kernelPixelSize).r;
            sum += (texture(sampler, sampleUV).rgb * kernelValue);
        }
    }

    glColor = vec4(sum, 1.0);
}

Here's the meat. It looks a lot like the JS version but vectorized (basically the X and Ys use the same ops so we can pair them together for efficiency). We need our pixel sizes to get the pixel coordinate in terms of UVs. To get the kernel values we sample the kernel texture, but we need to add 0.5 to the result otherwise we sample the very left edge of the texel coordinate which rounds down and gets us the texel before it. 0.5 puts us in the center of the texel. It's just a red value so we take that and multiply it and sum all those together.

Code for this post can be found here: https://github.com/ndesmic/wc-lib/blob/master/shader-canvas/wc-glsl-convolution-canvas.js

and here: https://github.com/ndesmic/wc-lib/blob/master/shader-canvas/wc-js-convolution-canvas.js

DEV Community

Playing with Image Convolution Filters

Intro to convolution

Edge behavior

Value

Clamp

Wrapping

Mirroring

Truncating the kernel

The code (in JS)

Using the GPU

Top comments (0)

Read next

AI System Masters Complex Document Layouts by Reading Like Humans Do

Copy files from another branch with Git

New AI System Makes GPT-4 33% Better at Code Tasks, Study Shows

AI System Makes Expert-Level Quality Judgments on Music, Speech, and Sound