ndesmic

Posted on Jun 22, 2021 • Edited on Jul 4, 2021

WebGL 3D Engine From Scratch Part 5: Cameras

#vanillajs #webgl #cameras #webcomponents

One of the things that's been really annoying thus far is that we keep needing to push the geometry around using transforms. What we'd really like is to represent meshes at their absolute positions and then use a camera to look around the scene.

Refactor

Before we start, a little cleanup. I've moved some functions floating around in the geo-gl component into a gl-helpers.js file.

Transforms Revisited (again)

The first place I'd like to start is to clean up the transform code, again. After pondering and looking at some examples, I finally settled on something that I think makes a bit more sense and keeps the flexibility without the rigid, fixed-order process we had. We'll just keep a running modelMatrix that we export at the end, so we can perform arbitrary transforms as many times as necessary. We can make the API chainable for convenience:



import { multiplyMatrix, getIdentityMatrix, getTranslationMatrix, getScaleMatrix, getRotationXMatrix, getRotationYMatrix, getRotationZMatrix, transpose } from "./vector.js";

export class Mesh {
    #positions;
    #colors;
    #normals;
    #uvs;
    #triangles;

    #textureName;

    #modelMatrix = getIdentityMatrix();

    constructor(mesh) {
        this.positions = mesh.positions;
        this.colors = mesh.colors;
        this.normals = mesh.normals;
        this.uvs = mesh.uvs;
        this.triangles = mesh.triangles;
        this.textureName = mesh.textureName;
    }

    set positions(val) {
        this.#positions = new Float32Array(val);
    }
    get positions() {
        return this.#positions;
    }
    set colors(val) {
        this.#colors = new Float32Array(val);
    }
    get colors() {
        return this.#colors;
    }
    set normals(val) {
        this.#normals = new Float32Array(val);
    }
    get normals() {
        return this.#normals;
    }
    set uvs(val) {
        this.#uvs = new Float32Array(val);
    }
    get uvs() {
        return this.#uvs;
    }
    get textureName() {
        return this.#textureName;
    }
    set textureName(val) {
        this.#textureName = val;
    }
    set triangles(val) {
        this.#triangles = new Uint16Array(val);
    }
    get triangles() {
        return this.#triangles;
    }
    setTranslation({ x = 0, y = 0, z = 0 }) {
        this.#modelMatrix = multiplyMatrix(getTranslationMatrix(x, y, z), this.#modelMatrix);
        return this;
    }
    setScale({ x = 1, y = 1, z = 1 }) {
        this.#modelMatrix = multiplyMatrix(getScaleMatrix(x, y, z), this.#modelMatrix);
        return this;
    }
    setRotation({ x, y, z }) {
        if (x) {
            this.#modelMatrix = multiplyMatrix(getRotationXMatrix(x), this.#modelMatrix);
        }
        if (y) {
            this.#modelMatrix = multiplyMatrix(getRotationYMatrix(y), this.#modelMatrix);
        }
        if (z) {
            this.#modelMatrix = multiplyMatrix(getRotationZMatrix(z), this.#modelMatrix);
        }
        return this;
    }
    resetTransforms() {
        this.#modelMatrix = getIdentityMatrix();
        return this; // keep the API chainable like the other transform methods
    }
    getModelMatrix() {
        return transpose(this.#modelMatrix).flat();
    }
}

And we can use it like this:



tcube
    .setRotation({ x: Math.PI / 4,  y: Math.PI / 4 })
    .setTranslation({ z: 2 });

Smaller and more powerful, just what I like. There are some more optimizations in my head around updating the vector math to work on linear arrays and figuring out quaternions but this is fine for now.
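Because each call multiplies the new matrix on the left of the running modelMatrix, transforms take effect in the order they are called. Here's a small sketch that checks this. multiplyMatrix, translation, and rotationY below are hypothetical stand-ins (standard right-hand-rule rotation about Y), not the actual vector.js code, whose sign conventions may differ.

```javascript
// Hypothetical stand-ins for the vector.js helpers: row-major 4x4 matrices
// that transform column vectors (the math layout, before transpose()).
function multiplyMatrix(a, b) {
  const out = [];
  for (let i = 0; i < 4; i++) {
    out.push([]);
    for (let j = 0; j < 4; j++) {
      let sum = 0;
      for (let k = 0; k < 4; k++) sum += a[i][k] * b[k][j];
      out[i].push(sum);
    }
  }
  return out;
}
const identity = () => [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]];
const translation = (x, y, z) => [[1, 0, 0, x], [0, 1, 0, y], [0, 0, 1, z], [0, 0, 0, 1]];
const rotationY = (t) => [
  [Math.cos(t), 0, Math.sin(t), 0],
  [0, 1, 0, 0],
  [-Math.sin(t), 0, Math.cos(t), 0],
  [0, 0, 0, 1],
];

// Multiply a point (w = 1) by a row-major matrix and drop w.
function transformPoint(m, [x, y, z]) {
  return [0, 1, 2].map((i) => m[i][0] * x + m[i][1] * y + m[i][2] * z + m[i][3]);
}

// Like Mesh, each new transform is multiplied on the LEFT of the running
// matrix, so it is applied after everything accumulated so far.
class TransformChain {
  #matrix = identity();
  compose(m) {
    this.#matrix = multiplyMatrix(m, this.#matrix);
    return this;
  }
  apply(point) {
    return transformPoint(this.#matrix, point);
  }
}

const rotateThenMove = new TransformChain().compose(rotationY(Math.PI / 2)).compose(translation(0, 0, 2));
const moveThenRotate = new TransformChain().compose(translation(0, 0, 2)).compose(rotationY(Math.PI / 2));
console.log(rotateThenMove.apply([1, 0, 0])); // approximately [0, 0, 1]
console.log(moveThenRotate.apply([1, 0, 0])); // approximately [2, 0, -1]
```

So in the tcube example, the rotation happens around the cube's own center and only then is the cube pushed out along Z.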

The Camera

We're going to create a container for the camera info. This will contain things like the position, which direction it's looking, and also all the field of view and perspective info.



import { getProjectionMatrix } from "./vector.js";

export class Camera {
    #position;
    #rotation;
    #screenWidth;
    #screenHeight;
    #near;
    #far;
    #fieldOfView;
    #isPerspective;

    constructor(camera){
        this.#position = camera.position;
        this.#rotation = camera.rotation;
        this.#screenWidth = camera.screenWidth;
        this.#screenHeight = camera.screenHeight;
        this.#near = camera.near;
        this.#far = camera.far;
        this.#fieldOfView = camera.fieldOfView;
        this.#isPerspective = camera.isPerspective;
    }

    getProjectionMatrix(){
        return new Float32Array(getProjectionMatrix(this.#screenHeight, this.#screenWidth, this.#fieldOfView, this.#near, this.#far).flat());
    }
}

We can set it up at the beginning, like meshes and textures:



createCameras(){
    this.cameras = {
        default: new Camera({
            screenHeight: this.#height,
            screenWidth: this.#width,
            fieldOfView: 90,
            near: 0.01,
            far: 100,
            isPerspective: true
        })
    }
}

I call it right after bootGpu. Finally, in setupGlobalUniforms we get the projection matrix from the camera instead: const projectionMatrix = this.cameras.default.getProjectionMatrix();

Nothing visible has changed, but we've created a camera abstraction, so we could have different sorts of cameras in the same scene.

Orthographic projection

Maybe we want orthographic projection as well; why not? To do that we use the following matrix:



export function getOrthoMatrix(left, right, bottom, top, near, far) {
    return [
        [2 / (right - left), 0, 0, -(right + left) / (right - left)],
        [0, 2 / (top - bottom), 0, -(top + bottom) / (top - bottom)],
        [0, 0, 2 / (far - near), -(far + near) / (far - near)],
        [0, 0, 0, 1]
    ];
}

Here left, right, etc. are the furthest positions that will be shown in each direction in camera space; the matrix maps that box onto the -1 to 1 cube that WebGL draws. Note that the Z row depends on handedness. We're using a left-handed system with Z going into the screen, so near maps to -1 and far maps to 1. If your Z comes out of the screen (right-handed), negate the Z scale term to get -2 / (far - near). Also, like the model matrix, this is written with the translation in the last column, so it needs to be transposed before it's flattened and sent to WebGL.
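As a sanity check on the Z row, here's a sketch of the left-handed version (near to -1, far to +1) applied to the corners and center of the view box. The 800x600 canvas size is hypothetical, and the box uses the aspect ratio so the result isn't stretched.

```javascript
// Sketch of a left-handed orthographic matrix (row-major, column vectors):
// maps [left, right] x [bottom, top] x [near, far] onto the [-1, 1] cube,
// with near -> -1 and far -> +1 on Z.
function orthoLH(left, right, bottom, top, near, far) {
  return [
    [2 / (right - left), 0, 0, -(right + left) / (right - left)],
    [0, 2 / (top - bottom), 0, -(top + bottom) / (top - bottom)],
    [0, 0, 2 / (far - near), -(far + near) / (far - near)],
    [0, 0, 0, 1],
  ];
}

// Multiply a point (w = 1) by the matrix; w stays 1 for an ortho matrix.
function applyToPoint(m, [x, y, z]) {
  return [0, 1, 2].map((i) => m[i][0] * x + m[i][1] * y + m[i][2] * z + m[i][3]);
}

const aspect = 800 / 600; // hypothetical canvas size
const ortho = orthoLH(-aspect, aspect, -1, 1, 0, 5);
console.log(applyToPoint(ortho, [-aspect, -1, 0])); // near bottom-left corner, about [-1, -1, -1]
console.log(applyToPoint(ortho, [aspect, 1, 5]));   // far top-right corner, about [1, 1, 1]
console.log(applyToPoint(ortho, [0, 0, 2.5]));      // center of the box, about [0, 0, 0]
```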

We also need to add some new attributes to the camera:



import { getOrthoMatrix, getProjectionMatrix, transpose } from "./vector.js";

export class Camera {
    #position;
    #target;
    #screenWidth;
    #screenHeight;
    #near;
    #far;
    #left;
    #right;
    #top;
    #bottom;
    #fieldOfView;
    #isPerspective;

    constructor(camera){
        this.#position = camera.position ?? [0, 0, 0];
        this.#target = camera.target ?? [0, 0, 1];
        this.#screenWidth = camera.screenWidth;
        this.#screenHeight = camera.screenHeight;
        this.#left = camera.left;
        this.#right = camera.right;
        this.#top = camera.top;
        this.#bottom = camera.bottom;
        this.#near = camera.near;
        this.#far = camera.far;
        this.#fieldOfView = camera.fieldOfView;
        this.#isPerspective = camera.isPerspective;

        if (this.#isPerspective && (this.#screenWidth === undefined || this.#screenHeight === undefined || this.#near === undefined || this.#far === undefined || this.#fieldOfView === undefined)){
            throw new Error(`Missing required value for perspective projection`);
        }
        if (!this.#isPerspective && (this.#left === undefined || this.#right === undefined || this.#near === undefined || this.#far === undefined || this.#top === undefined || this.#bottom === undefined)) {
            throw new Error(`Missing required value for ortho projection`);
        }
    }

    getProjectionMatrix(){
        return this.#isPerspective 
            ? new Float32Array(getProjectionMatrix(this.#screenHeight, this.#screenWidth, this.#fieldOfView, this.#near, this.#far).flat())
            // getOrthoMatrix keeps its translation in the last column, so transpose it like the model matrix
            : new Float32Array(transpose(getOrthoMatrix(this.#left, this.#right, this.#bottom, this.#top, this.#near, this.#far)).flat());
    }
}

Each projection type only needs some of the parameters, so there's a little error checking to make sure you're supplying the right ones. Now, if we want, we can render an orthographic projection:



this.cameras = {
    default: new Camera({
        //screenHeight: this.#height,
        //screenWidth: this.#width,
        //fieldOfView: 90,
        near: 0,
        far: 5,
        left: -1,
        right: 1,
        top: 1,
        bottom: -1
    })
}

If we don't want the image stretched, we need to set the left, right, top, and bottom values to match the display's aspect ratio, like this:



createCameras(){
    this.cameras = {
        default: new Camera({
            screenHeight: this.#height,
            screenWidth: this.#width,
            fieldOfView: 90,
            near: 0,
            far: 5,
            left: -(this.#width / this.#height),
            right: (this.#width / this.#height),
            top: 1,
            bottom: -1
        })
    }
}

Camera Positioning

This part gets a little confusing. I'll be glossing over some bits because it's long and boring and I don't fully understand it myself, but basically we need to derive a matrix that transforms everything so that it's relative to the camera.

PointAt

First we start with a matrix that can take an object in space and reorient it toward another position in space so that it "points at" that position. To do this we need the starting position, the position of the "target", and what constitutes the "up" direction. That last term is a bit weird, but it's how we pin down the rotation. The camera has a "forward" direction, and to point it at a target we subtract the position from the target and normalize, which gives a unit vector in the direction of the target. However, that alone doesn't tell us how the camera is rolled around that direction. Imagine tilting your head while looking at something: we need to know which way is up. In most cases this is simply the vector [0, 1, 0], but we can supply it as a parameter in case we want to do something weird. The supplied up usually isn't perpendicular to forward, so we first subtract its component along forward and normalize what's left to get a true up vector. We then get the last axis, right, from the cross product of up and forward. The cross product produces a vector perpendicular to both of its inputs.



//vector.js
export function crossVector(a, b) {
    return [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0]
    ];
}

export function dotVector(a, b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}
export function getPointAtMatrix(position, target, up) {
    const forward = normalizeVector(subtractVector(target, position));
    const newUp = normalizeVector(subtractVector(up, multiplyVector(forward, dotVector(up, forward))));
    const right = crossVector(newUp, forward);

    return [
        [right[0], right[1], right[2], 0],
        [newUp[0], newUp[1], newUp[2], 0],
        [forward[0], forward[1], forward[2], 0],
        [position[0], position[1], position[2], 1]
    ];
}

crossVector and dotVector only take 3-component vectors, though we could extend them if we want everything to be 4-component vectors (a change I'm considering). The pointAt matrix can also be used to orient objects toward other objects in the scene, so it's useful in its own right.
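To see the newUp step at work, here's a sketch that builds the same basis as getPointAtMatrix for a camera above and in front of the target, where [0, 1, 0] is not perpendicular to forward. normalizeVector, subtractVector, and multiplyVector aren't shown in the article, so the versions here are assumed to do what their names say.

```javascript
// Minimal 3-component helpers (assumed behaviour of the vector.js versions).
const subtractVector = (a, b) => a.map((v, i) => v - b[i]);
const multiplyVector = (a, s) => a.map((v) => v * s);
const dotVector = (a, b) => a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
const crossVector = (a, b) => [
  a[1] * b[2] - a[2] * b[1],
  a[2] * b[0] - a[0] * b[2],
  a[0] * b[1] - a[1] * b[0],
];
const normalizeVector = (a) => multiplyVector(a, 1 / Math.sqrt(dotVector(a, a)));

// Same steps as getPointAtMatrix: forward, then up with its forward
// component removed, then right from the cross product.
function pointAtBasis(position, target, up) {
  const forward = normalizeVector(subtractVector(target, position));
  const newUp = normalizeVector(subtractVector(up, multiplyVector(forward, dotVector(up, forward))));
  const right = crossVector(newUp, forward);
  return { right, newUp, forward };
}

const { right, newUp, forward } = pointAtBasis([0, 3, -4], [0, 0, 0], [0, 1, 0]);
console.log(dotVector(newUp, forward)); // about 0: newUp is now perpendicular to forward
console.log(dotVector(right, forward), dotVector(right, newUp)); // both about 0
console.log(right); // approximately [1, 0, 0]: +X in this left-handed setup
```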

LookAt

It's good that it's useful for that, because it isn't so useful for getting things relative to the camera. What we've done is figure out how to point a camera at something, but in order to show what it sees we need the opposite operation: all of the objects rotated and translated relative to the camera. For instance, if the camera is at Z = -2, then we expect an object at the origin to be drawn at Z = 2. That opposite is the inverse of the pointAt matrix. Inverting a general matrix is a lot of complex math, but this one is special: its rotation part is made of perpendicular unit vectors, so its inverse is just its transpose, and the translation becomes the negated dot products of the position with each axis. The result is what's called the lookAt or view matrix. The name is super confusing to me because looking at and pointing at are the same thing, but that's just what it's typically called, so I guess we'll get used to it.



//vector.js
export function getLookAtMatrix(position, target, up) {
    const forward = normalizeVector(subtractVector(target, position));
    const newUp = normalizeVector(subtractVector(up, multiplyVector(forward, dotVector(up, forward))));
    const right = crossVector(newUp, forward);

    return [
        [right[0], newUp[0], forward[0], 0],
        [right[1], newUp[1], forward[1], 0],
        [right[2], newUp[2], forward[2], 0],
        [-dotVector(position, right), -dotVector(position, newUp), -dotVector(position, forward), 1]
    ];
}
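A quick numerical check of that inverse relationship, as a sketch with stand-in helpers (assumed, since vector.js isn't fully shown): pointAt multiplied by lookAt should give the identity, and a camera at Z = -2 should put the origin at Z = 2. Both matrices use the article's layout, with the translation in the last row and points treated as row vectors.

```javascript
// Stand-in helpers (assumed to match the vector.js behaviour).
const sub = (a, b) => a.map((v, i) => v - b[i]);
const scale = (a, s) => a.map((v) => v * s);
const dot = (a, b) => a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
const cross = (a, b) => [a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]];
const normalize = (a) => scale(a, 1 / Math.sqrt(dot(a, a)));

function basis(position, target, up) {
  const forward = normalize(sub(target, position));
  const newUp = normalize(sub(up, scale(forward, dot(up, forward))));
  return { right: cross(newUp, forward), newUp, forward };
}

function pointAt(position, target, up) {
  const { right, newUp, forward } = basis(position, target, up);
  return [[...right, 0], [...newUp, 0], [...forward, 0], [...position, 1]];
}
function lookAt(position, target, up) {
  const { right, newUp, forward } = basis(position, target, up);
  return [
    [right[0], newUp[0], forward[0], 0],
    [right[1], newUp[1], forward[1], 0],
    [right[2], newUp[2], forward[2], 0],
    [-dot(position, right), -dot(position, newUp), -dot(position, forward), 1],
  ];
}

function multiply(a, b) {
  return a.map((row) => b[0].map((_, j) => row.reduce((s, v, k) => s + v * b[k][j], 0)));
}
// Row vector times matrix: [x, y, z, 1] * m.
const transformRow = (p, m) => [0, 1, 2, 3].map((j) => p[0] * m[0][j] + p[1] * m[1][j] + p[2] * m[2][j] + m[3][j]);

const eye = [1, 2, -3];
const product = multiply(pointAt(eye, [0, 0, 0], [0, 1, 0]), lookAt(eye, [0, 0, 0], [0, 1, 0]));
console.log(product.map((row) => row.map((v) => Math.round(v * 1e9) / 1e9))); // (approximately) the identity

// The example from the text: camera at Z = -2 looking at the origin.
console.log(transformRow([0, 0, 0], lookAt([0, 0, -2], [0, 0, 0], [0, 1, 0]))); // origin lands at z = 2
```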

So let's fix up the camera class:



//vector.js
export const UP = [0, 1, 0];

//camera.js
#position = [0,0,0];
#target = [0,0,1];

moveTo(x, y, z){
    this.#position = [x,y,z];
}
lookAt(x, y, z){
    this.#target = [x,y,z];
}
getViewMatrix(){
    return getLookAtMatrix(this.#position, this.#target, UP).flat();
}

Now the camera can give us a view matrix so we can see what it's looking at.

Then we can update setupGlobalUniforms to take in the view matrix as well:



setupGlobalUniforms(){
    const projectionMatrix = this.cameras.default.getProjectionMatrix();
    const projectionLocation = this.context.getUniformLocation(this.program, "uProjectionMatrix");
    this.context.uniformMatrix4fv(projectionLocation, false, projectionMatrix);
    const viewMatrix = this.cameras.default.getViewMatrix();
    const viewLocation = this.context.getUniformLocation(this.program, "uViewMatrix");
    this.context.uniformMatrix4fv(viewLocation, false, viewMatrix);
}

And finally we can update the vertex shader to do the transform:



uniform mat4 uProjectionMatrix;
uniform mat4 uModelMatrix;
uniform mat4 uViewMatrix;

attribute vec3 aVertexPosition;
attribute vec3 aVertexColor;
attribute vec2 aVertexUV;
varying mediump vec4 vColor;
varying mediump vec2 vUV;
void main(){
    gl_Position = uProjectionMatrix * uViewMatrix * uModelMatrix * vec4(aVertexPosition, 1.0);
    vColor = vec4(aVertexColor, 1.0);
    vUV = aVertexUV;
}

The view matrix sits between the model and projection matrices. Now we just need to remember to remove the translation on the cube and move the camera back instead (for example with moveTo(0, 0, -2)), and we should get the same scene.
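Why can getLookAtMatrix be flattened directly while the model matrix needs transpose() first? The shader multiplies column vectors, and uniformMatrix4fv with its transpose argument set to false reads the flat array in column-major order. This sketch mimics that on the CPU; mulColumnMajor is a hypothetical helper for illustration, not part of the engine.

```javascript
// WebGL reads a flat 16-element matrix in column-major order when the
// transpose argument of uniformMatrix4fv is false: element [col * 4 + row].
function mulColumnMajor(flat, v) {
  const out = [0, 0, 0, 0];
  for (let row = 0; row < 4; row++) {
    for (let col = 0; col < 4; col++) {
      out[row] += flat[col * 4 + row] * v[col];
    }
  }
  return out;
}

// A view matrix laid out like getLookAtMatrix (translation in the last row),
// for a camera at z = -2 looking down +Z with no rotation.
const viewRows = [
  [1, 0, 0, 0],
  [0, 1, 0, 0],
  [0, 0, 1, 0],
  [0, 0, 2, 1],
];
// Flattening row by row turns that last row into the last column for WebGL,
// which is where a translation belongs when the shader computes M * v.
console.log(mulColumnMajor(viewRows.flat(), [0, 0, 0, 1])); // [0, 0, 2, 1]

// A math-layout matrix (translation in the last column) flattened directly
// loses its translation: the 2 lands in the w row instead.
const translateRows = [
  [1, 0, 0, 0],
  [0, 1, 0, 0],
  [0, 0, 1, 2],
  [0, 0, 0, 1],
];
console.log(mulColumnMajor(translateRows.flat(), [0, 0, 0, 1])); // [0, 0, 0, 1]
```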

Moving the camera

We can set up some events to move the camera now. I'll be using WASD.



attachEvents() {
    // bind so that "this" inside onKeydown is the component rather than document.body
    document.body.addEventListener("keydown", this.onKeydown.bind(this));
}
onKeydown(e){
    switch(e.code){
        case "KeyA": {
            this.cameras.default.moveBy({ x: 0.1 });
            break;
        }
        case "KeyD": {
            this.cameras.default.moveBy({ x: -0.1 });
            break;
        }
        case "KeyW": {
            this.cameras.default.moveBy({ z: 0.1 });
            break;
        }
        case "KeyS": {
            this.cameras.default.moveBy({ z: -0.1 });
            break;
        }
    }
    this.render();
}

//camera.js

moveBy({ x = 0, y = 0, z = 0 }){
    this.#position[0] += x; 
    this.#position[1] += y;
    this.#position[2] += z;
}

This lets us zoom in and out, but the side-to-side movement will be weird. That's because we're moving the camera while it stays locked on to its target at the origin. This opens up a lot of "how exactly should it work" questions. For an FPS camera this is no good, but for a model editor it might be what we want. Should the camera stay focused on a thing, or should it move freely? We can make it move directly by changing moveBy to panBy:



//camera.js
panBy({ x = 0, y = 0, z = 0 }){
    this.#position[0] += x;
    this.#target[0] += x;
    this.#position[1] += y;
    this.#target[1] += y;
    this.#position[2] += z;
    this.#target[2] += z;
}

By moving the target and the camera the same amount we're no longer fixating on the point and the object shifts in the view naturally as if we were panning.
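The difference shows up in the view direction. In this sketch a plain object stands in for the Camera class (hypothetical, for illustration only): moveBy swings the view toward the fixed target, while panBy leaves it untouched.

```javascript
// Minimal stand-in for the camera state (hypothetical; mirrors moveBy/panBy).
function makeCamera() {
  return { position: [0, 0, -2], target: [0, 0, 0] };
}
function moveBy(cam, { x = 0, y = 0, z = 0 }) {
  cam.position = [cam.position[0] + x, cam.position[1] + y, cam.position[2] + z];
}
function panBy(cam, { x = 0, y = 0, z = 0 }) {
  moveBy(cam, { x, y, z });
  cam.target = [cam.target[0] + x, cam.target[1] + y, cam.target[2] + z];
}
// Unit vector from the camera toward its target.
function viewDirection({ position, target }) {
  const d = target.map((t, i) => t - position[i]);
  const len = Math.hypot(...d);
  return d.map((v) => v / len);
}

const orbiting = makeCamera();
moveBy(orbiting, { x: 1 });
console.log(viewDirection(orbiting)); // turned toward the origin: about [-0.447, 0, 0.894]

const panning = makeCamera();
panBy(panning, { x: 1 });
console.log(viewDirection(panning)); // unchanged: [0, 0, 1]
```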

Tumbling

Tumbling is how we'd move the scene with a mouse or pointer. In this case it might make more sense to fixate on a location and use sliding gestures to determine how we rotate around it, similar to a satellite orbiting a planet.

As it turns out this is a lot harder than it seems. I tried a couple different ways and what I settled on still has some quirks to it.

Attempt 1

I first tried a fairly simple rotation, hoping that with the lookAt matrix it would all just come together. The theory is simple: translate so the target is at the origin, rotate, and then translate back by the same amount, producing a rotation around the target.



orbitByBusted({ lat = 0, long = 0 }) {
    this.#position = multiplyMatrixVector(this.#position,
        multiplyMatrix(
            getTranslationMatrix(this.#target[0], this.#target[1], this.#target[2]),
            multiplyMatrix(
                getRotationXMatrix(lat),
                multiplyMatrix(
                    getRotationYMatrix(long),
                    getTranslationMatrix(-this.#target[0], -this.#target[1], -this.#target[2])
                )
            )
        ));
}

This kinda works but it starts breaking down once you've rotated past 90 degrees in a direction. I'm not exactly sure why.

Spherical Coordinates

The problem seemed easier when converting to spherical coordinates. Our camera has a target and we want to orbit around it, so all we really care about is where the camera is along a spherical surface. The biggest problem here is that if you look up spherical coordinates you'll get a lot of conflicting info because everyone does things differently.

In spherical coordinates we have 3 coordinates: r, theta (θ), and phi (φ). We're already going Greek, so things are dicey. There are actually 2 notations, one from physics and one from mathematics. The physics version (which seems to be the more popular one when looking it up) has phi rotating in the horizontal plane. In the math version, theta rotates in the horizontal plane (more similar to polar coordinates). You need to be careful to figure out which one you have. Here's a good resource that explains it: https://mathworld.wolfram.com/SphericalCoordinates.html

But there's more: if you check the diagrams carefully, they're almost always rotated 90 degrees from what we'd expect, with the Z axis pointing upward. This just increases our cognitive load, as we need to think about everything with our head tilted.

Even once we sort all of that out, this still isn't a great representation for us, because the polar angle (phi in the math notation) is 0 at the poles, whereas we'd more naturally want a side-on view to be 0.

Because of this, I propose that instead of pure spherical coordinates we use a latitude/longitude system, which is much easier to think about since we've all had years of practice with globes (although we'll continue to use radians unless we're displaying things to the user).

Here's what I came up with:



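//trig.js
export const TWO_PI = Math.PI * 2;

export function normalizeAngle(angle) {
    // Wrap into [0, TWO_PI); the double modulo also handles negative angles,
    // including exact negative multiples of TWO_PI (which wrap to 0)
    return ((angle % TWO_PI) + TWO_PI) % TWO_PI;
}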

export function cartesianToLatLng([x, y, z]) {
    const radius = Math.sqrt(x ** 2 + y ** 2 + z ** 2);
    return [
        radius,
        normalizeAngle((Math.PI / 2) - Math.acos(y / radius)),
        normalizeAngle(Math.atan2(x, -z)),
    ];
}
export function latLngToCartesian([radius, lat, lng]){
    lng = -lng + Math.PI / 2;
    return [
        radius * Math.cos(lat) * Math.cos(lng),
        radius * Math.sin(lat),
        radius * -Math.cos(lat) * Math.sin(lng),
    ];
}

It's very similar to some spherical representations. The latitude roughly corresponds to whichever coordinate controls the vertical plane, but instead of going from 0 to PI, we subtract it from a quarter turn so that it goes from PI/2 down to -PI/2, matching the northern hemisphere being positive and the southern being negative. The longitude is nearly identical to the horizontal coordinate, but I flipped a term: it's normally oriented with positive Z (in our coordinate system) at 0, but our camera starts on the negative Z side looking down the Z axis, so we want negative Z to be 0. The easiest way to deal with the X, Y, Z axis confusion was to reorder the incoming coordinates, using a different parameter name ordering with the existing equations found on the internet, and then do variable name replacement to put them back in order.

Finally, we need the reverse mapping, from our spherical coordinates back to Cartesian. At this point I was too frustrated to derive it, so I used guess-and-check with some unit tests to get the signs and orientation right.
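Here's a sketch of what round-trip tests like that could look like. It uses an algebraically equivalent form of latLngToCartesian (substituting the shifted longitude back in turns cos into sin and vice versa) and a cartesianToLatLng that leaves latitude unnormalized; with the normalized latitude shown above, the southern-hemisphere sample would come back as lat + 2 * PI instead. The sample values are arbitrary.

```javascript
// Equivalent, simplified form of latLngToCartesian: substituting
// lng' = PI/2 - lng turns cos(lng') into sin(lng) and sin(lng') into cos(lng).
function latLngToCartesian([radius, lat, lng]) {
  return [
    radius * Math.cos(lat) * Math.sin(lng),
    radius * Math.sin(lat),
    -radius * Math.cos(lat) * Math.cos(lng),
  ];
}
// Inverse, with latitude kept in [-PI/2, PI/2] and longitude in [0, 2PI).
function cartesianToLatLng([x, y, z]) {
  const radius = Math.hypot(x, y, z);
  const lng = Math.atan2(x, -z);
  return [radius, Math.PI / 2 - Math.acos(y / radius), lng < 0 ? lng + 2 * Math.PI : lng];
}

// Round-trip checks in the spirit of the guess-and-check unit tests.
const samples = [
  [2, 0, 0],                     // lat 0, lng 0: on the -Z side
  [1, Math.PI / 4, Math.PI / 2], // northern hemisphere, a quarter turn around
  [3, -Math.PI / 3, 4],          // southern hemisphere
];
for (const s of samples) {
  const back = cartesianToLatLng(latLngToCartesian(s));
  console.log(s, "->", back.map((v) => v.toFixed(6)));
}
console.log(latLngToCartesian([2, 0, 0])); // [0, 0, -2]
```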

What we wind up with is a completely non-standard though much easier to understand variant of spherical coordinates based on latitude, longitude and using radians.

Back in the camera class we can add our methods:



//camera.js
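orbitBy({ lat = 0, long = 0 }){
    const [r, currentLat, currentLng] = this.getOrbit(); 
    // getOrbit is measured from the target, so add the target back to get a world position
    const offset = latLngToCartesian([r, currentLat + lat, currentLng - long]);
    this.#position = offset.map((value, i) => value + this.#target[i]);
}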
lookAt(x, y, z){
    this.#target = [x,y,z];
}
getOrbit(){
    const targetDelta = subtractVector(this.#position, this.#target);
    return cartesianToLatLng(targetDelta);
}

Now that the math is done it's just a matter of finding where we are on the sphere and then adding our new values.
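One thing to watch: getOrbit measures the position relative to the target, so the offset coming back from latLngToCartesian is also relative to the target and needs the target added back. With the target at the origin the two are the same, so here's a sketch with an off-origin target. OrbitCamera is a hypothetical stand-in for the Camera class, and the conversions are equivalent forms of the trig.js ones.

```javascript
// Lat/lng conversions equivalent to the trig.js versions (latitude unnormalized).
function latLngToCartesian([r, lat, lng]) {
  return [r * Math.cos(lat) * Math.sin(lng), r * Math.sin(lat), -r * Math.cos(lat) * Math.cos(lng)];
}
function cartesianToLatLng([x, y, z]) {
  const r = Math.hypot(x, y, z);
  return [r, Math.PI / 2 - Math.acos(y / r), Math.atan2(x, -z)];
}

// Hypothetical minimal orbit camera: lat/lng are measured from the target,
// so the new offset is added back onto the target.
class OrbitCamera {
  constructor(position, target) {
    this.position = position;
    this.target = target;
  }
  getOrbit() {
    return cartesianToLatLng(this.position.map((p, i) => p - this.target[i]));
  }
  orbitBy({ lat = 0, long = 0 }) {
    const [r, currentLat, currentLng] = this.getOrbit();
    const offset = latLngToCartesian([r, currentLat + lat, currentLng - long]);
    this.position = offset.map((o, i) => o + this.target[i]);
  }
}

const cam = new OrbitCamera([5, 1, -3], [5, 1, 0]); // target is NOT the origin
cam.orbitBy({ long: Math.PI / 2 });
console.log(cam.position); // about [2, 1, 0]: a quarter turn around the target, still 3 away
```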

Pointer events

Now we can map these to user gestures.



//wc-geo-gl.js
#initialPointer;
#initialCameraPos;
onPointerDown(e){
    this.#initialPointer = [e.offsetX, e.offsetY];
    this.#initialCameraPos = this.cameras.default.getPosition();
    this.dom.canvas.setPointerCapture(e.pointerId);
    this.dom.canvas.addEventListener("pointermove", this.onPointerMove);
    this.dom.canvas.addEventListener("pointerup", this.onPointerUp);
}
onPointerUp(e){
    this.dom.canvas.removeEventListener("pointermove", this.onPointerMove);
    this.dom.canvas.removeEventListener("pointerup", this.onPointerUp);
    this.dom.canvas.releasePointerCapture(e.pointerId);
}
onPointerMove(e){
    const pointerDelta = [
        e.offsetX - this.#initialPointer[0],
        e.offsetY - this.#initialPointer[1]
    ];
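    const degreesPerRad = 180 / Math.PI; // assumed value of the shared constant, so (180 / degreesPerRad) is PI
    const radsPerWidth = (180 / degreesPerRad) / this.#width;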
    const xRads = pointerDelta[0] * radsPerWidth;
    const yRads = pointerDelta[1] * radsPerWidth * (this.#height / this.#width);

    this.cameras.default.setPosition(this.#initialCameraPos);
    this.cameras.default.orbitBy({ long: xRads, lat: yRads });
    this.render();
}

If you read my article on Bézier curves, I discussed how to do drag-and-drop using pointer capture; this is the same thing. On pointerdown we capture the initial state: where we pressed and where the camera was. Then we set up 2 events, pointermove and pointerup. pointermove calculates the distance we've dragged the cursor from the starting point and converts that into an angle in radians. const radsPerWidth = (180 / degreesPerRad) / this.#width; is really a fudge factor: since degreesPerRad is 180 / PI, it works out to PI / width, meaning "if we drag across the whole screen width, the object will have rotated 180 degrees". I also played around with matching the field of view, but that didn't feel sensitive enough. Using smaller values will decrease sensitivity. Technically the Y axis doesn't have to match either, and you can flip the direction if that feels more natural. It's up to you. We also need to reset the camera to its initial position on each pointermove event; otherwise we'd be adding the total mouse delta again on every single event, which is not what we want.
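Pulling the mapping out into a pure function makes the sensitivity easy to reason about and test. dragToOrbit is a hypothetical helper (the component does this inline); it keeps the same scaling, including the extra height / width factor on the vertical axis.

```javascript
// Hypothetical pure helper for the drag-to-orbit mapping used in onPointerMove:
// dragging across the full canvas width rotates by PI radians (180 degrees).
function dragToOrbit(dx, dy, width, height) {
  const radsPerWidth = Math.PI / width; // same value as (180 / degreesPerRad) / width
  return {
    long: dx * radsPerWidth,
    lat: dy * radsPerWidth * (height / width),
  };
}

console.log(dragToOrbit(800, 0, 800, 600)); // long is PI: a full-width drag is half a turn
console.log(dragToOrbit(0, 600, 800, 600)); // lat is 0.5625 * PI for a full-height drag
```

On a landscape canvas the height / width factor makes vertical drags noticeably less sensitive than horizontal ones, which is one of the knobs mentioned above.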

I'm also rendering on every movement. This is terribly inefficient, since pointermove can fire way faster than my refresh rate, and I should use requestAnimationFrame, but I always expect these articles to be shorter than they actually are, so we'll skip that. On pointerup we just release capture and remove the events so we're not calculating mouse movement when nothing is pressed.
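For reference, here's a minimal sketch of the usual requestAnimationFrame pattern: coalesce any number of render requests into at most one render per frame. createRenderScheduler and its injectable schedule parameter are hypothetical names, not part of the engine; the fake frame queue only exists so the logic can run outside a browser.

```javascript
// Coalesce any number of render requests into at most one render per frame.
// `schedule` defaults to requestAnimationFrame; it is injectable so the logic
// can be exercised outside a browser.
function createRenderScheduler(render, schedule = (callback) => requestAnimationFrame(callback)) {
  let pending = false;
  return function requestRender() {
    if (pending) return; // a frame is already queued and will draw the latest state
    pending = true;
    schedule(() => {
      pending = false;
      render();
    });
  };
}

// Usage with a fake frame queue standing in for the browser:
const frameQueue = [];
let renders = 0;
const requestRender = createRenderScheduler(() => renders++, (cb) => frameQueue.push(cb));
for (let i = 0; i < 10; i++) requestRender(); // e.g. ten pointermove events in one frame
frameQueue.splice(0).forEach((cb) => cb()); // the "frame" fires
console.log(renders); // 1
```

In the component, something like this.requestRender = createRenderScheduler(() => this.render()) could then replace the direct this.render() calls in the event handlers (a hypothetical wiring).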


Quirks

I said the final version still has some quirks. When the latitude goes past PI/2 or -PI/2, the camera basically loses its orientation: latitude isn't supposed to go past those values, because doing so really means we've rotated the longitude by 180 degrees. What happens instead is that the motion starts reversing direction once the drag delta passes PI/2, which is a bit weird for the user. There are 2 ways this could be dealt with:

1) Just stop the latitude from exceeding ±PI/2 by clamping it.
2) Rotate the longitude by 180 degrees when the latitude passes ±PI/2.

These each have usability tradeoffs. The first may seem limiting to the user as they can't move past a certain amount. The latter could lead to confusion as up becomes down and we lose all orientation.

I went with the first option as it made more sense (I fixed it after I pushed the code for this version, so you won't see the fix until the next version on GitHub).



//camera.js
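orbitBy({ lat = 0, long = 0 }){
    const [r, currentLat, currentLng] = this.getOrbit(); 
    const newLat = clamp(currentLat + lat, -Math.PI/2, Math.PI/2);
    // the orbit is measured from the target, so add the target back for a world position
    const offset = latLngToCartesian([r, newLat, currentLng - long]);
    this.#position = offset.map((value, i) => value + this.#target[i]);
}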

The new part is the clamp, which keeps the latitude within the range -PI/2 to PI/2.



export function clamp(value, low, high) {
    low = low !== undefined ? low : Number.MIN_SAFE_INTEGER;
    high = high !== undefined ? high : Number.MAX_SAFE_INTEGER;
    if (value < low) {
        value = low;
    }
    if (value > high) {
        value = high;
    }
    return value;
}
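One more edge worth knowing about: the clamp lets latitude reach exactly ±PI/2, where the camera sits directly above or below the target. There the forward vector is parallel to UP, so the lookAt math subtracts UP from itself and normalizes a zero vector. Floating-point rounding in latLngToCartesian usually keeps it from being exactly zero, but the orientation then depends on rounding noise. This sketch uses stand-in helpers; clamping to slightly inside the poles (the 0.001 margin is an arbitrary choice) is one common workaround.

```javascript
// Minimal copy of the lookAt "newUp" math (stand-in helpers, assumed behaviour).
const sub = (a, b) => a.map((v, i) => v - b[i]);
const scale = (a, s) => a.map((v) => v * s);
const dot = (a, b) => a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
const normalize = (a) => scale(a, 1 / Math.hypot(...a));

function lookAtUp(position, target, up) {
  const forward = normalize(sub(target, position));
  return normalize(sub(up, scale(forward, dot(up, forward))));
}

const UP = [0, 1, 0];
// Camera directly above the target (latitude exactly PI/2): forward is parallel to UP.
console.log(lookAtUp([0, 5, 0], [0, 0, 0], UP)); // [NaN, NaN, NaN]

// Just inside the pole the basis is still well defined.
const lat = Math.PI / 2 - 0.001;
console.log(lookAtUp([0, 5 * Math.sin(lat), -5 * Math.cos(lat)], [0, 0, 0], UP));
```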

I also had to fix the cartesianToLatLng function:



export function cartesianToLatLng([x, y, z]) {
    const radius = Math.sqrt(x ** 2 + y ** 2 + z ** 2);
    return [
        radius,
        (Math.PI / 2) - Math.acos(y / radius),
        normalizeAngle(Math.atan2(x, -z)),
    ];
}

The latitude should not be normalized, which allows it to take negative angles. That's what we want, because it simplifies a lot of things: otherwise, instead of a simple clamp, we'd need an awkward piecewise function to keep the value between 0 and PI/2 or between 3*PI/2 and 2*PI.
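To make that concrete: a camera slightly below the target has a latitude of about -0.2 radians, which normalizeAngle turns into about 6.08. Feeding that into the clamp pins it to PI/2, jumping the camera to the top of the orbit. A small sketch; clamp is written with Math.min/Math.max here, which behaves the same as the version above for finite bounds.

```javascript
const TWO_PI = 2 * Math.PI;
// Wraps an angle into [0, 2PI).
const normalizeAngle = (a) => ((a % TWO_PI) + TWO_PI) % TWO_PI;
const clamp = (v, low, high) => Math.min(Math.max(v, low), high);

// Camera slightly below the target: true latitude is about -0.2 radians.
const radius = 1;
const y = -radius * Math.sin(0.2);
const rawLat = Math.PI / 2 - Math.acos(y / radius);
const normalizedLat = normalizeAngle(rawLat);

console.log(rawLat.toFixed(3));        // "-0.200"
console.log(normalizedLat.toFixed(3)); // "6.083": the same angle, but a useless latitude
// Clamping the normalized value snaps the camera to the north pole:
console.log(clamp(normalizedLat, -Math.PI / 2, Math.PI / 2).toFixed(3)); // "1.571"
console.log(clamp(rawLat, -Math.PI / 2, Math.PI / 2).toFixed(3));        // "-0.200"
```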

That was a lot, and it took me quite a while to work through. Hopefully it was useful!

https://github.com/ndesmic/geogl/tree/v2
