DEV Community

Discussion on: We're Stephanie Hurlburt and Rich Geldreich, ask us anything!

Mac Siri • Edited

Hello Stephanie & Rich!

How big a difference is there between image/texture compression and audio compression? And where does video compression fit into this?

Rich Geldreich

Audio compression is in many ways a very different beast. Good lossy audio compression requires knowledge of psychoacoustics, which isn't something I've personally spent a lot of time studying. A lot of the lower-level coding algorithms (such as Huffman or arithmetic coding) are the same, but the higher-level algorithms tend to be very specific and tuned to the type of data you are compressing.

One way to build a video codec is to first build an image codec, then use that as a base. This is exactly what we're doing with Basis. Basis's current format is basically the "I Frame" portion of a video codec. (I frames are compressed independently of all the other frames in a video.) Our goal is to eventually work on P (predicted) frames for video after we finish work on our universal format. We'll be working on a universal GPU texture video format for mobile and desktop sometime next year.
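To make the I frame / P frame distinction concrete, here is a toy sketch in Python. It is not how Basis or any real codec works: real P frames use motion compensation and entropy coding, whereas this only stores per-pixel differences from the previous frame. The function names and the `keyframe_interval` parameter are hypothetical, chosen for illustration.

```python
def encode_frames(frames, keyframe_interval=4):
    """Encode frames as ("I", pixels) or ("P", residual) tuples.

    Toy assumption: a frame is a flat list of integer pixel values,
    and every frame has the same length.
    """
    encoded = []
    previous = None
    for index, frame in enumerate(frames):
        if previous is None or index % keyframe_interval == 0:
            # I frame: stored on its own, independent of other frames.
            encoded.append(("I", list(frame)))
        else:
            # P frame: store only the change relative to the previous frame.
            residual = [cur - prev for cur, prev in zip(frame, previous)]
            encoded.append(("P", residual))
        previous = frame
    return encoded


def decode_frames(encoded):
    """Rebuild the original frames from the encoded sequence."""
    frames = []
    previous = None
    for kind, payload in encoded:
        if kind == "I":
            frame = list(payload)
        else:
            # Add the stored differences back onto the previous frame.
            frame = [prev + delta for prev, delta in zip(previous, payload)]
        frames.append(frame)
        previous = frame
    return frames


frames = [[10, 10, 10], [10, 11, 10], [10, 11, 12]]
encoded = encode_frames(frames)
restored = decode_frames(encoded)
```

In this sketch the residuals of slowly changing frames are mostly zeros, which is what makes predicted frames cheaper to compress than independent ones.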

Stephanie Hurlburt

Definitely! Yes to everything Rich said.

What he's hinting at is that with lossy compression, it's all about human perception. How can we best trick the human brain into thinking there's not a lot of data lost when there actually is? And of course our audio centers will perceive things differently than visual centers.

Visual metrics for quality are super interesting to examine. We use PSNR and SSIM, for example. Those are image quality metrics that attempt to automatically detect, using algorithms, how much the human brain will perceive quality loss. There are more image quality metrics tuned specifically for photographs, but the thing is we deal with all kinds of textures, not just photos (For instance, normal maps and depth maps! How are those perceived by our brain?).
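As a rough illustration of what a metric like PSNR computes, here is a minimal sketch in plain Python. It assumes 8-bit samples given as flat lists of integers; the function names are made up for this example, and this is not Basis's own measurement code. PSNR is just the mean squared error between the original and the compressed image, expressed in decibels relative to the maximum sample value.

```python
import math


def mean_squared_error(original, compressed):
    """Average squared difference between two equal-length sample lists."""
    if len(original) != len(compressed) or not original:
        raise ValueError("inputs must be non-empty and the same length")
    total = sum((a - b) ** 2 for a, b in zip(original, compressed))
    return total / len(original)


def psnr(original, compressed, max_value=255):
    """Peak signal-to-noise ratio in decibels; higher means less error."""
    error = mean_squared_error(original, compressed)
    if error == 0:
        # Identical inputs: no noise at all.
        return math.inf
    return 10 * math.log10(max_value ** 2 / error)


original = [10, 20, 30, 40]
compressed = [11, 19, 31, 39]  # every sample is off by one
score = psnr(original, compressed)
```

Because PSNR only looks at per-sample error, it says nothing about where in the image the error sits or how visible it is, which is part of why structure-aware metrics such as SSIM and human review are used alongside it.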

At the end of the day, a computer algorithm won't be able to detect perceived quality loss as well as a human. The best test is always to look at an image and try to judge for yourself how much quality is lost. But that's slower, so in reality we use a combination of human testing and algorithmic quality metrics.

We have customers already using Basis for video: what they do is plug it into their existing video codec and add the optimizations video needs. Video is images, but you also have to account for how humans perceive moving images and for the optimizations that allows, which is just a bit different. We are mostly focused on optimizing the image part of it now, but folks are more than welcome to use it in video. Our "texture array" feature provides a good start: binomial.info/blog/2017/2/23/intro...