Hello, this is Tecca. Following up on my previous post, here I give an update on the candidate packages I have been looking into, and finish with a summary of my approach to optimization.
- FFmpeg
- FFmpeg is a collection of libraries and tools to process multimedia content such as audio, video, subtitles and related metadata.
FFmpeg website
FFmpeg GitHub repo
- Rtm - Realtime Math
- This library is geared towards realtime applications that require their math to be as fast as possible. Much care was taken to maximize inlining opportunities and for code generation to be optimal when a function isn't inlined by passing values in registers whenever possible.
- libjpeg-turbo
- libjpeg-turbo is a JPEG image codec that uses SIMD instructions to accelerate baseline JPEG compression and decompression on x86, x86-64, Arm, PowerPC, and MIPS systems, as well as progressive JPEG compression on x86, x86-64, and Arm systems.
I picked these three packages because I felt they could benefit greatly from SVE2, and none of them has SVE/SVE2 optimizations implemented yet. After a bit of research, I decided to go with libjpeg-turbo for this project.
This package (libjpeg-turbo) is built around SIMD operations and already supports several SIMD instruction sets (MMX, SSE2, AVX2, Neon, AltiVec), including Neon on ARM64. Implementing SVE2 for Armv9 therefore seems to be a valid optimization.
Strategy - optimization approach
Three options for implementing SVE2 optimizations:
- auto-vectorization
- inline assembler
- SVE2 intrinsics
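To make the intrinsics option more concrete, here is a minimal sketch of what a vector-length-agnostic loop could look like. The function `add_rows_saturating` is a hypothetical helper I made up for illustration; it is not part of libjpeg-turbo. The intrinsics used (`svwhilelt_b8_u64`, `svld1_u8`, `svqadd_u8`, `svst1_u8`, `svcntb`) come from the base SVE set in `arm_sve.h`, which is also available when targeting SVE2, and the scalar branch is there so the same file still builds on other targets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#endif

/* Hypothetical helper for illustration only; not a libjpeg-turbo function.
 * Adds two rows of 8-bit samples and clamps each sum to 255. */
void add_rows_saturating(uint8_t *dst, const uint8_t *a, const uint8_t *b,
                         size_t n)
{
    assert(dst != NULL && a != NULL && b != NULL);
#if defined(__ARM_FEATURE_SVE)
    /* svcntb() is the number of 8-bit lanes in one vector on this CPU,
     * so the loop does not hard-code a vector width. */
    for (uint64_t i = 0; i < n; i += svcntb()) {
        /* The predicate switches off lanes past the end of the row,
         * so no separate scalar tail loop is needed. */
        svbool_t pg = svwhilelt_b8_u64(i, (uint64_t)n);
        svuint8_t va = svld1_u8(pg, a + i);
        svuint8_t vb = svld1_u8(pg, b + i);
        svst1_u8(pg, dst + i, svqadd_u8(va, vb)); /* saturating add */
    }
#else
    /* Portable fallback with the same result. */
    for (size_t i = 0; i < n; i++) {
        unsigned sum = (unsigned)a[i] + (unsigned)b[i];
        dst[i] = (uint8_t)(sum > 255u ? 255u : sum);
    }
#endif
}
```

Compared with inline assembler, the compiler still handles register allocation and scheduling here, which is one reason intrinsics are usually easier to maintain.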
My plan is to start small: within a file that already uses SIMD operations, I will first try applying auto-vectorization.
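As a rough idea of what I mean by auto-vectorization, below is a minimal sketch of a loop written so the compiler has a good chance of vectorizing it by itself. `average_rows` is a hypothetical helper, not an actual libjpeg-turbo function. The GCC flags in the comment are ones I assume would be used for an SVE2 build, and whether the loop really gets vectorized depends on the compiler version.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper for illustration only; not a libjpeg-turbo function.
 * Stores the rounded average of two rows of 8-bit samples.
 *
 * Assumed build command on GCC (the report flag shows which loops
 * were vectorized):
 *   gcc -O3 -march=armv8-a+sve2 -fopt-info-vec -c average.c
 */
void average_rows(uint8_t *restrict dst, const uint8_t *restrict a,
                  const uint8_t *restrict b, size_t n)
{
    assert(dst != NULL && a != NULL && b != NULL);
    /* restrict promises the rows do not overlap, and the body is simple
     * and branch-free with a plain counted loop, which is the shape
     * auto-vectorizers handle best. */
    for (size_t i = 0; i < n; i++) {
        dst[i] = (uint8_t)((a[i] + b[i] + 1) >> 1);
    }
}
```

No SVE2-specific code appears in the source, so the same function still compiles for every other architecture the package supports.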