DEV Community

Discussion on: A C++ Micro Optimization Exercise

Collapse
 
rsa profile image
Ranieri Althoff • Edited

Notably, you would never have to use the SIMD intrinsics manually; using a std::vector or std::array would give the compiler more information (e.g. it knows what is the length beforehand) so it would vectorize the code more automatically. It's amazing what compilers can do today!

Shameless plug, I wrote about that

Other than that, it's more you would write the same code with less boilerplate.

Thread Thread
 
jacqueslucke profile image
Jacques Lucke

That's an interesting article as well. Thanks for the link.

Note, I'm absolutely aware of the fact that compilers do an amazing job, I rely on that every day. However, I do think there are some differences between the algorithm you optimized compared to the one I optimized here.

Most notably, in your algorithm each block can be processed in parallel and you are in no way constrained by branch prediction. Both things are not true for the find_non_zero_indices function in this post, which makes it much harder for the compiler to optimize this automatically. Also, it does not know which case to optimize for, the performance greatly depends on the input data when the algorithm is not turned into something branchless.

If you have some time for it, please try rewriting find_non_zero_indices__baseline using std::vector and std::array and check if it actually becomes faster. I think the signature of the function should stay the same, but if you think it helps performance, feel free to change the signature to whatever works better (without changing the core problem of course).

Thread Thread
 
rsa profile image
Ranieri Althoff

Yes, that is a very interesting idea that I will try and surely learn more :)