Discussion on: A C++ Micro Optimization Exercise

View post

Replies for: I guess I used C++ in the title back then, because I actually coded this in a C++ file. You are right, the title could also just mention C. I'm no...

Notably, you would never have to use the SIMD intrinsics manually; using a std::vector or std::array would give the compiler more information (e.g. it knows what is the length beforehand) so it would vectorize the code more automatically. It's amazing what compilers can do today!

Shameless plug, I wrote about that

Other than that, it's more you would write the same code with less boilerplate.

Jacques Lucke • Nov 25 '20

That's an interesting article as well. Thanks for the link.

Note, I'm absolutely aware of the fact that compilers do an amazing job, I rely on that every day. However, I do think there are some differences between the algorithm you optimized compared to the one I optimized here.

Most notably, in your algorithm each block can be processed in parallel and you are in no way constrained by branch prediction. Both things are not true for the find_non_zero_indices function in this post, which makes it much harder for the compiler to optimize this automatically. Also, it does not know which case to optimize for, the performance greatly depends on the input data when the algorithm is not turned into something branchless.

If you have some time for it, please try rewriting find_non_zero_indices__baseline using std::vector and std::array and check if it actually becomes faster. I think the signature of the function should stay the same, but if you think it helps performance, feel free to change the signature to whatever works better (without changing the core problem of course).

Ranieri Althoff • Nov 25 '20

Yes, that is a very interesting idea that I will try and surely learn more :)