In the above program, execution follows a single control path until the if statement is reached. Once the condition "a<b" is evaluated, the program splits into two branches based on the result.
The above translates to machine code as follows.
The assembly you show is for -O0. If you compile with -O1 instead, you will get:
main:
        mov r0, #10
        bx lr
We NEVER want to analyze code compiled with -O0 when it comes to optimization. Why? Because -O0 disables optimization entirely, so the generated code tells us nothing about what the optimizer can do.
eg. 1) Maximum of 2 numbers
Looking at the various implementations, version 1 works very well: see godbolt.org/z/sj6xzM
Much better than the ugly version 2.
I am not familiar enough with the cost of each opcode to say whether version 3 is faster than version 1.
eg. 3) Accessing array element
This time, the so-called "optimized version" is clearly less optimized! See godbolt.org/z/h3dhT7
The modulo operator calls __aeabi_idivmod, and we all know that division costs a lot!
We can also see that there is no conditional in the first version...
See godbolt.org/z/d13jco
Again, there is no conditional in the first version. The second version doesn't seem to be faster, but the source code is clearly less readable.
This leads me to your conclusion:
Modern compilers have built-in optimizers which can recognize some patterns and replace them with branch-less counterparts. But these are limited, so it's always better to write optimized code.
This was true 10 or 15 years ago.
I believe this is wrong today.
Modern compilers will optimize code much better than you. People spend their lives (often because it's their job) writing optimizers. They know many more tricks than you.
You have to understand how the optimizers in compilers work: they look for common patterns. When they find one, they optimize your code. Take the max function for instance: the "normal" version will be recognized for sure and optimized aggressively. With the other versions, the compiler doesn't recognize anything. You try to be smart, but in the end you fail because you don't let the compiler do its job properly.
Doesn't this mean we should not care when we write code?
Absolutely not.
But we have to understand where we can be smart and where the compiler will be smarter.
We should write clear, compiler-friendly code. We should choose the correct containers (eg: linked list vs array list). We should find a good architecture. We should choose the right algorithm (eg: not all sorting algorithms are the same).
Then we let the compiler do its work.
If performance is not good, then we use a profiler to see where we must optimize. We never optimize ahead of time with code that pretends to be "smart". We must understand why a particular piece of code, identified by the profiler, is slow. We optimize that piece only. We profile again.
Remember: premature optimization is the root of all evil.
I also tried changing the compiler to "x86-64 clang (trunk)", and the results are quite different for some snippets.
This confirms something important: don't assume, measure, try, measure again.