Intel IACA is the tool I have used the most to develop high-performance, SIMD-optimized kernels. It statically estimates the throughput of a portion of code and shows the critical parts of the resulting assembly (saturated execution ports, register usage graph, etc.).
It's a great way to compare two versions of the same code, especially when you want to fine-tune your program by pushing your compiler to use more optimized instructions.
software.intel.com/en-us/articles/...
But let's be honest: in the end, only benchmark numbers matter, which is why I also use Google Benchmark.
github.com/google/benchmark
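To make the "only benchmark numbers matter" point concrete, here is a rough sketch of timing two versions of the same kernel. It is not Google Benchmark, which takes care of warm-up, iteration counts and statistics for you; it is only a stand-in built on std::chrono. The functions sum_simple, sum_unrolled and average_ns are hypothetical names made up for this illustration, and the empty asm barrier assumes GCC or Clang.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical version A: straightforward accumulation.
float sum_simple(const std::vector<float>& v) {
    float total = 0.0f;
    for (float x : v) total += x;
    return total;
}

// Hypothetical version B: four independent accumulators, which shortens
// the dependency chain between additions.
float sum_unrolled(const std::vector<float>& v) {
    float acc[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    std::size_t i = 0;
    for (; i + 4 <= v.size(); i += 4) {
        for (std::size_t k = 0; k < 4; ++k) acc[k] += v[i + k];
    }
    for (; i < v.size(); ++i) acc[0] += v[i];  // leftover elements
    return (acc[0] + acc[1]) + (acc[2] + acc[3]);
}

// Average wall-clock time per call, in nanoseconds.
template <typename Kernel>
double average_ns(Kernel&& kernel, const std::vector<float>& input, int repeats) {
    volatile float sink = 0.0f;  // forces every result to be stored
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r) {
        // Compiler barrier (GCC/Clang only): tells the optimizer that memory
        // may have changed, so the call is not hoisted out of the loop.
        asm volatile("" : : "r"(input.data()) : "memory");
        sink = kernel(input);
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::nano>(stop - start).count() / repeats;
}
```

Calling average_ns(sum_simple, data, 100000) and average_ns(sum_unrolled, data, 100000) on the same input gives two numbers to compare. Treat them as a rough indication only: a single run like this has no warm-up and no variance estimate.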
I was not aware of these tools and they look pretty good. I'll have to take a look into them.
If you don't want to rely on Intel, you can simply surround the part of your program you want to inspect with a few asm("nop"); statements in your source and look for the run of nop instructions in your disassembled program. However, you won't get any information about the throughput of that code.
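A minimal sketch of the nop-marker trick, assuming GCC or Clang (MSVC does not accept this inline asm syntax on x64). The kernel sum_floats is a hypothetical example, not code from any project mentioned here.

```cpp
#include <cstddef>

// Hypothetical kernel used only for illustration: sums n floats.
float sum_floats(const float* data, std::size_t n) {
    float total = 0.0f;

    // Start marker: three nops that are easy to spot in the disassembly.
    // "volatile" keeps the compiler from deleting the statement.
    asm volatile("nop\n\tnop\n\tnop");

    for (std::size_t i = 0; i < n; ++i) {
        total += data[i];
    }

    // End marker.
    asm volatile("nop\n\tnop\n\tnop");

    return total;
}
```

After building with something like g++ -O2 -c kernel.cpp, running objdump -d kernel.o should show the two runs of nop around the loop. The optimizer is still free to move instructions across the markers, so the boundaries are approximate.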
I made a small project to compare assembly programs quickly. The shell and CMake scripts build all the programs in every "src__*" folder they find, disassemble them, and then run the IACA inspection and dependency graph generation, all in one command. It's not a big thing, but it helps when you want to inspect more than one program with IACA without modifying the programs themselves.