Recently I have been reorganizing and recompiling all third-party dependencies of Nebula Graph, an open-source distributed graph database. Along the way I came across two interesting issues that I would like to share with you.
A segmentation fault occurred while compiling Flex:
Check the core dump with gdb:
We can see from the assembly code above that the problem lies in the allocate_array function. reallocarray returns a pointer, which is passed back in the 64-bit register rax. However, after calling reallocarray, allocate_array took only the 32-bit register eax as the return value and used the cltq instruction to sign-extend eax into rax, which discards the upper 32 bits of the pointer.
The likely reason is that the prototype of reallocarray that allocate_array saw differed from the real prototype.
Looking at the compile log, I did find such a warning: implicit declaration of function 'reallocarray'.
This issue can be resolved by adding CFLAGS=-D_GNU_SOURCE at the configure stage, so that the C library headers expose the declaration of reallocarray.
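As a minimal sketch of what the flag changes, the snippet below defines _GNU_SOURCE in the source file, which has the same effect as passing -D_GNU_SOURCE on the command line. It assumes a C library that provides reallocarray (for example glibc 2.26 or later); grow_ints is a hypothetical helper written for illustration and is not the allocate_array function from Flex.

```c
/* Same effect as -D_GNU_SOURCE; it must come before any #include. */
#define _GNU_SOURCE
#include <stdlib.h>

/* Hypothetical helper (not Flex's allocate_array): resize an int array.
 * Because <stdlib.h> now declares reallocarray, the compiler knows it
 * returns void * and keeps the full 64-bit value from rax. */
int *grow_ints(int *old, size_t count) {
    return reallocarray(old, count, sizeof(int));
}
```

With the declaration visible, the "implicit declaration" warning disappears and no cltq is generated after the call.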
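Please note that this issue does not reproduce every time. However, building with the compile/link option -pie and enabling the kernel parameter kernel.randomize_va_space makes it much more likely to show up, since heap addresses then typically lie above the 32-bit range and no longer survive the truncation to eax.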
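To make the truncation concrete, here is a small sketch that mimics what the miscompiled caller does with the value in rax. It does not call reallocarray through an implicit declaration (recent compilers reject that); it only reproduces the arithmetic of "keep eax, then cltq". The function name and the sample addresses used with it are made up for illustration.

```c
#include <stdint.h>

/* Mimics a caller that believes the callee returns int: it reads only eax
 * (the low 32 bits of rax) and sign-extends it back to 64 bits, which is
 * what the cltq instruction does. */
uint64_t as_seen_through_int(uint64_t rax) {
    int32_t eax = (int32_t)(uint32_t)rax;  /* upper 32 bits are dropped */
    return (uint64_t)(int64_t)eax;         /* cltq: sign-extend eax into rax */
}
```

A low address such as 0x603010 comes back unchanged, which is why a build whose heap sits at low addresses may appear to work. An address such as 0x555555756260 becomes 0x55756260, and 0x7ffff7a0d010 becomes 0xfffffffff7a0d010; dereferencing either result is what leads to the segmentation fault.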
- In C, the return type of an implicitly declared function is int.
- Pay attention to compiler warnings and enable -Wall and -Wextra. Better still, enable -Werror during development.
A while ago I received feedback from Nebula Graph users who had encountered a compiler error: illegal instruction. See the details here: https://github.com/vesoft-inc/nebula/issues/978.
Below is the error message:
Since it’s an internal compiler error, my assumption was that g++ itself had hit an illegal instruction. To locate the specific instruction and the component it belongs to, we need to reproduce the error.
Luckily, the code snippet below can do the magic:
An illegal instruction is sure to trigger SIGILL. Since g++ only acts as the driver, the real compiler is cc1plus.
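The relationship between a driver and the process that actually crashes can be sketched with a few POSIX calls. This is only an illustration of how a parent process learns that its child was killed by SIGILL, not how g++ is implemented; the child uses raise(SIGILL) as a stand-in for executing an unsupported instruction, and the function name is hypothetical.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the signal that killed the child, 0 if the child exited
 * normally, or -1 if fork/waitpid failed. */
int child_termination_signal(void) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        signal(SIGILL, SIG_DFL); /* make sure the default action applies */
        raise(SIGILL);           /* stand-in for an unsupported instruction */
        _exit(0);                /* not reached under the default action */
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

The parent only sees that the child died from a signal, which is why the debugger has to be attached to the child process to find the faulting instruction.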
We can run the compilation under gdb and catch the illegal instruction on the spot:
mulx belongs to the BMI2 instruction set, and the CPU of the affected machine doesn't support this instruction set.
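Whether a given machine supports BMI2 can be checked from C. The sketch below assumes GCC or Clang, whose x86 builtins __builtin_cpu_init and __builtin_cpu_supports query the running CPU; the wrapper name cpu_has_bmi2 is hypothetical.

```c
/* Returns 1 if the running CPU reports BMI2, 0 if it does not,
 * and -1 when built for a non-x86 target where the builtin is unavailable. */
int cpu_has_bmi2(void) {
#if defined(__x86_64__) || defined(__i386__)
    __builtin_cpu_init();  /* harmless to call more than once */
    return __builtin_cpu_supports("bmi2") ? 1 : 0;
#else
    return -1;
#endif
}
```

On a machine like the one in the report, this would be expected to return 0, so any binary containing mulx on a hot path will be killed by SIGILL there.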
After a thorough investigation, I found that it was GMP, one of GCC’s dependencies, that introduced this instruction set. By default, GMP detects the CPU type of the host machine at the configure stage in order to use the most recent instruction sets, which improves performance at the cost of the binary's portability.
To solve the issue, you can overwrite two files in the GMP source tree, config.guess and config.sub, with configfsf.guess and configfsf.sub respectively before running configure.
- By default, GCC does not use newer instruction sets, for the sake of compatibility.
- To balance compatibility and performance, you need to do some extra work. For example, glibc selects and binds a specific implementation of a function at run time.
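The idea of selecting an implementation at run time can be sketched as follows. glibc uses its own mechanism for this; the sketch below uses a plain function pointer instead and assumes GCC or Clang on a 64-bit target (for unsigned __int128, the target attribute, and the CPU builtins). All function names are hypothetical.

```c
#include <stdint.h>

typedef uint64_t (*mul_hi_fn)(uint64_t, uint64_t);

/* Baseline: high 64 bits of a 64x64 multiply, safe on every CPU. */
static uint64_t mul_hi_portable(uint64_t a, uint64_t b) {
    return (uint64_t)(((unsigned __int128)a * b) >> 64);
}

#if defined(__x86_64__)
/* Same C code, but the compiler may use BMI2 (e.g. mulx) in this function. */
static __attribute__((target("bmi2")))
uint64_t mul_hi_bmi2(uint64_t a, uint64_t b) {
    return (uint64_t)(((unsigned __int128)a * b) >> 64);
}
#endif

/* Pick an implementation once, based on what the running CPU supports. */
mul_hi_fn select_mul_hi(void) {
#if defined(__x86_64__)
    __builtin_cpu_init();
    if (__builtin_cpu_supports("bmi2"))
        return mul_hi_bmi2;
#endif
    return mul_hi_portable;
}
```

A caller would store the result of select_mul_hi() once at startup and call through the pointer afterwards. The binary then runs on CPUs without BMI2 while still using the faster variant where it is available.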
Finally, if you are interested in compiling the source code of Nebula Graph, please refer to the documentation here.