Jason C. McDonald

Posted on Sep 8, 2018 • Updated on Feb 8, 2019

I'm an Expert in Memory Management & Segfaults, Ask Me Anything!

#ama #cpp #c #programming

I'm an expert-level C and C++ developer, with a specialty in memory management. I have experience writing memory-safe code with both the modern safe techniques and the ancient unsafe techniques. I've used malloc and free without killing myself. I love pointers. I've debugged more than my share of undefined behavior, and authored the canonical StackOverflow question on segfault debugging.

Any burning questions about dynamic allocation, undefined behavior, pointers, memory safety, or anything even remotely related? Ask me anything!

(My main languages are C++, C, and Python, although I also deeply grok the underlying computer science principles.)

Top comments (161)

kyrlon • Sep 1 '20

Hello Jason! I am encountering a peculiar issue. On one system after compiling I am getting no errors with pthreads in C++. On the second I am running across a Segfault

[Thread 0x7ffff4ccc700 (LWP 1735) exited]

Thread 2 "a.out" received signal SIGABRT, Aborted.
[Switching to Thread 0x7ffff54cd700 (LWP 1734)]
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.

On the other I am recieving no errors. My wild guess is that the issue might stem from the systems having different kernels:

No Problems:
cat /proc/version
Linux version 4.15.0-29-generic (buildd@lgw01-amd64-057) (gcc version 7.3.0 (Ubuntu 7.3.0-16ubuntu3)) #31-Ubuntu SMP Tue Jul 17 15:39:52 UTC 2018

Problems(SegFault):
cat /proc/version
Linux version 5.4.0-42-generic (buildd@lgw01-amd64-038) (gcc version 9.3.0 (Ubuntu 9.3.0-10ubuntu2)) #46-Ubuntu SMP Fri Jul 10 00:24:02 UTC 2020

Please let me know if any other information is needed

Jason C. McDonald • Sep 1 '20

Every time I see "No such file or directory," the first thing I check is the path. What is the current working directory from which you're running a.out, and does the path ../sysdeps/unix/sysv/linux/raise.c indeed exist relative to that working directory? (Remember that .. means "parent directory".)

kyrlon • Sep 1 '20

My working path is the same as the pwd for where I am running a.out. In this case it would be something on the lines of /home/red/recorder. I have no idea of where the path ../sysdeps/unix/sysv/linux/raise.c is coming from, so I would say no it does not exist. I am assuming it is a segfault of some variant when I receive a response of double free or corruption (out) and Aborted (core dumped).

kyrlon • Sep 1 '20

I can provide a recent valgrind and or gdb bt if needed.

Jason C. McDonald • Sep 1 '20

Yeah, that relative path is spooky.

To diagnose the double free or corrupted, you'd want to run your program through Valgrind. Meanwhile, you may need to use gdb to step through your program (compiled with -g) to determine precisely when control leaves your code, onward to the abort.

kyrlon • Sep 1 '20

Alright the first block is a gdb output with a backtrace:


gdb ./a.out                                                                                                                                                                                       [0/0]
GNU gdb (Ubuntu 9.1-0ubuntu1) 9.1
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./a.out...
(gdb) run
Starting program: /home/red/recorder/sidekiq/a.out
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
this is the current epoch:  1598981433
Waiting for pps epoch
Info: configured 1PPS source to 1
[New Thread 0x7ffff54cd700 (LWP 4245)]
[New Thread 0x7ffff4ccc700 (LWP 4246)]
Info: starting 0 Rx handle(s) for card 0 on next 1PPS pulse
Info: streaming started on card 0
Error: timestamp error for card 0 handle A1 (blk 0) ... expected 0x0000000000000000 but got 0x0000000000000004 (delta 4)
Info: all 0 card(s) started streaming
Info: sleeping for 3 seconds
Info: signaling to all 1 card(s) to stop streaming
Info: waiting for control thread for card 0
Info: stopping 0 Rx handle(s) on card 0 on next 1PPS pulse
Info: streaming stopped on card 0
double free or corruption (out)
Info: waiting for receive thread for card 0
[Thread 0x7ffff4ccc700 (LWP 4246) exited]

Thread 2 "a.out" received signal SIGABRT, Aborted.
[Switching to Thread 0x7ffff54cd700 (LWP 4245)]
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007ffff74f9859 in __GI_abort () at abort.c:79
#2  0x00007ffff75643ee in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7ffff768e285 "%s\n") at ../sysdeps/posix/libc_fatal.c:155
#3  0x00007ffff756c47c in malloc_printerr (str=str@entry=0x7ffff7690670 "double free or corruption (out)") at malloc.c:5347
#4  0x00007ffff756e120 in _int_free (av=0x7ffff76bfb80 <main_arena>, p=0x7ffff0000b50, have_lock=<optimized out>) at malloc.c:4314
#5  0x0000555555566af0 in __gnu_cxx::new_allocator<int>::deallocate (this=<synthetic pointer>, __p=0x7ffff0000b60) at /usr/include/c++/9/ext/new_allocator.h:119
#6  std::allocator_traits<std::allocator<int> >::deallocate (__a=<synthetic pointer>..., __n=<optimized out>, __p=0x7ffff0000b60) at /usr/include/c++/9/bits/alloc_traits.h:470
#7  std::_Vector_base<int, std::allocator<int> >::_M_deallocate (this=<synthetic pointer>, __n=<optimized out>, __p=0x7ffff0000b60) at /usr/include/c++/9/bits/stl_vector.h:351
#8  std::_Vector_base<int, std::allocator<int> >::~_Vector_base (this=<synthetic pointer>, __in_chrg=<optimized out>) at /usr/include/c++/9/bits/stl_vector.h:332
#9  std::vector<int, std::allocator<int> >::~vector (this=<synthetic pointer>, __in_chrg=<optimized out>) at /usr/include/c++/9/bits/stl_vector.h:680
#10 sidekiq_class::receive_data (card=0 '\000', p_rconfig=<optimized out>) at src/testing_cpp.cpp:598
#11 0x0000555555567cf8 in sidekiq_class::receive_run (data=<optimized out>) at src/testing_cpp.cpp:705
#12 0x00007ffff7a27609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#13 0x00007ffff75f6103 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

kyrlon • Sep 1 '20

This is the valgrind dump (sorry for double comment; couldn't get the markdown to capture all of the code cleanly):



valgrind --leak-check=yes --track-origins=yes ./a.out
==4364== Memcheck, a memory error detector
==4364== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==4364== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==4364== Command: ./a.out
==4364==
Error: unable to initialize libsidekiq with status -16
==4364==
==4364== HEAP SUMMARY:
==4364==     in use at exit: 0 bytes in 0 blocks
==4364==   total heap usage: 5 allocs, 5 frees, 75,144 bytes allocated
==4364==
==4364== All heap blocks were freed -- no leaks are possible
==4364==
==4364== For lists of detected and suppressed errors, rerun with: -s
==4364== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
red@red-NUC8v5PNK:~/recorder/sidekiq$ valgrind --leak-check=yes --track-origins=yes ./a.out
==4367== Memcheck, a memory error detector
==4367== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==4367== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==4367== Command: ./a.out
==4367==
this is the current epoch:  1598982515
Waiting for pps epoch
Info: configured 1PPS source to 1
Info: starting 0 Rx handle(s) for card 0 on next 1PPS pulse
Info: streaming started on card 0
==4367== Thread 2:
==4367== Conditional jump or move depends on uninitialised value(s)
==4367==    at 0x11B078: sidekiq_class::update_rx_stats(sidekiq_class::rx_stats*, skiq_rx_block_t*) (testing_cpp.cpp:569)
==4367==    by 0x11B95F: sidekiq_class::receive_data(unsigned char, sidekiq_class::radio_config*) (testing_cpp.cpp:646)
==4367==    by 0x11BCF7: sidekiq_class::receive_run(void*) (testing_cpp.cpp:705)
==4367==    by 0x4DDB608: start_thread (pthread_create.c:477)
==4367==    by 0x526F102: clone (clone.S:95)
==4367==  Uninitialised value was created by a stack allocation
==4367==    at 0x11B860: sidekiq_class::receive_data(unsigned char, sidekiq_class::radio_config*) (testing_cpp.cpp:596)
==4367==
==4367== Conditional jump or move depends on uninitialised value(s)
==4367==    at 0x11B964: sidekiq_class::receive_data(unsigned char, sidekiq_class::radio_config*) (testing_cpp.cpp:647)
==4367==    by 0x11BCF7: sidekiq_class::receive_run(void*) (testing_cpp.cpp:705)
==4367==    by 0x4DDB608: start_thread (pthread_create.c:477)
==4367==    by 0x526F102: clone (clone.S:95)
==4367==  Uninitialised value was created by a stack allocation
==4367==    at 0x11B860: sidekiq_class::receive_data(unsigned char, sidekiq_class::radio_config*) (testing_cpp.cpp:596)
==4367==
==4367== Conditional jump or move depends on uninitialised value(s)
==4367==    at 0x11B97C: sidekiq_class::receive_data(unsigned char, sidekiq_class::radio_config*) (testing_cpp.cpp:657)
==4367==    by 0x11BCF7: sidekiq_class::receive_run(void*) (testing_cpp.cpp:705)
==4367==    by 0x4DDB608: start_thread (pthread_create.c:477)
==4367==    by 0x526F102: clone (clone.S:95)
==4367==  Uninitialised value was created by a stack allocation
==4367==    at 0x11B860: sidekiq_class::receive_data(unsigned char, sidekiq_class::radio_config*) (testing_cpp.cpp:596)
==4367==
==4367== Use of uninitialised value of size 8
==4367==    at 0x51AB7BA: _itoa_word (_itoa.c:180)
==4367==    by 0x51C76F4: __vfprintf_internal (vfprintf-internal.c:1687)
==4367==    by 0x51CA021: buffered_vfprintf (vfprintf-internal.c:2377)
==4367==    by 0x51C6EA3: __vfprintf_internal (vfprintf-internal.c:1346)
==4367==    by 0x527E022: __fprintf_chk (fprintf_chk.c:33)
==4367==    by 0x11B9CD: fprintf (stdio2.h:100)
==4367==    by 0x11B9CD: sidekiq_class::receive_data(unsigned char, sidekiq_class::radio_config*) (testing_cpp.cpp:662)
==4367==    by 0x11BCF7: sidekiq_class::receive_run(void*) (testing_cpp.cpp:705)
==4367==    by 0x4DDB608: start_thread (pthread_create.c:477)
==4367==    by 0x526F102: clone (clone.S:95)
==4367==  Uninitialised value was created by a stack allocation
==4367==    at 0x11B860: sidekiq_class::receive_data(unsigned char, sidekiq_class::radio_config*) (testing_cpp.cpp:596)
==4367==
==4367== Conditional jump or move depends on uninitialised value(s)
==4367==    at 0x51AB7CC: _itoa_word (_itoa.c:180)
==4367==    by 0x51C76F4: __vfprintf_internal (vfprintf-internal.c:1687)
==4367==    by 0x51CA021: buffered_vfprintf (vfprintf-internal.c:2377)
==4367==    by 0x51C6EA3: __vfprintf_internal (vfprintf-internal.c:1346)
==4367==    by 0x527E022: __fprintf_chk (fprintf_chk.c:33)
==4367==    by 0x11B9CD: fprintf (stdio2.h:100)
==4367==    by 0x11B9CD: sidekiq_class::receive_data(unsigned char, sidekiq_class::radio_config*) (testing_cpp.cpp:662)
==4367==    by 0x11BCF7: sidekiq_class::receive_run(void*) (testing_cpp.cpp:705)
==4367==    by 0x4DDB608: start_thread (pthread_create.c:477)
==4367==    by 0x526F102: clone (clone.S:95)
==4367==  Uninitialised value was created by a stack allocation
==4367==    at 0x11B860: sidekiq_class::receive_data(unsigned char, sidekiq_class::radio_config*) (testing_cpp.cpp:596)
==4367==
==4367== Conditional jump or move depends on uninitialised value(s)
==4367==    at 0x51C83A8: __vfprintf_internal (vfprintf-internal.c:1687)
==4367==    by 0x51CA021: buffered_vfprintf (vfprintf-internal.c:2377)
==4367==    by 0x51C6EA3: __vfprintf_internal (vfprintf-internal.c:1346)
==4367==    by 0x527E022: __fprintf_chk (fprintf_chk.c:33)
==4367==    by 0x11B9CD: fprintf (stdio2.h:100)
==4367==    by 0x11B9CD: sidekiq_class::receive_data(unsigned char, sidekiq_class::radio_config*) (testing_cpp.cpp:662)
==4367==    by 0x11BCF7: sidekiq_class::receive_run(void*) (testing_cpp.cpp:705)
==4367==    by 0x4DDB608: start_thread (pthread_create.c:477)
==4367==    by 0x526F102: clone (clone.S:95)
==4367==  Uninitialised value was created by a stack allocation
==4367==    at 0x11B860: sidekiq_class::receive_data(unsigned char, sidekiq_class::radio_config*) (testing_cpp.cpp:596)
==4367==
==4367== Conditional jump or move depends on uninitialised value(s)
==4367==    at 0x51C786E: __vfprintf_internal (vfprintf-internal.c:1687)
==4367==    by 0x51CA021: buffered_vfprintf (vfprintf-internal.c:2377)
==4367==    by 0x51C6EA3: __vfprintf_internal (vfprintf-internal.c:1346)
==4367==    by 0x527E022: __fprintf_chk (fprintf_chk.c:33)
==4367==    by 0x11B9CD: fprintf (stdio2.h:100)
==4367==    by 0x11B9CD: sidekiq_class::receive_data(unsigned char, sidekiq_class::radio_config*) (testing_cpp.cpp:662)
==4367==    by 0x11BCF7: sidekiq_class::receive_run(void*) (testing_cpp.cpp:705)
==4367==    by 0x4DDB608: start_thread (pthread_create.c:477)
==4367==    by 0x526F102: clone (clone.S:95)
==4367==  Uninitialised value was created by a stack allocation
==4367==    at 0x11B860: sidekiq_class::receive_data(unsigned char, sidekiq_class::radio_config*) (testing_cpp.cpp:596)
==4367==
==4367== Conditional jump or move depends on uninitialised value(s)
==4367==    at 0x51C7AD8: __vfprintf_internal (vfprintf-internal.c:1687)
==4367==    by 0x51CA021: buffered_vfprintf (vfprintf-internal.c:2377)
==4367==    by 0x51C6EA3: __vfprintf_internal (vfprintf-internal.c:1346)
==4367==    by 0x527E022: __fprintf_chk (fprintf_chk.c:33)
==4367==    by 0x11B9CD: fprintf (stdio2.h:100)
==4367==    by 0x11B9CD: sidekiq_class::receive_data(unsigned char, sidekiq_class::radio_config*) (testing_cpp.cpp:662)
==4367==    by 0x11BCF7: sidekiq_class::receive_run(void*) (testing_cpp.cpp:705)
==4367==    by 0x4DDB608: start_thread (pthread_create.c:477)
==4367==    by 0x526F102: clone (clone.S:95)
==4367==  Uninitialised value was created by a stack allocation
==4367==    at 0x11B860: sidekiq_class::receive_data(unsigned char, sidekiq_class::radio_config*) (testing_cpp.cpp:596)
==4367==
==4367== Use of uninitialised value of size 8
==4367==    at 0x51AB81B: _itoa_word (_itoa.c:179)
==4367==    by 0x51C76F4: __vfprintf_internal (vfprintf-internal.c:1687)
==4367==    by 0x51CA021: buffered_vfprintf (vfprintf-internal.c:2377)
==4367==    by 0x51C6EA3: __vfprintf_internal (vfprintf-internal.c:1346)
==4367==    by 0x527E022: __fprintf_chk (fprintf_chk.c:33)
==4367==    by 0x11B9CD: fprintf (stdio2.h:100)
==4367==    by 0x11B9CD: sidekiq_class::receive_data(unsigned char, sidekiq_class::radio_config*) (testing_cpp.cpp:662)
==4367==    by 0x11BCF7: sidekiq_class::receive_run(void*) (testing_cpp.cpp:705)
==4367==    by 0x4DDB608: start_thread (pthread_create.c:477)
==4367==    by 0x526F102: clone (clone.S:95)
==4367==  Uninitialised value was created by a stack allocation
==4367==    at 0x11B860: sidekiq_class::receive_data(unsigned char, sidekiq_class::radio_config*) (testing_cpp.cpp:596)
==4367==
==4367== Conditional jump or move depends on uninitialised value(s)
==4367==    at 0x51AB82D: _itoa_word (_itoa.c:179)
==4367==    by 0x51C76F4: __vfprintf_internal (vfprintf-internal.c:1687)
==4367==    by 0x51CA021: buffered_vfprintf (vfprintf-internal.c:2377)
==4367==    by 0x51C6EA3: __vfprintf_internal (vfprintf-internal.c:1346)
==4367==    by 0x527E022: __fprintf_chk (fprintf_chk.c:33)
==4367==    by 0x11B9CD: fprintf (stdio2.h:100)
==4367==    by 0x11B9CD: sidekiq_class::receive_data(unsigned char, sidekiq_class::radio_config*) (testing_cpp.cpp:662)
==4367==    by 0x11BCF7: sidekiq_class::receive_run(void*) (testing_cpp.cpp:705)
==4367==    by 0x4DDB608: start_thread (pthread_create.c:477)
==4367==    by 0x526F102: clone (clone.S:95)
==4367==  Uninitialised value was created by a stack allocation
==4367==    at 0x11B860: sidekiq_class::receive_data(unsigned char, sidekiq_class::radio_config*) (testing_cpp.cpp:596)
==4367==
Error: timestamp error for card 0 handle A1 (blk 0) ... expected 0x0000000000000000 but got 0x0000000003938724 (delta 60000036)
Info: all 0 card(s) started streaming
Info: sleeping for 3 seconds
Info: signaling to all 1 card(s) to stop streaming
Info: waiting for control thread for card 0
Info: stopping 0 Rx handle(s) on card 0 on next 1PPS pulse
==4367== Invalid read of size 2
==4367==    at 0x11BB31: __copy_m<short int volatile*, int*> (stl_algobase.h:340)
==4367==    by 0x11BB31: __copy_move_a<false, short int volatile*, int*> (stl_algobase.h:404)
==4367==    by 0x11BB31: __copy_move_a2<false, short int volatile*, int*> (stl_algobase.h:440)
==4367==    by 0x11BB31: copy<short int volatile*, int*> (stl_algobase.h:474)
==4367==    by 0x11BB31: _M_assign_aux<short int volatile*> (vector.tcc:321)
==4367==    by 0x11BB31: _M_assign_dispatch<short int volatile*> (stl_vector.h:1625)
==4367==    by 0x11BB31: assign<short int volatile*> (stl_vector.h:766)
==4367==    by 0x11BB31: sidekiq_class::receive_data(unsigned char, sidekiq_class::radio_config*) (testing_cpp.cpp:674)
==4367==    by 0x11BCF7: sidekiq_class::receive_run(void*) (testing_cpp.cpp:705)
==4367==    by 0x4DDB608: start_thread (pthread_create.c:477)
==4367==    by 0x526F102: clone (clone.S:95)
==4367==  Address 0x7745000 is 0 bytes after a block of size 33,554,432 alloc'd
==4367==    at 0x483E0F0: memalign (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==4367==    by 0x483E212: posix_memalign (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==4367==    by 0x17A722: DmaInterfaceInit (in /home/red/recorder/sidekiq/a.out)
==4367==    by 0x3: ???
==4367==    by 0x1FFF00002F: ???
==4367==    by 0x3FFF: ???
==4367==    by 0x53560BF: ???
==4367==    by 0x7FF: ???
==4367==
==4367== Invalid read of size 2
==4367==    at 0x11BB20: __copy_m<short int volatile*, int*> (stl_algobase.h:342)
==4367==    by 0x11BB20: __copy_move_a<false, short int volatile*, int*> (stl_algobase.h:404)
==4367==    by 0x11BB20: __copy_move_a2<false, short int volatile*, int*> (stl_algobase.h:440)
==4367==    by 0x11BB20: copy<short int volatile*, int*> (stl_algobase.h:474)
==4367==    by 0x11BB20: _M_assign_aux<short int volatile*> (vector.tcc:321)
==4367==    by 0x11BB20: _M_assign_dispatch<short int volatile*> (stl_vector.h:1625)
==4367==    by 0x11BB20: assign<short int volatile*> (stl_vector.h:766)
==4367==    by 0x11BB20: sidekiq_class::receive_data(unsigned char, sidekiq_class::radio_config*) (testing_cpp.cpp:674)
==4367==    by 0x11BCF7: sidekiq_class::receive_run(void*) (testing_cpp.cpp:705)
==4367==    by 0x4DDB608: start_thread (pthread_create.c:477)
==4367==    by 0x526F102: clone (clone.S:95)
==4367==  Address 0x7745002 is 2 bytes after a block of size 33,554,432 alloc'd
==4367==    at 0x483E0F0: memalign (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==4367==    by 0x483E212: posix_memalign (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==4367==    by 0x17A722: DmaInterfaceInit (in /home/red/recorder/sidekiq/a.out)
==4367==    by 0x3: ???
==4367==    by 0x1FFF00002F: ???
==4367==    by 0x3FFF: ???
==4367==    by 0x53560BF: ???
==4367==    by 0x7FF: ???
==4367==
==4367==
==4367== Process terminating with default action of signal 11 (SIGSEGV)
==4367==  Bad permissions for mapped region at address 0x7746000
==4367==    at 0x11BB31: __copy_m<short int volatile*, int*> (stl_algobase.h:340)
==4367==    by 0x11BB31: __copy_move_a<false, short int volatile*, int*> (stl_algobase.h:404)
==4367==    by 0x11BB31: __copy_move_a2<false, short int volatile*, int*> (stl_algobase.h:440)
==4367==    by 0x11BB31: copy<short int volatile*, int*> (stl_algobase.h:474)
==4367==    by 0x11BB31: _M_assign_aux<short int volatile*> (vector.tcc:321)
==4367==    by 0x11BB31: _M_assign_dispatch<short int volatile*> (stl_vector.h:1625)
==4367==    by 0x11BB31: assign<short int volatile*> (stl_vector.h:766)
==4367==    by 0x11BB31: sidekiq_class::receive_data(unsigned char, sidekiq_class::radio_config*) (testing_cpp.cpp:674)
==4367==    by 0x11BCF7: sidekiq_class::receive_run(void*) (testing_cpp.cpp:705)
==4367==    by 0x4DDB608: start_thread (pthread_create.c:477)
==4367==    by 0x526F102: clone (clone.S:95)
==4367==
==4367== HEAP SUMMARY:
==4367==     in use at exit: 33,633,366 bytes in 188 blocks
==4367==   total heap usage: 197 allocs, 9 frees, 33,839,742 bytes allocated
==4367==
==4367== Thread 1:
==4367== 288 bytes in 1 blocks are possibly lost in loss record 8 of 16
==4367==    at 0x483DD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==4367==    by 0x40149CA: allocate_dtv (dl-tls.c:286)
==4367==    by 0x40149CA: _dl_allocate_tls (dl-tls.c:532)
==4367==    by 0x4DDC322: allocate_stack (allocatestack.c:622)
==4367==    by 0x4DDC322: pthread_create@@GLIBC_2.2.5 (pthread_create.c:660)
==4367==    by 0x11B58C: sidekiq_class::read_iq_on_pps(unsigned long, unsigned long) (testing_cpp.cpp:314)
==4367==    by 0x11AC73: main (testing_cpp.cpp:845)
==4367==
==4367== 288 bytes in 1 blocks are possibly lost in loss record 9 of 16
==4367==    at 0x483DD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==4367==    by 0x40149CA: allocate_dtv (dl-tls.c:286)
==4367==    by 0x40149CA: _dl_allocate_tls (dl-tls.c:532)
==4367==    by 0x4DDC322: allocate_stack (allocatestack.c:622)
==4367==    by 0x4DDC322: pthread_create@@GLIBC_2.2.5 (pthread_create.c:660)
==4367==    by 0x11B5A2: sidekiq_class::read_iq_on_pps(unsigned long, unsigned long) (testing_cpp.cpp:316)
==4367==    by 0x11AC73: main (testing_cpp.cpp:845)
==4367==
==4367== LEAK SUMMARY:
==4367==    definitely lost: 0 bytes in 0 blocks
==4367==    indirectly lost: 0 bytes in 0 blocks
==4367==      possibly lost: 576 bytes in 2 blocks
==4367==    still reachable: 33,632,790 bytes in 186 blocks
==4367==         suppressed: 0 bytes in 0 blocks
==4367== Reachable blocks (those to which a pointer was found) are not shown.
==4367== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==4367==
==4367== For lists of detected and suppressed errors, rerun with: -s
==4367== ERROR SUMMARY: 17000 errors from 14 contexts (suppressed: 0 from 0)
Segmentation fault (core dumped)

kyrlon • Sep 1 '20

The most puzzling part is that I get this error on one system but not the other:

No Problems:
cat /proc/version
Linux version 4.15.0-29-generic (buildd@lgw01-amd64-057) (gcc version 7.3.0 (Ubuntu 7.3.0-16ubuntu3)) #31-Ubuntu SMP Tue Jul 17 15:39:52 UTC 2018

Problems(SegFault):
cat /proc/version
Linux version 5.4.0-42-generic (buildd@lgw01-amd64-038) (gcc version 9.3.0 (Ubuntu 9.3.0-10ubuntu2)) #46-Ubuntu SMP Fri Jul 10 00:24:02 UTC 2020

Jason C. McDonald • Sep 1 '20

You must remember that undefined behavior is exactly that. It may appear to work, and then not work elsewhere. When it doesn't work, anything could happen, including "making demons fly out your nose". So, I'd start by going through that Valgrind output, bit by bit, and fixing each problem in your code it highlights. (The error and location of that error in your code is on the last line of each traceback block in Valgrind.)

Once your //own// code runs Valgrind pure, we can tackle any remaining weirdness.

kyrlon • Sep 1 '20

After furious placements of std::cout everywhere, I discovered that my function that my thread was calling was missing its return statement. Not sure how my first system was able to bypass that issue but it arised on my second system.

Jason C. McDonald • Sep 1 '20

So, it's resolved, then?

kyrlon • Sep 2 '20

The issue yes. My curiosity, not so much lol. It shouldn't have compiled however that behavior I believe isactually allowed, it's grandfathered in from C. Since the function returns a variable and a return is not explicitly called then the first value in the stack frame is reinterpreted as the return type and returned instead

int foo(int bar) {
int blah= 8;
blah += 4;
int zayxxy = 0;
} //returns 12

So I believe that is the reason how my first system was able to execute without any problems. However, I do not fully understand how on the second system the function was never able to terminate the function but stay stuck in a while loop and result in a segfault.

Jason C. McDonald • Sep 2 '20 • Edited

What you described, not returning a value from a non-void function, is actually undefined behavior in C. Therefore, once again, it is legal for the compiler to make demons fly out your nose. Anything can happen. There is no rhyme or reason.

Here's C99 on it — ISO/IEC 9899:1999, section 6.9.1 paragraph 12:

If the } that terminates a function is reached, and the value of the function call is used by the caller, the behavior is undefined.

One system's compiler was able to figure it out anyway, and it worked, which is legal (because anything is). The other system's compiler was not, and it had a snit.

P.S. Thanks for asking! I learned something new today, namely that the above is undefined behavior.

kyrlon • Sep 9 '20

Glad I could help! I have definitely learned alot from this experience as well!

Andrew Clayton • Sep 29 '20

OK, firstly, a small nitpick, you weren't getting a segfault (SIGSEGV) but a SIGABRT

As for failing to return a value from a non-void function, you should at the very least compile with -Wall, which would have caught that. e.g

/* n.c - no return from non-void function */

static int test(void)
{
}

$ gcc -c n.c
$

$ gcc -Wall -c n.c
n.c: In function ‘test’:
n.c:5:1: warning: no return statement in function returning non-void [-Wreturn-type]
    5 | }
      | ^
At top level:
n.c:3:12: warning: ‘test’ defined but not used [-Wunused-function]
    3 | static int test(void)
      |            ^~~~
$

And of course we also get the second warning...

I always compile with at least '-Wall -Wextra'

all_yours • Sep 9 '19

Hello, I am learning robotics and using ros-kinetic with gazebo7. I am trying to launch my model in gazebo but got stuck on a "segmentation fault(core dump)" error at

0x00007fffc96ac0ed in ros::NodeHandle::destruct() ()
from /opt/ros/kinetic/lib/libroscpp.so

Kindly advice

Jason C. McDonald • Sep 9 '19

There are only two ways to debug a segmentation fault, ordinarily:

1) If you have access to the source code for ros-kinetic, you would need to compile it yourself with the -g flag (debug flag), and then try to use it the same way as before. Then, when the segfault occurs, you'll get a file and line number instead of the raw memory address (0x00007fffc96ac0ed), and that will tell you where in the code the segfault is (probably) happening from.

2) To get more information, you can run the code (again, compiled with -g) through a dynamic memory analyser like Valgrind. That will not only give you the file and line number where the segfault is probably occurring, but also a hint about what's going on, and possibly a longer stack track.

Given the information from (1) or (2) (and a snippet of the offending source code), I could probably help you from there.

However, if ros-kinetic is not your project, you'll be best off filing a bug report on their issue tracker.

all_yours • Sep 26 '19 • Edited

Thanks for the advice. I did compile ros-kinetic from source but now gdb wasn't launching, I don't know why. So i reinstalled ros-kinetic from apt and ran it, gdb was working. Well I did find the source file for the function pointing to segmentation fault :-

void NodeHandle::destruct()
{
  delete collection_;

  boost::mutex::scoped_lock lock(g_nh_refcount_mutex);

  --g_nh_refcount;

  if (g_nh_refcount == 0 && g_node_started_by_nh)
  {
    ros::shutdown();
  }
}

The backtrace went till 24 frames I could provide them too if the fault is not in this part of code.
If you could help me find the error it would really boost my learning.

Jason C. McDonald • Sep 26 '19 • Edited

The stack trace would be really helpful. Also, please be sure to precede your code example with three backticks (`) on the line above the example, and three on the line below.

all_yours • Sep 26 '19

The stack trace is here :-

#0  0x00007fffc96ac0ed in ros::NodeHandle::destruct() ()
   from /opt/ros/kinetic/lib/libroscpp.so

#1  0x00007fffc96ac269 in ros::NodeHandle::~NodeHandle() ()
   from /opt/ros/kinetic/lib/libroscpp.so

#2  0x00007fff101bf4b4 in realtime_tools::RealtimePublisher<pr2_mechanism_msgs::MechanismStatistics_<std::allocator<void> > >::~RealtimePublisher (
    this=0x7fff0d7a1508, __in_chrg=<optimized out>)
    at /opt/ros/kinetic/include/realtime_tools/realtime_publisher.h:84

#3  pr2_controller_manager::ControllerManager::~ControllerManager (
    this=0x7fff0d7a0e80, __in_chrg=<optimized out>)
    at /home/deadmanlogan/i_am_from_source/ros_catkin_ws/src/pr2_mechanism/pr2_controller_manager/src/controller_manager.cpp:63

#4  0x00007fff101bfed9 in pr2_controller_manager::ControllerManager::~ControllerManager (this=0x7fff0d7a0e80, __in_chrg=<optimized out>)
    at /home/deadmanlogan/i_am_from_source/ros_catkin_ws/src/pr2_mechanism/pr2_controller_manager/src/controller_manager.cpp:67

#5  0x00007fff104ab96a in gazebo::GazeboRosControllerManager::~GazeboRosControllerManager (this=0x7fff0d7a0960, __in_chrg=<optimized out>)
    at /home/deadmanlogan/i_am_from_source/ros_catkin_ws/src/pr2_simulator/pr2_gazebo_plugins/src/gazebo_ros_controller_manager.cpp:85

#6  0x00007fff104abaf6 in gazebo::GazeboRosControllerManager::~GazeboRosControllerManager (this=0x7fff0d7a0960, __in_chrg=<optimized out>)
    at /home/deadmanlogan/i_am_from_source/ros_catkin_ws/src/pr2_simulator/pr2_g---Type <return> to continue, or q <return> to quit---
azebo_plugins/src/gazebo_ros_controller_manager.cpp:94

#7  0x00007ffff5ba80c9 in boost::checked_delete<gazebo::ModelPlugin> (
    x=0x7fff0d7a0960) at /usr/include/boost/core/checked_delete.hpp:34

#8  0x00007ffff5baadd6 in boost::detail::sp_counted_impl_p<gazebo::ModelPlugin>::dispose (this=0x7fff2e023430)
    at /usr/include/boost/smart_ptr/detail/sp_counted_impl.hpp:78

#9  0x00007ffff59b4efe in boost::detail::sp_counted_base::release (
    this=0x7fff2e023430)
    at /usr/include/boost/smart_ptr/detail/sp_counted_base_gcc_x86.hpp:146

#10 0x00007ffff59b4f91 in boost::detail::shared_count::~shared_count (
    this=0x7fff63ffb3d8, __in_chrg=<optimized out>)
    at /usr/include/boost/smart_ptr/detail/shared_count.hpp:443

#11 0x00007ffff5b9f086 in boost::shared_ptr<gazebo::ModelPlugin>::~shared_ptr (
    this=0x7fff63ffb3d0, __in_chrg=<optimized out>)
    at /usr/include/boost/smart_ptr/shared_ptr.hpp:323

#12 0x00007ffff5b9a392 in gazebo::physics::Model::LoadPlugin (
    this=0x7fff3c24d3b0, _sdf=std::shared_ptr (count 3, weak 5) 0x7fff368df840)
    at /home/deadmanlogan/i_am_from_source/Gazebo-7/gazebo/physics/Model.cc:1002

#13 0x00007ffff5b999e2 in gazebo::physics::Model::LoadPlugins (
    this=0x7fff3c24d3b0)
    at /home/deadmanlogan/i_am_from_source/Gazebo-7/gazebo/physics/Model.cc:915

#14 0x00007ffff5c08bd9 in gazebo::physics::World::ProcessFactoryMsgs (this=0x14f2480)
    at /home/deadmanlogan/i_am_from_source/Gazebo-7/gazebo/physics/World.cc:1958

#15 0x00007ffff5c0b9de in gazebo::physics::World::ProcessMessages (
    this=0x14f2480)
    at /home/deadmanlogan/i_am_from_source/Gazebo-7/gazebo/physics/World.cc:2282

#16 0x00007ffff5c0069f in gazebo::physics::World::Step (this=0x14f2480)
    at /home/deadmanlogan/i_am_from_source/Gazebo-7/gazebo/physics/World.cc:688

#17 0x00007ffff5bff06c in gazebo::physics::World::RunLoop (this=0x14f2480)
    at /home/deadmanlogan/i_am_from_source/Gazebo-7/gazebo/physics/World.cc:481

#18 0x00007ffff5c2e413 in boost::_mfi::mf0<void, gazebo::physics::World>::operator() (this=0x128c3b8, p=0x14f2480)
    at /usr/include/boost/bind/mem_fn_template.hpp:49

#19 0x00007ffff5c2d4be in boost::_bi::list1<boost::_bi::value<gazebo::physics::World*> >::operator()<boost::_mfi::mf0<void, gazebo::physics::World>, boost::_bi::list0> (this=0x128c3c8, f=..., a=...) at /usr/include/boost/bind/bind.hpp:253

#20 0x00007ffff5c2b74a in boost::_bi::bind_t<void, boost::_mfi::mf0<void, gazebo::physics::World>, boost::_bi::list1<boost::_bi::value<gazebo::physics::World*> > >::operator() (this=0x128c3b8) at /usr/include/boost/bind/bind.hpp:893

#21 0x00007ffff5c3070a in boost::detail::thread_data<boost::_bi::bind_t<void, boost::_mfi::mf0<void, gazebo::physics::World>, boost::_bi::list1<boost::_bi::value<gazebo::physics::World*> > > >::run (this=0x128c200)at /usr/include/boost/thread/detail/thread.hpp:116

#22 0x00007ffff35c65d5 in boost::(anonymous namespace)::thread_proxy (
    param=<optimized out>) at libs/thread/src/pthread/thread.cpp:168

#23 0x00007ffff79086ba in start_thread (arg=0x7fff63ffd700)
    at pthread_create.c:333

#24 0x00007ffff64c741d in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Jason C. McDonald • Sep 26 '19 • Edited

Awesome, and the code you posted earlier, is that the context for /opt/ros/kinetic/include/realtime_tools/realtime_publisher.h:84?

Also, what is the rest of the Valgrind output? Any more details? Segfaults have many causes, so knowing which one was detected helps narrow down the problem.

all_yours • Sep 26 '19

The code I posted earlier is the context for frame 1&2. Actually this is the gdb output which I posted.

Well I just ran the same in valgrind and it gave:-

deadmanlogan@war:~$ roslaunch pr2_description pr2.launch
... logging to /home/deadmanlogan/.ros/log/12f716ec-e077-11e9-af75-68071520849c/roslaunch-war-13334.log
Checking log directory for disk usage. This may take awhile.
Press Ctrl-C to interrupt
Done checking log file disk usage. Usage is <1GB.

xacro: Traditional processing is deprecated. Switch to --inorder processing!
To check for compatibility of your document, use option --check-order.
For more infos, see http://wiki.ros.org/xacro#Processing_Order
xacro.py is deprecated; please use xacro instead
started roslaunch server http://war:36573/

SUMMARY
========

PARAMETERS
 * /robo_state_publisher/publish_frequency: 30.0
 * /robot_description: <?xml version="1....
 * /rosdistro: kinetic
 * /rosversion: 1.12.14
 * /use_sim_time: True

NODES
  /
    gazebo (gazebo_ros/gzserver)
    robo_state_publisher (robot_state_publisher/robot_state_publisher)
    urdf_spawner (gazebo_ros/spawn_model)

auto-starting new master
process[master]: started with pid [13348]
ROS_MASTER_URI=http://localhost:11311

setting /run_id to 12f716ec-e077-11e9-af75-68071520849c
process[rosout-1]: started with pid [13361]
started core service [/rosout]
process[gazebo-2]: started with pid [13384]
process[urdf_spawner-3]: started with pid [13386]
process[robo_state_publisher-4]: started with pid [13387]
==13384== Memcheck, a memory error detector
==13384== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==13384== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==13384== Command: /home/deadmanlogan/i_am_from_source/ros_catkin_ws/src/gazebo_ros_pkgs/gazebo_ros/scripts/gzserver -e ode worlds/empty.world __name:=gazebo __log:=/home/deadmanlogan/.ros/log/12f716ec-e077-11e9-af75-68071520849c/gazebo-2.log
==13384== 
SpawnModel script started
[INFO] [1569513769.345868, 0.000000]: Loading model XML from ros parameter
[INFO] [1569513769.363155, 0.000000]: Waiting for service /gazebo/spawn_urdf_model
==13447== Warning: invalid file descriptor -1 in syscall close()
==13448== 
==13448== HEAP SUMMARY:
==13448==     in use at exit: 23,063 bytes in 127 blocks
==13448==   total heap usage: 220 allocs, 93 frees, 96,415 bytes allocated
==13448== 
==13448== LEAK SUMMARY:
==13448==    definitely lost: 0 bytes in 0 blocks
==13448==    indirectly lost: 0 bytes in 0 blocks
==13448==      possibly lost: 0 bytes in 0 blocks
==13448==    still reachable: 23,063 bytes in 127 blocks
==13448==         suppressed: 0 bytes in 0 blocks
==13448== Rerun with --leak-check=full to see details of leaked memory
==13448== 
==13448== For counts of detected and suppressed errors, rerun with: -v
==13448== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
==13447== 
==13447== HEAP SUMMARY:
==13447==     in use at exit: 14,899 bytes in 127 blocks
==13447==   total heap usage: 219 allocs, 92 frees, 88,235 bytes allocated
==13447== 
==13447== LEAK SUMMARY:
==13447==    definitely lost: 0 bytes in 0 blocks
==13447==    indirectly lost: 0 bytes in 0 blocks
==13447==      possibly lost: 0 bytes in 0 blocks
==13447==    still reachable: 14,899 bytes in 127 blocks
==13447==         suppressed: 0 bytes in 0 blocks
==13447== Rerun with --leak-check=full to see details of leaked memory
==13447== 
==13447== For counts of detected and suppressed errors, rerun with: -v
==13447== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
==13450== Warning: invalid file descriptor -1 in syscall close()
==13451== 
==13451== HEAP SUMMARY:
==13451==     in use at exit: 23,112 bytes in 129 blocks
==13451==   total heap usage: 226 allocs, 97 frees, 96,533 bytes allocated
==13451== 
==13451== LEAK SUMMARY:
==13451==    definitely lost: 0 bytes in 0 blocks
==13451==    indirectly lost: 0 bytes in 0 blocks
==13451==      possibly lost: 0 bytes in 0 blocks
==13451==    still reachable: 23,112 bytes in 129 blocks
==13451==         suppressed: 0 bytes in 0 blocks
==13451== Rerun with --leak-check=full to see details of leaked memory
==13451== 
==13451== For counts of detected and suppressed errors, rerun with: -v
==13451== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
==13450== 
==13450== HEAP SUMMARY:
==13450==     in use at exit: 14,948 bytes in 129 blocks
==13450==   total heap usage: 225 allocs, 96 frees, 88,353 bytes allocated
==13450== 
==13450== LEAK SUMMARY:
==13450==    definitely lost: 0 bytes in 0 blocks
==13450==    indirectly lost: 0 bytes in 0 blocks
==13450==      possibly lost: 0 bytes in 0 blocks
==13450==    still reachable: 14,948 bytes in 129 blocks
==13450==         suppressed: 0 bytes in 0 blocks
==13450== Rerun with --leak-check=full to see details of leaked memory
==13450== 
==13450== For counts of detected and suppressed errors, rerun with: -v
==13450== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
==13453== Warning: invalid file descriptor -1 in syscall close()
==13454== 
==13454== HEAP SUMMARY:
==13454==     in use at exit: 23,131 bytes in 129 blocks
==13454==   total heap usage: 231 allocs, 102 frees, 96,638 bytes allocated
==13454== 
==13454== LEAK SUMMARY:
==13454==    definitely lost: 0 bytes in 0 blocks
==13454==    indirectly lost: 0 bytes in 0 blocks
==13454==      possibly lost: 0 bytes in 0 blocks
==13454==    still reachable: 23,131 bytes in 129 blocks
==13454==         suppressed: 0 bytes in 0 blocks
==13454== Rerun with --leak-check=full to see details of leaked memory
==13454== 
==13454== For counts of detected and suppressed errors, rerun with: -v
==13454== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
==13453== 
==13453== HEAP SUMMARY:
==13453==     in use at exit: 14,967 bytes in 129 blocks
==13453==   total heap usage: 230 allocs, 101 frees, 88,458 bytes allocated
==13453== 
==13453== LEAK SUMMARY:
==13453==    definitely lost: 0 bytes in 0 blocks
==13453==    indirectly lost: 0 bytes in 0 blocks
==13453==      possibly lost: 0 bytes in 0 blocks
==13453==    still reachable: 14,967 bytes in 129 blocks
==13453==         suppressed: 0 bytes in 0 blocks
==13453== Rerun with --leak-check=full to see details of leaked memory
==13453== 
==13453== For counts of detected and suppressed errors, rerun with: -v
==13453== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
==13456== Warning: invalid file descriptor -1 in syscall close()
==13457== 
==13457== HEAP SUMMARY:
==13457==     in use at exit: 23,146 bytes in 129 blocks
==13457==   total heap usage: 236 allocs, 107 frees, 96,758 bytes allocated
==13457== 
==13457== LEAK SUMMARY:
==13457==    definitely lost: 0 bytes in 0 blocks
==13457==    indirectly lost: 0 bytes in 0 blocks
==13457==      possibly lost: 0 bytes in 0 blocks
==13457==    still reachable: 23,146 bytes in 129 blocks
==13457==         suppressed: 0 bytes in 0 blocks
==13457== Rerun with --leak-check=full to see details of leaked memory
==13457== 
==13457== For counts of detected and suppressed errors, rerun with: -v
==13457== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
==13456== 
==13456== HEAP SUMMARY:
==13456==     in use at exit: 14,982 bytes in 129 blocks
==13456==   total heap usage: 235 allocs, 106 frees, 88,578 bytes allocated
==13456== 
==13456== LEAK SUMMARY:
==13456==    definitely lost: 0 bytes in 0 blocks
==13456==    indirectly lost: 0 bytes in 0 blocks
==13456==      possibly lost: 0 bytes in 0 blocks
==13456==    still reachable: 14,982 bytes in 129 blocks
==13456==         suppressed: 0 bytes in 0 blocks
==13456== Rerun with --leak-check=full to see details of leaked memory
==13456== 
==13456== For counts of detected and suppressed errors, rerun with: -v
==13456== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
==13460== Warning: invalid file descriptor -1 in syscall close()
==13461== 
==13461== HEAP SUMMARY:
==13461==     in use at exit: 23,275 bytes in 131 blocks
==13461==   total heap usage: 242 allocs, 111 frees, 96,968 bytes allocated
==13461== 
==13461== LEAK SUMMARY:
==13461==    definitely lost: 0 bytes in 0 blocks
==13461==    indirectly lost: 0 bytes in 0 blocks
==13461==      possibly lost: 0 bytes in 0 blocks
==13461==    still reachable: 23,275 bytes in 131 blocks
==13461==         suppressed: 0 bytes in 0 blocks
==13461== Rerun with --leak-check=full to see details of leaked memory
==13461== 
==13461== For counts of detected and suppressed errors, rerun with: -v
==13461== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
==13460== 
==13460== HEAP SUMMARY:
==13460==     in use at exit: 15,111 bytes in 131 blocks
==13460==   total heap usage: 241 allocs, 110 frees, 88,788 bytes allocated
==13460== 
==13460== LEAK SUMMARY:
==13460==    definitely lost: 0 bytes in 0 blocks
==13460==    indirectly lost: 0 bytes in 0 blocks
==13460==      possibly lost: 0 bytes in 0 blocks
==13460==    still reachable: 15,111 bytes in 131 blocks
==13460==         suppressed: 0 bytes in 0 blocks
==13460== Rerun with --leak-check=full to see details of leaked memory
==13460== 
==13460== For counts of detected and suppressed errors, rerun with: -v
==13460== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
==13463== Warning: invalid file descriptor -1 in syscall close()
==13464== 
==13464== HEAP SUMMARY:
==13464==     in use at exit: 23,278 bytes in 131 blocks
==13464==   total heap usage: 247 allocs, 116 frees, 97,149 bytes allocated
==13464== 
==13464== LEAK SUMMARY:
==13464==    definitely lost: 0 bytes in 0 blocks
==13464==    indirectly lost: 0 bytes in 0 blocks
==13464==      possibly lost: 0 bytes in 0 blocks
==13464==    still reachable: 23,278 bytes in 131 blocks
==13464==         suppressed: 0 bytes in 0 blocks
==13464== Rerun with --leak-check=full to see details of leaked memory
==13464== 
==13464== For counts of detected and suppressed errors, rerun with: -v
==13464== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
==13463== 
==13463== HEAP SUMMARY:
==13463==     in use at exit: 15,114 bytes in 131 blocks
==13463==   total heap usage: 246 allocs, 115 frees, 88,969 bytes allocated
==13463== 
==13463== LEAK SUMMARY:
==13463==    definitely lost: 0 bytes in 0 blocks
==13463==    indirectly lost: 0 bytes in 0 blocks
==13463==      possibly lost: 0 bytes in 0 blocks
==13463==    still reachable: 15,114 bytes in 131 blocks
==13463==         suppressed: 0 bytes in 0 blocks
==13463== Rerun with --leak-check=full to see details of leaked memory
==13463== 
==13463== For counts of detected and suppressed errors, rerun with: -v
==13463== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
==13466== Warning: invalid file descriptor -1 in syscall close()
==13467== 
==13467== HEAP SUMMARY:
==13467==     in use at exit: 23,388 bytes in 131 blocks
==13467==   total heap usage: 252 allocs, 121 frees, 97,367 bytes allocated
==13467== 
==13467== LEAK SUMMARY:
==13467==    definitely lost: 0 bytes in 0 blocks
==13467==    indirectly lost: 0 bytes in 0 blocks
==13467==      possibly lost: 0 bytes in 0 blocks
==13467==    still reachable: 23,388 bytes in 131 blocks
==13467==         suppressed: 0 bytes in 0 blocks
==13467== Rerun with --leak-check=full to see details of leaked memory
==13467== 
==13467== For counts of detected and suppressed errors, rerun with: -v
==13467== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
==13466== 
==13466== HEAP SUMMARY:
==13466==     in use at exit: 15,224 bytes in 131 blocks
==13466==   total heap usage: 251 allocs, 120 frees, 89,187 bytes allocated
==13466== 
==13466== LEAK SUMMARY:
==13466==    definitely lost: 0 bytes in 0 blocks
==13466==    indirectly lost: 0 bytes in 0 blocks
==13466==      possibly lost: 0 bytes in 0 blocks
==13466==    still reachable: 15,224 bytes in 131 blocks
==13466==         suppressed: 0 bytes in 0 blocks
==13466== Rerun with --leak-check=full to see details of leaked memory
==13466== 
==13466== For counts of detected and suppressed errors, rerun with: -v
==13466== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
==13469== Warning: invalid file descriptor -1 in syscall close()
==13470== 
==13470== HEAP SUMMARY:
==13470==     in use at exit: 23,391 bytes in 131 blocks
==13470==   total heap usage: 257 allocs, 126 frees, 97,588 bytes allocated
==13470== 
==13470== LEAK SUMMARY:
==13470==    definitely lost: 0 bytes in 0 blocks
==13470==    indirectly lost: 0 bytes in 0 blocks
==13470==      possibly lost: 0 bytes in 0 blocks
==13470==    still reachable: 23,391 bytes in 131 blocks
==13470==         suppressed: 0 bytes in 0 blocks
==13470== Rerun with --leak-check=full to see details of leaked memory
==13470== 
==13470== For counts of detected and suppressed errors, rerun with: -v
==13470== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
==13469== 
==13469== HEAP SUMMARY:
==13469==     in use at exit: 15,227 bytes in 131 blocks
==13469==   total heap usage: 256 allocs, 125 frees, 89,408 bytes allocated
==13469== 
==13469== LEAK SUMMARY:
==13469==    definitely lost: 0 bytes in 0 blocks
==13469==    indirectly lost: 0 bytes in 0 blocks
==13469==      possibly lost: 0 bytes in 0 blocks
==13469==    still reachable: 15,227 bytes in 131 blocks
==13469==         suppressed: 0 bytes in 0 blocks
==13469== Rerun with --leak-check=full to see details of leaked memory
==13469== 
==13469== For counts of detected and suppressed errors, rerun with: -v
==13469== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
==13472== Warning: invalid file descriptor -1 in syscall close()
==13473== 
==13473== HEAP SUMMARY:
==13473==     in use at exit: 23,499 bytes in 131 blocks
==13473==   total heap usage: 262 allocs, 131 frees, 97,917 bytes allocated
==13473== 
==13473== LEAK SUMMARY:
==13473==    definitely lost: 0 bytes in 0 blocks
==13473==    indirectly lost: 0 bytes in 0 blocks
==13473==      possibly lost: 0 bytes in 0 blocks
==13473==    still reachable: 23,499 bytes in 131 blocks
==13473==         suppressed: 0 bytes in 0 blocks
==13473== Rerun with --leak-check=full to see details of leaked memory
==13473== 
==13473== For counts of detected and suppressed errors, rerun with: -v
==13473== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
==13472== 
==13472== HEAP SUMMARY:
==13472==     in use at exit: 15,335 bytes in 131 blocks
==13472==   total heap usage: 261 allocs, 130 frees, 89,737 bytes allocated
==13472== 
==13472== LEAK SUMMARY:
==13472==    definitely lost: 0 bytes in 0 blocks
==13472==    indirectly lost: 0 bytes in 0 blocks
==13472==      possibly lost: 0 bytes in 0 blocks
==13472==    still reachable: 15,335 bytes in 131 blocks
==13472==         suppressed: 0 bytes in 0 blocks
==13472== Rerun with --leak-check=full to see details of leaked memory
==13472== 
==13472== For counts of detected and suppressed errors, rerun with: -v
==13472== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
==13446== Warning: invalid file descriptor -1 in syscall close()
==13475== 
==13475== HEAP SUMMARY:
==13475==     in use at exit: 22,463 bytes in 130 blocks
==13475==   total heap usage: 266 allocs, 136 frees, 98,740 bytes allocated
==13475== 
==13475== LEAK SUMMARY:
==13475==    definitely lost: 0 bytes in 0 blocks
==13475==    indirectly lost: 0 bytes in 0 blocks
==13475==      possibly lost: 0 bytes in 0 blocks
==13475==    still reachable: 22,463 bytes in 130 blocks
==13475==         suppressed: 0 bytes in 0 blocks
==13475== Rerun with --leak-check=full to see details of leaked memory
==13475== 
==13475== For counts of detected and suppressed errors, rerun with: -v
==13475== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
==13446== 
==13446== HEAP SUMMARY:
==13446==     in use at exit: 13,282 bytes in 129 blocks
==13446==   total heap usage: 264 allocs, 135 frees, 90,047 bytes allocated
==13446== 
==13446== LEAK SUMMARY:
==13446==    definitely lost: 0 bytes in 0 blocks
==13446==    indirectly lost: 0 bytes in 0 blocks
==13446==      possibly lost: 0 bytes in 0 blocks
==13446==    still reachable: 13,282 bytes in 129 blocks
==13446==         suppressed: 0 bytes in 0 blocks
==13446== Rerun with --leak-check=full to see details of leaked memory
==13446== 
==13446== For counts of detected and suppressed errors, rerun with: -v
==13446== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
[ INFO] [1569513770.755880492]: Finished loading Gazebo ROS API Plugin.
[ INFO] [1569513770.756283215]: waitForService: Service [/gazebo/set_physics_properties] has not been advertised, waiting...
[INFO] [1569513771.174828, 0.000000]: Calling service /gazebo/spawn_urdf_model
[ INFO] [1569513771.195916634, 0.023000000]: waitForService: Service [/gazebo/set_physics_properties] is now available.
Warning [parser_urdf.cc:1236] multiple inconsistent <gravity> exists due to fixed joint reduction overwriting previous value [true] with [false].
Warning [parser_urdf.cc:1236] multiple inconsistent <gravity> exists due to fixed joint reduction overwriting previous value [false] with [true].
[ INFO] [1569513772.678038791, 0.125000000]: Laser Plugin: Using the 'robotNamespace' param: '/'
[ INFO] [1569513772.678107422, 0.125000000]: Starting Laser Plugin (ns = /)
[ INFO] [1569513772.680463523, 0.125000000]: Laser Plugin (ns = /)  <tf_prefix_>, set to ""
[ INFO] [1569513772.984139421, 0.125000000]: Camera Plugin: Using the 'robotNamespace' param: '/'
[ INFO] [1569513772.988030718, 0.125000000]: Camera Plugin: Using the 'robotNamespace' param: '/'
[ INFO] [1569513772.989881251, 0.125000000]: Camera Plugin (ns = /)  <tf_prefix_>, set to ""
[ INFO] [1569513772.991364043, 0.125000000]: Camera Plugin (ns = /)  <tf_prefix_>, set to ""
[ INFO] [1569513772.993889515, 0.125000000]: Camera Plugin: Using the 'robotNamespace' param: '/'
[ INFO] [1569513772.999240022, 0.125000000]: Camera Plugin: Using the 'robotNamespace' param: '/'
[ INFO] [1569513773.006935967, 0.125000000]: Camera Plugin (ns = /)  <tf_prefix_>, set to ""
[ INFO] [1569513773.025098634, 0.125000000]: Camera Plugin (ns = /)  <tf_prefix_>, set to ""
[ INFO] [1569513773.042621440, 0.125000000]: bayer simulation maybe computationally expensive.
[ WARN] [1569513773.042710689, 0.125000000]: The <focal_length>[320.000105] you have provided for camera_ [wide_stereo_l_stereo_camera_sensor] is inconsistent with specified image_width [640] and HFOV [1.570800].   Please double check to see that focal_length = width_ / (2.0 * tan(HFOV/2.0)), the explected focal_lengtth value is [319.998825], please update your camera_ model description accordingly.
[ INFO] [1569513773.044398440, 0.125000000]: bayer simulation maybe computationally expensive.
[ WARN] [1569513773.044485198, 0.125000000]: The <focal_length>[320.000105] you have provided for camera_ [wide_stereo_r_stereo_camera_sensor] is inconsistent with specified image_width [640] and HFOV [1.570800].   Please double check to see that focal_length = width_ / (2.0 * tan(HFOV/2.0)), the explected focal_lengtth value is [319.998825], please update your camera_ model description accordingly.
[ INFO] [1569513773.074322637, 0.125000000]: Camera Plugin: Using the 'robotNamespace' param: '/'
[ INFO] [1569513773.078731037, 0.125000000]: Camera Plugin: Using the 'robotNamespace' param: '/'
[ INFO] [1569513773.081757643, 0.125000000]: Camera Plugin (ns = /)  <tf_prefix_>, set to ""
[ INFO] [1569513773.084314022, 0.125000000]: Camera Plugin (ns = /)  <tf_prefix_>, set to ""
[ INFO] [1569513773.098842781, 0.125000000]: trigger_mode trigger_mode streaming
[ WARN] [1569513773.099203446, 0.125000000]: The <focal_length>[320.000105] you have provided for camera_ [l_forearm_cam_sensor] is inconsistent with specified image_width [640] and HFOV [1.570800].   Please double check to see that focal_length = width_ / (2.0 * tan(HFOV/2.0)), the explected focal_lengtth value is [319.998825], please update your camera_ model description accordingly.
[ INFO] [1569513773.775768464, 0.125000000]: Laser Plugin: Using the 'robotNamespace' param: '/'
[ INFO] [1569513773.775826138, 0.125000000]: Starting Laser Plugin (ns = /)
[ INFO] [1569513773.777002086, 0.125000000]: Laser Plugin (ns = /)  <tf_prefix_>, set to ""
[ INFO] [1569513773.798381464, 0.125000000]: Camera Plugin: Using the 'robotNamespace' param: '/'
[ INFO] [1569513773.801075468, 0.125000000]: Camera Plugin (ns = /)  <tf_prefix_>, set to ""
[ WARN] [1569513773.812511053, 0.125000000]: The <focal_length>[320.000105] you have provided for camera_ [r_forearm_cam_sensor] is inconsistent with specified image_width [640] and HFOV [1.570800].   Please double check to see that focal_length = width_ / (2.0 * tan(HFOV/2.0)), the explected focal_lengtth value is [319.998825], please update your camera_ model description accordingly.
[INFO] [1569513773.845363, 0.125000]: Spawn status: SpawnModel: Successfully spawned entity
[ INFO] [1569513773.864237580, 0.125000000]: Physics dynamic reconfigure ready.
[ INFO] [1569513773.912607340, 0.125000000]: starting gazebo_ros_controller_manager plugin in ns: /
[ INFO] [1569513773.913094029, 0.125000000]: Callback thread id=7f4d2487f700
[ INFO] [1569513773.915389709, 0.125000000]: gazebo controller manager plugin is waiting for urdf: //robot_description on the param server.  (make sure there is a rosparam by that name in the ros parameter server, otherwise, this plugin blocks simulation forever).
[ INFO] [1569513774.019387318, 0.125000000]: gazebo controller manager got pr2.xml from param server, parsing it...
[urdf_spawner-3] process has finished cleanly
log file: /home/deadmanlogan/.ros/log/12f716ec-e077-11e9-af75-68071520849c/urdf_spawner-3*.log
Segmentation fault (core dumped)
==13384== 
==13384== HEAP SUMMARY:
==13384==     in use at exit: 12,284 bytes in 120 blocks
==13384==   total heap usage: 214 allocs, 94 frees, 86,495 bytes allocated
==13384== 
==13384== LEAK SUMMARY:
==13384==    definitely lost: 0 bytes in 0 blocks
==13384==    indirectly lost: 0 bytes in 0 blocks
==13384==      possibly lost: 0 bytes in 0 blocks
==13384==    still reachable: 12,284 bytes in 120 blocks
==13384==         suppressed: 0 bytes in 0 blocks
==13384== Rerun with --leak-check=full to see details of leaked memory
==13384== 
==13384== For counts of detected and suppressed errors, rerun with: -v
==13384== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
[gazebo-2] process has died [pid 13384, exit code 139, cmd valgrind /home/deadmanlogan/i_am_from_source/ros_catkin_ws/src/gazebo_ros_pkgs/gazebo_ros/scripts/gzserver -e ode worlds/empty.world __name:=gazebo __log:=/home/deadmanlogan/.ros/log/12f716ec-e077-11e9-af75-68071520849c/gazebo-2.log].
log file: /home/deadmanlogan/.ros/log/12f716ec-e077-11e9-af75-68071520849c/gazebo-2*.log

Speaking of any more details, I have Ubuntu xenial and trying to use my "launch file" to launch my robotic model in gazebo7(robotic simulation software) and this simulation software is giving segmentation fault on running my launch file. Since this launch file is readymade from Github I think probably there is no error in that launch file.

What do you think is causing the error based on my provided information?

Jason C. McDonald • Sep 26 '19

o.O

Wow, I've never seen this one before. The segfault is occurring, but Valgrind doesn't seem to be catching it.

I'm curious how you're invoking Valgrind. Usually I'd just pass the executable right to it:

$ valgrind roslaunch pr2_description pr2.launch

all_yours • Sep 26 '19

I invoked valgrind by specifying it as an option in the launch file itself and the same way i invoked gdb.

<node name="gazebo" pkg="gazebo_ros"  type="$(arg script_type)" respawn="$(arg respawn_gazebo)" output="$(arg output)" launch-prefix="valgrind"

I am very stressed with this problem but i don't want to give up.
What do you suggest for this problem?

Jason C. McDonald • Sep 26 '19 • Edited

You know, I'd be really curious to know what would happen if you ran the launch file itself through Valgrind! If you look at the output from a moment ago, there's quite a lot that is occuring outside of Valgrind (all the lines not preceded with ==nnnnn== (where nnnnn is some number). The segfault at the end appears to be occuring outside of that context as well. That leads me to believe the segfault might actually be within the launch file.

all_yours • Sep 26 '19

I just ran it through valgrind

Comment deleted

Jason C. McDonald • Sep 26 '19

Yikes. Could you delete that comment chain and put it in a Gist or bpaste.net or some such? It'll be easier to read.

In any case, that confirmed my suspicion; the launcher is the problem. it's not memory pure at all.

Comment deleted

all_yours • Sep 26 '19

After ending the process manually I further got the output

^C[robo_state_publisher-4] killing on exit
[rosout-1] killing on exit
[master] killing on exit
shutting down processing monitor...
... shutting down processing monitor complete
done
==15557== Invalid read of size 4
==15557==    at 0x41964F: PyObject_Free (in /usr/bin/python2.7)
==15557==    by 0x4D07FA: ??? (in /usr/bin/python2.7)
==15557==    by 0x4AA262: ??? (in /usr/bin/python2.7)
==15557==    by 0x4E0C11: ??? (in /usr/bin/python2.7)
==15557==    by 0x4FC2C9: _PyModule_Clear (in /usr/bin/python2.7)
==15557==    by 0x4FBADC: PyImport_Cleanup (in /usr/bin/python2.7)
==15557==    by 0x4F8D83: Py_Finalize (in /usr/bin/python2.7)
==15557==    by 0x4936F1: Py_Main (in /usr/bin/python2.7)
==15557==    by 0x507782F: (below main) (libc-start.c:291)
==15557==  Address 0x62f2020 is 2,592 bytes inside an unallocated block of size 2,768 in arena "client"
==15557== 
==15557== Invalid read of size 4
==15557==    at 0x41964F: PyObject_Free (in /usr/bin/python2.7)
==15557==    by 0x4AA262: ??? (in /usr/bin/python2.7)
==15557==    by 0x4E0C11: ??? (in /usr/bin/python2.7)
==15557==    by 0x4FC2C9: _PyModule_Clear (in /usr/bin/python2.7)
==15557==    by 0x4FBADC: PyImport_Cleanup (in /usr/bin/python2.7)
==15557==    by 0x4F8D83: Py_Finalize (in /usr/bin/python2.7)
==15557==    by 0x4936F1: Py_Main (in /usr/bin/python2.7)
==15557==    by 0x507782F: (below main) (libc-start.c:291)
==15557==  Address 0x7f37020 is 128 bytes inside a block of size 552 free'd
==15557==    at 0x4C2EDEB: free (vg_replace_malloc.c:530)
==15557==    by 0x50C4362: fclose@@GLIBC_2.2.5 (iofclose.c:84)
==15557==    by 0x43CEFB: ??? (in /usr/bin/python2.7)
==15557==    by 0x4A63FD: PyObject_Call (in /usr/bin/python2.7)
==15557==    by 0x5385A5: _PyObject_CallMethod_SizeT (in /usr/bin/python2.7)
==15557==    by 0x53F4CE: ??? (in /usr/bin/python2.7)
==15557==    by 0x4AEF42: PyObject_CallFunctionObjArgs (in /usr/bin/python2.7)
==15557==    by 0x4BF668: PyEval_EvalFrameEx (in /usr/bin/python2.7)
==15557==    by 0x4BA915: PyEval_EvalCodeEx (in /usr/bin/python2.7)
==15557==    by 0x4C2C3B: PyEval_EvalFrameEx (in /usr/bin/python2.7)
==15557==    by 0x4BA915: PyEval_EvalCodeEx (in /usr/bin/python2.7)
==15557==    by 0x4C24E9: PyEval_EvalFrameEx (in /usr/bin/python2.7)
==15557==  Block was alloc'd at
==15557==    at 0x4C2DB8F: malloc (vg_replace_malloc.c:299)
==15557==    by 0x50C4CDC: __fopen_internal (iofopen.c:69)
==15557==    by 0x53D247: ??? (in /usr/bin/python2.7)
==15557==    by 0x4AB6FA: ??? (in /usr/bin/python2.7)
==15557==    by 0x53CDBE: ??? (in /usr/bin/python2.7)
==15557==    by 0x4BD1D9: PyEval_EvalFrameEx (in /usr/bin/python2.7)
==15557==    by 0x4BA915: PyEval_EvalCodeEx (in /usr/bin/python2.7)
==15557==    by 0x4C2C3B: PyEval_EvalFrameEx (in /usr/bin/python2.7)
==15557==    by 0x4BA915: PyEval_EvalCodeEx (in /usr/bin/python2.7)
==15557==    by 0x4C24E9: PyEval_EvalFrameEx (in /usr/bin/python2.7)
==15557==    by 0x4BA915: PyEval_EvalCodeEx (in /usr/bin/python2.7)
==15557==    by 0x4C24E9: PyEval_EvalFrameEx (in /usr/bin/python2.7)
==15557== 
==15557== Invalid read of size 4
==15557==    at 0x41964F: PyObject_Free (in /usr/bin/python2.7)
==15557==    by 0x4FC2C9: _PyModule_Clear (in /usr/bin/python2.7)
==15557==    by 0x4FBADC: PyImport_Cleanup (in /usr/bin/python2.7)
==15557==    by 0x4F8D83: Py_Finalize (in /usr/bin/python2.7)
==15557==    by 0x4936F1: Py_Main (in /usr/bin/python2.7)
==15557==    by 0x507782F: (below main) (libc-start.c:291)
==15557==  Address 0x7eea020 is 0 bytes inside a block of size 8 free'd
==15557==    at 0x4C2EDEB: free (vg_replace_malloc.c:530)
==15557==    by 0x49B1E4: ??? (in /usr/bin/python2.7)
==15557==    by 0x4D878E: ??? (in /usr/bin/python2.7)
==15557==    by 0x4BD778: PyEval_EvalFrameEx (in /usr/bin/python2.7)
==15557==    by 0x4BA915: PyEval_EvalCodeEx (in /usr/bin/python2.7)
==15557==    by 0x4D6218: ??? (in /usr/bin/python2.7)
==15557==    by 0x4EEC7D: ??? (in /usr/bin/python2.7)
==15557==    by 0x4A63FD: PyObject_Call (in /usr/bin/python2.7)
==15557==    by 0x4C6C2F: PyEval_CallObjectWithKeywords (in /usr/bin/python2.7)
==15557==    by 0x6EB480C: ??? (in /usr/lib/python2.7/lib-dynload/pyexpat.x86_64-linux-gnu.so)
==15557==    by 0x6EBCF3D: ??? (in /usr/lib/python2.7/lib-dynload/pyexpat.x86_64-linux-gnu.so)
==15557==    by 0x710D68F: ??? (in /lib/x86_64-linux-gnu/libexpat.so.1.6.0)
==15557==  Block was alloc'd at
==15557==    at 0x4C2DB8F: malloc (vg_replace_malloc.c:299)
==15557==    by 0x493F0E: PyList_New (in /usr/bin/python2.7)
==15557==    by 0x510D4D: ??? (in /usr/bin/python2.7)
==15557==    by 0x4BD1D9: PyEval_EvalFrameEx (in /usr/bin/python2.7)
==15557==    by 0x4BA915: PyEval_EvalCodeEx (in /usr/bin/python2.7)
==15557==    by 0x4D6218: ??? (in /usr/bin/python2.7)
==15557==    by 0x4EEC7D: ??? (in /usr/bin/python2.7)
==15557==    by 0x4A63FD: PyObject_Call (in /usr/bin/python2.7)
==15557==    by 0x4C6C2F: PyEval_CallObjectWithKeywords (in /usr/bin/python2.7)
==15557==    by 0x6EB480C: ??? (in /usr/lib/python2.7/lib-dynload/pyexpat.x86_64-linux-gnu.so)
==15557==    by 0x6EBCF3D: ??? (in /usr/lib/python2.7/lib-dynload/pyexpat.x86_64-linux-gnu.so)
==15557==    by 0x710D68F: ??? (in /lib/x86_64-linux-gnu/libexpat.so.1.6.0)
==15557== 
==15557== Invalid read of size 4
==15557==    at 0x41964F: PyObject_Free (in /usr/bin/python2.7)
==15557==    by 0x4D0C94: ??? (in /usr/bin/python2.7)
==15557==    by 0x4D086B: ??? (in /usr/bin/python2.7)
==15557==    by 0x4FC2C9: _PyModule_Clear (in /usr/bin/python2.7)
==15557==    by 0x4FBADC: PyImport_Cleanup (in /usr/bin/python2.7)
==15557==    by 0x4F8D83: Py_Finalize (in /usr/bin/python2.7)
==15557==    by 0x4936F1: Py_Main (in /usr/bin/python2.7)
==15557==    by 0x507782F: (below main) (libc-start.c:291)
==15557==  Address 0x611c020 is 48,736 bytes inside a block of size 49,152 free'd
==15557==    at 0x4C2EDEB: free (vg_replace_malloc.c:530)
==15557==    by 0x4AA3C4: ??? (in /usr/bin/python2.7)
==15557==    by 0x495BCA: PyDict_SetItem (in /usr/bin/python2.7)
==15557==    by 0x4FC278: _PyModule_Clear (in /usr/bin/python2.7)
==15557==    by 0x4FBADC: PyImport_Cleanup (in /usr/bin/python2.7)
==15557==    by 0x4F8D83: Py_Finalize (in /usr/bin/python2.7)
==15557==    by 0x4936F1: Py_Main (in /usr/bin/python2.7)
==15557==    by 0x507782F: (below main) (libc-start.c:291)
==15557==  Block was alloc'd at
==15557==    at 0x4C2FB55: calloc (vg_replace_malloc.c:711)
==15557==    by 0x498D2C: ??? (in /usr/bin/python2.7)
==15557==    by 0x4A252E: PyDict_Merge (in /usr/bin/python2.7)
==15557==    by 0x512275: ??? (in /usr/bin/python2.7)
==15557==    by 0x4BD1D9: PyEval_EvalFrameEx (in /usr/bin/python2.7)
==15557==    by 0x4C210E: PyEval_EvalFrameEx (in /usr/bin/python2.7)
==15557==    by 0x4BA915: PyEval_EvalCodeEx (in /usr/bin/python2.7)
==15557==    by 0x4C2C3B: PyEval_EvalFrameEx (in /usr/bin/python2.7)
==15557==    by 0x4C210E: PyEval_EvalFrameEx (in /usr/bin/python2.7)
==15557==    by 0x4C210E: PyEval_EvalFrameEx (in /usr/bin/python2.7)
==15557==    by 0x4C210E: PyEval_EvalFrameEx (in /usr/bin/python2.7)
==15557==    by 0x4C210E: PyEval_EvalFrameEx (in /usr/bin/python2.7)
==15557== 
==15557== Invalid read of size 4
==15557==    at 0x41964F: PyObject_Free (in /usr/bin/python2.7)
==15557==    by 0x4D0B76: ??? (in /usr/bin/python2.7)
==15557==    by 0x4D086B: ??? (in /usr/bin/python2.7)
==15557==    by 0x495BCA: PyDict_SetItem (in /usr/bin/python2.7)
==15557==    by 0x4FC278: _PyModule_Clear (in /usr/bin/python2.7)
==15557==    by 0x4FBBBD: PyImport_Cleanup (in /usr/bin/python2.7)
==15557==    by 0x4F8D83: Py_Finalize (in /usr/bin/python2.7)
==15557==    by 0x4936F1: Py_Main (in /usr/bin/python2.7)
==15557==    by 0x507782F: (below main) (libc-start.c:291)
==15557==  Address 0x6083020 is 3,200 bytes inside a block of size 3,218 free'd
==15557==    at 0x4C2EDEB: free (vg_replace_malloc.c:530)
==15557==    by 0x4D0A85: ??? (in /usr/bin/python2.7)
==15557==    by 0x4D086B: ??? (in /usr/bin/python2.7)
==15557==    by 0x495BCA: PyDict_SetItem (in /usr/bin/python2.7)
==15557==    by 0x4FC278: _PyModule_Clear (in /usr/bin/python2.7)
==15557==    by 0x4FBBBD: PyImport_Cleanup (in /usr/bin/python2.7)
==15557==    by 0x4F8D83: Py_Finalize (in /usr/bin/python2.7)
==15557==    by 0x4936F1: Py_Main (in /usr/bin/python2.7)
==15557==    by 0x507782F: (below main) (libc-start.c:291)
==15557==  Block was alloc'd at
==15557==    at 0x4C2DB8F: malloc (vg_replace_malloc.c:299)
==15557==    by 0x4A0021: PyString_FromStringAndSize (in /usr/bin/python2.7)
==15557==    by 0x4B3F50: ??? (in /usr/bin/python2.7)
==15557==    by 0x4B425C: ??? (in /usr/bin/python2.7)
==15557==    by 0x4B414F: ??? (in /usr/bin/python2.7)
==15557==    by 0x4B4272: ??? (in /usr/bin/python2.7)
==15557==    by 0x4B3E65: PyMarshal_ReadObjectFromString (in /usr/bin/python2.7)
==15557==    by 0x4B3DE5: PyMarshal_ReadLastObjectFromFile (in /usr/bin/python2.7)
==15557==    by 0x4B3D2D: ??? (in /usr/bin/python2.7)
==15557==    by 0x4B390B: ??? (in /usr/bin/python2.7)
==15557==    by 0x4A4C20: ??? (in /usr/bin/python2.7)
==15557==    by 0x4A42B2: PyImport_ImportModuleLevel (in /usr/bin/python2.7)
==15557== 
==15557== Conditional jump or move depends on uninitialised value(s)
==15557==    at 0x419658: PyObject_Free (in /usr/bin/python2.7)
==15557==    by 0x4D0C94: ??? (in /usr/bin/python2.7)
==15557==    by 0x4D086B: ??? (in /usr/bin/python2.7)
==15557==    by 0x4AA262: ??? (in /usr/bin/python2.7)
==15557==    by 0x4E0C11: ??? (in /usr/bin/python2.7)
==15557==    by 0x4AA0E3: ??? (in /usr/bin/python2.7)
==15557==    by 0x4E0BFB: ??? (in /usr/bin/python2.7)
==15557==    by 0x4AA262: ??? (in /usr/bin/python2.7)
==15557==    by 0x4E0C11: ??? (in /usr/bin/python2.7)
==15557==    by 0x4FC2C9: _PyModule_Clear (in /usr/bin/python2.7)
==15557==    by 0x4FBBBD: PyImport_Cleanup (in /usr/bin/python2.7)
==15557==    by 0x4F8D83: Py_Finalize (in /usr/bin/python2.7)
==15557== 
==15557== Invalid read of size 4
==15557==    at 0x502477: PyGrammar_RemoveAccelerators (in /usr/bin/python2.7)
==15557==    by 0x4F8DF3: Py_Finalize (in /usr/bin/python2.7)
==15557==    by 0x4936F1: Py_Main (in /usr/bin/python2.7)
==15557==    by 0x507782F: (below main) (libc-start.c:291)
==15557==  Address 0x615a020 is 304 bytes inside a block of size 617 free'd
==15557==    at 0x4C2EDEB: free (vg_replace_malloc.c:530)
==15557==    by 0x4D07FA: ??? (in /usr/bin/python2.7)
==15557==    by 0x4FC2C9: _PyModule_Clear (in /usr/bin/python2.7)
==15557==    by 0x4FBADC: PyImport_Cleanup (in /usr/bin/python2.7)
==15557==    by 0x4F8D83: Py_Finalize (in /usr/bin/python2.7)
==15557==    by 0x4936F1: Py_Main (in /usr/bin/python2.7)
==15557==    by 0x507782F: (below main) (libc-start.c:291)
==15557==  Block was alloc'd at
==15557==    at 0x4C2DB8F: malloc (vg_replace_malloc.c:299)
==15557==    by 0x4A0021: PyString_FromStringAndSize (in /usr/bin/python2.7)
==15557==    by 0x4B3F50: ??? (in /usr/bin/python2.7)
==15557==    by 0x4B407D: ??? (in /usr/bin/python2.7)
==15557==    by 0x4B4272: ??? (in /usr/bin/python2.7)
==15557==    by 0x4B414F: ??? (in /usr/bin/python2.7)
==15557==    by 0x4B4272: ??? (in /usr/bin/python2.7)
==15557==    by 0x4B3E65: PyMarshal_ReadObjectFromString (in /usr/bin/python2.7)
==15557==    by 0x4B3DE5: PyMarshal_ReadLastObjectFromFile (in /usr/bin/python2.7)
==15557==    by 0x4B3D2D: ??? (in /usr/bin/python2.7)
==15557==    by 0x4B390B: ??? (in /usr/bin/python2.7)
==15557==    by 0x4A4C20: ??? (in /usr/bin/python2.7)
==15557== 
==15557== 
==15557== HEAP SUMMARY:
==15557==     in use at exit: 3,393,686 bytes in 5,832 blocks
==15557==   total heap usage: 278,397 allocs, 272,565 frees, 371,732,938 bytes allocated
==15557== 
==15557== LEAK SUMMARY:
==15557==    definitely lost: 0 bytes in 0 blocks
==15557==    indirectly lost: 0 bytes in 0 blocks
==15557==      possibly lost: 55,704 bytes in 96 blocks
==15557==    still reachable: 3,337,982 bytes in 5,736 blocks
==15557==         suppressed: 0 bytes in 0 blocks
==15557== Rerun with --leak-check=full to see details of leaked memory
==15557== 
==15557== For counts of detected and suppressed errors, rerun with: -v
==15557== Use --track-origins=yes to see where uninitialised values come from
==15557== ERROR SUMMARY: 9917 errors from 128 contexts (suppressed: 0 from 0)

So this was the whole output I got , sorry for uploading this in parts(character limitation).

I hope this gives something useful to track down the issue.

all_yours • Sep 26 '19

I apologise for making such a long comment chain.
I have now made a gist of running the launch file through valgrind in

gist.github.com/rishabh900/41fd6df...

And the above comment is the output after i terminated the process manually.
So what do you think of now?

Jason C. McDonald • Sep 26 '19 • Edited

Did you write the launcher script, or is that third-party? It's clearly written in Python, and the issue is definitely there. I just can't narrow in on the specific issue, because the memory issues are being thrown by the interpreter (e.g. at 0x41964F: PyObject_Free (in /usr/bin/python2.7)). That indicates that something odd has been done within the Python code, but I won't be able to diagnose this further without really fully understanding the launcher's source code, and I'm afraid I don't have time to learn it.

If this is third-party code, open an issue against the launcher project, and include the above output of Valgrind.

Sergiu Mureşan • Sep 8 '18 • Edited

Great to see a fellow low-level programmer on here!

I worked on a game engine written in C and was having many issues related to wrongly using the realloc function for dynamically allocated memory. What I did was forget to assign the reallocated memory's pointer to the return value of the function. It took me weeks before I found the underlying problem since only in some cases it would blow up. How would you go about debugging a situation like:

int* p = calloc(5, sizeof(int));
// some code
realloc(p, 6 * sizeof(int)); // notice no assignment

Do you use some sort of special tools? Or just some coding standards to not let this happen?

Jason C. McDonald • Sep 8 '18

Whenever I'm working with memory, I pair two different tools: Valgrind and Goldilocks (PawLIB).

Valgrind is a pretty ubiquitous tool on UNIX platforms which will show me all of the memory issues encountered while running, even if the undefined behavior doesn't cause any overt problems. My code isn't done until it's Valgrind-pure. However, Valgrind only monitors the execution, so...

Goldilocks is a testing framework I developed at MousePaw Media, as a part of PawLIB. You could technically use any testing framework, but the benefit to Goldilocks is that it bakes the tests into the final executable, instead of requiring an additional framework to run the tests. That way, you can start the normal executable, run each of the tests you wrote, and see which ones Valgrind complains about.

Mind you, this does require you to write a lot of comprehensive behavioral tests...but you really should be doing that anyway in production code. ;)

Likai Liu • Sep 10 '18

My approach for this specific problem is to use a compiler that warns about unused return value, such as gcc or clang. I know that stdlib.h on Linux and Mac OS X already decorates realloc() with warn_unused_result attribute.

stackoverflow.com/a/2889601

But just naively setting p = realloc(p, ...) is also wrong, since if the allocation fails, p would be set to NULL but the original object is still allocated. The original pointer is lost and now a memory leak. Use reallocf() which frees the original memory if it could not be resized.

Sergiu Mureşan • Sep 10 '18

That's a really nice feature, didn't know about it.

But wouldn't that mean data loss in case the memory can't be resized? Wouldn't that become an unrecoverable error?

Jason C. McDonald • Sep 10 '18

@liulk Ha, I completely forgot to mention Clang! It does indeed have the best warnings of any compiler I've used. I almost always compile with -Wall -Wextra -Wpedantic -Werror; that last one (as you know, although the reader might not) causes the build to fail on any warnings.

I also use cppcheck as part of my autoreview workflow, and resolve all linter warnings before committing to the production branch.

Likai Liu • Sep 10 '18

@codevault You're right, reallocf() would just free the memory and cause data loss, so it would serve a different use case than realloc(). The more general solution would be to always use this pattern, which is more verbose:

void *q = realloc(p, new_size);
if (q == NULL) {
  // do error handling.
  return;
}
p = q;

I just find that in most of my use cases, I would end up freeing p in the error handling, so I would just use reallocf() which results in less verbose code.

Sergiu Mureşan • Sep 10 '18

I see, that makes sense. I can see myself freeing the memory most of the time when reallocation fails.

Good to note. Thanks!

Jason C. McDonald • Sep 8 '18

I should add, I use another tool from PawLIB called IOChannel - basically, a std::cout wrapper - that allows me to cleanly print the address and raw memory from literally any pointer, without having to use a debugger. This can make debugging some problems infinitely easier, especially when you're contending with a Heisenbug that goes away if compiled with -g, but appears when compiled with -O2.

Sergiu Mureşan • Sep 10 '18

Thanks for the response!

Unfortunately, I didn't find a version of Valgrind for Windows. I tried DrMemory but, after lots of struggle, it didn't give me any helpful information and dropped the ball. Do you have experience with low-level on Windows or just work exclusively on Linux since it is more convenient?

Jason C. McDonald • Sep 10 '18

I rarely use Windows for development, as its development toolchain is almost invariably miles behind its UNIX-based counterparts.

If you're on Windows 10, I strongly recommend setting up the Windows Subsystem for Linux [WSL]. That will give you access to the Linux development environment for compiling and testing. Then, use the LLVM Clang compiler on both the WSL and the Visual Studio environments. That way, once you know it compiles and runs Valgrind-pure on WSL, you can trust that it will work on VS Clang.

Scott E. Rodgers • Jul 7 '22

RE my previous post, (which I do not see): I figured out the problem. As I suspected, when I traced the spaghetti logic, I was doing something with undefined behavior. Fixed it, and there is no problem on either instance now. Thanks for all your contribution!

Priyansh Jain • Sep 8 '18 • Edited

Are rust and golang going to take over C and C++? In terms of desktop software/web development. Also how would you, as a C++ expert, rate these languages? Do they have potential?

Jason C. McDonald • Sep 8 '18 • Edited

I strongly believe that (virtually) all languages have their place. FORTRAN and COBOL have firmly established places in the world, and are almost certain never to lose them on account of their reliability and precedence.

C and C++ likewise have this precedence, making up a sizable chunk of our source code. It's the old "if it ain't broke, don't fix it" concept; I doubt the entire collection of software that makes up a standard Linux-based operating system will ever be rewritten from C to Rust, because most of what already exists works quite well.

That said, I think Rust and golang have a lot of potential as languages, especially Rust.

(In my personal opinion, golang is a rather hipster language, but that's based in my feelings towards it, not in anything practical; so take that with a grain of salt.)

Rust looks especially interesting in the area of error handling. I'll admit, I haven't had the time to learn it very well yet, but it's DEFINITELY high on my list!

In other words, Rust and golang will probably find established places in the programming world, but they won't be displacing C, C++, or any other established language. Every tool has its place, and a quarter inch drill bit doesn't replace a 5mm drill bit.

Mihail Malo • Sep 8 '18

Rust would take over everything, but the leftist collectivist community threw a baton roue of so-called "golang" at it, and now we will all perish and return to a dark age of literal witch hunts. (Because we are all out of Moore's law)
RIP information technology and human civilization in general

Priyansh Jain • Sep 8 '18

LOL

Mihail Malo • Sep 8 '18

Nihilist

Shantanu Banerjee • May 21 '21 • Edited

Hello Jason!! I am having some problem of memory corruption sometimes. Could you please help me out with some possible cause. Image is attached.

This issue is not always happening but sometimes. I have some memory crunch in my arm board. Is this the cause of this undefined behavior?

dev-to-uploads.s3.amazonaws.com/up...

Jason C. McDonald • May 21 '21

Yes, this is undefined behavior. The line marked #3 indicates a double-free is occuring somewhere in your code, wherein an already-freed pointer is being freed again. The behavior of a double-free is undefined.

Unfortunately, I don't have enough context here (especially in an image) to debug this for you. It looks like you're already using valgrind. Try compiling your own code with debug flags (-g), and then running it through Valgrind and repeating whatever action triggers this error. The resulting stack trace should point you to the line in your code where the double-free is originating.

Shantanu Banerjee • May 23 '21

Thank you very much for your reply.
I am unable to use valgrind on target due to very limited system memory on my embedded device. Could you please suggest any alternate approach which use low resource?
I have compiled my application with -ggdb3.
I have encounter this problem only with wide character text (std::wstring) and during the object destruction (out of scope) this corruption is happening.

Images
dev-to-uploads.s3.amazonaws.com/up...
dev-to-uploads.s3.amazonaws.com/up...
dev-to-uploads.s3.amazonaws.com/up...
dev-to-uploads.s3.amazonaws.com/up...
dev-to-uploads.s3.amazonaws.com/up...
dev-to-uploads.s3.amazonaws.com/up...
dev-to-uploads.s3.amazonaws.com/up...

Shantanu Banerjee • May 23 '21

Memory dump of variable from frame #6 of stack trace
dev-to-uploads.s3.amazonaws.com/up...

Could you please provide any hint?

Jason C. McDonald • May 23 '21

If you have memory bugs in your code, it should also work if you compile and test it on your regular machine, even though the architecture is different. Have you tried that?

Shantanu Banerjee • May 24 '21

Thanks for your advice. I haven't tried to run this on regular pc. This is pretty big task but let me try this. Could you please provide any opinion to use ASAN compiled binary on target?

Jason C. McDonald • May 24 '21

That might work. I haven't worked with ARM boards or microcircuits before, so I don't have much insight there.

Lindenouwen • Apr 4 '21

Hi Jason,

I had a problem when mixing C and C++ code. Everything compiles just fine but when I run it I get a segfault. This happens when the code goes from C++ to C
I implemented it as follows:

include "Signature.hpp"

include

extern "C"{
#include "sign.h"
#include "sign.c"
#include "params.h"
#include "aes256ctr.h"
#include "aes256ctr.c"
#include "ntt.h"
#include "ntt.c"
#include "packing.h"
#include "packing.c"
#include "poly.h"
#include "poly.c"
#include "polyvec.h"
#include "polyvec.c"
#include "reduce.h"
#include "reduce.c"
#include "randombytes.h"
#include "randombytes.c"
#include "fips202.h"
#include "fips202.c"
#include "rounding.h"
#include "rounding.c"
#include "symmetric.h"
#include "symmetric-aes.c"

int PQCLEAN_DILITHIUM5AES_CLEAN_crypto_sign_keypair(uint8_t *,uint8_t *);

}
using namespace std;
// function to generate a keypair, mainly uses the c function from the Dilithium PQ scheme
void Signature::generateKeyPair(){
uint8_t* pk = getPublicKeyAddress();
uint8_t* sk = getSecretKeyAddress();
PQCLEAN_DILITHIUM5AES_CLEAN_crypto_sign_keypair(pk, sk);
cout << "I succeeded" << endl;
}
I really thought I implemented this right and I do not know what I am doing wrong. Could you help me?

Jason C. McDonald • Apr 5 '21

For a start, you might want to wrap your code in your post above with three backticks on the line above, and three on the line below. Otherwise, it's hard to read the code. ;)

Lindenouwen • Apr 5 '21

Dear Jason,

I am so sorry, this is my first time on the platform. I have used valgrind and this gives me that the segfault is caused by an invalid write of size 8. Here is my code (main function):

#include "Signature.hpp"
#include "Signature.cpp"
#include <cstdint>
#include <iostream>
#include <cstddef>
#include <cstring>
using namespace std;
extern "C"{
     #include "api.h"
    }
int main(){
    char publicKey[PQCLEAN_DILITHIUM5AES_CLEAN_CRYPTO_PUBLICKEYBYTES];
    char secretKey[PQCLEAN_DILITHIUM5AES_CLEAN_CRYPTO_SECRETKEYBYTES];
    uint8_t* pk;
    pk = (uint8_t *) &publicKey;
    uint8_t* sk;
    sk = (uint8_t *) &secretKey;
    // string passwd = "admin";
    Signature sig(pk, sk);
    sig.generateKeyPair();
    cout << "I succeeded in generating a key pair." << endl;
    string mes = "I am checking if this works";
    size_t mlen = mes.length();
    char message[mlen+1];
    strcpy(message, mes.c_str());
    uint8_t* m;
    m = (uint8_t*) &message;
    mlen = sizeof(*m);
    char signedMessage[PQCLEAN_DILITHIUM5AES_CLEAN_CRYPTO_BYTES+mlen];
    uint8_t* sm;
    sm = (uint8_t*) &signedMessage;
    size_t smLength = sizeof(*sm);
    size_t* smlen;
    smlen = (size_t*) smLength;
    sig.signMessage(sm, smlen, m, mlen);
    cout << "I succeeded in signing a message." << endl;
    cout << *sm << endl;
    return 0;
}

and this is the signature class from which the c code is called:

#include "Signature.hpp"
#include <cstdint>
#include <iostream>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <cstddef>
#include <cryptopp/sha3.h>
#include <cryptopp/cryptlib.h>
#include <cryptopp/hex.h>
#include <cryptopp/filters.h>
#include <cryptopp/sha.h>
#include <cryptopp/hex.h>
#include <cryptopp/files.h>

using namespace CryptoPP;

extern "C"{
     #include "api.h"
    }
using namespace std;
void Signature::signMessage(uint8_t* signedMessage, std::size_t* smLength, const uint8_t* message, std::size_t mLength){
    // if (checkPasswd(passwd)==true){
        const uint8_t *sk = getSecretKeyAddress();
        cout << "I get here" << endl;
        PQCLEAN_DILITHIUM5AES_CLEAN_crypto_sign(signedMessage, smLength, message, mLength, sk);
        cout << "not here" << endl;
    // }
}

The PQCLEAN_DILITHIUM5AES_CLEAN_crypto_sign is a c function that was built for the NIST competition for post quantum signature schemes.
Do you know what I am doing wrong here?

Jason C. McDonald • Apr 5 '21

I have used valgrind and this gives me that the segfault is caused by an invalid write of size 8.

More importantly, what file name and line number does it say that invalid write occurs on? (Consider posted the full output of Valgrind's error message, with traceback.)

Lindenouwen • Apr 6 '21

==6225== Memcheck, a memory error detector
==6225== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==6225== Using Valgrind-3.17.0 and LibVEX; rerun with -h for copyright info
==6225== Command: ./signatures
==6225== 
I succeeded in generating a key pair.
I get here
==6225== Invalid write of size 8
==6225==    at 0x109D39: PQCLEAN_DILITHIUM5AES_CLEAN_crypto_sign_signature (sign.c:179)
==6225==    by 0x109DED: PQCLEAN_DILITHIUM5AES_CLEAN_crypto_sign (sign.c:210)
==6225==    by 0x1096D2: Signature::signMessage(unsigned char*, unsigned long*, unsigned char const*, unsigned long) (Signature.cpp:73)
==6225==    by 0x10946C: main (main.cpp:35)
==6225==  Address 0x1 is not stack'd, malloc'd or (recently) free'd
==6225== 
==6225== 
==6225== Process terminating with default action of signal 11 (SIGSEGV)
==6225==  Access not within mapped region at address 0x1
==6225==    at 0x109D39: PQCLEAN_DILITHIUM5AES_CLEAN_crypto_sign_signature (sign.c:179)
==6225==    by 0x109DED: PQCLEAN_DILITHIUM5AES_CLEAN_crypto_sign (sign.c:210)
==6225==    by 0x1096D2: Signature::signMessage(unsigned char*, unsigned long*, unsigned char const*, unsigned long) (Signature.cpp:73)
==6225==    by 0x10946C: main (main.cpp:35)
==6225==  If you believe this happened as a result of a stack
==6225==  overflow in your program's main thread (unlikely but
==6225==  possible), you can try to increase the size of the
==6225==  main thread stack using the --main-stacksize= flag.
==6225==  The main thread stack size used in this run was 8388608.
I got infor loop works==6225== 
==6225== HEAP SUMMARY:
==6225==     in use at exit: 28 bytes in 1 blocks
==6225==   total heap usage: 11 allocs, 10 frees, 75,396 bytes allocated
==6225== 
==6225== LEAK SUMMARY:
==6225==    definitely lost: 0 bytes in 0 blocks
==6225==    indirectly lost: 0 bytes in 0 blocks
==6225==      possibly lost: 0 bytes in 0 blocks
==6225==    still reachable: 28 bytes in 1 blocks
==6225==         suppressed: 0 bytes in 0 blocks
==6225== Rerun with --leak-check=full to see details of leaked memory
==6225== 
==6225== For lists of detected and suppressed errors, rerun with: -s
==6225== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
Segmentation fault (core dumped)

Here is the valgrind error message. The problem happens inside another file than the two above this one although the problem can't be there because, as said above, this is code from an almost NIST standard in post quantum and has been reviewed in correctness. For completeness, here is the code:

#include "packing.h"
#include "params.h"
#include "poly.h"
#include "polyvec.h"
#include "sign.h"
#include "randombytes.h"
#include "symmetric.h"
#include "fips202.h"
#include <stdio.h>

#include <stdint.h>

/*************************************************
* Name:        PQCLEAN_DILITHIUM5AES_CLEAN_crypto_sign_keypair
*
* Description: Generates public and private key.
*
* Arguments:   - uint8_t *pk: pointer to output public key (allocated
*                             array of PQCLEAN_DILITHIUM5AES_CLEAN_CRYPTO_PUBLICKEYBYTES bytes)
*              - uint8_t *sk: pointer to output private key (allocated
*                             array of PQCLEAN_DILITHIUM5AES_CLEAN_CRYPTO_SECRETKEYBYTES bytes)
*
* Returns 0 (success)
**************************************************/
int PQCLEAN_DILITHIUM5AES_CLEAN_crypto_sign_keypair(uint8_t *pk, uint8_t *sk) {
    uint8_t seedbuf[2 * SEEDBYTES + CRHBYTES];
    uint8_t tr[SEEDBYTES];
    const uint8_t *rho, *rhoprime, *key;
    polyvecl mat[K];
    polyvecl s1, s1hat;
    polyveck s2, t1, t0;
    /* Get randomness for rho, rhoprime and key */
    randombytes(seedbuf, SEEDBYTES);
    shake256(seedbuf, 2 * SEEDBYTES + CRHBYTES, seedbuf, SEEDBYTES);
    rho = seedbuf;
    rhoprime = rho + SEEDBYTES;
    key = rhoprime + CRHBYTES;

    /* Expand matrix */
    PQCLEAN_DILITHIUM5AES_CLEAN_polyvec_matrix_expand(mat, rho);

    /* Sample short vectors s1 and s2 */
    PQCLEAN_DILITHIUM5AES_CLEAN_polyvecl_uniform_eta(&s1, rhoprime, 0);
    PQCLEAN_DILITHIUM5AES_CLEAN_polyveck_uniform_eta(&s2, rhoprime, L);

    /* Matrix-vector multiplication */
    s1hat = s1;
    PQCLEAN_DILITHIUM5AES_CLEAN_polyvecl_ntt(&s1hat);
    PQCLEAN_DILITHIUM5AES_CLEAN_polyvec_matrix_pointwise_montgomery(&t1, mat, &s1hat);
    PQCLEAN_DILITHIUM5AES_CLEAN_polyveck_reduce(&t1);
    PQCLEAN_DILITHIUM5AES_CLEAN_polyveck_invntt_tomont(&t1);

    /* Add error vector s2 */
    PQCLEAN_DILITHIUM5AES_CLEAN_polyveck_add(&t1, &t1, &s2);

    /* Extract t1 and write public key */
    PQCLEAN_DILITHIUM5AES_CLEAN_polyveck_caddq(&t1);
    PQCLEAN_DILITHIUM5AES_CLEAN_polyveck_power2round(&t1, &t0, &t1);
    PQCLEAN_DILITHIUM5AES_CLEAN_pack_pk(pk, rho, &t1);

    /* Compute H(rho, t1) and write secret key */
    shake256(tr, SEEDBYTES, pk, PQCLEAN_DILITHIUM5AES_CLEAN_CRYPTO_PUBLICKEYBYTES);
    PQCLEAN_DILITHIUM5AES_CLEAN_pack_sk(sk, rho, tr, key, &t0, &s1, &s2);

    return 0;
}

/*************************************************
* Name:        PQCLEAN_DILITHIUM5AES_CLEAN_crypto_sign_signature
*
* Description: Computes signature.
*
* Arguments:   - uint8_t *sig:   pointer to output signature (of length PQCLEAN_DILITHIUM5AES_CLEAN_CRYPTO_BYTES)
*              - size_t *siglen: pointer to output length of signature
*              - uint8_t *m:     pointer to message to be signed
*              - size_t mlen:    length of message
*              - uint8_t *sk:    pointer to bit-packed secret key
*
* Returns 0 (success)
**************************************************/
int PQCLEAN_DILITHIUM5AES_CLEAN_crypto_sign_signature(uint8_t *sig,
        size_t *siglen,
        const uint8_t *m,
        size_t mlen,
        const uint8_t *sk) {
    unsigned int n;
    uint8_t seedbuf[3 * SEEDBYTES + 2 * CRHBYTES];
    uint8_t *rho, *tr, *key, *mu, *rhoprime;
    uint16_t nonce = 0;
    polyvecl mat[K], s1, y, z;
    polyveck t0, s2, w1, w0, h;
    poly cp;
    shake256incctx state;

    rho = seedbuf;
    tr = rho + SEEDBYTES;
    key = tr + SEEDBYTES;
    mu = key + SEEDBYTES;
    rhoprime = mu + CRHBYTES;
    PQCLEAN_DILITHIUM5AES_CLEAN_unpack_sk(rho, tr, key, &t0, &s1, &s2, sk);

     /* Compute CRH(tr, msg) */
    shake256_inc_init(&state);
    shake256_inc_absorb(&state, tr, SEEDBYTES);
    shake256_inc_absorb(&state, m, mlen);
    shake256_inc_finalize(&state);
    shake256_inc_squeeze(mu, CRHBYTES, &state);
    shake256_inc_ctx_release(&state);

    shake256(rhoprime, CRHBYTES, key, SEEDBYTES + CRHBYTES);

    /* Expand matrix and transform vectors */
    PQCLEAN_DILITHIUM5AES_CLEAN_polyvec_matrix_expand(mat, rho);
    PQCLEAN_DILITHIUM5AES_CLEAN_polyvecl_ntt(&s1);
    PQCLEAN_DILITHIUM5AES_CLEAN_polyveck_ntt(&s2);
    PQCLEAN_DILITHIUM5AES_CLEAN_polyveck_ntt(&t0);

rej:
    /* Sample intermediate vector y */
    PQCLEAN_DILITHIUM5AES_CLEAN_polyvecl_uniform_gamma1(&y, rhoprime, nonce++);

    /* Matrix-vector multiplication */
    z = y;
    PQCLEAN_DILITHIUM5AES_CLEAN_polyvecl_ntt(&z);
    PQCLEAN_DILITHIUM5AES_CLEAN_polyvec_matrix_pointwise_montgomery(&w1, mat, &z);
    PQCLEAN_DILITHIUM5AES_CLEAN_polyveck_reduce(&w1);
    PQCLEAN_DILITHIUM5AES_CLEAN_polyveck_invntt_tomont(&w1);

    /* Decompose w and call the random oracle */
    PQCLEAN_DILITHIUM5AES_CLEAN_polyveck_caddq(&w1);
    PQCLEAN_DILITHIUM5AES_CLEAN_polyveck_decompose(&w1, &w0, &w1);
    PQCLEAN_DILITHIUM5AES_CLEAN_polyveck_pack_w1(sig, &w1);

    shake256_inc_init(&state);
    shake256_inc_absorb(&state, mu, CRHBYTES);
    shake256_inc_absorb(&state, sig, K * POLYW1_PACKEDBYTES);
    shake256_inc_finalize(&state);
    shake256_inc_squeeze(sig, SEEDBYTES, &state);
    shake256_inc_ctx_release(&state);
    PQCLEAN_DILITHIUM5AES_CLEAN_poly_challenge(&cp, sig);
    PQCLEAN_DILITHIUM5AES_CLEAN_poly_ntt(&cp);

    /* Compute z, reject if it reveals secret */
    PQCLEAN_DILITHIUM5AES_CLEAN_polyvecl_pointwise_poly_montgomery(&z, &cp, &s1);
    PQCLEAN_DILITHIUM5AES_CLEAN_polyvecl_invntt_tomont(&z);
    PQCLEAN_DILITHIUM5AES_CLEAN_polyvecl_add(&z, &z, &y);
    PQCLEAN_DILITHIUM5AES_CLEAN_polyvecl_reduce(&z);
    if (PQCLEAN_DILITHIUM5AES_CLEAN_polyvecl_chknorm(&z, GAMMA1 - BETA)) {
        goto rej;
    }

    /* Check that subtracting cs2 does not change high bits of w and low bits
     * do not reveal secret information */
    PQCLEAN_DILITHIUM5AES_CLEAN_polyveck_pointwise_poly_montgomery(&h, &cp, &s2);
    PQCLEAN_DILITHIUM5AES_CLEAN_polyveck_invntt_tomont(&h);
    PQCLEAN_DILITHIUM5AES_CLEAN_polyveck_sub(&w0, &w0, &h);
    PQCLEAN_DILITHIUM5AES_CLEAN_polyveck_reduce(&w0);
    if (PQCLEAN_DILITHIUM5AES_CLEAN_polyveck_chknorm(&w0, GAMMA2 - BETA)) {
        goto rej;
    }

    /* Compute hints for w1 */
    PQCLEAN_DILITHIUM5AES_CLEAN_polyveck_pointwise_poly_montgomery(&h, &cp, &t0);
    PQCLEAN_DILITHIUM5AES_CLEAN_polyveck_invntt_tomont(&h);
    PQCLEAN_DILITHIUM5AES_CLEAN_polyveck_reduce(&h);
    if (PQCLEAN_DILITHIUM5AES_CLEAN_polyveck_chknorm(&h, GAMMA2)) {
        goto rej;
    }

    PQCLEAN_DILITHIUM5AES_CLEAN_polyveck_add(&w0, &w0, &h);
    n = PQCLEAN_DILITHIUM5AES_CLEAN_polyveck_make_hint(&h, &w0, &w1);
    if (n > OMEGA) {
        goto rej;
    }

    /* Write signature */
    PQCLEAN_DILITHIUM5AES_CLEAN_pack_sig(sig, sig, &z, &h);
    //this line is the one that gives the segmentation fault. fault is something with an invalid write somehow.
   *siglen = PQCLEAN_DILITHIUM5AES_CLEAN_CRYPTO_BYTES;
    return 0;
}

/*************************************************
* Name:        PQCLEAN_DILITHIUM5AES_CLEAN_crypto_sign
*
* Description: Compute signed message.
*
* Arguments:   - uint8_t *sm: pointer to output signed message (allocated
*                             array with PQCLEAN_DILITHIUM5AES_CLEAN_CRYPTO_BYTES + mlen bytes),
*                             can be equal to m
*              - size_t *smlen: pointer to output length of signed
*                               message
*              - const uint8_t *m: pointer to message to be signed
*              - size_t mlen: length of message
*              - const uint8_t *sk: pointer to bit-packed secret key
*
* Returns 0 (success)
**************************************************/
int PQCLEAN_DILITHIUM5AES_CLEAN_crypto_sign(uint8_t *sm,
        size_t *smlen,
        const uint8_t *m,
        size_t mlen,
        const uint8_t *sk) {
    size_t i;
    printf("I got in");
    for (i = 0; i < mlen; ++i) {
        sm[PQCLEAN_DILITHIUM5AES_CLEAN_CRYPTO_BYTES + mlen - 1 - i] = m[mlen - 1 - i];
    }
    printf("for loop works");
    PQCLEAN_DILITHIUM5AES_CLEAN_crypto_sign_signature(sm, smlen, sm + PQCLEAN_DILITHIUM5AES_CLEAN_CRYPTO_BYTES, mlen, sk);

    *smlen += mlen;
    return 0;
}

/*************************************************
* Name:        PQCLEAN_DILITHIUM5AES_CLEAN_crypto_sign_verify
*
* Description: Verifies signature.
*
* Arguments:   - uint8_t *m: pointer to input signature
*              - size_t siglen: length of signature
*              - const uint8_t *m: pointer to message
*              - size_t mlen: length of message
*              - const uint8_t *pk: pointer to bit-packed public key
*
* Returns 0 if signature could be verified correctly and -1 otherwise
**************************************************/
int PQCLEAN_DILITHIUM5AES_CLEAN_crypto_sign_verify(const uint8_t *sig,
        size_t siglen,
        const uint8_t *m,
        size_t mlen,
        const uint8_t *pk) {
    unsigned int i;
    uint8_t buf[K * POLYW1_PACKEDBYTES];
    uint8_t rho[SEEDBYTES];
    uint8_t mu[CRHBYTES];
    uint8_t c[SEEDBYTES];
    uint8_t c2[SEEDBYTES];
    poly cp;
    polyvecl mat[K], z;
    polyveck t1, w1, h;
    shake256incctx state;

    if (siglen != PQCLEAN_DILITHIUM5AES_CLEAN_CRYPTO_BYTES) {
        return -1;
    }

    PQCLEAN_DILITHIUM5AES_CLEAN_unpack_pk(rho, &t1, pk);
    if (PQCLEAN_DILITHIUM5AES_CLEAN_unpack_sig(c, &z, &h, sig)) {
        return -1;
    }
    if (PQCLEAN_DILITHIUM5AES_CLEAN_polyvecl_chknorm(&z, GAMMA1 - BETA)) {
        return -1;
    }

    /* Compute CRH(H(rho, t1), msg) */
    shake256(mu, SEEDBYTES, pk, PQCLEAN_DILITHIUM5AES_CLEAN_CRYPTO_PUBLICKEYBYTES);
    shake256_inc_init(&state);
    shake256_inc_absorb(&state, mu, SEEDBYTES);
    shake256_inc_absorb(&state, m, mlen);
    shake256_inc_finalize(&state);
    shake256_inc_squeeze(mu, CRHBYTES, &state);
    shake256_inc_ctx_release(&state);

    /* Matrix-vector multiplication; compute Az - c2^dt1 */
    PQCLEAN_DILITHIUM5AES_CLEAN_poly_challenge(&cp, c);
    PQCLEAN_DILITHIUM5AES_CLEAN_polyvec_matrix_expand(mat, rho);

    PQCLEAN_DILITHIUM5AES_CLEAN_polyvecl_ntt(&z);
    PQCLEAN_DILITHIUM5AES_CLEAN_polyvec_matrix_pointwise_montgomery(&w1, mat, &z);

    PQCLEAN_DILITHIUM5AES_CLEAN_poly_ntt(&cp);
    PQCLEAN_DILITHIUM5AES_CLEAN_polyveck_shiftl(&t1);
    PQCLEAN_DILITHIUM5AES_CLEAN_polyveck_ntt(&t1);
    PQCLEAN_DILITHIUM5AES_CLEAN_polyveck_pointwise_poly_montgomery(&t1, &cp, &t1);

    PQCLEAN_DILITHIUM5AES_CLEAN_polyveck_sub(&w1, &w1, &t1);
    PQCLEAN_DILITHIUM5AES_CLEAN_polyveck_reduce(&w1);
    PQCLEAN_DILITHIUM5AES_CLEAN_polyveck_invntt_tomont(&w1);

    /* Reconstruct w1 */
    PQCLEAN_DILITHIUM5AES_CLEAN_polyveck_caddq(&w1);
    PQCLEAN_DILITHIUM5AES_CLEAN_polyveck_use_hint(&w1, &w1, &h);
    PQCLEAN_DILITHIUM5AES_CLEAN_polyveck_pack_w1(buf, &w1);

    /* Call random oracle and verify PQCLEAN_DILITHIUM5AES_CLEAN_challenge */
    shake256_inc_init(&state);
    shake256_inc_absorb(&state, mu, CRHBYTES);
    shake256_inc_absorb(&state, buf, K * POLYW1_PACKEDBYTES);
    shake256_inc_finalize(&state);
    shake256_inc_squeeze(c2, SEEDBYTES, &state);
    shake256_inc_ctx_release(&state);
    for (i = 0; i < SEEDBYTES; ++i) {
        if (c[i] != c2[i]) {
            return -1;
        }
    }

    return 0;
}

/*************************************************
* Name:        PQCLEAN_DILITHIUM5AES_CLEAN_crypto_sign_open
*
* Description: Verify signed message.
*
* Arguments:   - uint8_t *m: pointer to output message (allocated
*                            array with smlen bytes), can be equal to sm
*              - size_t *mlen: pointer to output length of message
*              - const uint8_t *sm: pointer to signed message
*              - size_t smlen: length of signed message
*              - const uint8_t *pk: pointer to bit-packed public key
*
* Returns 0 if signed message could be verified correctly and -1 otherwise
**************************************************/
int PQCLEAN_DILITHIUM5AES_CLEAN_crypto_sign_open(uint8_t *m,
        size_t *mlen,
        const uint8_t *sm,
        size_t smlen,
        const uint8_t *pk) {
    size_t i;

    if (smlen < PQCLEAN_DILITHIUM5AES_CLEAN_CRYPTO_BYTES) {
        goto badsig;
    }

    *mlen = smlen - PQCLEAN_DILITHIUM5AES_CLEAN_CRYPTO_BYTES;
    if (PQCLEAN_DILITHIUM5AES_CLEAN_crypto_sign_verify(sm, PQCLEAN_DILITHIUM5AES_CLEAN_CRYPTO_BYTES, sm + PQCLEAN_DILITHIUM5AES_CLEAN_CRYPTO_BYTES, *mlen, pk)) {
        goto badsig;
    } else {
        /* All good, copy msg, return 0 */
        for (i = 0; i < *mlen; ++i) {
            m[i] = sm[PQCLEAN_DILITHIUM5AES_CLEAN_CRYPTO_BYTES + i];
        }
        return 0;
    }

badsig:
    /* Signature verification failed */
    *mlen = (size_t) -1;
    for (i = 0; i < smlen; ++i) {
        m[i] = 0;
    }

    return -1;
}

Thank you btw for helping me!

Jason C. McDonald • Apr 6 '21 • Edited

The problem isn't always at the end of the stack trace, especially where memory management is involved. I don't think the library is at fault.

Actually, I think the problem is somewhere in Signature.cpp. Can you post the function in that file that includes line 73?

==6225==    by 0x1096D2: Signature::signMessage(unsigned char*, unsigned long*, unsigned char const*, unsigned long) (Signature.cpp:73)

Lindenouwen • Apr 7 '21

Okay, The Signature.cpp function in which this happens is a rather short one:

#include "Signature.hpp"
#include <cstdint>
#include <iostream>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <cstddef>
#include <cryptopp/sha3.h>
#include <cryptopp/cryptlib.h>
#include <cryptopp/hex.h>
#include <cryptopp/filters.h>
#include <cryptopp/sha.h>
#include <cryptopp/hex.h>
#include <cryptopp/files.h>

using namespace CryptoPP;

extern "C"{
     #include "api.h"
    }
using namespace std;
/*************************************************
* Name:        signMessage
*
* Description: Computes the signed message
*
* Arguments:   - uint8_t *sm: pointer to output signed
*                             message, can be equal to m
*              - size_t *smlen: pointer to output length
*                               of signed message
*              - const uint8_t *m: pointer to message to be signed
*              - size_t mlen: length of message
*
* Extra information: -*sk: pointer to secret key
*
* No Return value
**************************************************/
void Signature::signMessage(uint8_t* signedMessage, std::size_t* smLength, const uint8_t* message, std::size_t mLength){
    // if (checkPasswd(passwd)==true){
        const uint8_t *sk = getSecretKeyAddress();
        cout << "I get here" << endl;
        PQCLEAN_DILITHIUM5AES_CLEAN_crypto_sign(signedMessage, smLength, message, mLength, sk);
        cout << "not here" << endl;
    // }
}

PQCLEAN_DILITHIUM5AES_CLEAN_crypto_sign is the c function from the library
So I think the problem might be even further back, in main.cpp, but I cannot figure out what I am doing wrong.
The main.cpp is written as follows:

#include "Signature.hpp"
#include "Signature.cpp"
#include <cstdint>
#include <iostream>
#include <cstddef>
#include <cstring>
using namespace std;
extern "C"{
     #include "api.h"
    }
int main(){
    char publicKey[PQCLEAN_DILITHIUM5AES_CLEAN_CRYPTO_PUBLICKEYBYTES];
    char secretKey[PQCLEAN_DILITHIUM5AES_CLEAN_CRYPTO_SECRETKEYBYTES];
    uint8_t* pk;
    pk = (uint8_t *) &publicKey;
    uint8_t* sk;
    sk = (uint8_t *) &secretKey;
    // string passwd = "admin";
    Signature sig(pk, sk);
    sig.generateKeyPair();
    cout << "I succeeded in generating a key pair." << endl;
    string mes = "I am checking if this works";
    size_t mlen = mes.length();
    char message[mlen+1];
    strcpy(message, mes.c_str());
    uint8_t* m;
    m = (uint8_t*) &message;
    mlen = sizeof(*m);
    char signedMessage[PQCLEAN_DILITHIUM5AES_CLEAN_CRYPTO_BYTES+mlen];
    uint8_t* sm;
    sm = (uint8_t*) &signedMessage;
    size_t smLength = sizeof(*sm);
    size_t* smlen;
    smlen = (size_t*) smLength;
    sig.signMessage(sm, smlen, m, mlen);
    cout << "I succeeded in signing a message." << endl;
    cout << *sm << endl;
    return 0;
}

Jason C. McDonald • Apr 7 '21 • Edited

I think I found it. The problem is in main.cpp:

    size_t smLength = sizeof(*sm);
    size_t* smlen;
    smlen = (size_t*) smLength;
    sig.signMessage(sm, smlen, m, mlen);

There are two problems here.

sizeof(*sm) gives you the length of the value pointed to by sm. This will not give you the length of the string, but rather the size of the first item in the string/array, being a character. In other words, the value here is 1, regardless of how long the string starting at pointer sm actually is. I'll come back to this.
Even if smLength were correct, (size_t*) smLength; is not going to give you the address to smLength for storing in the pointer. Instead, this is casting, or reinterpreting, the value, the integer 1 in this case, as a pointer. But what's at pointer 0x00000001? Who knows. This is known as a wild pointer.

If you look throughout your code, you'll see these same mistakes occur other times, such as with mlen.

Don't take this as a criticism, but I'm dubious about where you learned to use pointers. This is pretty much a royal hash. If you learned this from an online tutorial, example, or article, I'd strongly recommend avoiding it in the future. (Check to make sure you didn't just misunderstand first.)

The reason for the segfault is that wild pointer in #2. It's looking in the wrong place for a value, and it isn't finding it. You're lucky it failed with a segfault; it could have read in literally anything that was stored at that address in memory, whether it was coincidentally correct, subtly wrong, or absolutely bonkers. These sorts of bugs are really unpredictable like that.

Here's the operations you're misunderstanding in this code:

Getting a Pointer

To get the address of a value, for storing in a pointer, use the & operator:

size_t myValue = 42;
size_t* ptrToMyValue = &myValue;

Getting the length of a c-string (char array)

#include <string.h>

// All three of these are valid ways to declare a c-string
char* stringA = "Hi";
char stringB[] = "there";
char stringC[20] = "Lindenouwen";

size_t lengthA = strlen(stringA);  // 2
size_t lengthB = strlen(stringB);  // 5
size_t lengthC = strlen(stringC); // 11 (NOT 20!)

By the way...

One more pro-tip: stop using the using namespace std; trick. It's an antipattern; in production code, it's all too easy to lose track of what-comes-from-where. Namespaces exist to reduce that confusion, but using namespace negates that namespace. Most examples use it for brevity; good production code never does.

Instead, explicitly spell out namespace each time:

#include <iostream>
#include <string>

std::string greeting = "Hello, world!";
std::cout << greeting << std::endl;

Save yourself weeks of headaches and refactoring now. Never use using namespace again.

Lindenouwen • Apr 8 '21

Thanks for your help!
I think it works now :p

Jason C. McDonald • Apr 8 '21

Excellent!

You will want to get into the habit of always testing your code thoroughly in Valgrind and addressing everything it complains about; even memory leaks. Undefined behavior has a habit of hiding until the most inconvient and unexpected moment. Write tests for your code (you should be doing that anyway!), and then execute those tests both outside of and within Valgrind.

NatePolizogo • Nov 30 '19

Hey i write a program about TSP ant colony optimization and i get a segmentation fault when i compile and run the code, but when i debug it through gdb the program just runs flawlessly. What am i missing ?

Jason C. McDonald • Nov 30 '19 • Edited

You are dealing with what is called a Heisenbug, which is a bug, usually undefined behavior, whose behavior disappears when using debugging tools.

The first thing you should do is run the program through Valgrind (valgrind ./myprogram). Ideally, you should do this on the Debug version of your program (compiler flag -g). This may provide you information on what memory errors exist in your code, and where they are in the source. Fix everything Valgrind complains about.

However, if after doing that, you're still segfaulting, and even Valgrind can't pick up on any more errors, you're in for a bigger fight.

Start by reading my popular Stack Overflow Q&A Definitive List of Common Reasons for Segmentation Faults. This will attune your programming sense to what to look out for.

(I didn't include my personal favorite in that list: lambdas returning references can cause some particularly nasty undefined behavior.)

If you have an idea of when the segmentation fault occurs functionally, that can help you figure out what function(s) may be involved. If you can, try to create a Minimum Reproducable Example that has the segfault.
Print off the problem area of the code on paper. Desk check it with a red pen and a pad of paper. This means you act as the compiler, running the code mentally, and noting the value of each variable. I've caught a number of bugs this way.
If you're desperate, you can run the Release target of the program through Valgrind, although this will give you raw memory addresses instead of line numbers and file names. If you're very clever with a disassembler like Nemiver, and know how to read assembly code, you may be able to work backwards to isolate the problem. However, this is extremely hard; it will help a lot if you can do this with your Minimum Reproducible Example instead of the full program.

Good luck!

NatePolizogo • Dec 1 '19

I think I kinda located the problem but i cant understand why is this happening. As you can see at the image above i for some reason decides to be whatever value it want's despite the fact that it is in a for loop.

NatePolizogo • Dec 1 '19

thepracticaldev.s3.amazonaws.com/i...

Jason C. McDonald • Dec 1 '19

for some reason decides to be whatever value it want's despite the fact that it is in a for loop.

This means it is reading from uninitialized memory. Common reasons for this:

You declared a variable, or dynamically allocated memory, but never initialized the memory with a value.
You are using a pointer (or reference) to either a position in memory which has already been freed (dangling pointer/reference), or which has never been allocated (wild pointer/reference). This can happen with either the heap or the stack; it's not limited to dynamic allocation.
You are exceeding the boundaries of an array or string (buffer overrun).

bluhm-alexander • Sep 8 '18

Have you ever worked with the Motorola 68000. I really like that CPU. In your opinion do you think assembly language is still best for super low level hardware or do you think C is on par with assembly code?

Jason C. McDonald • Sep 8 '18

Ironically, I just added 68K Assembly to my list of languages to learn soon! I have a TI-89 calculator (Motorola 68000), and dearly want to play with it.

Up to this point, my assembly work has been largely limited to the X86 and X64 languages, in the context of Intel and AMD processors.

C is actually further up the stack than people think, and it isn't always the best choice for a given architecture. If you need total control, Assembly will always give that to you far and beyond any other language.

However, Assembly is also a pain in the butt (if an endearing one to certain classifications of nerds such as myself). If you have access to a higher level language that is reasonably optimized for that platform, and you don't need ultimate control, use it instead of Assembly.

In other words, "just because we can doesn't mean we should." If you can't make a reasoned argument for the language you're using, you're probably using the wrong language. :)

bluhm-alexander • Sep 11 '18

Thank you for the reply, I always value getting a second opinion. The reason I'm asking this question is because I am building a game on the Sega Genesis and I've been using A C compiler to do it.

So far it hasn't been an issue because the C compiler was built for the Sega Genesis and it has a lot of nifty features to take advantage of the hardware features such as DMA. More importantly it has sound drivers which are incredibly useful because I do not want to go around writing my own Sound Driver because I am not experienced with writing such a program.

I have recently run into a few short comings with the compiler. First and foremost being that the routines I've written in C don't seem to load as fast onto the screen as compared to Assembly.

I think I will compromise by writing my screen drawing routines in Assembly and then including them in my C code. I think that would be best for me because then I would have access to features in the C compiler as well as having access to the speed of Assembly. The problem is that I am not experienced with Assembly code. Fortunately for me, 68k assembly seems to be the easiest Assembly to learn.

By the way the C compiler I'm using is called SGDK (Sega Genesis Development Kit)

What do you think about mixing languages, is it something to be avoided?

Jason C. McDonald • Sep 12 '18

It really depends on the languages!

There's no trouble combining C and Assembly; ultimately, C is compiled down to Assembly, at which point any Assembly code you wrote outright is just inserted in. Then, the whole thing is assembled down to binary on that particular platform.

However, you can run into varying degrees of performance issues when mixing other languages. It has to be taken on a case-by-case basis.

Bravo on making a game for Sega Genesis! Keep us posted on dev.to how that goes.

I highly recommend picking up "Game Engine Architecture" by Jason Gregory. It addresses many of the issues you're facing, and hundreds more besides, from a C and C++ perspective. He even talks about console development.

shrivp1 • Jul 2 '21

Hi Jason,

If we encounter a segfault with error code 4 or any other such error code -

localhost kernel: [139154.090095] xxxxx_process[11909]: segfault at 21 ip 00007ff5704b5254 sp 00007ff556bbbb98 error 4 in libmpi_global-release.so[7ff56c832000+6eee000]

However we see no assertions/errors and no core dumps been generated, how do we go along to debug such issues ?

Core dump configuration been verified and is correct -
Limit Soft Limit Hard Limit Units
Max core file size unlimited unlimited bytes

also the flag to generate full cores is been enabled !!!!

Jason C. McDonald • Jul 2 '21

In almost all cases, it's very hard to debug a segfault without compiling the code in question with debug symbols (-g) and running it through Valgrind.

Core dumps are just snapshots of the raw memory when the program crashed, and will seldom provide any clues unless you are very familiar with the entire raw memory layout of your program.

shrivp1 • Jul 3 '21

Thank you for your response !!!
The problem here is we are facing this issue specifically in our client environment, we cannot reproduce this issue in house to try compile our code using -g or use valgrind by attaching it to a process

Using valgrind would add performance overhead in customer environment, so it's not a viable option, any other means to track this live on a client environment for a particular process ?

Jason C. McDonald • Jul 3 '21 • Edited

Off the top of my head, I don't know of any practical ways to debug a segfault in production like you describe. You could use logs and observations to determine what behavior(s) precede the segfault, and use that to focus in on part of your code base.

Meanwhile, your best bet would be to try and isolate what's different about their environment versus your test environment, and try and replicate it.

In any case, this won't be easy. This roadblock you're running into is exactly why it is so often said "if we can't reproduce it, we can't fix it".

shrivp1 • Jul 5 '21

Thank you Jason for the advise..
Will see if we can try to identify a diff in production env and in general..
Logs weren't much helpful to logically conclude in this case

Comment deleted

Jason C. McDonald • May 19 '20 • Edited

I'm not as well acquainted with assembly as I would like yet, but here's my first thought: are you absolutely certain of the size (in byes) of memory addresses on your machine? This looks like you're blowing past the end of program memory.

klapauciusisgreat • May 19 '20

Yes, I'm sure. However, I made some progress:

in the original 32 bit code, I had an instruction like:

        mov $cold_start,%esi    // Initialise interpreter.

that would work on macos (32 bit)

I found a 64 bit port somewhere that was instead using

    mov $cold_start,%rsi    // Initialise interpreter.

which is what I expected, but the apple clang assembler does not like this syntax because in 64 bit mode I have to use position independent addressing modes.

So I tried

mov cold_start(%rip),%rsi       // Initialise interpreter.

but it seems that derefences cold_start instead of just putting the address in. Using $cold_start(%rip) gives errors.

I guess I just don't understand the apple assembler syntax esp for 64 bit code. Looking ...

edA‑qa mort‑ora‑y • Sep 8 '18

Given that you're writing a fairly low-level, perhaps wait-free concurrent algorithm, it's possible that you have a bug that trips up about every millionth execution of the code. It's a race condition that corrupts a vital structure. Any attempt to trace upsets the condition leading to the error, thus making it go away. Your only choice is a tedious hand execution and logical reasoning.

My question is, how do you avoid throwing the computer out the window?

Jason C. McDonald • Sep 8 '18 • Edited

Avoid? I love tedious hand execution and logical reasoning! (No, seriously.) One of my absolute favorite things to do in programming is to print off the source, sit down with a pen, a hot beverage, a blank notebook, and a jazz soundtrack...and then spend the next hour or three just desk-checking the entire thing.

Mmmmmmmmmmmmmmmmmmmmmm, bliss. ^{,^}

Why do you think I specialized in memory management and undefined behavior? I ADORE it!

Now, if you don't have my particular mental condition, and actually don't enjoy desk-checking for Heisenbugs, my advice is this: get off the computer. Print off the source, cozy up in your favorite chair in a relaxing environment, and desk-check it.

Likai Liu • Sep 10 '18

I also recommend writing unit tests that makes the race condition more likely to happen. For example, if the code normally runs with < 10 threads, test it with 1000 threads. Sometimes code is well-behaved when the data entered are far apart, so try testing with consecutive values. If it's the opposite, test with random values.

What I learned over the years is that race-freedom is not composeable: code using several mutexes incorrectly could still suffer race condition, even though a single mutex is race-free on its own. When testing wait-free algorithms, start with very small primitives and gradually add onto it. And write plenty of assert() on the non-volatile local variables of the shared volatile variables the code might be using. When assert triggers under the debugger, you'll be able to see which invariants are violated in that snapshot.

AlejandroSilvestri • Jun 9 '19

Hi there, I am pleasantly surprised to discover this site.

Some people using my code had segmentation fault, I'm looking for a way to generate console output these people can copy and send me, so I can track down the issue.

When SEGFAULT happens, game is over. I'd like to capture this, inspect some variables and cout them before exiting, something like try catch. But I believe cout after SEGFAULT leads into undefined behaviour, so...

Any suggestion? Thank you.

Jason C. McDonald • Jun 9 '19

Unfortunately, it is not possible to "catch" a segfault, nor continue program behavior safely (if at all) after it has been raised. Therefore, you have to take the opposite approach, and log everything that happened leading up to the segfault.

You can also have your tester describe (or screen record, especially if it's a game) what happened leading up to the segmentation fault. Then, you should be able to replicate that on your own machine.

Mind you, "replication" won't necessarily mean you can recreate the segmentation fault itself, since it's one of an infinite number of possible behaviors in response to some illegal memory action your code is taking (ergo "it is legal for the compiler to make demons fly out of your nose"). That's what it meant by undefined behavior. However, by replicating the same steps as your tester while running the Debug build of the application (compiled with -g) under Valgrind, you should be able to catch the problem.

There is also a more proactive approach you can take, especially if you're using C++: modernize your code base. Refactor the code - by hand mind you, NOT by using find-and-replace or some other automated tool - to make use of smart pointers like std::unique_ptr and std::shared_ptr instead of raw pointers, new, and delete. This will eliminate most memory errors, since the smart pointers handle object lifetime and whatnot (formally known as RAII). Refactoring is not a "quick fix", but it's the most resilient fix.

AlejandroSilvestri • Jun 9 '19

Thank you very much. You confirmed my approach is right: cout everything!

In my specific case, my code is appended to third party code where the segfault happens, so it's hard to trace and I don't even have the chance to fix.

Jason C. McDonald • Jun 9 '19 • Edited

At the risk of self-promotion, I wrote something called IOChannel which is designed to better control cout-style logging, based on category and priority. You can also route messages to different places, including to functions that will write them out to a file instead of printing them to the console. It's part of PawLIB, which is still in development, but 1.0 is stable. (Yes, totally open source)

humanoidevolador • Jul 20 '19

Hi Jason, pleased to meet you.
I'd like to present you a question about C memory management, if you were so nice. In short, it's about a program that lets you input or random generate a set of 5 numbers between 1 and 50, and 2 numbers between 1 and 12 (EuroMillions lottery). Then it can keep rolling over and over (more than 139 million times on average) until it gets the same set of numbers. The issue is that, despite using dynamic allocation and de allocation for structures, as code executes it would eventually exhaust the RAM memory until the process gets killed by the system to avoid stall.
I've tried a lighter version of the code (just 5 numbers from 1 to 48, 1.7 million loops on average) with no problems, and ensured the code is actually recycling the memory used with each pass. Any ideas of what could be wrong with it?
Thanks in advance, regards.

Jason C. McDonald • Jul 20 '19

I'd really need to see some code to be able to debug this, but here's the first two things I'd look for:

Double check that things are actually deallocating; you'd be amazed at how often one thinks they have free'd memory when they haven't. You may be able to run it through Valgrind or another dynamic analyzer to check that. (It sounds like you've done that, though.)
You may have some other variable you didn't think about, either on the stack or on the heap.

If it wouldn't be too much trouble, can you put the code in a GitHub Gist or another paste bin? I might be able to catch the problem better if I read it.

humanoidevolador • Jul 20 '19

Thank you so much for your quick and kind response Jason! I will paste the code to GitHub and share here a.s.a.p, I'm a newbie to it, as well as to many other things. Will take your advice and check out what you suggested.
We'll keep on touch.
Regards!

radhika20padia • May 6 '21

Hi
I have a multi threaded C++ application. Its a read write case i.e. T1 write/mallocs a char* and T2 reads and takes actions according.
Can relloc result in segfault? I donot want to use free as that can result in segfault when the other thread tries to read. Does relloc also comes with the same uncertainty for segfault?

Jason C. McDonald • May 6 '21

First, the deeper problem: if you're using C++, why are you using realloc? This is C-style memory management, which certainly has its place, even in modern C++ code...but very very rarely. In almost all cases, it's better to use std::shared_ptr or std::unique_ptr.

That said, realloc is perhaps one of the few reasons to use C-style allocation, but only if you need it for optimization purposes. (Remember, premature optimization is the root of all evil!)

For the rest of the answer, I'll assume you have a very good, well-reasoned argument for using realloc and C-style memory management in C++. If you don't, stop right now and rethink your code in terms of modern C++ memory management. It's treacherous using free/malloc/calloc/realloc, and even new/delete, without having a very clear and well-defined architectural reason.

realloc comes from the C language, so the best way to find out what it does, and if there is any potential undefined behavior, is to check the official standard.

(You can purchase the official standard, but if you're anything like me, you don't want to drop nearly a hundred quid just to read the thing; you can just read the final draft instead, which is pretty much the same but for a few minor corrections. You can find the official PDF here.)

So, here's a few important pieces from 7.22.3.5...

The realloc function deallocates the old object pointed to by ptr and returns a pointer to a new object that has the size specified by size.

There's the answer to your first question. realloc will free the old pointer, the same as free does. Thus, if you access/dereference that freed pointer after calling realloc on it, the behavior is undefined. A segfault is just one possible form of undefined behavior.

The contents of the new object shall be the same as that of the old object prior to deallocation, up to the lesser of the new and old sizes. Any bytes in the new object beyond the size of the old object have indeterminate values.

This is also an area of potential weirdness! If you realloc with a larger size, the new space might contain literally anything. In other words, if you start with an array of four integers, and realloc it to an array of eight integers, indices [3] through 7 is not guaranteed to be set to 0, but could be anything. Thus, you'll have to be careful to initialize the new values in those spaces before reading from them.

If ptr is a null pointer, the realloc function behaves like the malloc function for the specified size.

From this, we know that passing a null pointer to realloc is not going to be a problem; it'll just behave like malloc would. Good to know.

Otherwise, if ptr does not match a pointer earlier returned by a memory management function, or if the space has been deallocated by a call to the free or realloc function, the behavior is undefined.

Don't pass a dangling or wild pointer to realloc; that would result in undefined behavior, such as (but not necessarily being) a segmentation fault.

If memory for the new object cannot be allocated, the old object is not deallocated and its value is unchanged.

This is interesting! If you pass a pointer to realloc, but it cannot reallocate for any reason, it will not free the pointer. How do we know if it could reallocate? Read on...

The realloc function returns a pointer to the new object (which may have the same value as a pointer to the old object), or a null pointer if the new object could not be allocated.

If there's a problem allocating, realloc will return a null pointer, indicating that the pointer we passed in was not freed, and thus is still valid and safe to use.

Also, important note about assumptions here: the new, reallocated pointer might actually be the same as the old pointer. Or it might not.

All that said, you should use realloc with the following assumptions:

You can pass a valid pointer, or a null pointer, to realloc.
After reallocation, any non-null pointer passed to realloc will be freed.
If realloc returns a null pointer, it couldn't reallocate, and you can keep using the pointer you passed in (it was not freed.)
If realloc returns a pointer, it's valid, regardless of whether it seems to be the same as the one you passed in.

radhika20padia • May 8 '21

Thanks for the response. I solved the problem in a very simple manner.
Your detailed reponse really helped me think in the right direction.

View full discussion (162 comments)

DEV Community

I'm an Expert in Memory Management & Segfaults, Ask Me Anything!

Top comments (161)

include "Signature.hpp"

include

include

Getting a Pointer

Getting the length of a c-string (char array)

By the way...

Read next

How to build: a v0.dev clone (Next.js, GPT4 & CopilotKit)

Can someone help me with the coding here?

Difference between technology and methodology - Dgi Host.com

Soft Delete using Entity Framework Core