DEV Community

Dmitry Daw


How to fix a segfault in Ruby

Let's say you got a "Segmentation fault" error in Ruby:

[BUG] Segmentation fault at 0x0000000000000028
ruby 3.4.0preview1 (2024-05-16 master 9d69619623) [x86_64-linux-musl]

-- Machine register context ------------------------------------------------
 RIP: 0x00007fefe4cd4886 RBP: 0x0000000000000001 RSP: 0x00007fefc95d3a10
 RAX: 0x0000000000000001 RBX: 0x00007fefc94212e0 RCX: 0x00007fefc95d0b70
 RDX: 0x0000000000000010 RDI: 0x0000000000000000 RSI: 0x00007fefc95d08f0
  R8: 0x0000000000000000  R9: 0x0000000000000000 R10: 0x0000000000000000
 R11: 0x0000000000000217 R12: 0x00007fefc9421340 R13: 0x00007fff5a0ec750
 R14: 0x00007fefe4649b10 R15: 0x00007fefc95d3b38 EFL: 0x0000000000010202

-- Other runtime information -----------------------------------------------

...
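Before reaching for a debugger, it is worth looking at the fault address itself. As a rough sketch (the helper names fault_address and near_null? and the 4096-byte page size are assumptions made for illustration, not part of Ruby), you can pull the address out of the [BUG] line and check whether it falls inside the first memory page, which usually hints at a NULL pointer plus a small field offset:

```ruby
# Assumption: 4096 bytes is the typical page size on x86_64 Linux.
PAGE_SIZE = 4096

# Hypothetical helper: extract the faulting address from a "[BUG]" line.
# Returns nil when the line does not look like a segfault report.
def fault_address(bug_line)
  match = bug_line.match(/Segmentation fault at (0x\h+)/)
  match && match[1].hex
end

# Hypothetical helper: an address inside the first page is almost
# certainly "NULL + small offset" rather than a valid pointer.
def near_null?(address)
  !address.nil? && address < PAGE_SIZE
end

line = "[BUG] Segmentation fault at 0x0000000000000028"
addr = fault_address(line)
puts format("address=%d near_null=%s", addr, near_null?(addr))
# prints "address=40 near_null=true"
```

An offset of 40 bytes is what you would expect when code reads a field some way into a struct through a NULL pointer.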

The fault address 0x0000000000000028 is very close to zero, which suggests that a NULL pointer was dereferenced, but that alone doesn't tell us much. To get more information, run the program under gdb:

/app # gdb -q --args ruby test.rb

(gdb) 

Now we are in the debugger. To start the program, type run:

(gdb) run
Starting program: /usr/local/bin/ruby test.rb
warning: Error disabling address space randomization: Operation not permitted
[New LWP 36]
[New LWP 37]
[New LWP 38]
execution expired

Thread 4 "ruby" received signal SIGSEGV, Segmentation fault.
[Switching to LWP 38]
0x00007f0a2c33b886 in freeaddrinfo (p=0x0) at src/network/freeaddrinfo.c:10
warning: 10     src/network/freeaddrinfo.c: No such file or directory

Okay, the crash happens inside the system's libc, in src/network/freeaddrinfo.c.
Let's get a backtrace and check the arguments:

(gdb) bt
#0  0x00007f0a2c33b886 in freeaddrinfo (p=0x0) at src/network/freeaddrinfo.c:10
#1  0x00007f0a10c1e940 in do_getaddrinfo (ptr=0x7f0a10f61200) at raddrinfo.c:426
#2  0x00007f0a2c35c349 in start (p=0x7f0a10afaa88) at src/thread/pthread_create.c:207
#3  0x00007f0a2c35e95f in __clone () at src/thread/x86_64/clone.s:22
Backtrace stopped: frame did not save the PC

(gdb) info args
p = 0x0

Now we see that freeaddrinfo was called from Ruby's raddrinfo.c, and that its argument p is NULL.

We can also inspect the local variables and move up the stack:

(gdb) info locals
cnt = 1
b = <optimized out>

(gdb) frame 1
#1  0x00007f1068ec6940 in do_getaddrinfo (ptr=0x7f1068cf0c40) at raddrinfo.c:426
warning: 426    raddrinfo.c: No such file or directory

(gdb) info args
ptr = 0x7f1068cf0c40

(gdb) info locals
arg = 0x7f1068cf0c40
err = <optimized out>
gai_errno = <optimized out>
need_free = 0

Now we're ready to look at what is happening in the code. Let's check raddrinfo.c, line 426:

// ext/socket/raddrinfo.c
...
        if (arg->cancelled) {
            freeaddrinfo(arg->ai);
        }
...

Indeed, freeaddrinfo is called there.
Now you could report a bug at https://bugs.ruby-lang.org/, or try to debug it yourself.

I've tried :)

We're on Alpine, using the ruby:3.3.3-alpine image. On a different system, e.g. the ruby:3.3.3 image, everything works, so the problem should be specific to Alpine.

A quick search confirms it: freeaddrinfo in Alpine's musl library does not accept a NULL pointer (link), unlike glibc (which is used, e.g., in Ubuntu) (link).
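If you are not sure which libc your Ruby was built against, RUBY_PLATFORM is a quick hint: the crash report above shows x86_64-linux-musl. A minimal sketch (the helper name musl_platform? is made up for illustration, and the check is only a heuristic based on the platform string):

```ruby
# Hypothetical helper: guess whether this Ruby was built against musl
# by looking at the platform string, e.g. "x86_64-linux-musl".
def musl_platform?(platform = RUBY_PLATFORM)
  platform.include?("musl")
end

puts "#{RUBY_PLATFORM}: musl=#{musl_platform?}"
```

On a glibc-based image the platform string typically has no musl suffix, so the helper returns false there.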

So let's fix it.

First we need to build Ruby. For convenience, let's create a small Dockerfile:

FROM alpine:3.20

WORKDIR /usr/src/app

RUN apk update && apk add autoconf gcc build-base ruby ruby-dev openssl openssl-dev yaml-dev zlib-dev yaml gdb

CMD sh

Clone Ruby:

$ git clone --depth 1 git@github.com:ruby/ruby.git
$ cd ruby

Build the image and go inside our Alpine Docker container:

$ docker build -t my-ruby-develop .
$ docker run -it --rm -v $(pwd):/usr/src/app -w /usr/src/app  my-ruby-develop sh

Then build the latest Ruby version to check that the bug is still present:

$ ./autogen.sh
$ mkdir build && cd build
$ mkdir rubies
$ ../configure --prefix="/usr/src/app/build/rubies/ruby-master" && make && make install

Now we have our latest Ruby in the /usr/src/app/build/rubies/ruby-master folder.

Let's check that the problem still exists:

/app # /usr/src/app/build/rubies/ruby-master/bin/ruby test.rb
Operation timed out - user specified timeout
[BUG] Segmentation fault at 0x0000000000000028

-- Machine register context ------------------------------------------------
 RIP: 0x00007f561acf6886 RBP: 0x0000000000000001 RSP: 0x00007f55ff5d2a10
 RAX: 0x0000000000000001 RBX: 0x00007f55ff43ff30 RCX: 0x00007f55ff5cfb70
 RDX: 0x0000000000000010 RDI: 0x0000000000000000 RSI: 0x00007f55ff5cf8f0
  R8: 0x0000000000000000  R9: 0x0000000000000000 R10: 0x0000000000000000
 R11: 0x0000000000000217 R12: 0x00007f55ff43ff90 R13: 0x00007f55ff236040
 R14: 0x00007f55ff236b38 R15: 0x00007f55ff5d2b38 EFL: 0x0000000000010202

The segfault is still there.

Then we need to:

  • check in which cases the problem happens
  • find out whether it should be fixed in Alpine or in Ruby
  • check whether other places could be affected by the same problem
  • make a reproducible example

and so on: typical engineering work.
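For the reproducible example, something along these lines could be a starting point. This is only a sketch, not the original test.rb: the host name, the timeout value and the helper name try_resolve are assumptions, and whether the lookup is actually cancelled in time depends on your network and Ruby version.

```ruby
require "socket"

# Assumptions for illustration only: any resolvable host and a timeout
# short enough that the lookup is likely to be cancelled mid-flight.
HOST = "example.com"
TIMEOUT = 0.001 # seconds

# Hypothetical helper: try to resolve a host with a timeout and report
# what happened instead of letting the exception propagate.
def try_resolve(host, timeout)
  Addrinfo.getaddrinfo(host, 80, timeout: timeout)
  :resolved
rescue StandardError => e
  # Covers both resolver errors and the timeout error.
  warn "#{e.class}: #{e.message}"
  :failed
end

result = try_resolve(HOST, TIMEOUT)
puts "Good (#{result})"
```

On an affected build you would expect the interpreter to crash after the timeout message instead of printing the last line; on other builds the script should simply finish.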

In our case, it should be fixed in Ruby. Let's make the change:

// ext/socket/raddrinfo.c
...
        if (arg->cancelled) {
            if (arg->ai) freeaddrinfo(arg->ai);
        }
...

Rebuild Ruby (with make clean first) and try again. It works!

/app # make clean && ../configure --prefix="/usr/src/app/build/rubies/ruby-master" && make && make install
/app # /usr/src/app/build/rubies/ruby-master/bin/ruby test.rb
Good

Great! Now we can run the tests, first for the related file and then the whole suite:

$ make test-all TESTS=../test/socket/test_addrinfo.rb
$ make test-all
$ make test-spec

And, if possible, write a test for your change (in this case it is hard to write a reliable test because of getaddrinfo internals).

Don't forget to describe the bug in Ruby's bug tracker and attach all useful info (e.g. https://bugs.ruby-lang.org/issues/20592).
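Part of that useful info is the exact Ruby build and platform. One possible way to collect it is sketched below; the helper name bug_report_env and the choice of fields are my own, so adjust them to whatever the report needs:

```ruby
require "rbconfig"

# Hypothetical helper: gather basic build details worth pasting into a
# bug report next to the crash log and the reproducer.
def bug_report_env
  {
    "ruby" => RUBY_DESCRIPTION,
    "platform" => RUBY_PLATFORM,
    "host_os" => RbConfig::CONFIG["host_os"],
    "configure_args" => RbConfig::CONFIG["configure_args"],
  }
end

bug_report_env.each { |key, value| puts "#{key}: #{value}" }
```

The first line of the output carries the same version string that appears at the top of the [BUG] report.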

And voila! You made the world a bit better: https://github.com/ruby/ruby/commit/fba8aff7af450e476e97b62385427dfa51850955

