Running Linux Programs as Unikernels on macOS

#unikernels #rust #blockchain #linux

Someone emailed me the other day because he was having some trouble
getting his program to work under OPS.

He had downloaded a binary release from GitHub and, knowing that OPS only runs Linux binaries, chose the Linux version. Further, there are numerous
examples of people running Linux programs under OPS on macOS. Hell, just do a

ops pkg list

for a short list.

However, for some reason it just wasn't working for him.

user@users-MacBook-Pro Desktop % ops run solana-validator    
*errors.errorString libstdc++.so.6: file does not exist
/Users/eyberg/go/src/github.com/nanovms/ops/lepton/ldd_darwin.go:100 (0x4c55779)
/Users/eyberg/go/src/github.com/nanovms/ops/lepton/ldd_darwin.go:129 (0x4c5607d)
/Users/eyberg/go/src/github.com/nanovms/ops/lepton/image.go:261 (0x4c52d0e)
/Users/eyberg/go/src/github.com/nanovms/ops/lepton/image.go:24 (0x4c50f5f)
/Users/eyberg/go/src/github.com/nanovms/ops/cmd/run.go:15 (0x4ca4189)
/Users/eyberg/go/src/github.com/nanovms/ops/cmd/run.go:118 (0x4ca4f75)
/Users/eyberg/go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:830 (0x4c8bf3e)
/Users/eyberg/go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:914 (0x4c8cb5b)
/Users/eyberg/go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:864 (0x4ca8d88)
/Users/eyberg/go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:864 (0x4ca8d83)
/usr/local/go/src/runtime/proc.go:200 (0x402f57c)
  main: return
/usr/local/go/src/runtime/asm_amd64.s:1337 (0x405a671)
  goexit: GLOBL shifts<>(SB),RODATA,$256

panic: libstdc++.so.6: file does not exist

As soon as I saw the word Mac followed by libstdc++ I knew what was wrong, but I also realized that it might not be readily apparent to many people, so that is what this blog post is about.

The binary in question is dynamically linked. What this means is that
there are several libraries that the dynamic loader (ld.so) will try to
load at runtime, versus a statically linked binary where all the
libraries are packaged inside the binary itself. Static linking leads to
a fatter binary but you can be assured that everything is present when
needed. I won't jump into a static/dynamic flamewar as each has its own
use cases.
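
If you want to check which camp a Linux binary falls into without a Linux box at hand, one option is to read its ELF headers directly. What follows is a minimal sketch, assuming Go's standard debug/elf package; the linkage type and the describeLinkage helper are names I made up for illustration and are not part of OPS. Because it only parses the file, it should behave the same on a Mac.

```go
package main

import (
	"debug/elf"
	"fmt"
	"io"
	"os"
	"strings"
)

// linkage summarizes what an ELF binary asks for at load time.
type linkage struct {
	Interp string   // dynamic loader path from PT_INTERP, empty if absent
	Needed []string // shared libraries listed as DT_NEEDED entries
}

// Static reports whether the binary requests neither a loader nor any library.
func (l linkage) Static() bool {
	return l.Interp == "" && len(l.Needed) == 0
}

// describeLinkage is a hypothetical helper that parses the ELF file at path.
func describeLinkage(path string) (linkage, error) {
	f, err := elf.Open(path)
	if err != nil {
		return linkage{}, err
	}
	defer f.Close()

	var l linkage
	for _, p := range f.Progs {
		if p.Type != elf.PT_INTERP {
			continue
		}
		// The segment body is the loader path as a NUL-terminated string.
		raw, err := io.ReadAll(p.Open())
		if err != nil {
			return linkage{}, err
		}
		l.Interp = strings.TrimRight(string(raw), "\x00")
	}

	// ImportedLibraries returns the DT_NEEDED names from the dynamic section.
	l.Needed, err = f.ImportedLibraries()
	if err != nil {
		return linkage{}, err
	}
	return l, nil
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: linkage <elf-binary>")
		return
	}
	l, err := describeLinkage(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	fmt.Println("static:", l.Static())
	fmt.Println("interp:", l.Interp)
	for _, lib := range l.Needed {
		fmt.Println("needed:", lib)
	}
}
```

Note that this only lists the direct dependencies recorded in the binary; each of those libraries can have dependencies of its own.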

I should also point out that normally you can't run Linux binaries directly on a Mac - this is actually a uniquely interesting unikernel aspect as we don't need to boot up Vagrant or a Linux VM. Traditional virtualization software virtualizes the operating system. You can think of a tool like OPS as virtualizing the application instead.

One of the reasons I decided to write this post was not just the email. Over the past few years I've seen quite a lot of people who use containers and Go refer to their programs as statically linked when they are in fact absolutely not.

Let's look at this example go webserver:

package main

import (
    "fmt"
    "net/http"
)

func main() {
    http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
        fmt.Fprintf(w, "Welcome to my website!")
    })

    fs := http.FileServer(http.Dir("static/"))
    http.Handle("/static/", http.StripPrefix("/static/", fs))

    http.ListenAndServe(":8080", nil)
}

Fairly simple - all it does is listen on 8080 and serve up requests. If we compile it:

go build

We see that by default it will be dynamically linked against a few libs:

eyberg@box:~/z$ ldd z
        linux-vdso.so.1 (0x00007ffe8b1db000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f2e091f4000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f2e08e03000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f2e09413000)
eyberg@box:~/z$ file z
z: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/l, not stripped

VDSO gives us a fast clock. Libpthread gives us threads. Libc contains everything you'd find in section 3 of the man pages and finally LD is our loader.

You can link your go programs statically via something like this:

eyberg@box:~/z$ go build  -buildmode=pie -ldflags "-linkmode external -extldflags -static"
# _/home/eyberg/z
/tmp/go-link-057347370/000004.o: In function `_cgo_7e1b3c2abc8d_C2func_getaddrinfo':
/tmp/go-build/cgo-gcc-prolog:57: warning: Using 'getaddrinfo' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
eyberg@box:~/z$ ldd z
        not a dynamic executable
eyberg@box:~/z$ file z
z: ELF 64-bit LSB executable, x86-64, version 1 (GNU/Linux), statically linked, for GNU/Linux 3.2.0, BuildID[sha1]=0671d22e52baf812ed9f500a8d9ee7848b8c809e, not stripped

Both ldd and file now report that it is statically linked. However, you'll notice (at least in Go 1.12) that the linker now outputs a warning stating that getaddrinfo still requires the glibc shared libraries at runtime! That's because, even though ldd and file are telling us that the binary is indeed static, we still rely on libnss for DNS. You can force Go to use its built-in resolver instead.
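
As a rough sketch of what using Go's own resolver can look like in code: the standard library's net.Resolver has a PreferGo field that asks for the built-in DNS client rather than the cgo path through getaddrinfo. The pureGoResolver helper name is mine. Keep in mind this only changes which resolver is used at run time; to keep the getaddrinfo reference out of the binary in the first place you would build without cgo (for example CGO_ENABLED=0 go build, or the netgo build tag).

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// pureGoResolver is a hypothetical helper returning a resolver that
// prefers Go's built-in DNS client over the C library's getaddrinfo.
func pureGoResolver() *net.Resolver {
	return &net.Resolver{PreferGo: true}
}

func main() {
	// Bound the lookup so a missing network does not hang the program.
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	addrs, err := pureGoResolver().LookupHost(ctx, "localhost")
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	fmt.Println(addrs)
}
```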

Getaddrinfo is found here:

eyberg@box:~/s/solana-release/bin$ readelf --dyn-syms /lib/x86_64-linux-gnu/libc-2.27.so  | grep getaddr
   873: 0000000000107bc0  3261 FUNC    GLOBAL DEFAULT   13 getaddrinfo@@GLIBC_2.2.5
  1601: 00000000001413c0    43 FUNC    GLOBAL DEFAULT   13 inet6_rth_getaddr@@GLIBC_2.5

You'll note that your libraries are probably stripped, which means you can't use something like plain nm to find the symbols, but readelf surfaces them from the .dynsym section.
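
The same lookup can be sketched in Go, which is handy on a Mac where readelf usually isn't installed. This assumes the standard debug/elf package, whose DynamicSymbols method reads the dynamic symbol table; findDynSyms and matching are names I picked for illustration. Unlike readelf, the names come back without the version suffix.

```go
package main

import (
	"debug/elf"
	"fmt"
	"os"
	"strings"
)

// matching returns the names that contain substr, in their original order.
func matching(names []string, substr string) []string {
	var out []string
	for _, n := range names {
		if strings.Contains(n, substr) {
			out = append(out, n)
		}
	}
	return out
}

// findDynSyms is a hypothetical helper: it lists the dynamic symbol names
// of the ELF file at path that contain substr.
func findDynSyms(path, substr string) ([]string, error) {
	f, err := elf.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	// DynamicSymbols fails if the file has no dynamic symbol section.
	syms, err := f.DynamicSymbols()
	if err != nil {
		return nil, err
	}
	names := make([]string, 0, len(syms))
	for _, s := range syms {
		names = append(names, s.Name)
	}
	return matching(names, substr), nil
}

func main() {
	if len(os.Args) != 3 {
		fmt.Fprintln(os.Stderr, "usage: dynsyms <elf-file> <substring>")
		return
	}
	names, err := findDynSyms(os.Args[1], os.Args[2])
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	for _, n := range names {
		fmt.Println(n)
	}
}
```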

Also, even that is not enough - you'll probably make use of various functions found in these files as well:

eyberg@box:~/z$ ls /lib/x86_64-linux-gnu/libnss_files.so.2
/lib/x86_64-linux-gnu/libnss_files.so.2
eyberg@box:~/z$ ls /lib/x86_64-linux-gnu/libnss_dns.so.2
/lib/x86_64-linux-gnu/libnss_dns.so.2

The point here is that just sticking a Go program in a container does not inherently make your binary 'statically linked'. The problem is that a lot of people have been stating otherwise, and this source of
confusion is probably what leads to misunderstandings such as this one.

So let's go back to our example at the beginning of the article.

The program in question is the solana blockchain.

You can download and unzip like so:

wget https://github.com/solana-labs/solana/releases/download/v0.23.2/solana-release-x86_64-unknown-linux-gnu.tar.bz2
bunzip2 solana-release-x86_64-unknown-linux-gnu.tar.bz2
tar xf solana-release-x86_64-unknown-linux-gnu.tar

You can see from the ldd output that the binary is dynamically linked, and a dynamically linked library that could not be found (libstdc++) is exactly what the error in our trace was about.

eyberg@box:~/s/solana-release/bin$ ldd solana
        linux-vdso.so.1 (0x00007ffe338df000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f06976a8000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f06984a2000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f06974a4000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f069729c000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f069707d000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f0696e65000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f0696ac7000)

As you can see this program works out of the box on linux.

eyberg@box:~/s/solana-release/bin$ ops run -c config.json solana-validator
[solana-validator --ledger /]
booting /home/eyberg/.ops/images/solana-validator.img ...
qemu-system-x86_64: warning: TCG doesn't support requested feature: CPUID.01H:ECX.vmx [bit 5]
assigned: 10.0.2.15
solana-validator 0.23.2
log file: solana-validator-7wgZy3ozTRyvLtRN5o3Xh6m5aEvEo2XBxP5yPfaCny1t-20200210-204507.log

eyberg@box:~/s/solana-release/bin$ cat config.json
{
  "Args": ["solana-validator", "--ledger", "/"]
}

This is all fine, but if you want to run it on macOS you'll need to do
one of two things:

1) Either build from source and statically link it (so ldd reports no
shared library dependencies).

2) Or download the libraries it's linked to and manually add them to the
filesystem.

The reason this works out of the box on Linux is that OPS looks up the
libraries the binary needs and loads them as we build the disk image.
This isn't possible on a Mac because the Linux libraries simply aren't there: macOS uses Mach-O, which is a different format that Nanos does not and probably won't ever support, as Nanos is explicitly designed to run Linux server-side programs.
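
To make the format difference concrete, here is a small sketch that tells an ELF file from a Mach-O file by its first four bytes. The binaryFormat function name is mine; the magic numbers are the standard ones for ELF and for 32-bit, 64-bit and universal Mach-O files. It is an illustration only, not how OPS itself decides anything.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"io"
	"os"
)

// binaryFormat is a hypothetical helper that classifies a file by the
// magic number in its first four bytes.
func binaryFormat(header []byte) string {
	if len(header) < 4 {
		return "unknown"
	}
	if string(header[:4]) == "\x7fELF" {
		return "ELF"
	}
	// Thin Mach-O files store the magic in the target's byte order,
	// so check both interpretations.
	le := binary.LittleEndian.Uint32(header)
	be := binary.BigEndian.Uint32(header)
	for _, m := range []uint32{le, be} {
		if m == 0xfeedface || m == 0xfeedfacf {
			return "Mach-O"
		}
	}
	// Universal (fat) files use a big-endian header. Java class files
	// share this magic, so treat the answer as a hint.
	if be == 0xcafebabe {
		return "Mach-O (universal)"
	}
	return "unknown"
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: binfmt <file>")
		return
	}
	f, err := os.Open(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	defer f.Close()

	header := make([]byte, 4)
	if _, err := io.ReadFull(f, header); err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	fmt.Println(binaryFormat(header))
}
```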

For example when we build common packages like nginx or node, you'll see from the ops output that all required libraries are placed on the filesystem in a known location.

➜  ~    ops pkg contents node_v13.6.0
File :/node
File :/package.manifest
Dir :/sysroot
Dir :/sysroot/lib
Dir :/sysroot/lib/x86_64-linux-gnu
File :/sysroot/lib/x86_64-linux-gnu/libc.so.6
File :/sysroot/lib/x86_64-linux-gnu/libdl.so.2
File :/sysroot/lib/x86_64-linux-gnu/libgcc_s.so.1
File :/sysroot/lib/x86_64-linux-gnu/libm.so.6
File :/sysroot/lib/x86_64-linux-gnu/libnss_dns.so.2
File :/sysroot/lib/x86_64-linux-gnu/libnss_files.so.2
File :/sysroot/lib/x86_64-linux-gnu/libpthread.so.0
Dir :/sysroot/lib64
File :/sysroot/lib64/ld-linux-x86-64.so.2
Dir :/sysroot/proc
File :/sysroot/proc/meminfo
Dir :/sysroot/usr
Dir :/sysroot/usr/lib
Dir :/sysroot/usr/lib/x86_64-linux-gnu
File :/sysroot/usr/lib/x86_64-linux-gnu/libstdc++.so.6

Without getting into the semantics of building your own package, which you can find here, the easiest way to do this with your own app is to create a directory in your project directory: