Hey,
The other day I was trying to make sure that a process I was running was using a specific binary that I had built, but I couldn’t figure out how: `ps` would only show me the non-absolute path.
# How could I know what is the absolute path of the
# `hugo` binary, assuming that I could have multiple
# `hugo` binaries in `$PATH`?
ps
PID TTY TIME CMD
4153 ttys000 5:14.98 hugo serve <<<
9035 ttys001 0:00.04 /Applications/iTerm.app/Content...
9037 ttys001 0:00.10 -bash
9086 ttys001 0:02.27 /usr/local/Cellar/macvim/8.1-15...
9236 ttys002 0:00.04 /Applications/iTerm.app/Content...
9238 ttys002 0:00.10 -bash
If I were using Linux though, I thought, that’d be easy: head to `/proc`, search for the pid of the process, and then check what `exe` links to; done.
# (on a Linux machine ...)
#
# Note that `hugo` still doesn't show up with its absolute
# path, just like on MacOS.
ps aux | grep hugo
ubuntu 2275 0.0 0.0 101852 748 pts/0 Sl+ 00:26 0:00 hugo serve
# Given that the proc filesystem can provide us with some
# more information about the process, check out the `exe`
# link (which should provide a link to the actual executable).
stat /proc/2275/exe
File: /proc/2275/exe -> /usr/local/bin/hugo
Size: 0 Blocks: 0 IO Block: 1024 symbolic link
Device: 4h/4d Inode: 140106 Links: 1
Access: (0777/lrwxrwxrwx) Uid: ( 1001/ ubuntu) Gid: ( 1001/ ubuntu)
Access: 2018-09-24 00:26:37.167004005 +0000
Modify: 2018-09-24 00:26:27.391004005 +0000
Change: 2018-09-24 00:26:27.391004005 +0000
Birth: -
In this post, I go through how we can gather such information on MacOS, and what the `procfs` in Linux is all about.
tl;dr: `/proc` on Linux is dope; on MacOS, compile a little code that uses `proc_pidpath` from `libproc`, or install `pidpath`.
The /proc filesystem in Linux
In “Linux land”, there’s this thing called “procfs”.
It’s a virtual filesystem (in the sense that no regular files on your disk back what it shows) that allows a user (in userspace) to inspect their own running processes, and others as well.
From the kernel docs:
The proc file system acts as an interface to internal data structures in the kernel.
It can be used to obtain information about the system and to change certain kernel parameters at runtime (sysctl).
The way the interaction with it is set up is pretty nifty:
- each process receives a path under `/proc` (like `/proc/<pid>`), and then
- under this path, various other files and subdirectories are present to allow deeper introspection of that specific pid.
# Display the files and directories present at the
# very root of `/proc`.
#
# Here we can find the list of PIDs that we can access,
# as well as some more system-wide information and
# settings that we can tweak.
ls -lah /proc
total 4.0K
dr-xr-xr-x 124 root root 0 Sep 24 23:56 .
drwxr-xr-x 24 root root 4.0K Sep 24 23:57 ..
dr-xr-xr-x 9 root root 0 Sep 24 23:56 1
dr-xr-xr-x 9 root root 0 Sep 25 00:54 2016
dr-xr-xr-x 9 root root 0 Sep 24 23:57 417
...
-r--r--r-- 1 root root 0 Sep 25 01:25 sched_debug
-r--r--r-- 1 root root 0 Sep 25 01:25 schedstat
dr-xr-xr-x 4 root root 0 Sep 25 01:25 scsi
lrwxrwxrwx 1 root root 0 Sep 24 23:56 self -> 2574
...
# Getting into a specific pid path, we're able to
# gather more information about the specifics of
# a given process.
ls -lah /proc/472
total 0
dr-xr-xr-x 9 root root 0 Sep 24 23:57 .
dr-xr-xr-x 123 root root 0 Sep 24 23:56 ..
...
lrwxrwxrwx 1 root root 0 Sep 25 01:30 cwd -> /
-r-------- 1 root root 0 Sep 25 01:30 environ
lrwxrwxrwx 1 root root 0 Sep 24 23:57 exe -> /lib/systemd/systemd-udevd
dr-x------ 2 root root 0 Sep 24 23:57 fd
lrwxrwxrwx 1 root root 0 Sep 25 01:30 root -> /
-rw-r--r-- 1 root root 0 Sep 25 01:30 sched
...
Not only that, `procfs` is very helpful when you’re not sure whether a process is blocked on something you didn’t expect (like a `write(2)` to an `nfs` mount point that is broken because its servers are not responding), or on something as simple as your process sleeping when you didn’t want it to:
# Sleep for 33 days in the background
sleep 33d &
[1] 2786
# Check what's the state of the process
cat /proc/2786/stat
2786 (sleep) S ...
| | |
| | `-> state (interruptible sleep)
| `-> command being run (sleep command)
`-> process id (the pid we used before)
# Check what's the stack trace (from the kernel
# perspective) that led the process to this
# sleep state
cat /proc/2786/stack
[<0>] hrtimer_nanosleep+0xd8/0x1d0
[<0>] SyS_nanosleep+0x72/0xa0
[<0>] do_syscall_64+0x73/0x130
[<0>] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[<0>] 0xffffffffffffffff
procfs under the hood
What’s interesting about being virtual is that the implementation of `procfs` is able to generate the representation of the filesystem on the fly: whenever you issue an I/O call like `read(2)`, Linux answers back with what you asked for, be it the list of file descriptors opened by a given process, or the list of environment variables that were set at process startup time.
For instance, tracing the execution of `cat /proc/meminfo`, we can find the path that the `read(2)` syscall takes:
# stack trace of `cat /proc/meminfo`
meminfo_proc_show
proc_reg_read
__vfs_read
vfs_read
sys_read
do_syscall_64
entry_SYSCALL_64_after_hwframe
# stack trace of `cat /file.txt`
# on an ext4 mount point
ext4_file_read_iter
__vfs_read
vfs_read
sys_read
do_syscall_64
entry_SYSCALL_64_after_hwframe
Very different from a regular `read` (as shown in the second stack trace), there’s no real file on disk being accessed: just `meminfo_proc_show` returning the contents related to what the user asked for, in this case system-wide memory statistics.
By the way, if you’re interested in knowing more about related subjects, a great reference for this type of knowledge is The Linux Programming Interface: A Linux and UNIX System Programming Handbook.
Now to MacOS.
The libproc library in MacOS
Unlike Linux, it feels like we can’t know all that much about how things work on MacOS.
After searching a bit for a way to gather information about a process, `libproc` showed up.
As mentioned in `libproc.h`:
/*
* This header file contains private interfaces
* to obtain process information.
*
* These interfaces are subject to change in future releases.
*/
One thing to note: the interfaces are private, so there’s no guaranteed compatibility with future releases.
This has been elucidated by an Apple staff member in a post on Apple’s developer forum about gathering process information:
[…] Apple has not put a lot of effort into providing APIs for getting this sort of information.
What APIs that do exist were either inherited from OS X’s predecessor OSs or were added primarily to meet our internal requirements rather than the needs for third-party developers.
Thus, you will find a lot of places where these APIs are: incomplete; incorrect; poorly documented and aren’t as binary compatible as they should be.
Anyway, we can still make use of it. More specifically, we can make use of `proc_pidpath`, a function that takes a `pid` (the pid of the process that we want to know more about), a buffer where the path should be written to, and the buffer size.
int proc_pidpath(
int pid, // pid of the process to know more about
void * buffer, // buffer to fill with the abs path
uint32_t buffersize // size of the buffer
);
That said, we can go ahead and create our Go binary that can handle both Linux and MacOS by specifying two different compilation targets.
A Golang binary that suits Linux and MacOS
Given that `libproc` will not be a thing under Linux, we can start by creating a `pidpath_linux.go` file that is meant to be compiled only on Linux, and another file, `pidpath_darwin.go`, aimed at MacOS machines.
The Linux one is rather simple: it follows the `/proc/<pid>/exe` symlink, and that’s it:
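//go:build linux
// +build linux

package main

import (
	"os"
	"strconv"
)

// GetExePathFromPid resolves the absolute path of the executable behind
// `pid` by following the `/proc/<pid>/exe` symlink.
func GetExePathFromPid(pid int) (path string, err error) {
	path, err = os.Readlink("/proc/" + strconv.Itoa(pid) + "/exe")
	return
}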
The MacOS version, though, needs a little bit more.
Given that we’d access `libproc` via C, we can leverage CGO.
//go:build darwin
// +build darwin

package main

// #include <libproc.h>
// #include <stdlib.h>
// #include <errno.h>
import "C"

import (
	"fmt"
	"unsafe"
)

// bufSize references the constant that the implementation
// of proc_pidpath uses under the hood to make sure that
// no overflows happen.
//
// See https://opensource.apple.com/source/xnu/xnu-2782.40.9/libsyscall/wrappers/libproc/libproc.c
const bufSize = C.PROC_PIDPATHINFO_MAXSIZE

func GetExePathFromPid(pid int) (path string, err error) {
	// Allocate in the C heap a string (char* terminated
	// with `\0`) of size `bufSize` and then make sure
	// that we free that memory that gets allocated
	// in C (see the `defer` below).
	buf := C.CString(string(make([]byte, bufSize)))
	defer C.free(unsafe.Pointer(buf))

	// Call the C function `proc_pidpath` from the included
	// header file (libproc.h).
	ret, err := C.proc_pidpath(C.int(pid), unsafe.Pointer(buf), bufSize)
	if ret <= 0 {
		err = fmt.Errorf("failed to retrieve pid path: %v", err)
		return
	}

	// errno only carries meaning when the call fails, so drop
	// whatever cgo reported alongside a successful return.
	err = nil

	// Convert the C string back to a Go string.
	path = C.GoString(buf)
	return
}
That done, we can now consume `GetExePathFromPid` in our application.
To see that in place, check out cirocosta/pidpath.
Closing thoughts
It was interesting to me to check out how different things are in MacOS land.
Although I use a MacBook Pro as a personal computer (and a Mac at work), I’ve not really paid attention to these little details.
Also, `/proc` is just so valuable! It’s definitely worth knowing more about the other functionality over there. Make sure you check out The Linux Programming Interface: A Linux and UNIX System Programming Handbook.
If you have any questions, or suggestions to improve this blog post, please let me know! I’m cirowrc, and I’d love to chat.
Have a good one!