Hey there. Sorry for the delay after parts 1 and 2. I've been busy making some changes and will write a 4th article about those.
As said before, we'll be running this init on a qemu VM.
While running it (with getty only), one thing I noticed is that we're used to udev, where all things Just Work™️ when plugged in. systemd has gotten involved in all of those things, which contributes to the mysticism around PID 1 and what it does.
No more theory, let's just dive inside this thing.
For the test I used a VM with root on 9p (could be NFS, but was just another test) so the dev and test env are shared and I can push changes and reboot to get them (at first I did poweroff, mount, copy, unmount and run it again).
I tested on 2 chroots, one with Arch Linux and another with Void (it's init-agnostic, so it's easier to get things done). I also compiled a kernel with all virtio modules built in (Arch doesn't have a working udev and I didn't bother to get one, so this is the brute-force solution).
Compiling
Source is here: https://gitlab.com/mrvik/go-pid1
Go has one of the fastest compilers, so you can't slack off.
We're using make to manage all the targets, so you need:
- make (tested with GNU Make)
- go toolchain (package is called go or golang depending on the distro)
Then run make all (or simply make) and you're done: you have an out dir with all the utils statically linked (as they are in pure Go).
You should copy them to your VM under /usr/local/bin to avoid conflicts, and link init and service-launcher to the root directory (the service-launcher path is hardcoded; this should be changed).
There is a folder in the project called examples. Copy it to /etc/yml-services so we have something to talk about.
Void w/ go on PID 1
Here is my qemu cmdline:
qemu-system-x86_64 -accel kvm \
-kernel ../linux-5.10.4/arch/x86/boot/bzImage \
-initrd root/boot/initramfs-linux-fallback.img \
-append "root=192.168.1.10 rootfstype=9p rootflags=aname=$PWD/void,msize=16000,version=9p2000.L,uname=root,access=user rw init=/init ip=dhcp loglevel=3" \
-m 1024 -nic user,model=virtio-net-pci -vga qxl
- kernel is from the compilation dir (latest stable release at the time of preparing this). I also tested with mine (the vanilla linux package from Arch) and it worked, so Void's should also work.
- initrd comes from my system (yes, the kernel version doesn't match, but it doesn't matter as all modules are already built in). It was created using mkinitcpio with the net hook to enable networking in early userspace. Maybe this could be done with dracut but, whatever, it works.
- append has a lot of joy. I'm using v9fs as root, so all those arguments are for that purpose, and ip=dhcp does what you already know. The most important thing here is init=/init, as it's the location where I copied the init target. Setting loglevel is important to avoid being flooded with kernel logs.
- Other boring things are the memory, a virtio net device (needed for 9p on root) and the VGA model.
And here we are! All lines printed by go-pid1 have the same format (from the log package). We can log in and see what's going on here:
htop shows the current hierarchy: every process is a child of service-launcher (which has started every file in /etc/yml-services ending with .yml or .yaml). dhclient forks, so it's now a child of PID 1 (see the next article for how we manage this now).
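The "everything ending with .yml or .yaml" rule can be sketched as below. This is only an illustration of the filtering, not the actual service-launcher code: the function names isServiceFile and listServiceFiles are made up for the example.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// isServiceFile reports whether a file name looks like a service definition,
// i.e. its extension is .yml or .yaml. A name such as
// systemd.yml.disabled has the extension .disabled, so it is skipped.
// (Hypothetical helper, not taken from go-pid1.)
func isServiceFile(name string) bool {
	ext := filepath.Ext(name)
	return ext == ".yml" || ext == ".yaml"
}

// listServiceFiles returns the paths of all service definitions found
// directly inside dir. os.ReadDir already returns entries sorted by name.
func listServiceFiles(dir string) ([]string, error) {
	entries, err := os.ReadDir(dir)
	if err != nil {
		return nil, err
	}
	var files []string
	for _, entry := range entries {
		if entry.IsDir() || !isServiceFile(entry.Name()) {
			continue
		}
		files = append(files, filepath.Join(dir, entry.Name()))
	}
	return files, nil
}

func main() {
	if len(os.Args) != 2 {
		fmt.Println("usage: list-services <dir>   (e.g. /etc/yml-services)")
		return
	}
	files, err := listServiceFiles(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot list services:", err)
		os.Exit(1)
	}
	for _, f := range files {
		fmt.Println(f)
	}
}
```

With this rule, renaming a file to something like name.yml.disabled is enough to keep it from being loaded.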
The control socket
In Part 2 we mentioned that service-launcher exposes a socket so we can control daemons, and a utility called slc to connect to that socket w/o having openbsd-netcat installed (for its -U option).
The list contains the currently loaded services and their state. It's worth noting that dhclient shows as not started (not running), as this process has forked and the main process exited. We cannot track its descendants (see the next article to track progress on this).
We can use the restart command (also start and stop) with a service to perform the needed action.
Also, the journal command shows logs for the specified service (we face a lot of "operation not permitted" errors w/ root on 9p).
And it works. The socket can potentially run or expose any function from the service-launcher.
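For readers who have never talked to a Unix domain socket from Go, here is a minimal client sketch. It is not the slc source: the socket path is whatever you pass on the command line, and the wire format (one newline-terminated command, reply read until the server closes the connection) is an assumption made for the example, since the real protocol isn't documented here.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"os"
	"strings"
	"time"
)

// sendCommand connects to a Unix domain socket, writes the command as a
// single line and returns everything the server sends back before closing
// the connection. The line-based request is an assumption for this sketch,
// not the real service-launcher protocol.
func sendCommand(socketPath, command string) (string, error) {
	conn, err := net.DialTimeout("unix", socketPath, 2*time.Second)
	if err != nil {
		return "", err
	}
	defer conn.Close()

	// Don't hang forever if the server never answers or never closes.
	if err := conn.SetDeadline(time.Now().Add(5 * time.Second)); err != nil {
		return "", err
	}
	if _, err := io.WriteString(conn, command+"\n"); err != nil {
		return "", err
	}
	reply, err := io.ReadAll(conn)
	if err != nil {
		return "", err
	}
	return string(reply), nil
}

func main() {
	if len(os.Args) < 3 {
		fmt.Println("usage: client <socket-path> <command> [service]")
		return
	}
	reply, err := sendCommand(os.Args[1], strings.Join(os.Args[2:], " "))
	if err != nil {
		fmt.Fprintln(os.Stderr, "request failed:", err)
		os.Exit(1)
	}
	fmt.Print(reply)
}
```

This is roughly what a netcat with -U support would do by hand, which is why a small dedicated client is handy on systems without it.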
Run init systems for fun and... fun
As we're not dropping capabilities nor doing any kind of confinement, we can execute virtually any process.
The only limitation is the PID. An init must be PID 1, but that one is already taken.
pid_namespaces(7) to the rescue! Look at cmd/run-on-pid-ns. It creates a new PID namespace (pid_namespaces(7)), time namespace (time_namespaces(7)) and mount namespace (mount_namespaces(7)). The time namespace is just to isolate uptime inside the namespace. The mount namespace allows us to have a different procfs and whatever mounts the other init system makes.
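To give an idea of how little is needed for this in Go, here is a minimal sketch that starts a program as PID 1 of a new PID namespace with its own mount namespace. It is not the source of cmd/run-on-pid-ns: the helper name namespacedCommand is made up, and the time namespace is left out to keep the example short. It is Linux-only and needs root (or CAP_SYS_ADMIN) to actually run a command.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"syscall"
)

// namespacedCommand prepares (but does not start) a command that will be
// PID 1 of a new PID namespace and live in its own mount namespace.
// (Hypothetical helper for illustration.)
func namespacedCommand(path string, args ...string) *exec.Cmd {
	cmd := exec.Command(path, args...)
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	cmd.SysProcAttr = &syscall.SysProcAttr{
		// CLONE_NEWPID: the child becomes PID 1 of a fresh PID namespace.
		// CLONE_NEWNS: the child gets a private copy of the mount table.
		Cloneflags: syscall.CLONE_NEWPID | syscall.CLONE_NEWNS,
	}
	return cmd
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: run-in-ns <program> [args...]   (e.g. /sbin/init)")
		return
	}
	cmd := namespacedCommand(os.Args[1], os.Args[2:]...)
	if err := cmd.Run(); err != nil {
		fmt.Fprintln(os.Stderr, "command failed:", err)
		os.Exit(1)
	}
}
```

The started program sees itself as PID 1, while from the outside it has an ordinary PID and is just another child of whoever launched it.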
Let's try it. The examples/systemd.yml.disabled service starts /sbin/init using run-on-pid-ns. It's disabled by default. You may want to disable all other services (at least agetty, to avoid clashes).
We're now on Void so let's try runit first.
Here we can see the full process hierarchy, and /sbin/init is not PID 1. That's because htop uses /proc, which didn't get remounted, so we see the full hierarchy (but we cannot kill those processes because their PIDs are not accessible from inside the namespace, heh!).
So if we mount procfs again, we get only the processes in the current namespace, and htop (and others) work properly.
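Mounting procfs again from inside the namespace can be sketched in Go as follows. This is an illustration, not code from go-pid1: the mountStep type and function names are made up, and marking / as private first is an extra precaution I'm assuming here so the new mount can't propagate back to the parent namespace. It must run as root inside the new PID and mount namespaces.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// mountStep describes the arguments of one mount(2) call.
// (Hypothetical type for this sketch.)
type mountStep struct {
	source, target, fstype string
	flags                  uintptr
}

// procRemountSteps returns the mounts needed to get a /proc that only
// shows the processes of the current PID namespace.
func procRemountSteps() []mountStep {
	return []mountStep{
		// Precaution (assumption): make all mounts private so nothing
		// done here propagates to the parent mount namespace.
		{source: "none", target: "/", flags: syscall.MS_REC | syscall.MS_PRIVATE},
		// A fresh procfs instance, bound to the caller's PID namespace,
		// mounted on top of the inherited one.
		{source: "proc", target: "/proc", fstype: "proc",
			flags: syscall.MS_NOSUID | syscall.MS_NODEV | syscall.MS_NOEXEC},
	}
}

// remountProc performs the steps in order and stops at the first failure.
func remountProc() error {
	for _, s := range procRemountSteps() {
		if err := syscall.Mount(s.source, s.target, s.fstype, s.flags, ""); err != nil {
			return fmt.Errorf("mount %s on %s: %w", s.source, s.target, err)
		}
	}
	return nil
}

func main() {
	if len(os.Args) != 2 || os.Args[1] != "--apply" {
		fmt.Println("usage: remount-proc --apply   (run as root inside the namespace)")
		return
	}
	if err := remountProc(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

From a shell inside the namespace, the second step is the same as a plain mount of type proc on /proc.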
Another caveat is the reboot/poweroff logic. We cannot do it from the nested init system, as the reboot(2) syscall won't reboot the machine when called from inside a non-initial pid namespace. We can do slc reboot or slc poweroff anyway.
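For reference, this is roughly what a process in the initial PID namespace has to do to reboot or power off from Go. It is a sketch, not the go-pid1 implementation: the helper names and the action strings are assumptions for the example, and it assumes a 64-bit target such as the x86_64 VM used here. It needs root to succeed.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// rebootCommand maps an action name to the matching reboot(2) command
// constant. (Hypothetical helper; the action names are assumptions.)
func rebootCommand(action string) (int, error) {
	switch action {
	case "reboot":
		return syscall.LINUX_REBOOT_CMD_RESTART, nil
	case "poweroff":
		return syscall.LINUX_REBOOT_CMD_POWER_OFF, nil
	default:
		return 0, fmt.Errorf("unknown action %q", action)
	}
}

// shutdown flushes filesystem buffers and asks the kernel to perform the
// action. A real init would stop services and unmount filesystems first.
func shutdown(action string) error {
	cmd, err := rebootCommand(action)
	if err != nil {
		return err
	}
	syscall.Sync()
	return syscall.Reboot(cmd)
}

func main() {
	if len(os.Args) != 2 {
		fmt.Println("usage: shutdown <reboot|poweroff>   (really does it, run as root)")
		return
	}
	if err := shutdown(os.Args[1]); err != nil {
		fmt.Fprintln(os.Stderr, "shutdown failed:", err)
		os.Exit(1)
	}
}
```

The same call issued by the nested init would only affect its own namespace, which is why the request has to travel through the socket to something running outside of it.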
Let's try systemd now. systemd is more complex and makes use of a lot of things, but it works surprisingly well.
I'm using Arch Linux to test as it uses systemd by default.
As you can see, it works, but we can see all processes until we mount procfs.
This init is isolated in a mount namespace so, if we launch htop from inside systemd and from outside, we see the effect.
The /run dir is not remounted, so the socket exposed by service-launcher is there and we can connect to it.
What's next
We're missing some useful features systemd has (ok, we're missing a lot).
- Templates: so useful for things like agetty@<tty> or dhclient@<interface>. It's queued but not planned.
- Follow forking processes. Some processes do fork(2) and we can't follow them. We're currently working on this, stay tuned for the 4th article to follow the progress.
- Better handling of mounts. Also working on this, but w/o priority.
- Set machine hostname at start from /etc/hostname.
- Document commands on the socket.
- You may have missed the reload command (something like daemon-reload on systemd). It's not currently implemented, but it's planned.
- Fix race conditions on service-launcher. Most likely in the process state. This may be looked at together with the reload feature.
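As an example of how small some of these items are, setting the hostname from /etc/hostname could look like the sketch below. It's not part of go-pid1 yet: the function names are made up, and skipping blank lines and lines starting with # is an assumption about the file format. It needs root to actually change the hostname.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strings"
	"syscall"
)

// parseHostname returns the first line of contents that is neither empty
// nor a comment, with surrounding whitespace removed.
// (Hypothetical helper; comment handling is an assumption.)
func parseHostname(contents string) (string, error) {
	for _, line := range strings.Split(contents, "\n") {
		line = strings.TrimSpace(line)
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		return line, nil
	}
	return "", errors.New("no hostname found")
}

// setHostnameFrom reads the file at path and applies its hostname using
// the sethostname(2) syscall.
func setHostnameFrom(path string) error {
	data, err := os.ReadFile(path)
	if err != nil {
		return err
	}
	name, err := parseHostname(string(data))
	if err != nil {
		return err
	}
	return syscall.Sethostname([]byte(name))
}

func main() {
	if len(os.Args) != 2 {
		fmt.Println("usage: set-hostname <file>   (e.g. /etc/hostname, run as root)")
		return
	}
	if err := setHostnameFrom(os.Args[1]); err != nil {
		fmt.Fprintln(os.Stderr, "cannot set hostname:", err)
		os.Exit(1)
	}
}
```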
There are some other features on systemd that we're not going to support like:
- Confining all processes on cgroups
- Derived from the previous point, doing firewalling on cgroups
- Securing folders against those processes (RO home directory or private tmp).
Next article will address "How to follow slippery processes". Stay tuned for more.