
Systemd And Timing Issues

Thomas H Jones II · Originally published at thjones2.blogspot.com · 3 min read

Unlike a lot of Linux people, I'm not a knee-jerk hater of systemd. My "salaried UNIX" background, up through 2008, was primarily with OSes like Solaris and AIX. With Solaris, in particular, I was used to systemd-type init-systems due to SMF.

That said, making the switch from RHEL and CentOS 6 to RHEL and CentOS 7 hasn't been without its issues. The change from upstart to systemd is a lot more dramatic than from SysV-init to upstart.

Much of the pain with systemd comes with COTS software originally written to work on EL6. Some vendors really do only fairly cursory testing before saying something is EL7-compatible. Many — especially earlier in the EL7 lifecycle — didn't bother creating systemd services at all: they simply relied on the systemd-sysv-generator utility to do the dirty work for them.

While the systemd-sysv-generator utility does a fairly decent job, one of the places it can fall down is when the legacy-init script (a file hosted in /etc/rc.d/init.d) is actually a symbolic link to someplace else in the filesystem. Even that isn't much of a problem if "someplace else" is still within the "/" filesystem. However, if your SOPs include "segregate OS and application onto different filesystems", then "someplace else" can very much be a problem: specifically, when "someplace else" is on a different filesystem from "/".
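To make the shape of the problem concrete, here's a minimal sketch of that kind of layout, built in a throwaway directory (all paths and names here are illustrative, not from the actual vendor package):

```shell
# Build a stand-in directory tree: the "init script" under etc/rc.d/init.d
# is really just a symlink to a script living under opt/.
demo="$(mktemp -d)"
mkdir -p "$demo/etc/rc.d/init.d" "$demo/opt/myapp/bin"

# The real start/stop script lives on the application filesystem...
printf '#!/bin/sh\necho "myapp started"\n' > "$demo/opt/myapp/bin/init.sh"
chmod +x "$demo/opt/myapp/bin/init.sh"

# ...and the init.d entry only points at it.
ln -s "$demo/opt/myapp/bin/init.sh" "$demo/etc/rc.d/init.d/myapp"

# Following the link works fine -- as long as the target exists.
ls -l "$demo/etc/rc.d/init.d/myapp"
"$demo/etc/rc.d/init.d/myapp"
```

Once the target filesystem is mounted, nothing about this layout looks broken; the trouble only shows up in the narrow window before the mount happens.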

Recently, I was asked to automate the installation of some COTS software with the "it works on EL6, so it ought to work on EL7" type of "compatibility". Not only did the software not come with systemd service files, but its legacy-init files also linked out to software installed in /opt. Our shop's SOPs are of the "applications on their own filesystems" variety; thus, the /opt/<APPLICATION> directory is actually its own filesystem, hosted on its own storage device. After doing the installation, I rebooted the system. ...And when the system came back, the service wasn't starting, even though there was a boot script in /etc/rc.d/init.d. Poring over the logs, I eventually found:

 systemd-sysv-generator[NNN]: stat() failed on /etc/rc.d/init.d/<script_name>: No such file or directory

This struck me as odd, given that the link and its destination very much did exist.

Turns out, systemd invokes the systemd-sysv-generator utility very early in the system-initialization process. It invokes it so early, in fact, that the /opt filesystem has yet to be mounted when it runs. Thus, when the generator goes to do the conversion, the file the sym-link points to does not yet exist.
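You can reproduce the underlying failure without systemd at all, since it's really just stat() following a dangling symlink. A minimal simulation, with temp dirs standing in for the real filesystems (names are illustrative):

```shell
# A stand-in for "/" -- available early in boot.
root="$(mktemp -d)"
mkdir -p "$root/etc/rc.d/init.d"

# The symlink exists on "/", but its target lives on a filesystem that
# hasn't been mounted yet, so the target path does not exist.
ln -s "$root/opt/myapp/bin/init.sh" "$root/etc/rc.d/init.d/myapp"

# stat follows the link and fails with "No such file or directory" --
# the same error systemd-sysv-generator logs at this point in boot.
stat "$root/etc/rc.d/init.d/myapp" 2>&1 || true

# Later, /opt gets "mounted" (simulated here by creating the target)...
mkdir -p "$root/opt/myapp/bin"
touch "$root/opt/myapp/bin/init.sh"

# ...and the very same stat() now succeeds. The generator, however,
# has already run and given up.
stat "$root/etc/rc.d/init.d/myapp" > /dev/null && echo "stat OK"
```

The generator never gets a second chance on its own, which is why the service simply fails to appear rather than starting late.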

My first thought was, "screw it: I'll just write a systemd service file for the stupid application." Unfortunately, the application's starter was kind of a rat's nest of suck and fail; complete and utter lossage. Trying to invoke it directly via a systemd service definition resulted in the application's packaged controller-process not knowing where to find a stupid number of its sub-components. Brittle. So, I searched for other alternatives...

Eventually, my searches led me both to the nugget about when systemd invokes the systemd-sysv-generator utility and to a way of overcoming the "sym-link to a yet-to-be-mounted-filesystem" problem. Under systemd-enabled systems, there's a new-with-systemd mount option you can place in /etc/fstab: x-initrd.mount. You also need to make sure that your filesystem's fs_passno is set to 0 ...and, if your filesystem lives on an LVM2 volume, you need to update your GRUB2 config to ensure that the LVM gets onlined prior to systemd invoking the systemd-sysv-generator utility. Fugly.
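Pulled together, the fix might look something like the following sketch. The device, volume-group, and mount-point names here are illustrative placeholders, not taken from the actual system:

```shell
# /etc/fstab -- note the x-initrd.mount option (mount this filesystem
# from within the initramfs) and the fs_passno of 0 in the sixth field:
/dev/mapper/vg_apps-lv_opt  /opt/myapp  xfs  defaults,x-initrd.mount  0 0

# /etc/default/grub -- for an LVM2-hosted filesystem, ask dracut to
# activate the volume in the initramfs (vg_apps/lv_opt is a placeholder):
GRUB_CMDLINE_LINUX="... rd.lvm.lv=vg_apps/lv_opt"

# Then regenerate the GRUB2 config so the kernel argument takes effect:
grub2-mkconfig -o /boot/grub2/grub.cfg
```

With all three pieces in place, the application filesystem should already be mounted by the time the generator walks /etc/rc.d/init.d.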

At any rate, once I implemented this fix, the systemd-sysv-generator utility became happy with the sym-linked legacy-init script ...And my vendor's crappy application was happy to restart on reboot.

Given that I'm deploying on AWS, I was able to accommodate setting these fstab options by doing:

 mounts:
   - ["/dev/nvme1n1", "/opt/<application> , "auto", "defaults,x-initrd.mount", "0", "0"]

That goes within my cloud-init declaration block, and it should work in any context that allows you to use cloud-init.

I wish I could say that this was the worst problem I've run into with this particular application. But, really, this application is an all around steaming pile of technology.


Discussion

 

Oh, boy! I only occasionally venture into DevOps, and the pain is real. Things that are supposed to "just work" don't, and you're left alone in the dark wilderness. I struggled with lsyncd on Ubuntu 16.04 until I switched to an NFS setup. I still remember pulling my hair out!

 

Yeah. I can't say that the organizations we're doing enablement for are really "DevOps". We're mostly trying to get them moving in that direction (since they say that's where they want to be). The whole "self-sustaining deployments" and "infrastructure as code" thing is a hard row to hoe with a lot of the COTS software they want to use. So, while it's possible to automate many things (at least partially), you can definitely tell when you're working with (against?) a product that was never designed with automated deployment in mind.

Attempts at automation often run smack into the wall of "GUI-oriented management". Most frequently, one can automate the baseline deployment of COTS software ...and even redeployment, but there's no good method to automate the configuration of much COTS software. I still remember one Java-based application I was automating: I'd located the file that contained its JDBC configuration. I was really pissed when, after sorting out how to pre-place a configured JDBC file, I discovered that the application would simply blow that file away on first-start. Grr...