DEV Community

Sherine Khoury


A gopher’s journey to the center of container images

Blissful past...

A couple of years ago, I would never have thought that I would get so interested in the underlying structure of containers, let alone set out to build one in Golang.

I was living the blissful life of an engineer who simply uses podman pull or docker push, creates a Containerfile or Dockerfile, then runs command lines to build images from such files... sitting back and watching the standard output list the layers being built, then pushed one by one with nice digests to the registry of my choosing, under the tag of my choosing.

All changed when...

All this changed when I started contributing to oc-mirror, around a year ago. oc-mirror is a plugin of OpenShift's CLI, and targets disconnected clusters. It mirrors all images needed by such clusters in order to install and upgrade OpenShift as well as all its Day-2 operators from operator catalogs.
Suddenly, the underground world of containers unraveled.

Most of the logic of oc-mirror is about extracting metadata from images such as release images and operator catalog images, interpreting the contents of these images in order to determine the list of images that constitute a release or an operator, and later copying those images to an archive or to a partially disconnected registry.

Nevertheless, some of the activities also include building multi-arch images. This is the case for the graph image. Without going into the details of what this image is useful for, let's just say that the graph image is simply a UBI9 image, to which we copy some metadata in /var/lib, and whose CMD we modify, so that this image can become an init container for the disconnected cluster to use.

Let's start building

In this article, we are roughly trying to build in Golang the equivalent of this very simple Containerfile:

FROM registry.access.redhat.com/ubi9/ubi:latest

RUN curl -L -o cincinnati-graph-data.tar.gz https://api.openshift.com/api/upgrades_info/graph-data

RUN mkdir -p /var/lib/cincinnati-graph-data && tar xvzf cincinnati-graph-data.tar.gz -C /var/lib/cincinnati-graph-data/ --no-overwrite-dir --no-same-owner

CMD ["/bin/bash", "-c" ,"exec cp -rp /var/lib/cincinnati-graph-data/* /var/lib/cincinnati/graph-data"]

I'll try to describe the three paths I explored to achieve this task. I'm aware these are probably not the only possibilities, and that not all of them will suit your context:

Context A - having root capabilities: Using containers/buildah

For the task of building the graph image, my first idea was to rely on buildah.
In fact, our design was already heavily relying on containers/image for all things regarding copying images from one registry to the other, or from one registry to an archive. The obvious choice was to use the same suite of modules in order to keep dependencies to a minimum.

My implementation effort was greatly guided by Buildah's tutorial 4-Include in your build tool.

I'm assuming here that the golang binary that I'm building can have root privileges. If this is not your context, and you'd like to run this binary as non-root, you will need a special setup of the builder (which you can find in the next section).

With the assumption that root privileges are available, the implementation is fairly simple. As you'll see below, each instruction of the Containerfile has an equivalent method in the builder interface.

I encountered one small gotcha: Any files or folders that you want to copy/add to the image need to be in the current working directory.

For our development, this was a little inconvenience: why would someone using the tool in their home directory suddenly end up with OpenShift's upgrade graph metadata polluting it?! But this could easily be worked around by cleaning up in a defer statement when the builder was done (regardless of the build outcome: success or failure).

All the code is available here.

Now let's break down what needs to be done:

Initializing the builder - FROM instruction

I want to initialize the builder from the UBI9 image. This is passed in the BuilderOptions like this:

const (
    graphBaseImage string = "registry.access.redhat.com/ubi9/ubi:latest"
)
// ... truncated code
builderOpts := buildah.BuilderOptions{
  FromImage:    graphBaseImage,
  Capabilities: capabilitiesForRoot,
  Logger:       logger,
}
builder, err := buildah.NewBuilder(context.TODO(), buildStore, builderOpts)

Adding a layer - ADD instruction

Given that I have prepared the files that need to be copied to the image in graphDataUntarFolder, I can add the content of the whole folder using builder.Add. The AddAndCopyOptions can help set the userID and groupID owning these files and folders inside the container.

    addOptions := buildah.AddAndCopyOptions{Chown: "0:0", PreserveOwnership: false}
    addErr := builder.Add(graphDataDir, false, addOptions, graphDataUntarFolder)

Updating the command - CMD instruction

Next, we want to set up the command of the container image. This is rather straightforward:

    builder.SetCmd([]string{"/bin/bash", "-c", fmt.Sprintf("exec cp -rp %s/* %s", graphDataDir, graphDataMountPath)})
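To see what that call ends up storing, here is a small standalone sketch. The two path constants are assumptions taken from the Containerfile at the top of the article, and `graphCmd` is a hypothetical helper.

```go
package main

import "fmt"

// Assumed values, copied from the CMD line of the Containerfile.
const (
	graphDataDir       = "/var/lib/cincinnati-graph-data"
	graphDataMountPath = "/var/lib/cincinnati/graph-data"
)

// graphCmd returns the exec-form command for the image.
// A shell is needed because the glob has to be expanded when the container
// starts; exec-form arguments are passed to the process untouched.
func graphCmd(src, dst string) []string {
	return []string{"/bin/bash", "-c", fmt.Sprintf("exec cp -rp %s/* %s", src, dst)}
}

func main() {
	fmt.Printf("%q\n", graphCmd(graphDataDir, graphDataMountPath))
}
```

The third element is a single string, so the whole `cp` invocation reaches bash as one argument to `-c`.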

Building and pushing

It's now time to build the image and push it. By default, you can push to the store by first preparing the image reference like so:

imageRef, err := is.Transport.ParseStoreReference(buildStore, "docker.io/myusername/my-image")

But in my case, I opted for pushing it directly to the destination registry, like so:

imageRef, err := alltransports.ParseImageName("docker://localhost:7000/" + graphImageName)
// ... truncated code
imageId, _, _, err := builder.Commit(context.TODO(), imageRef, buildah.CommitOptions{})

Context B - using buildah as non root

oc-mirror being a CLI plugin, it should not require any extra root permissions in order to build images.

Buildah provides a way to run as non-root. But before we delve into that, a small parenthesis on the configuration of the store that Buildah uses:

Store defaults

Buildah relies on a build store for keeping track of layers, images pulled, built, etc. For setting up the build store, I simply used all the default setups available in the buildah module, like so:

    logger := logrus.New()
    logger.Level = logrus.DebugLevel
    buildStoreOptions, err := storage.DefaultStoreOptionsAutoDetectUID()
  // ... truncated code
    conf, err := config.Default()
  // ... truncated code
    capabilitiesForRoot, err := conf.Capabilities("root", nil, nil)
  // ... truncated code
    buildStore, err := storage.GetStore(buildStoreOptions)
    // ... truncated code
    defer buildStore.Shutdown(false)
    builderOpts := buildah.BuilderOptions{
        FromImage:    graphBaseImage,
        Capabilities: capabilitiesForRoot,
        Logger:       logger,
    }
    builder, err := buildah.NewBuilder(context.TODO(), buildStore, builderOpts)
  // ... truncated code

Setup for non-root execution

In order to integrate the buildah module into your golang product without root privileges, buildah's recommendation is to pause the execution of the go binary, create a user namespace where it can be root, and re-execute the binary in that user namespace.

This is achieved by adding the following lines in main.go, as early as you can in the main function:

if buildah.InitReexec() {
  return
}
unshare.MaybeReexecUsingUserNamespace(false)

This has to be added in the main function: you have to keep in mind that the execution will restart from the beginning, so any initializations will be done a second time.
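The "everything runs twice" effect is easy to reproduce without buildah. The sketch below only imitates the control flow with a plain re-exec: it creates no user namespace, and `SKETCH_REEXECED` is a hypothetical marker variable, not something buildah or unshare uses.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// reexecMarker is a hypothetical environment variable that lets the
// re-executed child recognise itself.
const reexecMarker = "SKETCH_REEXECED"

// needsReexec reports whether the marker is absent from environ.
func needsReexec(environ []string) bool {
	for _, kv := range environ {
		if strings.HasPrefix(kv, reexecMarker+"=") {
			return false
		}
	}
	return true
}

// childEnv returns a copy of environ with the marker added.
func childEnv(environ []string) []string {
	out := make([]string, 0, len(environ)+1)
	out = append(out, environ...)
	return append(out, reexecMarker+"=1")
}

func main() {
	// Anything placed before the check runs in BOTH parent and child.
	fmt.Printf("initializing in pid %d\n", os.Getpid())

	if needsReexec(os.Environ()) {
		exe, err := os.Executable()
		if err != nil {
			fmt.Fprintln(os.Stderr, "cannot locate executable:", err)
			os.Exit(1)
		}
		cmd := exec.Command(exe, os.Args[1:]...)
		cmd.Env = childEnv(os.Environ())
		cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
		if err := cmd.Run(); err != nil {
			fmt.Fprintln(os.Stderr, "child failed:", err)
			os.Exit(1)
		}
		return // the parent has nothing left to do
	}

	fmt.Println("real work happens in the re-executed child")
}
```

Running it prints the "initializing" line twice, once per process, which is why expensive or side-effecting setup is better placed after the re-exec check.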

Impacts on debugging

Re-executing changes the way we debug our code: in order to debug, I had to launch the dlv debugger in a user namespace:

podman unshare dlv debug --headless --listen=:43987 main.go 

PS: if you need to pass arguments to main, you can add -- to the command above, then append any arguments you have.

Once the command above is triggered, it is possible to use delve to debug (either using dlv directly or attaching to it with a client).

If you use VSCode, it is possible to attach it to the dlv process running in the background. This is achieved by adding the following code to the configurations[] inside of the launch.json:

{
    "name": "Attach Package",
    "type": "go",
    "debugAdapter": "dlv-dap",
    "request": "attach",
    "mode": "remote",
    "host": "localhost",
    "port": 43987,
},
{
    "name": "Attach Tests",
    "type": "go",
    "debugAdapter": "dlv-dap",
    "request": "attach",
    "mode": "remote",
    "host": "localhost",
    "port": 43987,
}
Impacts on users

Finally, for the use cases where our binary must run in a container, or in a pod on a Kubernetes cluster, it is important to set up the securityContext and to list all the capabilities the binary needs inside the container. These must include CAP_SETGID and CAP_SETUID; other capabilities might be needed as well.
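One way to check from inside the container whether these two capabilities were actually granted is to read the effective capability mask of the process. This is a Linux-only sketch with hypothetical helper names; the bit positions (6 for CAP_SETGID, 7 for CAP_SETUID) come from linux/capability.h.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// Bit positions as defined in linux/capability.h.
const (
	capSetGID = 6
	capSetUID = 7
)

// hasCap reports whether the given bit is set in a CapEff value as printed
// in /proc/<pid>/status (a hexadecimal bitmask).
func hasCap(capEffHex string, bit uint) (bool, error) {
	mask, err := strconv.ParseUint(strings.TrimSpace(capEffHex), 16, 64)
	if err != nil {
		return false, err
	}
	return mask&(uint64(1)<<bit) != 0, nil
}

// effectiveCaps extracts the CapEff field from the contents of a status file.
func effectiveCaps(status string) (string, bool) {
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		line := sc.Text()
		if strings.HasPrefix(line, "CapEff:") {
			return strings.TrimSpace(strings.TrimPrefix(line, "CapEff:")), true
		}
	}
	return "", false
}

func main() {
	data, err := os.ReadFile("/proc/self/status") // Linux only
	if err != nil {
		fmt.Println("cannot read /proc/self/status:", err)
		return
	}
	capEff, ok := effectiveCaps(string(data))
	if !ok {
		fmt.Println("CapEff field not found")
		return
	}
	checks := []struct {
		name string
		bit  uint
	}{{"CAP_SETGID", capSetGID}, {"CAP_SETUID", capSetUID}}
	for _, c := range checks {
		has, err := hasCap(capEff, c.bit)
		if err != nil {
			fmt.Println("cannot parse CapEff:", err)
			return
		}
		fmt.Printf("%s: %v\n", c.name, has)
	}
}
```

A check like this at startup can turn an obscure failure deep inside the build into an early, readable error message.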

Full code

graph-data-image-builder

Context C - Using go-containerregistry as non-root

I also explored another module, go-containerregistry, in order to build images without root privileges. The approach is completely different: each component of the container image can be manipulated separately. This can be an advantage if you're looking for a way to fine-tune things.

Preparing for use of go-containerregistry

In order to start using the remote package of go-containerregistry to pull/push images, you need to set:

  • nameOptions: StrictValidation vs WeakValidation, and whether default registries may be used when referring to container images
  • remoteOptions: which groups all configurations related to pulling and pushing images, such as:
    • connection proxies, timeouts, keepAlives, use of http2 or http1.1
    • configuration files containing credentials for registries
    • explicitly disabling TLS verification (if needed)
    nameOptions := []name.Option{
        name.StrictValidation,
    }
    remoteOptions := []remote.Option{
        remote.WithAuthFromKeychain(authn.DefaultKeychain), // this will try to find .docker/config first, $XDG_RUNTIME_DIR/containers/auth.json second
        remote.WithContext(context.TODO()),
        // doesn't seem possible to use registries.conf here.
    }

Pulling the origin image

Each image we want to build needs to be copied to a folder of your choosing on local disk. That folder (layoutDir) will contain the image layout, with any manifest-list, oci index, manifest, config, and layers...

This is achieved by using remote and layout like so:

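    imgRef := "registry.access.redhat.com/ubi9/ubi:latest"
    ref, err := name.ParseReference(imgRef, b.NameOpts...)
    if err != nil {
        return "", err
    }
    // Fetch the multi-arch index (manifest list) of the base image
    idx, err := remote.Index(ref, b.RemoteOpts...)
    if err != nil {
        return "", err
    }
    // layout.Write returns both the layout path and an error
    layoutPath, err := layout.Write(layoutDir, idx)
    if err != nil {
        return "", err
    }
    return layoutPath, nil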

Creating a layer

Adding a layer from a tar can be achieved very easily using tarball.

Given that outputFile is a string containing the path to a tar file, LayerFromFile reads that tar file and constructs a layer from its contents.

outputFile could be anywhere on the filesystem. There are no restrictions to it being saved to the working directory like in buildah.

  layerOptions := []tarball.LayerOption{}
  layer, err := tarball.LayerFromFile(outputFile, layerOptions...)
  if err != nil {
    return nil, err
  }
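A layer is identified by two hashes: the digest of the compressed blob, which the manifest references, and the diffID of the uncompressed tar, which the image config lists. The following sketch computes both with the standard library. It illustrates the concept only and is not how tarball.LayerFromFile is implemented; `layerDigests` is a hypothetical helper.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"crypto/sha256"
	"fmt"
)

// layerDigests gzips an uncompressed tar stream and returns the digest of
// the compressed blob together with the diffID of the uncompressed bytes.
func layerDigests(uncompressedTar []byte) (digest, diffID string, err error) {
	var buf bytes.Buffer
	gz := gzip.NewWriter(&buf)
	if _, err = gz.Write(uncompressedTar); err != nil {
		return "", "", err
	}
	if err = gz.Close(); err != nil {
		return "", "", err
	}
	digest = fmt.Sprintf("sha256:%x", sha256.Sum256(buf.Bytes()))
	diffID = fmt.Sprintf("sha256:%x", sha256.Sum256(uncompressedTar))
	return digest, diffID, nil
}

func main() {
	// Any byte slice works for the demonstration; a real layer is a tar archive.
	digest, diffID, err := layerDigests([]byte("pretend this is a tar archive"))
	if err != nil {
		panic(err)
	}
	fmt.Println("digest:", digest)
	fmt.Println("diffID:", diffID)
}
```

Because the two values differ, adding a layer touches both the manifest (digest, size) and the config (diffID), which is part of what mutate takes care of.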

Updating the command

For changing anything inside an image, mutate is needed.

This is slightly more complicated than what this snippet shows, due to the fact that an image might be a dockerv2-2 manifest list or oci index, itself containing several manifests (image for a specific architecture and OS).

In order to modify the command for the multi-arch image, we'd need to update the config of each of the underlying manifests.

But let's keep that out for now, and focus on how to modify the command for a single manifest. The full code is here.

// layoutPath is the result of layout.Write from the previous snippet
idx, err := layoutPath.ImageIndex()
if err != nil {
    return nil, err
}
idxManifest, err := idx.IndexManifest()
if err != nil {
    return nil, err
}
// For simplicity, only the first manifest of the index is handled here
manifest := idxManifest.Manifests[0]
currentHash := *manifest.Digest.DeepCopy()
img, err := idx.Image(currentHash)
if err != nil {
    return nil, err
}
cfg, err := img.ConfigFile()
if err != nil {
    return nil, err
}
// cmd is the new command, as a []string
cfg.Config.Cmd = cmd
img, err = mutate.Config(img, cfg.Config)
if err != nil {
    return nil, err
}

Building and pushing

Adding the layer

Same as for the modification of the command, adding a layer is achieved with mutate.

// `img` is the single-arch image from the index, obtained by calling `idx.Image(currentHash)` as in the previous snippet
// `layers` holds the layers to add (such as the one built with tarball.LayerFromFile); `mt` is their media type
additions := make([]mutate.Addendum, 0, len(layers))
for _, layer := range layers {
  additions = append(additions, mutate.Addendum{Layer: layer, MediaType: mt})
}
img, err = mutate.Append(img, additions...)
if err != nil {
  return nil, err
}
Building new manifests and index

Once a layer is added, or a Config modified, the manifest of the image should be updated. To be more exact, we need to remove the old manifest from the index, and add a new one.

This is done by creating a new descriptor for the img that was updated in the previous snippets:

desc, err := partial.Descriptor(img)
if err != nil {
    return nil, err
}

Next, we need to update the image index, by replacing the descriptor:

add := mutate.IndexAddendum{
    Add:        img,
    Descriptor: *desc,
}
modifiedIndex := mutate.AppendManifests(mutate.RemoveManifests(idx, match.Digests(currentHash)), add)
resultIdx = modifiedIndex
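Stripped of the library types, the remove-then-append step boils down to a list operation keyed by digest. This is a plain Go sketch with a hypothetical, simplified `descriptor` type and made-up digests, not the go-containerregistry implementation.

```go
package main

import "fmt"

// descriptor is a simplified stand-in for an index entry.
type descriptor struct {
	Digest       string
	Architecture string
}

// replaceManifest drops the entry whose digest is oldDigest and appends
// updated at the end of the list.
func replaceManifest(manifests []descriptor, oldDigest string, updated descriptor) []descriptor {
	out := make([]descriptor, 0, len(manifests)+1)
	for _, m := range manifests {
		if m.Digest != oldDigest {
			out = append(out, m)
		}
	}
	return append(out, updated)
}

func main() {
	manifests := []descriptor{
		{Digest: "sha256:aaa", Architecture: "amd64"},
		{Digest: "sha256:bbb", Architecture: "arm64"},
	}
	updated := descriptor{Digest: "sha256:ccc", Architecture: "amd64"}
	for _, m := range replaceManifest(manifests, "sha256:aaa", updated) {
		fmt.Println(m.Architecture, m.Digest)
	}
}
```

In this sketch the updated entry moves to the end of the list. For a real multi-arch image the same replacement would be repeated for each architecture, always matching on the digest the manifest had before it was modified.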

Full code

Conclusion

Using buildah is much simpler: out of the box, it supports multi-arch image building as well as registries.conf, which was a requirement for our product.

Furthermore, as shown in this blog entry, each Containerfile instruction maps to a builder method. This makes the builder very easy to use.

go-containerregistry has all the necessary interfaces and methods to manipulate all the building blocks of container images, regardless of their format (dockerv2-1, dockerv2-2 or oci). It is probably worth investigating whether another golang module builds on top of go-containerregistry and provides an experience closer to that of a builder, abstracting away all the lower-level changes and making it easy to build multi-arch images. But that's a subject for a future post...
