How to totally over-engineer your CV, part two

#r #docker #cv #career

In Part I of this series we focused on the text part and how to generate it automatically using GitHub Actions.
What would be a resume without a totally unnecessary programming timeline part, though? Something to show that you've been there, done that, gone through Pascal, you even used Windows in the 90s, and, well, a visual representation of your skills that gives an idea of how you can approach, and defeat, any JavaScript framework you're thrown at.
So let's start with that timeline in a CSV file, something like this:

Item, Group, Start Date, End Date
Basic, Languages, 1983-12-25, 1990-12-01
Pascal, Languages, 1988-09-01, 1993-01-01
C, Languages, 1989-01-01, 1993-12-01

... and so on. There's the item itself, the stuff you have experience with, then a group. I have Languages, Environments like Linux or cloud, and then Tools like Docker; suit yourself here.

You might want "Sewing techniques" like "needlepoint" or "macramé", or "Zombie vanquishing" like "with baseball bat" or "sawed-off shotgun", whatever. Hey, any job, up to and including zombie exterminator, is well served with an over-engineered CV.

Start and end dates are self-explanatory; I am only going to use December of this year, for purely aesthetic reasons, to indicate the stuff I'm still using now.
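A quick way to catch typos in that file before R ever sees it is a small shell check. This is only a minimal sketch, not part of the original setup: check_timeline is a hypothetical name, and it assumes the four-column layout and the ISO dates shown above.

```shell
# check_timeline: read a timeline CSV on stdin and report malformed rows.
# Assumes four comma-separated columns and ISO dates (YYYY-MM-DD),
# which sort correctly when compared as plain strings.
check_timeline() {
    awk -F', *' '
        NR == 1 { next }    # skip the header row
        NF == 0 { next }    # skip empty lines
        NF != 4 { printf "line %d: expected 4 fields, got %d\n", NR, NF; bad = 1; next }
        $3 > $4 { printf "line %d: %s starts after it ends\n", NR, $1; bad = 1 }
        END     { exit bad + 0 }
    '
}
```

Something like check_timeline < data/programming.csv would then print one line per suspicious row, and return a non-zero status if it found any.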

I will render this using R; as usual, there are good R visualization libraries for mostly anything. I will use vistime, an R package for visualizing timelines. But with a twist:

library("vistime")
library("ggplot2")

data <- read.csv("data/programming.csv")

# Build the timeline, then flip the coordinates to make it vertical
g <- gg_vistime(data, col.event = "Item", col.start = "Start.Date",
                col.end = "End.Date", col.group = "Group") +
  theme(axis.text.x = element_text(angle = 90, color = "blue4", size = 14)) +
  coord_flip()

g.d <- ggplot_build(g)

g.d$data[[4]]$angle <- 90

rebuilt <- ggplot_gtable(g.d)

png(filename="img/timeline.png", width=240, height=960)
plot(rebuilt)
dev.off()

The twist is that I wanted a vertical timeline instead of a horizontal one, basically to fill one of the margins of one of the pages, which was sitting empty, instead of occupying a big part of a page with a regular, horizontal timeline. That's done in the statement that creates g: a ggplot2 data structure is generated, which is basically a timeline, and the coord_flip call flips the coordinates to make it vertical. However, the problem was that bar labels were not flipped. Besides, since some items started at pretty much the same time, they overlapped. The result was not nice, and had the right-level-of-engineering. So we had to overengineer it with the next 6 statements.

I knew that I had everything that's in the chart available in the g data structure. In any other language, it would have been easy to disassemble that thing into its component objects, and deal with the part that actually does the labeling. Not so easy with R. In order to have access to the data structure, you need to call ggplot_build.

Deconstructing through a command called build, that's rich...

That creates a structure with a series of data frames, one per layer, and you can work with that. The fourth element contains a data frame with all the bar labels.

And yes, R starts its arrays at 1, just like Pascal. See? Learning Pascal was not so useless after all (in fact, it was pretty useful, and the best thing you could do in the 80s).

Flipping the labels is as easy as changing the angle to the right one, which happens to be the right angle too:

g.d$data[[4]]$angle <- 90

After that, you still have a disassembled chart, which you need to assemble back with ggplot_gtable, and then actually save to a file to have it ready.

Running that is simply a matter of issuing a command from the command line. Creating a workflow that generates it is another matter. This might be the first version you could think of:

name: "Programming timeline"
on:
  push:
    paths:
      - 'data/*.csv'

jobs:
  creates-timeline:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v2
      - name: Install R
        uses: r-lib/actions/setup-r@v1
        with:
          r-version: '3.5.3'
      - name: Install packages
        run: sudo apt-get install libcurl4-openssl-dev
      - name: Install dependencies
        run: |
          install.packages(c("ggplot2", "curl", "httr", "plotly", "vistime"))
        shell: Rscript {0}
      - name: Run the script
        run: Rscript .github/workflows/timeline.R

Off the bat, pretty straightforward. Set up R using the r-lib setup action, install the system packages you need, then install the actual R packages that you are going to be using (including some dependencies that should probably not be there but are anyway, to check that they are installed in the right order). And then run the script.

That takes 7 minutes. That's a lot for a timeline. Installing R packages includes compiling some C, and sometimes Fortran, source, plus all the rest. 7 minutes. We need to cut that down, if only to not include in the timeline a substantial part that says "Generating this timeline".
Again, using a cache was discarded off the bat. Too much stuff in too many different places. But this time using a Docker image did make sense. That image will include pretty much the same, only it will come in a single, convenient, package:

FROM r-base

LABEL version="0.0.1" maintainer="JJMerelo@GMail.com"

RUN apt-get update \
    && apt-get install -y libcurl4-openssl-dev r-cran-ggplot2 libssl-dev r-cran-httr git \
    && R -e "install.packages(c( 'plotly', 'vistime'))"

WORKDIR /home/docker

ENTRYPOINT ["Rscript"]

Not a lot of overengineering here; just a straightforward installation of needed packages, and an entry point that enables us to use it as an R script runner.
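Since the entry point is Rscript, the image can also be tried locally before wiring it into a workflow. The following is a minimal sketch under a few assumptions: it is run from the root of the repository, it uses the image name and script path that appear in the workflows in this post, and the run_timeline helper and its DOCKER override are additions made here for illustration.

```shell
# run_timeline: run the timeline script inside the image, mounting the
# current directory on the WORKDIR of the image so that the relative
# paths used by the script (data/, img/) resolve.
# DOCKER can be set to "echo" to print the command instead of running it;
# that knob is an assumption of this sketch, not something Docker provides.
run_timeline() {
    "${DOCKER:-docker}" run --rm \
        -v "$PWD":/home/docker \
        jjmerelo/cv .github/workflows/timeline.R
}
```

Calling DOCKER=echo run_timeline shows the command that would be issued; calling run_timeline alone actually needs Docker and the image.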

Did I say not a lot of overengineering? Well, just the right amount. This is public in Docker Hub, which will try to build a new image every single time you push. In most cases, there will be nothing new in the Docker image; it will just be some data added to the CV. And yes, I could have a path-filtered GitHub Action that uploaded it either to Docker Hub or to the GitHub registry. But why? It's quite simple to do just the builds you need, using Docker Hub hooks written in Perl, like this one:

use Git;

my $repo = Git->repository (Directory => '.');
my @modified_files = $repo->command('diff', "--name-only", "HEAD", "HEAD^");
die("No Dockerfile modified in the last commit\n")  unless grep( /Dockerfile/, @modified_files);

This one is actually pretty standard, and can be used in any repo as long as your Dockerfile is in the root dir. It checks if Dockerfile is among the files changed by the last commit, and dies if it is not, signalling the Docker Hub pipeline that it should not continue.

This appears as a failure in the pipeline; there's no way to make the pipeline just stop instead of failing.

Again, this is possible because Perl is also installed in the Docker Hub runner. It's everywhere.
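For comparison only, the same test can be sketched in plain shell. This is not the hook used here; dockerfile_changed is a hypothetical helper that takes the list of changed files on standard input, so that it can be tried without a repository.

```shell
# dockerfile_changed: succeeds when the list of file names on stdin
# (one per line) mentions a Dockerfile, like the grep in the Perl hook.
dockerfile_changed() {
    grep -q 'Dockerfile'
}

# In a hook it would be fed by git, mirroring the Perl version:
#   git diff --name-only HEAD HEAD^ | dockerfile_changed \
#       || { echo "No Dockerfile modified in the last commit" >&2; exit 1; }
```

The non-zero exit status plays the same role as die in the Perl script: it is what tells the pipeline to stop.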

We can then use this Docker image-that's-only-regenerated-when-it-needs-to-be in a renewed GitHub action:

name: "Programming timeline"
on:
  push:
    paths:
      - 'data/*.csv'

jobs:
  creates-timeline:
    runs-on: ubuntu-latest
    container: jjmerelo/cv
    steps:
      - name: Checkout
        uses: actions/checkout@v2
      - name: Run the script
        run: Rscript .github/workflows/timeline.R
      - name: Checks in results
        shell: bash {0}
        run: |
          if [[ $(git status -s) ]]; then
              git config --global user.email "jjmerelo@gmail.com"
              git config --global user.name "CVDataBot"
              git commit -am "Update timeline chart"
              git push
          else
              echo "🟏 No Changes"
          fi

The bulk of it is actually checking if there's been some change in the image (even if the data changes, maybe the image does not; for instance, if it's only a change in whitespace). But there's a small piece of overengineering there too; we'll get to that later.

As shown in the container key, this GitHub action runs inside the container, so what's going to be available there is only what you put into the container. This is preferred to running it outside the container and then docker-pulling the image; it saves a bit of time. But then it runs the source checkout in the container too. The checkout action is a clever piece of software: it will run no matter what; if there is no git, it will download your source using its own devices... But we need to check out the actual git repository, because we will need to push to it afterwards. This is why the container uses this:

apt-get install -y libcurl4-openssl-dev r-cran-ggplot2 libssl-dev r-cran-httr git

That installs git where, in principle, it would not have been needed. If git is present, actions/checkout uses it, and then we have our repo available for committing and pushing and whatever!

And it's also got another goodie: bash. By default, the shell used in a GitHub step that runs inside a container is sh. You can change that, however (yes, including to Perl, and yes, I added that piece to the documentation through a PR), and since writing the expression and doing the check with plain sh was a bit of a nuisance (and no, we were not going to install Perl in that container (although, come to think of it, it might be there already...))

... Anyway, an alternative way of just showing off that you know not only bash, but the difference between the syntax of bash and plain sh.
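For the record, the difference is small. The workflow uses the bash-only [[ ... ]] test on the output of git status -s; a plain sh version of the same check would look roughly like the sketch below, where needs_commit and report are hypothetical helper names that take that output as an argument.

```shell
# needs_commit: succeeds when its argument (the output of `git status -s`)
# is not empty. POSIX sh has no [[ ... ]], so it uses [ -n STRING ].
needs_commit() {
    [ -n "$1" ]
}

# report: say whether there is anything to commit.
report() {
    if needs_commit "$1"; then
        echo "Changes to commit"
    else
        echo "No changes"
    fi
}

# In a workflow step this would be called as:
#   report "$(git status -s)"
```

The quotes around the command substitution matter in sh: without them, the output would be split into words before the test sees it.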

This new version takes less than a minute, with the bulk of it going to downloading and setting up the container.

With a bit more over-engineering, we could create a slimmed-down R container... Maybe in the future.

With that, your just-in-time generated CV is ready for downloading.

Was it worth the while?

Well, creating a CV is positively boring. Using what you know to make it prettier or automate its generation, and also make it a bit more entertaining, is definitely worth the while. It also shows craftsmanship, and a passion for optimization.

Also a bit of a tendency to overengineer stuff that, again, could be created using a word processor. But that's not always bad, right?