This is the third part of a series of posts about how I’ll develop an application in Kubernetes (k8s):
first post: Idea (https://jorge.aguilera.soy/blog/prestamos-bibliotecas/k8s-1.html)
second post: Infrastructure (https://jorge.aguilera.soy/blog/prestamos-bibliotecas/k8s-2.html)
third post: Job (https://jorge.aguilera.soy/blog/prestamos-bibliotecas/k8s-3.html)
The main goal of these posts is to document the process of deploying a solution in k8s at the same time I’m writing the application, so all posts will probably have a lot of errors and mistakes that I’ll need to correct in the next post.
Be aware that I’m a complete novice with Kubernetes and these are my first steps with it. I hope to catch the attention of people with more knowledge than me, and maybe they can review these posts and suggest some improvements to us.
I’ve created a git repository at https://gitlab.com/puravida-software/k8s-bibliomadrid with the code of the application.
Gradle multiproject
I’ve split the application into a multi-module Gradle project with:
com-puravida-biblios-model as a Micronaut jar library with the model and the repositories. This library will also contain the Liquibase files to update the database schema.
com-puravida-biblios-etl as a Micronaut CLI application that takes a year and a month as arguments to download the CSV file and import it into the database.
Both are typical projects created with the Micronaut CLI, for example:
mn create-app --lang groovy --profile cli com.puravida.biblios.etl
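For reference, the settings file of a multi-module build like this one could look as follows (a minimal sketch; the root project name is my assumption, the module names are the ones listed above):
settings.gradle
rootProject.name = 'k8s-bibliomadrid'
include 'com-puravida-biblios-model'
include 'com-puravida-biblios-etl'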
Since at some point I’ll need a database (PostgreSQL), I use Okteto’s ability to develop an application inside a Kubernetes cluster with a PostgreSQL already deployed (see step 2), in a similar way to using a local PostgreSQL instance, Testcontainers, etc.
For this use case, Okteto doesn’t add a lot of value because I only have a database dependency that I could solve with local solutions. If the application requires more artifacts, such as specific services, databases or tools that are difficult to install and maintain on every developer’s desktop, Okteto can be a good solution to develop directly in a Kubernetes cluster.
So basically I’ve created an okteto.yml file in the root of the Gradle project:
okteto.yml
name: gradle
image: gradle:latest
command:
- bash
volumes:
- /home/gradle/.gradle
forward:
- 8080:8080
- 8088:8088
environment:
- POSTGRES_PASSWORD=okteto
- POSTGRES_USER=okteto
- POSTGRES_DB=okteto
- POSTGRESQL_SERVICE_HOST=10.0.7.172
And when I want to develop directly in the cluster I only need to execute:
$ okteto up
groovy:groovy$ ./gradlew build
With okteto up I create a new Pod called gradle with the gradle Docker image, where I can run commands such as build, run, etc.
Also, I can edit files on my local disk and Okteto will synchronize them with the remote pod. In the same way, I can run the application in the pod and connect to it with IntelliJ in order to debug it as if it were running on my laptop.
As you can see, I’ve injected some environment variables related to my PostgreSQL database so the application can work against it (inserting records at development time, etc.).
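For example, a Micronaut application.yml can pick up those variables with property placeholders; a sketch of the datasource section (the actual configuration file isn’t shown in the post, so treat it as an assumption):
application.yml
datasources:
  default:
    url: jdbc:postgresql://${POSTGRESQL_SERVICE_HOST}:5432/${POSTGRES_DB}
    username: ${POSTGRES_USER}
    password: ${POSTGRES_PASSWORD}
    driverClassName: org.postgresql.Driver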
Model
For the moment the model is very simple, with only two domain objects and two repositories.
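As the real classes aren’t reproduced in the post, here is a hypothetical sketch of what one domain object and its repository could look like with Micronaut Data in Groovy (class and field names are my invention):
import io.micronaut.data.annotation.GeneratedValue
import io.micronaut.data.annotation.Id
import io.micronaut.data.annotation.MappedEntity
import io.micronaut.data.jdbc.annotation.JdbcRepository
import io.micronaut.data.model.query.builder.sql.Dialect
import io.micronaut.data.repository.CrudRepository

// hypothetical domain object: monthly loan figures per library
@MappedEntity
class Loan {
    @Id
    @GeneratedValue
    Long id
    String library
    Integer year
    Integer month
    Integer total
}

// Micronaut Data generates the SQL implementation at compile time
@JdbcRepository(dialect = Dialect.POSTGRES)
interface LoanRepository extends CrudRepository<Loan, Long> {
}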
Job
Once I had the etl ready, able to download a file, parse it and insert it into the database, I wanted a Kubernetes way to run it using different years and months.
One possibility is to have a pod and invoke the import process via the command line or a web endpoint, but in this case I’ve used a Job.
Kubernetes can run a container via a Job in a lot of different scenarios (one-shot, on a cron schedule, retrying on failure, etc.). For my purpose I want to run a single Job and launch it manually (once I’ve verified a new file is ready in the OpenData portal).
This Job needs to connect to the database, so I’ll use the ConfigMap where I saved the connection details (host, user, password and database). I also need to indicate the year and month to process (via command-line arguments).
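The postgres-config ConfigMap referenced further down isn’t reproduced in this post; consistent with the environment variable names used in the okteto manifest above, it would look roughly like this (values are the development ones, and in a real deployment the password would rather live in a Secret):
postgres-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: postgres-config
data:
  POSTGRES_USER: okteto
  POSTGRES_PASSWORD: okteto
  POSTGRES_DB: okteto
  POSTGRESQL_SERVICE_HOST: "10.0.7.172"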
After some research (trial and error) I learned some lessons:
When you run a Job (with kubectl apply -f job-file.yaml for example), Kubernetes creates a new pod, and until you remove the Job with kubectl delete job name-of-the-job it remains in your cluster (as finished). This can be useful to inspect the logs, for example.
You can run a Job multiple times and Kubernetes will create new pods every time, but you need to inspect which one was the last executed.
Once a Job has been executed you CAN’T modify its spec and run it again: you’ll get a 'field is immutable' error. To solve it, simply delete the old Job and retry.
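For example, to clean up a finished Job before re-applying a modified spec (the Job names follow the template shown below):
$ kubectl get jobs
$ kubectl delete job com-puravida-biblios-etl-2019-04
$ kubectl apply -f k8s/jobs/etl-2019-04.yaml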
As I want to run the Job with different arguments (year and month), I can’t use a single yaml file without deleting previously executed Jobs, so I ended up with a template, and with the sed command I replace some variables to produce the final yaml files:
etl-tpl.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: com-puravida-biblios-etl-$YEAR-$MONTH
  labels:
    jobgroup: etl
spec:
  backoffLimit: 0
  template:
    spec:
      containers:
        - name: com-puravida-biblios-etl
          image: jagedn/k8s-bibliomadrid-etl:$VERSION
          args: ["-y", "$YEAR", "-m", "$MONTH"]
          envFrom:
            - configMapRef:
                name: postgres-config
      restartPolicy: Never
      terminationGracePeriodSeconds: 0
As you can see, we have 3 variables (YEAR, MONTH and VERSION) to replace:
$ sed -e 's/$YEAR/2019/g' -e 's/$MONTH/04/g' -e 's/$VERSION/0.1/g' k8s/etl-tpl.yaml > k8s/jobs/etl-2019-04.yaml
$ kubectl apply -f k8s/jobs/etl-2019-04.yaml
$ kubectl logs -f com-puravida-biblios-etl-2019-06-fdx6f
In this way, not only will I have one file per year-month that I can run in every namespace I have (dev and prod), but I’ll also have all the files versioned in the repo.
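As a side note, generating the files for a whole year can be scripted from the same template; a sketch (this loop is not in the repo):
$ for MONTH in 01 02 03 04 05 06 07 08 09 10 11 12; do
    sed -e "s/\$YEAR/2019/g" -e "s/\$MONTH/$MONTH/g" -e "s/\$VERSION/0.1/g" \
      k8s/etl-tpl.yaml > k8s/jobs/etl-2019-$MONTH.yaml
  done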
Next step
Once we have some data in our database, we can develop a new service able to serve it via REST (or maybe GraphQL?) to the Internet.
Acknowledgments
Thanks to Pablo Chico de Guzman, @pchico83, for confirming that a Job is a good alternative for doing an ETL.
Thanks to JJ Merelo, @jjmerelo, for suggesting I use Celery (https://twitter.com/jjmerelo/status/1199935927931592705). Another tool to learn!!!
Thanks to Fran, @franco87ES, for confirming I can use a ConfigMap in a Job (https://twitter.com/franco87ES/status/1200059717965504513). I was lost because after modifying a yaml I wanted to re-apply it without removing the old Job, and Kubernetes was rejecting the action with a 'field is immutable' error; I thought it was due to the ConfigMap section. Thanks to Fran’s advice I dug deeper into the problem and in the end I realized my error.