Journey Through DevOps - Part 3: Higher

This is a three-part series documenting my journey through software development. For Journey Through DevOps - Part 2: The Awakening, click here

This post is an insight into the automation we use in our projects. It is not a tutorial, but a demonstration of how automation can change the software process.

Background

We manage the deployment of two applications: Chatwoot and Rasa-X. We aim to deliver content through Rasa's natural language processing framework and use Chatwoot to track the chat and intervene if necessary.

Technology stack

  • Cloud: AWS
  • Deployment: Helm, Kubernetes (1.21, Elastic Kubernetes Service)
  • Datastore: Postgres, Redis (AWS managed)
  • Applications: Rasa-X (Python), Chatwoot (Ruby on Rails)
  • VCS: GitLab

Setup

Kubernetes

We use AWS EKS to deploy our applications. The pods run on both Fargate and managed node groups to get the best of both worlds: since Chatwoot's micro-services are stateless, they run on Fargate, while the Rasa-X deployment uses node groups for certain pods that need fewer resources to run and also need to be stateful.
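As an illustration, this split could be expressed in an eksctl cluster config; the cluster name, namespaces, and instance types below are hypothetical, not our actual values:

```yaml
# Hypothetical eksctl config illustrating the Fargate / managed node-group split
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: example-cluster
  region: us-east-1
fargateProfiles:
  - name: chatwoot-stateless
    selectors:
      - namespace: chatwoot   # stateless Chatwoot micro-services run on Fargate
managedNodeGroups:
  - name: rasa-stateful
    instanceType: t3.medium
    desiredCapacity: 2        # stateful, low-resource Rasa-X pods run here
```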

Infrastructure

For AWS infrastructure, we use Terraform. The repo is connected to Terraform Cloud through its version control system integration, allowing us to focus on the infrastructure itself instead of having to maintain CI pipelines.

Helm

We use Helm charts for both Rasa-X and Chatwoot, the latter of which we built ourselves. Helm gives us a standard way to deploy the applications instead of a pile of YAML files that need modification every time there's an update.
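With a chart, an update becomes a single values change rather than edits across manifests; for example, a values override file like this sketch (the keys are hypothetical and depend on the chart):

```yaml
# Hypothetical values.override.yaml; actual keys depend on the chart in use
image:
  tag: "v2.5.0"
replicaCount: 2
```

It would be applied with `helm upgrade <release-name> <chart> -f values.override.yaml`.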

The Automation

Automation, by its nature, is intuitive. A general rule of thumb we follow: if a particular task is done more than three times, it is best to automate it.

Let's start with the lightweight components.

Chatwoot

Since Chatwoot is an open-source project and we don't add much custom code to it, there isn't any need for CI/CD pipelines. For every update, it's easier to run

helm upgrade <release-name> chatwoot/chatwoot

Infrastructure

Since we're using Terraform Cloud, this is also very simple. We did consider running Terraform locally in pipelines, but that was overhead considering how small our team was. Note that you can use Terraform Cloud both to execute Terraform and to store the Terraform state alone.

Rasa-X

This is the component that involves the most repetition, and it is therefore fully automated.

Rasa-X has several components, but we'll focus on the important ones for now: Rasa Open Source and the Rasa custom actions server. The former is the NLP part of Rasa; the latter is a Python server that lets you run custom events in your chatbot, such as fetching data from a database. Since chatbot development is iterative, it was imperative that we automate this process before moving ahead.

CI/CD

This part requires you to repeatedly train NLP models and test them just as often. Hence we use GitLab CI pipelines to train a model, subsequently test it, and upload the results as artefacts.


stages:            # declare the stage order so the jobs below run in sequence
  - build
  - train
  - validate
  - test
  - report
  - deploy

build-actions:
  image: docker:20.10.7
  stage: build
  services:
    - docker:20.10.7-dind
  script:
    - docker login -u $CI_REGISTRY_USER -p $CI_REGISTRY_PASSWORD $CI_REGISTRY
    - chmod +x ./ci.sh
    - ./ci.sh

train-job:
  stage: train
  script:
    - rasa train --fixed-model-name $CI_COMMIT_SHORT_SHA
  artifacts:
    paths:
      - models/
    expire_in: 1 day

data-validate-job:  
  stage: validate    
  script:
    - rasa data validate

core-test-job:   
  stage: test    
  script:
    - rasa test core --model models/ --stories test/ --out results

nlu-test-job:   
  stage: test    
  script:
    - rasa test nlu --nlu data/nlu.yml --cross-validation

upload-job:
  stage: report
  script:
    - echo "Upload results"
  artifacts:
    paths:
      - results/

This can be broken into two parts: the custom actions server and the Rasa bot.
Custom actions is a Python server, so we dockerize it for easier development and standardisation. `./ci.sh` is a shell script that reads the branch name from an environment variable, sets the Docker image tag accordingly, and builds and pushes the image to the registry.
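The contents of `ci.sh` are not shown in this post; a minimal sketch of what such a script might look like, assuming the registry path used elsewhere in the pipeline:

```shell
#!/bin/sh
# Hypothetical sketch of ci.sh: derive a Docker tag from the branch name,
# then build and push the custom-actions image. Paths are assumptions.
set -eu

branch_to_tag() {
  # main publishes the "stable" tag; every other branch reuses its own name
  case "$1" in
    main) echo "stable" ;;
    *)    echo "$1" ;;
  esac
}

TAG="$(branch_to_tag "${CI_COMMIT_BRANCH:-develop}")"
IMAGE="${CI_REGISTRY:-registry.gitlab.com}/weunlearn/wulu2.0/rasa_actions:$TAG"

# Build and push only inside a real CI run where Docker is available
if [ -n "${CI_REGISTRY:-}" ]; then
  docker build -t "$IMAGE" .
  docker push "$IMAGE"
fi
```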

The next parts of the pipeline train a model, validate it for conflicts and errors, and finally run tests on it. All the test results are uploaded as artefacts in the pipeline, which allows us to log every model and iteration. The --fixed-model-name flag ensures that models have a predictable name which can be used further along the pipeline; Rasa defaults to using a timestamp, which is unpredictable.
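Because the model name is pinned to the commit SHA, downstream jobs can reconstruct the artefact's path deterministically, e.g.:

```shell
# With --fixed-model-name $CI_COMMIT_SHORT_SHA, the trained model lands at a
# predictable path that later jobs (such as the deploy upload) can rebuild.
MODEL_PATH="models/${CI_COMMIT_SHORT_SHA:-abc1234}.tar.gz"
echo "$MODEL_PATH"
```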

Branches:
  • main: This branch represents production, so every piece of code that resides here is battle-tested and verified. Hence it is safe to assume that with every push to this branch, we can update the NLP model on the server.

Docker tag for custom actions: stable

deploy-job:
  stage: deploy
  #before_script: []
  #image: registry.gitlab.com/gitlab-org/cloud-deploy/aws-base:latest
  script:
    - apt-get update
    #- apt install git-all -y
    - curl -k -F "model=@models/$CI_COMMIT_SHORT_SHA.tar.gz" "http://rasax-url.com/api/projects/default/models?api_token=$RASAXTOKEN"
    - echo "Model successfully uploaded."
    - aws eks update-kubeconfig --name ${EKS_CLUSTER_NAME}
    - curl https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3 | bash
    - helm plugin install https://github.com/rimusz/helm-tiller --kubeconfig=$HOME/.kube/kubeconfig
    - helm repo add rasa-x https://rasahq.github.io/rasa-x-helm
    - helm upgrade rasa rasa-x/rasa-x -n rasa --set app.name=$CI_REGISTRY/weunlearn/wulu2.0/rasa_actions --set "app.tag=stable" --reuse-values   # redeploy with the new image while reusing already existing values
  rules:
    - if: '$CI_PIPELINE_SOURCE == "push" && $CI_COMMIT_BRANCH == "main"'

This stage is only executed when the event is a push and the committed branch is main. This prevents unnecessary triggering of deployments for minor changes on various other branches.

  • develop: Since this is a non-production environment, there is no deployment from this branch; the deploy stage from the previous section is not executed. However, since this branch serves as a reference point before code is promoted to main, its model has a static name.

The test results are uploaded as before. This allows us to keep track of every model iteration.

Docker tag for custom actions: develop
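One way to give the develop model a static name is a branch-scoped train job; this exact job is a sketch, not taken verbatim from our pipeline:

```yaml
# Hypothetical: on develop, train with a static model name instead of the commit SHA
train-develop-job:
  stage: train
  script:
    - rasa train --fixed-model-name develop
  artifacts:
    paths:
      - models/
  rules:
    - if: '$CI_COMMIT_BRANCH == "develop"'
```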

Workflow

Pushes into non-main branches:

Pipeline of non-main branches

Pushes into main

The pipeline is the same as before, except it has a final deploy stage.
Pipeline last step of main branch

Conclusion

This setup enables a tech team of two people to manage multiple applications in our production environment, while also ensuring a standardised way to perform quality control. The goal of the tech team is now to solve problems and build the product, rather than dedicating a significant share of its time to figuring out how to manage the product.
