😡 Agonies, Despair and self hosted Redash BI server on Microsoft Azure cloud platform 😋 part 1

#devops #azure #docker #redash

Hello everyone,

Thank you for joining in today to talk about the open source BI tool named Redash. We will focus today mainly on how to set up a self hosted Redash server in the cloud starting from a local environment setup. We will see the nuances hidden in the process and try to 'debug' on the fly.

Motivation

You may wonder why the article's title is a bit... maybe negative?! Well, I *** you not, the amount of energy and time wasted on solving simple issues was tremendous. I am writing this article as a public service, so that you hopefully won't go down the same rabbit hole.

Let's hit the road!

Prerequisites

Before we begin, as always, we want to know the minimum we need in order to start and be efficient and productive.

- Visual Studio Code
- Azure Account
- Docker - You do need to be familiar with Docker, Docker Compose and YAML configuration files. In case you are not, I suggest reading this article.

General setup

We want to understand which services integrate with Redash so we can have a minimal working example. We will now list what we need for the purpose of this article, and later we will see what additional services you may fancy for your custom self hosted setup and where to add them.

Service List
- Nginx server
- Redis server
- PostgreSQL database
- Redash server
- Redash scheduler
- Redash worker x 3 (optional, minimum is 2)
- pgAdmin (optional, GUI for PostgreSQL)

As you can see there is quite a bit to it. All these services need to be configured, available and working in harmony with one another for our self hosted Redash to succeed.

Local Environment

In this step we will configure our development environment that is going to be almost the exact same as our operational cloud environment. We will use docker compose for this purpose, so install it if you don't already have it installed. I provided a link earlier in the prerequisites section.

We are going to write a docker compose file that is going to be a bit hefty. We will not use external files (.env files), because once we proceed to the cloud setup there is no option to depend on external files. This way we get used to the same perspective from the start.

Let's first understand what the services in the list we wrote earlier are for. Redash is a Python-based application that keeps its own state in a PostgreSQL database, and the two work perfectly together. Redash needs this database for its own management and maintenance (things like users, queries and dashboards). The data behind the BI you fancy comes from other data sources that Redash connects to. The Redis server is in the setup for caching and queuing purposes, and the Nginx server is there to be a reverse proxy. We will see that the Nginx server is one customised for Redash. We will take a look at why we need the Redash workers later on.

Now we will write our docker compose file. We will fetch all our services from Docker Hub. I suggest that you explore it for the specific services that we are going to use, and maybe for other services that you are interested in. In my opinion the instructions written on Docker Hub are a great resource.

# docker-compose.yml

version: "3.9"

x-environment: &base_environment
    PYTHONUNBUFFERED: 0
    REDASH_WEB_WORKERS: 4
    REDASH_LOG_LEVEL: "INFO"
    REDASH_RATELIMIT_ENABLED: "false"
    REDASH_REDIS_URL: "redis://redis_server:6379/0"
    REDASH_MAIL_DEFAULT_SENDER: "redash@example.com"
    REDASH_ADDITIONAL_QUERY_RUNNERS: "redash.query_runner.python"
    REDASH_DATABASE_URL: "postgresql://postgresuser:postgrespassword@postgresdb/redash"


x-base_redash: &base_redash
  environment:
    <<: *base_environment
  image: redash/redash:8.0.2.b37747
  restart: always


services:
  # redis
  redis_server:
    image: redis:alpine
    container_name: redis_server_local
    restart: unless-stopped

  # database
  postgresdb:
    image: postgres:alpine
    restart: always
    container_name: postgresdb_server_local
    ports:
      - "5432:5432"
    environment:
      POSTGRES_HOST_AUTH_METHOD: "trust"
      POSTGRES_USER: postgresuser
      POSTGRES_PASSWORD: postgrespassword
      POSTGRES_DB: redash
    volumes:
      - ./postgres-data:/var/lib/postgresql/data

  # pgAdmin
  pgAdmin:
    container_name: "pgAdmin_local"
    image: dpage/pgadmin4
    restart: always
    ports:
      - "11180:80"
      - "11443:443"
    environment:
      PGADMIN_CONFIG_ENHANCED_COOKIE_PROTECTION: "False"
      PGADMIN_DEFAULT_EMAIL: pguser@mail.com
      PGADMIN_DEFAULT_PASSWORD: pgpassword
    depends_on:
      - postgresdb

    volumes:
      - ./pgadmin:/var/lib/pgadmin
      - ./pgadmin/backup:/var/lib/pgadmin/storage

  # redash server
  server:
    <<: *base_redash
    command: server
    ports:
      - "5000:5000"
      - "5678:5678"
      - "8081:8080"
    depends_on:
      - postgresdb
      - redis_server

  # redash scheduler
  scheduler:
    <<: *base_redash
    command: scheduler
    depends_on:
      - server
    environment:
      << : *base_environment
      QUEUES: "celery"
      WORKERS_COUNT: 1

  # redash worker 1
  scheduled_worker:
    <<: *base_redash
    command: worker
    depends_on:
      - server
    environment:
      << : *base_environment
      QUEUES: "scheduled_queries"
      WORKERS_COUNT: 1

  # redash worker 2
  adhoc_worker:
    <<: *base_redash
    command: worker
    depends_on:
      - server
    environment:
      << : *base_environment
      QUEUES: "queries"
      WORKERS_COUNT: 2

  # redash worker 3
  schemas_worker:
    <<: *base_redash
    command: worker
    depends_on:
      - server
    environment:
      << : *base_environment
      QUEUES: "schemas"
      WORKERS_COUNT: 1

  # nginx - pay attention to the image name
  nginx:
    image: redash/nginx:latest
    ports:
      - "8080:80"
    depends_on:
      - server
    links:
      - server:redash
    restart: always

Once we have finished writing our docker compose YAML configuration file, we are ready to spin up our local environment and see it in action. Open a terminal in the directory where we saved our docker-compose.yml file, and also open Docker Desktop. Now run the next command.

    $ docker-compose -f docker-compose.yml up

You can see that the docker images are pulled from Docker Hub and all the links are created one after the other. At the end of the startup you should see all containers available and colored green in Docker Desktop to indicate that they are running. You can also see that the terminal is now blocked because of all the logs that are streamed to it. We actually did it on purpose, because we needed to see the logs. Obviously we could run it in detached mode. I guess that maybe you want to see if Redash is working now? Let's locate the nginx container inside Docker Desktop, hover over that row and click on the 'open in browser' button.

Surprise! It doesn't work!! But why? We did everything right. Well, that is the first nuance. There is one more command we need to run to make it work locally. As our terminal is busy with the log stream, I want to open a new terminal and run a one-time command inside it. You can close it after we are done.

    $ docker-compose -f docker-compose.yml run server create_db

This command tells the Redash server to run a preconfigured database setup script that creates the tables Redash needs in order to manage itself. Pay attention and make sure that this command ends successfully, otherwise no Redash for you!

I would like to take the opportunity now to talk about another nuance. If you remember, we are using a specific version of the Redash image (look at the anchor declaration). Other, older images may fail and crash during the last command that we ran to set up the database. In case that happens, pay attention to what you need to do. The solution is very easy, but it took us a very long while to figure it out: you have to create a database named 'redash' manually inside the PostgreSQL server instance. I suggest using the pgAdmin instance that we listed as one of our services available locally. You can either use the same details as I wrote here in the connection string and in the environment values that I set for postgres, or you can change them. Whatever you like. Once you have connected to the postgres server instance and created the 'redash' database, just run the last command again.
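If you prefer the terminal over pgAdmin, the manual step above can be sketched with psql inside the postgres container. This is only a sketch under a few assumptions: it reuses the service name, user and database name from the compose file in this article, it assumes the default 'postgres' maintenance database exists in the container, and the variable names are mine.

```shell
compose_file="docker-compose.yml"
db_user="postgresuser"   # POSTGRES_USER from the compose file above
db_name="redash"         # database name used in REDASH_DATABASE_URL

# Create the database while connected to the default "postgres" database
# (assumption: that database exists in the running postgresdb container).
docker-compose -f "$compose_file" exec -T postgresdb \
  psql -U "$db_user" -d postgres -c "CREATE DATABASE $db_name;"

# Re-run the one-time Redash setup; --rm removes the one-off container afterwards.
docker-compose -f "$compose_file" run --rm server create_db
```

If the 'redash' database already exists, the CREATE DATABASE statement simply reports an error and nothing is changed, so it should be safe to try.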

Now let's go back to the browser and refresh. VOILÀ! At this point you should be seeing the welcome screen of a blank, fresh Redash instance, asking you to create the first account (email as username, plus a password) that will take the role of an admin.
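Once everything works you may not want the log stream blocking a terminal anymore. The detached mode mentioned earlier could look roughly like this. It is a sketch that assumes the same docker-compose.yml and the service name 'server' from this article.

```shell
compose_file="docker-compose.yml"

# Start every service in the background (detached) so the terminal stays free.
docker-compose -f "$compose_file" up -d

# List the containers of this project and their current state.
docker-compose -f "$compose_file" ps

# Show only the last 50 log lines of the Redash server when you need them.
docker-compose -f "$compose_file" logs --tail=50 server
```

When you are done exploring, 'docker-compose -f docker-compose.yml down' stops and removes the containers again.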

As agreed in the beginning, we are not going to create dashboards or make use of the BI capabilities here. We are only talking about the setup. Once we have also finished creating our Azure cloud environment for Redash, we will take a look at some other configuration we need in order to use it properly.

I want to discuss the docker compose configuration file. You can see that we make use of YAML anchors here. We actually have 2 of them: one describes a base Redash service (base_redash) and the other holds the environment variables we want to pass to each instance (base_environment). The service named 'server' is the one that runs the Redash application and it gets everything that the base_redash object contains. The Redash workers, on the other hand, need specific configuration on top of the base_redash object. We obviously could write everything inline, but I guess you can understand that we would get a mile-long configuration file. If we can reuse instead of replicate, then this is what we will do. Each worker fills a particular role and needs its own configuration to do it. We pass this configuration via the environment: each worker extends and overrides the base_environment object.

The special '<<' key is the YAML merge key. It copies the key: value pairs of the referenced anchor into the current object, and any key written explicitly next to it wins over the merged one. The merge is shallow, so a service that declares its own 'environment' block replaces the one coming from base_redash entirely, which is why every worker merges base_environment again. What we did was declare key: value pairs in the shared object, override them where needed, and add the additional keys QUEUES and WORKERS_COUNT, which hold different values per worker (again, I recommend reading the article I linked in the beginning to understand YAML better).
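To see the merge behaviour in isolation, here is a minimal sketch that writes a tiny throwaway compose file and asks docker-compose to print the resolved configuration. The service name 'demo_worker', the alpine image and the temporary directory are illustrative assumptions and not part of the Redash setup.

```shell
# Write a throwaway compose file with one anchor and one merge key.
demo_dir="$(mktemp -d)"
cat > "$demo_dir/docker-compose.yml" <<'EOF'
version: "3.9"

x-environment: &base_environment
  REDASH_LOG_LEVEL: "INFO"
  WORKERS_COUNT: 1

services:
  demo_worker:
    image: alpine
    environment:
      <<: *base_environment   # copy every key from the anchor
      QUEUES: "queries"       # add a key the anchor does not have
      WORKERS_COUNT: 2        # override a key the anchor does have
EOF

# Print the resolved configuration with the anchor and merge key expanded.
docker-compose -f "$demo_dir/docker-compose.yml" config
```

In the printed configuration the environment of demo_worker should contain all three keys, with WORKERS_COUNT taking the overridden value. Running the same 'config' command against the real docker-compose.yml is a handy way to check what each Redash worker actually receives.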

As this discussion / guide is getting a bit long, I decided to split it into 2 parts. We will end this part now with a working local environment, so you can explore a bit on your own, and I will release the second part once I finish it, hopefully over the weekend.

Part 2 is coming...
Stay tuned for next
Like, subscribe, comment and whatever ...
Goodbye