ChunTing Wu

Posted on Oct 10, 2022

Building Apache Pinot and Presto

#bigdata #eventdriven #tutorial #programming

Recently, we have been surveying streaming database solutions. Our primary candidate is Apache Pinot, whose description fits our needs.

Apache Pinot is a real-time distributed OLAP datastore, purpose-built for low-latency, high-throughput analytics and perfect for user-facing analytical workloads.

However, parts of the official documentation are missing, so I pieced together information from around the Internet until I was able to run some experiments. I did not take notes at the time, so I am no longer sure which reference materials I actually used.

Currently, the entire experimental infrastructure is built with docker-compose, and the complete code is in the following GitHub repository.
https://github.com/wirelessr/pinot-plus-presto

Apache Pinot

To explain the core components first, let's take a look at docker-compose.yml.

services:
  zookeeper:
    image: zookeeper:3.5.6

  pinot-controller:
    image: apachepinot/pinot:0.9.3
    ports:
      - "9000:9000"
      - "8888"
    environment:
      JAVA_OPTS: "-javaagent:/opt/pinot/etc/jmx_prometheus_javaagent/jmx_prometheus_javaagent-0.12.0.jar=8888:/opt/pinot/etc/jmx_prometheus_javaagent/configs/pinot.yml -Dplugins.dir=/opt/pinot/plugins -Xms1G -Xmx4G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -Xloggc:gc-pinot-controller.log"

  pinot-broker:
    image: apachepinot/pinot:0.9.3
    ports:
      - "8099:8099"
      - "8888"
    environment:
      JAVA_OPTS: "-javaagent:/opt/pinot/etc/jmx_prometheus_javaagent/jmx_prometheus_javaagent-0.12.0.jar=8888:/opt/pinot/etc/jmx_prometheus_javaagent/configs/pinot.yml -Dplugins.dir=/opt/pinot/plugins -Xms4G -Xmx4G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -Xloggc:gc-pinot-broker.log"

  pinot-server:
    image: apachepinot/pinot:0.9.3
    ports:
      - "8098:8098"
      - "8888"
    environment:
      JAVA_OPTS: "-javaagent:/opt/pinot/etc/jmx_prometheus_javaagent/jmx_prometheus_javaagent-0.12.0.jar=8888:/opt/pinot/etc/jmx_prometheus_javaagent/configs/pinot.yml -Dplugins.dir=/opt/pinot/plugins -Xms4G -Xmx16G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -Xloggc:gc-pinot-server.log"

The above are the four basic services needed to run Pinot. The parts that match the official documentation are omitted, and only the parts I modified are listed. By the way, I have since moved to a newer image version, but the example keeps the version used in the official documentation.
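For orientation, here is a rough sketch of one service with the omitted parts filled back in. The command, restart and depends_on values are assumptions based on the usual Pinot Docker quick start, not copied from the repository, so check them against the actual docker-compose.yml.

```yaml
services:
  zookeeper:
    image: zookeeper:3.5.6

  pinot-controller:
    image: apachepinot/pinot:0.9.3
    # Assumed: start command pointing at the zookeeper service above
    command: "StartController -zkAddress zookeeper:2181"
    # Assumed restart policy
    restart: unless-stopped
    ports:
      - "9000:9000"   # controller port, published on the host
      - "8888"        # metrics port added for Prometheus
    environment:
      # Shortened here; the full value including -javaagent is listed above
      JAVA_OPTS: "-Dplugins.dir=/opt/pinot/plugins -Xms1G -Xmx4G"
    # Assumed: wait for ZooKeeper before starting
    depends_on:
      - zookeeper
```

Under the same assumptions, the broker and server would differ mainly in the command (StartBroker and StartServer) and in their ports.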

The main changes are to the ports and environment of the three Pinot services, made so that Prometheus can scrape Pinot's metrics.

Therefore, ports exposes an additional port, 8888, and JAVA_OPTS gains an additional javaagent parameter. The javaagent serves Pinot's existing JMX metrics over HTTP so that Prometheus can scrape them.

The official documentation refers to jmx_prometheus_javaagent.jar without a version number, but I could not find that file in the official container, so I had to use jmx_prometheus_javaagent-0.12.0.jar instead.
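The JAVA_OPTS string is long, so it may help to see it broken into its parts. The sketch below rewrites the controller's value as a YAML folded scalar (`>-`), which joins the lines back into a single space-separated string, so the resulting value should be the same as in the listing above. The layout is only a readability suggestion, not how the repository writes it.

```yaml
services:
  pinot-controller:
    image: apachepinot/pinot:0.9.3
    environment:
      # The javaagent argument has the form
      #   -javaagent:<agent jar>=<listen port>:<exporter config file>
      # so here the agent listens on 8888 and reads pinot.yml for its rules.
      JAVA_OPTS: >-
        -javaagent:/opt/pinot/etc/jmx_prometheus_javaagent/jmx_prometheus_javaagent-0.12.0.jar=8888:/opt/pinot/etc/jmx_prometheus_javaagent/configs/pinot.yml
        -Dplugins.dir=/opt/pinot/plugins
        -Xms1G -Xmx4G
        -XX:+UseG1GC -XX:MaxGCPauseMillis=200
        -Xloggc:gc-pinot-controller.log
```

Note that comments cannot go inside the folded scalar itself, since they would become part of the string, which is why the explanation sits above the key.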

Monitoring

In order to observe the state of the services, we also need to build Prometheus and Grafana.

  prometheus:
    image: prom/prometheus
    container_name: monitoring-prometheus
    restart: unless-stopped
    volumes:
    - monitoring-prometheus-data-1:/prometheus
    - ./volumes/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
    - "9090:9090"
  grafana:
    image: grafana/grafana
    volumes:
    - ./volumes/grafana/provisioning:/etc/grafana/provisioning
    - ./volumes/grafana/dashboards:/var/lib/grafana/dashboards
    ports:
    - "3000:3000"
    environment:
      GF_SERVER_ROOT_URL: https://localhost:3000
      GF_SECURITY_ADMIN_PASSWORD: password

In volumes we define the required configuration files for these two services.

Prometheus needs to know where to fetch the metrics from, so it must be configured with the FQDNs of the three Pinot services.
Grafana requires data source settings and a Pinot-specific dashboard.
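As an illustration of the Prometheus side, a minimal prometheus.yml could look like the sketch below. The targets use the service names and the 8888 metrics port from the docker-compose.yml above; the job name and scrape interval are assumptions, and the file in the repository may differ.

```yaml
# Sketch of ./volumes/prometheus/prometheus.yml
global:
  scrape_interval: 15s          # assumed interval

scrape_configs:
  - job_name: pinot             # hypothetical job name
    static_configs:
      - targets:
          # docker-compose service names resolve inside the compose network
          - pinot-controller:8888
          - pinot-broker:8888
          - pinot-server:8888
```

Because Prometheus runs inside the same compose network, it reaches the containers on port 8888 directly and does not depend on which host port docker-compose assigns.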

After running docker-compose, we can view the dashboard at https://localhost:3000. The username and password are simply admin and password.
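For reference, the Grafana provisioning files mounted under ./volumes/grafana/provisioning could look roughly like the two sketches below. The file names, the data source name and the provider name are assumptions; only the dashboards path is taken from the volume mapping in docker-compose.yml.

```yaml
# Hypothetical file: provisioning/datasources/prometheus.yml
apiVersion: 1
datasources:
  - name: Prometheus              # assumed data source name
    type: prometheus
    access: proxy
    # "prometheus" is the docker-compose service name defined earlier
    url: http://prometheus:9090
    isDefault: true
```

```yaml
# Hypothetical file: provisioning/dashboards/pinot.yml
apiVersion: 1
providers:
  - name: pinot                   # assumed provider name
    type: file
    options:
      # Matches the container path of the dashboards volume
      path: /var/lib/grafana/dashboards
```

With files like these in place, Grafana loads the data source and any dashboard JSON in that directory at startup, so nothing has to be clicked together by hand.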

Presto

The last service is Presto.

Why do we need Presto? Pinot supports SQL, but it does not support JOIN. If two tables need to be joined, the join must be performed by an external SQL engine.

Presto is a SQL engine that supports ANSI SQL and multiple data sources, including Pinot of course, so I chose to use Presto in my experiment.

  presto-coordinator:
    image: apachepinot/pinot-presto
    restart: unless-stopped
    ports:
    - "18080:8080"

Presto has two roles, coordinator and worker. In the experimental environment, we simply use the built-in worker of the coordinator.

The official container apachepinot/pinot-presto runs as a coordinator by default without any settings, and it already points at the FQDN pinot-controller:9000, so we can use it directly without changing anything.

Conclusion

After cloning the GitHub repository, we can run docker-compose up -d directly to get the experimental environment up and running.

The whole experimental environment contains three key components.

Apache Pinot: the streaming database itself.
Prometheus and Grafana: the monitoring system that emulates the production environment.
Presto: the SQL engine that provides JOIN, a query capability that is not available in Pinot.

By the way, the official documentation does not explain how to use Kafka with security enabled; its only example uses Kafka without security.

In the case of Confluent, Kafka uses SASL_SSL out of the box, so I referred to the following article to configure Kafka's username and password.
https://dev.startree.ai/docs/pinot/recipes/kafka-sasl

This experimental environment may be expanded for some of my needs, such as adding Presto workers.
