Recently, we have been surveying streaming database solutions, and Apache Pinot became our primary target because its description fits our needs.
Apache Pinot is a real-time distributed OLAP datastore, purpose-built for low-latency, high-throughput analytics, making it well suited to user-facing analytical workloads.
However, parts of the official documentation are missing, so I pieced together information from the Internet and was eventually able to run some experiments. I did not document the process at the time, so I am no longer sure which references I actually used.
Currently, the entire experimental infrastructure is built with `docker-compose`, and the complete code is in the Github repository. To explain the core components first, let's take a look at the `docker-compose.yml`:
```yaml
services:
  zookeeper:
    image: zookeeper:3.5.6
  pinot-controller:
    image: apachepinot/pinot:0.9.3
    ports:
      - "9000:9000"
      - "8888"
    environment:
      JAVA_OPTS: "-javaagent:/opt/pinot/etc/jmx_prometheus_javaagent/jmx_prometheus_javaagent-0.12.0.jar=8888:/opt/pinot/etc/jmx_prometheus_javaagent/configs/pinot.yml -Dplugins.dir=/opt/pinot/plugins -Xms1G -Xmx4G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -Xloggc:gc-pinot-controller.log"
  pinot-broker:
    image: apachepinot/pinot:0.9.3
    ports:
      - "8099:8099"
      - "8888"
    environment:
      JAVA_OPTS: "-javaagent:/opt/pinot/etc/jmx_prometheus_javaagent/jmx_prometheus_javaagent-0.12.0.jar=8888:/opt/pinot/etc/jmx_prometheus_javaagent/configs/pinot.yml -Dplugins.dir=/opt/pinot/plugins -Xms4G -Xmx4G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -Xloggc:gc-pinot-broker.log"
  pinot-server:
    image: apachepinot/pinot:0.9.3
    ports:
      - "8098:8098"
      - "8888"
    environment:
      JAVA_OPTS: "-javaagent:/opt/pinot/etc/jmx_prometheus_javaagent/jmx_prometheus_javaagent-0.12.0.jar=8888:/opt/pinot/etc/jmx_prometheus_javaagent/configs/pinot.yml -Dplugins.dir=/opt/pinot/plugins -Xms4G -Xmx16G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -Xloggc:gc-pinot-server.log"
```
The above are the four basic services needed to build a Pinot cluster; the parts identical to the official document are omitted, and only the parts I modified are listed. By the way, I have since upgraded the image to a newer version, but let's keep the official version in the example.
The main changes are to the `environment` of the three Pinot services, in order to make Pinot's metrics available to Prometheus.
`ports` opens an additional port `8888`, and a `javaagent` argument is added to `JAVA_OPTS`. The purpose of the javaagent is to expose the original Pinot JMX metrics over HTTP so that Prometheus can scrape them.
One caveat: the `jmx_prometheus_javaagent.jar` referenced in the official document has no version number, but I could not find a file by that name in the official container, so I had to use the versioned jar that actually ships with it, `jmx_prometheus_javaagent-0.12.0.jar`.
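To check that the agent is actually serving metrics, we can curl its endpoint. Note that `8888` is listed in `ports` without a host binding, so docker-compose maps it to an ephemeral host port; a quick check might look like this (`<mapped-port>` is whatever the first command prints):

```shell
# Find the ephemeral host port mapped to the controller's 8888
docker-compose port pinot-controller 8888

# Then fetch the Prometheus-format metrics from that port
curl -s http://localhost:<mapped-port>/metrics | head
```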
In order to observe the state of the services, we also need to bring up Prometheus and Grafana:

```yaml
prometheus:
  image: prom/prometheus
  container_name: monitoring-prometheus
  restart: unless-stopped
  volumes:
    - monitoring-prometheus-data-1:/prometheus
    - ./volumes/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
  ports:
    - "9090:9090"
grafana:
  image: grafana/grafana
  volumes:
    - ./volumes/grafana/provisioning:/etc/grafana/provisioning
    - ./volumes/grafana/dashboards:/var/lib/grafana/dashboards
  ports:
    - "3000:3000"
  environment:
    GF_SERVER_ROOT_URL: https://localhost:3000
    GF_SECURITY_ADMIN_PASSWORD: password
```
In `volumes` we define the required configuration files for these two services.
- Prometheus needs to know where to fetch the metrics from, so it needs to set the FQDNs of the three Pinot services.
- Grafana requires data source settings and a Pinot-specific dashboard.
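As a sketch, the Prometheus scrape configuration might look like the following. The job name and scrape interval are my own choices; the targets are the compose service FQDNs and the javaagent port from above:

```yaml
# ./volumes/prometheus/prometheus.yml (sketch; job name and interval are arbitrary)
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: pinot
    static_configs:
      - targets:
          - pinot-controller:8888
          - pinot-broker:8888
          - pinot-server:8888
```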
After starting `docker-compose`, we can view the dashboard at https://localhost:3000. The account and password are simply `admin` / `password` (the password is set by `GF_SECURITY_ADMIN_PASSWORD` above).
The last service is Presto.
Why do we need Presto? Pinot supports SQL, but it does not support `JOIN`. If two tables need to be merged, an external SQL engine has to provide that capability.
Presto is a SQL engine that supports ANSI SQL and multiple data sources, including Pinot of course, so I chose to use Presto in my experiment.
```yaml
presto-coordinator:
  image: apachepinot/pinot-presto
  restart: unless-stopped
  ports:
    - "18080:8080"
```
Presto has two roles, coordinator and worker. In the experimental environment, we simply use the built-in worker of the coordinator.
The official container `apachepinot/pinot-presto` runs as the coordinator by default, and it is already configured with the FQDN `pinot-controller:9000`, so there is no need to change anything; we can use it directly.
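To try it out, we can query Pinot through Presto from inside the container. The CLI entrypoint name and the table/column names below are assumptions for illustration, not from the repository, but a session might look like this:

```shell
# Entrypoint name and table names are assumptions; --server/--catalog/--schema
# and --execute are standard presto-cli flags
docker exec -it presto-coordinator presto-cli \
  --server localhost:8080 --catalog pinot --schema default \
  --execute "SELECT o.user_id, u.name, count(*) AS cnt
             FROM orders o JOIN users u ON o.user_id = u.id
             GROUP BY o.user_id, u.name"
```

This `JOIN` is exactly the kind of query that Pinot alone cannot execute.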
After cloning the Github repository, we can run `docker-compose up -d` to get the entire experimental environment up and running.
The whole experimental environment contains three key points.
- Apache Pinot: the streaming database itself.
- Prometheus and Grafana: the monitoring system that emulates the production environment.
- Apache Presto: query capabilities (such as `JOIN`) that Pinot itself does not provide.
By the way, the official document does not explain how to use Kafka with security enabled; only unsecured Kafka is given as an example. In the case of Confluent, Kafka comes with `SASL_SSL` built in, and I referred to this article to set up Kafka's account and password.
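For reference, when the topic is behind `SASL_SSL`, the standard Kafka client security properties can be added to the realtime table's `streamConfigs`. This is a sketch: the topic, broker list, and credentials are placeholders, and the property names are the usual Kafka client settings rather than anything from the repository:

```json
"streamConfigs": {
  "streamType": "kafka",
  "stream.kafka.topic.name": "my-topic",
  "stream.kafka.broker.list": "broker:9093",
  "security.protocol": "SASL_SSL",
  "sasl.mechanism": "PLAIN",
  "sasl.jaas.config": "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"<api-key>\" password=\"<api-secret>\";"
}
```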
This experimental environment may be expanded for some of my needs, such as adding Presto workers.