Recently, we have been surveying streaming database solutions, and Apache Pinot became the primary candidate because its description fits our needs.
Apache Pinot is a real-time distributed OLAP datastore, purpose-built for low-latency, high-throughput analytics, which makes it a good fit for user-facing analytical workloads.
However, parts of the official documentation are missing, so I had to piece together information from the Internet before I could run some experiments. I did not document the process at the time, so I am no longer sure which reference materials I actually used.
Currently, the entire experimental infrastructure is built with `docker-compose`, and the complete code is in the following GitHub repository:
https://github.com/wirelessr/pinot-plus-presto
Apache Pinot
To explain the core components first, let's take a look at `docker-compose.yml`.
```yaml
services:
  zookeeper:
    image: zookeeper:3.5.6
  pinot-controller:
    image: apachepinot/pinot:0.9.3
    ports:
      - "9000:9000"
      - "8888"
    environment:
      JAVA_OPTS: "-javaagent:/opt/pinot/etc/jmx_prometheus_javaagent/jmx_prometheus_javaagent-0.12.0.jar=8888:/opt/pinot/etc/jmx_prometheus_javaagent/configs/pinot.yml -Dplugins.dir=/opt/pinot/plugins -Xms1G -Xmx4G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -Xloggc:gc-pinot-controller.log"
  pinot-broker:
    image: apachepinot/pinot:0.9.3
    ports:
      - "8099:8099"
      - "8888"
    environment:
      JAVA_OPTS: "-javaagent:/opt/pinot/etc/jmx_prometheus_javaagent/jmx_prometheus_javaagent-0.12.0.jar=8888:/opt/pinot/etc/jmx_prometheus_javaagent/configs/pinot.yml -Dplugins.dir=/opt/pinot/plugins -Xms4G -Xmx4G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -Xloggc:gc-pinot-broker.log"
  pinot-server:
    image: apachepinot/pinot:0.9.3
    ports:
      - "8098:8098"
      - "8888"
    environment:
      JAVA_OPTS: "-javaagent:/opt/pinot/etc/jmx_prometheus_javaagent/jmx_prometheus_javaagent-0.12.0.jar=8888:/opt/pinot/etc/jmx_prometheus_javaagent/configs/pinot.yml -Dplugins.dir=/opt/pinot/plugins -Xms4G -Xmx16G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -Xloggc:gc-pinot-server.log"
```
The above are the four basic services needed to construct a Pinot cluster. The parts that are identical to the official document are omitted; only the sections I modified are listed. By the way, I have since switched the image to a newer version, but let's keep the official version in the example.
The main changes are to the `ports` and `environment` of the three Pinot services, in order to make Pinot's metrics available to Prometheus. Therefore, `ports` opens an additional `8888`, and an extra `javaagent` parameter is added to `JAVA_OPTS`. The purpose of the `javaagent` is to expose Pinot's original JMX metrics over HTTP so that Prometheus can scrape them.
One caveat: the `jmx_prometheus_javaagent.jar` referenced in the official document has no version number, and I could not find a file with that exact name in the official container, so I used `jmx_prometheus_javaagent-0.12.0.jar` instead.
Monitoring
In order to observe the state of the services, we also need to set up Prometheus and Grafana.
```yaml
  prometheus:
    image: prom/prometheus
    container_name: monitoring-prometheus
    restart: unless-stopped
    volumes:
      - monitoring-prometheus-data-1:/prometheus
      - ./volumes/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"
  grafana:
    image: grafana/grafana
    volumes:
      - ./volumes/grafana/provisioning:/etc/grafana/provisioning
      - ./volumes/grafana/dashboards:/var/lib/grafana/dashboards
    ports:
      - "3000:3000"
    environment:
      GF_SERVER_ROOT_URL: https://localhost:3000
      GF_SECURITY_ADMIN_PASSWORD: password
```
In `volumes` we define the required configuration files for these two services.
- Prometheus needs to know where to fetch the metrics from, so it needs the FQDNs of the three Pinot services in its scrape configuration.
- Grafana requires data source settings and a Pinot-specific dashboard.
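For reference, the Prometheus side of this might look roughly like the sketch below. The job name and scrape interval are my own choices; the targets are the compose service names combined with port `8888` opened earlier:

```yaml
# prometheus.yml (sketch): scrape the JMX-exporter endpoint of each Pinot service
scrape_configs:
  - job_name: "pinot"        # arbitrary job label
    scrape_interval: 15s
    static_configs:
      - targets:
          - "pinot-controller:8888"
          - "pinot-broker:8888"
          - "pinot-server:8888"
```

Because all services share the same compose network, the service names resolve directly as hostnames.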
After running `docker-compose`, we can view the dashboard at https://localhost:3000. The account and password are simply `admin` and `password`.
Apache Presto
The last service is Presto.
Why do we need Presto? Pinot supports SQL, but it does not support `JOIN`. If two tables need to be merged, that capability must be provided by an external SQL engine.
Presto is a SQL engine that supports ANSI SQL and multiple data sources, including Pinot of course, so I chose Presto for my experiment.
```yaml
  presto-coordinator:
    image: apachepinot/pinot-presto
    restart: unless-stopped
    ports:
      - "18080:8080"
```
Presto has two roles: coordinator and worker. In the experimental environment, we simply use the coordinator's built-in worker.
The official container `apachepinot/pinot-presto` acts as the coordinator by default without any settings, and it already points at the FQDN `pinot-controller:9000`, so there is no need to change anything; we can use it directly.
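Once Presto is up, we can run a cross-table `JOIN` that Pinot alone cannot execute. A hypothetical example (the `orders` and `users` tables and their columns are made up for illustration; the `pinot` catalog name depends on the connector configuration baked into the image):

```sql
-- Join two Pinot tables through Presto's Pinot connector
SELECT o.userId,
       u.country,
       SUM(o.amount) AS total_amount
FROM pinot.default.orders AS o
JOIN pinot.default.users  AS u
  ON o.userId = u.userId
GROUP BY o.userId, u.country;
```

Presto fetches the rows from Pinot and performs the join itself, so heavy joins cost coordinator/worker memory rather than Pinot resources.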
Conclusion
After cloning the GitHub repository, we can run `docker-compose up -d` directly to get the experimental environment up and running.
The whole experimental environment contains three key components.
- Apache Pinot: the streaming database itself.
- Prometheus and Grafana: the monitoring system that emulates the production environment.
- Apache Presto: provides query capabilities, such as `JOIN`, that Pinot lacks.
By the way, the official document does not explain how to use Kafka with security enabled; it only gives an example with unsecured Kafka. In the case of Confluent, Kafka has `SASL_SSL` built in, and I referred to this article to set up Kafka's account and password:
https://dev.startree.ai/docs/pinot/recipes/kafka-sasl
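From what I gathered there, the credentials go into the `streamConfigs` section of the Pinot table config. A rough sketch, with a placeholder topic, broker address, and credentials (check the linked recipe for the exact property keys for your setup):

```json
"streamConfigs": {
  "streamType": "kafka",
  "stream.kafka.topic.name": "events",
  "stream.kafka.broker.list": "kafka:9093",
  "security.protocol": "SASL_SSL",
  "sasl.mechanism": "PLAIN",
  "sasl.jaas.config": "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"myuser\" password=\"mypassword\";"
}
```

The SASL properties are passed through to the underlying Kafka consumer, so they follow standard Kafka client naming.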
This experimental environment may be expanded for some of my needs, such as adding Presto workers.