Monitoring Solr (3 Part Series)
Open source software adoption continues to grow. Tools like Kafka and Solr are widely used in small startups, ones that are using cloud ready tools from the start, but also in large enterprises, where legacy software is getting faster by incorporating new tools. In this second part of our Solr monitoring series (see the first part discussing Solr metrics to monitor), we will explore some of the open source tools available to monitor Solr nodes and clusters. We'll take the opportunity to look into what it takes to install, configure and use each tool in a meaningful way.
Operating, managing and maintaining distributed systems is not easy. As we explored in the first part of our Solr monitoring series, there are more than forty metrics that we need to track to have full visibility into our Solr instances and the cluster as a whole. Without a monitoring tool, it is close to impossible to keep all the necessary pieces in view, to be sure that the cluster is healthy, or to react properly when things go wrong.
When searching for an open source tool to help you track Solr metrics, look at the following qualities:
- The ability to monitor and manage multiple clusters
- An easy, at-a-glance overview of the whole cluster and its state
- Clear information about the crucial performance metrics
- Ability to provide historical metrics for post mortem analysis
- The ability to combine low-level OS metrics, JVM metrics, and Solr-specific metrics
- Ability to set up alerts
Let's now explore some of the available options.
Prometheus is an open-source monitoring and alerting system that was originally developed at SoundCloud. It is now a standalone open source project, maintained independently of the company that created it. In 2016, Prometheus joined the Cloud Native Computing Foundation as its second hosted project, right after Kubernetes.
Out of the box, Prometheus supports a flexible query language on top of a multi-dimensional data model backed by a time series database (TSDB), where the data is pulled using an HTTP-based protocol.
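To make the query language a little more concrete, here is a minimal sketch of how a PromQL expression can be sent to Prometheus over HTTP. It only builds the request URL for the instant-query endpoint of the Prometheus HTTP API. The base URL (Prometheus' default port 9090 on localhost) and the job label value solr are assumptions for illustration, not something this setup guarantees.

```python
from urllib.parse import urlencode


def build_instant_query_url(prometheus_base_url: str, promql: str) -> str:
    """Return the URL of an instant query against the Prometheus HTTP API."""
    # /api/v1/query evaluates a PromQL expression at a single point in time;
    # the expression is passed URL-encoded in the `query` parameter.
    query = urlencode({"query": promql})
    return f"{prometheus_base_url.rstrip('/')}/api/v1/query?{query}"


if __name__ == "__main__":
    # `up` is recorded by Prometheus for every scrape target.
    # The base URL and the job label value are assumptions.
    print(build_instant_query_url("http://localhost:9090", 'up{job="solr"}'))
```

Fetching that URL with any HTTP client returns a JSON document with the matching series.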
For Solr to be able to ship metrics to Prometheus we will use a tool called Exporter. It takes the metrics from Solr and translates them into a format that is understandable by Prometheus itself. The Solr Exporter is not only able to ship metrics to Prometheus, but also responses for requests like Collections API commands, ping requests and facets gathered from search results.
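To give an idea of what "a format that is understandable by Prometheus" means, the sketch below parses a few lines of the Prometheus text exposition format, which is what an exporter serves over HTTP. The metric names in the sample are hypothetical and chosen only for illustration. The parser is deliberately simplified: it skips comment lines and does not handle optional timestamps.

```python
from typing import Dict

# Hypothetical sample; real Solr Exporter metric names will differ.
SAMPLE = """\
# HELP example_solr_ping Hypothetical metric, for illustration only
# TYPE example_solr_ping gauge
example_solr_ping{core="demo"} 1.0
example_solr_requests_total{core="demo",handler="/select"} 42
"""


def parse_exposition(text: str) -> Dict[str, float]:
    """Parse simple text-format samples into a {series: value} mapping."""
    samples: Dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # In this simplified form the value follows the last space.
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


if __name__ == "__main__":
    for series, value in parse_exposition(SAMPLE).items():
        print(series, value)
```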
The Prometheus Solr Exporter is shipped with Solr as a contrib module located in the contrib/prometheus-exporter directory. To start working with it we need the solr-exporter-config.xml file that is located in the contrib/prometheus-exporter/conf directory. It is already pre-configured to work with Solr and we will not modify it now. However, if you want to collect additional metrics, ship additional facet results or send less data to Prometheus, this is the file to look at and modify.
Once we have the exporter configured we need to start it. It is very simple. Just go to the contrib/prometheus-exporter directory (or the one where you copied it in your production system) and run the appropriate command, depending on the architecture of Solr you are running.
For Solr master-slave deployments you would run:
./bin/solr-exporter -p 9854 -b http://localhost:8983/solr -f ./conf/solr-exporter-config.xml -n 8
For SolrCloud you would run:
./bin/solr-exporter -p 9854 -z localhost:2181/solr -f ./conf/solr-exporter-config.xml -n 16
The above commands run the Solr Exporter on port 9854, with 8 threads for Solr master-slave and 16 for SolrCloud. In the case of SolrCloud we also point the exporter to the Zookeeper ensemble that is accessible on port 2181 on localhost. Of course, you should adjust the commands to match your environment.
After the command runs successfully you should see output similar to the following:
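If you start the exporter from a deployment script, the difference between the two commands can be captured in a small helper. This is only a sketch: it uses the same flags as the commands above (-p, -b, -z, -f, -n), while the function name and its parameters are our own invention.

```python
from typing import List, Optional


def exporter_command(port: int, threads: int, config: str,
                     base_url: Optional[str] = None,
                     zk_host: Optional[str] = None) -> List[str]:
    """Build the solr-exporter argument list for one deployment type."""
    # Master-slave uses -b (Solr base URL), SolrCloud uses -z (Zookeeper).
    if (base_url is None) == (zk_host is None):
        raise ValueError("pass exactly one of base_url or zk_host")
    target = ["-b", base_url] if base_url is not None else ["-z", zk_host]
    return ["./bin/solr-exporter", "-p", str(port), *target,
            "-f", config, "-n", str(threads)]


if __name__ == "__main__":
    config = "./conf/solr-exporter-config.xml"
    print(" ".join(exporter_command(9854, 8, config,
                                    base_url="http://localhost:8983/solr")))
    print(" ".join(exporter_command(9854, 16, config,
                                    zk_host="localhost:2181/solr")))
```

The returned list can be passed to subprocess.run from the contrib/prometheus-exporter directory.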
INFO - 2019-04-29 16:36:21.476; org.apache.solr.prometheus.exporter.SolrExporter; Start server
We have Solr master-slave/SolrCloud running and we have our Solr Exporter running, which means we are ready to take the next step and configure our Prometheus instance to fetch data from our Solr Exporter. To do that we need to adjust the prometheus.yml file and add the following:
scrape_configs:
  - job_name: 'solr'
    static_configs:
      - targets: ['localhost:9854']
Of course, in a production system Prometheus will run on a different host than Solr and the Solr Exporter, and we can even run multiple exporters. That means that we will need to adjust the targets property to match our environment.
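When there are several exporters, the targets list is the only part of the configuration that changes. The sketch below renders the same scrape_configs section for any number of targets. The host names in the usage example are hypothetical placeholders.

```python
from typing import Iterable


def render_scrape_config(job_name: str, targets: Iterable[str]) -> str:
    """Render a scrape_configs section with one static job."""
    # Each target is a host:port pair on which a Solr Exporter listens.
    quoted = ", ".join(f"'{target}'" for target in targets)
    return (
        "scrape_configs:\n"
        f"  - job_name: '{job_name}'\n"
        "    static_configs:\n"
        f"      - targets: [{quoted}]\n"
    )


if __name__ == "__main__":
    # Hypothetical exporter hosts; replace with your own.
    print(render_scrape_config(
        "solr", ["solr-exporter-1:9854", "solr-exporter-2:9854"]))
```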
After all the preparations we can finally look into what Prometheus gives us. We can start with the main Prometheus UI.
It allows us to choose the metrics that we are interested in, graph them, alert on them and so on. The beautiful thing about it is that the UI supports the full Prometheus Query Language, allowing the use of operators, functions, subqueries and much more.
When using the visualization functionality of Prometheus we get the full view of the available metrics by using a simple dropdown menu, so we don't need to be aware of each and every metric that is shipped from Solr.
The nice thing about Prometheus is that we are not limited to the default UI; we can also use Grafana for dashboarding, alerting and team management. Defining a new Prometheus data source in Grafana is very simple.
Once that is done we can start visualizing the data.
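The same data that the UI graphs is also available as JSON from the Prometheus HTTP API. As a rough sketch, the function below reads the response of an instant query and maps each target instance to its current value. The sample response is hand-written to follow the documented response shape; it is not captured from a real server.

```python
import json
from typing import Dict

# Hand-written sample in the shape of an instant-query response.
SAMPLE_RESPONSE = """
{"status": "success",
 "data": {"resultType": "vector",
          "result": [{"metric": {"__name__": "up",
                                 "instance": "localhost:9854",
                                 "job": "solr"},
                      "value": [1556548581.476, "1"]}]}}
"""


def vector_values(response_body: str) -> Dict[str, float]:
    """Map the `instance` label of each series to its sample value."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise ValueError("Prometheus reported a failed query")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError("expected an instant vector result")
    values: Dict[str, float] = {}
    for series in data["result"]:
        # Each sample is a [timestamp, "value-as-string"] pair.
        _timestamp, raw_value = series["value"]
        values[series["metric"].get("instance", "")] = float(raw_value)
    return values


if __name__ == "__main__":
    print(vector_values(SAMPLE_RESPONSE))
```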
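The data source can also be defined without clicking through the UI, by sending a JSON document to Grafana's data source HTTP API. The sketch below only builds that document. The field names follow the Grafana API as we understand it, so treat them as an assumption and check them against the documentation of your Grafana version; the data source name and the Prometheus URL are placeholders.

```python
import json


def prometheus_datasource_payload(name: str, prometheus_url: str) -> str:
    """Build the JSON body describing a Prometheus data source."""
    payload = {
        "name": name,
        "type": "prometheus",
        "url": prometheus_url,
        # "proxy" means the Grafana backend issues the queries, so the
        # browser does not need direct access to Prometheus.
        "access": "proxy",
    }
    return json.dumps(payload, indent=2, sort_keys=True)


if __name__ == "__main__":
    # Placeholder name and URL; adjust to your environment.
    print(prometheus_datasource_payload("Solr Prometheus",
                                        "http://localhost:9090"))
```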
However, all of that requires us to build rich dashboards ourselves. Luckily Solr comes with an example pre-built Grafana dashboard that can be used along with the metrics scraped by Prometheus. The example dashboard definition is stored in the contrib/prometheus-exporter/conf/grafana-solr-dashboard.json file and can be loaded into Grafana, giving a basic view over our Solr cluster.
Of course, dashboards with metrics are not everything that Grafana is capable of. We are able to set up teams and users, assign roles to them, set up alerts on the metrics and include multiple data sources within a single installation of Grafana. This allows us to have everything in one place: metrics from multiple sources, logs, signals, tracing and whatever else we need.
Graphite is free, open-source monitoring software that can monitor and graph numeric time-series data. It can collect, store and display data in real time, allowing for fine-grained metrics monitoring. It is composed of three main parts: Carbon, the daemon listening for time-series data; Whisper, the database for storing time-series data; and the Graphite web app, which is used for on-demand metrics rendering.
To start monitoring Solr with Graphite as the platform of choice we assume that you already have Graphite up and running. If you don't, you can start one using the provided Docker container:
docker run -d --name graphite --restart=always -p 80:80 -p 2003-2004:2003-2004 -p 2023-2024:2023-2024 -p 8125:8125/udp -p 8126:8126 graphiteapp/graphite-statsd
To be able to get the data from Solr we will use Solr metrics registry along with the Graphite reporter. To configure that we need to adjust the solr.xml file and add the metrics part to it. For example, to monitor information about the JVM and the Solr node the metrics section would look as follows:
<metrics>
  <reporter name="graphite" group="node, jvm" class="org.apache.solr.metrics.reporters.SolrGraphiteReporter">
    <str name="host">localhost</str>
    <int name="port">2003</int>
    <int name="period">60</int>
  </reporter>
</metrics>
So we pointed Solr to the Graphite server that is running on localhost on port 2003 and we set the reporting period to 60, which means that Solr will push the JVM and Solr node metrics once every 60 seconds.
Keep in mind that by default Solr will write using the plain text protocol. This is less efficient than using the pickled protocol. If you would like to configure Solr and Graphite in production we suggest setting the pickled property to true in the reporter configuration and using the port for the pickled protocol, which in the case of our Docker container would be 2004.
With our container, we can now navigate to our Graphite server, available at 127.0.0.1 on port 80, and graph our data.
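To see why the two protocols differ, here is a small sketch of both wire formats as Carbon accepts them. The plain text protocol sends one "path value timestamp" line per metric, while the pickle protocol sends a whole batch as a pickled list prefixed with a 4-byte length header. The metric path is a hypothetical example, and this is an illustration of the formats rather than what Solr's reporter does internally.

```python
import pickle
import struct
from typing import List, Tuple

Metric = Tuple[str, float, int]  # (path, value, unix timestamp)


def encode_plaintext(metrics: List[Metric]) -> bytes:
    """Plain text protocol: one 'path value timestamp' line per metric."""
    lines = [f"{path} {value} {timestamp}\n"
             for path, value, timestamp in metrics]
    return "".join(lines).encode("ascii")


def encode_pickle(metrics: List[Metric]) -> bytes:
    """Pickle protocol: length-prefixed list of (path, (timestamp, value))."""
    batch = [(path, (timestamp, value)) for path, value, timestamp in metrics]
    payload = pickle.dumps(batch, protocol=2)
    # The header is the payload length as a 4-byte big-endian unsigned int.
    return struct.pack("!L", len(payload)) + payload


if __name__ == "__main__":
    # Hypothetical metric path, for illustration only.
    sample = [("example.solr.jvm.heap", 1.5, 1556548581)]
    print(encode_plaintext(sample))
    print(len(encode_pickle(sample)), "bytes in the pickled message")
```

The plain text bytes would go to port 2003 and the pickled message to port 2004 of the container above.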
All the metrics are sorted out and easily accessible in the left menu allowing for rich dashboarding capabilities.
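Besides the web UI, Graphite exposes the same data through its render API, which is handy for scripts and quick checks. The sketch below builds a render URL that asks for the last hour of one series as JSON. The metric path is hypothetical; the actual paths depend on how Solr names the metrics it reports.

```python
from urllib.parse import urlencode


def render_url(graphite_base_url: str, target: str,
               time_from: str = "-1h") -> str:
    """Build a Graphite render API URL that returns JSON datapoints."""
    # `target` is the metric path (or a function over paths) and
    # `from` is a relative or absolute start time.
    query = urlencode({"target": target, "from": time_from, "format": "json"})
    return f"{graphite_base_url.rstrip('/')}/render?{query}"


if __name__ == "__main__":
    # Hypothetical metric path, for illustration only.
    print(render_url("http://127.0.0.1", "example.solr.jvm.heap"))
```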
If you are using Grafana it is easy to set up Graphite as yet another data source and use its graphing and dashboarding capabilities to correlate multiple metrics, even ones that come from different data sources.
To do that, we need to configure Graphite as a data source in Grafana. It is as easy as providing the proper Graphite URL and setting the version.
And we are ready to create our visualizations and dashboards, which is very easy and powerful. With the autocomplete available for metrics we don't need to recall any of the names; Grafana will just show them for us.
Ganglia is a scalable distributed monitoring system. It is based on a hierarchical design targeted at a large number of clusters and nodes. It uses XML for data representation, XDR for data transport and RRD for data storage and visualization. It has been used to connect clusters across university campuses and is proven to handle clusters with 2000 nodes.
To start monitoring Solr master-slave or SolrCloud clusters with Ganglia we will start with setting up the metrics reporter in the solr.xml configuration file. To do that we add the following section to the mentioned file:
<metrics>
  <reporter name="ganglia" group="node, jvm" class="org.apache.solr.metrics.reporters.SolrGangliaReporter">
    <str name="host">localhost</str>
    <int name="port">8649</int>
  </reporter>
</metrics>
The next thing that we need to do is allow Solr to understand the XDR protocol used for data transport. We need to download the oncrpc-1.0.7.jar file and either place it on the Solr classpath or include the path to it in the solrconfig.xml file using the lib directive.
Once all of the above is done, and assuming Ganglia is running on localhost on port 8649, we have everything ready to start shipping Solr node and JVM metrics.
By visiting Ganglia and choosing the Solr node we can start looking into the metrics.
We can jump to the graphs right away, choose which group of metrics we are interested in and see most of the data that we need.
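For readers who have not met XDR before, the sketch below shows the basic encoding rules it applies to two common types, following the XDR standard (RFC 4506): integers are 4 bytes in big-endian order, and strings are a 4-byte length followed by the bytes, padded with zeros to a multiple of four. This illustrates the encoding only; it does not reproduce the layout of Ganglia's metric packets.

```python
import struct


def xdr_uint(value: int) -> bytes:
    """Encode an unsigned integer as 4 big-endian bytes."""
    return struct.pack("!I", value)


def xdr_string(text: str) -> bytes:
    """Encode a string as length + bytes, zero-padded to 4-byte alignment."""
    data = text.encode("ascii")
    padding = (4 - len(data) % 4) % 4
    return xdr_uint(len(data)) + data + b"\x00" * padding


if __name__ == "__main__":
    print(xdr_uint(8649))
    print(xdr_string("jvm"))
```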
Ganglia provides us with all the visibility for our metrics, but out of the box, it doesn't support one of the crucial features that we are looking for - alerting. There is a project called ganglia-alert, which is a user contributed extension to Ganglia.
As you can see, there is a wide variety of tools that help you monitor Solr. What you have to keep in mind is that each requires setup, configuration and manual dashboard building in order to get meaningful information. All of that may require deep knowledge across the whole ecosystem.
If you are looking for a Solr monitoring tool that you can set up in minutes and that has pre-built dashboards with all the necessary information, alerting and team management, take a look at the third part of the Solr monitoring series to learn more about production ready Solr monitoring with Sematext.