Monitoring Apache Solr

Apache Solr is an open source enterprise search service from the Lucene project. Solr is written in Java and runs as a standalone full-text search server within a servlet container such as Tomcat. Like any service or component in your architecture, you’ll want to monitor it to ensure that it’s available, and to gather performance data to help with tuning.

In this post, we’ll look at how we can monitor Solr, what performance metrics we might want to gather and how we can easily achieve this with Opsview.

We’ll use Opsview as it is built on Nagios and thus has access to a wide range of plugins, yet provides a more approachable user interface for configuring service checks.

A check list for service checks

Solr is built on Lucene, so it follows the same layout: an index contains documents, which are composed of fields. As part of the value it adds over Lucene as a search service, Solr provides a number of useful ways of obtaining health status and monitoring metrics:

  • Health-check status using the /admin/ping handler
  • The admin statistics page /admin/stats.jsp (XML styled with XSL)
  • JMX MBeans

The list of applicable checks could be defined by whether each is a health check or a data-gathering check, but we’d end up with a lot of overlap. Instead, I’ve divided the list into the checks that can be performed remotely (without an agent installed on the server) and those that are best performed locally on the Solr server.
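
As a minimal sketch of the first option listed above, the ping handler can be polled and its answer mapped to the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The response shape assumed here, a top-level `<str name="status">OK</str>` element in an XML body, and the example URL are assumptions to verify against your own Solr version and deployment.

```python
import sys
import urllib.request
import xml.etree.ElementTree as ET

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate_ping(body):
    """Map a ping response body (str or bytes) to an (exit_code, message) pair."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError as exc:
        return UNKNOWN, "UNKNOWN - unparseable ping response: %s" % exc
    # Assumed shape: <response> ... <str name="status">OK</str> </response>
    for node in root.findall("str"):
        if node.get("name") == "status":
            text = (node.text or "").strip()
            if text == "OK":
                return OK, "OK - Solr ping returned OK"
            return CRITICAL, "CRITICAL - Solr ping returned %r" % text
    return UNKNOWN, "UNKNOWN - no status element in ping response"


def check_ping(url, timeout=10.0):
    """Fetch the ping URL and evaluate the response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return evaluate_ping(resp.read())
    except OSError as exc:  # URLError, HTTPError and socket timeouts
        return CRITICAL, "CRITICAL - ping request failed: %s" % exc


def main(argv):
    # Hypothetical default URL; host, port and path prefix vary by deployment.
    url = argv[1] if len(argv) > 1 else "http://localhost:8983/solr/admin/ping"
    code, message = check_ping(url)
    print(message)
    return code

# A plugin wrapper would end with: sys.exit(main(sys.argv))
```

Keeping the parsing in `evaluate_ping` separate from the network call makes the status logic easy to exercise with canned responses before wiring it into a service check.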

Remote (agent-less) checks

What should we look for over the network?

First, we can have a host-level check, which may perform a network-level ping. Next, we can check TCP connectivity to the servlet container port, and then make an HTTP GET request to the Solr ‘front page’ and check for a known string (e.g. Welcome to Solr).
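
The TCP and HTTP steps might look something like the following sketch. It is not how Opsview or the stock Nagios plugins implement them; the function names are made up for illustration, and the host, port and URL you pass in depend on your own servlet container setup.

```python
import socket
import urllib.request

# Nagios plugin exit codes used by this sketch.
OK, CRITICAL = 0, 2


def check_tcp(host, port, timeout=5.0):
    """Attempt a plain TCP connect to the servlet container port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return OK, "OK - TCP connect to %s:%d succeeded" % (host, port)
    except OSError as exc:
        return CRITICAL, "CRITICAL - TCP connect to %s:%d failed: %s" % (host, port, exc)


def body_contains(body, expected):
    """Check a fetched page body for a known string."""
    if expected in body:
        return OK, "OK - found %r in response" % expected
    return CRITICAL, "CRITICAL - %r not found in response" % expected


def check_front_page(url, expected="Welcome to Solr", timeout=10.0):
    """GET the given URL and look for the expected string in the body."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
    except OSError as exc:  # URLError, HTTPError and socket timeouts
        return CRITICAL, "CRITICAL - HTTP GET %s failed: %s" % (url, exc)
    return body_contains(body, expected)
```

Running the checks in this order (ping, TCP, HTTP) means that when something fails, the first failing check points at the layer where the problem lies.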

Now we’ve made it up to the application layer, so we can start to perform Solr-specific checks. Items to monitor may include (delete as applicable):

  • Ping status
  • Number of docs
  • Number of queries / queries per second
  • Average response time
  • Number of updates
  • Cache hit ratios
  • Replication status
  • Synthetic queries
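
Several of these items (queries per second, cache hit ratios) are rates rather than raw values. Assuming the figures read from Solr are cumulative counters, a check can derive the rate from two successive samples. The sketch below shows that arithmetic; the `Sample` class and its field names are hypothetical and do not correspond to any particular Solr statistic name.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One reading of cumulative counters (hypothetical field names)."""
    timestamp: float    # seconds since the epoch
    requests: int       # total queries handled so far
    cache_lookups: int  # total cache lookups so far
    cache_hits: int     # total cache hits so far


def queries_per_second(prev, curr):
    """Average query rate between two samples, or None if it cannot be computed."""
    elapsed = curr.timestamp - prev.timestamp
    delta = curr.requests - prev.requests
    if elapsed <= 0 or delta < 0:
        return None  # clock problem or counter reset (e.g. a restart)
    return delta / elapsed


def cache_hit_ratio(prev, curr):
    """Fraction of lookups served from cache between two samples, or None."""
    lookups = curr.cache_lookups - prev.cache_lookups
    hits = curr.cache_hits - prev.cache_hits
    if lookups <= 0 or hits < 0:
        return None  # no lookups in the interval, or counters were reset
    return hits / lookups
```

Returning `None` when a counter goes backwards avoids reporting a nonsensical negative rate after a restart; a plugin could map that case to an UNKNOWN status.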

Agent-based checks

Installing an Opsview agent on the Solr server means we can run additional checks over NRPE (Nagios Remote Plugin Executor). These could be operating-system-level checks such as memory/disk utilisation or CPU load, or the following:

  • Java servlet container process is running
  • JMX checks e.g. heap memory or custom MBeans
  • File age
  • Log parsing for exceptions

For more detail on some of the non-Solr-specific checks, see my previous post on monitoring Grails (though broadly applicable to any JVM application).
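
Two of the agent-side checks listed above, file age and log parsing, can be sketched in a few lines. Which file to watch, the thresholds, and the pattern that counts as an exception are all assumptions here; the function names are made up for illustration. A production plugin would also remember how far into the log it had already read, which this sketch does not do.

```python
import os
import re
import time

# Nagios plugin exit codes used by this sketch.
OK, WARNING, CRITICAL = 0, 1, 2


def check_file_age(path, warn_seconds, crit_seconds, now=None):
    """Compare a file's modification age against warning/critical thresholds."""
    now = time.time() if now is None else now
    try:
        age = now - os.path.getmtime(path)
    except OSError as exc:
        return CRITICAL, "CRITICAL - cannot stat %s: %s" % (path, exc)
    if age >= crit_seconds:
        return CRITICAL, "CRITICAL - %s is %d seconds old" % (path, age)
    if age >= warn_seconds:
        return WARNING, "WARNING - %s is %d seconds old" % (path, age)
    return OK, "OK - %s is %d seconds old" % (path, age)


# Matches tokens ending in "Exception" or "Error" (case-sensitive),
# e.g. java.lang.OutOfMemoryError.
EXCEPTION_RE = re.compile(r"\b[\w.$]*(?:Exception|Error)\b")


def count_exceptions(lines):
    """Count the lines that mention something that looks like a Java exception."""
    return sum(1 for line in lines if EXCEPTION_RE.search(line))
```

For example, `count_exceptions(open(log_path, errors="replace"))` gives a count that can be compared against thresholds in the same way as the file age.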

The Solr wiki describes how to configure JMX support: ...