Better than NDO?

MK Livestatus

Required version: 1.1.0
February 02. 2010

How to access Nagios status data

Accessing status data today

The classical way of accessing the current status of your hosts and services is by reading and parsing the file status.dat, which Nagios rewrites at regular intervals. The update interval is configured via status_update_interval in nagios.cfg. A typical value is 10 seconds. As your installation grows, you might have to increase this value in order to limit CPU usage and disk I/O. The Nagios web interface uses status.dat for displaying its data.
Parsing status.dat is not very popular amongst developers of addons, so many use another approach: NDO. This is a NEB module that is loaded directly into the Nagios process and sends out all status updates via a UNIX socket to a helper process. The helper process creates SQL statements and updates various tables in a MySQL or PostgreSQL database. This approach has several advantages over status.dat:
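To make the parsing step concrete, here is a minimal Python sketch of a status.dat reader. It assumes the usual layout of blocks such as "hoststatus { ... }" containing key=value lines; the sample text and the function name parse_status_dat are made up for illustration and are not taken from Nagios itself.

```python
def parse_status_dat(text):
    """Parse status.dat-style text into a list of (block_type, fields) tuples."""
    blocks = []
    current_type = None
    fields = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if current_type is None:
            # A block opens with a line such as "hoststatus {".
            if line.endswith("{"):
                current_type = line[:-1].strip()
                fields = {}
        elif line == "}":
            # A lone closing brace ends the current block.
            blocks.append((current_type, fields))
            current_type = None
        else:
            # Split on the first "=" only; values may contain "=" themselves.
            key, sep, value = line.partition("=")
            if sep:
                fields[key] = value
    return blocks


# Hypothetical sample data, shaped like a real status.dat excerpt.
SAMPLE = """\
hoststatus {
    host_name=web01
    current_state=0
    }

servicestatus {
    host_name=web01
    service_description=HTTP
    current_state=2
    plugin_output=Connection refused
    }
"""

if __name__ == "__main__":
    for block_type, fields in parse_status_dat(SAMPLE):
        print(block_type, fields.get("host_name"), fields.get("current_state"))
```

Even this small reader has to scan the whole file to find a single host, which is part of why the approach gets expensive on large installations.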

  • The data is updated immediately, not only every 10 or 20 seconds.
  • Applications have easy access to the data via SQL. No parser for status.dat is needed.
  • In large installations, addons can access the data faster than by reading status.dat.

Unfortunately, NDO also has some severe shortcomings:

  • It has a complex setup.
  • It needs a (rapidly growing) database to be administered.
  • It eats up a significant portion of your CPU resources, just in order to keep the database up to date.
  • Regular housekeeping of the database can hang your Nagios for minutes, or even an hour, once a day.

The Future

Since version 1.1.0, Check_MK offers a completely new approach for accessing status and also historic data: Livestatus. Just like NDO, Livestatus makes use of the Nagios Event Broker API and loads a binary module into your Nagios process. But unlike NDO, Livestatus does not actively write out data. Instead, it opens a socket through which data can be retrieved on demand.
The socket allows you to send a request for hosts, services or other pieces of data and get an immediate answer. The data is read directly from Nagios’ internal data structures. Livestatus does not create its own copy of that data. Beginning with version 1.1.2 you are also able to retrieve historic data from the Nagios log files via Livestatus.
This is not only a stunningly simple approach, but also an extremely fast one. Some advantages are:

  • Unlike NDO, Livestatus imposes no measurable burden on your CPU. Only processing queries needs a very small amount of CPU, and even that does not block Nagios.
  • Livestatus produces zero disk I/O when querying status data.
  • Accessing the data is much faster than parsing status.dat or querying an SQL database.
  • No configuration is needed. No database is needed. No administration is necessary.
  • Livestatus scales fairly well to large installations, even beyond 50,000 services.
  • Livestatus gives you access to Nagios-specific data not available through any other status access method, for example whether a host is currently in its notification period.
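The on-demand retrieval described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the socket path below is a guess that depends on how Livestatus was installed, and the helper names exchange and query_livestatus are invented for this example.

```python
import socket

# Assumption: the real path depends on your installation and module options.
SOCKET_PATH = "/var/lib/nagios/rw/live"


def exchange(sock, query):
    """Send one query over an already connected socket and return the reply text."""
    if not query.endswith("\n"):
        query += "\n"
    sock.sendall(query.encode("utf-8"))
    # Closing the write side signals that the request is complete.
    sock.shutdown(socket.SHUT_WR)
    chunks = []
    while True:
        data = sock.recv(4096)
        if not data:  # the peer closed the connection: reply is complete
            break
        chunks.append(data)
    return b"".join(chunks).decode("utf-8")


def query_livestatus(query, path=SOCKET_PATH):
    """Connect to the Livestatus UNIX socket at `path` and run a single query."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(path)
        return exchange(sock, query)


# Example call (needs a running Nagios with Livestatus loaded):
#   print(query_livestatus("GET hosts\nColumns: name state\n"))
```

Nothing is written to disk or to a database here. Each call opens the socket, asks one question and reads the answer.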

At the same time, Livestatus provides its own query language that is simple to understand, offers most of the flexibility of SQL and even more in some cases. Its protocol is fast, lightweight and does not need a binary client. You can even get access from the shell without any helper software.
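As a rough illustration of that query language, the following Python sketch builds a query with "Columns:" and "Filter:" header lines and splits a reply into rows. The helper names build_query and parse_reply are hypothetical, the sample reply is made up, and the parser assumes the default output of one row per line with fields separated by semicolons.

```python
def build_query(table, columns, filters=()):
    """Build a Livestatus query: one GET line followed by header lines."""
    lines = ["GET " + table, "Columns: " + " ".join(columns)]
    lines += ["Filter: " + condition for condition in filters]
    return "\n".join(lines) + "\n"


def parse_reply(reply, columns):
    """Turn a semicolon-separated reply into a list of dicts keyed by column name.

    Assumption: no field contains a semicolon or a newline.
    """
    rows = []
    for line in reply.splitlines():
        if line:
            rows.append(dict(zip(columns, line.split(";"))))
    return rows


if __name__ == "__main__":
    columns = ["host_name", "description", "state"]
    # Ask for all services that are currently in state 2 (CRITICAL).
    print(build_query("services", columns, ["state = 2"]))
    # Hypothetical reply text, as it might come back from the socket.
    print(parse_reply("web01;HTTP;2\n", columns))
```

The query is plain text, which is why no binary client is needed: anything that can write these lines to the socket can talk to Livestatus.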

Livestatus via xinetd

Using xinetd and unixcat you can bind the socket of Livestatus to a TCP socket. Here is an example configuration for xinetd:
I have not tested it yet, but it sounds like a pretty amazing alternative:
http://mathias-kettner.de/checkmk_livestatus.html


1 thought on “Nagios MK livestatus”

  1. Hello Felipe,
    I saw your post about Livestatus today.
    I am starting to use it as a replacement for NDO and it is working well.
    I run a NOC at a federal agency. I currently have 16 Nagios servers spread across 14 states, with 934 hosts and more than 2500 services. I use NagVis to display them on a central map, and I had been feeding it from an SQL database. I started running into problems with database growth, difficult administration, and heavy CPU and memory usage on the SQL server. Another problem was that the central map was becoming slower and slower to refresh because of the large number of queries needed to update the overall status. When I saw the LIVESTATUS backend option in the new NagVis, I saw it as the solution to this problem. For now I am using Livestatus on a few servers to compare its performance against NDO. I am leaving my contact details so we can talk more about this and share new ideas.
    Regards
