Nagios / Centreon Distributed Monitoring – Clustering
Document referral to Centreon Wiki
There are many ways to use Nagios for load balancing. The basic architecture is master/slave. With plain Nagios, the recommended approach is the NSCA daemon, which is developed by Ethan (the creator of Nagios). The way I describe here does not use NSCA; it uses centcore to keep both Nagios instances and their .cfg files updated. Distributed architecture, or load balancing, in Nagios is based on a central monitoring server (master) and one or several satellite monitors (remote).
The master server consolidates all monitoring data and offers a user interface, which also makes it possible to monitor and manage the master server and the remote monitors. The remote monitors send their check results to the master server; everything relies on NDO to keep the current status updated. The centcore script is also crucial: it keeps the .cfg configuration files on both servers aligned and updated.
This type of setup permits distribution of checks for any reason: remote locations, or simply too many checks for one server to handle. Example: we could run all network checks on Poller1 and all system checks on Poller2, and do the same for remote sites, etc.
In practice, centcore takes care of the data transfers between the different servers. The master server has to be equipped with a complete monitoring installation (Nagios, Centreon, NDOutils, MySQL, etc.), in contrast with the remote monitors that only have Nagios and NDOutils installed.
Contents
1 Setting up key authentication using SSH
2 Centreon configuration
3 Plugin duplication
4 SUDO configuration
5 Finalization
6 Some remarks/tips/advice
7 Troubleshooting
NOTE:
This setup was tested with Nagios 3.0.6, Centreon 2.0.2 and NDOutils 1.4b7 under Debian 4 and Ubuntu 8.04. On the remote it is enough to have just Nagios and NDO set up. You can also install another Centreon there, but there is no real need for it.
1. Setting up key authentication using SSH
On the master server, generate a key pair using ssh-keygen as the nagios user. Accept all defaults and leave the passphrase empty.
# su nagios
# ssh-keygen
> Enter file in which to save the key (/usr/local/nagios/.ssh/id_rsa):
> Created directory '/usr/local/nagios/.ssh'.
> Enter passphrase (empty for no passphrase):
> Enter same passphrase again:
> Your identification has been saved in /usr/local/nagios/.ssh/id_rsa.
Transfer the public key to the remote monitor for the Nagios daemon owner. (Replace <hostname> with the host name or IP address of the remote monitor.)
# ssh-copy-id -i ~/.ssh/id_rsa.pub nagios@<hostname>
You should get an answer like:
Now try logging into the machine, with "ssh 'nagios@<hostname>'", and check in: .ssh/authorized_keys
It is possible that, for this to work, you have to manually allow the nagios user to log in over SSH on the remote and create the .ssh folder and the authorized_keys file yourself.
If these steps are successfully completed, you should be able to log on to the remote monitor via SSH without entering a password. Test SSH by running:
$ ssh nagios@remoteserver ls
If you get the directory listing without being asked for a password, the key works. Be sure to run this test as the nagios account, and use the same account to start and stop the centcore service.
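The manual test above can also be scripted. The following is a minimal sketch, not part of the original procedure: check_key_login is a hypothetical helper name and remoteserver is a placeholder. It relies on the OpenSSH option BatchMode=yes, which makes ssh fail instead of prompting for a password, so a working key is the only way the command can succeed.

```shell
#!/bin/sh
# Sketch: confirm that key-based login works without any prompt.
# check_key_login is a hypothetical helper; "remoteserver" is a placeholder.
check_key_login() {
    target="$1"
    # BatchMode=yes disables password prompts, ConnectTimeout avoids hanging.
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$target" true; then
        echo "key login OK for $target"
    else
        echo "key login FAILED for $target" >&2
        return 1
    fi
}

# Usage (run as the nagios user on the master):
#   check_key_login nagios@remoteserver
```

Because centcore runs as nagios, the check only means something when it is run from that account.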
An alternative way to do this is here
2. Centreon configuration
Connect to the Master Centreon interface and configure the remote monitor.
You must configure 4 things:
pollers : configure a poller for each server (2)
Configuration > Centreon > Pollers > Add
(Status:enabled, Localhost: no, IP address, etc.)
Satellite Name : give a new name, UNIQUE
Status : enabled
localhost : no
ip address : ip address of the remote server
Nagios Init Script : /etc/init.d/nagios
nagios Binary : /usr/local/nagios/bin/nagios
nagiostats Binary : /usr/local/nagios/bin/nagiostats
ndomod : configure a file for each remote server and master (2)
Next, duplicate the ndomod configuration for the new poller.
Configuration > Centreon > ndomod.cfg.
Select action “Duplicate”.
(Status: enabled, Requester: the name of the freshly created poller, IP address: the IP address of the master server, Instance name: must be unique)
Description : UNIQUE
Status : enabled
Requester : select the correct poller
Instance Name : must be UNIQUE
Interface type : tcp socket
output : IP address of the master
TCP Port : 5668
ndomod.cfg (remote)
instance_name=remote
output_type=tcpsocket
output=192.168.3.137
tcp_port=5668
output_buffer_items=5000
buffer_file=/usr/local/nagios/var/ndomod2.tmp
file_rotation_interval=14400
file_rotation_timeout=60
reconnect_interval=15
reconnect_warning_interval=900
data_processing_options=-1
config_output_options=3
ndo2db : configure a file for each remote server and master too (2)
Next, duplicate the ndo2db configuration for the new poller.
Configuration > Centreon > ndo2db.cfg.
Description : remote <UNIQUE NAME>
Status : enabled
Requester : select the correct poller
Socket Type : tcp
Socket Name :
TCP Port : 5668
GENERAL:
Select action “Duplicate”.
Status: enabled
Requester: the name of the new Remote poller
SocketType: tcp
SocketName: not needed since it's TCP
TCPPort: 5668
DATABASE:
Database Type: MySQL
DatabaseHoster:<IP of MASTER>
Databasename:nagios
Listening Port: 3306
Prefix: nagios_
User:nagios
Password: pass
ndo2db.cfg (remote)
ndo2db_user=nagios
ndo2db_group=nagios
socket_type=tcp
socket_name=/var/run/centreon/ndo.sock
tcp_port=5668
db_servertype=mysql
db_host=<IP of MASTER>
db_name=nagios
db_port=3306
db_prefix=nagios_
db_user=nagios
db_pass=passwordnagiosuser
max_timedevents_age=1440
max_systemcommands_age=1440
max_servicechecks_age=1440
max_hostchecks_age=1440
max_eventhandlers_age=1440
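Both files above have to agree on tcp_port (5668 here), otherwise ndomod never reaches ndo2db. As a rough sketch, assuming the plain key=value layout shown above, the two values can be compared from the shell. cfg_get and ports_match are hypothetical helper names, not part of Nagios or Centreon.

```shell
#!/bin/sh
# Sketch: read a key from a key=value file and compare tcp_port
# between ndomod.cfg and ndo2db.cfg. Helper names are hypothetical.

# Print the value of KEY ($2) from FILE ($1); the last occurrence wins.
cfg_get() {
    sed -n "s/^$2=//p" "$1" | tail -n 1
}

# Succeed only if both files define the same, non-empty tcp_port.
ports_match() {
    a="$(cfg_get "$1" tcp_port)"
    b="$(cfg_get "$2" tcp_port)"
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# Usage:
#   ports_match /usr/local/nagios/etc/ndomod.cfg /usr/local/nagios/etc/ndo2db.cfg \
#       && echo "tcp_port matches" || echo "tcp_port differs"
```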
nagios.cfg : configure a file for each server(2)
Next, also duplicate the nagios configuration for the new poller.
In the master Centreon interface: Configuration > Nagios > nagios.cfg
Select action “Duplicate”.
(Status: enabled, Server Nagios configured: the name of the freshly created poller)
Status : enabled
Server Nagios configured : select the correct poller
3. Plugin & CFG duplication
Copy all plugins and configuration files from the master server to the remote monitor:
# scp /usr/local/nagios/libexec/* nagios@{IP_ADDRESS}:/usr/local/nagios/libexec/
# scp /usr/local/nagios/etc/* nagios@{IP_ADDRESS}:/usr/local/nagios/etc/
Log in to each server and check that the contents of /usr/local/nagios/etc are the same.
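Comparing the two directories by eye is error-prone, so here is a small sketch that compares checksums instead. config_fingerprint is a hypothetical helper name, and the sketch assumes md5sum is available on both servers and that the configuration files end in .cfg.

```shell
#!/bin/sh
# Sketch: print a sorted checksum list of all .cfg files in a directory.
# config_fingerprint is a hypothetical helper name.
config_fingerprint() {
    # Run in a subshell so the caller's working directory is unchanged.
    ( cd "$1" && md5sum ./*.cfg | LC_ALL=C sort -k 2 )
}

# Usage (as nagios on the master; {IP_ADDRESS} is the remote monitor):
#   config_fingerprint /usr/local/nagios/etc > /tmp/master.md5
#   ssh nagios@{IP_ADDRESS} 'cd /usr/local/nagios/etc && md5sum ./*.cfg | LC_ALL=C sort -k 2' > /tmp/remote.md5
#   diff /tmp/master.md5 /tmp/remote.md5 && echo "configs match"
```

Any line reported by diff names a file that differs between master and remote.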
4. SUDO configuration
In order to allow the master server to manage the Nagios daemon on the remote monitor, sudo has to be configured. Edit /etc/sudoers and add the following lines:
nagios ALL=NOPASSWD: /etc/init.d/nagios restart
nagios ALL=NOPASSWD: /etc/init.d/nagios stop
nagios ALL=NOPASSWD: /etc/init.d/nagios start
nagios ALL=NOPASSWD: /etc/init.d/nagios reload
nagios ALL=NOPASSWD: /usr/sbin/nagiostats
nagios ALL=NOPASSWD: /usr/sbin/nagios *
nagios ALL = NOPASSWD: /usr/local/nagios/bin/ndo2db-3x *
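Using the visudo command is the preferred way to edit /etc/sudoers, since it checks the syntax before saving. sudo matches the full path of each command, so adjust the paths above to where the binaries actually live (the poller configuration earlier uses /usr/local/nagios/bin/nagios and /usr/local/nagios/bin/nagiostats).
5. Finalization
Make sure centcore is running on the master server only, under the nagios user. If it is not running, start it: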
# /etc/init.d/centcore start
Configure a host that uses the new remote poller, restart Nagios on both servers, and watch whether it works.
6. Some remarks/tips/advice
- Remote pollers are only supported from Centreon 2 beta 5 onwards.
- Nagios 3.x is required.
- NDOutils does not give much information on why things won't work, so make sure NDOutils is compiled with MySQL support; review config.log carefully. If NDO2DB is working, you should see a MySQL session for the configured user on the configured database.
- The procedures to restart, reload, … Nagios, as well as the transfer of configs to remote pollers, are called via a command file (/var/lib/centreon/centcore.cmd). Make sure both the Apache and centcore owners can create and modify this command file.
7. Troubleshooting
If you get it working on the first shot, congratulations! For me it took a whole day, but it was well worth it!
ALLOW MYSQL REMOTE ACCESS:
If NDO cannot reach the MySQL database, MySQL has to be set up to listen on the correct IP. By default MySQL listens only on 127.0.0.1, so no inbound connection is accepted.
MySQL5 – connect from Remote.
Log in to the master via SSH and edit /etc/mysql/my.cnf, setting bind-address = 192.168.X.X (the master's own LAN IP instead of 127.0.0.1). Restart MySQL afterwards so the new bind address takes effect.
# vim /etc/mysql/my.cnf
# mysql -u root -p mysql
use centreon
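GRANT ALL ON centreon.* TO nagios@'192.168.X.X' IDENTIFIED BY 'PASSWORD';
In the GRANT, 192.168.X.X is the host the connection comes from, i.e. the remote poller. The ndo2db.cfg above points at the nagios database, so NDO needs the same kind of grant on nagios.* to write its data.
Or just use phpMyAdmin to set the permissions.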
CENTCORE PROBLEMS
Check the centcore log file:
# tail /usr/local/centreon/log/centcore.log
By default debug is disabled. I recommend turning it on by editing the file:
# vim /usr/local/centreon/bin/centcore
Find every debug=0 and set it to debug=1, restart centcore (using the nagios account), and follow the log:
# tail -f /usr/local/centreon/log/centcore.log
At first my centcore script was giving me errors like:
Try `mv --help' for more information.
Use of uninitialized value in concatenation (.) or string at /usr/local/centreon/bin/centcore line 293.
I now know this was because the script queries the nagios DB for the location of the files, and mine was missing.
Go into nagios/etc on both servers and verify manually that the files are correct. The important ones are nagios.cfg, ndo2db.cfg and ndomod.cfg.
NDO PROBLEMS
First check if the daemon is running
# ps ax | grep ndo
26032 ? SNs 0:00 /usr/local/nagios/bin/ndo2db-3x -c /usr/local/nagios/etc/ndo2db.cfg
Check if the service is listening on port 5668
# netstat -an | grep :5668
tcp 0 0 0.0.0.0:5668 0.0.0.0:* LISTEN
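The netstat check above only proves that something is listening locally. From the remote poller it is also worth testing that the master can actually be reached on the NDO port (5668) and the MySQL port (3306) used in this setup. The following is a sketch under assumptions: check_master_ports is a hypothetical helper name, and it assumes a netcat (nc) that supports the -z (scan only) and -w (timeout) options.

```shell
#!/bin/sh
# Sketch: from the remote poller, test whether the master accepts TCP
# connections on the NDO (5668) and MySQL (3306) ports.
# check_master_ports is a hypothetical helper; nc must support -z and -w.
check_master_ports() {
    master="$1"
    status=0
    for port in 5668 3306; do
        if nc -z -w 3 "$master" "$port" >/dev/null 2>&1; then
            echo "$master:$port reachable"
        else
            echo "$master:$port NOT reachable"
            status=1
        fi
    done
    return "$status"
}

# Usage (on the remote poller, with the master's IP address):
#   check_master_ports 192.168.3.137
```

A port that is not reachable usually points at a firewall, or at a daemon bound only to 127.0.0.1.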
Command to startup ndo2db:
# /usr/local/nagios/bin/ndo2db-3x -c /usr/local/nagios/etc/ndo2db.cfg
Look in the nagios.log to check if the broker (aka ndo) started successfully, like:
ndomod: NDOMOD 1.4b7 (10-31-2007) Copyright (c) 2005-2007 Ethan Galstad (nagios@nagios.org)
[1249550302] ndomod: Successfully connected to data sink. 1325 queued items to flush.
[1249550304] ndomod: Successfully flushed 1325 queued items to data sink.
[1249550304] Event broker module '/usr/local/nagios/bin/ndomod-3x.o' initialized successfully.
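Looking for these lines can be scripted as well. This is a minimal sketch: ndomod_connected is a hypothetical helper name, and the log path in the usage comment is an assumption for a source install under /usr/local/nagios. It only tells you that a connection was made at some point in the log, not that it is still up.

```shell
#!/bin/sh
# Sketch: show the latest ndomod lines of a Nagios log and succeed if the
# "connected to data sink" message appears in it.
# ndomod_connected is a hypothetical helper name.
ndomod_connected() {
    # Print the most recent ndomod lines for context.
    grep 'ndomod:' "$1" | tail -n 5
    # The exit status of the function is the status of this last grep.
    grep -q 'ndomod: Successfully connected to data sink' "$1"
}

# Usage (log path is an assumption, adjust to your install):
#   ndomod_connected /usr/local/nagios/var/nagios.log || echo "ndomod never connected"
```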
It is critical that NDO works correctly. I monitor it every 5 minutes using this very cool plugin by
download
Hi Felipe,
I have been told by AkHeNaToN that we do not have to duplicate ndo2db so if you could correct your tuto for next people please =)
Thanks for the tuto anyway !
Sismon
Nice post.
It’s wonderful know that non french people are helpin
Hi Felipe,
I found a hint to your blog in #centreon as I didn’t know it before. It is an excellent resource and I would like to invite you to participate in making the centreon wiki more complete. Would you like to update the wiki with the additions and changes you posted here or allow me to re-use your posts in the wiki? Credits go where credits are due – of course 🙂
Thanks for sharing your knowledge with us.
Nikolaus
Hey Felipe,
I just noticed that this distributed monitoring isn't integrated that well with Nagios, e.g. for the parent-child system. If a host on the main Nagios has its parent on the child Nagios, it will appear down instead of unreachable. Please correct me if I'm wrong.
Hi,
Great knowledge base article you have…. 🙂
Guys, am I missing something here? Should a vanilla Nagios instance be running on the 2nd polling server, plus an ndomod process…..