Varnish is an amazing reverse proxy, and it does its
job very well. But how can we know that? Or, even better,
how can we know when it has detected problems in the backend?
Currently I monitor these servers using:
Hit ratio – the official Varnish Nagios plugin, which is very basic: it just returns the hit ratio, which is useful, but it does not output perfdata, so there are no graphs for it.
Connections – a simple bash script that checks how many connections
are established. As simple as this:
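# Count established connections on port 81
result=$(netstat -anp | grep :81 | grep -c EST)
# "Conexoes Totais" = total connections; text after the pipe is perfdata
echo "Conexoes Totais = $result |Conexoes=$result"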
Here is the script
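The snippet as shown only prints the count, so on its own it gives a graph but no alert. A minimal sketch of how the same count could be turned into a Nagios-style status with thresholds follows. The function name check_connections and the default thresholds (500 and 1000) are my assumptions for illustration, not values from the original script.

```shell
#!/bin/bash
# Sketch: turn a connection count into a Nagios-style status line.
# check_connections and its default thresholds are hypothetical; tune them.

check_connections() {
    local count=$1 warn=${2:-500} crit=${3:-1000}
    local status="OK" code=0
    if [ "$count" -ge "$crit" ]; then
        status="CRITICAL"; code=2
    elif [ "$count" -ge "$warn" ]; then
        status="WARNING"; code=1
    fi
    # Text before the pipe is shown in the UI; text after it is perfdata
    # in the form label=value;warn;crit.
    echo "$status - Conexoes Totais = $count |Conexoes=$count;$warn;$crit"
    return "$code"
}

# Usage on the Varnish host, with the same pipeline as the snippet above:
#   check_connections "$(netstat -anp | grep :81 | grep -c EST)" 500 1000
#   exit $?
```

The exit code is what Nagios/Centreon uses to decide the state (0 OK, 1 WARNING, 2 CRITICAL), so the wrapper script should end with exit $? right after the call.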
Load/Disk/Memory – Standard Nagios/Centreon scripts
Backend Errors – this one was a bit more complicated to develop, and I find it very useful because it counts how many errors came from the backend (503 and so on). It runs every 5 minutes: the script generates a tmp file, and each time it runs it zeroes the tmp file and restarts logging the “Service Unavailable” errors reported by “varnishlog -I Unavailable”.
Download check_iis script
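The count-and-reset idea described above can be sketched roughly like this. This is not the downloadable script, just an illustration under assumptions: the log path, the function names count_and_reset and report_errors, and the critical threshold of 1 are all hypothetical, and a background logger is assumed to be appending the varnishlog output to the file.

```shell
#!/bin/bash
# Hypothetical path. A background logger is assumed to APPEND to it, e.g.:
#   varnishlog -I Unavailable >> /tmp/varnish_unavailable.log &
# Appending (>>) matters: the file is truncated on every check.
LOG_FILE=${LOG_FILE:-/tmp/varnish_unavailable.log}

# Print how many "Unavailable" lines were logged since the last run,
# then empty the file so the next run only sees new errors.
count_and_reset() {
    local file=$1 count=0
    if [ -f "$file" ]; then
        # grep -c prints 0 but exits 1 when nothing matches, hence "|| true"
        count=$(grep -c "Unavailable" "$file" || true)
    fi
    : > "$file"
    echo "$count"
}

# Turn the count into a status line with perfdata (threshold is an assumption).
report_errors() {
    local count=$1 crit=${2:-1}
    if [ "$count" -ge "$crit" ]; then
        echo "CRITICAL - $count backend errors |errors=$count"
        return 2
    fi
    echo "OK - $count backend errors |errors=$count"
    return 0
}

# Usage from the check, every 5 minutes:
#   report_errors "$(count_and_reset "$LOG_FILE")"; exit $?
```

Lines written between the count and the truncate are lost, which is usually acceptable for a 5 minute trend graph but worth knowing about.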
Backend Response – checks how well the backend is answering requests: it measures the time the backend takes to reply and also creates a graph. For it to work, Varnish must be using the probe option of the backend. It is based on the command:
# varnishadm debug.health |grep Sick
Download check_bend script
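As a rough illustration of how the output of that command could feed a check, the sketch below counts the lines that mention Sick, exactly as the grep above does, and maps the result to a status. The function name check_backend_health is hypothetical, and the sketch only covers the sick/healthy part, not the response time graph that the real script produces.

```shell
#!/bin/bash
# Sketch: read health output on stdin and alert if any line mentions "Sick".
# check_backend_health is a hypothetical name, not part of Varnish.

check_backend_health() {
    local sick
    # grep -c prints 0 but exits 1 when nothing matches, hence "|| true"
    sick=$(grep -c "Sick" || true)
    if [ "$sick" -gt 0 ]; then
        echo "CRITICAL - $sick backend(s) sick |sick=$sick"
        return 2
    fi
    echo "OK - no sick backends |sick=0"
    return 0
}

# Usage on the Varnish host:
#   varnishadm debug.health | check_backend_health
#   exit $?
```

Reading from stdin keeps the parsing separate from the varnishadm call, so the logic can be tried against saved output without touching a live server.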
Links – this is a quite amazing script written in Perl by my coworker Nereu. It checks each link dynamically (a kind of spider), verifying each HTTP response code; if any 404 or 500 is detected it returns an error. It also counts how many links there are and graphs how long it takes to go through all of them.
Download check_url script
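The real script is Perl and crawls the links itself; the sketch below is only a simplified shell illustration of the reporting step, not Nereu's code. It assumes the status codes have already been collected as "code url" lines, and the function name summarize_links and the perfdata labels are hypothetical.

```shell
#!/bin/bash
# Sketch: read "HTTP_CODE URL" lines on stdin, count them, and fail on 404/500.
# summarize_links and the perfdata labels are hypothetical.

summarize_links() {
    local total=0 bad=0 code url
    while read -r code url; do
        if [ -z "$code" ]; then
            continue   # skip blank lines
        fi
        total=$((total + 1))
        case "$code" in
            404|500)
                bad=$((bad + 1))
                echo "BAD $code $url" >&2   # details go to stderr
                ;;
        esac
    done
    if [ "$bad" -gt 0 ]; then
        echo "CRITICAL - $bad of $total links failed |links=$total bad=$bad"
        return 2
    fi
    echo "OK - $total links checked |links=$total bad=0"
    return 0
}

# Usage, assuming a file urls.txt with one URL per line:
#   while read -r u; do
#       curl -s -o /dev/null -w "%{http_code} $u\n" "$u"
#   done < urls.txt | summarize_links
```

Unlike the spider, this does not discover links or time the run; it only shows how the response codes could be reduced to a single status plus a link count for graphing.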
This is what my Centreon/Nagios Varnish monitor looks like: