Another Nagios/bash script that verifies whether a cluster has performed a failover. This one can be used with any Microsoft cluster.
#!/bin/bash
# 2008-07-23 Nereu
# edit: Felipe Ferreira 01-2010
TMPDIR=/usr/local/nagios/var
PLGDIR=/usr/local/nagios/libexec
# Nagios plugin exit codes
OK=0
WARN=1
CRIT=2
UNKN=3
if [ "$#" -ne 3 ]
then
    echo "$0 Service cluster_node1 cluster_node2"
    exit $UNKN
else
    SERVICE=$1
    NODE1=$2
    NODE2=$3
fi
# State file that remembers the node seen on the previous run
FILENAME=$(echo "$SERVICE" | tr -d '$')
LAST_FILE=${TMPDIR}/.${FILENAME}
if [ -e "$LAST_FILE" ]
then
    LAST=$(cat "$LAST_FILE")
else
    LAST=""
fi
RESULT=$("$PLGDIR/check_nt" -H "${NODE1}" -v SERVICESTATE -l "${SERVICE}" | grep running)
# An empty or unexpected reply from check_nt counts as "not running on NODE1"
if [ "$RESULT" = "All services are running" ]
then
    CURRENT=$NODE1
else
    CURRENT=$NODE2
fi
echo "$CURRENT" > "$LAST_FILE"
# OK if the node is unchanged or there is no previous state (first run)
if [ "$LAST" = "$CURRENT" ] || [ -z "$LAST" ]
then
    echo "Service in ${CURRENT}"
    exit $OK
else
    echo "Service change from ${LAST} to ${CURRENT}"
    exit $CRIT
fi
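The decision the script makes at the end, comparing the node stored on the previous run with the node found now, can be tried on its own. The following is only a minimal sketch: `failover_status` is a hypothetical helper name, and it assumes that an empty stored value means "first run" and should not raise an alert.

```shell
#!/bin/bash
# Sketch only: failover_status is a hypothetical helper, not part of the plugin.
# Nagios exit codes used by the script.
OK=0
CRIT=2

# failover_status LAST CURRENT
# Prints a status line and returns OK when the node is unchanged or when
# there is no stored node yet (assumed to mean "first run"); returns CRIT
# when the stored node differs from the current one.
failover_status() {
    local last=$1 current=$2
    if [ -z "$last" ] || [ "$last" = "$current" ]; then
        echo "Service in ${current}"
        return "$OK"
    fi
    echo "Service change from ${last} to ${current}"
    return "$CRIT"
}

# Example: the stored node was node1, the service now runs on node2.
failover_status node1 node2 || echo "exit code: $?"
```

Because the state file is rewritten on every run, a change is reported once; the next run sees the new node as the stored one and returns OK again.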
Hello,
Looks like a nice script, but it won't catch cluster resources that fail over, fail again, and come back online on the same node. Or am I wrong? I really need a script that detects failing cluster resources, even when they don't fail over.
Hello,
I had some trouble getting the correct return values from this script when the cluster is not working at all, and also when using nscp instead of the standard check_nt.
I have also added a check that the service is running on the second node in case the first one is broken. If the second node is broken as well, that case is now handled too.
#!/bin/bash
# 2008-07-23 Nereu
# edit: Felipe Ferreira 01-2010
# edit: Martin Mahnert 2014-02-12
TMPDIR=/usr/local/nagios
# PLGDIR=/usr/local/nagios/libexec
PLGDIR=/usr/lib/nagios/plugins
OK=0
WARN=1
CRIT=2
UNKN=3
if [ "$#" -ne 3 ]
then
    echo "$0 Service cluster_node1 cluster_node2"
    exit $UNKN
else
    SERVICE=$1
    NODE1=$2
    NODE2=$3
fi
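# State file that remembers the node seen on the previous run
FILENAME=$(echo "$SERVICE" | tr -d '$')
LAST_FILE=${TMPDIR}/.${FILENAME}
if [ -e "$LAST_FILE" ]
then
    LAST=`cat "$LAST_FILE"`
else
    LAST=""
fi
# Field 9 of the nt.cfg line that mentions port 12489; passed to check_nt -s below
ACCESS=`grep 12489 /etc/nagios-plugins/config/nt.cfg | awk '{print $9}'`
# Strip any double quotes around the value
PASS=${ACCESS//\"/}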
#RESULT1=`${PLGDIR}/check_nt -H ${NODE1} -v SERVICESTATE -l ${SERVICE} | grep -c OK`
RESULT1=`${PLGDIR}/check_nt -H ${NODE1} -u -p 12489 -s ${PASS} -v SERVICESTATE -l ${SERVICE} | grep -c OK`
if [ -n "${RESULT1}" ]; then ### ADDED THIS SO IT WILL NOT FAIL IN CASE NO RETURN FROM CHECK_NT
    if [ "${RESULT1}" = "1" ]
    then
        CURRENT=$NODE1
    elif [ "${RESULT1}" = "0" ]
    then
        RESULT2=`${PLGDIR}/check_nt -H ${NODE2} -u -p 12489 -s ${PASS} -v SERVICESTATE -l ${SERVICE} | grep -c OK`
        if [ -n "${RESULT2}" ]; then ### ADDED THIS SO IT WILL NOT FAIL IN CASE NO RETURN FROM CHECK_NT
            if [ "${RESULT2}" = "1" ]
            then
                CURRENT=$NODE2
            elif [ "${RESULT2}" = "0" ]
            then
                CURRENT="Not active."
            fi
        fi
    fi
fi
echo "$CURRENT" > "$LAST_FILE"
# OK only if the node is unchanged (or there is no previous state) and the service is active somewhere
if { [ "$LAST" = "$CURRENT" ] || [ -z "$LAST" ]; } && [ "$CURRENT" != "Not active." ]
then
    echo "Service active on: ${CURRENT}"
    echo "OK"
    exit $OK
else
    if [ "$CURRENT" = "Not active." ]
    then
        echo "Service active on: ${CURRENT}"
        echo "CRIT"
        exit $CRIT
    else
        echo "Service change from ${LAST} to ${CURRENT}"
        echo "CRIT"
        exit $CRIT
    fi
fi
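The two-node lookup in the script above can be tried without any Windows hosts by replacing the check_nt call with a stub. This is only a sketch under assumptions: `fake_probe`, `active_node` and the `RUNNING_ON` variable are hypothetical names introduced for illustration, and the stub simply mimics the 1/0 count that `grep -c OK` would print.

```shell
#!/bin/bash
# Sketch only: fake_probe, active_node and RUNNING_ON are hypothetical names.
# fake_probe stands in for "check_nt ... | grep -c OK": it prints 1 when the
# service is reported OK on the given host and 0 otherwise.
# RUNNING_ON is a test knob naming the host that currently owns the service.
fake_probe() {
    if [ "$1" = "$RUNNING_ON" ]; then echo 1; else echo 0; fi
}

# active_node NODE1 NODE2
# Prints the first node whose probe reports 1, or "Not active." if neither does.
active_node() {
    local node
    for node in "$1" "$2"; do
        if [ "$(fake_probe "$node")" = "1" ]; then
            echo "$node"
            return 0
        fi
    done
    echo "Not active."
}

RUNNING_ON=node2
active_node node1 node2      # prints: node2
RUNNING_ON=none
active_node node1 node2      # prints: Not active.
```

Swapping `fake_probe` for the real check_nt pipeline would give the same three outcomes the script distinguishes: first node, second node, or not active on either.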
Cool, Martin. Good work.