check_conntrack_unreplied

March 27th, 2013

Our front-end varnish servers experience a lot of traffic, and we currently need an iptables NAT rule that redirects port 80 to port 81, where the varnish daemon listens. As a result, we end up using the iptables conntrack feature.
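
For reference, a port redirect of this kind is usually a single rule in the nat table. The sketch below only prints the command instead of running it. The REDIRECT target and the PREROUTING chain are assumptions, since the exact rule used on these servers is not shown in this post, and build_redirect_rule is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: prints (does not execute) an iptables rule that
# redirects incoming TCP traffic from one local port to another.
# Assumption: a REDIRECT rule in the nat PREROUTING chain; the real rule may differ.
build_redirect_rule() {
    local from_port=$1 to_port=$2
    echo "iptables -t nat -A PREROUTING -p tcp --dport ${from_port} -j REDIRECT --to-ports ${to_port}"
}

# Port 80 (public) to port 81 (where the varnish daemon listens)
build_redirect_rule 80 81
```

Any rule in the nat table makes the kernel track connections, which is why conntrack comes into play here.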

Our infrastructure uses DSR (Direct Server Return), which is why we have this iptables NAT running: https://kemptechnologies.com/white-papers/what-is-direct-server-return/

This is our current bottleneck: the conntrack table fills up and slows down connections.
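
One way to see how close conntrack is to its limit is to compare the current entry count with the configured maximum. This is only a sketch: conntrack_usage_pct is a hypothetical helper, and the two sysctl locations tried below are the ones I would expect on RHEL/CentOS 6 and 5 respectively, so check them on your own system.

```shell
#!/bin/bash
# Hypothetical helper: integer percentage of the conntrack table in use.
conntrack_usage_pct() {
    local count=$1 max=$2
    if [ "$max" -le 0 ]; then
        echo 0
        return
    fi
    echo $(( count * 100 / max ))
}

# Assumed locations: nf_conntrack_* (RHEL/CentOS 6), ip_conntrack_* (RHEL/CentOS 5)
for prefix in /proc/sys/net/netfilter/nf_conntrack /proc/sys/net/ipv4/netfilter/ip_conntrack; do
    if [ -r "${prefix}_count" ] && [ -r "${prefix}_max" ]; then
        pct=$(conntrack_usage_pct "$(cat "${prefix}_count")" "$(cat "${prefix}_max")")
        echo "conntrack usage: ${pct}%"
        break
    fi
done
```

If neither location exists (for example, the conntrack module is not loaded), the loop prints nothing.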

I wrote a simple script to detect when this behaviour is about to affect us.
The biggest problem comes when the load balancer's healthcheck starts to fail: the varnish server is taken offline and receives no traffic until the healthcheck is OK again.

Here is a good reference on the topic:
http://www.iptables.info/en/connection-state.html

An even better explanation:
http://vincent.bernat.im/en/blog/2014-tcp-time-wait-state-linux.html

Another interesting command is slabtop, which shows kernel slab cache usage; the conntrack entries show up there as their own cache.

The output looks like this (the messages are in Portuguese: "Existem N conexoes em estado UNREPLIED" means "There are N connections in UNREPLIED state").

Checking all connections:
# ./check_conntrack_unreplied -w 1000 -c 2000 -t total
OK - Existem 46 conexoes em estado UNREPLIED total | unreplied_total=46

Checking only the load balancer traffic:
# ./check_conntrack_unreplied -w 1000 -c 2000 -t bl
OK - Existem 0 conexoes em estado UNREPLIED do balanceador | unreplied_bl=0

The script has been tested on RedHat/CentOS 5.x and 6.x.

So here is the script:


#!/bin/bash
#Get current UNREPLIED from within the iptables conntrack 
#Version 20
#By Felipe Ferreira October 2012

# Exit codes
STATE_OK=0
STATE_WARNING=1
STATE_CRITICAL=2
STATE_UNKNOWN=3
#Arguments (defaults)
maxconwarn=10
maxconcrit=30
type="total"
HELP="Usage: $0 [-w|--warning <num>] [-c|--critical <num>] [-t total|bl]"

#EDIT HERE:
BALANCER_NET='192\.168\.34\.2(4[0-9]|5[0-3])'
	#In my case the load balancer's healthcheck always comes from 240 to 253
	#Get your regex here: http://www.analyticsmarket.com/freetools/ipregex


#this file location changed from redhat 5 to redhat 6,
#so use whichever one exists instead of parsing /etc/issue.net
if [ -f /proc/net/nf_conntrack ]; then
	CONNTRACK_FILE="/proc/net/nf_conntrack"
else
	CONNTRACK_FILE="/proc/net/ip_conntrack"
fi

#VERIFY IF FOUND THE FILE
if ! [ -f "$CONNTRACK_FILE" ]; then
	echo "ERROR CONNTRACK FILE NOT FOUND"
	exit $STATE_UNKNOWN
fi

#Check arguments print help
if [ $# -lt 4 ]; then
    echo "$HELP"
    exit $STATE_UNKNOWN
fi

#GET ARGUMENTS
while test -n "$1"; do
    case "$1" in
        -h|--help)
            echo "$HELP"
            exit $STATE_OK
            ;;
        -w|--warning)
            maxconwarn=$2
            shift
            ;;
        -c|--critical)
            maxconcrit=$2
            shift
            ;;
        -t)
            type=$2
            shift
            ;;
    esac
    shift
done

#COUNTS EITHER TOTAL OR ONLY THE BALANCERS HEALTHCHECK
#(messages are in Portuguese: "There are N connections in UNREPLIED state")
if [ "$type" = "total" ]; then
	CON=$(grep -c UNREPLIED "$CONNTRACK_FILE")
	MSG="Existem $CON conexoes em estado UNREPLIED total | unreplied_total=$CON"
elif [ "$type" = "bl" ]; then
	CON=$(grep -E "${BALANCER_NET}" "$CONNTRACK_FILE" | grep -c UNREPLIED)
	MSG="Existem $CON conexoes em estado UNREPLIED do balanceador | unreplied_bl=$CON"
else
	echo "Usage: $0 [-w <num>] [-c <num>] -t (total|bl)"
	exit $STATE_UNKNOWN
fi



#COMPARE AGAINST THE THRESHOLDS, CRITICAL FIRST
if [ "$CON" -ge "$maxconcrit" ]; then
        OUTPUT="CRITICAL - $MSG"
        exitstatus=$STATE_CRITICAL
elif [ "$CON" -ge "$maxconwarn" ]; then
        OUTPUT="WARNING - $MSG"
        exitstatus=$STATE_WARNING
else
        OUTPUT="OK - $MSG"
        exitstatus=$STATE_OK
fi

echo "$OUTPUT"
exit $exitstatus
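
To try the counting logic without touching a live server, you can run the same grep steps against a sample file. This is a minimal sketch: count_unreplied is a hypothetical helper that mirrors the script's counting step, and the sample conntrack entries are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper mirroring the script's counting step:
# count UNREPLIED entries in a conntrack-style file, optionally
# restricted to lines matching an extended regex.
count_unreplied() {
    local file=$1 filter=${2:-}
    if [ -n "$filter" ]; then
        grep -E "$filter" "$file" | grep -c UNREPLIED
    else
        grep -c UNREPLIED "$file"
    fi
}

# Made-up sample entries, for illustration only
sample=$(mktemp)
cat > "$sample" <<'EOF'
ipv4 2 tcp 6 55 SYN_SENT src=192.168.34.250 dst=10.0.0.5 sport=40000 dport=80 [UNREPLIED] src=10.0.0.5 dst=192.168.34.250 sport=81 dport=40000 mark=0 use=2
ipv4 2 tcp 6 431990 ESTABLISHED src=192.168.34.12 dst=10.0.0.5 sport=40001 dport=80 src=10.0.0.5 dst=192.168.34.12 sport=81 dport=40001 [ASSURED] mark=0 use=2
ipv4 2 tcp 6 50 SYN_SENT src=172.16.1.9 dst=10.0.0.5 sport=40002 dport=80 [UNREPLIED] src=10.0.0.5 dst=172.16.1.9 sport=81 dport=40002 mark=0 use=2
EOF

count_unreplied "$sample"                                    # prints 2
count_unreplied "$sample" '192\.168\.34\.2(4[0-9]|5[0-3])'   # prints 1
rm -f "$sample"
```

The second call uses a regex for the 240 to 253 range, so only the entry coming from 192.168.34.250 is counted.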
