September 2008 – Not So Frequently Asked Questions

Disconnecting Windows remote desktop (terminal server) users

You try to connect to a server via remote desktop (terminal server), only to find that too many people are already connected. You get the damn message:

You can't connect!

What can we do? It's easy. Since we already have our brand new tool winexe, we can write a little script to make our lives easier:
#!/bin/bash

# At least the server name is required
[ $# -lt 1 ] && echo "Error: Missing argument" && echo "Use: $0 server [disc #session]" && exit 1

# The only action understood as second argument is "disc"
[ -n "$2" ] && [ "$2" != "disc" ] && echo "Error: Can't understand second argument" && echo "Use: $0 server [disc #session]" && exit 1

# Log off the given session and finish
[ "$2" == "disc" ] && echo "Disconnecting session $3 from server $1..." && winexe "//$1" "logoff $3" -A secretfile && exit

# No action requested: just list the sessions
echo "Listing server $1 sessions:"
winexe "//$1" "query session" -A secretfile

The “secretfile” file is optional; it is there just in case you don’t want to type the user and password. Its contents are:
domain=YOURDOMAIN
username=user
password=pass

The script has poor error handling, but it lets you see who is connected:
$ ts.sh server2
Listing server server2 sessions:
 SESSIONNAME       USERNAME                 ID  STATE   TYPE        DEVICE
>                  user1                     0  Disc    rdpwd
 rdp-tcp                                 65536  Listen  rdpwd
                   Administrator             3  Disc    rdpwd
                   user2                     1  Disc    rdpwd
 console                                     5  Conn    wdcon

You can’t log in to this server because there are too many users. We can see that everybody is “disconnected”, so no one is actually working. We pick the user we like the least and kick him out:
$ ts.sh server2 disc 1
Disconnecting session 1 from server server2...
$ ts.sh server2
Listing server server2 sessions:
 SESSIONNAME       USERNAME                 ID  STATE   TYPE        DEVICE
>                  user1                     0  Disc    rdpwd
 rdp-tcp                                 65536  Listen  rdpwd
                   Administrator             3  Disc    rdpwd
 console                                     5  Conn    wdcon

Et voilà, we just got a free session to connect and administer this server.
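
Picking the candidates can be scripted too. What follows is only a sketch: `list_disconnected` is a hypothetical helper, not part of the script above, and it assumes the `query session` layout from the listing, where a disconnected session has an empty SESSIONNAME so the user name is the field right before the numeric ID.

```bash
#!/bin/bash
# Hypothetical helper: reads "query session" output on stdin and prints
# "ID USERNAME" for every session whose STATE is Disc.
# Assumption: disconnected sessions have no SESSIONNAME, so the field
# right before the numeric ID is the user name.
list_disconnected() {
    awk '
        NR == 1 { next }          # skip the column header
        { sub(/^>/, " ") }        # drop the marker of the current session
        {
            for (i = 3; i <= NF; i++) {
                if ($i == "Disc" && $(i - 1) ~ /^[0-9]+$/) {
                    print $(i - 1), $(i - 2)
                }
            }
        }
    '
}

# Possible use, with credentials in "secretfile" as before:
#   winexe //server2 "query session" -A secretfile | list_disconnected
```

Each output line holds a session ID that could then be passed to `ts.sh server disc ID`.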

Obviously, it is much better if everybody logs off when they finish working. But if you have to share your servers with absent-minded admins, you have to look after yourself…

My server reboots when it loses access to SAN disks!

You have your Linux boxes running an Oracle 10g RAC. Everything works perfectly, but suddenly one server reboots. You peek in the logfile and find this:
Sep 18 00:27:24 server1 kernel: SCSI error : <2 0 2 0> return code = 0x20000
Sep 18 00:27:24 server1 kernel: end_request: I/O error, dev sdae, sector 1672
Sep 18 00:27:24 server1 kernel: device-mapper: dm-multipath: Failing path 65:224.
Sep 18 00:34:14 server1 syslogd 1.4.1: restart.
Sep 18 00:34:14 server1 syslog: syslogd startup succeeded
Sep 18 00:34:14 server1 kernel: klogd 1.4.1, log source = /proc/kmsg started.

Ok… the SAN disks failed… the server has lost part of its disks… But this doesn’t seem to be a big deal; it shouldn’t have rebooted, should it? The operating system (root filesystem “/”) is mounted on a local disk. In fact, nothing uses the SAN disks but Oracle’s ocfs… The only thing that should have failed was Oracle, and nothing more, right? Why was the whole machine rebooted?

Turns out that long ago, when Oracle RAC found itself in this situation, it tried to pull the machine out of the cluster via “evict node”. But this didn’t work most of the time: the ocfs2 driver hung, often hanging the whole cluster (every machine in it). Hence the drastic solution… What’s the safest way to get a machine out of a cluster? You got it: rebooting it.

They could have made Oracle leave a message in the logfile saying that it was the one that rebooted the machine, so things would be clearer. But you can’t always get what you want.

So if you find your machine rebooting when it loses its SAN disks, don’t blame the machine, and don’t blame Oracle… get your SAN fixed so it won’t happen again.
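
Since Oracle leaves no message of its own, one way to spot this pattern is to look for the last storage error logged before each syslogd restart. The helper below is a rough sketch, not a diagnostic tool: the name `storage_errors_before_restart` is made up, and the patterns are assumptions taken from the log excerpt above, so they will need adjusting for other kernels or syslog daemons.

```bash
#!/bin/bash
# Hypothetical helper: reads a syslog file on stdin and, for every syslogd
# restart, prints the last storage-related error seen before it.
# The patterns are assumptions based on the excerpt above.
storage_errors_before_restart() {
    awk '
        /SCSI error|I\/O error|dm-multipath: Failing path/ { last = $0; next }
        /syslogd .*restart/ {
            if (last != "") print last
            last = ""
        }
    '
}

# Possible use:
#   storage_errors_before_restart < /var/log/messages
```

A restart that shows up with a SAN error right before it, and no clean shutdown messages in between, is a hint that the node was rebooted on purpose.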

Checking a domain name expiration date: check_domain

We wouldn’t like our domain to expire and be bought by a cyberspeculator (also infamously known as a “cybersquatter”) who asks us $1,000 for it when it’s actually worth 20 (I’ve lived through that situation with a personal domain name of my own).

It shouldn’t be a big deal; after all, registrars always warn users in advance and give you every chance to renew (it’s in their interest). But… what if the email address you configured back then is not active any more? What if the new boss’s secretary mistakes the notice for spam and ignores it (real case)? What if the company is so big that no one knows who reads that email address?

To make sure we stay on top of our domains, I’ve created a nagios plugin named “check_domain”. It’s simple (if you look at the code, you’ll see there are more lines parsing parameters than doing things), but it covers our needs and warns you when the domain name is about to expire.

In the full article (“read more”) you can see the code, and a downloadable file.

check_domain
#!/bin/bash


PROGPATH=$(echo "$0" | /bin/sed -e 's,[\/][^\/][^\/]*$,,')
. "$PROGPATH/utils.sh"
# Default values (days):

critical=7

warning=30
# Parse arguments

args=$(getopt -o hd:w:c:P: --long help,domain:,warning:,critical:,path: -u -n "$0" -- "$@")

[ $? != 0 ] && echo "$0: Could not parse arguments" && echo "Usage: $0 -h | -d <domain> [-c <critical>] [-w <warning>]" && exit $STATE_UNKNOWN

set -- $args
while true ; do

        case "$1" in

                -c|--critical) critical=$2;shift 2;;

                -w|--warning)  warning=$2;shift 2;;

                -d|--domain)   domain=$2;shift 2;;

                -P|--path)     whoispath=$2;shift 2;;

                -h|--help)     echo "check_domain - v1.01"

                               echo "Copyright (c) 2005 Tomás Núñez Lirola under GPL License"

                               echo "This plugin checks the expiration date of a domain name."

                               echo ""

                               echo "Usage: $0 -h | -d <domain> [-c <critical>] [-w <warning>]"

                               echo "NOTE: -d must be specified"

                               echo ""

                               echo "Options:"

                               echo "-h"

                               echo "     Print detailed help"

                               echo "-d"

                               echo "     Domain name to check"

                               echo "-w"

                               echo "     Days until expiration to result in warning status"

                               echo "-c"

                               echo "     Days until expiration to result in critical status"

                               echo ""

                               echo "This plugin will use whois service to get the expiration date for the domain name. "

                               echo "Example:"

                               echo "     $0 -d domain.tld -w 30 -c 10"

                               echo ""

                               exit;;

                --) shift; break;;

                *)  echo "Internal error!" ; exit 1 ;;

        esac

done
[ -z "$domain" ] && echo "UNKNOWN - There is no domain name to check" && exit $STATE_UNKNOWN
# Looking for whois binary

if [ -z "$whoispath" ]; then

      type whois &> /dev/null || error="yes"

      [ -n "$error" ] && echo "UNKNOWN - Unable to find whois binary in your path. Is it installed? Please specify path." && exit $STATE_UNKNOWN

else

      [ ! -x "$whoispath/whois" ] && echo "UNKNOWN - Unable to find whois binary, you specified an incorrect path" && exit $STATE_UNKNOWN

fi
# Calculate days until expiration

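# Use the binary under -P/--path when given, otherwise the one in $PATH
expiration=$("${whoispath:+$whoispath/}whois" "$domain" | grep "Expiration Date:" | awk -F"Date:" '{print $2}' | cut -f 1)

expseconds=$(date +%s --date="$expiration")

nowseconds=$(date +%s)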

((diffseconds=expseconds-nowseconds))

expdays=$((diffseconds/86400))
# Trigger alarms if applicable

[ -z "$expiration" ] && echo "UNKNOWN - Domain doesn't exist or no WHOIS server available." && exit $STATE_UNKNOWN

[ $expdays -lt 0 ] && echo "CRITICAL - Domain expired on $expiration" && exit $STATE_CRITICAL

[ $expdays -lt $critical ] && echo "CRITICAL - Domain will expire in $expdays days" && exit $STATE_CRITICAL

[ $expdays -lt $warning ]&& echo "WARNING - Domain will expire in $expdays days" && exit $STATE_WARNING

# No alarms? Ok, everything is right.

echo "OK - Domain will expire in $expdays days"

exit $STATE_OK
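
The date arithmetic and the threshold checks are easier to try out when pulled into small functions. The following is a sketch under stated assumptions, not a replacement for the plugin: the function names are hypothetical, the second whois label ("Registry Expiry Date") is an assumption because registries word this field differently, and `--date` parsing relies on GNU date. The return codes follow the usual nagios convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```bash
#!/bin/bash
# Sketch only: helper names are hypothetical, not part of check_domain.

# Read whois output on stdin and print the first expiration date found.
# The labels are assumptions; registries word this field differently.
extract_expiration() {
    grep -i -E -m 1 '(Registry Expiry Date|Expiration Date):' |
        sed -E 's/^[^:]*:[[:space:]]*//'
}

# Print the whole days between "now" (epoch seconds, defaults to the
# current time) and the given date. Relies on GNU date's --date parsing.
days_until() {
    local expiration=$1 now=${2:-$(date +%s)}
    local expseconds
    expseconds=$(date -u --date="$expiration" +%s) || return 3
    echo $(( (expseconds - now) / 86400 ))
}

# Map the remaining days to a nagios status line and return code.
classify_expiry() {
    local days=$1 warning=${2:-30} critical=${3:-7}
    if [ "$days" -lt 0 ]; then
        echo "CRITICAL - Domain has expired"; return 2
    elif [ "$days" -lt "$critical" ]; then
        echo "CRITICAL - Domain will expire in $days days"; return 2
    elif [ "$days" -lt "$warning" ]; then
        echo "WARNING - Domain will expire in $days days"; return 1
    fi
    echo "OK - Domain will expire in $days days"; return 0
}

# Possible use:
#   exp=$(whois domain.tld | extract_expiration)
#   classify_expiry "$(days_until "$exp")" 30 7
```

Passing "now" as a parameter keeps `days_until` deterministic, which makes the thresholds easy to check without waiting for a real domain to get close to expiring.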

Executing Windows commands from your Linux box: winexe

When you see a stopped Windows service in your nagios console, sometimes you would like to add an event_handler that tries to start the service automatically.

Samba has long had a feature for controlling services ( net stop or net start ), but I have never managed to get it to work.

There’s a useful tool: winexe . With it you can not only stop and start Windows services but also execute any shell command, and even get a Windows shell inside your Linux box, as simply as:
winexe -U HOME/Administrator%Pass123 //host cmd

It’s an open source project (free software); the source code is published on the same website and has not been modified since 26/10/07. It surely hasn’t needed any modification, because it is fully functional, and I haven’t had any problems so far, beyond that bloody craze for using backslashes (\) everywhere, which forces us to escape characters every now and then…

Winexe turned out to be a useful complement as a nagios event_handler tool.
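
As an illustration of the event_handler idea, here is a minimal sketch. It assumes nagios passes the $SERVICESTATE$ and $SERVICESTATETYPE$ macros as the first two arguments, followed by the host and the Windows service name; the function name, the `WINEXE` and `AUTHFILE` variables and the choice to act only on a hard CRITICAL state are all assumptions made for the example.

```bash
#!/bin/bash
# Sketch of a nagios event handler built on winexe.
# Assumptions: the function name, the WINEXE/AUTHFILE variables and the
# argument order are made up for this example.
WINEXE=${WINEXE:-winexe}
AUTHFILE=${AUTHFILE:-secretfile}

# Arguments: service state, state type, host, Windows service name
restart_windows_service() {
    local state=$1 state_type=$2 host=$3 service=$4
    case "$state/$state_type" in
        CRITICAL/HARD)
            # Only act once nagios has confirmed the failure
            "$WINEXE" -A "$AUTHFILE" "//$host" "net start $service"
            ;;
        *)
            # OK, WARNING or still in a soft state: do nothing
            :
            ;;
    esac
}

# When run as a script by nagios, forward the arguments:
#   restart_windows_service "$@"
```

Keeping the winexe binary in a variable makes it possible to try the handler with a stub before pointing it at a real server.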

Getting system info from command line: pstools

In the previous post, where we talked about winexe, we showed how to execute shell commands from our Linux console. Our first idea was to start and stop services ( net start; net stop ), but once we have a Windows shell we can go beyond that and do a lot more. To achieve that, we can use pstools .

With them we will feel like we were our windows console, because we can have ps (pslist), a kill (