bash – More powerful than a speeding locomotive

(flashback)

Our project was inching closer and closer to going live.  On the evening before the Easter break some clod started a massive report which tried to process all the data since the beginning of time. My colleague’s inefficient program gathered up all of the data from the database and wrote it to the data directory in a temp file.

The data directory was more than large enough for a test environment, but it wasn’t big enough for such a test.  We could easily have recovered if it had happened while we were at work, but instead the system attempted to process nightly batches over those four days without enough space and made a pretty big mess.

My boss Theodore was more upset than he should have been about a test environment and kept yammering on about what if this had been production.  He was right, of course, but one of the preconditions of the system is that enough resources are available.  It is our group’s responsibility to write the programs, but it is someone else’s to ensure that the system doesn’t run out of resources.

Anyway, we implemented the boss’s warning email feature.  Each time the program is run it checks for enough disk space; when there is not enough, it sends out a warning email and quits.  To be on the safe side, my boss asked that my email address be one of the recipients.
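The check itself is not shown in this post, so here is only a minimal sketch of how such a test could look. The function names and the threshold are hypothetical, and it assumes a df that understands the POSIX -P and -k options (on some systems that is a different binary than the default df).

```shell
# Sketch only: avail_kb and has_enough_space are made-up names.

# Print the free space, in kilobytes, of the filesystem holding path $1.
# -P forces one output line per filesystem, -k reports 1024-byte units.
avail_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeed (exit status 0) only if at least $2 kilobytes are free under $1.
# If df fails, nothing is printed and the check fails as well.
has_enough_space() {
    local avail
    avail=$(avail_kb "$1")
    [ -n "$avail" ] && [ "$avail" -ge "$2" ]
}

# Possible use at the top of the batch program (send_warning_mail is hypothetical):
#   MIN_FREE_KB=1048576    # assumed threshold of 1 GiB
#   has_enough_space /appdir "$MIN_FREE_KB" || { send_warning_mail; exit 1; }
```

Quitting before any work starts is the point: the program never gets the chance to half-process a batch on a full disk.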

(the present)

If I receive one more warning email from one of the test systems I am afraid I am going to kill someone.

From: Automatically.Generated.Message@acme.com [mailto:Automatically.Generated.Message@acme.com]
Sent: Tuesday, February 7, 2017 11:54
To: process_monitoring@acme.com;
Subject: Warning ... the end is near on acme-app1

An error has occurred, the copy app has not been launched 
because of insufficient disk space on following partition.

Filesystem             size   used  avail capacity  Mounted on
acme-app1_dpool/app    85G    85G     0K   100%    /appdir

Corrective action is required immediately.


The status of rest of machine is as follows.

Filesystem             size   used  avail capacity  Mounted on
/                       10G   7.7G   2.3G    77%    /
/dev                    10G   7.7G   2.3G    77%    /dev
proc                     0K     0K     0K     0%    /proc
ctfs                     0K     0K     0K     0%    /system/contract
mnttab                   0K     0K     0K     0%    /etc/mnttab
objfs                    0K     0K     0K     0%    /system/object
swap                   140G   400K   140G     1%    /etc/svc/volatile
fd                       0K     0K     0K     0%    /dev/fd
swap                   8.0G   700M   7.3G     9%    /tmp
swap                   140G    40K   140G     1%    /var/run
acme-app1_dpool/app    85G    85G     0K   100%    /appdir
acme-app1_dpool/acme_home   1.0G   353K   1.0G     1%    /appdir/home/gast
acme-app1_dpool/acme_samba   2.0G    36K   2.0G     1%    /appdir/samba
acme-app1_dpool/acme_scripts   2.0G   249M   1.8G    13%    /appdir/scripts

This is an automatically generated message for informational purposes.

The idea seemed OK: when there is not enough disk space, send out an email.  The underlying assumption was that someone in IT would deal with the problem.

Apparently the idiot users turned off half of the system about a week ago, but not every process.  I came to work and found hundreds of emails clogging up my inbox.  Looking through them, you could literally see the space filling up over time.

Well, hundreds of emails are annoying, but the general functionality is awesome.  A combination of a bash script and sendmail allows me to capture the important facts about our system and send them to someone.

Just look at the script.

#!/usr/bin/bash
# FILESYSTEM is set here so the excerpt stands alone; /appdir is the
# partition shown in the mail above.
FILESYSTEM=/appdir
HOSTMACHINE=acme-app1
SUBJECT="Warning ... the end is near on ${HOSTMACHINE}"
TO=process_monitoring@acme.com
FROM="automatically generated message"

# Capture the df output once, before the mail is assembled.
DF_Command=$(df -h "${FILESYSTEM}")
FULL_DF=$(df -h)

# sendmail -t reads the recipients from the To: header in the message itself.
( cat << !
To: ${TO}
From: ${FROM}
Subject: ${SUBJECT}

An error has occurred, the copy app has not been launched because of insufficient disk space on following partition.

${DF_Command}

Corrective action is required immediately.

The status of rest of machine is as follows.

${FULL_DF}

This is an automatically generated message for informational purposes.
!
) | /usr/sbin/sendmail -t

Fill the variables with anything from a single word to many lines of text and then substitute them into your mail.  The bash shell expands them inside the here-document before the text is piped to sendmail.
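To see the expansion without sending any mail, here is a small sketch. The function names and the values passed in are made up for illustration; the only shell feature it relies on is that an unquoted here-document delimiter enables variable expansion while a quoted one disables it.

```shell
# Sketch only: render_mail and render_literal are hypothetical helpers.

# Unquoted delimiter: ${host} and ${report} are expanded, and a
# multi-line value is inserted with its line breaks intact.
render_mail() {
    host=$1
    report=$2
    cat << EOF
Subject: warning on ${host}

${report}
EOF
}

# Quoted delimiter: the text is passed through untouched.
render_literal() {
    cat << 'EOF'
Subject: warning on ${host}
EOF
}

# Try it with something like:
#   render_mail acme-app1 "$(df -h)"
#   render_literal
```

Printing the rendered text to the terminal first is a cheap way to check a mail template before pointing it at sendmail.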

I guess that the moral of the story should be that more logic should be used because some idiot will inevitably trigger it on a non-production environment.  Well, that or just get rid of the idiots ….
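One possible piece of such logic, which the script above does not have, would be to limit how often the warning goes out. This is only a sketch: the function name, the stamp file and the interval are all hypothetical, and it assumes a date command that supports +%s (seconds since the epoch), which is not guaranteed everywhere.

```shell
# Sketch only: should_send_warning is a made-up helper.

# Succeed if no warning has been recorded within the last $2 seconds.
# On success the current time is written to the stamp file $1, so the
# next call inside the interval fails.
should_send_warning() {
    local stamp=$1 min_interval=$2 now last
    now=$(date +%s)
    last=0
    [ -r "$stamp" ] && last=$(cat "$stamp")
    if [ $(( now - last )) -lt "$min_interval" ]; then
        return 1
    fi
    echo "$now" > "$stamp"
}

# Possible use (path, interval and send_warning_mail are assumptions):
#   should_send_warning /var/tmp/diskwarn.stamp 86400 && send_warning_mail
```

With a one-day interval, a week of failed runs would produce a handful of mails instead of hundreds.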

 
