Edit: Are you monitoring errors?
Originally posted April 14, 2015 on AIXchange
Your shop has no budget for monitoring software, but you still want to be notified when LPAR errors appear in the AIX error log. You have a few options.
You could write scripts and run them periodically from cron. You could set up a master workstation that uses ssh to log in to each machine you want to monitor and run errpt. Or you could set up your machines to email you whenever a new error is logged. For this, you could hard-code an email address (your own, a group address, or a generic address, such as one monitored by operations or the on-call person), or you could route the emails to root on the server and set up a .forward file to distribute them to whichever addresses you choose. This nice how-to document has the details:
“Having the pleasure of working across many client accounts, it’s funny to see some of the convoluted scripts people have written to receive alerts from the AIX error log daemon. Early in my AIX career, I used to do the exact same thing, and it involved a whole bunch of SSH keys, some text manipulation, crontab, and sendmail. Wouldn’t it be nicer if AIX had some way of doing all of this for us? Well, you know I wouldn’t ask the question if the answer wasn’t yes!
Step 1
Create a temporary text file (e.g. /tmp/errnotify) with the following text:
errnotify:
        en_name = "mail_all_errlog"
        en_persistenceflg = 1
        en_method = "/usr/bin/errpt -a -l $1 | mail -s \"errpt $9 on `hostname`\" user@mail.com"
Step 2
Add the new entry into the ODM.
# odmadd /tmp/errnotify
Step 3
Test that it’s working by adding an entry into the error log.
# errlogger 'This is a test entry'
If required, you can delete the ODM entry with the following command:
# odmdelete -q 'en_name=mail_all_errlog' -o errnotify
0518-307 odmdelete: 1 objects deleted.
To send notifications to multiple addresses, replace the single address in en_method with a comma-separated list such as ops@company.com,unix@company.com. To change the address later, be sure to run the odmdelete first; if you just rerun the odmadd, you will end up with multiple entries in the ODM. To see the entries on your system, use:
# odmget -q 'en_name=mail_all_errlog' errnotify
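The delete-then-add sequence for changing recipients could be scripted along the following lines. This is only a sketch: make_stanza and update_notify are hypothetical helper names, not AIX commands, and the stanza and the /tmp/errnotify path are simply the ones from the steps above. The ODM commands need to run as root.

```shell
#!/bin/sh
# Sketch: rebuild the mail_all_errlog stanza for a new recipient list.
# make_stanza and update_notify are hypothetical helper names.

# make_stanza RECIPIENTS: print the errnotify stanza on stdout.
make_stanza() {
    printf '%s\n' 'errnotify:'
    printf '\t%s\n' 'en_name = "mail_all_errlog"'
    printf '\t%s\n' 'en_persistenceflg = 1'
    # Single quotes keep $1, $9 and the backticks literal here, so they are
    # expanded only later, when the notify method itself runs.
    printf '\ten_method = "/usr/bin/errpt -a -l $1 | mail -s \\"errpt $9 on `hostname`\\" %s"\n' "$1"
}

# update_notify RECIPIENTS: replace the existing ODM entry (AIX only, as root).
update_notify() {
    make_stanza "$1" > /tmp/errnotify || return 1
    # Delete first so that odmadd does not leave a duplicate entry behind.
    odmdelete -q 'en_name=mail_all_errlog' -o errnotify
    odmadd /tmp/errnotify || return 1
    # Show what is now registered.
    odmget -q 'en_name=mail_all_errlog' errnotify
}
```

For example, update_notify 'ops@company.com,unix@company.com' would swap in the two-address list; make_stanza can be run on its own first to inspect the stanza before touching the ODM.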
One caveat: I know of one environment that processed so much email and logged so many SAN errors that system performance actually suffered. It would be nice if there were a way to limit the rate at which error messages are sent out when a flood of errors is generated for some reason.
This whole process assumes you have sendmail working. For those instructions, check out this IBM developerWorks article:
To start the Sendmail daemon automatically on a reboot, uncomment the following line in the /etc/rc.tcpip file:
# vi /etc/rc.tcpip
start /usr/lib/sendmail "$src_running" "-bd -q${qpi}"
Execute the following command to display the status of the Sendmail daemon:
# lssrc -s sendmail
To stop Sendmail, use stopsrc:
# stopsrc -s sendmail
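If you want to script the status check, something like the following might do. It is a sketch that assumes the usual lssrc layout of a header line followed by one line per subsystem, with the status in the last column. subsys_status and ensure_sendmail are hypothetical helper names, and the 30m queue interval is an assumed value.

```shell
#!/bin/sh
# Sketch: start sendmail only if the SRC does not report it as active.
# subsys_status and ensure_sendmail are hypothetical helper names.

# subsys_status NAME: read `lssrc -s NAME` output on stdin and print the
# Status column (the last field) for that subsystem.
subsys_status() {
    awk -v name="$1" 'NR > 1 && $1 == name { print $NF }'
}

# ensure_sendmail: AIX only, run as root.
ensure_sendmail() {
    status=$(lssrc -s sendmail | subsys_status sendmail)
    if [ "$status" != "active" ]; then
        # -bd runs sendmail as a daemon; -q30m is an assumed queue interval.
        startsrc -s sendmail -a "-bd -q30m"
    fi
}
```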
The Sendmail configuration file is /etc/mail/sendmail.cf, and the Sendmail mail alias file is /etc/mail/aliases.
If you add an alias to the /etc/mail/aliases file, remember to rebuild the aliases database by running either the sendmail command with the -bi flag or the /usr/sbin/newaliases command. This forces the Sendmail daemon to re-read the aliases file.
# sendmail -bi
To add a mail relay server (smart host) to the Sendmail configuration file, edit the /etc/mail/sendmail.cf file, modify the DS line, and refresh the daemon:
# vi /etc/mail/sendmail.cf
DSsmtpgateway.xyz.com.au
# refresh -s sendmail
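If you have many LPARs to update, the DS edit could be done non-interactively instead of with vi. A minimal sketch follows, assuming the file contains a single line beginning with DS. set_smart_host is a hypothetical helper name, and the script avoids sed -i because not every sed supports it.

```shell
#!/bin/sh
# Sketch: set the smart host (DS line) in a sendmail.cf without an editor.
# set_smart_host is a hypothetical helper name.

# set_smart_host FILE HOST: back up FILE to FILE.bak, then rewrite the DS line.
set_smart_host() {
    cf=$1
    host=$2
    cp -p "$cf" "$cf.bak" || return 1
    # Read from the backup and write over the original, which keeps the
    # original file's ownership and permissions.
    sed "s/^DS.*/DS$host/" "$cf.bak" > "$cf"
}
```

On a real system this would be followed by the same refresh -s sendmail shown above, for example: set_smart_host /etc/mail/sendmail.cf smtpgateway.xyz.com.au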
You can use this same method to monitor your VIO servers.
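As for the mail-volume caveat, one possible workaround is to point en_method at a small wrapper script that refuses to send more than one message per interval. The following is only a sketch. should_send, the /tmp/errmail.last stamp file, and the 300-second gap are all assumptions, and there is no locking, so a burst of simultaneous errors could still let a few extra messages through.

```shell
#!/bin/sh
# Sketch: throttle error-log mail to at most one message per MIN_GAP seconds.
# should_send, STAMP and MIN_GAP are hypothetical names and values.

STAMP=${STAMP:-/tmp/errmail.last}   # holds the epoch time of the last mail sent
MIN_GAP=${MIN_GAP:-300}             # assumed minimum number of seconds between mails

# should_send NOW: succeed (and record NOW) if at least MIN_GAP seconds have
# passed since the time stored in STAMP, or if STAMP does not exist yet.
should_send() {
    now=$1
    last=0
    [ -f "$STAMP" ] && read -r last < "$STAMP"
    last=${last:-0}
    if [ $((now - last)) -ge "$MIN_GAP" ]; then
        echo "$now" > "$STAMP"
        return 0
    fi
    return 1
}

# Possible use inside the wrapper, assuming `date +%s` prints epoch seconds
# on your AIX level and the wrapper receives the error's sequence number as $1:
#   should_send "$(date +%s)" &&
#       errpt -a -l "$1" | mail -s "errpt on $(hostname)" user@mail.com
```

Errors that arrive inside the gap are still recorded in the error log; they just do not generate mail, so a periodic look at errpt remains worthwhile.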
How are you notified of LPAR errors?