Edit: Edited some links to Tom’s articles and the location of nstress.
Originally posted July 20, 2009 on AIXchange
As I noted in last week’s AIXchange blog entry, Ken Milberg has a soon-to-be-released book on understanding AIX performance. Blogging about Ken’s book got me thinking about performance, and how challenging it is for AIX administrators to become well-versed in this area.
In 2003 I worked for IBM in Boulder, Colo. We were invited to attend an onsite class taught by Tom Farwell. These AIX performance classes were going to be held twice a week over five weeks, 10 classes in total. Intrigued by the subject matter, I attended the first class. It felt like there were a hundred people in that conference room. But by the end of the course, only a handful of administrators remained. I often wonder about that. AIX performance is critical for an administrator to understand, yet it can be a challenging topic to master. Some, I imagine, dropped out due to a heavy workload or commitments outside of work. Others didn’t want to put forth the effort necessary to understand the concepts, or felt there was too much detail that they didn’t need to understand.
Later that year at IBM Technical University in Miami, Tom presented some of the sessions. I remember sitting in the crowd and being struck by the huge amount of interest from the attendees. During breaks people would bring their laptops to the front and ask Tom to take a look, over their wireless connections, at production systems that were having issues. Simply by having the admins run a few commands, Tom quickly deduced what was wrong in many cases.
Tom was the first person I heard apply queueing theory principles to AIX performance, and he has authored several articles in IBM Systems Magazine that are worth reading. Two examples are “Examining vmstat” and “It’s All in the Networking.” A Google search should yield several others.
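For a sense of what queueing theory says about a busy system, here is a minimal Python sketch of the textbook single-server (M/M/1) relationship between utilization and response time. It is an illustration only, not something taken from Tom’s classes or articles; the function name and the 10 ms service time are assumptions made up for the example, and real systems rarely fit the model’s assumptions exactly.

```python
def mm1_response_time(service_time, utilization):
    """Average response time for an M/M/1 queue: R = S / (1 - U).

    service_time: average time to service one request with no queueing
    utilization:  fraction of time the resource is busy (0 <= U < 1)
    """
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in the range [0, 1)")
    return service_time / (1.0 - utilization)


if __name__ == "__main__":
    service_ms = 10.0  # hypothetical per-request service time, in milliseconds
    for busy in (0.10, 0.50, 0.80, 0.90, 0.95):
        response = mm1_response_time(service_ms, busy)
        print(f"{busy:.0%} busy: about {response:.0f} ms average response time")
```

Under this model, response time is double the service time at 50 percent utilization and climbs steeply beyond that, which is one reason a resource that looks only moderately busy can still feel slow.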
In addition to Tom’s writing, the “AIX 5L Practical Performance Tools and Tuning Guide” Redbooks publication and other performance-themed IBM Redbooks publications can help you gain knowledge on this topic.
Once you’ve done your reading and studying, get a test machine to practice on. Start by running commands from nstress (as explained on IBM developerWorks), including ncpu, ndisk, nfile and nmem, to simulate different types of workloads on your machine.
Spend time with the AIX tools and commands that help you pinpoint the causes of system problems, and learn what their output looks like when the machine is under load.
Better still, get your actual application running and find ways to break your test box. Troubleshooting a live, misbehaving machine is the best way I know to prove you really understand how to diagnose and fix problems.
Get the knowledge from the books, and then get the hands-on experience from the machines. Both will help you understand AIX performance and, accordingly, become better at your job.