Edit: This is still a relevant topic
Originally posted October 2019 by IBM Systems Magazine
Understand your unique environment to best collect and graph data.
Q: What’s the best way to collect and graph AIX performance data?
You probably won’t care for the short answer, but honestly, it depends.
The best way to collect and graph AIX performance data comes down to the unique characteristics of your environment. Naturally, the time and skills available among your staff, along with the resources budgeted for your operations, are also critical considerations.
Now for the longer answer: Because one size obviously doesn’t fit all, arriving at a solution takes time and forethought. This is true whether you’re assembling a brand new environment or you’re realizing that what’s worked in the past no longer serves your purposes.
As a consultant, I typically start by asking, and getting answers to, numerous questions:
- If you plan to host the infrastructure in-house, do you have the cycles in your schedule to stand up and learn a new tool? For that matter, do you have an existing VM—or spare capacity to create a new VM in your environment—that can be used for the effort?
- Are you looking to send the data off-site and let someone else create and present the reports? Is this Software as a Service-type solution impractical or even impossible in your enterprise given your corporate security posture, firewall requirements or other constraints?
- Will you need software support to help with setup or troubleshooting if something goes wrong with the data collection? Do you require outside expertise in interpreting the graphs and reports?
- Who are the consumers of the information being created: management, technical staff or both?
- What kinds of decisions will be made based on the data? Will data be used to help with server consolidation, or is the idea to rebalance the workloads by identifying frames that are “running hot”?
- Are you looking to retain past data on system performance? (It’s not a bad idea to have this information on hand to address user questions or concerns.)
- Do you need trend data to help determine when new servers or additional capacity should be added to the environment?
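The trend question in the last item can be made concrete with a small calculation. The following is a minimal sketch, not a capacity-planning method: it assumes you already have one peak CPU utilization figure per day and fits a straight line through those figures to estimate when a chosen threshold would be crossed. The function name, the sample numbers and the 80% threshold are illustrative assumptions, and real workloads rarely grow in a perfectly linear way.

```python
def days_until_threshold(daily_peaks, threshold):
    """Estimate days from the last sample until a linear trend hits threshold.

    daily_peaks: one utilization figure (percent) per day, oldest first.
    Returns None when the fitted trend is flat or falling.
    """
    n = len(daily_peaks)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")

    # Ordinary least-squares fit of utilization against day index 0..n-1.
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(daily_peaks) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, daily_peaks))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    if slope <= 0:
        return None  # no growth, so the threshold is never reached

    crossing_day = (threshold - intercept) / slope
    # Report the distance from the most recent sample, never a negative value.
    return max(0.0, crossing_day - (n - 1))


# Hypothetical daily peaks growing by two points per day.
print(days_until_threshold([50, 52, 54, 56], 80))
```

A result like this is only a conversation starter for the capacity discussion; the commercial and open-source tools described in the next section do this kind of trending with far more data and context.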
Your Performance Data Toolbox
Of course, commercial performance tools can do much of this work for you. Options like Performance Navigator from Midrange Performance Group or Galileo Performance Explorer Suite from The ATS Group require little intervention once you get them up and running. These and other products can include vendor support, which may be a priority or even a requirement for your management.
IBM offers PM for Power; there’s a basic version available at no charge as well as the full-featured product. For many environments, the limited functionality of the free version is sufficient—it just depends on your needs. And newer versions of the HMC include built-in performance graphs. That at least provides a quick, easy way to investigate any potential issues.
Other freely available options include open-source tools such as Ganglia and RRDtool. Or you can always manually feed your nmon files into the Microsoft Excel-based nmon analyzer tool.
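If you would rather script the nmon step than open each file in Excel, the data is plain comma-separated text. The sketch below is an illustration under stated assumptions, not a replacement for the nmon analyzer: it assumes the common layout in which `ZZZZ` lines map a snapshot ID such as `T0001` to a time and date, and `CPU_ALL` lines carry User% and Sys% in the third and fourth fields. Column layouts can differ between nmon versions, so check the header line of your own files first. The function name and sample data are made up for the example.

```python
import csv
import io


def cpu_busy_by_snapshot(nmon_text):
    """Return (timestamp, User% + Sys%) pairs from nmon output text."""
    timestamps = {}  # snapshot ID -> "date time" string from ZZZZ lines
    busy = {}        # snapshot ID -> User% + Sys% from CPU_ALL lines

    for row in csv.reader(io.StringIO(nmon_text)):
        if len(row) < 4:
            continue
        section, key = row[0], row[1]
        if section == "ZZZZ":
            # Assumed layout: ZZZZ,T0001,HH:MM:SS,DD-MON-YYYY
            timestamps[key] = f"{row[3]} {row[2]}"
        elif section == "CPU_ALL" and key.startswith("T") and key[1:].isdigit():
            # The header row has descriptive text in field two and is skipped.
            busy[key] = float(row[2]) + float(row[3])

    # Fall back to the raw snapshot ID if no matching ZZZZ line was seen.
    return [(timestamps.get(k, k), v) for k, v in busy.items()]


# Hypothetical excerpt in the assumed layout.
SAMPLE = """AAA,progname,nmon
CPU_ALL,CPU Total demo,User%,Sys%,Wait%,Idle%,Busy,CPUs
ZZZZ,T0001,00:00:05,01-JAN-2019
CPU_ALL,T0001,12.5,4.5,0.0,83.0,,4
ZZZZ,T0002,00:00:35,01-JAN-2019
CPU_ALL,T0002,20.0,10.0,1.0,69.0,,4
"""

for when, pct in cpu_busy_by_snapshot(SAMPLE):
    print(when, pct)
```

The resulting pairs can be handed to whatever graphing tool you already use.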
Nigel Griffiths has posted articles and videos that detail newer techniques using njmon, InfluxDB and Grafana. These are highly customizable solutions that allow you to change your graphical views on the fly. They’re capable of handling huge amounts of data from large numbers of LPARs and can be implemented with minimal technical expertise.
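To give a feel for what InfluxDB ingests in a setup like this, here is a minimal sketch that formats one sample in InfluxDB's line protocol (measurement, tags, fields, timestamp). It is not how njmon itself loads data, since njmon has its own tooling for that. The measurement name, tag and field names are hypothetical and chosen only for illustration.

```python
def _escape(text, chars):
    """Backslash-escape each character in chars, as line protocol requires."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def _format_field(value):
    """Render a field value: booleans, integers (i suffix), floats, strings."""
    if isinstance(value, bool):  # checked first because bool is an int subclass
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Build one line: measurement,tag=val field=val timestamp."""
    if not fields:
        raise ValueError("line protocol requires at least one field")
    head = _escape(measurement, ", ")
    for key, value in sorted(tags.items()):
        head += "," + _escape(key, ",= ") + "=" + _escape(str(value), ",= ")
    body = ",".join(
        _escape(key, ",= ") + "=" + _format_field(value)
        for key, value in fields.items()
    )
    return f"{head} {body} {timestamp_ns}"


# Hypothetical sample: CPU figures for one LPAR at a nanosecond timestamp.
print(to_line_protocol(
    "cpu_total",
    {"host": "lpar 01"},
    {"user": 12.5, "cpus": 4},
    1570000000000000000,
))
```

Tags such as the host name are what let Grafana filter and group graphs per LPAR, which is why they are kept separate from the numeric fields.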
Re-Evaluating Your Needs
The process doesn’t end with the choice you make. Over time, you’ll need to re-evaluate your solution. Do the tools still do the job they’re needed to do? Is greater automation needed? Are new and better options available? Have your requirements changed?
There’s no one way to store and maintain performance data. But if you ask the right questions—and keep asking questions—the effort and resources you invest in a solution will be worth it.