Living with Legacy

Edit: I still run into AIX 5.3 all the time

Originally posted December 12, 2017 on AIXchange

This Twitter comment got me thinking about companies that continue to run legacy hardware, operating systems and the like:

The software needs to be reliable. We had an operator put in a wrong toolholder and poof, 37k out the window for a new spindle. Imagine if it were a software error? Machine manufacturers aren’t going to risk it. They develop something, test the snot out of it, and then try not to change anything unless they have to. The machines are likely to last far longer than the average operating system. We have some that are between 15-20 years old and it is not uncommon to find them 30+ years old like the lathe this guy is working on. The more expensive the machine, the more likely it will be rebuilt and kept running.

Although his focus is Windows XP, I’m sure if you ask around, you can find examples of ancient unsupported systems on your own. They may even be in your own organizations.

Here’s a sampling of replies:

This is almost always the vendor’s fault, and no they usually don’t test on new OSes when they come out. I’ve seen this happen with law enforcement, medical, and industrial software.

I’ve had vendors tell me the newest version of Windows/Windows Server they support is one that had been out of extended support for years. Server software that requires 2000/2003. Client software that requires NT or XP. Low competition, vendor lock in? Why bother.

And the other problem? These industries are boring. Not many SV nerds getting on GitHub to write new jail cell management software. Again, little/no competition, niche industry, locked in customers. No choice but to run insecure software.

The vendors have no incentive to change, or they’ve gone out of business, but their customers have found that their existing solutions solve their problems, so they’re not looking for another solution. I still come in contact with people who are happily running AIX 5.3 on POWER5 hardware and don’t see any reason to move ahead. Save for the occasional hard drive failure, they don’t even touch their systems.

While most of us understand the numerous, important benefits of being current, supported and up to date, others do not. IBM has tried to give them a way forward with Extended Support and other options that allow customers to run unsupported operating systems on modern hardware, but you can only do that for so long.

Unfortunately, these customers may find themselves painted into a corner with no way out. The technical debt will catch up with them.

So what’s the answer? What is the best path forward for customers that insist on running some old application on DOS, or on AIX 4.3.3, or something similar? Because eventually you will get called to help out this sort of customer. What are you doing now to prepare for the day when critical legacy infrastructure goes down?

Maybe all we can do is hope that this joke, recently told on Twitter, becomes reality:

@eashman AWS has announced new PDP-11 instances. Useful for airlines and Motor Vehicle departments looking to move to the cloud without upgrading existing infrastructure. #reInvent #geekhumor

The POWER9™ Journey Begins

Edit: At the time of this writing we are waiting to begin the POWER10 Journey

Originally posted December 5, 2017 on AIXchange

One of the great benefits of being an IBM Champion is the ability to attend IBM briefings about unannounced products. For instance, IBM recently gave us some details about the new Linux-only POWER9™ server. The AC922 (Machine Type 8335-GTG) for high-performance computing (HPC) is being officially announced today, and will be generally available Dec. 22. Learn more about the AC922 here.

The AC922 will be the first GA system to run a POWER9 processor. This machine takes advantage of the new faster speeds we’ll see from PCIe Gen 4–which is twice as fast as PCIe Gen 3–along with the improved speeds of CAPI 2.0 and next-generation NVIDIA NVLink, where POWER remains the only processor offering NVLink from the processor to the GPU accelerators.

It contains two POWER9 sockets with up to 40 cores, and up to four NVIDIA Volta-based Tesla V100 GPUs. It will max out at 1TB of memory if you use 16 x 64GB DIMMs (eight per socket). It has four PCIe Gen 4 slots, and can be configured with up to 7.7TB of storage and 3.2TB NVMe adapters. It’s not designed for virtualization; it’s intended to be configured as a bare metal “single server.”

There are two processor modules: 16-core and 20-core. Initially, the available memory options are 16, 32, or 64GB industry standard DIMMs. There are two hard drive slots per machine. You can choose from 1TB and 2TB HDD options and 960GB, 1.92TB and 3.84TB SSDs. You have your choice of RAID0, RAID1 and RAID10.

I’ll share some information I received during the call. These notes come from presentations created by IBM experts:

  • The AI era is going to be a journey. Clients are faced with challenges of commodity hardware combined with open source software. IBM has built the best systems in the marketplace to crush the data challenges of the AI era. These are enabled with advanced I/O interfaces, new shared memory structures and co-optimized hardware and software.
  • There are three key points that make the AC922 the best server for AI. First, it is designed from the ground up for AI workloads; this starts with the acceleration superhighway. In the AC922, IBM has introduced second-generation NVLink between the CPU and GPU, which is 5.6x faster than PCIe Gen 3 architectures. Second, IBM focused not only on NVLink and the GPU, but designed a balanced system for the AI era, with industry-leading memory bandwidth, PCIe Gen 4 buses for the best network connectivity with InfiniBand, and high-performance storage adapters. Lastly, IBM took the open source deep learning frameworks and optimized them around this advanced design. The result is the best server and solution for enterprise AI. Additionally, this server design will find use in applications such as HPC and accelerated databases…so do not think it is just for AI.
  • At the center of Power Systems’ differentiation is the processor. Everything starts from here and it is designed for the cognitive era. Power has always had a stronger core, with up to 4x the threads of x86. The architecture also enables advantaged memory bandwidth for a balanced system design, enabling ease of data movement within the system. One of the core differentiators Power delivers is the advanced I/O interfaces. Last fall IBM introduced POWER8 with NVLink. This was the first processor with NVLink between the CPU and the GPU. With POWER9, IBM introduced more advanced interfaces such as next-generation NVLink, PCIe Gen 4 and OpenCAPI.
  • This remains the only processor in the industry to leverage NVLink between CPU-GPU.
  • When IBM talks about the AC922, they discuss CORAL. CORAL is the collaboration between the Oak Ridge, Argonne and Lawrence Livermore research labs for the Department of Energy. It all starts with the POWER9 processor and the NVIDIA Tesla V100. IBM is combining these on a motherboard, which is differentiated by the connectivity between them. All nodes are contained in a standard rack-mount chassis. It is the repeatable building block used for this supercomputer.

Resources
On Twitter, the OpenPOWER Foundation shared photos of this system that were shown at November’s SC17 conference. Check out picture 10.

This article describes the nodes that make up the Summit Supercomputer. It gives you an idea about potential real-world uses for these nodes:

Oak Ridge National Laboratory’s new Summit supercomputer, projected to be the fastest in the world, should rocket the U.S. back into the lead over China on the top 500 list of fastest supercomputers. At SuperComputing 2017, IBM demoed its Power Systems AC922 server nodes that serve as the backbone of the Summit supercomputer. …

Summit promises to deliver 5-10x more performance than its predecessor, Titan, but it crams much more power into a smaller footprint. Titan featured 18,688 nodes, but Summit will overpower it with “only” ~4,600 nodes. That capability stems from increased node performance; Summit will offer more than 40 TeraFLOPS per node, whereas each Titan node weighed in at 1.4 TeraFLOPS. Packing all that power into a single node begins with IBM’s water-cooled Power Systems AC922 node. Each node is equipped with two IBM POWER9 processors and six Nvidia Volta GV100 GPUs. The nodes also feature an aggregate of 512GB of coherent DDR4 and HBM2 (High Bandwidth Memory) along with 1,600GB of non-volatile RAM. …

Supercomputers are all about parallel computation and moving data between the CPUs, GPUs, memory, and networking, so Summit provides numerous layers of extreme bandwidth. The system features 96 lanes of PCIe 4.0 that come in handy for the dual-port Mellanox EDR InfiniBand adapter, which has a theoretical maximum throughput of 400Gb/s. IBM has measured throughput at 392Gb/s, which is twice the bandwidth of a PCIe 3.0 adapter. The Volta GV100s connect via PCIe 3.0 and NVLink 2.0. The NVLink interface provides 100GB/s of throughput for CPU-to-GPU and GPU-to-GPU traffic. The GPUs are arranged in a dual-mesh design. Interestingly, IBM also produces a model with four GPUs that will power CORAL’s Sierra supercomputer. The four-GPU model (the last picture in the album above) touts 150GBps for inter-GPU/CPU communication. Due to the reduced number of GPUs, IBM can provision more links (“bricks” in NVLink parlance) to the CPUs and GPUs, which increases throughput. …

The POWER9 processors have eight memory channels, for a total of 16 channels per server that provide 340GB/s of aggregate bandwidth. Each Summit node will wield a maximum of 2TB of DDR4-2666 memory.

In this video, “Scott Soutter, IBM; Steve Fields, IBM Power Systems; and Dylan Boday, IBM Power Systems discuss Power AI, deep learning frameworks, continued partnership with Nvidia for POWER9, and Open CAPI, from SC17 in Denver, Colorado.”

Although there’s nothing AIX-specific in today’s announcement, more announcements that cover the AIX and IBM i ecosystem will be made in the future.

IBM has issued a statement of direction for the POWER9 Enterprise hardware. I’ve also seen timelines for AIX and IBM i that, I assure you, extend years into the future.

Obviously, there’s much more ahead with POWER9, but this machine is the first step on that journey.

Using lvmo to Migrate LVM Performance Tuning Values

Edit: Some links no longer work

Originally posted November 28, 2017 on AIXchange

If you use the lvmo command to tune Logical Volume Manager (LVM) pbufs, this information may be useful:

The lvmo command sets or displays pbuf tuning parameters. The equal sign can be used to set a particular tunable to a given value. Otherwise, if no equal sign is used, the value of the tunable will be displayed.
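For instance, displaying and then raising the pv_pbuf_count tunable for a volume group might look like this (a quick sketch; testvg and the value 2048 are just example names and numbers, not recommendations). Display the current value:

    # lvmo -v testvg -o pv_pbuf_count

Set a new value:

    # lvmo -v testvg -o pv_pbuf_count=2048

List all of the pbuf tunables for the volume group:

    # lvmo -a -v testvg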

Of course the warning is also very helpful:

Misuse of the lvmo command can cause performance degradation or operating-system failure.

Yes, lvmo requires the utmost care, but when used properly, it can provide valuable function. For instance, via Twitter I found this IBM developerWorks post from May. It explains how to use lvmo for migrating LVM performance tuning values.

These tunables are stored in the ODM, outside of the on-disk volume group, and aren’t preserved when the volume group is moved to a new LPAR. Exporting and importing the volume group will reset the LVM performance tunables to their default values.

A way around this is to back up the lvmo tunables before exporting the volume group, and then restore them after importing.
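If you just want a record of the current values, the plain lvmo command can capture them by hand (a minimal sketch, reusing the testvg example):

    # lvmo -a -v testvg > /tmp/lvmo.testvg.before

After the import, you can compare the live values against the saved copy:

    # lvmo -a -v testvg | diff /tmp/lvmo.testvg.before -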

developerWorks provides a downloadable sample script, lvmo_tool, to demonstrate this:

    # lvmo_tool -?
    getopt: Not a recognized flag: ?
    Usage: lvmo_tool -b
           lvmo_tool -r

To back up lvmo tunables:

    # lvmo_tool -b testvg
    lvmo tunables are saved in /tmp/lvmo00f6f42a00004c000000015b827ff33f.

Run “lvmo_tool -r /tmp/lvmo00f6f42a00004c000000015b827ff33f testvg” to restore lvmo tunables after importing the volume group.

To restore lvmo tunables, import the volume group with no varyon option:

    # importvg -y testvg -n hdisk1

Restore the LVM tunables:

    # lvmo_tool -r /tmp/lvmo00f6f42a00004c000000015b827ff33f testvg

Vary on the volume group:

    # varyonvg testvg

developerWorks adds this note:

Restoring LVM tunables on an already varied-on volume group requires varying the volume group off and back on if:

  • The new values of the tunables are less than the volume group’s current values.
  • You are changing max_vg_pbufs of the volume group.
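In that case, the sequence might look like this (a sketch reusing the example backup file and volume group from above):

    # varyoffvg testvg
    # lvmo_tool -r /tmp/lvmo00f6f42a00004c000000015b827ff33f testvg
    # varyonvg testvg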

Do you think you’d find this tool helpful?

Losing a Laptop

Edit: This still has the potential to be devastating

Originally posted November 21, 2017 on AIXchange

A buddy recently checked into a hotel. He asked the desk clerk about in-room safes and was told the hotel doesn’t have them. Then he asked the clerk to recommend a good nearby restaurant within walking distance. He was given directions, went, ate, returned to his room, and found his laptop had been stolen. He wondered if the clerk was somehow in on the theft, but he couldn’t prove it.

I mention this story because it points to how essential our laptops are to our lives. Losing our computers–under any circumstance–inhibits our ability to make a living. I liken it to an auto mechanic or a construction worker having his tools stolen. This puts someone’s livelihood in jeopardy. At minimum, there’s a significant inconvenience involved in replacing the stolen gear.

Think about your laptop and its contents. Is the hard drive encrypted? Is there a power-on password? How would you be affected if it were taken from you forever?

Whenever I get a new machine, I spend time recreating my environment. Certainly, that process is made easier if I have access to my old laptop in order to make comparisons. There may be VPN definitions to recreate, virtual machines and .iso files that you like to have available, software packages to download and install, and documents that are in process or saved locally. I’m sure you have your own list of tools and capabilities that you use every day. They would be difficult or impossible to replace.

How long would it take you to rebuild and recover? Would some things simply be lost for good? While you’re pondering that, you might want to ask yourself about maintenance. How recent is your latest backup? Have you tested a restore?

I know some people who, I guess, wouldn’t be lost without their laptop. They have relatively thin clients, and they use cloud storage and/or regularly back up their files. (There’s software that automatically backs up everything hourly, or at an even faster interval.)

So what do you do to protect your laptop? Do you bring it everywhere you go rather than leave it unattended at any point during your travels? That can be a drag when you want to walk around and explore a new destination. Do you really want to bring everything with you all of the time?

There are other options, like a laptop lock. Of course the downside to this is that it’s pretty obvious what’s been locked up. Plus it’s not capable of protecting any other valuables you might have with you.

I’ve heard about something called a backpack/bag protector. Everything goes in your bag, then you wrap this metallic mesh of cables around it and attach everything to a bedframe or a drainpipe in your bathroom. I guess it’s like chaining a bicycle to a bike rack. Apparently international travelers use them when staying at places like youth hostels. I’ve also heard of them being used by hikers and backpackers; they just attach their bags to a tree.

The outfit that makes the backpack protector also has what it calls a portable safe. It’s basically their own bag that contains a built-in mesh, and it comes with an integrated lock and a cable that can be attached to any heavy, unmovable object.

Obviously there’s no foolproof solution. Hotel safes certainly aren’t impenetrable (watch here, here and here). Devices like locks and portable safes can be defeated by bolt cutters. Keeping your laptop with you at all times is a problem if you’re mugged. Even common sense has its limits. When you’re traveling you may not go around advertising that you have a laptop with you, but when a bunch of techies gather at a conference, the thieves can figure out that there will be unsupervised computers in the area.

What steps do you take to protect your laptop, on the road or even when you’re at home?

On Becoming a Sponsor User

Edit: Some links no longer work

Originally posted November 14, 2017 on AIXchange

While attending the IBM Technical University last month I went to a session on the Cloud Management Console (CMC). One thing I highlighted when I first wrote about the CMC is how you get access to the product. You can pay $50 per frame per month, or, if you’ve purchased a C model, you receive access to the product for three years.

Another way to gain access to the service is to become a Sponsor User. Although the information here describes teams that are building products, I think it’s a good overview for those interested in the CMC:

Sponsor Users are real-world users that regularly contribute their domain expertise to your team, helping you stay in touch with users’ real-world needs throughout the project.

Despite our best efforts, empathy has its limits. If you’re designing the cockpit of an airliner but you aren’t a pilot, you simply won’t know how it feels to land a plane. Without that first-hand experience, it’s easy to lose touch with our users’ reality and allow bias and personal preference to creep into our work.

Sponsor Users are real users or potential users who bring their experience and expertise to the team. They aren’t passive subjects–they’re active participants who work alongside you to deliver a great outcome. While they won’t completely replace formal design research and usability studies, Sponsor Users will help you break the empathy barrier and stay in touch with real-world needs throughout your project.

Anatomy of a Sponsor User
A good Sponsor User is representative of your intended user, they’re invested in the outcome, and they have the availability to regularly work with you and your team.

1) Are they representative of your target user? A good Sponsor User reflects the actual user you intend to serve. As enthusiastic as your client, customer, or economic buyers may be to help you, they are often not the user who will ultimately derive personal value from your offering.

2) Are they personally invested in the outcome? A good Sponsor User cares as much about your project’s outcome as you do. Look for candidates who have a particularly demanding use case––a Sponsor User who relies heavily on your offering to be successful will have a vested interest in your project’s success.

A word of caution: don’t mistake a demanding use case for an “extreme” use case. If you’re working on a Hill that concerns a family minivan, a race car driver is probably not a great candidate for a Sponsor User, no matter how interested they are in working with you.

3) Are they available to collaborate? A good Sponsor User is open and willing to share their expertise and experience with your team.
While being a Sponsor User isn’t a full-time job, it is a commitment. Set expectations, but be respectful of their time and be flexible around their schedule. What’s important is that their insights and ideas are heard.

If you’re interested in becoming a Sponsor User with the IBM Cognitive Systems team, contact Cary-Anne Olsen-Landis at caolsen (at) ibm dot com. She’ll tell you more about the team and the products they’re working on.

A Tip on Getting Started with the PowerHA 7.2.1 GUI

Edit: Some links no longer work.

Originally posted November 7, 2017 on AIXchange

There are a lot of ways to get familiar with the new PowerHA 7.2.1 GUI:

In PowerHA® SystemMirror Version 7.2.1, or later, you can use a graphical user interface (GUI) to monitor your cluster environment.

The PowerHA SystemMirror GUI provides the following advantages over the PowerHA SystemMirror command line:

  • Monitor the status for all clusters, sites, nodes, and resource groups in your environment.
  • Scan event summaries and read a detailed description for each event. If the event occurred because of an error or issue in your environment, you can read suggested solutions to fix the problem.
  • Search and compare log files. The log files are also formatted to be easy to read, making it simpler to identify important information.
  • View properties for a cluster, such as the PowerHA SystemMirror version, name of sites and nodes, and repository disk information.

More information is available in these videos from Shawn Bodily and Michael Herrera. And then there’s this virtual user group presentation.

I’m mentioning this now in part because someone recently asked me how to locate the fileset that’s needed to get it to work. While the requirements tell you what version of AIX you need to be running, they don’t tell you where to get the cluster.es.smui.server fileset.

For that, you need to go here and download the ESD_PowerHA_SystemMirror_v7.2.1_Std_Ed_122016.tar.gz archive. The package unzips into three directories: the installp directory, an smui_server directory, and a usr directory. While you might assume the filesets are in the installp directory, they’re actually found in smui_server. Credit to Shawn Bodily, who pointed this out to me. Be sure to keep this in mind as you do your own testing of the PowerHA GUI.
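For reference, here’s a minimal sketch of pulling the fileset out of the archive and installing it, assuming the download sits in your current directory (the installp flags are the usual apply/commit/auto-requisite options):

    # gunzip -c ESD_PowerHA_SystemMirror_v7.2.1_Std_Ed_122016.tar.gz | tar -xvf -
    # installp -acgXd ./smui_server cluster.es.smui.server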

Just Back from Technical University

Edit: Some links no longer work

Originally posted October 31, 2017 on AIXchange

I haven’t written about the IBM Technical University lately, but rest assured, I continue to make time for it as schedules allow.

The most recent event took place in New Orleans two weeks ago. When you look at the list of presenters, there was a lot of technical firepower on hand. The Technical University isn’t an exercise in marketing or fluff. It’s technical information for technical people: there were lectures, hands-on labs and numerous opportunities to meet and talk to the speakers and attendees.

IBM puts on several of these types of events each year, including InterConnect in March and the IBM Systems Technical University in May. The good thing about this iteration is that, compared to some of the other events, this one did not feel “huge.” There was plenty of room in most of the sessions, and, as I said, ample opportunities to interact with speakers, vendors, salespeople and other attendees in the Solution Center, networking center, and hallways.

The Technical University is a worldwide happening. The conference in Prague is coming up in early November, and future events are set for Cairo, Dubai and Florianópolis.

There’s nothing like being at a large gathering of your peers, as all of us learn from industry experts and from each other. On a personal note, it’s gratifying to have chance real-life meetings with people who’ve been reading this blog over the years. Plus there’s always the possibility of bumping into a fellow IBM Power Champion. (And on that note, nominations for 2018 Power Champions are now open.)

I recognize that it can be tricky to get away from the office, and that some of our employers balk at the idea of paying for training. But events like Technical University are worth it. And training, in general, adds value. It reminds me of an anecdote that I see all the time:

Two managers are talking about training their employees. The first asks, “What if we train them, and they just leave?” The second responds, “What if we don’t train them, and they stay?”

Anyway, for those of you who attended the conference, what were your impressions? And for those of you who haven’t been to a Technical University event, what has kept you from attending?

Patching: Seeking a Happy Medium

Edit: Still an ongoing issue

Originally posted October 24, 2017 on AIXchange

Let’s talk about patching. IT pros understand that it’s critical to patch in a timely manner. Or at least they should understand, but then, getting behind on patching was one factor in the Equifax breach (and many other breaches, for that matter).

Even though patching is essential, having control over when and how you patch is highly desirable. When we’re talking about servers that run your core business, you should have absolute control over when and how you apply your fixes. Of course with this power comes responsibility. You should be coordinating with change control and testing changes in a test/dev/QA environment before anything is put into production, and you should be installing fixes in a timely manner, especially high severity fixes.

However, not everyone gets to decide when their patches get installed. Unless you’re inclined to go into your advanced settings and fiddle around a bit, from what I can gather, recent Windows versions offer very little in the way of controlling how and when updates are made. I’ve seen Windows 10 systems reboot with no warning on “Patch Tuesday.” I realize this behavior is aimed at non-technical users, and their systems should certainly be kept reasonably current. Nonetheless, they should still have some control over the process.

And it’s not just a workplace issue. I’ve seen patch downloads occur over metered connections when it would make more sense to allow these users to choose when to actually download the fixes. Not everyone has unlimited data, even at home, and this is certainly the case with most cellular users. If you’re using your phone as a wifi hotspot with a laptop, you don’t want your limited data allowance chewed up by a Windows update that could have waited until you got home.

Related to this, I’ve read news articles about people reporting system issues once patches were installed. How furious would you be if you couldn’t do work following an unplanned reboot, or even worse, if your machine no longer booted at all? Imagine the chaos in your life if you no longer had access to your computer, especially if it happened when you were not expecting it.

The point is, if you’re in the middle of something and work gets lost to an auto-reboot, it’s counter-productive. I’d like to see a happy medium with consumer devices. Even my phone lets me postpone updates until it’s more convenient. As an IT pro, having a heads-up from these devices is valuable. I like to take a good backup before patching so it’s easier to roll back the changes if disaster strikes. That may not be possible with a machine that just reboots out from under you.

These are just things I’ve seen recently. To be honest, I’m not sure how widespread this issue is, or whether the fault lies primarily with Microsoft, corporate IT policies or users themselves. I’m just an AIX administrator with a blog, after all.

Perhaps the solution is to switch to Linux on the desktop–although that hasn’t worked out so well in Munich.

What are you seeing with patching, either in the enterprise or among your non-techie friends on the desktop?

A Hitch with SEA Failover Testing

Edit: Test test test.

Originally posted October 17, 2017 on AIXchange

A few months back, I ran into an issue during shared Ethernet adapter (SEA) failover testing. After upgrading to VIO server 2.2.5.10, we would fail VIOS1 and verify our disks and networks were functioning as expected on the VIO clients. Then we’d bring VIOS1 back online and fail VIOS2. The network would hang on the VIO clients.

When we checked the status of our SEAs on VIOS1, they would show up as “unhealthy.” The only way we could resolve this was to reboot the VIO server. This was unexpected behavior and not the way failover used to work.

Eventually we found that we could change the health_time_req attribute so that it would time out sooner:

Health Time (health_time_req)
Sets the time that is required to elapse before a system is considered “healthy” after a system failover. After a Shared Ethernet Adapter moves to an “unhealthy” state, the Health Time attribute specifies an integer that indicates the number of seconds for which the system must maintain a “healthy” state before it is allowed to return into the Shared Ethernet Adapter protocol. The default value is 600 seconds.
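Checking and lowering the attribute from the padmin shell on the VIO server is a one-liner each way; here’s a sketch, assuming the SEA device is ent8 (substitute your own adapter name). Display the current setting:

    $ lsdev -dev ent8 -attr health_time_req

Lower it to 60 seconds:

    $ chdev -dev ent8 -attr health_time_req=60

If the device is busy, you may need to make the change with the -perm flag and have it take effect at the next reboot.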

It appears IBM is aware of this issue and working on a fix. Chris Gibson recently relayed this information:

APAR status
Closed as program error.

Problem summary
Given a pair of VIOS LPARs (2.2.5.x and up) with matching SEAs in HA mode (ha_mode set to auto or sharing) with one node in UNHEALTHY state, if the healthy node is rebooted or loses link, the UNHEALTHY node will not assume the PRIMARY state. In the field, a customer reboots the primary LPAR and waits until it is back up. Then the customer reboots the backup LPAR. Unbeknownst to the customer, the primary LPAR has gone into the UNHEALTHY state because the link came up slightly delayed.

When the backup LPAR is shutdown, the primary LPAR does not take over and become PRIMARY as it did before the upgrade.

Problem conclusion
Code changed to disable link check as part of health check and also reduce the default value of health_check attribute to 60 secs and minimum value to 1s.

This is another reason to do plenty of testing after updates. In our case we just went from 2.2.4.22 to 2.2.5.10, yet we were bitten by this issue. For anyone doing VIO maintenance, it’s certainly something to be aware of.

Have you seen this type of behavior?

Power Systems Best Practices Doc Updated

Edit: I always look for the latest version of this document. Some links no longer work.

Originally posted October 10, 2017 on AIXchange

Not long ago I was asked about the Power Systems best practices document that I wrote about in March.

The reader who contacted me couldn’t download the presentation, nor could I when I tried. So I reached out to Fredrik Lundholm, the author, who assured me that it was still available. I tried again, and it worked.

In the interim, a new version of this doc, 1.18, was released. Download it here.

A tip: I’ve found that I need to click on the download button in the top right and select “direct download” in order to get it to work. If your download isn’t successful, you’ll see a timeout message stating that the file cannot be previewed.

Anyway, some highlights from the updated presentation:

  • Slides 13-15 cover the VIO server. Page 13 has VIOS policy, page 14 has the VIOS release lifecycle (showing VIO out into the 2023 timeframe), and page 15 shows network access mechanism information.
  • Page 19 shows VIOS virtual Ethernet tuning information, page 20 has SR-IOV and vNIC information, and page 21 shows storage information.
  • Page 26 has the AIX latest support matrix as of September 2017, page 27 has AIX policy information.
  • Page 36 has PowerHA recommendations, page 39 has Linux and IBM i notes on mtu_bypass and SEA performance.

If you’ve seen previous versions, it’s pretty easy to spot the changes. All new/updated slides are labeled “Updated 1.18” in red.

If you’re new to this, be sure to read this introduction on page 4:

This presentation describes the expected best practices implementation and documentation guidelines for Power Systems with AIX. These should be considered mandatory procedures for virtualized Power servers.

The overall goal is to combine simplicity with flexibility. This is key to achieve the best possible total system availability with adequate performance over time.

While this presentation lists the expected best practices, all customer engagements are unique. It is acceptable to adapt and make implementation deviations after a mandatory review with the responsible architect (not only engaging the customer) and properly documenting these.

Fredrik does a great job of presenting this information. Every update is well worth your time.

My Reading List

Edit: Some links no longer work

Originally posted October 3, 2017 on AIXchange

From time to time I’ll share some random links to AIX documentation I find online or via Twitter. But I also regularly read certain individuals, some who write about Power/AIX and some who cover tech more generally. I thought I’d share that list here:

Jay Kruemcke currently works for SUSE, but you might have caught him at IBM technical conferences in standing-room only, NDA-required, AIX trends and directions sessions. That’s my way of saying he’s a popular speaker. His blog delves into the personal at times, as he explained back in 2011:

One of the reasons I started this blog is to give me an opportunity to discuss topics outside of just IBM AIX and Power Systems. One of my professional passions is product management – the process of creating and managing a product or offering from inspiration through launch, product maturity and eventually the withdrawal of the product. It is a way to “own” a piece of the business and put your own unique mark on a company.

The author of this blog chooses to remain anonymous. As he explains:

I’m just a simple dumb sysadmin who loves Unix systems and who loves to blog.
I’ve now been blogging for more than seven years, and it has always been for me a way to better understand the things I am working on and a way to share the knowledge. I do not do this for recognition or fame. It’s just my way to thank all the people who are blogging around the world and to give back what they gave to me: knowledge.

I do this for free. I do not accept any donations, or any offers related to money. This blog will stay ad-free forever.

For some personal reasons my name will never appear on this website. I prefer to stay anonymous even if most of you will probably find a way to know who I am or already know my real identity.

Recent posts include “Managing a Docker Swarm Cluster with Prometheus” and “Building a Docker Swarm as a Service.”

Bartlomiej Grabowski writes about IBM Power Systems–IBM i as well as AIX:

First of all, I’m pleased to welcome you to my blog. My idea was to create a simple website, where a user can easily find information about IBM i/iSeries/System i/AS400 (so many names for the same system over the last 15 years), AIX, Virtual I/O Server, PowerVM features, and POWER Systems. There are a number of sites about VIOS and IBM i, but I couldn’t find one where PowerVM features are described from the IBM i perspective. I’m also going to publish some simple scripts, and programs which I think might be useful.

Now, let’s move on to some background info about me. My name is Bartlomiej Grabowski, and I’ve been working as a principal system support specialist. My main areas of expertise include IBM i, AIX, PowerVM, VIOS, and Power Systems hardware. Specifically, I have had the pleasure to work with solutions based on software and hardware replication, DS8K, SVCs, independent ASPs, and dozens of LPARs and servers. Also, I have collaborated with IBM and other experts in creating several Redbooks publications.

Recent posts covered administrative domains on IBM i and LUG 2017 at IBM Rochester.

Brian Krebs doesn’t cover Power Systems, but his stories around the security field are usually very interesting and unique.

He’s recently written about the Sonic, Deloitte and Equifax breaches. I also recommend checking out “Who is Marcus Hutchins” or “Twitter Bots Use Likes, RTs for Intimidation” to get an idea of the kind of information he provides.

Accelerate with IBM Storage lists upcoming calls around different IBM storage topics. Call replays are available if you can’t listen live.

Of course I have to include Nigel Griffiths and Chris Gibson. Both write about new hardware, tools, tips and more.

Gareth Coates authors “Tips of the Power Masters.” These are practical, easy to understand and easy to implement solutions.

The Linux on POWER blog has a self-explanatory name. Recent headlines include: “Red Hat now supports Containers on IBM POWER Systems” and “IBM Advance Toolchain for Linux on Power 11.0-0 released.”

These are my go-tos. Who do you read? Make your recommendations in comments.

Design, Customize and Buy Your OpenPOWER LC Server Online

Edit: Have you bought servers using this method?

Originally posted September 26, 2017 on AIXchange

Did you know how easy it is to design your own OpenPOWER LC server? Here’s a hint: it’s pretty easy.

Just go here and select your server. You can customize your choice for various workload types, including Hadoop and Spark analytics, memory-intensive clusters, open source databases, deep learning and GPU-accelerated computing.

Depending on the option you choose, you’ll be presented with a different server type. Then you’ll be able to select your chassis. For instance, if you choose GPU accelerated with NVIDIA NVLink, you get:

IBM Power System S822LC for High Performance Computing
Tackle new problems with NVIDIA Tesla P100 on the only architecture with NVIDIA NVLink — eliminating barriers between CPU-GPU.

Experience unprecedented performance and application gains with the new POWER8 with NVIDIA NVLink — delivering 2.8X the CPU-GPU bandwidth compared to x86 based systems.

IBM Power Systems S822LC for High Performance Computing pairs the strengths of the POWER8 CPU with 4 NVIDIA Tesla P100 GPUs. These best-in-class processors are tightly bound with NVIDIA NVLink technology from CPU-GPU — to advance the performance, programmability and accessibility of accelerated computing and resolve the PCIe bottleneck.

For memory intensive clusters, there’s an additional option of either a 1U or 2U system. Here’s the 1U description:

IBM Power System S821LC
A dense, high-data throughput server for your enterprise and cloud.
Compute-intensive workloads can now benefit from two POWER8 processors in a 1U form factor. This server delivers the density your business needs for virtualization, database and HPC deployments.

IBM Power Systems S821LC brings open innovation and high-density computing to the Linux server market with superior virtualization, incorporating POWER8 processors, tightly coupled FPGAs and accelerators and faster I/O using CAPI. Optimize processing power while simultaneously increasing workload throughput and reducing data center floor space requirements.

And here’s the 2U version:

IBM Power System S822LC for Commercial Computing
Open standards-based system designed to simplify and optimize your data center.
Open standards-based system that provides flexible deployment options for hybrid cloud, big data and business-critical applications.

The IBM Power System S822LC is designed to deliver superior performance and throughput for high-value Linux workloads, such as industry applications, big data and LAMP. With greater reliability, serviceability and availability than competitive platforms, the Power System S822LC incorporates OpenPOWER Foundation community innovation for clients that need to run big data, Java, open source and industry applications.

From here, simply click on “build your server” and you’ll be presented with options for your processor, memory, storage and PCIe cards. In my 1U example, I chose the 2×8 core option from this list:

    1x 8 core CPU at 3.32 GHz (8x POWER8 cores)
    1x 10 core CPU at 2.92 GHz (10x POWER8 cores)
    2x 8 core CPUs at 3.32 GHz (16x POWER8 cores)
    2x 10 core CPUs at 2.92 GHz (20x POWER8 cores)

Then I picked 16 DIMMs at a 32GB DIMM size, giving me 512GB total.

For storage you can choose from NVMe, SSD, SAS or SATA drives, and also tailor the size and quantity to your needs. There are also options for adapter cards.

Once you’ve made your selections, you’ll advance to your server config. Here you can download server specs and view the starting price.

Then just click on “purchase now,” proceed to checkout, and log in with your IBM ID to finalize the purchase. Like I said, easy.

Have you been purchasing systems this way? Let me know in comments.

POWER9: What’s Already Out There Says Plenty

Edit: At the time of this writing we are talking about POWER10

Originally posted September 19, 2017 on AIXchange

In March I wrote about the POWER9 roadmap. More recently, I sat in on a confidential briefing about the upcoming release. All I can really say about it is that some exciting things are coming, and I can’t wait to share the details with you.

Of course, per the confidentiality agreement I signed, I will have to wait. But the thing is, if you look at what’s already been publicly divulged about POWER9 (see here, here, here and here), you’ll get a clear, if incomplete, picture.

Here’s what I’ll add to that: If you look at the roadmaps for AIX and POWER, you’ll see that IBM delivers its solutions at a consistent pace. So if you consider the timelines of previous releases, it’s safe to assume that we won’t have to wait much longer for new products.

Plus, this supercomputer is already running a POWER9 solution:

Summit will deliver more than five times the computational performance of Titan’s 18,688 nodes, using only approximately 4,600 nodes when it arrives in 2018. Like Titan, Summit will have a hybrid architecture, and each node will contain multiple IBM POWER9 CPUs and NVIDIA Volta GPUs all connected together with NVIDIA’s high-speed NVLink. Each node will have over half a terabyte of coherent memory (high bandwidth memory + DDR4) addressable by all CPUs and GPUs plus 800GB of non-volatile RAM that can be used as a burst buffer or as extended memory. To provide a high rate of I/O throughput, the nodes will be connected in a non-blocking fat-tree using a dual-rail Mellanox EDR InfiniBand interconnect.

As I said, I’ll write more as soon as I can. For now, a quick show of virtual hands: How many of you have made the move to POWER8, and how many plan to make the move to POWER9?

Some Familiar and Not So Familiar Uses of ifconfig

Edit: Some links no longer work

Originally posted September 12, 2017 on AIXchange

In our line of work, you never stop learning. But I also believe it’s important to refresh one’s memory by revisiting some basics from time to time.

For instance, we should all know the OSI model, which is described here.

How well do you know your ifconfig commands? Here are some examples:

When do we use ‘ALIAS’? Consider the following command:

# ifconfig en0 <ip address> netmask <subnet mask> alias

What is the function of ‘alias’ here? Alias is used to assign more than one IP address to a single interface.

For example:
# ifconfig en0 192.168.0.2 netmask 255.255.255.0 alias

This command assigns a second IP address to the single interface en0.

If no ‘alias’ is used:
# ifconfig en0 192.168.0.2 netmask 255.255.255.0

This command replaces the interface’s existing IP address with the new one.
So, by using ‘alias’, we can assign up to 255 IP addresses to a single interface.

Or maybe you want to remove the TCP/IP configuration on a host:

The rmtcpip command removes the TCP/IP configuration on a host machine. The basic functions of this command are:

* Removes the network interface configurations.
* Restores /etc/rc.tcpip to the initial installed state.
* Restores /etc/hosts to the initial installed state.
* Removes the /etc/resolv.conf file.
* Removes the default and static routes.
* Sets the hostname to localhost.
* Sets the hostid to 127.0.0.1.
* Resets configuration database to the initial installed state.

Like I said though, you never stop learning. As it pertains to ifconfig, a while ago I became aware of an interesting option that I hadn’t tried. Chris Gibson tweeted about it:

Move IP address seamlessly from one interface to another. # ifconfig en0 transfer 10.1.1.10 en1

He included this link from the IBM Knowledge Center:

transfer tointerface
* Transfers an address and its related static routes from interface to tointerface. For IPv6, this command works only for addresses added by using the ifconfig command.
* ifconfig interface addressfamily address transfer tointerface

Note: If you want to transfer an IP address from one interface to another, and if the destination interface is not part of the virtual LAN (VLAN) to which the IP address belongs, you must add the VLAN to the adapter on which the destination interface is configured.
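Putting the Knowledge Center syntax together, moving an address from en0 to en1 would look something like this (a sketch with a made-up address):

    # ifconfig en0 inet 10.1.1.10 transfer en1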

This is certainly handy. Did you know it was available? Have you used it?

Tech Changes, but Teaching Doesn’t

Edit: We were all beginners once

Originally posted September 5, 2017 on AIXchange

Even though it was published in the earliest days of the internet, this 1996 article about helping people learn to use computers still rings true.

Do you find yourself falling into any of these traps when you’re teaching users about your systems? This was written at a time when people were logging onto the network using modems and operating systems that are primitive by today’s standards, so obviously things are quite different now. In a lot of ways, things are better now. Twenty years ago, few people outside of IT knew much about computer basics or getting online:

Computer people are fine human beings, but they do a lot of harm in the ways they “help” other people with their computer problems. Now that we’re trying to get everyone online, I thought it might be helpful to write down everything I’ve been taught about helping people use computers.

First you have to tell yourself some things:

Nobody is born knowing this stuff.
You’ve forgotten what it’s like to be a beginner.
If it’s not obvious to them, it’s not obvious.

Have you forgotten what it’s like to be a beginner? Most users today have literally grown up with computers, but that doesn’t mean they really understand what goes on under the hood. Pretty much anyone under the age of 80 can get online–and quite a few octogenarians can, too! However, simple access to mobile devices and user-friendly operating systems doesn’t make anyone a techie.

And don’t ignore the learning curve for actual techies, either. Not all of us come from UNIX/Linux backgrounds and have spent decades working from a command line. You may be comfortable with AIX or the VIO server, but these environments can be intimidating to newbies:

Beginners face a language problem: they can’t ask questions because they don’t know what the words mean, they can’t know what the words mean until they can successfully use the system, and they can’t successfully use the system because they can’t ask questions.

You are the voice of authority. Your words can wound.

Computers often present their users with textual messages, but the users often don’t read them.
By the time they ask you for help, they’ve probably tried several things. As a result, their computer might be in a strange state. This is natural.

They might be afraid that you’re going to blame them for the problem.

The best way to learn is through apprenticeship–that is, by doing some real task together with someone who has a different set of skills.

Your primary goal is not to solve their problem. Your primary goal is to help them become one notch more capable of solving their problem on their own. So it’s okay if they take notes.

Personally, I love WebEx and other screen sharing technology. There’s nothing better than getting on a shared screen session and a call and walking through an issue. There are multiple ways that I can coach them through it. They can just watch me, or even better, I can watch them figure it out for themselves:

Don’t take the keyboard. Let them do all the typing, even if it’s slower that way, and even if you have to point them to every key they need to type. That’s the only way they’re going to learn from the interaction.

Try not to ask yes-or-no questions. Nobody wants to look foolish, so their answer is likely to be a guess. “Did you attach to the file server?” will get you less information than “What did you do after you turned the computer on?”

Take a long-term view. Who do users in this community get help from? If you focus on building that person’s skills, the skills will diffuse to everyone else.

Never do something for someone that they are capable of doing for themselves.

Take the time to read the whole thing. And if you like that, you may want to peruse this archive of articles on a variety of interesting topics that were published from 1994-1996.

It’s interesting to see not only how technology has changed over time, but what hasn’t changed. I can’t help but wonder what it will be like 20 years from now. What will be different? What will be the same?

My Workout Day at the Data Center

Edit: It can still make for a good workout

Originally posted August 30, 2017 on AIXchange

As I’ve noted previously, IT pros aren’t the healthiest lot. But if you spend any time setting up new hardware in data centers, you’re at least getting a workout.

This occurred to me while I was recently unboxing and racking customer setup equipment, including V7000, V5000 and V9000 storage units and several POWER8 servers. In a sense it’s like opening presents on Christmas. I’m always amazed to see the effort and care that goes into the packing and shipping of this gear. It’s done in such a way that the boxes can take some abuse (which they often do) while the contents survive quite nicely.

Keep in mind that many environments don’t allow cardboard in the computer room, so most of the gear must be unpacked and transported at least a short distance to get it to the raised floor. Even with carts and lift tools and sufficient manpower, a lot of this stuff is pretty heavy. On top of that, you may find that you’re unloading hardware in areas where the facility’s A/C isn’t up to snuff, at least compared to the chilly computer room. And once all the boxes of servers, controllers, expansion drawers and disks are opened, you’re also dealing with a fair amount of trash, so it’s good to have a roomy staging area and a plan for waste management.

The point is, it’s easy to take for granted what goes into this process as well as what it takes out of you. After doing several racks worth of equipment, you might find yourself a little sore the next day, so be sure to build some recovery time into your project plan.

My most recent “workout” at the data center has me convinced that we could develop technology-related, CrossFit-type programs based on these activities. For sure there’s plenty of bending/lifting/hauling/kneeling, etc. that goes on. And if you don’t have power tools, the simple acts of installing rails and tightening screws and attaching cables must equate to various familiar exercises. Why put yourself through dead lifts, squats or bench presses when you can just do a rack and stack?

My plan is still in the inception stages, but I have to think people would happily pay for this type of workout. People pay to go to hot yoga; what could I charge them for time spent in hot and cold aisles in a computer room? Or maybe The Techie Fitness Spa (the name’s also a work in progress) will be more like those strongman competitions or the gyms where they toss old tractor tires around. But instead, I’ll have racks and drawers and everything else we deal with. To make my establishment stand out, I could recreate the whole computer room experience by throwing in a few man traps, retina and fingerprint scanners, and bag checks.

No doubt, the money will be rolling in soon. I just hope that nobody tries to steal my idea in the meantime.

Is Anyone Interested in a Real-Time AIX Forum Using Slack?

Edit: I am still using it with the IBM Champions

Originally posted August 21, 2017 on AIXchange

I recently started using Slack. It’s a group messaging tool that seems to be making inroads at IBM. There’s also a channel for IBM Champions, which is the one I joined. Despite my limited experience with Slack, I can see some interesting possibilities with it, which I’ll get to at the end of this post.

So what is Slack? I’ll let IBM’s Chuck Calio explain it. Chuck created a presentation, and with his permission I’ll share some details from the slides with you.

He starts by explaining some of the limitations of our current communication methods. While some of this is IBM-specific, I’m sure you can find it relatable:

•    Email: good for formal communications, overload from way too many, easy for discussions to fragment, key people often left out, responses often too slow.
•    Conference calls: good for education/intros, 1:1 sensitive calls, very limited active participation by the entire group
•    Connections/communities: share useful resources with extended team, forums for discussions, blogs, wikis [but] lack modern application integration.

Here’s how Chuck defines Slack:

Slack is a next generation real time “collab app” aimed at businesses rather than individuals. It’s optimized for teams that will interact with each other around specialized topics (channels). Slack’s strength is around creating an open transparent collaborative “web” of many diverse people to accelerate global team collaboration and innovation around specialized topics.

The Benefits of Slack

Why is Slack different from what we have today? Here are six reasons:

1. Enables information transparency across large, distributed, global and diverse teams; drives and enhances collaboration and accelerates innovation (vs. private, individual chats and 1:1 learning).
2. Encourages people to collaborate around specific topics (channels), across big groups (teams), across business units (IBM, non-IBM).
3. Slack is optimized to work across multiple devices (PC, laptop, tablet, mobile).
4. Slack chats build up into a corpus of searchable deep knowledge, making it easier for new team members to quickly come up to speed.
5. Supports an ecosystem of hundreds of modern applications, many deeply integrated.
6. Capability for bots integrated into Slack, plus built-in analytics.

He further helps define a few concepts in Slack, starting with a team:

Groups of people (from two to tens of thousands) that share a common purpose or interest (teams) interact around specialized topics (channels). Typical activities include sharing content, asking questions, getting/giving help, generating or testing new ideas, etc.

Slack [allows users to create] an open, transparent, collaborative “web” of many diverse people to accelerate global team collaboration and innovation around specialized topics.

Individuals can/should be part of and contribute to multiple Slack teams and channels.

Slack works best if a large percentage of the team is actively engaged and contributing in channels.

Channels, according to Chuck, are “focused group discussions, messages, notifications and collaborations.” They’re organized by:

•    Topic (#openpower)
•    Purpose (e.g., #sales-tv-ads)
•    Department
•    Announcements
•    Practices
•    Anything else you want

Here’s his comment on threads:

With threads, you can branch off and have discussions around a particular message without having to skip to a different channel or a DM. Threads move conversations to your sidebar—where you can ask and answer questions, give feedback, or go off on an inspired tangent.

To use, click the “start a thread” button on any message.

Here’s what he said about conducting “stand-up” meetings:

The beauty of doing stand-ups in Slack is that each person can post their status at any time, and it can be read asynchronously by everyone else. Our team’s rule is that you just need to post your status in the stand-up channel on Slack at some point in your day. It needs to be a meaningful update, not just “I’m doing work today.” Once a team member reads the other statuses, they can take action on it at that time. Pointing out blockers is especially helpful, so that other people can see what might affect their progress and think about how their work affects others.

And finally, some hints and tips:

Communicate in public channels whenever possible, keeping most of your conversations open to all team members. Benefits include:

•    Leverage the wisdom of the crowd.
•    Get answers and responses from SMEs faster.
•    Build a database of organizational knowledge with near zero effort.
•    Draws more of your team into Slack. (No one wants to miss out on critical conversations!)
•    Gain visibility into the latest happenings in your areas of interest.

If you remember IRC (which is still a thing, by the way), I’d say Slack is a modernized version of it. It’s new and shiny and seems easy enough to use.

A Public AIX Slack Channel?

Do you think setting up a public AIX Slack channel would be a good idea? I’m serious about this. Take a look at this tool and imagine a forum that provided real-time help and communication and collaboration for AIX users across your desktop and mobile device. Would you be interested in joining that group? Maybe something like this would quickly become too big to manage, but I find the idea very intriguing. So please, let me know what you think.

Taking Your HMC to the Cloud

Edit: Some links no longer work.

Originally posted August 15, 2017 on AIXchange

Have you heard about the new Cloud Management Console (CMC)? It provides a new way of managing our environments from a single pane of glass. The data from your HMC flows to a central location, and you manage it there. If you have multiple HMCs in a large environment, that’s a great convenience.

Enterprises with one of the newer Power Systems C server models are eligible to receive three years of “free” access to the product. Otherwise, check with IBM or your business partner for specific pricing details.

Rather than install (and manage and patch) software, your HMC is set up to connect to IBM’s cloud, allowing the device to send data about your environment. This being a cloud-based model, you can access the CMC from a mobile device or any browser.

Alternatively, you can give it a spin with IBM’s hosted trial:

Scroll to the bottom and click on the sponsor agreement. Select “I agree” and click on “I confirm.” You’ll then get an email with instructions on gaining access to the platform so you can try it out.

To run it in your own environment, your HMC must be at V8R8.6 SP1 PTF MH01698. Once you’ve ordered the product, you’ll be able to register with an existing IBM ID. From there, you can select the unique subdomain you’ll be using for your enterprise. IBM will provide you with an API key. Copy and paste the key into the CLI on your HMC, and then start Cloud Connector using the chsvc command.

Once your HMC data is loaded into the CMC, you can filter and search the information that has been collected. In the performance app you can see utilization trends, gather data about performance and capacity, check your servers’ allocation levels, view performance data, and more.

You can manage users, permissions, and access to the tool, and any apps that aren’t needed can be shut off. Blacklists can also be enabled, so if you have a managed system from which you don’t wish to forward information, it will not be sent. In addition, it’s possible to connect your HMC to IBM through a proxy if need be.

Support can be obtained by opening a ticket in Zendesk. As this is a subscription and you’re buying a service, traditional avenues of IBM support aren’t available.

Incidentally, if you’re wondering, IBM is well aware of the concerns regarding cloud technology. During a recent training webinar I attended, it was mentioned that some think of cloud as a dirty word and don’t want anything to leave their data center. The counter to this argument was a simple question: Have you set up Call Home to IBM? Do you trust that? That’s another example where information flows from your data center to IBM. Why have a problem with one when you rely upon the other? It was also noted that all apps are read-only, and that nothing comes into your data center from the outside.

The final point made at the webinar is that IBM isn’t gathering information for the sake of doing so; they want to aggregate data and use their expertise to help their customers. They want to find the connections and insights that are buried within your data that can help your business.

Going forward, additional capabilities and applications will be brought to the CMC. Eventually, Project Monocle will be incorporated.

For details, see the data sheet:

The IBM Cloud Management Console for Power Systems provides a consolidated view of the Power Systems cloud landscape including inventory of systems and virtual components, performance information and logging. The Cloud Management Console is hosted in the IBM cloud and can be accessed securely at any time enabling system administrators to easily run reports and gain insight into their Power cloud deployments. This solution has been built for mobile devices, tablets and desktop browsers enabling cloud operators to enjoy convenient access to this application.

And here’s the announcement letter:

IBM Cloud Management Console for Power Systems is a software as a service (SaaS) offering that provides enterprise-wide performance, inventory, and logging insight for IBM Power Systems servers. This SaaS offering gives clients a central enterprise-wide view of their Power Systems servers without having to install or maintain software at their data center.

Saved by uuencode (yes, uuencode)

Edit: When was the last time you used this?

Originally posted August 8, 2017 on AIXchange

I’d honestly forgotten about uuencode until recently, when I actually needed it:

Uuencoding is a form of binary-to-text encoding that originated in the Unix programs uuencode and uudecode written by Mary Ann Horton at UC Berkeley in 1980, for encoding binary data for transmission in email systems.

The name “uuencoding” is derived from “Unix-to-Unix encoding”; i.e. the idea of using a safe encoding to transfer Unix files from one Unix system to another Unix system but without guarantee that the intervening links would all be Unix systems. Since an email message might be forwarded through or to computers with different character sets or through transports which are not 8-bit clean, or handled by programs that are not 8-bit clean; forwarding a binary file via email might cause it to be corrupted. By encoding such data into a character subset common to most character sets, the encoded form of such data files was unlikely to be “translated” or corrupted, and would thus arrive intact and unchanged at the destination. The program uudecode reverses the effect of uuencode, recreating the original binary file exactly. uuencode/decode became popular for sending binary (and especially compressed) files by e-mail and posting to Usenet newsgroups, etc.

It has now been largely replaced by MIME and yEnc. With MIME, files that might have been uuencoded are instead transferred with base64 encoding.

I was working with IBM Support, trying to troubleshoot a server that didn’t have a working network connection. It was connected to the HMC, and I could create a console. IBM wanted me to take a snap and send it to them. Taking the snap was easy, but how could I send it without a network connection? uuencode to the rescue.

IBM gave me these steps:

    #snap -r
    #snap -ac
    Use putty to ssh into the HMC as hscroot.
    #vtmenu
    Select the managed server and the lpar.
    Login as root.
    Left-click on the putty window top left corner icon.
    Select change settings.
    Go to logging and check “All session output.”
    Click on browse to locate where the file will reside locally (in this case, on my laptop).
    Go back to the LPAR and run:

    #uuencode /tmp/ibmsupt/snap.pax.Z /tmp/snap

At this point, the file was “uuencoded,” and these characters scrolled over my screen (and were also logged to the file I’d specified):

M=/A%;-$,#3.)G\10XBAQ/[@=R52H#)0!`,.P8?BPEG@Z++04$0>)CY9@IMUP
M?IC]\)#L6P*$-<.”R_XPCUA*C+8\-\(‘9(>BX2QQ8[@S'”*V!TN&=4\1`MTP
M/WA,?,5X6HZ$?\,E81:Q<-A%S”9F6FHT89Z]@_7P&0`L[!YN$7.)W54_XLCP
M7’@#Q2’X68:’C\0>QR3QR’%)7″AF6H83H`-5D=5P<EA5M%DH!2&(GPR?XE&D
M5<%”+”HF”Y.*K8>G8O(*2*?.TRI&”[^*3<7VAMF”K!A4O”JN$0^&,\0_);<#
MKGA-0UOH%!N+C\7(XF2QLGA9S”QN%CN+G\5A3<$P:QA8S!66$TN’-:.18MLP
M/M@IJW_U-^Z)3<.8XHRCIMCCR”F&%I^&F5$8XL</NMA&#”R6″*L%V,)SNSFQ
M7[A.#-6<%’>$+<7:83*QG]@UO!\&#@N*/\3FXHRDM#@YO”[^%4>$:\**X0[Q
MEE@>G)7*%M.’Z<)!X2*Q:#A<+’$<%Z<MW\,!8XPD5%B,*156!U.%!PH(8JQP
M540K/$3(&)N,3\9.X8$Q6;A@/”V.”&^(,9J-XD$QE]@’51TF$=>%A<)\XHJC
MF?A/S!WV%RN),<8HXYTFRYA@7″N^$2>&8<8*8WFQ<+CR]!BV$W^)A\1A8OPP
MN%A?O#/F%ZN(?<:;8J!QT#@=Z23>&GN-O\8O8@EQT9A:G!*V%F.,L<4]X2]Q
M;E@?^2TN#5.,(XX68XECN1AL/(X4&U.(6\,QX@LQNS@B_!S.6L:,`<9H8Q&Q
MO3AZ@!V^%%.-8YIG8D#Q9OAG[“BF&Y<C[<;IXO$3J_ANW”Y&&/N-D\;7XI(F
MKBQMO”VV#^.,I\9]XJKQ7CA-_”]>$SN.R9$”XG9″QSBQ8″`&&>>*1<:4I05Q
MR?AT3#NN'<LL(<=[8\FQK[AEK”%F(+2′,<=G8\>HPWA$S!&.’IB(I<9NX\.Q
M5E-Q3#IF’,N,;<<F1]RQP5AW+”.>’J>%<<278QWQPIA.[“/F’->’A\2R`XGQ
M9_AM#’&,&T<<Y\;0X[1E13-]S#YN'[N#J<35OTL1EABD”3CF’M.,O<3;XOO9
M/MAP’#I&’/.,^\6!8N<QP-A]G’%T$T,/X\3TXS4QP[A+[#3.’^>)V<;\8YVQ
M_]AJC”3&&I.+M<8$9(CCH9A1JRAF%`./'<71TM]8S?@@62G^'”>/^\?+XYYQ
*`#EKW#C.($,<`<8$
`
end

I then sent the file to IBM and let them handle the decoding, but it’s easy enough to do it yourself. I opened the file with vi and removed some garbage at the beginning that was a result of logging everything with putty. My cleaned-up version started with:

    begin 600 /tmp/snap

I renamed the file so it ended with a .uue extension. Then I opened it using 7zip on a Windows machine. Alternatively, you could move the file somewhere uudecode is installed and decode it that way.
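
If you’d rather do the decode yourself on a box that has uudecode, it’s quick. Here’s a minimal sketch, assuming the begin header still names /tmp/snap as above: uudecode recreates /tmp/snap (the original snap.pax.Z), and the rename and uncompress afterward are just housekeeping if you want to inspect the contents.

    # uudecode snap.uue
    # mv /tmp/snap snap.pax.Z
    # uncompress snap.pax.Z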

The point is that not having network access doesn’t mean you can’t move files around. If you can at least get to the console, you can still transfer files.

Ultimately, I was able to get IBM the information they needed. But let’s get back to uuencode. It’s mentioned in these tips and tricks:

63. How do you send an attachment via mail from AIX?

Uuencode is the answer:
uuencode [source-file] [filename].b64 | mail -v -s "subject" [email-address]

For example:
# uuencode /etc/motd motd.b64 | mail -v -s "Message of the day" email@hostname.com
I use the .b64 extension, which gets recognized by Winzip. When you receive the email in Outlook, you will have an attachment that can be opened with Winzip.

Have you used uuencode/uudecode lately? Does this topic bring back any old memories?

Big Changes are Coming to the HMC

Edit: Some links no longer work

Originally posted August 1, 2017 on AIXchange

Are you bored with that same old x86 version of the HMC that you’ve used for years? Are you tired of the same old interface that you’ve long mastered? Are you ready for a change? Well ready or not, change is coming.

Beginning with V8R8.7.0, there will no longer be an option to run the classic HMC GUI. Everyone will need to get on board with using the enhanced version of the GUI. The good news is that performance with the enhanced GUI has improved since it originally came out, so now is the time to take another look.

Even if you like the “classic” HMC GUI, or if you’re simply accustomed to it, this development shouldn’t come as a surprise. The enhanced GUI option that’s available when you log into your HMC has been around for more than two years now.

Additionally, we’ve had the option of using the virtual HMC (vHMC) for some time as well.

Until now, those were your choices regarding the HMC: You could run the HMC on dedicated x86 hardware and log into the classic or enhanced GUIs, or you could run the vHMC in VMware or KVM and log into the classic or enhanced GUI. That was it.

But since IBM Power Systems hardware will soon be capable of running HMC code, we’re about to have four HMC options: In addition to being able to run a vHMC on x86, or an HMC on dedicated x86 (as you’re probably used to), you’ll also be able to run a vHMC on a POWER LPAR. This is new. Last but not least is the other new option: we’ll be able to run HMC code on POWER8 hardware once the new HMC model becomes available later this year.

If you’d like to learn more, here are a couple of good resources. First, there’s this AIX Virtual User Group replay. Download the slides (here and here) and listen here. For something quicker, check out this Q&A.

While the original iteration of the vHMC is certainly interesting, I cannot wait to test out a vHMC running in an LPAR on a POWER server. I’m also excited about the new hardware offering, which should GA in the second half of 2017. Since it’s a POWER8 server, we can finally manage our Power Systems hardware fleet without any need for x86 in our environment at all.

So where are you headed with the HMC? Will you stick with x86 HMCs, or will you move to a Power Systems version? Do you have plans for virtual HMCs?

Once I get hands on with the different options, I’ll be sure to share my thoughts and findings with you.

The Place to Go for AIX Updates

Edit: Some links no longer work.

Originally posted July 25, 2017 on AIXchange

I’ve previously mentioned my fondness for reading technical documentation. Another great resource along those tech doc lines is the AIX updates IBM provides. For instance, here’s what’s new for AIX 7.2, and here’s the update for AIX 7.1.

Both of these pages provide links to documentation and information that has been changed, broken down by month.

For example, on the AIX 7.1 page, you’ll find:

June 2017
The following information is a summary of the updates that are made to the AIX 7.1 documentation:

  • Added information about the Fibre Channel Adapter Outstanding-Requests Limit tunable parameter in the Disk and disk adapter tunable parameters topic.
  • Added information about statistics monitoring in the Monitoring cache statistics topic.
  • Added information about the lsmpio command in the MPIO-capable device management topic.
  • Added description about the pr_sysset array in the /proc File topic.
  • Updated information about new installation images in the Installing optional software and service updates using SMIT topic.
  • Updated information about the PSz, APP and other metrics in the topas command.
  • Updated information about new installation images in the install_all_updates command.
  • Updated information about the -w flag in the chlv command and the mklv command.
  • Updated information about the -cio option in the mount command.
  • Updated information about the icmptimestamp flag in the no command.
  • Updated information about the -L flag in the lslpp command.
  • Updated information about the pinned pages statistics for the -v option in the vmstat command.

April 2017
The following information is a summary of the updates that are made to the AIX 7.1 documentation:

  • Added an example for the sleep, nsleep or usleep subroutine.
  • Updated information about the application servers and the database servers in the nimadm command.
  • Updated information about the flags in the ld command.
  • Updated information about the /tmp directory files in the Migrating AIX topic.
  • Updated information about removing an adapter from EtherChannel in the Making changes to an EtherChannel using Dynamic Adapter Membership topic.

Here’s a glimpse at the AIX 7.2 page:

June 2017
The following information is a summary of the updates that are made to the AIX 7.2 documentation:

  • Added information about the Fibre Channel Adapter Outstanding-Requests Limit tunable parameter in the Disk and disk adapter tunable parameters topic.
  • Added information about statistics monitoring in the Monitoring cache statistics topic.
  • Added information about the lsmpio command in the MPIO-capable device management topic.
  • Added description about the pr_sysset array in the /proc File topic.
  • Updated information about new installation images in the Installing optional software and service updates using SMIT topic.
  • Updated information about the PSz, APP and other metrics in the topas command.
  • Updated information about new installation images in the install_all_updates command.
  • Updated information about the -w flag in the chlv command and the mklv command.
  • Updated information about the -cio option in the mount command.
  • Updated information about the icmptimestamp flag in the no command.
  • Updated information about the -L flag in the lslpp command.
  • Updated information about the pinned pages statistics for the -v option in the vmstat command.

April 2017
The following information is a summary of the updates that are made to the AIX 7.2 documentation:

  • Added information about vSCSI disk support in the Live Update restrictions topic.
  • Added information about the thin-provisioned Shared Storage Pool (SSP) storage in the Best practices for the Live Update function topic.
  • Added an example for the sleep, nsleep or usleep subroutine.
  • Updated information about the application servers and the database servers in the nimadm command.
  • Updated information about the flags in the ld command.
  • Updated information about the /tmp directory files in the Migrating AIX topic.
  • Updated information about removing an adapter from EtherChannel in the Making changes to an EtherChannel using Dynamic Adapter Membership topic.

This information goes back to July 2016.

On the left-hand side of each page you’ll find a welcome page, links to the what’s new pages, release notes, and an alphabetical listing of the commands that can be found in the documentation. Be sure to check it out.

Much has Changed, but Not Everything

Edit: Time keeps marching on

Originally posted July 18, 2017 on AIXchange

This blog came to life 10 years ago this week, on July 16, 2007. When I started writing AIXchange, my sons were 8 and 4. Now my oldest is a high school graduate and my youngest is a freshman. Time marches on.

When I started, I was writing about POWER5 and AIX 5.3. HMC Version 7, Live Partition Mobility, WPARs and virtual optical devices were all new solutions/technologies. Within months, I’d turned my attention to AIX 6.1, and then to POWER6.

Now we’re running AIX 7.2 and we await the arrival of POWER9. More of us are running Linux workloads on Power hardware. There is more talk about cloud, cognitive, AI, Blockchain, PowerVC, Live Kernel Patching, flash cache and flash storage.

Some things change. Some don’t.

IBM i is still going strong despite the naysayers talking about the end of legacy hardware and legacy applications. And AIX? Sure, some workloads are migrating away from AIX, but AIX on Power is still an engine that runs core businesses. On-premises solutions remain relevant in today’s world.

As I wrote here, I still love AIX as an enterprise operating system. Even as I do more with Linux, I appreciate the simplicity and goodness of AIX.

I still love attending conferences and meeting readers. I love engaging with people on Twitter and finding links to new information and technology.

I am sure that part of the reason I have been named an IBM Champion is due to this blog.

There is always something to learn, and hopefully I’ll continue to be able to share information for at least another 10 years. So maybe I’ll get to talk about POWER11 or POWER12, or AIX 10. I don’t know what will happen, but I look forward to seeing what the future holds for me and this industry.

Thanks for reading.

Booting an LPAR from a USB Port

Edit: Some links no longer work

Originally posted July 11, 2017 on AIXchange

Have you booted LPARs from your USB ports? It was much easier than I thought it would be.

I had been a little worried after reading the intro to this article:

Note from the editor: There is limited USB support in AIX and only specific devices are supported. Other devices might work, but IBM does not support their usage. JFS2 file systems are not officially supported on USB devices, but you can try this at your own discretion.

I guess I should start from the beginning: Recently I was talking with someone who was having networking issues that prevented him from using his NIM server. He wanted to know if he could use a USB flash drive to install his LPAR instead. While this has been supported for quite a while (see here and here), I hadn’t taken the time to mess with it.

From the explanations in these posts (here and here), it seemed easy enough, though. We had a test machine, so we gave it a try.

First, we used dynamic LPAR (DLPAR) to get the USB controller attached to this test LPAR. On this system the adapter came up as Universal Serial Bus UHC Spec. After we attached it and ran cfgmgr, we verified that the device was there (usbms0 is what we were interested in).

    # lsdev | grep -i usb
    usb0       Available       USB System Software
    usbhc0     Available 00-08 USB Host Controller (33103500)
    usbhc1     Available 00-09 USB Host Controller (33103500)
    usbhc2     Available 00-0a USB Enhanced Host Controller (3310e000)
    usbms0     Available 2.3   USB Mass Storage

Then we checked for a virtual CD or an .iso image that was mapped to this LPAR. No dice on either. So I decided to copy a physical DVD to the virtual media library and present that to the client LPAR:

    # lsdev | grep cd

Nothing came back, so in the VIO server I ran:

    mkvdev -fbo -vadapter vhost4

This set up the virtual optical device that was connected to the LPAR.

Then, with the physical CD loaded in the drive, I ran:

    mkvopt -name aix7disk1.iso -dev cd0 -ro

This created the .iso image in the /var/vio/VMLibrary filesystem.
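
A quick sanity check at this point: the lsrep command lists the contents of the VIO server’s media repository, so you can confirm the new image actually landed there before presenting it to the client:

    lsrep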

After it finished copying from the physical CD, I was able to load it in the virtual CD using:

    loadopt -disk aix7disk1.iso -vtd vtopt6

(Note: vtopt6 was created earlier when I ran the mkvdev -fbo command.)

I was able to verify it was there by running:

    lsmap -vadapter vhost4

Once the .iso image was mounted in the virtual optical device, I was able to log into the client LPAR and run cfgmgr. That made the cd0 device appear. It was linked to the .iso image in the virtual optical device by virtue of the loadopt command we ran earlier.

    # cfgmgr
    # lsdev | grep cd
    cd0        Available      Virtual SCSI Optical Served by VIO Server

Now that the LPAR had the source DVD (the AIX 1 DVD loaded into /dev/cd0) and the USB device (/dev/usbms0), I was ready to run the dd command:

    # dd if=/dev/cd0 of=/dev/usbms0 bs=4096k
    1010+1 records in.
    1010+1 records out.

At this point, we were able to reboot the LPAR and go into SMS and get it to boot from USB. Booting took a bit longer than it would from a virtual optical device, but it still happened quickly enough.

This is a handy procedure if you need to load a VIO server onto a bare-metal machine, for example. It’s especially valuable to know if you either don’t have an optical device or you’re using a split backplane and your optical device is connected to the other VIO server.

So how many of you have done this? What else are you doing with your USB drives?

Info on VIO Commands

Edit: Some links no longer work.

Originally posted June 27, 2017 on AIXchange

I want to highlight a few VIO commands, and point you to where you can find even more commands.

For example, have you heard of the VIO rules command?

Purpose
Manages and deploys device setting rules on the Virtual I/O Server (VIOS).

Syntax
rules -o operation [ -l deviceInstanceName | -t class/subclass/type ] [ -a Attribute=Value ] [-d] [-n] [-s] [ -f RulesFile ] [-F] [-h]

Description
The rules command is used to capture, deploy, change, compare, and view VIOS rules. It leverages AIX Run Time Expert Solution (ARTEX) technology. VIOS provides predefined default rules that contain the critical rules for VIOS device configuration that are recommended for VIOS best practice. You can use the rules command to manage device settings rules on VIOS.

You can capture them, deploy them, import them, list them, compare them, modify, delete, and add them.
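
As a quick taste, here’s roughly what that looks like from the padmin shell. The flag combinations below are my reading of the Knowledge Center examples, so double-check them there: list shows the rules currently on the system, diff -s -d compares the running system against the factory defaults, and deploy -d deploys the default rules (they take effect at the next reboot).

    rules -o list
    rules -o diff -s -d
    rules -o deploy -d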

The IBM Knowledge Center has quite a few examples on using the rules command. It also covers the rulescfgset command:

Purpose
Helps to simplify the rules deploy management process.

Syntax
rulescfgset

Description
The rulescfgset command is an interactive tool to guide a user deploying current rules, upon user direction. It identifies if current system settings match the factory default rules. If any mismatch is found, current rules are merged and updated with the recommended default setting rules automatically. When you allow new rules to be applied, the updated current rules are deployed on the system. The new rules do not take effect until the system reboots. If you do not want to deploy immediately, it returns normally. The rulescfgset command updates current rules, as needed and makes the Virtual I/O Server (VIOS) ready at any time to deploy new rules.

Lastly, here is the whole list of VIO and IVM commands listed alphabetically. This page tells you what’s new with the VIO and IVM commands. I’m willing to bet you’ll find something here that you did not know existed.

I encourage you to check the Knowledge Center periodically for changes and updates to this information.

PowerHA Now Includes HTML Reporting Capability

Edit: Some links no longer work.

Originally posted June 20, 2017 on AIXchange

Here’s an informative write-up about the native HTML report with PowerHA:

IBM PowerHA 7.1.3 has a very nice feature: the native HTML report.

We can get this report via the clmgr command, and no external requirements are needed; simply having the base software installed is enough.

This HTML report contains very useful information from the cluster:

  • General Information about the cluster.
  • Nodes configuration.
  • Resource Groups and Application Controllers.
  • Network Configuration (IP Labels, Interfaces…).
  • Shared LVM components.

We can use the HTML report as a great summary of our IBM PowerHA cluster!

To create the HTML report with clmgr command:

# clmgr view report cluster TYPE=html FILE=/tmp/powerha.report

This page also includes a sample report. For details about that, check out pages 63-64 of this Shawn Bodily presentation:

Native HTML cluster report is now available via clmgr:

  • Alternative to the IBM Systems Director reporting feature.
  • No external requirements. Available in the base product.

Benefits include:

  • Contains more cluster configuration information than any other report.
  • Can be scheduled to run automatically via AIX core abilities (e.g. cron; see the sketch after this list).
  • Portable. Can be emailed without loss of information.
  • Fully translated.
  • Allows for inclusion of a company name or logo into the report header.

Limitations:

  • Per-node operation. No centralized management.
  • Relatively modern browser required for tab effect
  • Only officially supported on Internet Explorer and Firefox


On another note, if you haven’t seen the new user interface, watch this short PowerHA UI video:
https://www.youtube.com/watch?v=d_QVvh2dcCM

As the product continues to evolve I’ll continue to cite interesting new features. Let me know if there’s anything you think I should highlight.

Service and Productivity Tools for LoP Users

Edit: Some links no longer work.

Originally posted June 13, 2017 on AIXchange

If you’re running Linux on Power, are you running these service and productivity tools?

While Linux lacks the diagnostics and reporting capabilities that are built into AIX, these tools help bridge that gap. There are tools to help you with hardware inventory. There’s information about Linux platform diagnostics. Although it’s not quite the same as it is on AIX, you will find the explain_syslog command and the diag_encl utility, for example.

There’s an inventory scout, which surveys the system for vital product data. There’s a servicelog and related utilities to manage events that require service. There are power and environmental management features, service aids, performance management, raid adapter utilities and the IBM Electronic Service Agent tool.

Finally, there’s information about service aids, which cover things like lparstat, lsslot, bootlist, etc.

If you’re running Linux on Power hardware, it just makes sense to run these packages. Go here for help with installation.

The idea that I can run lscfg, lsmcode, lsvpd and lsvio on a Linux partition warms my heart. Linux administrators need to be able to do these things that AIX admins have taken for granted for years.
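
For a taste, once the packages are installed, commands like these work much as an AIX admin would expect. Treat this as a sketch; exact output and package names vary by distribution and package level:

    lscfg | head
    lsvpd | head
    lsmcode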

A Techie’s Guide to Recreational Reading

Edit: Some links no longer work.

Originally posted June 6, 2017 on AIXchange

What sort of person actually enjoys reading through random IBM support documentation? A lot of us, I imagine. I know I do. I find reading docs in my spare time helps me when I’m actually dealing with a problem. I’ll remember reading something, and even if I don’t recall where I saw it, I can usually locate it with minimal digging.

I frequently browse IBM Techdocs. I’ll search on “aix” or “hmc” for example, and change the any time field to something fairly current. You never know what might pop up.

For example, I recently read this document about initiating a resource dump from the HMC. It wasn’t anything I needed to do at that moment, but if I ever do run into this issue, I will know where to look and I know I’ll feel comfortable with the procedure as it won’t be coming from out of the blue. To me it feels like refreshing my memory as opposed to learning a new concept.

The document offers some introductory information, and then goes through the various steps you would take:

The Resource Dump function can be used to generate Power Hypervisor Resource Dumps, Partition Firmware Dumps, SR-IOV Dumps, and Power Hypervisor Macro Resource Dumps….

A non-disruptive Hypervisor Resource Dump, initiated using selector “system” or the default blank selector, generates a SYSDUMP dump file. This can take an extended amount of time to complete. The dump status will show ‘In Progress’ until it completes. If IBM Support did not specify a selector, then this is the desired dump type.

Toward the end you can see how to do the same thing from the HMC command line:

Alternatively, these dump types may be initiated from the HMC command line, using:
startdump -m {managed server} -t resource -r "{resource selector}"

Example 1: startdump -m 9119-MME*21ABCDE -t resource -r "system"
would generate a non-disruptive Hypervisor Resource Dump, SYSDUMP type, for server 9119-MME*21ABCDE.

Finally, there’s an option to send the dump into IBM for analysis. Read this for details:

This document describes how to retrieve and send an existing server dump to IBM support using HMC Version 7 and HMC Version 8 (classic or enhanced) user interface. Dump types include any server dump such as FSP dumps (FSPDUMP), system or platform dumps (SYSDUMP), power dumps (PWRDUMP), resource dumps (RSCDUMP), and logical partition adjunct dumps (LPADUMP).

Be sure to read the whole thing. There’s additional information that I haven’t presented here.

Perusing documents from time to time is a simple way to expand your knowledge base. And I do enjoy it, because I never know what will turn up, or when I might need it.

IBM Spectrum Protect Live Demo

Edit: I still love live demos that I can play with vs Youtube videos

Originally posted May 30, 2017 on AIXchange

Recently I was shown a link to the IBM Spectrum Protect Live Demo, and I thought I’d tell you about my experience with it.

The statement on the landing page sums it up well:

Anyone can be a data protection expert by using the Operations Center!

Even with thousands of systems, virtual machines, and applications over multiple sites, you can quickly verify that your data is protected and identify any trouble spots.

You can follow the task scenarios that are provided or explore the sample environment on your own.

Not all product capabilities are included. If you explore beyond the provided scenarios, some tasks might not complete or might be disabled.

Note that the login information (administrator name and password) is provided on the landing page. You should be able to access the site with a click.

On the right side of your screen, if you click on the icon next to the Guided Demo heading, you’ll learn about the demo itself:

This demonstration features a live IBM Spectrum Protect environment (formerly the IBM Tivoli Storage Manager product family). The demo includes sample data for three backup servers running in a virtual environment.

For best results, use one of the following web browsers on a Windows system to view the demo:
    Microsoft Internet Explorer 10 or 11
    Mozilla Firefox ESR 24 or later

Ensure that your screen resolution is set to a minimum of 1680 x 1050 pixels.

Tip: The layout of the Operations Center automatically adjusts to fit the available space. If the demo instructions and images don’t match what you’re seeing, zoom out as needed (Ctrl + - in most browsers).

You can choose from basic tasks that provide details about what you should expect from the demo. You’ll see how to use the dashboard, how to add a client, how to manage your workflow, and how to customize views.

Of course this information is helpful, but as someone who learns by doing, I like to jump in and click around so I can see how intuitive the interface is. I immediately noticed a listing of the number of clients, applications, virtual machines, systems, services alerts and activity along the left side of the screen. Information about servers, storage and data availability was on the right side.

There are multiple areas you can explore, and different scenarios to try. With the enhanced HMC, along with interfaces to the XIV and Storwize systems, IBM has made substantial efforts to help administrators manage their hardware more easily.

I like what I see, and I’d love to see other demo versions of software so we can get used to the look and feel of other products.

CoD Remains an Under-Utilized Option

Edit: Still a powerful tool for your toolbox. Some links no longer work.

Originally posted May 23, 2017 on AIXchange

Although Capacity on Demand has been around for years, I still encounter customers who are unaware of this option. So here’s a primer/reminder:

Certainly if you run enterprise servers, you should know about CoD. The idea behind it is that you know you will need more memory or cores in the future, but you don’t know precisely when. Or maybe it’s only needed temporarily. (Think of any seasonal business.) Rather than add new servers or hardware, which can require planning and possible downtime, IBM will ship these enterprise class systems that have the hardware physically installed, but it’s activated and paid for only when you are ready to use it. You can choose to be charged only for the resources you consume, or you can simply have the hardware activated permanently.

Here’s more about the CoD offering:

Capacity on Demand allows you to easily activate processors and memory without disruption to your operations, paying for the increased capacity as your needs grow. The following programs are available:

Capacity Upgrade on Demand (static)
Capacity Upgrade on Demand provides the ability to bring new capacity on-line quickly and easily. Processors and memory can be activated dynamically without interrupting system or partition operations. Processors can be activated in increments of 1 processor, while memory can be activated in increments of 1 GB. As your workload demands require more processing power, you can activate inactive processors or memory simply by placing an order for an activation feature. You can retrieve, over the Internet, an electronically encrypted activation code that unlocks the desired amount of capacity. There is no hardware to ship and install, and no additional contract is required.

Elastic Capacity on Demand (temporary)
Elastic CoD (formerly known as On/Off Capacity on Demand) provides short-term processor and memory activation capability for fluctuating peak processing requirements such as seasonal activity, period-end or special promotions. When you order an Elastic CoD feature, you receive an enablement code that allows a system operator to make requests for additional processor and memory capacity in increments of 1 processor day or 1 GB memory day. The system monitors the amount and duration of the activations. Both prepay and post-pay options are available.

Utility Capacity on Demand
Utility CoD provides automated use of on demand processors from the shared processor pool for short-term workloads on IBM POWER6, POWER7 and POWER8 processor-based systems. Utility CoD is for customers with unpredictable, short workload spikes who need an automated and affordable way to help assure adequate server performance is available as needed. Usage is measured in processor minute increments.

Trial Capacity on Demand
Trial Capacity on Demand provides the flexibility to evaluate how additional resources will affect system workloads. A standard request is easily made for a set number of processor core activations and/or a set amount of memory activations. The standard requests can be made after system installation and again after each purchase of permanent processor activation. POWER5 and POWER6 servers except the POWER6 595 can activate up to 2 processor cores and/or up to 4 GB of memory. The POWER6 595, Power 770, 780, 795, E870 and E880 can activate up to 8 processor cores and/or up to 64 GB of memory.

An exception request can be made one time over the life of the machine, and enables all available processor cores or memory.

Both standard and exception requests are available at no additional charge….

Trial Active Memory Expansion
Active Memory Expansion can allow a POWER7 and POWER8 server to expand beyond the physical memory limits of the server for an AIX 6.1 partition. Thus a previously memory-constrained server can do more work. The degree of expansion depends on how compressible the partition’s data is and on having additional CPU resource for the compression/decompression. A one-time, no-charge 60-day trial allows the specific expansion and CPU usage to be evaluated.

Power Enterprise Pools
Power Enterprise Pools establishes a new level of flexibility and value for systems that operate together as a pool of resources. Mobile activations are available for use on Power 770, 780, 795, E870 and E880 systems and can be assigned to any system in a predefined pool by the user with simple HMC commands. IBM does not need to be notified when these resources are reassigned within a pool. The simplicity of operations provides new flexibility when managing large workloads in a pool of systems. This new feature is especially appealing to aid in providing continuous application availability during maintenance windows. Not only can workloads easily move to alternate systems but now the activations can move as well.

The process of activating resources is explained here. Basically, you get the appropriate code from IBM and enter it into the HMC. Power Enterprise Pools require a little more work, as there are actual legal documents to sign, etc., but it’s a great option for moving activations among a pool of multiple physical servers.

Finally, here’s a list of planning guides, user guides, presentations and other technical information.

How to Seize Information About Your SEAs

Edit: Still a useful technique, some links no longer work.

Originally posted May 16, 2017 on AIXchange

Shared Ethernet adapters have matured as a technology. A few years ago when SEAs were new and a little more esoteric, they were occasionally misconfigured, leading to network issues. Now that we have more experience with them, I don’t hear about many problems with SEAs these days.

As is mentioned in the Power Implementation Quality Standard, you may find that you’re more interested in SEA load sharing because it allows you to utilize all of your 10G interfaces and switch ports. Of course a lot of folks have eschewed explicitly setting control channels and are just using the default control channel. As I said, SEAs are easier to use now.

Nonetheless, the question still comes up with both new and legacy systems running SEAs in their VIO servers about which interface is the primary and which is the backup at any given time.

Recently, a client wanted to know the status of their interfaces. This has always been my go-to command to see which SEA is active:

    netstat -cdlistats | grep "State:" | grep -v "Operation State" | grep -v "Stream State"

In this case though, it wasn’t sufficient. My client has multiple SEAs and multiple physical and virtual interfaces, but the output from this command only lists the status of the interfaces; there’s no way to tell which SEA is which:

          padmin $ netstat -cdlistats | grep "State:" | grep -v "Operation State" | grep -v "Stream State"
LAN State: Operational
    State: PRIMARY
LAN State: Operational
LAN State: Operational
    State: BACKUP
LAN State: Operational
LAN State: Operational
    State: PRIMARY
LAN State: Operational
LAN State: Operational

The following command will tell you about all your virtual interfaces, including those that are part of an SEA and whether they’re available. You can also find out individual adapter IDs and location codes of backing devices:

    lsmap -all -net

To get the names of the adapters that are SEAs, simply add this flag:

    lsmap -all -net -field sea

You’ll see this output:

    SEA        ent1

    SEA        ent2

    SEA        ent3

By the way, if you’re looking for information about other fields that might be of interest to you, check out this document. It also explains how to change the delimiter.

Ultimately though, my client had a specific issue to address. I took the output from the lsmap -all -net command and created a for loop. Using the awk command, I isolated the entX value that corresponded to the SEAs on the system.

This was the loop I came up with, along with the output that I saw:

for i in `lsmap -all -net -field sea | awk '{print $2}'`
do
echo $i ; entstat -all $i | grep State
done

padmin$ for i in `lsmap -all -net -field sea | awk '{print $2}'`
> do
> echo $i ; entstat -all $i | grep State
> done
ent1
    State: PRIMARY
LAN State: Operational
LAN State: Operational

ent2
    State: BACKUP
LAN State: Operational
LAN State: Operational

ent3
    State: PRIMARY
LAN State: Operational
LAN State: Operational

Obviously you can get a lot more information with entstat, but this is what I needed.
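
One more variation worth knowing: the SEA stanza in the entstat output also carries a Priority field, so you can pull both fields at once and confirm that the active SEA is also the priority-1 adapter. Field names can vary a bit by VIOS level, so treat this as a sketch:

    entstat -all ent1 | grep -E "State|Priority"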

How do you determine which VIO server is primary and which is the backup in your environment?

About Processor Modes

Edit: Some links no longer work.

Originally posted May 9, 2017 on AIXchange

During the build-out of a new POWER8 server, we were loading our VIO servers. We were using the classic HMC interface, and a user clicked on the profile after VIOS was running. He checked the hardware tab, and in that tab happened to see the processor compatibility mode.

What would you expect to see there?

Naturally, you’d expect to see POWER8 mode. But instead, POWER7 was the compatibility mode being displayed.

An AIX LPAR did show POWER8 mode. So why wasn’t POWER8 appearing in the VIO server LPAR? For that matter, why are there differences in processor modes anyway?

Here’s a good overview of what I’m talking about. But the short version is this: There are POWER6, POWER7 and POWER8 processor modes. One advantage of these modes is that they can be used to enable live partition mobility operations between different server families. That helps when migrating from POWER6 or POWER7 to POWER8 servers, because it allows you to take an outage to change the processor mode at your convenience. In effect, you can migrate without any downtime.

There’s also a default processor mode. From the same link:

The default processor compatibility mode is a preferred processor compatibility mode that enables the hypervisor to determine the current mode for the logical partition. When the preferred mode is set to default, the hypervisor sets the current mode to the most fully featured mode supported by the operating environment. In most cases, this is the processor type of the server on which the logical partition is activated. For example, assume that the preferred mode is set to default and the logical partition is running on a POWER8 processor-based server. Because the operating environment supports the POWER8 processor capabilities, the hypervisor sets the current processor compatibility mode to POWER8.

But back to the original issue: Why was VIOS coming up in POWER7 mode? VIO server is based on AIX 6.1. The hypervisor has determined that POWER7 compatibility mode is the best mode in which to run. This is also confirmed in this article:

Once the VIO server was running, I went through all the normal checks. ioslevel shows as 2.2.3.3 and oslevel -s shows the operating system at 6100-09-03-1415. This means the VIO will be running in SMT4 mode since SMT8 requires 7.1 tl03 sp3.

So it’s nothing to worry about; it’s just another thing to be aware of as you move to POWER8.
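
Incidentally, if you’d rather check from inside the partition than click through HMC profiles, prtconf reports the mode. The field name below is how I’ve seen it on recent AIX levels; verify on yours:

    # prtconf | grep -i "Processor Implementation Mode"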

There’s Still Something About the Good Old Days of Tech

Edit: The more things change..

Originally posted May 2, 2017 on AIXchange

I like to check my Twitter analytics to get a feel for the kinds of topics that my followers find interesting. You can learn the number of impressions and engagements for each individual tweet, as well as your overall engagement rate.

I don’t have a huge Twitter following, so my numbers are usually pretty modest. That’s why I was so surprised at the response to a recent tweet.

On March 12, my notifications lit up over this. I simply said, “Retweet if you’ve ever typed AT commands like ATDT to control a modem.”

To my astonishment, that one tweet has received more than 9,000 impressions so far, and as of this writing it still gets the occasional retweet. For the sake of comparison, most of my tweets generate only a few hundred impressions.

I guess it goes to show that there’s something about the “good ol’ days” of tech that seems to capture so many imaginations. For me, when I saw that initial tweet, I instantly flashed back to the sound of a modem connecting.

That sound meant that soon you’d be back online. Of course getting and staying online wasn’t easy in those days. I had a 386 ZEOS laptop and an external modem, and I can remember needing to be sure to bring either my long-distance calling card or some local POP phone numbers so I could check email or send files. To the world at large, computers weren’t ubiquitous then, and wifi and data on our phones didn’t exist. “Long-distance calling” could mean talking to someone a half hour away, and it cost real money. Yet here we were, able to send emails and chat with people from anywhere. It seemed so awe-inspiring back then.

As much as I enjoy being able to connect to wifi from an airplane to check email or watch cat videos while hurtling through the sky at 30,000 feet, there’s something almost romantic about all those hoops we jumped through to get access to our text-based Internet. At least for those of us who’ve been there through the evolution from BBSs to now, there’s a simple perfection to text-based computing. It may be why we older computer nerds have less of an issue with things like the Korn shell and 'set -o vi' and vi editing. We were using AT commands before we ever heard the sounds that made it happen. We grew up with these kinds of systems. We didn’t grow up in the graphical, “GUI-fied” world that came after.

This was the soundtrack of life in the 1990s. Before the Eternal September. Before any of your non-techie friends even knew what you were talking about when you mentioned gopher or usenet or irc or email, let alone the World Wide Web.

Running YUM on AIX

Edit: This is still the best way to load rpm packages. Some links no longer work.

Originally posted April 25, 2017 on AIXchange

When it comes to getting open source packages onto your AIX system, there are lots of options. And while package dependencies are always a problem, Michael Perzl’s solution is still sound.

But now there’s another way to handle rpms and dependencies on your AIX machine: YUM. YUM was covered in detail in the December AIX Virtual User Group meeting. I encourage you to listen to the replay and learn more.

Here are some notes from the presentation.

From slide 3:

YUM (Yellowdog Updater, Modified) is an open source command line package management utility for RPM packages.
• YUM is a tool for installing, removing, querying, and managing RPM packages.
• YUM automatically determines dependencies for the packages being updated or installed, and thus fetches and installs the dependent packages along with the requested ones.
• YUM works with an existing software repository that contains RPM packages. The YUM repository can be local or over the network.

From slide 4:

YUM on AIX – Before and After
• Before YUM
o Difficult to navigate package dependencies.
o Often must manually determine dependencies, one by one.
o Manually check from toolbox/repository if a package or a new version of package is available.
• After YUM
o YUM automatically consults a dependency database and downloads the dependent packages you need.
o YUM can list all the packages available on the repository.
o Using 'yum check-update' one can find out whether new versions of installed packages are available, then use 'yum update <package>' to update them.

Note: YUM doesn’t recognize installp dependencies. For example, OpenSSL libraries are provided by AIX in installp format, so if an RPM depends on OpenSSL, make sure to keep the OpenSSL image current.

Slide 6 points to this easy-to-follow document on installing YUM on AIX. Follow along as I share my own experience with getting YUM working.

To get started, I went here and grabbed the rpm.file, which I copied to my machine.

Then I went here for the yum_bundle_v1.tar file, and copied that over.

I installed rpm.rte using smitty. There were no issues. Then I installed the contents of yum_bundle_v1.tar by first untarring the bundle and then running rpm against the extracted packages to complete the installation.

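Something like this, assuming the bundle unpacks its rpm files into the current directory (a sketch of the standard procedure rather than a transcript of my session):

    # tar -xvf yum_bundle_v1.tar
    # rpm -Uvh *.rpm
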
At this point I had these rpm files loaded on my systems:

# rpm -qa
expect-5.42.1-3.ppc
tk-8.4.7-3.ppc
readline-6.1-2.ppc
gettext-0.10.40-8.ppc
yum-metadata-parser-1.1.4-1.ppc
db-4.8.24-3.ppc
pysqlite-1.1.7-1.ppc
curl-7.44.0-1.ppc
python-urlgrabber-3.10.1-1.noarch
python-devel-2.7.10-1.ppc
tcl-8.4.7-3.ppc
AIX-rpm-7.2.1.0-2.ppc
sqlite-3.7.15.2-2.ppc
glib2-2.14.6-2.ppc
gdbm-1.8.3-5.ppc
python-2.7.10-1.ppc
python-iniparse-0.4-1.noarch
python-pycurl-7.19.3-1.ppc
yum-3.4.3-3.noarch
python-tools-2.7.10-1.ppc

I went ahead and edited my yum.conf as stated in the instructions.
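
For reference, a minimal yum.conf along these lines is what the instructions describe. This is a sketch from memory; the Toolbox repository URL in particular is an assumption you should verify against the install document:

[main]
cachedir=/var/cache/yum
keepcache=1
debuglevel=2
logfile=/var/log/yum.log
exactarch=1
obsoletes=1

# AIX Toolbox repository. The baseurl is my recollection of the Toolbox
# location; verify it against IBM's install document before relying on it.
[AIX_Toolbox]
name=AIX generic repository
baseurl=http://public.dhe.ibm.com/aix/freeSoftware/aixtoolbox/RPMS/ppc/
enabled=1
gpgcheck=0

With that in place, I just ran “yum install wget” and watched YUM pull in the package along with its dependencies.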

I was able to run yum update and it went ahead and updated my rpm packages as well.

Were you aware of this capability? Have you had any issues with it?

A Different Way to Look at Systems

Edit: Some links no longer work

Originally posted April 18, 2017 on AIXchange

Sure, I enjoy reading about new systems. I’ve also done my share of writing about them. But for me, there’s nothing better than being able to actually visualize hardware. Of course speeds and feeds are helpful, but I want to see where the different cables plug in, or how a server or expansion drawer fits in the rack. It’s just easier to understand how things go together when you can give it the eye test.

The IBM Interactive Product Tour Catalog is a helpful tool for visualizing servers, storage and related solutions. Viewing options are divided into four main categories: Systems and Servers, System Storage, Storage Networking, and Solutions.

For example, from Systems and Storage, you can drill down into IBM LinuxONE, IBM Power Systems, IBM System z, or the IBM zEnterprise BC12. Drill down into Power Systems, and you’ll find Enterprise, Scale Out, Scale Out/Linux, Converged Infrastructure, and I/O Drawer. There are many different models to choose from, including POWER7 and POWER8 options.

Once you choose the product that interests you, you can access product animations that offer views from the front, rear, and even with the cover removed. Here’s the front view of an S822. By hovering your mouse over different parts of the system, you can see where the disks are located, where the SSDs would go, and where the operating panel, USB port and DVD slot are. And again, you can view it without the cover, which allows you to see the location of the fans, processor, memory, power supply and SAS cards.

The Overview option provides server specifications. Here’s the description of the S822:

Power S822 is a 2-socket 2U system which can be ordered with the flexibility of either one or two processor sockets populated, providing growth capacity for customers who need it. It provides the benefits of greater performance per core as well as per socket with POWER8 processors, new I/O capabilities, higher internal storage and PCIe capacities and performance, the capability to support CAPI accelerator devices, and greater RAS including hot-plug PCIe capability.

As I noted, you can also view enterprise models. For instance with the E880, you can view the system control unit, the Power I/O drawer, or the E880 CEC. There’s an overview and lists of highlights and specifications for this system as well. And don’t forget to check out the storage subsystems and related solutions.

This is a nice way to get more familiar with the form factors and how the systems are actually laid out. Basically, it’s the next best thing to actually being in the same room as the hardware.

The Danger of Defaults

Edit: Some links no longer work

Originally posted April 11, 2017 on AIXchange

A friend who was in the midst of a migration project recently asked me what I knew about TCPTR. Short answer: not much. So I went searching and found this definition:

Configures or displays TCP Traffic Regulation (TR) policy information to control the maximum incoming socket connections for ports.

That led me to this more detailed explanation:

TCP network services and subsystems running on AIX automatically and transparently take advantage of this powerful DoS mitigation technology using simple administrative tuning. This new feature provides a simplified approach to increased network security by leveraging centralized management and firewall-based customization. In addition to providing effective service-level and system-level TCP DoS mitigation, IBM AIX TCP Traffic Regulation provides system-wide TCP connection resource diversity across source Internet protocol addresses initiating connections.

That jarred my memory, so I went back to this article:

Over the weekend, a client implemented security hardening on their production LPARs. They used AIX 6.1 Security Expert. Apart from some users who had been locked out due to weak passwords, testing went well … until about 9am Monday, when some users reported they couldn’t log in.

I forwarded all that to my friend, but in the meantime, he’d figured out his issue. The details are pretty interesting:

The TCPTR functionality in AIX regulates the number of connections on certain ports. If you run AIXPert and choose the high settings, it enables this functionality.

This particular application that hits the database on our server generates a lot of connections, more than TCPTR allows by default. So it was dropping connections. It doesn’t log this; in fact, you can’t even enable logging of it (I asked IBM).

We turned it off and our problem went away.

Here is a basic rundown of what happened:

  • We were working on a migration project from Oracle 9 on Solaris 9 to Oracle 11 on AIX 7.1.
  • We had done preliminary migrations and testing with a small number of users.
  • On the weekend of the cutover, things were looking good. Database exports/imports went fine.
  • On Monday morning things were still looking good. No complaints from the users.
  • About mid-morning, we started getting reports of some users experiencing slowness and/or disconnects.
  • We began troubleshooting. We found errors in the Oracle logs like “TNS:packet writer failure” and “TNS:lost contact.”
  • This led us to believe that we were dealing with an Oracle issue.
  • We spent a good part of the day reviewing and changing Oracle settings including TNS name resolution settings, etc.
  • Later in the day, after doing some web searches, one of the guys stumbled across this article.
  • We checked our systems, and sure enough, tcptr was enabled.
  • We disabled tcptr, and the issue cleared.
  • Upon some further investigation of my notes from six years ago when we first rolled out these new LPARs, it looks like we decided to use the AIXPert tool to enable some hardening of the AIX systems.
  • We must have used the AIXPert “high” setting, which enables TCPTR.
  • We have been running all this time without any issue, because the number of connections to our systems never exceeded the restrictions that tcptr puts in place by default.
  • For this new database we migrated, a large number of client connections are made, which exceeded the default settings for TCPTR.

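If you want to check your own systems for this, here’s a hedged sketch. The tunable and command names below are my understanding of the feature, so verify them against your AIX documentation: the no tunable shows whether TCP Traffic Regulation is on, tcptr -show lists the policies in effect, and setting the tunable to 0 turns the feature off persistently.

    # no -o tcptr_enable
    # tcptr -show
    # no -p -o tcptr_enable=0
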
I see this story as a cautionary tale. We’re taking chances when we accept a tool’s defaults without fully understanding what is being changed under the covers. But what can we really do about this? I always argue for using test and development LPARs and doing testing whenever possible, but this environment had been running with these rules in production for years without an impact until the usage scenario changed and more connections were coming in.

Obviously this isn’t an isolated issue, as at least two customers have run into it that we know about. Now I throw the question out to my readers. Have you experienced this or at least heard about it? Moreover, what should we be doing to protect ourselves and our environments?

More rPerf Resources

Edit: Some links no longer work

Originally posted April 4, 2017 on AIXchange

Earlier this year I pointed you to a method for finding relative performance (rPerf) numbers for your LPAR.

Sometimes you may want to compare the rPerf of different IBM Power Systems, some of which you may not even have access to. If you’re replacing a POWER6 system with POWER8, you’ll need some way to compare these systems so, for instance, you’ll have a better understanding of the number of cores you’ll need to activate for new or existing workloads. As long as you know the model number, the total number of cores on the system and the CPU speed, you can obtain some valuable information.
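For example, if the published rPerf for a 32-core 8233-E8B at 3.55 GHz is 331.06, a crude linear estimate for an 8-core LPAR on that machine is 331.06 × 8/32 ≈ 82.77. Treat that as a floor rather than an official number: dividing linearly straightens out the SMP scaling curve, so the estimate tends to come in lower than actual.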

IBM, which publishes the rPerf values, describes it this way:

rPerf estimates are calculated based on systems with the latest levels of AIX and other pertinent software at the time of system announcement. Actual performance will vary based on application and configuration details. The IBM eServer pSeries 640 is the baseline reference system and has a value of 1.0. Although rPerf may be used to compare estimated IBM UNIX commercial processing performance, actual system performance may vary and is dependent upon many factors including system hardware configuration and software design and configuration. Note that the rPerf methodology used for the POWER6 processor-based systems is identical to that used for the POWER5 processor-based systems. Variations in incremental system performance may be observed in commercial workloads due to changes in the underlying system architecture.

You can always find the rPerf numbers by downloading the latest facts and features guides. Then simply pull those numbers to compare machines. You can also get the IBM Power Systems Performance Report, which includes other performance values like SPECint and SPECfp. Even better, this handy spreadsheet has everything in one place.

The spreadsheet includes a regularly updated change log (the latest update is November, and it goes all the way back to 2007) so you can see if the system that you’re interested in has been added yet:

Data Sources:
http://www-03.ibm.com/systems/power/hardware/reports/system_perf.html
http://www-03.ibm.com/systems/i/advantages/perfmgmt/resource.html

Do you use rPerf numbers as you plan for upgrades? Were you aware of these options for getting these values?

HA and DR Overview

Edit: Some links no longer work.

Originally posted March 28, 2017 on AIXchange

What are the different high availability (HA) and disaster recovery (DR) solutions available for Power Systems? What are the pros and cons of these different solutions?

This comparison document created by Carl Burnett, Joe Cropper, and Ravi Shankar helps answer these questions:

“There are many elements to consider as you plan high availability and disaster recovery solutions for your Power Systems environment. In this article, we will explore some of these solutions and discuss some considerations and best practices when using technologies like PowerHA, Geographically Dispersed Resiliency and PowerVC within the data center.

Clustering based HA or DR solutions rely on redundant standby nodes in the cluster to be able to take over the workload and start them when the primary node fails. Each node in the cluster will monitor health of various elements such as network interfaces, storage, partner nodes, etc. to act when any of these elements fail. Clustering technologies were the closest to fault tolerant environments in regards to HA or DR support based on completely redundant software and hardware components. Cluster solutions are often operating system or platform specific and they do provide detailed error monitoring, though they require considerable effort to deploy and maintain….

It is expected that Power deployments use both models of HA & DR as needed. Cluster based HA-DR solutions are best for protecting critical workloads. For example, SAP environments are distributed and cluster-based HA would be the best method to monitor and act for various components of the stack. For other workloads, a VM restart-based model might be sufficient protection for HA and DR….”

That’s from section 2.0, which includes a nice graph that presents various solution types in terms of the availability they provide and the complexity of setting them up.

This comes from sections 3.0, 3.1 and 3.2:

“Cluster HA/DR solutions have existed for Power systems for a long time (PowerHA has been the leading HA/DR solution on AIX for more than 20 years). They have been enhanced recently to provide additional capabilities and user experiences.

VM restart based HA/DR solutions are new in 2016 and are described below:
1. PowerVC High Availability Features: PowerVC added new capabilities around VM restart High Availability. These capabilities enable customers to deploy cloud environments easily and enable simplified High availability.
2. Geographically Dispersed Resiliency (GDR) Disaster Recovery: IBM introduced a new offering for disaster recovery using VM restart technology and storage mirroring.

… PowerVC provides enterprise virtualization and cloud management of Power Systems and leverages OpenStack to do so. PowerVC has introduced high availability management functions over its past few releases. Listed below is a summary of those features:

· One-click system evacuation: During planned maintenance windows, this feature allows administrators to evacuate a host by leveraging live partition mobility (LPM). PowerVC orchestrates the mobility of all active VMs to other hosts in the environment (or a host of your choice), thereby allowing maintenance (e.g., firmware or VIOS updates, etc.) to be performed without disrupting workloads. While the host is in maintenance mode, PowerVC will not place any new VMs on this host either. Once maintenance is done, VMs can then be placed on the host again and normal operation can resume.

· Automated remote restart: PowerVC has supported PowerVM’s simplified remote restart feature since its inception in the POWER8 timeframe. This feature allows an administrator to rebuild a VM residing on a failed host to a healthy host (assuming the hosts have shared storage). This is a critical availability feature as it provides a mechanism to recover critical VMs in the event their hosting server fails unexpectedly (read: unplanned outage).

… Power systems now provides VM restart based DR solution for the entire data center. GDR integrates deeply with PowerVM environments (HMC, VIOS) to provide for DR restart of VMs across sites using storage replicated VM images. GDR Disaster Recovery solution is easy to deploy and manage. GDR can manage recovery of hundreds of VMs across the sites.”

The References section at the end of the document also points you to information about Geographically Dispersed Resiliency (GDR), starting with a diagram of the IBM offering. There are also links to two GDR articles (here and here).

A POWER9 Roadmap

Edit: Now we are doing POWER10 roadmaps. Some links no longer work.

Originally posted March 21, 2017 on AIXchange

I want to point you to Jeff Stuecheli’s POWER9 presentation from January’s AIX Virtual User Group meeting. This information doesn’t involve specific announcements or new models, but it provides an informative look at the capabilities of the chip itself. Download the presentation and/or watch the video.

Some highlights:

  • The slide on page 2 shows a roadmap with POWER9 appearing in the second half of 2017 and into 2018, with POWER10 appearing in the 2020 timeframe.
  • Page 3 covers different workloads that POWER9 has been designed for.
  • This is from page 4:

Optimized for Stronger Thread Performance and Efficiency
• Increased execution bandwidth efficiency for a range of workloads including commercial, cognitive and analytics
• Sophisticated instruction scheduling and branch prediction for unoptimized applications and interpretive languages
• Adaptive features for improved efficiency and performance especially in lower memory bandwidth systems

  • This is from page 5:

Re-factored Core Provides Improved Efficiency & Workload Alignment
• Enhanced pipeline efficiency with modular execution and intelligent pipeline control
• Increased pipeline utilization with symmetric data-type engines: Fixed, Float, 128b, SIMD
• Shared compute resource optimizes data-type interchange

  • From page 8: There will be two ways to attach memory. You can either attach it directly or you can use the buffered memory in the scale-up systems.
  • Page 10 shows a matrix and what you will be able to expect from the two socket vs. multi-socket systems.
  • Page 11 shows the socket performance you can expect from POWER9 vs. POWER8.
  • Page 13 covers data capacity and throughput.
  • Page 15 covers the bandwidth improvements between CECs on the large systems, and page 17 examines the different accelerators that will be incorporated.
  • This is from page 18:

Extreme Processor/Accelerator Bandwidth and Reduced Latency
• Coherent Memory and Virtual Addressing Capability for all Accelerators
• OpenPOWER Community Enablement – Robust Accelerated Compute Options
State of the Art I/O and Acceleration Attachment Signaling
– PCIe Gen 4 x 48 lanes – 192 GB/s duplex bandwidth
– 25Gb/s Common Link x 48 lanes – 300 GB/s duplex bandwidth
• Robust Accelerated Compute Options with OPEN standards
– On-Chip Acceleration – Gzip x1, 842 Compression x2, AES/SHA x2
– CAPI 2.0 – 4x bandwidth of POWER8 using PCIe Gen 4
– NVLink 2.0 – Next generation of GPU/CPU bandwidth and integration using 25G
– Open CAPI 3.0 – High bandwidth, low latency and open interface using 25G

  • This is from page 19:

Seamless CPU/Accelerator Interaction
• Coherent memory sharing
• Enhanced virtual address translation
• Data interaction with reduced SW & HW overhead

Broader Application of Heterogeneous Compute
• Designed for efficient programming models
• Accelerate complex analytic/cognitive applications

  • Page 23 covers OpenCAPI 3.0 features. This is from page 26:

Enhanced Core and Chip Architecture for Emerging Workloads
• New Core Optimized for Emerging Algorithms to Interpret and Reason
• Bandwidth, Scale, and Capacity, to Ingest and Analyze
Processor Family with Scale-Out and Scale-Up Optimized Silicon
• Enabling a Range of Platform Optimizations – from HSDC Clusters to Enterprise Class Systems
• Extreme Virtualization Capabilities for the Cloud
Premier Acceleration Platform
• Heterogeneous Compute Options to Enable New Application Paradigms
• State of the Art I/O
• Engineered to be Open

These are things that stood out to me, but obviously you’ll get more from listening to the replay.

And if that doesn’t further whet your appetite for POWER9, here are two videos from the Open Compute Project Summit: Aaron Sullivan, Rackspace distinguished engineer, gives a video tour of a system. In another video, Google and Rackspace engineers provide even more details around the systems they are designing.

Selected AIX Versions Can Soon Be Licensed Monthly

Edit: Some links no longer work

Originally posted March 14, 2017 on AIXchange

IBM made an interesting announcement today. The Standard editions of AIX 7.1 and AIX 7.2 will soon be available to be licensed on a monthly basis under Passport Advantage.

This is another example of Power Systems and AIX making their platforms cloud-ready as they transition to a hybrid cloud model. So far we have C models, PowerVC and LC models, and now we have another way to license AIX.  

This whitepaper goes into detail about IBM’s cloud directions. Registration is required to access it.

The new licensing model goes into effect on March 28. You’ll be able to order it using part number 5737-D09.

While enterprise clients could find this interesting if they’re trying to move software and operating systems licensing costs over to their opex bucket, I believe managed service providers (MSPs) have the most to gain. By giving these shops an option to get AIX licenses “on demand,” they can more easily respond to their ever-evolving circumstances (for instance, customers moving into or out of their cloud). Being able to buy a month’s worth of licensing — this includes entitlement and support — at a time really increases their flexibility and helps to account for the possibility of multiple customers that might be sharing the same physical cores.

No monthly reporting to IBM is required, which provides for flexible pricing and billing. It will be offered on the IBM Digital Marketplace, so for the first time you’ll be able to virtually swipe a credit card to get access to AIX.

This offer lists at $26 USD per virtual processor core (VPC) per month. It applies only to E850 machines and below: the small-tier systems. The entitlement is based on customer numbers rather than serial numbers, which is something to keep in mind when dealing with IBM Support. They too will need to get used to this new licensing model.

If you’re interested in this offer, you’ll first need to determine the number of VPCs you’ll need to license for the LPARs that you have defined. IBM developerWorks provides this script for making that calculation:

This script can be used with HMC and/or NovaLink instances to collect and report virtual CPU allocations of Power logical partitions and configured processors on the server. The script is invoked with a valid HMC or Novalink username and a list of space separated IP addresses. Only one username is allowed as input, and that username will be applied to all HMCs/NovaLinks specified.

Also, ssh must be configured between the system running the script and the HMC or Novalink instances it attaches to. See https://www.ibm.com/support/knowledgecenter/POWER6/ipha1/settingupsecurescriptexecution.htm for more info.

Here is an example of invoking the script.
$ ./vcpu_report.sh | tee output.csv
Enter the HMC/Novalink User: hscroot
Enter HMC/Novalink List (space separated): 9.4.28.92 vhmc2.tlab.ibm.com

This will produce a CSV output file with each row listing the associated HMC or Novalink instance, the system Model-Type & Serial Number, the logical partition name, the LPAR type, and the current procs assigned to the logical partition. An LPAR_TYPE of unknown indicates there’s no RMC session between the HMC/NovaLink and the partition.

For each physical server, licensees must have sufficient entitlements for the lesser of the sum of all VPCs on all virtual servers, or the number of physical cores on the system.
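To make that “lesser of” rule concrete, here is a hedged sketch of the arithmetic. This is not the IBM script; the column positions, the server identifier, and the core count below are assumptions based on the CSV description above, so check them against your actual output:

    # Sum the procs assigned to one server's LPARs (assuming column 2 is the
    # server and column 5 is the proc count), then license the lesser of
    # that sum and the server's physical core count.
    SERIAL="8284-22A*1234567"   # hypothetical Model-Type*Serial
    CORES=20                    # physical cores on that server
    SUM=$(awk -F, -v s="$SERIAL" '$2 == s { t += $5 } END { print t+0 }' output.csv)
    if [ "$SUM" -lt "$CORES" ]; then
        echo "License $SUM VPCs"
    else
        echo "License $CORES VPCs (capped at physical cores)"
    fi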

Obviously, this new licensing offer isn’t for everyone, but for MSPs and some others, it could be a game-changer.

New Version of Power Systems Best Practices Now Available

Edit: I always like to look for these documents. Some links no longer work.

Originally posted March 7, 2017 on AIXchange

As I noted in this 2013 post, Fredrik Lundholm compiles and updates a presentation called the Power Implementation Quality Standard for commercial workloads.

This presentation has proven to be rather popular, so I want to let you know that Fredrik’s latest set of slides, version 1.17, can be downloaded here.

The presentation lists changes to previous versions:

There is some information about the E850C, along with VIOS 2.2.5.

The last time I wrote about these slides we were on version 1.9, so there have been quite a few changes since then. He has a change log on page 3, but here are some of the highlights of what has been updated over time.

Changes for 1.17:
2017 Update, VIOS 2.2.5

Changes for 1.15:
VIOS update, PowerHA update, AIX update, GPFS update, poll uplink, vNIC, SR_IOV, Linux, IBM i

Changes for 1.14:
Clarification aio_maxreqs
VIOS clarification, interim FIXES

Changes for 1.13:
Correction on attribute for large receive
Currency update, POWER8,
I/O Enlarged Capacity

Changes for 1.12:
PowerHA, and PowerHA levels, AIX levels, VIO levels.
Virtual Ethernet buffer update

Changes for 1.11:
Power Saving Animation
Network configuration update admin VLAN / simplification
Removal of obsolete Network design

Changes for 1.10:
Favor Performance without Active Energy Manager
AIX/GPFS code level updates
AIX Memory Pin

Here are some highlights, from my perspective. First, from page 10:

“With firmware levels 740, 760 or 770 and above on POWER7 systems and all POWER8/POWER7+ models, the ASMI interface includes the favor performance setting.

With POWER8 and HMC8 this interface can be directly accessed from the HMC configuration panels, ASMI is not required. A new option fixed max frequency is also available (1.17).

Engage favour performance as a mandatory modification for most environments in the “Power Management Mode” menu.

This safely boosts system clock speed by 4-9%.”

This is from page 11:

“On E870/E880 machines the recommendation is to disable “I/O Adapter Enlarged Capacity” to free up hypervisor memory. With PowerVM 2.2.5 and FW860 this is now better. Only disable for AIX/IBM i only systems.

Power off the machine, log on to ASMI menu on HMC –> I/O Adapter Enlarged Capacity:
    Disable I/O Adapter Enlarged Capacity by unselecting the tick box
    Power on the server
    Observe Hypervisor memory consumption”

In addition, there’s a current AIX matrix on page 25, along with plenty of other great information.

If you haven’t looked at Fredrik’s work previously, it’s well worth your time.

A Power Champion, Again

Edit: Some links no longer work. Still proud to be a champion.

Originally posted February 28, 2017 on AIXchange

In case you missed me mentioning it on Twitter (@robmcnelly), I was recently selected as part of the 2017 class of IBM Power Champions.

Along with 13 others, I was first honored as an IBM Power Champion in 2011. When the Power Champions program relaunched last year, I was recognized again.

For 2017, there are a total of 41 Champions, 27 of whom are returning Champions. Read more here:

“After reviewing and evaluating the contributions of our applicants, IBM is happy to announce the 2017 IBM Champions for Power!

The IBM Champion program recognizes innovative thought leaders in the technical community—and rewards these contributors by amplifying their voice and increasing their sphere of influence. An IBM Champion is an IT professional, business leader, developer, or educator who influences and mentors others to help them make best use of IBM software, solutions, and services.

These individuals evangelize IBM solutions, share their knowledge and help grow the community of professionals who are focused on IBM Power. IBM Champions spend a considerable amount of their own time, energy and resources on community efforts—organizing and leading user group events, answering questions in forums, contributing wiki articles and applications, publishing podcasts, sharing instructional videos, and more.”

My employer, Meridian IT, also made mention of it here.

Here’s the complete list of 2017 IBM Power Champions:

Babatunde Akanni
Liam Allan
Torbjorn Appehl
Aaron Bartell
Shawn Bodily
Jim Buck
Lionel Clavien
Benoît Créau
Shrirang “Ranga” Deshpande
Anthony English
Cynthia Fortlage
Alan Fulton
Susan Gantner
Cosimo Gianfreda
Ron Gordon
Midori Hosomi
Tom Huntington
Jay Kruemcke
Hal Kussler
Andy Lin
Jaqui Lynch
Alan Marblestone
Christian Massé
Pete Massiello
Rob McNelly
Brett Murphy
Richie Palma
Jon Paris
Michael Pavlak
Trevor Perry
Jerry Petru
Steve Pitcher
Kody Robinson
Randall Ross
Anthony Skjellum
Shawn Stephens
John Stone
Paul Tuohy
Jeroen Van Lommel
Dave Waddell
Keith Zblewski

Even though I don’t do it for the accolades, being recognized along with so many other accomplished people never gets old. Hopefully I’ll continue to merit being included in this prestigious group. In any event, as POWER9 gets ready to launch, and with POWER10 in the planning stages, I look forward to many more years of evangelizing IBM Power Systems running AIX, Linux and IBM i.

Supporting Systems and the People Who Use Them

Edit: This is still relevant today

Originally posted February 21, 2017 on AIXchange

What systems are you running? That’s an easy enough question to answer. You might tell me that you have two 880s, two 850s and two S822s, all running AIX 7.2.

But what do your systems actually do?

The answer to this question might also seem straightforward. Say, for instance, that one of your systems runs Oracle, one runs WebSphere, and another runs DB2. So you might have a database layer, an application layer, and a web layer. You may be running PowerHA and GPFS. All of the systems you manage are patched, tuned and running great.

But why did your company purchase your systems, and what do they really do?

I would guess that they run the business. They track money. They track people. They track inventory. Maybe they’re hospital systems that manage patient care. Maybe they’re systems involved in dispatching police or firefighters. Whatever they do, they run the core operation and affect actual people.

Next question: How do your users actually interact with the machines that you manage?

Do you even have an answer to this? If not, you should make the effort to understand how your users work with your systems.

Do your systems run a warehouse? Get out on the dock and learn what can be done to improve their workflow.

Do your computers support manufacturing activity? Go spend time on the manufacturing floor.

Do you have a help desk? Head over there. What kinds of things are users having issues with? How can you help?

Or are you working for a hospital? Then go spend time with the nurses at their workstations and learn about the little things that drive them crazy.

“A co-worker of mine once snapped at a nurse when she had problems logging into her workstation. She responded by asking him if he’d like to come up the hall with her and fix an IV or administer some drugs. Touche. The nurse was just as knowledgeable and passionate about healthcare as my coworker was about technology. Working with computers was important, but it was only a small part of her job. She just needed to enter data and to print some reports. She didn’t care about drivers, passwords or proper startup/shutdown sequences. Once we showed her how to do what she needed to do, she was fine, and we didn’t hear from her again.

End users may not know computers, but they know when they’re running slowly. How often do you take the time to actually sit down with your end users and find out how things are working from their perspective? I’ve had users who were printing out reports from one system and retyping the data into another. How easy would it be to save folks from that effort and aggravation? Just leave the raised floor and take a walk. Find people in other departments that use your systems and ask them for feedback. Ask them if you can look over their shoulder while they use your machine sometime.

End users are our customers. If they weren’t using the data we store and process, there would be no need for us. And if we have a better understanding of users’ problems and frustrations, if we show them better ways to do things, the entire organization benefits.”

I believe most of us understand the need to listen to end users. But we’re busy and they’re certainly busy, so a reminder never hurts. If you actually have access to the people that use your system, accept this gift. Learn how your downtimes actually affect them. Our jobs aren’t just about working with cool technology. Those awesome machines are there to support real people doing real jobs.

New Servers Designed for Smaller Environments

Edit: These offerings are usually pretty popular

Originally posted February 14, 2017 on AIXchange

In my consulting work, I see a number of customers with small machines running critical workloads that don’t incorporate virtualization. Because these workloads aren’t necessarily memory- or CPU-intensive, these customers see no need to set up multiple LPARs. They just want a stand-alone system.

The challenge for many of these customers is that, from a technology standpoint, they’re lagging behind. The hardware is old, and the risks of doing business on older, unsupported systems are substantial.

IBM understands this, and today (Feb. 14), the company is making some announcements to address the modest but pressing needs of these customers.

First, IBM is unveiling the 2U S812 (8284-21A) server. It will come in two flavors: a single-core server for IBM i and a 4-core server for AIX workloads. The form factor is a rack mount system; there aren’t any options for a tower.

These systems will be available in e-config on Feb. 28, and will be generally available on March 17.

Again, this server is designed for a particular subset of customers: those that run AIX or IBM i in a single partition, and don’t use virtualization. This server is not intended for Linux workloads; use the Linux-only or other existing hardware models for them.

The IBM i centric single-core server is a 3.02 GHz POWER8 processor with a maximum of 64G of memory. It has six hot pluggable PCIe Gen3 low profile slots (five if an SAS backplane with write cache is used).

The system supports a maximum of 25 users. It can run IBM i 7.2 or 7.3.

You can add a DVD drive, but there’s no bay for tape or RDX in the system unit. The system has 900W power supplies that can take either 110V or 220V power. You cannot add an I/O drawer, but you could add in fibre adapters to attach to external SAN disk.

The AIX flavor is a 4-core 3.02 GHz POWER8 processor with a maximum of 128G of memory. There’s room for six hot pluggable PCIe Gen3 low profile cards, although, as with the IBM i flavor, only five slots are available if you use the SAS backplane with write cache. There is no option to virtualize the system, but you can add in up to three EXP24S or EXP24SX expansion drawers for up to 72 additional drives. It will run AIX 6.1, 7.1 or 7.2, and also has the 900W 110V or 220V power supplies.

Also announced today is an option for the E880C virtual solution edition for SAP HANA. This is a 48-core 4.02 GHz POWER8 processor system with 2 TB of memory.

In addition, there will be changes with the HMC. As 500G drives become less available, IBM will be switching to 1 TB drives for use in the HMC, with the option to have a second disk with a matching capacity.

Finally, there’s another option for the RDX docking station. This will be the EUA4, which is a follow-up to the EU04.

Search the relevant IBM announcement letters for details. You’ll also want to check for information about some products that are being withdrawn from marketing.

Article Misses the Point on VIOS Use

Edit: Hopefully you are running dual VIOS

Originally posted February 7, 2017 on AIXchange

This was posted on Jan. 17, but it’s worth revisiting. I thought the article was a little over the top, starting with the headline:

“Power Systems running IBM’s VIOS virtualisation need a patch and reboot
Unless you’re willing to tolerate the chance of data corruption”

Here’s what follows:

“IBM on Saturday slipped out news of a nasty bug in its VIOS, its Virtual I/O Server that offers virtualisation services on Power Systems under AIX.

Issue IV91339 strikes when moving virtual machines and means “there is a very small timing window where the VIOS may report to the client LPAR that some I/Os have completed before they actually do.”

IBM advises that “This could cause applications running on the client [logical partition] LPAR to read the wrong data from the virtual device. It’s also possible that data written by the client LPAR to the virtual device may be written incorrectly.

Hence the issue’s title: “possible data corruption after LPM failure.”

Of course data corruption is precisely what Power Systems and AIX are supposed not to do. The platforms are promoted as exceptionally stable and resilient, just the ticket for mission critical applications that can’t afford many maintenance windows, never mind unplanned ones.

So IBM’s guidance that “Installation of the ifix requires a reboot” will not go down well with users.” 

After the article went live, it was updated:

UPDATE: IBM’s now released a fix and updated its advice on this issue.

Big Blue now also says “The risk of hitting this exposure outside of the IBM test lab has had extensive evaluation and is considered extremely small. The controlled test environment where this problem was observed makes use of a high-precision test injection tool that was able to inject a specific error within a tiny window.”

“The chances of hitting this window outside of the IBM test lab are highly unlikely and there is no known occurrence of this issue outside of the IBM test lab.”

The Reg is nonetheless aware that IBM has recommended users implement the patch.”

As I said, I thought this was over the top, and judging by these comments, I wasn’t the only one:

Uh… why not?

patch and boot the secondary, patch and boot the primary. Extra points if you are nice enough to disable the vscsi and vfcs of the corresponding vios first (rmdev -pl $adaptername). Ethernet fails over automatically, though you could add extra grace there as well.

Hardly a big deal. And in order to run into iv91339s bug, you’d have to have a failing lpm in the first place.

                    ******************************

If this goes back as far as 2.2.3.X – then, clearly – it is not happening often – and management might decide that the higher risk to business is updating and rebooting a dual VIOS configuration.

As far as change records go: whether they are a major pain or a minor pain or no pain – experience has taught many that no records – ultimately is a ‘killing pain’. This again, is a process that can ensure that the business can manage their risk – as they view it. System administration is not the business – even that “we” have the best of intents “they” must okay the process. That is how business is done.

The argument that should be made is that the systems were engineered for concurrent maintenance. Not doing the maintenance now may lead to a disruptive ‘moment’. The business does not need to know the technical details – it needs to know the relative risk and impact on business. The design – aka best practice – of using dual VIOS is that the impact should be zero – even with a reboot!

                    ******************************

Although there are reasons to go with a single VIOS and with more recent features that provide a cluster-like availability on other servers my preference within my organization is to deploy Dual VIOS. It’s a nominal expense to deploy while having the ability to tell the business the platform will continue to service the dozens of VM’s on each box while we do concurrent maintenance for each VIOS.

We are not shy to our stakeholders either on how we’ve built our Power environment (starting with P4 and now mostly P8) so they have confidence in the platform and our ability to keep it all running virtually non-stop. 

Really, the article’s whole premise is faulty. I can’t recall the last time I saw an environment with VIOS that wasn’t using dual VIO servers. Patching one VIOS, rebooting and then patching the other VIOS is business as usual. Updating VIOS with the client LPARs running is common practice, and isn’t much of a risk in my opinion. During your next patch cycle, add the fix as you always would. This platform is exceptionally stable and resilient, and this article and the comments actually illustrate that point.
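If you’ve never done it, the dual-VIOS patch dance itself is short. A hedged outline as padmin, starting on the standby VIOS (the update source and your change process will vary):

    $ updateios -commit                       # commit any previously applied fixes
    $ updateios -dev /mnt -install -accept    # install the update from mounted media
    $ shutdown -restart                       # reboot this VIOS

Verify that clients and shared Ethernet failover recovered, then repeat on the partner VIOS.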

Decoding iCalendar Files

Edit: This seems to be less of a problem lately

Originally posted January 31, 2017 on AIXchange

If you use an electronic calendar, chances are you’re dealing with multiple calendaring and email systems between your work and personal accounts. Some folks use Google Calendar, others use Outlook, and still others use Lotus Notes. Personally, I use multiple email clients, and each one has a calendar. I prefer to keep all of my appointments in one place using one piece of software, and everything has to sync with my phone.

Many calendar meeting invitations, regardless of the platform, get sent back and forth as iCalendar (.ics) files:

“iCalendar is a computer file format which allows Internet users to send meeting requests and tasks to other Internet users by sharing or sending files in this format through various methods. The files usually have an extension of .ics. With supporting software, such as an email reader or calendar application, recipients of an iCalendar data file can respond to the sender easily or counter-propose another meeting date/time. The file format is specified in a proposed internet standard (RFC 5545) for calendar data exchange.

iCalendar is used and supported by a large number of products, including Google Calendar, Apple Calendar (formerly iCal), IBM Lotus Notes, Yahoo! Calendar, Evolution (software), eM Client, Lightning extension for Mozilla Thunderbird and SeaMonkey, and partially by Microsoft Outlook and Novell GroupWise.”

One thing I’ve noticed is that when I get sent an .ics file or calendar invite in Gmail, Google makes it difficult to transfer that file to another mail reader — it tries really hard to force you to use Google Calendar. You can’t simply forward that invite from Gmail to another mail program and expect it to just work. Fortunately, there is a way to deal with this. Select Show Original to view the original email, and then scroll to the bottom, where there’s a section with this header:

    Content-Type: text/calendar; charset="utf-8"; method=REQUEST
    Content-Transfer-Encoding: base64

Google seems to intentionally encode the .ics file (shocking, I know), so you need a way to make it readable. There are tools that work fine in most instances (just search on “base64 decode”). Basically, you’d cut and paste the information and get a valid .ics file. But if you’re dealing with important, work-related documents, keep in mind that this decoding can also be done from your command line.

For instance, here’s how to work with .ics files in Linux:

    $ echo -n 'scottlinux.com rocks' | base64
    c2NvdHRsaW51eC5jb20gcm9ja3MK

    $ echo -n c2NvdHRsaW51eC5jb20gcm9ja3MK | base64 -d
    scottlinux.com rocks

On AIX, you can use openssl:

openssl base64 -e <<< 'Welcome to openssl wiki'
V2VsY29tZSB0byBvcGVuc3NsIHdpa2kK
openssl base64 -d <<< 'V2VsY29tZSB0byBvcGVuc3NsIHdpa2kK'
Welcome to openssl wiki

Warning: base64 line length is limited to 76 characters by default in openssl (and output is generated with 64 characters per line).

openssl base64 -e <<< 'Welcome to openssl wiki with a very long line that splits…'
V2VsY29tZSB0byBvcGVuc3NsIHdpa2kgd2l0aCBhIHZlcnkgbG9uZyBsaW5lIHRo
YXQgc3BsaXRzLi4uCg==

openssl base64 -d <<< 'V2VsY29tZSB0byBvcGVuc3NsIHdpa2kgd2l0aCBhIHZlcnkgbG9uZyBsaW5lIHRoYXQgc3BsaXRzLi4uCg=='
=> NOTHING!

To decode a base64 line without line feeds that exceeds 76 characters, use the -A option:
openssl base64 -d -A <<< 'V2VsY29tZSB0byBvcGVuc3NsIHdpa2kgd2l0aCBhIHZlcnkgbG9uZyBsaW5lIHRoYXQgc3BsaXRzLi4uCg=='

Welcome to openssl wiki with a very long line that splits…

In any event, plenty of available options make it simple enough to decode the text. Once the text is deobfuscated, save it as an .ics file. You should then be able to open the .ics file with your mail client of choice and successfully add it to your calendar.
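For example, assuming you’ve pasted the encoded block from Gmail’s Show Original view into a file (the file names here are hypothetical), the whole decode step is one line:

    # Decode the pasted base64 into a usable calendar file; -A handles
    # long single-line input, as noted above
    openssl base64 -d -A < invite.b64 > invite.ics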

Taking on the Upgrade Exception

Edit: Still relevant today

Originally posted January 24, 2017 on AIXchange

During a recent conversation over lunch, my companion made a great observation: No one questions the need to upgrade their computers and other devices anymore — with one notable exception.

Who do you know that is still using Windows XP, even on a home PC? We just replace these tools, because we understand that the current models are so much faster and more powerful. The same goes for the miniature computers we carry around in our pockets (sometimes called smartphones). Sure, over the lifetime of your phone, you will periodically update your OS and your apps. But eventually, we move on here, too, knowing that the new phones have the latest hardware — and recognizing that the phone carriers will stop supporting the old hardware and software over time.

You can even see this with the technology in your living room. Most likely your television is not more than a few years old. Larger flat screens have become more affordable, and with HD you can really sense the difference with that clear, sharp picture.

So what’s the one piece of technology many of us are reluctant to upgrade? You guessed it. It’s our Power Systems servers.

I still run across customers who are running POWER6, POWER5 or even older processors, along with unsupported versions of AIX or IBM i. And I’m still surprised when I see businesses ignore their critical infrastructure to this extreme. We all understand that this was — and is — amazing technology. But it is old. POWER6 came out in the summer of 2007 — nearly a decade ago. AIX 5.3 hasn’t been supported since 2012 (unless you paid for extended support).

So why is the need to upgrade and stay current not as obvious to some enterprise computing customers? There are a number of factors, starting with the amount of money customers invest in Power Systems hardware. That said, IBM has made these systems more affordable over time, and leasing options are available.

Yes, these systems keep running, but just as with the other technology in your life, eventually there’s a tipping point where upgrading your current hardware and OS becomes the safe and prudent course of action. As time goes on, replacing old hardware parts becomes harder and harder. And if anything goes wrong with your operating system or unsupported application, you may be on your own. It’s far better to stay current with your OS and application patches and refresh your hardware regularly to ensure that support is available when you need it.

Sure, at one time we were all excited about migrating to Windows XP, or getting 3G on our phones. But so much better technology is available to us now. And yes, upgrades take some work on our part, but don’t you find that experience kind of exciting, too? I do when I think of the end users. Those days after cutover weekend, when they can’t believe how snappy their machines are, and they’re ecstatic over the time they’re saving because their jobs are running faster and the system is more responsive. That’s a great feeling, and it goes well with the relief of knowing that your enterprise is up-to-date with its critical systems.

Thoughts on Performance Tuning

Edit: Still good stuff

Originally posted January 17, 2017 on AIXchange

I recently discovered this post to the UNIX & Linux Forums. While it’s from 2013, “The Most Incomplete Guide to Performance Tuning” has some great — and still relevant — ideas.

For starters, this is from the section called “What Does Success Mean?”

“The problem is that fast is a relative term. Therefore it is absolutely imperative that you agree with your customer exactly what fast means. Fast is not “I don’t believe you could squeeze any more out of it even if I threaten to fire you”. Fast is something measurable – kilobytes, seconds, transactions, packets, queue length – anything which can be measured and thus quantified. Agree with your customer about this goal before you even attempt to optimize the system. Such an agreement is best laid down literally and is called a Service Level Agreement (SLA). If your customer is internal a mail exchange should be sufficient. Basically it means that you won’t stop your efforts before measurement X is reached and in turn the customer agrees not to pester you any more once that goal is indeed reached.

A possible SLA looks like this:

Quote: The ABC-program is an interactive application. Average response times are now at 2.4 seconds and have to be reduced to below 1.5 seconds on average. Single responses taking longer than 2.5 seconds must not occur.

This can be measured, and it will tell you – and your customer – when you have reached the agreed target.

By contrast, here’s a typical example of work that is not covered by an SLA, a graveyard of hundreds of hours of uncounted, wasted man-hours:

Quote: The ABC-program is a bit slow, but we can’t afford a new system right now, therefore make it as fast as possible without replacing the machine or adding new resources.

The correct answer for such an order is: “if the system is not important enough for you to spend any money on upgrading it, why should it be important enough for me to put any serious work into?”

This is from the section, “What Does Performance Mean?”

“Another all too common misconception is the meaning of “performance”, especially its confusion with speed. Performance is not just about being fast. It’s about being fast enough for a defined purpose under an agreed set of circumstances.

A simple comparison of the difference between performance and speed can be described with this analogy: We have a Ferrari, a large truck, and a Land Rover. Which is fastest? Most people would say the Ferrari, because it can travel at over 300kph. But suppose you’re driving deep in the country with narrow, windy, bumpy roads? The Ferrari’s speed would be reduced to near zero. So, the Land Rover would be the fastest, as it can handle this terrain with relative ease, at near the 100kph limit. Right? But, suppose, then, that we have a 10-tonne truck which can travel at barely 60kph along these roads? If each of these vehicles are carrying cargo, it seems clear that the truck can carry many times more the cargo of the Ferrari and the Land Rover combined. So again: which is the “fastest”? It depends on the purpose (amount of cargo to transport) and environment (streets to go). This is the difference between “performance” and “speed”. The truck may be the slowest vehicle, but if delivering a lot of cargo is part of the goal it might still be the one finishing the task fastest.

There is a succinct difference between fast and fast enough. Most of us work for demanding customers, under economic constraints. We have to not only accommodate their wishes, which are usually easy – throw more hardware at the task – but also their wallet, which is usually empty. Every system is a trade-off between what a customer wants, and what he is willing to pay for. This is another reason why SLA’s are so important. You can attach a price tag to the work the customer is ordering, so they know exactly what they’re getting.”

This is from the section, “Work Like You Walk—One Step at a Time”:

“If you try to tune a system, change one parameter, then monitor again and see what impact that had, or whether it had any impact at all. Even if you have to resort to sets of (carefully crafted) parameter changes do one set, then monitor before moving onto the next set.

Otherwise you run into the problem that you don’t really know what you are measuring, or why. For example, suppose you change the kernel tuning on a system while, at the same time, your colleague has dynamically added several GB of memory to that system. To make matters “worse” the guy from storage is in the process of moving the relevant disks to another, faster subsystem. At the end, your system’s response time improved by 10%.

Great! But how? If you need to gain another 5%, where would you start? If you had known that adding 1GB of memory had improved the response time by 3% and that adding 3 GB more was responsible for most of the rest, while the disk change brought absolutely nothing, and the kernel tuning brought around 0.5%, you could start by adding another 3GB, and then check if that still has a positive impact. Maybe it didn’t, but it’s a promising place to start. As it is, you only know that something you, or your colleagues, did caused the effect, and you have learned little about your problem or your system.”

And this is from the conclusion:

“Always remember that, as a SysAdmin, you do not operate in a vacuum. You are part of a complex environment which includes network admins, DBAs, storage admins and so on. Most of what they do affects what you do. Have a lousy SAN layout? Your I/O-performance will suffer. Have a lousy network setup? Your faster-than-light machine may look like a slug to the users. There is much to be gained if you provide these people with the best information you can glean from your system, because the better the service you offer to them, the better the service you can expect back from them! The network guy will love you if you do not only tell him a hostname but also a port, some connection data, interface statistics and your theory about possible reasons for network problems. The storage admin will adore you if you turn out to be a partner in getting the best storage layout possible, instead of being only a demanding customer.

Unix is all about small specialized entities working together in an orchestrated effort to get something done. The key point in this is that the utility itself might be small and specialized but its interface is usually very powerful and generalized. What works in Unix utilities also works in people working together: increase your “interface” by creating better and more meaningful data and you will see that others will better be able to pool their efforts with yours towards a common goal.”

There’s a lot more, so take the time to read the whole thing.

The PowerVM Story Gets Better

Edit: Some links no longer work.

Originally posted January 10, 2017 on AIXchange

Why do I consider PowerVM to be such a powerful virtualization technology? It has many advantages compared to competing virtualization technologies, including the capabilities it borrows from the mainframe.

This IBM site has a detailed list of advantages, but I’ll highlight some particularly significant ones:

  • PowerVM hypervisor—Supports multiple operating environments on a single system.
  • Micro-partitioning—Enables up to 20 VMs per processor core.
  • Dynamic logical partitioning—Processor, memory, and I/O resources can be dynamically moved between VMs.
  • Shared processor pools—Processor resources for a group of VMs can be capped, reducing software license costs. VMs can use shared (capped or uncapped) processor resources, and processor resources can automatically move between VMs based on workload demands.
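To make the dynamic logical partitioning item above concrete, here is roughly what moving processor capacity between two running partitions looks like from the HMC command line. A hedged sketch: the managed system and partition names are hypothetical, so check chhwres syntax on your HMC level.

    # Move 0.5 processing units from lpar1 to lpar2 on managed system p850-1
    chhwres -r proc -m p850-1 -o m -p lpar1 -t lpar2 --procunits 0.5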

Consider how many LPARs you can consolidate and run on a single physical frame without the performance penalties and overhead you encounter when compared to competing hypervisors that run on x86 systems. PowerVM is recognized for how well it scales and performs.

I’ve discussed SAP HANA on POWER before, but the story gets better as SAP recently announced that its HANA workloads can run up to eight production databases on a single server running PowerVM.

Compare that with these notes on what you can do with VMware:

“Just like with the vSphere 5.5 SAP HANA support release in the beginning 2014, vSphere 6 supports currently only one production level VM that may get co-deployed with non-production level SAP HANA VMs. No resource sharing with other VMs is supported for production level SAP HANA VMs.”

Here’s a summary from the IBM Systems blog:

“Often, SAP workloads are among the most important workloads running in enterprises today. They deliver the most benefits from high levels of flexibility, resiliency and performance — of which virtualization is key. As the first platform to support up to eight virtualized production instances of SAP HANA, IBM Power Systems enables clients to run multiple HANA instances on the same system without the restrictions of VMware… .

With features like capacity on demand, virtual machines (LPARs) and hypervisor scheduling, PowerVM virtualization makes it simple to consolidate, integrate and manage multiple SAP systems, so you can reduce your data center footprint and accelerate speed to production through fewer servers. These features also give the ability to manage capacity and shift running applications to take advantage of additional available resources. This helps clients to conduct real-time transactions and make insights available for more rapid decisions… .

SAP HANA on Power Systems offers a smarter, more scalable in-memory database. It depends heavily on large memory configurations and low virtualization overhead to deliver rapid, actionable insights. And because Power Systems with PowerVM supports up to two times more virtualized HANA production databases than competitors’ x86 platforms, clients can run more HANA instances on one server, simplify deployment to production and manage their systems more easily.”

Estimate rperf for your LPAR

Edit: Interesting tool.

Originally posted January 3, 2017 on AIXchange

A recent Nigel Griffiths tweet highlighted this page:

“This is a simple script that outputs the current machine or LPAR to give you the rPerf number.

The rperf numbers are only available for certain number of CPUs.

If you have a different number of CPUs then a rough calculation is made based on the top number of CPUs and dividing appropriately.

If you want to know what rperf used to work out your rating use: rperf -v

There are some problems:
* Older machines don’t have rPerf numbers so the script outputs the roltp number. There is no way to convert a roltp number to a rPerf. You will have to apply your own rules for that.
* Only certain numbers of CPU have official rPerf Numbers like 4 way, 8 way and 16 way. With LPARs, we can have lots of odd numbers of CPU. In this case, the script guesses the rPerf based on rPerf numbers in a fairly crude way. These are a simple calculation and will not be exact – i.e. it straightens out the SMP curve. The script will give a lower than actual rPerf number.
* Shared CPU LPARs that include a fraction of a CPU are not handled well – the tool will find the Virtual Processor number and use that as the maximum number of CPUs the LPAR can get.
* On shared CPU LPARs the script is not Entitlement aware, but entitlement is not a limiting factor on an uncapped LPAR anyway. If capped, should the script use Entitlement and not VP?

How will the script get updated? Easy – it is a straightforward, simple shell script – you can update it yourself and give the script back to your AIX community via the comments below.

By definition: The rPerf number is the Relative Performance number when compared to the above RS/6000 44p Model 270 374 MHz announced on 7th February 2000, which has a rPerf of exactly 1.

Syntax by example:

Assuming you rename the script to just “rperf” and have it in your $PATH
blue:nag:/home/nag/rperf $ rperf -?
Usage: ./rperf [-vehH]

blue:nag:/home/nag/rperf $ rperf
82.77 rPerf estimated based on 8.00 Virtual CPU cores

blue:nag:/home/nag/rperf $ rperf -e
82.77 rPerf estimated based on 8.00 Virtual CPU cores
41.38 rPerf estimated based on 4.00 Uncapped Entitlement CPU cores

blue:nag:/home/nag/rperf $ rperf -h
blue 82.77 rPerf estimated based on 8.00 Virtual CPU cores

blue:nag:/home/nag/rperf $ rperf -h -e
blue 82.77 rPerf estimated based on 8.00 Virtual CPU cores
blue 41.38 rPerf estimated based on 4.00 Uncapped Entitlement CPU cores

blue:nag:/home/nag/rperf $ rperf -v
Information is from public documents from www.ibm.com
— – System p Performance Report
— – System p Facts and Features Document
— – Power Systems Facts and Features Document
— – rperf script Version:31 Date:18thJune2015
Machine=IBM,8233-E8B MHz=3550 Rounded-MHz=3550 CPUs=8 CPUType=PowerPC_POWER7
lookup IBM,8233-E8B_3550_8
matchup 32 331.06
calculate cpus=8 from 32 331.06
82.77 rPerf estimated based on 8.00 Virtual CPU cores
41.38 rPerf estimated based on 4.00 Uncapped Entitlement CPU cores
blue:nag:/home/nag/rperf $ rperf -H
rperf -v -e -h -H
  -v = verbose mode and Entitlement (-e)
  -e = output Entitlement rating and Capped / Uncapped state (in addition)
  -h = output the short hostname at the start of the line
  -H = Help = this output
rperf outputs the performance number of the current machine or LPAR
either Relative Performance (rPerf) or Relative OLTP (roltp)
Depending on the age of the machine.
There is no simple way to convert from roltp to rPerf – sorry.

If it says estimated then it is NOT an official number.
For LPARs the number may be estimated but it is a simple maths calculation
i.e. if we have the official number for 4 CPUs then a 1 CPU LPAR is simply
a fourth – this will be an under estimate.

rperf script wiki page
https://www.ibm.com/developerworks/community/wikis/home#!/wiki/Power%20Systems/page/rperf

e-mail to XXXX@uk.ibm.com

Got a machine that is not on the list or any other problem ?
Make sure you have the latest rperf version
Run: rperf -v
Capture the output
Add that output as a comment at the bottom of this webpage
I get automatically notified and will sort it out
If you are a proper Techie: work out the missing line and put that in the comment too.
Thanks for your use, help and feedback, Nigel Griffiths”

There are links for downloading the files. As of this writing, rperf_v33 is the most recent, from November.

Booting AIX in Debug Mode

Edit: Still good to know.

Originally posted December 20, 2016 on AIXchange

I recently had an AIX LPAR that wasn’t booting. In an effort to gather information, IBM Support had me boot it a few different ways. This document details what we needed to do. I’m copying it here because I want to make sure you’re aware of this option.

“How to enable verbose (debug) output during boot and capture it for later analysis by IBM.

Note this technique is for customers using an HMC to manage their systems.

We will capture the console output by logging in to the HMC via an SSH client such as PuTTY, with logging enabled. This will save the output in a file on the user’s PC.

1. Configure an SSH client (eg PuTTY) to log session output to a local file on the PC.
2. Open a connection to the HMC and login as user ‘hscroot’.
3. Bring up a menu of managed servers by running the command “vtmenu”. If there is only 1 managed server this will bring up a list of LPARs available to connect to.
4. At the vtmenu, select the server to which you desire a console session.
5. Select the LPAR from which you need boot debug.
6. Wait for “Open Completed” message (if LPAR were Running you would get a Console: login)

Booting the LPAR to the Open Firmware (OK) prompt
1. Make sure the LPAR is not activated. If it is hung, go to the HMC GUI, and under Systems Management -> Servers -> server name, check the box next to the LPAR. Then from the arrow on the right side of the LPAR name, popup the menu and select “Operations -> Shut Down”.
2. Wait until the LPAR is in a “Not Activated” state, and the Reference Code shows all zeros.
3. Mouse click on the arrows to the right of the LPAR name again, to get the popup menu. Click “Operations -> Activate -> Profile”
4. From the Activate Logical Partition popup window, click the “Advanced” button.
5. From the Activate Logical Partition – Advanced popup window, click “Open Firmware OK Prompt” from the Boot Mode drop down list.

Enabling the debug boot image
1. Back in the SSH console session window, wait for the Open Firmware prompt “0>”
At the 0> prompt, enter “boot -s verbose”

2. For cdrom boot debug enter:
0> boot cdrom:\ppc\chrp\bootfile.exe -s verbose

At this point, the LPAR will continue to boot and debug information will be sent to the console. While the LPAR is booted in this debug state, all commands that are run will output debug information, such as exec() system calls.

Capturing the debug information
The console session is being run via the SSH connection to the HMC and the output will be captured in the log file configured in the first step. Once the system boot fails or hangs, stop the LPAR and send the boot debug log file to IBM Support for review.

Finishing up
To disconnect from the virtual console you have selected, type the characters tilde and dot.
~.”

The console will ask if you wish to terminate the connection. Type “y” to be disconnected from the virtual console.

At this point you can type <ENTER> to stay in the vtmenu session and choose another console, or type “q” to quit back to the HMC shell prompt.

If you are quitting, then type “exit” to close the HMC ssh session and quit the putty tool.

Once we had collected the data, IBM was able to help determine the problem.

As a reminder, you can also get debug information from your VIO server as well, using this technique:

    Login to VIOS as padmin
    $ oem_setup_env
    # script -a /home/padmin/<PMR#.Branch#>clidebug33.out
    # su - padmin
    $ ioslevel
    $ uname -LMm
    $ export CLI_DEBUG=33
    Run offending command to reproduce error
    $ export CLI_DEBUG="" (to disable debugging mode)
    $ exit (padmin)
    # exit (script)

Back Up Your HMC or Get Ready to Rebuild

Edit: Backup everything. Then test it.

Originally posted December 13, 2016 on AIXchange

A customer had an HMC issue. There were no backups, so the HMC had to be reinstalled from scratch. There wasn’t any documentation either, meaning that the customer had no idea what the network settings should be.

Stop reading for a moment and put yourself in this uncomfortable picture. Do you have backups of your HMC? Is your network information well-documented? Is it documented at all? If you had to rebuild your HMC right now, could you?
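
If the answer is no, note that a console data backup can be written to NFS right from the HMC command line. A minimal sketch (the NFS server address and export path are examples; check the bkconsdata options on your HMC release):

    hscroot@hmc01:~> bkconsdata -r nfs -h 192.168.1.10 -l /exports/hmcbackups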

Luckily for my customer, they had a simple environment, and their HMC was onsite so they could visually inspect their equipment. One network cable was directly plugged into the HMC port of their single POWER8 system; another connected directly from their HMC into their switch. Knowing this, it was simple enough to determine which port should be the private network and which should be the open network.

What about you? Do you know which physical cables from your HMC are used for which network in your environment?

Back to our story: Configuring the open network was straightforward, and my customer was soon able to use the GUI to connect to the HMC. Once the firewall settings were fixed and the SSH port opened, they could log in to the HMC via the command line.

Their next issue was getting the HMC to recognize their managed system. They picked a range of IP addresses to use for their DHCP server, but how would they know which IP address was in use by the managed system?

After looking over this documentation, they ran lshmc -n -F clients. That provided the IP address that had been served out by their DHCP server.
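
For illustration, the command and the sort of output to expect look like this (the address shown is made up):

    hscroot@hmc01:~> lshmc -n -F clients
    192.168.128.2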

From there, it was a snap to add the managed system, since they knew the address it was using. But what about the password? No one in the group was around when it was originally set up, so no one knew the password for the managed system.

Again, ask yourself: Do you know the passwords for your managed system, ASMI, etc.?

A few failed guesses (naturally) resulted in an authentication failure message. The HMC went into a firmware password locked state. With nowhere else to turn, they did a web search and found this IBM Support document with this helpful bit of information:

       Note: The default password for user admin is admin.

So they tried “admin.” Unsurprisingly, that default was still in place. The customer was able to connect to their managed machine. Everything looked as they expected.

I know this is basic, but even the basics can mess you up if you haven’t thought about them, particularly if you weren’t the one who set up your HMC.

Should you ever find yourself in a similar predicament, here’s a pretty good HMC setup reference.

Bare Metal Recovery Options for Linux

Edit: Still a good question.

Originally posted December 6, 2016 on AIXchange

I recently wrote about backups, though I didn’t get into the bare metal recovery options for Linux.

I wrote about this topic in 2005, and here I am, 11 years later, still wondering: where is my integrated bare metal recovery mechanism for Linux? The answer is still going to include Storix, though there’s another utility that may also work for you. It’s called Relax-and-Recover:

 “Set up and forget nature
   • designed to be easy to setup
   • designed to require no maintenance (e.g. cron integration, nagios monitoring)

Two-step recovery, with optional guided menus
   • disaster recovery process targeted at operational teams
   • migration process offers flexibility and control

Bare metal recovery on dissimilar hardware
   • support for physical-to-virtual (P2V), virtual-to-physical (V2P)
   • support for physical-to-physical (P2P) and virtual-to-virtual (V2V)
   • various virtualization technologies supported (KVM, Xen, VMware)”

Relax-and-Recover is a no-cost product, available through a General Public License (although the developers are happy to take donations or sponsorships, and they do offer support contracts).

Check out the quick start guide and these usage scenarios:

“Relax-and-Recover will not automatically add itself to the Grub bootloader. It copies itself to your /boot folder.

To enable this, add

    GRUB_RESCUE=1

to your local configuration.

The entry in the bootloader is password protected. The default password is REAR. Change it in your own local.conf:

    GRUB_RESCUE_PASSWORD="SECRET"

The most straightforward way to store your DR images is using a central NFS server. The configuration below will store both a backup and the rescue CD in a directory on the share.

    OUTPUT=ISO
    BACKUP=NETFS
    BACKUP_URL="nfs://192.168.122.1/nfs/rear/"

Backup integration
Relax-and-Recover integrates with various backup solutions. Your backup software takes care of backing up all system files, Relax-and-Recover recreates the filesystems and starts the file restore.

Currently Bacula, Bareos, SEP Sesam, HP DataProtector, CommVault Galaxy, Symantec NetBackup, EMC NetWorker (Legato) and IBM Tivoli Storage Manager are supported.

The following /etc/rear/local.conf uses a USB stick for the rescue system and Bacula for backups. Multiple systems can use the USB stick since the size of the rescue system is probably less than 40M. It relies on your Bacula infrastructure to restore all files.

    BACKUP=BACULA
    OUTPUT=USB
    OUTPUT_URL="usb:///dev/disk/by-label/REAR-000"
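
With configurations like those in place, the quick start boils down to one command, run as root; a sketch based on the ReaR documentation:

    # rear -v mkbackup    # builds the rescue image and writes the backup to the configured destination

To restore, you boot the generated rescue image on the target machine and run “rear recover”.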

I haven’t tried this tool yet, but it looks interesting. Apparently there are even ppc64le and ppc64 images.

How do you go about a bare metal recovery of your Linux partitions?

Tech Terms Defined Redefined

Edit: Someone needs to update the IBM Jargon file.

Originally posted November 29, 2016 on AIXchange

I love clever definitions of technology-related terms. In the past I mentioned the IBM Jargon and General Computing Dictionary (which will be 30 years old soon).

Here’s a similar list that — while it’s directed toward an academic audience — is more up-to-date. Don’t worry, you’ll recognize all these terms:

“Analytics, n. pl. The use of numbers to confirm existing prejudices, and the design of complex systems to generate these numbers.

App, n. An elegant way to avoid the World Wide Web.

Asynchronous, adj. The delightful state of being able to engage with someone online without their seeing you, while allowing you to make a sandwich.

Badges, n. pl. The curious conceit that since nobody likes transcripts or degrees, the best thing to do is to shrink them into children’s sizes that nobody recognizes. (see Open Badges)

Best practice, n. An educational approach that someone heard worked well somewhere. See also “transformative,” “game changer,” and “disruptive.”

Chromebook, n. A device that recognizes that the mainframe wasn’t such a bad idea after all.

Cloud, n. 1. A place of terror and dismay, a mysterious digital onslaught, into which we all quietly moved. 2. “Just other people’s computers.”

Counsel, n. Well paid, well trained in neither education nor technology, and rules decisively on (and against) both.

Forum, n. 1. Social Darwinism using 1980s technology.

Infographic, n. An easy way to avoid reading and writing.

Powerpoint, n. 1. A popular and low cost narcotic, mysteriously decriminalized.

Shadow IT department, n. A mysterious alliance that does a lot of work on campus. It seems to include little start-up companies like Google, Amazon, Apple, Microsoft, and others.”

You’ll find many more definitions in that link. Or just check out the Original Hacker’s Dictionary or the Business Jargon Dictionary.

I am sure that there are other tech dictionaries written in a similar vein. Please share your favorites in comments.

Adjusting to a Linux World

Edit: I still love AIX.

Originally posted November 22, 2016 on AIXchange

I use Linux, and have for many years. I run Linux on Power hardware, which is something any Linux enterprise user should consider. Still, I prefer to live in the world of AIX. I understand that Linux is a fixture now, but there are features and capabilities that I take for granted with AIX that aren’t there (at least not yet) with Linux.

This article is a few years old, but it gets at the challenges of working simultaneously with open source and proprietary operating systems. (Incidentally, the author cites two books — “The Cathedral and the Bazaar,” by Eric Raymond, and “The Design of Design,” by Frederick P. Brooks — that would aid in your understanding of what he’s talking about):

“Quality happens only when someone is responsible for it. …

Getting hooked on computers is easy—almost anybody can make a program work, just as almost anybody can nail two pieces of wood together in a few tries. The trouble is that the market for two pieces of wood nailed together—inexpertly—is fairly small outside of the “proud grandfather” segment, and getting from there to a decent set of chairs or fitted cupboards takes talent, practice, and education.”

I enjoy the author’s discussion of the bloat and prereqs and dependencies that exist in modern systems:

“… the map helpfully tells you that if you want to have www/firefox, you will first need to get devel/nspr, security/nss, databases/sqlite3, and so on. Once you look up those in the map and find their dependencies, and recursively look up their dependencies, you will have a shopping list of the 122 packages you will need before you can get to www/firefox. Here is one example of an ironic piece of waste: Sam Leffler’s graphics/libtiff is one of the 122 packages on the road to www/firefox, yet the resulting Firefox browser does not render TIFF images. For reasons I have not tried to uncover, 10 of the 122 packages need Perl and seven need Python; one of them, devel/glib20, needs both languages for reasons I cannot even imagine.

libtool’s configure probes no fewer than 26 different names for the Fortran compiler my system does not have, and then spends another 26 tests to find out if each of these nonexistent Fortran compilers supports the -g option.

That is the sorry reality of the bazaar Raymond praised in his book: a pile of old festering hacks, endlessly copied and pasted by a clueless generation of IT “professionals” who wouldn’t recognize sound IT architecture if you hit them over the head with it.

One of Brooks’s many excellent points is that quality happens only if somebody has the responsibility for it, and that “somebody” can be no more than one single person—with an exception for a dynamic duo.”

Who is responsible for Linux? Which distribution do you even consider to be “Linux”? Who’s in charge of making the switch to systemd, or making sure that Linux distributions don’t break as a result of that change? Who do you call when they do break?

Who coordinates between the different distributions and vendors, and how do you know which one is right for you? Are you going with a commercially supported product like Red Hat, SUSE or Ubuntu? How about a community-supported flavor like CentOS or Fedora?

With AIX, you know who’s responsible. It’s the project managers at IBM, who take input from customers, prioritize what goes into the next release, conduct proper testing to ensure that the large enterprises that rely on AIX will continue to have stable environments, and allow for a significant amount of backwards compatibility while introducing new features:

“More than once in recent years, others have reached the same conclusion as Brooks. Some have tried to impose a kind of sanity, or even to lay down the law formally in the form of technical standards, hoping to bring order and structure to the bazaar. So far they have all failed spectacularly, because the generation of lost dot-com wunderkinder in the bazaar has never seen a cathedral and therefore cannot even imagine why you would want one in the first place, much less what it should look like.”

This is the kind of thing that I notice. I get that different sets of programmers and designers will have different opinions about what’s important and necessary, but we’re talking about two different worlds here. How many Linux developers/open source users have spent any time working with mainframes or commercial UNIX operating systems? If all you’ve ever seen is Linux, Windows or macOS, how can you even begin to understand what those of us who manage enterprise systems need to effectively do our jobs?

It’s not that I’m not willing to change. For instance I now realize that I shouldn’t think of my critical systems as friendly pets that require special care and feeding. But the problem, for me, is the significant differences in philosophy and design between those who use enterprise systems and those who use Linux/open source solutions.

With AIX, I have a built in logical volume manager (LVM) that allows me to easily import and export volume groups and resize filesystems. Or consider the capability to migrate rootvg to another disk and run bosboot while the system is up and running. This is not always an option on other operating systems. It can be frustrating to learn that you cannot easily resize partitions or filesystems, or find that default filesystems have changed, and not always for the better. With AIX, I easily find new hardware on my running system with cfgmgr, I check and change the attributes of my adapters without writing a value to /proc, and I dynamically remove CPUs and memory. And being able to choose whether I run one adapter virtually with VIO server and another adapter physically by assigning the card to my LPAR is a nice touch.
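
To make that concrete, here are the kinds of one-liners I mean; the volume group, disk, filesystem and adapter names are examples:

    # cfgmgr                     # discover newly attached devices on a running system
    # extendvg datavg hdisk2     # add a disk to a volume group online
    # chfs -a size=+2G /data     # grow a filesystem while it is mounted
    # lsattr -El ent0            # inspect adapter attributes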

Now think about multibos and alt_rootvg and alt_disk_install and the power of being able to boot from hdisk1 after a migration, and how, if there are issues, you can boot from your original OS that is still on hdisk0 and try again later. Some Linux distributions don’t even allow migration from one version of the OS to another; it’s suggested you reinstall. With AIX, I’ve upgraded the same systems for years with no problems.
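
For example, cloning rootvg to a spare disk before a migration, and controlling which disk the system tries first, is this simple (disk names are examples):

    # alt_disk_copy -d hdisk1            # clone the running rootvg to hdisk1
    # bootlist -m normal hdisk1 hdisk0   # boot the clone first, fall back to hdisk0
    # bootlist -m normal -o              # verify the boot list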

AIX enjoys deep integration with the hardware, and performance tuning is well understood. There’s built-in bare metal backup and restore of the operating system, built-in error reporting at the hardware level (try finding which component needs to be replaced on your x86 hardware while your system is running; maybe your light path diagnostics will work, but maybe not), and rock-solid, well considered virtualization solutions at the hardware level.

Last but not least, as of AIX 7.2, there is Live Update.

Yes, the shift away from proprietary UNIX is happening. Shift isn’t even the word. Linux is learning, absorbing and getting smarter. Linux is eating the world. But as we continue down this path, I’ll continue to think of the features I already have or those that I’ll have to give up, at least until the Linux phenomenon catches up.

Another Case for Backups

Edit: Still good stuff.

Originally posted November 15, 2016 on AIXchange

As I’ve mentioned, the AIX mailing list is a great place to go to pose questions and receive good answers from other AIX pros. While traffic is typically pretty light, I recently came across an interesting thread about the need to take care when editing critical files:

“A coworker was editing the /etc/passwd file on an LPAR on our P720 server. When he tried to save the file, emacs hiccupped and he ended up putting an empty /etc/passwd file in its place.

Now, with no open sessions to the LPAR, no one can access the LPAR. This LPAR is one of our primary NFS servers. So far, only a few items have stopped working, SAMBA being one. But in general, the hundreds of AIX/Linux/Unix clients in our R&D group are still able to reach the NFS mounts (at least the ones they had automounted when this all happened).

I went to the HMC and got a terminal/console there, but still need a password to get in.

Any ideas as to what I might do to crack this nut and get into the box?”

Put on your thinking cap for a moment. How would you get out of this pickle?

The first two replies offer great suggestions:

“What is your backup product? You may be able to restore it using the agent already running on the system.

New logins will be impossible. You’ll have to leverage something that already has access.”

The second simply consists of a link to this IBM Knowledge Center doc, plus the following:

“once in

    echo 'root:!:0:0::/:/usr/bin/ksh' > /etc/passwd
    chmod 644 /etc/passwd
    sync;sync;sync;reboot

The next day, the solution was posted:

“Thank you for all your suggestions. It brought back to heart why I so loved AIX and the support I can get (and occasionally give).

Here is the solution:

We had a backup of the system, but the tape was offsite at our DR site (Some cave under Lake Erie, or the like).

Then it hit me, I do not need “the” /etc/passwd file from this LPAR (last backed up in a Full backup in August!). I just need “a” /etc/passwd file. ANY /etc/passwd file. Or just a one line passwd file I could make myself.

I just needed the back-door of NetBackup to place the file there.

By now I had the NetBackup guys on the line and in Priority 1 mode, so I asked them to pull the /etc/passwd file off of the twin LPAR on the other P720 we have. Then restore it to this LPAR. In less than two minutes, we were back in business!! Then I had a copy of the real passwd file, which is nearly identical to the one from the other LPAR, and I put that in place.”

I’ll also cite the reply to that, because it’s an awesome punch line:

“Having a close call is a good time to review your backups and your bootable media.

If you don’t have NIM then it’s critical to keep media at close level to what you are running available near the machine.”

The rest of the discussion covers things like making sure your NIM server is ready to go, along with some more details around what went wrong with editing /etc/passwd in the first place. It turns out they did have a backup of /etc/passwd, but since they couldn’t log into the machine at all, they were unable to copy that saved file.

Indeed, the best time to ensure your machine is backed up is before a disaster strikes. Run through this checklist:

  • Do you have a current viosbr?
  • Have you run backupios?
  • Do you have a current accessible mksysb of your VIO server? Do you have current mksysbs of your LPARs? Do you have a local Alt Disk Copy of rootvg?
  • Is your HMC backup current? Do you have a mksysb of your NIM server?
  • If you take backups, that’s great. But have you tested them? Are they accessible if your computer room burns down? I wrote about this more than 10 years ago, yet here we are, still needing to back up our machines and still needing to know how to restore them.
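
To turn that checklist into commands, the basics look roughly like this (file names and paths are examples; viosbr and backupios run as padmin on the VIO server, mksysb as root on the LPAR):

    $ viosbr -backup -file viosconfig
    $ backupios -file /mnt/vios.mksysb -mksysb
    # mksysb -i /backup/lpar1.mksysb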

We’re all busy, but it’s essential to take time now to figure out how you can recover your systems.

Building Virtual Environments

Edit: Some links no longer work.

Originally posted November 8, 2016 on AIXchange

This IBM developerWorks page offers helpful information about building virtual environments. While it hasn’t been updated in a while, the content is certainly relevant.

There are four sections, covering “pre-virtualization,” planning and design, implementation, and management and administration. The information that follows is excerpted in bits and pieces:

“Virtualization is a large subject, so this section will assume you know the basics and you have at least done your homework in reading the two Advanced POWER Virtualization Redbooks. …

Skill Up on Virtualization
* You need to invest time and practice before starting a Virtualization implementation because misunderstandings can cost time and effort to sort out – the old saying “do it right the first time” applies here.
* There is no quick path. …

Assess the virtualization skills on hand
* It is not recommended to start virtualization with only one trained person due to the obvious risk of that person becoming “unavailable”
* Some computer sites run with comparatively low-skilled operations staff and bring in consultants or technical specialists for implementation work – in which case, you may need to check the skill levels of those people. …

Practice makes perfect
* In the pSeries, the same virtualization features are available from top to bottom — this makes having a small machine on which to practice, learn and test a realistic proposition. The current bottom of the line p505 is available at relatively low cost, so if you are preparing to run virtualization on larger machines, you can get experience for a tiny cost.
* Also, in many sites machines in the production computer room have to undergo strict “lock down” and process management – a small test machine does not have to be run this way, and I have seen system administrators and operations run a “crash and burn” machine under their desk to allow more flexibility.”

There’s a nice list of different scenarios that cover small machines, “ranches” of machines, production, etc. The last list mentions a dual VIO server, although I would argue that is the rule and not the exception:

“When to go Dual Virtual IO Server and when not to?

This is impossible to answer, but here are a few thoughts:

* The Virtual IO Server is running its own AIX internal code like the LVM and device drivers (virtual device drivers for the clients and real device drivers for the adapters). Some may argue that there is little to go wrong here. Adding a second Virtual IO Server complicates things, so only add a second one if you really need it.
* Only add a second Virtual IO Server for resilience if you would normally insist on setting up a high-availability environment. Typically, this would be on production machines or partitions. But if you are going to have an HACMP setup (to protect from machine outage, power supply, computer room or even site outage), then why would you need two Virtual IO Servers? If the VIO Server fails, you can do an HACMP failover to the other machine.
* If this is a less-critical server, say one used for developers, system test and training, then you might decide the simplicity of a single VIO Server is OK, particularly if these partitions have scheduled downtime for updates to the Virtual IO Server. Plan on scheduled maintenance. Also note the VIO Server and VIO Clients start quickly, so the downtime is far less than with older standalone systems.”

VIO server sizing info is one area where the content is old. Nigel Griffiths has updated information, noting, among other things, that the VIO server must be monitored as workloads increase.

Back to the original link. This is found in the section headed, “Common Mistakes”:

“Priorities in Emergencies
Network I/O is very high priority (dropped packets due to neglect require retransmission and are thus painfully slow) compared to disk I/O, because the disk adapters will just finish and sit and wait if neglected due to high CPU loads. This means that if a Virtual IO Server is starved of CPU power (something that should be avoided), it will deal with the network as a priority. For this reason some people consider splitting the Virtual IO Server into two: one for networks and one for disks, so that disks do not get neglected. This is only a worst case scenario and we should plan and guarantee this starvation does not happen. …

Virtual IO Server below a whole CPU
For excellent Virtual IO Server responsiveness, giving the VIO Server a whole CPU is a good idea, as it results in no latency waiting to get scheduled on to the CPU. But on small machines, say 4 CPUs, this is a lot of computer power compared to the VIO client LPARs (i.e. 25%). If you decide to give the VIO Server say half a CPU (Entitled Capacity = 0.5), then be generous: never make the VIO Server capped, and give it a very large weight factor.”

These excerpts are from the “Implementation” section:

“* In most large installations the configuration is an iterative process that will not quite match the initial design so some modification may have to be made. …
* Also, opportunities may appear to add the flexibility of a pool of resources that can be assigned later, once real life performance has been monitored for a few weeks.”

Finally, from the “Management/Administration” section:

“Maintain VIO Server Software
* New and very useful function appear in the latest VIO Server software which makes updating it worthwhile.
* Take careful note that this may require firmware updates too; it is worth scheduling these in advance of VIO Server software updates.
* There are also fixes for the VIO Server to overcome particular problems.

It is worth making a “read only” HMC or IVM user account for people to take a look at the configuration and know they can’t “mess it up”.

I often get people claiming that their Virtual I/O resource is not available when they create a new LPAR, and 90% of the time it is due to mistakes on the HMC. The IVM automates these setup tasks and is much easier. Also, recent versions of the HMC software cross-check that the VIO Server and VIO client virtual resources all match up.

It is strongly recommended that the HMC, system firmware and VIO Server software are all kept up to date to make the latest VIO features and user interface advances available and to remove known and fixed problems with early releases.”

At the very end of the document there are a few examples of how to get configuration data from the HMC, create an LPAR using the command line, and create LPARs using a configuration file.
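
As a taste of that command-line work, listing managed systems and their LPARs from the HMC shell looks like this (the managed system name is an example):

    hscroot@hmc01:~> lssyscfg -r sys -F name,state
    hscroot@hmc01:~> lssyscfg -r lpar -m Server-8286-42A-SN1234567 -F name,lpar_id,state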

Again, some of this information is dated. But overall, there’s lots of good advice.

My Hosted PowerSC Trial Session

Edit: Some links no longer work.

Originally posted November 1, 2016 on AIXchange

Did you see this AIX EXTRA article about PowerSC?

“PowerSC 1.1.5 will bring us a new user interface that makes the security compliance aspect of the product significantly easier to manage. Many Power Systems clients need to adhere to different security compliance standards for their particular industry. Examples include COBIT, PCI, HIPAA, NERC, and DoD.

PowerSC, and previously aixpert, have always been great tools in managing compliance with these standards. They took the rules and requirements of the different standards and applied them to the AIX operating system, so you didn’t have to. But in order to manage these profiles, users had to log in and execute commands on each machine individually.”

The piece also mentions an IBM hosted trial period, which ended last week. I was among the users who took part in the trial, and while I can’t say a lot about it due to the confidentiality agreement I signed, I will tell you that I liked what I saw with PowerSC. I also like the direction IBM could be taking with this product.

The process to get access to the environment was very easy. We set up a mutually convenient time via email, and at that time we got on a shared screen session together. I was given control of the session, and I was asked about what I saw, what I liked and what I didn’t like. I performed tasks with the product so we could see how intuitive the process was.

After running the product through its paces, I provided my feedback. Again, I think you’ll like it once you get your hands on it. I imagine this new iteration with a GUI might make some of you more motivated to implement PowerSC in your environment.

I’d love to hear from anyone else who participated in the trial. I’m also curious if you knew about the trial before reading this. I previously tweeted about it (@robmcnelly), and so did @AIXmag. But I always wonder how you get your information, whether it’s from this blog, Twitter or the AIX EXTRA email newsletter.

And if you missed out on the trial, I hope you’ll take advantage of chances to test out other IBM products in the future. It’s a free opportunity to learn about tools that could really help you.

AIX Keeps Making History

Edit: I still like to remember the good old days.

Originally posted October 25, 2016 on AIXchange

I’m a fan of history, especially technology-related history. So as I get older, I like to reminisce about “the good old days.” Like when I attended Desert Code Camp 2016 earlier this month.

The event, held at Chandler-Gilbert Community College in Chandler, Ariz., was great. Sessions were focused toward developers, including one that covered IBM Watson and Bluemix.

What got me reminiscing is the fact that I actually attended this school back in 1987, shortly after it opened. It was fun to walk around the campus and see the growth and change that’s taken place over nearly 30 years. While several of the original buildings and computer labs still stand, it was enough of a change to show me that life marches on.

Adding to that weekend’s retro feel, The Retro Wagon filled a room with classic hardware: everything from Altairs, teletypes, Commodore 64s, TRS80s and Apple II computers to slide rules and acoustic couplers. It was like walking into a time warp. If you follow me on Twitter (@robmcnelly) you might have seen photos of some vintage machines. Otherwise, check out The Retro Wagon Twitter feed, which is available from their homepage.

It’s amazing to think that when I was in college and a lot of that technology was being unveiled, my favorite operating system was also part of that era’s innovation. Yes, AIX turns 30 this year. If you’re wondering what AIX was like at its inception, read what some of the key people involved in its creation had to say in this IBM Systems Magazine 20-year retrospective from 2006. There are some great memories, along with some names that you may remember from conferences you’ve attended over the years.

But what about the rest of the story? What can we say about the past 10 years of AIX?

One highlight that immediately comes to mind is the latest release of AIX: AIX 7.2 TL1 allows customers to upgrade their operating system with no downtime. Think of what we can do on the fly now: we can patch one VIO server, reboot it and patch the redundant VIO server in the pair — and the VIO clients shouldn’t even notice. We can non-disruptively update system firmware in many cases. We can take minor outages to patch our PowerHA clusters. And now we can patch our OS without an outage. I see Ubuntu is working on something called Livepatch, but I wonder how long it will be before another operating system can be patched on the fly the way AIX can.

The past 10 years of AIX have also given us Live Partition Mobility, where we can move running workloads between POWER6, POWER7 and POWER8 servers.

We have PowerHA, built into the OS with Cluster Aware AIX (CAA).

We have shared storage pools. We have multiple shared processor pools.

We have more granularity when creating virtual machines.

We have CAPI and the capability to have I/O cards talk directly to the CPU.

We have POWER8 processors that give us up to 8 threads per core.

We have active memory expansion and we have WPARs. We have the capability to run AIX 5.2 or AIX 5.3 in a WPAR.

Then there are the products that come with AIX Enterprise Edition, including PowerSC, PowerVC, Cloud Manager and the Dynamic System Optimizer.

There’s plenty more that I didn’t mention. The point is that the past 10 years has produced substantial improvements that have made AIX a more powerful operating system with more advanced virtualization capabilities and more powerful hardware. And these improvements have made our jobs easier.

It’s nearly impossible to imagine where AIX will be in another 10 years, but as much as I like looking back, I’m even more excited about what’s ahead. Just think, 2026 will be the 40th anniversary of AIX. What will we be able to do then? What won’t we be able to do?

Digging into Last Week’s IBM Announcements

Edit: Some links no longer work.

Originally posted October 18, 2016 on AIXchange

Last week IBM announced new hardware models, along with new features and functionality within AIX and IBM i. I believe IBM is once again showing a strong commitment to the Power brand, and, by providing the capability to update your operating system on the fly, giving customers another reason to choose Power Systems.

Here’s the announcement summary. Some highlights include AIX 7.2 enhancements, including the capability to perform live updates. Previously we could use AIX Live Update only for interim fixes, but now we can perform live updates of the AIX operating system itself:

  • Introduced in AIX 7.2, AIX Live Update is extended in Technology Level 1 to support any future update without a reboot, with either the geninstall command or NIM.
  • The genld command is enhanced to list processes that have an old version of a library loaded so that processes can be restarted when needed in order to load the updated libraries.

For more about this new feature, read this from IBM developerWorks, and watch this. Before you make that jump to some other operating system, Live Update might give you pause. What other operating system lets you update it while it’s running?
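
Based on the geninstall enhancement above, kicking off a live update of downloaded fixes should look roughly like this; a sketch only (the update directory is an example, and Live Update has prerequisites such as spare disks and a populated lvupdate.data file, so check the documentation first):

    # geninstall -k -d /tmp/updates all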

There’s also the capability to use large pages with Active Memory Expansion (AME). According to the previously referenced AIX 7.2 announcement letter, “the 64k pages can be configured/compressed in an LPAR, and the amepat command is enabled for 64k page modeling.”

There’s an enhancement to the AIX Toolbox for Linux. IBM will commit to maintaining it with current levels, along with enhancements to yum and updating the USB device library. Again, this is from the AIX 7.2 announcement letter:

“To facilitate the installation of supplementary open source software packages for AIX, IBM introduces the yum package management tool for AIX. Along with yum, there is a mandatory update of the RPM Package Manager to version 4.9.1.3. In this update, new function enables yum to perform automatic open source software dependence discovery and update maintenance for RPM-based open source software installed on your system.

A new policy maintains and addresses open source security vulnerabilities in selected key open source software packages. IBM expands its commitment to keep the key open source packages updated to reasonably current levels….

The cloud-init utility and all of its dependencies are now available on the AIX Toolbox for Linux Applications website. With yum, you can easily install cloud-init, and licensed AIX users receive support.

The libusb development library for USB device access is added to the AIX Toolbox for Linux Applications.”
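
For anyone who hasn’t tried it, the day-to-day experience is the familiar one from Linux. A quick sketch, assuming yum has already been installed from the Toolbox’s yum bundle:

    # yum install cloud-init
    # yum update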

There’s a new E850C server to go along with the E870C and the E880C:

“The Power E850C server (8408-44E) is the latest enhancement to the Power System portfolio. It offers an improved 4-socket 4U system that delivers faster POWER8 processors up to 4.22 GHz, with up to 4TB of DDR4 memory, built-in IBM PowerVM virtualization, and Capacity on Demand. It integrates cloud management to help clients deploy scalable, mission-critical business applications in virtualized, private cloud infrastructures.

Like its predecessor Power E850 server, which was launched in 2015, the new Power E850C server utilizes 8-core, 10-core, or 12-core POWER8 processor modules. But the E850C processors are 13% – 20% faster and deliver a system with up to 32 cores at 4.22 GHz, up to 40 cores at 3.95 GHz, or up to 48 cores at 3.65 GHz, and utilize DDR4 memory. A minimum of two processor modules must be installed in each system, with a minimum quantity of one processor module’s cores activated.”

There are new and improved HA, DR, and backup/recovery solutions, including a new PowerHA interface and dashboard. This link cites Live Partition Mobility resiliency improvements and simplified remote restart enhancements that provide for automated policy-based VM remote restart and VM remote restart when the system is powered off. It also mentions the HMC:

HMC V8.8.6 has been enhanced to include support for the following:

  • Ability to export performance and capacity data collected by the HMC to a CSV formatted flat file for use by other analysis tools
  • Reporting on energy consumption, which can be used either by the REST APIs or by the new export facility
  • Dynamic setting of the Simplified Remote Restart VM property, which enables this property to be turned on or off dynamically.

The PowerHA System Mirror 7.2.1 announcement letter specifically covers the new GUI “that enables at-a-glance health monitoring for a PowerHA on AIX cluster or group of clusters, easy to digest view of PowerHA cluster environment, immediate notification of health status, click-on event status, and intelligent filtering of relevant event logs.”

We’re not done yet. Enhanced I/O and server options include:

“DDR4 CDIMM memory options provide energy savings and DDR4 configuration options:
Smaller-capacity CDIMMs join the existing large-capacity CDIMM for IBM Power® E880, E880C, E870, and E870C servers.

For IBM Power S812L, S814, S822, S822L, S824, and S824L servers, a full set of DDR4 DIMMs is announced that match existing DDR3 capacities.

Capacity Backup for Power Enterprise Systems is a new simplified and cost-effective HA/DR offering that replaces the existing Capacity Backup for PowerHA offering. It is now available for the IBM Power E870, E880, E870C, and E880C servers.”

PowerSC 1.1.5 will feature a new interface:

“A new compliance user interface where users can manage compliance profiles across their environment, create custom profiles, and groups of endpoints.

Compliance automation profile updates to Payment Card Industry (PCI) version 3.1 and North American Electric Reliability Corporation (NERC) version 5.

Trusted Network Connect now supports patch management of critical security fixes for open source packages on AIX® base for packages that have been downloaded from the AIX toolbox or other web download sites for AIX Open Source Packages.”

If you manage SANs, you should know about the IBM Network Advisor V14. A key enhancement is an “at-a-glance summary of all discovered b-type devices, including inventory and event summary information used to identify problem areas and help prevent network downtime.”

And for those of you who also manage IBM i, 7.2 TR5 has also been announced, along with 7.3 TR1.

Believe it or not, there’s more I haven’t covered, so dig into the links.

The Best Documentation is Well-Organized

Edit: Still some good websites to visit.

Originally posted October 11, 2016 on AIXchange

It’s once again time to nominate IBM Champions. You can do so here.

Seeing that notice reminded me of the IBM Champions event I attended in Austin, Texas, some months back. I’d known a lot of these folks for years, but I was meeting a few of them for the first time. Balazs Babinecz was one of those people I’d never been face to face with. Like me, Balazs has a blog:

“This blog is intended for anyone who is working with AIX and encountered problems and looking for fast solutions or just want to study about AIX. This is not a usual blog, it is not updated every day. I tried to organize AIX related subjects into several topics, and when I find new info/solutions/interesting stuff I will add it to its topic. You can read here about many things of the world of AIX. (NIM, Storage, Network, VIO, PowerHA, HMC, Performance Tuning…)

The structure of each subject is very similar. First I try to give a general overview about a topic with the most important terms and definitions. This is followed by some important/useful commands, what are probably needed for everyday work. At the end, there are some common situations with solutions which could come up during the daily work of an administrator.

I tried to keep it as simple as possible, so without any further instructions you should be able to navigate through this site very easily.

All of these materials have been gathered by me through my experience, IBM Redbooks, forums and other internet sources. It means not all of them is written by me! If I find an interesting data and I think it is valuable, I publish it on this blog. (Basically this blog is my personal viewpoint about AIX related stuff, and it is not an official IBM site.) Most of the things have been tested successfully but it can occur that you encounter typos, missing steps and erroneous data (I cannot guarantee everything works perfectly), so please look and think before you act.”

It was fun getting a chance to meet Balazs, since I’ve frequented his blog over the years. It’s very well organized, with links to information about filesystems, the logical volume manager (LVM), HMC, networks, NIM, performance, storage and backup, install, PowerHA, PowerVM and more. Many of the topics include a section called “Basics,” which provide a good, quick overview of a particular topic.

I listed some other useful AIX resources here, and I should add William Favorite’s AIX QuickSheet to that list.

Balazs’s blog in particular reminds me of the advantages of a simple, easy-to-navigate web design, where you can start with broad topics and then drill down into specific details. It may seem like a minor thing, but it matters. Read through the comments, and you’ll see that many other admins agree.

A Performance Analysis Tool for Linux

Edit: Did you know this tool exists? Some links no longer work.

Originally posted October 4, 2016 on AIXchange

I often hear from people who want to know how to conduct in-depth performance analysis on Linux. These folks are new to the platform and wonder why they can’t find many of the tools (PerfPMR, for instance) that they take for granted with AIX.

If you find yourself in this situation, you should know about a performance-focused script called the Linux Performance Customer Profiler Utility (lpcpu), which gathers data from both x86- and Power Linux-based systems:

“This script captures a lot of potentially interesting performance data (profile information, system information, and some system configuration information) on a single pass approach. Gathering all of this information at once allows the context to be understood for the performance profiling data to be analyzed.

  • The script will check to be sure you have the basic tools installed on your system.

This script takes advantage of all of the “normal” performance tools used on Linux.

  • iostat
  • mpstat
  • vmstat
  • perf
  • meminfo
  • top
  • sar
  • oprofile

In addition, relevant system information is gathered, with the profiler output, into a single tarball saved on your system. By default, the file is saved in /tmp.

The script is designed to run on both x86 and Power servers, the focus being SLES and RHEL distros. It should work on OpenSUSE and Fedora as well.

The script creates a zipped tar-ball placed in /tmp by default. You can un-zip the file and poke around the data files captured to learn more about your system. Typically, the zipped tar-ball is returned to performance analysts at IBM who can help with problem determination and ideas.

In 95% of our interactions with product teams and customers, there is generally something easy to address first. There are naturally many other in-depth tools we might leverage in subsequent data runs, but first we want to be sure everyone is “on the same page.”

Here’s more on testing the script and processing the results:

“This checks for errors and attempts to run all of the default profilers. A typical error will be that the profiler being run is not installed. Obviously, in that case the profiler should be installed, or if not available, you can override the profiler list to skip that tool (but keep in mind that the data gathered may not be as useful).

You do need a number of rpm packages installed on your system.

  • sysstat
  • profile (this rpm is on the SDK image for SLES)

The script does assume the Linux kernel with symbol information is available. This depends on your distro and version since they are generally packaged differently. The script does parse and check all of the common correct places to find vmlinux (the unstripped kernel). On RHEL 6.2, you will need the kernel-debuginfo*.rpm packages installed.

This script is not targeted or focused on Java applications, but it does serve well as the first pass data gathering tool.

Typically, the workload being tested reaches a fairly steady state (has settled down) and performance data can be collected from the system.

In this case, you can run the script for the default 2 minutes, and the script will profile the information and gather everything together.

Along with the script, we have the ability to format the results into a series of html pages and charts.

Previous releases of LPCPU have required an x86 system for producing charts, however the latest release removes this requirement, unless you would prefer to force the old behavior to be used (see the README for details if you would like to force the old behavior).
Take the lpcpu.tar.bz2 file, and unpack it.

    # cd /tmp
    # tar -jxf lpcpu.tar.bz2

Copy the data tarball to the system you would like to host your data on (this could be the test system or a workstation). Unpack the tarball and issue the following commands:

    # pwd
    /var/www/html
    # tar -jxf lpcpu_data.mysystem.default.2012-04-04_1147.tar.bz2
    # cd lpcpu_data.mysystem.default.2012-04-04_1147/
    # ./postprocess.sh /tmp/lpcpu/
    <lots of messages>
    # ls -l summary.html

View that summary.html file in a browser. Depending on your browser, you may need a web server to make use of the new charting abilities. A Python script is included for running a simple web server should you not have one available. If you cannot run the Python script and do not have a web server available, please fall back on the old charting method (see the README for details).”
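
Pulling the collection side together, a typical run on the test system looks roughly like this; the duration option is my assumption based on the default two-minute run described above, so verify it against the README in your copy:

    # cd /tmp
    # tar -jxf lpcpu.tar.bz2
    # cd lpcpu
    # ./lpcpu.sh duration=120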

If you’ve tried out the tool, I’d like to hear from you. What other things would you like to see it do in your environment?

Removing a Static Route from the ODM

Edit: Still good information.

Originally posted September 27, 2016 on AIXchange

I was recently asked how to remove a static route from the AIX Object Data Manager (ODM), so I pointed my customer to this techdoc.

Although this information is pretty basic, many times when we revisit the basics we’re reminded of something we already knew. And sometimes, we even learn new things.

Some information from the techdoc follows. However, I encourage you to open the above link, which also has images and output:

“Question
How do I remove a static route from the ODM?

Answer
When you are trying to remove a static route, it’s best to use smitty or a chdev command to get rid of it. The route delete command removes it only from the running kernel, not from the ODM. Here is how we can remove the static route.

First Option: Smitty
Step 1: Run netstat -rn.
Step 2: Verify the route you want to remove. Also look at the ODM so you can see later that it was removed from there too. To verify from the ODM, run lsattr -El inet0. In this example we will remove the route circled in red. Look at the flags column and you will see it has a flag of H, meaning it is a host route.

Here is the odm output and circled in red is the same route from the netstat -rn output. It also shows you that it is a Host route we are going to remove. It looks similar to the route above it, but one is a network route and the other is a host specific route.

Step 3: Type smitty route.
Step 4: Select remove a static route.
Step 5: Enter the information for destination and gateway exactly how you see it in the routing table.

For Destination Type we can hit F4 and it will give us two options: net and host. In our case we will select host since we are removing a host specific route.

Under Destination Address we will enter what is in the Destination column of the netstat -rn.

The Gateway value will be what’s in the Gateway column of the netstat -rn.

Hit enter when done.

Step 6: Verify that it was gone with the lsattr command. lsattr -El inet0.

Notice we don’t see the following value for route any longer:

    host,-interface,,,,,,153.6.24.0,153.6.24.56

Second Option: Command line using chdev command.
Step 1: Verify the route we want to remove in the netstat -rn output.
Step 2: Verify which route is the offending route in the lsattr -El inet0 output.
Step 3: Run the following command:

    chdev -l inet0 -a delroute="net,-interface,,,,,,153.6.24.0,153.6.24.56"

Step 4: Verify that the route is gone in the ODM.”

For further reading on routes and networking with AIX, check out these articles (here and here) as well.

More on the HMC and root

Edit: Still a good discussion.

Originally posted September 20, 2016 on AIXchange

Did you know about the AIX forums hosted at unix.com? It had been a while since I checked them, but when I did recently, I found an open letter that was written to me a few weeks after I wrote about whether IBM should allow root access to the HMC.

It was a happy discovery and an interesting read, so I’ll share a summarized version of it here. I do agree with many of the points that were brought up in the letter and the discussion that followed.

From the first post:

“So, do I want root on the HMC, as McNelly finally asks? No, for the most time a decent user account with a normal, not-restricted shell would suffice. But to manage this account — in the same responsible way I manage the rest of my 350 LPARs — I’d like to become root now and then to do whatever administrators do. Of course I know how to jailbreak the HMC (like perhaps every halfways capable admin does), but why do I need to “break into” a system I have set up, a system I run and for which I (well, actually my company) have paid good money?

…we are not talking about some mobile phone for $69.99. We are talking about the two HMCs I use to manage one and a half dozen p780s and p880s, about 2 million dollars apiece. Do you think it is necessary to squeeze out some minimal additional benefit by pestering me with a restricted shell for my daily work? And if you really think I couldn’t handle the responsibility for such a vital system: don’t you think I should be removed from the position where I manage the LPARs running the corporate SAP systems too?”

Another commenter replied to the thread and argued that a user with unfettered access could blow up the HMC. Then this came up, concerning education:

“Second: this is digging into a much larger area so I’ll try to keep it short. The reason that so few capable admins for AIX are there is because IBM did (and, IMHO, still does) a very bad job at educating them. If I am a Linux admin and want to hone my skills I get myself a PC for $300 and start hacking. I will perhaps make it go FUBAR a few times but all this will teach me valuable lessons and I will be all the more capable once I work on really productive systems professionally. If I am an AIX admin I do… what? Buy myself a system for ~ $20k only to find out I can’t even create an LPAR because I need to shell out another $50k in various licenses for one thing or the other? This might be OK for a bank, but is beyond my financial reach.”

I’ve written plenty about education over the years (for starters, here, here and here), and I do believe this continues to be a problem.

This is from another commenter:

“I can agree with most of what has been said above, I can understand IBM wanting to lock the HMC appliance down as much as possible and I understand the sysadmin desire to have full control of any machine on the network as Bakunin says – if there’s not a competency issue. In truth, my main reason for coming down on the restricted side of this argument is exactly that – competency! I have a number of systems that have been up and running for longer than many of my support contacts have been systems admins, I don’t actually have privileged access to many of the systems – I have elevated access or “root” access on none of the systems. Should I need root access, it has to be requested, approved and I am issued with a one-time password.

I find it to be a total pain, but that is the implemented system. On investigation the reason for the system being implemented was, you guessed it, competency! Cited examples, well I could give you any number. But an example that I think sums it up quite well is one that was easy to recover from, but could have been catastrophic had it been a customer facing system with say five or six thousand users. Instead of a development system, with just a couple of hundred developers. Where the “root” user executed a recursive delete command with a space in it, from the root directory and effectively deleted the full contents of the server – mostly source code and development tools.

I have worked in the *NIX world since 1981, over that time I have watched the skill level of the sysadmin degrade, a lot of it revolves around training – my first “Sysadmin I” course was five weeks long and I never actually saw a machine. It was all spent sitting at a Wyse 30 terminal, with a number of other trainees. Now I see sysadmins working for major vendors, with no training whatsoever.”

The final post on the thread covers at length an issue caused by being locked out of the HMC. Here’s the conclusion:

“Yes, it was my fault not to have the idea with the /var FS earlier. I was tricked by both HMCs losing connection at about the same time and investigated in the completely wrong direction. On the other hand, this is not a UNIX system, it is an appliance. Why am I supposed to act as an admin checking for filesystems when I was first denied all the tools admins have?

Second, my life was made so much easier by being forced to rely on tricks like pulling MAC addresses out of the routers logs instead of simply issuing ifconfig. Find out how long a system is up: uptime. Find out how long a HMC is up: impossible. Check how many packets are being sent/received on a UNIX system: entstat or netstat. Find out the same on a HMC: impossible. This list goes on and on.

And finally: even if I had diagnosed the problem correctly it wouldn’t have helped me any. We actually tried the “official” methods of cleaning up before, but they didn’t work at all (as they usually do — I have seen them fail more often than not). Only breaking in and using normal UNIX commands did what was expected. And why did IBM not see that full FS in the 2.6GB dump they required me to upload? Do I really want to take the risk of my multi-million-dollar environment becoming completely unusable because I have a system at the center which I can neither diagnose nor administrate…”

I can certainly commiserate with the sentiment, although, had it been me, I would have engaged a duty manager and escalated the support ticket. I’d also ask for the one-time HMC password to help with the diagnosis, and maybe even request a shared-screen conversation so I knew I was getting a technician’s full attention. If you’re really stuck, you owe it to yourself to utilize the minds and resources at IBM Support. Keep making noise there until you get what you need.

Anyway, this is a great discussion, and I wouldn’t mind seeing it continued here. So what do you think? Should IBM just give us root to the HMC? Should they continue to offer the one-time password option via support? Is there another solution?

And if you haven’t signed up for the AIX forums, you should. You may not use it regularly, but it’s a great place for launching discussions and getting answers.

POWER9 Media Coverage

Edit: Have you migrated yet? Some links no longer work.

Originally posted September 13, 2016 on AIXchange

Last month’s Hot Chips conference generated quite a bit of press about the soon to be available POWER9 processors:

“Intel has the kind of control in the datacenter that only one vendor in the history of data processing has ever enjoyed. That other company is, of course, IBM, and Big Blue wants to take back some of the real estate it lost in the datacenters of the world in the past twenty years.

The POWER9 chip, unveiled at the Hot Chips conference this week, is the best chance the company has had to make some share gains against X86 processors since the POWER4 chip came out a decade and a half ago and set IBM on the path to dominance in the RISC/Unix market.

As it turns out, IBM will be delivering four different variants of the future POWER9 chip, as Brian Thompto, senior technical staff member for the Power processor design team at the company, revealed in his presentation at Hot Chips. There was only one POWER7 and one POWER7+, with variants just having different cores and caches activated. There were three POWER8 chips, one with six cores aimed at scale out workloads and with two chips sharing a single package and one single-die, twelve-core chip aimed at bigger NUMA machines; this year saw the launch of the POWER8 chip (not a POWER8+ even though IBM did call it that for some time) with twelve cores with the NVLink interconnect from Nvidia woven into it.

With the POWER9 chip, there will be the POWER9 SO (short for scale out) variant for machines aimed at servers with one or two sockets, due in the second half of 2017, and the POWER9 SU (short for scale up) that will follow in 2018 for machines with four or more sockets and, we think, largely sold by IBM itself for its customers running its own AIX and IBM i operating systems.

The four versions of the POWER9 chip differ from each other in terms of the number of cores, whether or not the systems have directly attached memory or use the “Centaur” memory buffer chips, and level of simultaneous multithreading available for specific server configurations….

The twist on the SMT level is the new bit we did not know, and we also did not know the core counts that would be available on the POWER9 SU variants. We knew that the POWER9 SO chip would have 24 cores, and by the way, Thompto tells The Next Platform that the POWER9 SO chip is a single die chip with 24 cores. The POWER9 SU chip will top out at twelve cores, just like the biggest POWER8 chip did. Both POWER9 chips have eight DDR memory ports, each with its own controller on the die, which now can either talk directly to two DDR memory sticks on the POWER9 SO or to a Centaur buffer chip that in turn talks to four DDR memory sticks each.”

This ChannelWorld article notes that greater throughput is expected:

“Each NVLink 2.0 lane in the POWER9 chip will communicate at 25Gbps (bits per second), seven to 10 times the speed of PCI-Express 3.0, according to IBM. POWER9 will have multiple communication lanes for NVLink 2.0, and they could provide massive throughput when combined.

Recent Nvidia GPUs like the Tesla P100 are based on the company’s Pascal architecture and use NVLink 1.0. The Volta GPU architecture will succeed Pascal, which is also used in GPUs like the GeForce GTX 1080.

With a tremendous bandwidth improvement over its predecessor, the NVLink 2.0 technology will be important for applications driven by GPUs, like cognitive computing.”

eetimes.com is intrigued by POWER9’s acceleration strategy:

“Across a range of benchmarks, POWER9 should deliver from 50% to more than twice the performance of the POWER8 when the new chip arrives late next year, said Brian Thompto, a lead architect for the chip. New core and chip-level designs contribute to the performance boost.

The diversity of choices could help attract OEMs. IBM has been trying to encourage others to build Power systems through its OpenPower group that now sports more than 200 members. So far, it’s gaining most interest from China where one partner is making its own Power chips.

Use of standard DDR4 DIMMs on some parts will lower barriers for OEMs by enabling commodity packaging and thus lower costs.

POWER9’s acceleration strategy is perhaps the most interesting aspect of the new chip.

It will be one of the first microprocessors to implement the 16 GTransfer/second PCI Express Gen 4 interconnect that is still awaiting approval of a final spec. Separately, it implements a new 25 Gbit/s physical interconnect called IBM BlueLink.

Both interconnects support 48 lanes and will accommodate multiple protocols. The PCIe link will also use IBM’s CAPI 2.0 to connect to FPGAs and ASICs. BlueLink will carry the next generation NVLink co-developed for Nvidia GPUs as well as a new CAPI.”

eWeek mentions the OpenPower Foundation:

“We want people to know there is an alternative to x86 chips and that alternative can bring a lot of performance with it,” Dylan Boday, IBM Power engineer, told eWEEK last week standing outside of the Moscone Center, home to IDF. “At the end of the day, most people want choice, but they also want to see advantages to that choice.”

IBM traditionally had developed Power chips to run only in its Power servers. However, the company three years ago—with such partners as Nvidia and Google—launched the OpenPower Foundation, enabling third parties to license the architecture to create their own Power-based systems. It was part of a larger effort to embrace open technologies—such as Linux, OpenStack and the Open Compute Project (OCP)—for its Power architecture.

The work is paying off, according to IBM officials. At the first OpenPower Summit last year, the group had about 130 members. That has since grown to more than 200. At the same time, there are more than 2,300 applications that run on Linux on Power, they said.”

As much as I love working with POWER8, I’m already excited for POWER9. (To be honest, I’m even looking forward to the day when I can help customers upgrade to POWER12.) The point is, the future looks bright for the POWER platform.

10G Ethernet on POWER Tips

Edit: Some links no longer work.

Originally posted September 6, 2016 on AIXchange

This great new techdoc from Steve Knudson recently went live. It includes a set of slides that cover Ethernet on POWER, along with a cheat sheet that you may find valuable as you transition to 10G adapters on POWER8 servers:

“Moving some older FCoE 10Gb adapters from POWER7, PCIe-Gen1 slots, to POWER8, PCIe-Gen3 slots, we saw SEA throughput on a single 10Gb Ethernet port move from approx 4.2Gb/sec, up to 8.95Gb/sec. LPAR to LPAR, within the POWER8 hypervisor, we saw an astonishing 45Gb/sec, AIX to AIX. See the full slide deck attached.

The cheat sheet for AIX and SEA performance:

1) Before SEA is configured, put dcbflush_local=yes on the trunked virtual adapters. If SEA is already configured, skip this.

    $ chdev -dev entX -attr dcbflush_local=yes

2) Configure SEA. largesend is on the SEA by default, put large_receive on also.

    $ chdev -dev entY -attr large_receive=yes

3) Up in AIX, before IP is configured, put dcbflush_local on virtual Ethernet adapters. If IP is already configured, skip this.

    # chdev -l ent0 -a dcbflush_local=yes (slide 55)

4) Up in AIX, put thread and mtu_bypass on the interface en0 (slide 55).

    # chdev -l en0 -a thread=on
    # chdev -l en0 -a mtu_bypass=on

5) Assure you have enough CPU in sending AIX, sending VIO, receive VIO, and receiving AIX. See slides 75-76.”
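If you want to verify where a given setup stands before and after these changes, lsattr and lsdev will show you. A minimal sketch with placeholder device names (client LPAR commands as root, VIO server commands as padmin):

    # lsattr -El ent0 -a dcbflush_local (client virtual adapter)
    # lsattr -El en0 -a thread -a mtu_bypass (client interface)
    $ lsdev -dev ent6 -attr largesend (SEA on the VIO server)
    $ lsdev -dev ent6 -attr large_receive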

From the agenda in the slides:

    -Physical Ethernet Adapters
    -Jumbo Frames
    -Link Aggregation Configuration
    -Shared Ethernet Adapter SEA Configuration
    -VIO 2.2.3, Simplified SEA Configuration
    -SEA VLAN Tagging
    -VLAN awareness in SMS
    -10 Gb SEA, active – active
    -ha_mode=sharing, active – active
    -Dynamic VLANs on SEA
    -SEA Throughput
    -Virtual Switch – VEB versus VEPA mode
    -AIX Virtual Ethernet adapter
    -AIX IP interface
    -AIX TCP settings
    -AIX NFS settings
    -largesend, large_receive with binary ftp for network performance
    -iperf tool for network performance

    Most syntax in this presentation is VIO padmin, sometimes root smitty.

From slide 13:

    Jumbo frames is a physical setting. It is set
        -on Ethernet switch ports
        -on physical adapters
        -on the link aggregation, if used
        -on the Shared Ethernet Adapter.

-Jumbo frames is NOT set on the virtual adapter or interface in the AIX client LPAR.
-Do not change MTU on the AIX client LPAR interface. We will use mtu_bypass (largesend) in AIX.
-mtu_bypass – up to 64KB segments sent from AIX to SEA, resegmentation on the SEA for the physical network (1500 or 9000 as appropriate).
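In other words, on the VIO server side you would enable jumbo frames from the bottom up. A sketch as padmin, with placeholder device names; note that an adapter usually can’t be changed while it’s in use:

    $ chdev -dev ent0 -attr jumbo_frames=yes (physical adapter)
    $ chdev -dev ent5 -attr jumbo_frames=yes (link aggregation, if used)
    $ chdev -dev ent6 -attr jumbo_frames=yes (the SEA itself)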

From slide 16, link aggregation configuration:

-Mode – standard if network admin explicitly configures switch ports in a channel group for our server.
-Mode – 8023ad if network admin configures LACP switch ports for our server. ad = Autodetect – if our server approaches switch with one adapter, switch sees one adapter. If our server approaches switch with a Link Aggregation, switch auto detects that. For 10Gb, we should be LACP/8023ad.
-Hash Mode – default is by IP address, good fan out for one server to many clients. But will transmit to a given IP peer on only one adapter.
-Hash Mode – src_dst_port uses source and destination port numbers in the hash. Multiple connections between two peers likely hash over different adapters. Best opportunity for multi-adapter bandwidth between two peers. Whichever mode is used, we prefer hash_mode=src_dst_port.
-Backup adapter – optional, standby, single adapter to same network on a different switch. Would not use this for link aggregations underneath SEA Failover configuration. Also would likely not use on a large switch, where active adapters are connected to different, isolated “halves” of a large “logical” switch.
-Address to ping – Not typically used. Aids detection for failover to backup adapter. Needs to be a reliable address, but perhaps not the default gateway. Do not use this on the Link Aggregation, if SEA will be built on top of it. Instead use netaddr attribute on SEA, and put VIO IP address on SEA interface.
-Using mode and hash_mode, AIX readily transmits on all adapters. You may find the switch delivers receives on only one adapter – switches must enable the hash_mode setting as well.
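Putting those recommendations together, creating the aggregation on the VIO server might look like this; a sketch, where ent0 and ent1 are placeholder physical adapters and the switch ports must be configured for LACP to match:

    $ mkvdev -lnagg ent0,ent1 -attr mode=8023ad hash_mode=src_dst_port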

From slide 19, Shared Ethernet Adapter (SEA) configuration:

-Some cautions with largesend
-POWER Linux does not handle largesend on SEA. It has a negative performance impact on sftp and NFS in Red Hat Enterprise Linux (RHEL).
-A few customers have had trouble with what has been referred to as a DUP-ACK storm when packets are small, and largesend is turned off in one client. Master APAR IV12424 lists APARs for several levels of AIX.
-A potential “denial of service” attack can be waged against largesend, using a specially crafted sequence of packets. ifixes for various AIX levels are listed here.
-largesend is NOT a universal problem, and these ifixes are not believed to be widely needed.

From slide 77, iperf 10 Gb, SEA:

-If you are getting less than the values on the two previous slides…
-It appears that LARGESEND is on physical 10Gb adapter interfaces automatically, but you can set it explicitly:

        $ chdev -dev en4 -attr mtu_bypass=on

-Check that largesend, large_receive are on SEA at both ends:

         $ chdev -dev ent4 -attr largesend=1 large_receive=yes

-Check that mtu_bypass (largesend) is on AIX client LPAR interfaces:

         # chdev -l en0 -a mtu_bypass=on

-Watch CPU usage in both VIOs, both Client LPARs during iperf interval and make sure no LPAR is pegged or starving.
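If you need to generate that traffic yourself, iperf makes it easy. A sketch, assuming iperf is installed on both LPARs and receiver_host is a placeholder:

    # iperf -s (on the receiving LPAR)
    # iperf -c receiver_host -P 4 -t 60 (on the sender: four parallel streams for 60 seconds)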

You’ll find plenty of other helpful tips and tricks here, so take the time to read through the slides. I’m sure you’ll learn something you didn’t already know.

Using LPM from the Command Line

Edit: If only ISVs would embrace LPM instead of punishing us for using it.

Originally posted August 30, 2016 on AIXchange

In February I wrote about disabling live partition mobility on selected partitions, and recently I received a related question from someone looking for an alternative to using the HMC GUI. Specifically, how do you turn LPM on and off from the command line on a per partition basis?

This post lays out an audit trail:

“Any change to this attribute is logged as a system event, and can be checked for auditing purposes. A system event will also be logged when the Remote Restart or Simplified Remote Restart capability is set. More specifically, a system event is logged when:

    * any of these three attributes are set during the partition creation
    * any of these three attributes are modified
    * profile data is restored.

Users can check system events using the lssvcevents CLI and /or the View Management Console Events GUI. Using HMC’s rsyslog support, these system events can also be sent to a remote server on the same network as the HMC.”

These system events can be logged:

    2420 User name {0}: Disabled partition migration for partition {1} with ID {2} on managed system {3} with MTMS {4}.
    2421 User name {0}: Enabled partition migration for partition {1} with ID {2} on managed system {3} with MTMS {4}.
    2422 User name {0}: Disabled Simplified Remote Restart for partition {1} with ID {2} on managed system {3} with MTMS {4}.
    2423 User name {0}: Enabled Simplified Remote Restart for partition {1} with ID {2} on managed system {3} with MTMS {4}.
    2424 User name {0}: Disabled Remote Restart for partition {1} with ID {2} on managed system {3} with MTMS {4}.
    2425 User name {0}: Enabled Remote Restart for partition {1} with ID {2} on managed system {3} with MTMS {4}.”
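As a sketch of what checking those events might look like from the HMC command line (the console event type and seven-day window here are my assumptions; adjust as needed):

    lssvcevents -t console -d 7 | grep -i "partition migration"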

In a real life example, a reader sent me the following information. (Note: The command is in the log output).

“Just FYI, there is a way to make this change from the terminal on the HMC because I can see the command in the audit logs:

02 03 2016 07:48:43 10.9.0.1 <USER:INFO> Feb  3 07:48:43 hmc01 HMC: HSCE2123 User name hscroot: chsyscfg -m system1 -r lpar -i lpar_id=20,migration_disabled=1 command was executed successfully.

02 03 2016 08:21:01 10.9.0.1 <USER:INFO> Feb  3 08:21:01 hmc01 HMC: HSCE2123 User name hscroot: chsyscfg -m system1 -r lpar -i lpar_id=20,migration_disabled=0 command was executed successfully.

In this case we were able to run:

chsyscfg -m system1 -r lpar -i lpar_id=20,migration_disabled=1
chsyscfg -m system1 -r lpar -i lpar_id=20,migration_disabled=0

to do our testing.

I was also able to deduce from those commands that you could run this command to show the migration_disabled state:

lssyscfg -m system1 -r lpar --filter lpar_ids=20 -F migration_disabled”
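Taking it one step further, if your HMC’s restricted shell allows simple loops, you could flip the flag for every partition on a managed system at once. A rough sketch, again assuming the managed system is named system1:

    for id in $(lssyscfg -m system1 -r lpar -F lpar_id)
    do
        chsyscfg -m system1 -r lpar -i lpar_id=$id,migration_disabled=1
    done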

This information should come in handy should you find yourself wanting to make a change from the command line.

Using AIX System Accounts

Edit: Still good to know.

Originally posted August 23, 2016 on AIXchange

I recently was asked about AIX system accounts. You’ll find the answers — why they’re there, how you login to them, etc. — in this IBM Support doc. It’s an older document that covers the basics, but the information is still relevant:

“Question: What are system Special Accounts?
Answer: Traditionally, UNIX has come with a default set of system user accounts to prevent root and system from owning all system filesystems and files. As such, it is never recommended to remove these accounts; rather, set an asterisk in /etc/security/passwd for all except root. This document describes the default set of user accounts.

root — Commonly called the superuser (UID 0), this is the account that system administrators log into to perform system maintenance and problem determination.

daemon — A user used to execute system server processes. This user only exists to own these processes (and the associated files) and to guarantee that they execute with appropriate file access permissions.

bin — A second system account used primarily to break up owners of important system directories and files from being solely owned by root and system. This account typically owns the executable files for most user commands.

sys — The sys user owns the default mounting point for the Distributed File Service (DFS) cache, which is necessary before installation and configuration of DFS on a client. The /usr/sys directory can also be used to put install images.

adm — The adm user in /etc/passwd is basically responsible for two system functions:

    * ownership of diagnostic tools, as evidenced by the directory /usr/sbin/perf/diag_tool/
    * accounting, as evidenced by System Accounting Directories:
         /usr/sbin/acct
         /usr/lib/acct
         /var/adm
         /var/adm/acct/fiscal
         /var/adm/acct/nite
         /var/adm/acct/sum

guest — Many computer centers provide accounts for visitors to play games while they wait for an appointment, or to allow them to use a modem or network connection to contact their own computer. Typically, these accounts have names like open, guest, or play.

nobody — An account used by the Network File System (NFS) product and to enable remote printing. nobody exists when a program needs to permit temporary root access to root users. For example, before turning on Secure RPC or Secure NFS, check /etc/publickey on the master NIS server to see if every user has been assigned a public key and a secret key. You can create an entry in the database for a user by becoming the superuser and entering:

    newkey -u username

You can also create an entry in the database for the special user, nobody. Users can now run the chkey program to create their own entries in the database.

uucp — UUCP is a system for transferring files and electronic mail between UNIX computers connected by telephone. When one computer dials to another computer, it must log in. Instead of logging in as root, the remote computer logs in as uucp. Electronic mail that is awaiting transmission to the remote machine is stored in directories that are readable only by the uucp user so that other users on the computer cannot read each other’s personal mail.

nuucp — The operating system provides a default nuucp login ID for transferring files. This is normally used for uucp communication. These two IDs, uucp and nuucp, are created when the bos.net.uucp fileset is installed. As logging in as the uucp user is not allowed, the nuucp user was created. Basically, the uucp user ID will not have a password entry set in /etc/security/passwd, but the nuucp user ID will have a password set. You can remove the user nuucp if you wish.

lpd, lp — Used for starting the lpd daemon which is necessary in order for the AIX Spooler to do remote printing.

invscout — Used by Inventory Scout which is a tool that checks the software and hardware configurations on the Hardware Management Console (HMC).

imnadm — IMN Search engine (used by Documentation Library Search).

snapp — Allows access to the snappd command, which lets hand-held PDA devices be attached to a tty port on an AIX box. The PDA can then function in similar capacities to a dumb terminal.”

Here’s a more recent document from the IBM Knowledge Center:

“AIX provides a default set of system special user accounts that prevents the root and system accounts from owning all operating system files and file systems.

Attention: Use caution when removing a system special user account. You can disable a specific account by inserting an asterisk (*) at the beginning of its corresponding line of the /etc/security/passwd file. However, be careful not to disable the root user account. If you remove system special user accounts or disable the root account, the operating system will not function.”
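If you do need to lock one of these accounts down without removing it, chuser is a cleaner route than hand-editing /etc/security/passwd. A sketch, using guest as the example:

    # chuser account_locked=true guest (refuse new logins)
    # chuser login=false rlogin=false guest (block local and remote login)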

Finally, here’s a list of accounts you may be able to remove, and here’s a link to accounts that are created by different security components on the system.

If you’re new to this area, these links should help you. And even if you already know this stuff, it never hurts to revisit the basics.

Linux on Power Resources

Edit: Some links no longer work.

Originally posted August 16, 2016 on AIXchange

I know more of you are evaluating and using Linux on Power, so I want to highlight some good resources. (Note: A ton of links follow, and I’ve noticed that some don’t seem to work properly with Internet Explorer, so try another browser if you encounter issues.)

This list of Linux on Power “general resources” from IBM developerWorks shows you how to, among other things, install Linux, get Linux evaluation copies for Red Hat and SUSE, and find support options. Don’t forget you can run community-supported distributions like Debian, Fedora, OpenSuse, CentOS and Ubuntu on Power as well.

IBM developerWorks also has several other good resources. This one is called the Open Source POWER Availability tool:

“The Open Source POWER Availability Tool (OSPAT) is a search engine that was designed to help you find open source packages that are available on the IBM POWER architecture. The results provide the package name and version and the Linux distribution that supports the package.”

At the Linux on Power Community wiki, there are many links to more information. You can meet the experts and check out this Linux on Power FAQ (though it’s dated, much of the information remains relevant, like this list of supported Linux distributions).

Finally, you can see which software packages have been ported to Linux on Power and determine if there are Docker containers for them:

“There are hundreds of open source packages for ppc64le available on IBM Power Systems and more are being added all the time. These pages include lists of the available packages. To help you find what you’re looking for, we’ve organized the lists by application type and for each type, we’ve listed the ported apps, the Linux distribution(s) they’re available on and where they’re maintained. And if you prefer, you can download a spreadsheet that contains the full list for each category.

Linux distributions officially supported on the IBM Power LE platform (ppc64le) are Ubuntu, Red Hat Enterprise Linux (RHEL) and SUSE Linux Enterprise Server (SLES). Further, there are community editions like Debian, Fedora and OpenSuse as well as CentOS which are ported and can be deployed.”

At the time of this writing the last update was July 2016, so it seems pretty current. My only complaint is that I did not see a way to download a single file with a list of all the available packages; this would save users from having to jump from page to page or spreadsheet to spreadsheet. As is though, you at least get an idea of whether or not the packages you’re interested in have been ported to the distribution you hope to run them on. Just from browsing the list you can see the sizable number of Linux on Power packages. Rest assured, more are on the way.

Please let me know if a list like this is helpful to you. For that matter, let me know whether you find value in posts that aggregate related resources from around the web.

Connecting with IBMers on Sametime

Edit: Sametime has gone away, Slack is the new tool IBMers use at the time of this writing. Some links no longer work.

Originally posted August 9, 2016 on AIXchange

I worked for IBM from 2000-2006. During that time I used Sametime extensively to communicate with coworkers worldwide. When I left the company, I wanted to continue to IM my former colleagues.

There were, and are, several ways for people outside of IBM to connect with IBMers on Sametime. One option is Pidgin, which is described as the “universal chat client.” If you choose this option, you may have to mess around with the XML file to get it to work (see here):

“Pidgin: First backup, then open and edit the following file (location is Windows 7 specific) with your favourite text editor :
C:\Users\[username]\AppData\Roaming\.purple\accounts.xml.
Now add or edit the following lines within the Sametime settings section under protocol prpl-meanwhile.
<settings>
   <setting name='fake_client_id' type='bool'>1</setting>
   <setting name='port' type='int'>80</setting>
   <setting name='force_login' type='bool'>0</setting>
   <setting name='server' type='string'>extst.ibm.com</setting>
   <setting name='client_id_val' type='int'>4676</setting>
   <setting name='client_minor' type='int'>8511</setting>
</settings>

Congratulations! You should now be connected to IBM’s internal Sametime server. To add contacts or buddies, first find their email address. If you don’t already know your buddy’s email address, you can search for it using this IBM Employee directory. When you add an internal IBM email address, prefix it with @E. For example, to add Sam you would use “@E sam@us.ibm.com”. This tells the external Sametime Gateway to add an external contact via email. To add non-IBM users who are also using the Sametime gateway (like me) you can just add them by email address, without the @E prefix.”

Here are two other articles that offer alternative ways to connect with IBMers on Sametime. First, from wissel.net:

“IBM External Sametime Server: You need to have an IBM id, to get one register online.
Once you have it, create a (new) community in your Sametime client (see below). Thereafter lookup your IBMer to add him/her to your buddy list.
Server/Port: extst.ibm.com / 80
Advantage: You can reach any IBMer using Sametime, surprise them.
Disadvantage: Availability is not production level”

This is from IBM developerWorks:

“An ibm.com id – these are free and available via Sign up for an IBMid if you don’t already have one.
A Sametime/IBM Instant Messaging compatible client installed on your computer/device. Previously a web client was available; however, that link is no longer working, so a “fat client” install would seem to be the way to go. You can download the latest Sametime client from the Lotus Greenhouse site, which will also require a (free) ID to be created. This is a different ID to the IBMid mentioned above, but just as quick and easy to get. You can use other non-IBM clients such as Adium or Pidgin but those clients will require some ‘hacking’ to allow them to connect to the IBM Instant Messaging Gateway — if you’re keen, please check out this blog post from nomaen that details that configuration. Personally, the IBM client does the job really nicely and is available for Windows, Mac, and Linux (RPM and DEB) so I’d just go that route.”

I mention this because I noticed it circulating on Twitter recently. I’m sure a lot of you know at least one IBMer, so this seems like good information to pass along.

Another good resource is IBM whois. This allows you to look up contact information for IBM employees by name.

And while I’m on the subject of instant messaging, don’t forget about IRC.

PowerVC Resources

Edit: I still regularly speed up my videos. Some links no longer work.

Originally posted August 2, 2016 on AIXchange

By now you’re familiar with PowerVC. No? Well, then this post is for you:

“IBM PowerVC Virtualization Center is an advanced virtualization and cloud management offering, built on OpenStack, that provides simplified virtualization management and cloud deployments for IBM AIX, IBM i and Linux virtual machines (VMs) running on IBM Power Systems. PowerVC is designed to improve administrator productivity and simplify the cloud management of VMs on Power Systems servers. PowerVC provides the foundation for Power Systems scalable cloud management, including integration to higher-level cloud orchestrators based on OpenStack technology.

PowerVC helps Power Systems customers lower their total cost of ownership with a simplified user experience that allows simple cloud deployment and movement of workloads and policies to maximize resource utilization. PowerVC has been built to require little or no training to accelerate cloud deployments on Power Systems. PowerVC has the capability to manage the existing infrastructure by automatically capturing information, such as existing VM definitions, storage, network and server configuration information.

PowerVC allows clients to capture and manage a library of VM images, enabling IT managers to quickly deploy a VM environment by launching a stored image of that environment, instead of having to manually recreate a particular environment. By saving virtual images and centralizing image management, IT managers and administrators can migrate and move virtual images to available systems to expedite deployment.”

This IBM Redpiece — that’s a Redbook that’s still in draft form — has a great deal of good information. It covers the latest version of PowerVC, Version 1.3.1.

Nigel Griffiths has also posted a series of PowerVC videos: Part 1, Part 2, Part 3, and Part 4.

PowerVC developer Ana Santos has a video as well.

There’s also this recent webinar. View the replay here. It’s part of the “IBM Power Systems Technical Webinar Series (including Power Systems Virtualization – PowerVM).” Yep, that whole thing is the name of the webinar series. Catchy, huh? Go here for the slides.

Here’s a session from back in January (slides and replay).

If you’re using the older version of PowerVC, the AIX Virtual User Group did a PowerVC demo in December 2013 (slides and replay). I expect the VUG will have an update on this in the near future.

Finally, there’s the PowerVC cheat sheet, courtesy of Nigel and the AIXpert blog. The same site has this older piece as well.

The trick to getting PowerVC installed in your environment is having a licensed copy of Red Hat Linux for either Power or x86. That way, you can install packages from repositories other than those found on the installation DVD. In the future I’ll get further into installation, but I thought it would be helpful to present these resources first.

Another trick: If you find it daunting to watch these long videos, you can download Google Chrome plugins. Start by searching for “video speed controller” or “youtube playback speed control.” These allow you to speed up YouTube videos to, typically, 1.5X or 1.75X playback speeds. Even at the faster speed, you should still be able to understand what’s being said. Assuming you aren’t overly annoyed by the audio, you can save significant time digesting the information.

Lessons Learned from Camp

Edit: I am still missing camp.

Originally posted July 26, 2016 on AIXchange

As long-time readers know, I work with Boy Scouts. Recently we took 19 boys to a week-long summer camp, and while I always find being around kids to be instructive, this time I realized that some of these lessons apply to techies as well as campers.

1) We take cell phone coverage for granted.
I don’t know about you, but I’m online quite a bit, to the point where things like checking email and the news are second nature. In addition, I do web searches and I send myself reminders and notes. Except, of course, when I’m out in the wilderness; none of these capabilities exist where our camp is located. If I had to, I could get a signal by literally climbing a mountain. Inconvenient as it is though, I don’t mind putting down my phone for a week. Not only is it less of a distraction when I’m camping (and engaged in activities like swimming, rowing, hiking and horse-riding), I have a greater appreciation for its capabilities when I’m back home. Really, I noticed all of the adults in our camp were more engaged once they realized and accepted that checking for messages wasn’t an option.

2) The world will go on without you.
Between work and camping, I will squeeze in an occasional vacation. But if I have access to my phone, I tend to use it, which can make my vacations seem a lot like working remotely. I routinely find myself checking in, answering questions and generally being available. If camping isn’t for you, find some other way to disconnect when you’re out of the office. Set your out of office message and trust your team members to hold the fort while you go and recharge your mental batteries.

3) We all have adversity to overcome and things to learn. A little enthusiasm helps with both.
Most 12- and 13-year-olds who spend their lives in comfortable suburban surroundings struggle with homesickness, fear of swimming, fear of heights and climbing, fear of sleeping in the woods by themselves (as part of the wilderness survival merit badge), and fear of just being outdoors when a thunderstorm rolls in. But with Scouts, I get to see them overcome their fears and meet the challenges before them. It can be tough facing new technologies and techniques in our careers, especially as we get older and set in our ways. But approaching these challenges with enthusiasm does make a difference. 

4) Meetings go better with food.
When someone orders in lunch or brings treats to a work meeting, it lightens the mood and makes it easier to pay attention. It’s the same with kids. Provide the treats and you’ll get their attention — at least for a few moments.

5) We all need plenty of rest.
Kids need their rest, but one thing we learn in Scouts is you have to get your charges good and tired before bedtime so they’re too exhausted to run around and play pranks all night. Get them up early, keep them up late and make sure they’re active throughout the day. At camp we started at 5 a.m. with the polar bear swim (the staff even throws ice in the pool to really intimidate the campers). Then there were merit badge classes. After that, we had some free time, followed by dinner and campfires. By the time their heads hit the pillows they were out until sunup. The grown-up version of this is get as much accomplished as you can during the day, but when it’s time to rest, rest.

6) Being in shape will help you keep up.
Whatever you want to do professionally or personally, you’ll have more energy and get more enjoyment from what you’re doing if you’re exercising and eating right.

As exhausting as a week of herding cats — I mean… kids — is, Scouts camp was a fantastic experience. I’m already looking forward to next year.

Lots of Potential Bookmarks

Edit: Some links no longer work.

Originally posted July 19, 2016 on AIXchange

The January 2016 AIX Virtual User Group meeting featured a presentation that you should check out. It’s from Steve Pittman (download the PDF; watch the video).

One of the things Steve talks about is this web page. It contains links to tons of information covering a wide variety of topics. Seriously. Tons.

There are best practices and scripts. You can learn how to download ISO images from IBM, open PMRs and set up VNC.

In his presentation, Steve says, “most of the how-tos are written for AIX V5.3, but are applicable to AIX V6.1 and V7.1, since there are not many differences between AIX V5.3, V6.1, and V7.1.”

In all, I count close to 40 links. Here are just a few of the topics:

  • How to download ISO images of installation CDs for software which is entitled on a Power System server.
  • How to open and view AIX software trouble tickets (Problem Management Records/PMRs) on the Internet.
  • How to initiate a stand-alone dump if an AIX LPAR is hung.
  • How to retrieve a history of diagnostic codes which have been displayed by an LPAR.
  • How to use AIX V5.3 filemon to determine where I/O requests originate.
  • How to use AIX V5.3 fileplace to determine the location on disk of a given file block.
  • How to install, configure, and use SSH on AIX V5.3.
  • How to install, configure, and use VNC on AIX V5.3.
  • How to configure AIX V5.3 as an NTP client.
  • How to configure AIX V5.3 to send mail to users on other hosts.
  • How to monitor for hardware errors on AIX V5.3.
  • How to monitor for issues with dump space on AIX V5.3.
  • How to monitor paging space utilization on AIX V5.3.
  • How to change the order in which AIX V5.3 mounts filesystems.

If that isn’t enough for you, this page — introduced as “a collaborative site for AIX technical information” — has still more links. Although some of this material may be a bit dated, overall it is great information.

I’ll just list the headings for these. Most come with 3-10 different links:

  • Hot new topics and popular wiki pages
  • Getting Started
  • Maintenance
  • Performance
  • Virtualization
  • Security
  • Enterprise Edition and Management Edition for AIX
  • POWER7 and AIX 7
  • POWER6 and AIX 6 Redbooks for AIX and Virtualization
  • Best Practices
  • SAP and Power Systems
  • Code Development
  • Cloud-based Benchmarks
  • Communities and Social Networking

Were you aware of this information? Take some time with both pages, and you might find a number of things worth bookmarking.

A Pictorial Guide to vSCSI Disk Settings

Edit: Still a good Redbook.

Originally posted July 12, 2016 on AIXchange

One of the challenges of configuring virtual disks with the VIO server is knowing which settings must be changed during setup. I recently had something brought to my attention that should help clarify things.

It’s from the IBM Redbook, “PowerVM Virtualization Introduction and Configuration.” Go to page 498, and you’ll find a diagram listing the different settings that need to be changed at each layer of the virtual environment. The pages that follow offer good explanations of what each setting means and why you’d want to change it. These settings are specifically for vSCSI disks.

Here are the proper settings in the client at the hdisk level:

    algorithm=failover
    reserve_policy=no_reserve
    hcheck_mode=nonactive
    hcheck_interval=60
    queue_depth=xxx

Use these settings at the vSCSI client level:

    vscsi_path_to=30
    vscsi_err_recov=fast_fail

Use these settings at the hdisk level on the VIO servers:

    algorithm=load_balance
    reserve_policy=no_reserve
    hcheck_mode=nonactive
    hcheck_interval=60

Use these settings for your fscsi devices on the VIO server:

    dyntrk=yes
    fc_err_recov=fast_fail

This Redbook clears up a number of other topics as well. There’s an I/O virtualization overview, a planning section and an implementation section with examples. Processor and memory virtualization is covered in a similar manner. In addition, the authors hit on recent PowerVM enhancements, capacity on demand and the System Planning Tool.

To be sure, it’s a lengthy document, but I’m willing to bet you will learn something — likely, many things — if you take the time to read through it.

Note: Nine years ago this week — July 16, 2007 — AIXchange debuted.

For nine years, I’ve been writing these articles, one week at a time. When I scroll through posts from the fall of 2007, I can see the same themes pop up that still hold my interest today: education and tech conferences, virtualization, the HMC. I even wrote about an early demonstration of Live Partition Mobility.

Over time, I can see my writing “voice” evolve. Numerous times I wonder about you, the reader. The web stats say you’re out there. I also know you’re out there because occasionally, I’ll do a web search and one of my posts that I’d long forgotten about will pop up as the answer to my query. I admit, I feel a sense of accomplishment from this sort of thing.

Still, it would be nice to get a better feel for who you are. How did you find this blog? Which topics most interest you? Why do you keep reading? If you once read this blog but no longer do, why did you stop?

I’m often asked how I find things to write about, but honestly, it isn’t that difficult. There’s AIX and Linux and IBM i, servers and storage and virtualization, Redbooks and other documentation, and commands and scripts (did I mention I love scripts?). Plus I talk to customers, attend workshops and conferences and follow people on Twitter. There are tons of things to write about.

Of course the technology is ever-evolving, but the basics don’t change. We have the best hardware and the best operating systems. We need to virtualize, we need change control, we need to find ways to keep up to date with the technology around us. Hopefully the links and articles I share help you keep up to speed.

I plan to make a bigger deal of the 10-year anniversary in 2017, but for now, let me just say thank you for reading, one week at a time.

Power Systems from a Competitor’s View

Edit: Why wouldn’t you run POWER?

Originally posted July 5, 2016 on AIXchange

I’m always interested in stories about customers that choose to migrate from x86 to POWER8 systems. When I hear about 2X performance compared to x86 when running workloads on POWER, I wonder how anyone could consider anything else. Throw in the AIX, IBM i, and Linux on Power operating systems, and to me there’s utterly no reason to run another operating system on other hardware.

Of course, IBM’s competition will make their own cases. Via Twitter, I found this document on migrating from Power Systems to HPE Open Systems, and I’m legitimately curious to hear your own responses to these arguments:

“Hewlett Packard has several decades of experience in migrating mission-critical applications from IBM Power Systems to HP (and now HP Enterprise) open systems. HPE has demonstrated that the majority of such migrations result in a significantly less expensive operating environment – often by a factor exceeding 50 percent. At the same time, the new HPE open environments match or exceed the performance and availability attributes of the original Power Systems.”

First they talk about fewer ISVs supporting POWER. Then they discuss costs.

“Of special importance is the cost of the Oracle database management system. Many of the applications being migrated use an Oracle database. Oracle charges twice as much per CPU for Power Systems than it does for x86 platforms. Furthermore, Oracle RAC (Real Application Cluster) costs $11,500 on an x86 and $23,000 per core on an IBM Power System.”

Why does Oracle charge twice as much per CPU? Because POWER can do twice as much work per core, meaning you need half as many cores to run your workload.

It might be worth reading through these slides and thinking about why infrastructure matters and why you might consider POWER systems. Some of these same concepts were covered in a recent AIX Virtual User Group session (video here).

POWER8 has 4X threads per core, 4X the memory bandwidth, 6X cache per core, and runs at higher clock frequencies. Performance per core has grown with each POWER generation, which means you need fewer cores to do the same amount of work, and you can consolidate more workload onto the same server.

Why is Google interested in POWER servers? Why are these new high performance computing contracts being won by POWER servers?

“The other reason to think that Google is serious about the OpenPower effort is that Google is a big believer in the brawny core – as opposed to the wimpy one – and the POWER8 chip has more threads, more memory bandwidth, and will have several high speed interfaces, including IBM’s CAPI and Nvidia’s NVLink, to attach adjunct memory and processing devices directly into the Power chip memory subsystem and processing complex.

“There are only two brawny cores out there: Xeon and Power,” MacKean explained. “Power is directly suitable when you are looking at warehouse-scale or hyperscale computing from that perspective. I actually think that the industry is going to be moving towards more purpose-built computing, and I think that different users are going to be able to leverage the advanced I/O that IBM is opening up through OpenPower. They are going to be able to go with purpose-built platforms that suit their workloads. I think this is a big part of this. We just heard about the CAPI 2.0 interface having twice the bandwidth and we are actually excited about how that will play out at the system level. It is open, and we are seeing a lot of people innovating in a lot of directions.”

Google gets it. When you’re running critical workloads, you’re not looking for ways to cut corners.

Where do you see yourself going in the future?

Working with Snap Files

Edit: Still a valuable technique.

Originally posted June 28, 2016 on AIXchange

A while ago, Russell Adams posted an interesting message to the AIX mailing list. He wrote about working with AIX snap files and included a link to his website, which provides some background:

“I frequently work with customer systems where I need a systems inventory. This could be for troubleshooting or just to save the final state of a system for later reference.

I have worked with many consultants who have an inventory script they give customers but I have found that I prefer to use the tools native to the platform when they are available. On AIX I use IBM’s native snap command. If you’ve ever been on the phone with IBM support before, you know they barely wait to ask your name before they ask for you to upload a snap.”

The command he runs on all LPARs in the environment is snap -cfgGiknLt. As he explains:

“This gives a good overview of the system without including a system dump. Most of the time the snap files range from 5 MB to 25 MB.

Always run ‘snap -r’ to clear the snap cache in /tmp/ibmsupt before taking a new snap. This is generally safe as the only files it will remove are files snap knows that it wrote.

By renaming the snap file as follows, you can run a couple of scripts to manipulate the data:

    mv /tmp/ibmsupt/snap.pax.Z /tmp/ibmsupt/$(hostname)_$(date +%Y%m%d).snap.pax.Z”
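To collect snaps from a batch of LPARs, a small loop from a central host keeps the naming consistent. A rough sketch, assuming root ssh access to each LPAR; the echo y answers snap -r’s confirmation prompt, which is worth verifying on your own systems first:

    for host in lpar1 lpar2 lpar3
    do
        ssh root@$host 'echo y | snap -r; snap -cfgGiknLt'
        scp root@$host:/tmp/ibmsupt/snap.pax.Z ./${host}_$(date +%Y%m%d).snap.pax.Z
    done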

Russell runs his scripts on his Linux machine, but he’s confident that, with a few tweaks, this could run on AIX as well. Hopefully an enterprising reader will take this on and share the results.

There are two scripts: This one uncompresses and normalizes the snaps, while this one extracts the commands.

His site has numerous examples of extracting snap files and running basic commands. Here’s a small subset:

    % ls -l snap.pax.Z
    -rw-rw-r-- 1 adamsrl adamsrl 6748366 May 26 17:20 snap.pax.Z

    % ~/scripts/NormalizeSnap.sh snap.pax.Z
    ========================================
    Untarring, # of files: snap.pax.Z
    pax: ustar vol 1, 199 files, 5775360 bytes read, 0 bytes written.
    Checking general exists.
    ./snapRtvZKUtV/general
    Moving subdirs.
    Opening subsnaps.
    Fixing perms.
    Cleaning empty dirs.
    rmdir ./snapRtvZKUtV/testcase ./snapRtvZKUtV/scraid ./snapRtvZKUtV/other ./snapRtvZKUtV/hacmp
    Collecting data.
    Cleaning dump and security.
    Renaming to final destination
    Retarring files into: ./7044-170_0110BDC8C_GILSAIX_5300-06-00-0000_20071205_203526.snap.tar.bz2
    Number of files compressed: 194
    Successfully extracted snap to: ./7044-170_0110BDC8C_GILSAIX_5300-06-00-0000_20071205_203526.snap

He adds:

“Now I can also use standard UNIX text utilities to run aggregate reports on the data from the snaps.

Imagine checking 15 hosts for no_reserve on hdisks, or iostat set to true on sys0. This method of working with snaps can be very powerful even in an offline manner.”
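As a sketch of what such an aggregate check might look like, assuming the snaps have been extracted into per-host directories (the layout here is my assumption; adjust the paths to wherever NormalizeSnap.sh leaves things):

    % grep -ri 'reserve_policy' */general/ | grep -v no_reserve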

I think this is where the real power of this methodology comes into play. With up-to-date snap information from your systems, you can find out quite a bit about an environment without needing to be VPNed in or logged in at all.

In his mailing list post, Russell explains, “In the spirit of cooperation I wanted to share some of the methods I use for working with AIX snap files. I won’t repeat the full article here but it documents a technique I use for offline data mining for AIX systems including ready to run scripts.”

I would hope — as you make your own changes to these methods — that you would also share your improvements.

Are you already using a method like this in your environment? Can you think of ways to enhance it?