Moving a Filesystem

Edit: Link to technote no longer works.

Originally posted September 20, 2011 on AIXchange

More than once I’ve found myself on a system where all of the filesystems were placed in rootvg rather than split out into different volume groups. By default, the mksysb backs up all of rootvg. You can set up exclude lists, but then you must remember to maintain those lists. If someone adds another filesystem to rootvg without excluding it from the mksysb, those backups can become huge. In a perfect world we’d keep our mksysb files small by putting non-rootvg filesystems in application volume groups.

I had a filesystem that was mistakenly placed in rootvg on an AIX 6 machine. I wanted to move that filesystem into datavg. It was a very simple procedure. When searching for help, I found this IBM technote.

First I ran:

umount /export/myfilesystem

Then I ran:

cplv -v datavg fslv01

This returned with:

cplv: Logical volume fslv01 successfully copied to fslv03

The document told me to run logform, but when I did I got:

logform /dev/fslv03
logform: 0507-503 file system /dev/fslv03 does not exist. Change
logical volume type to be jfs2log for an outlinelog

I double checked in /etc/filesystems, and sure enough I had an inline log set up. So I decided to gamble and just run the chfs command as outlined in the document:

chfs -a dev=/dev/fslv03 -a log=INLINE /export/myfilesystem

Then I ran my fsck as instructed:

fsck -p /dev/fslv03

Then I mounted my filesystem:

mount /export/myfilesystem

Like magic, my filesystem moved and it was all pretty painless.
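
For reference, here’s the whole sequence in one place, along with the cleanup step that gives you back the space in rootvg (a sketch using my LV names; yours will differ, and only remove the old copy after verifying the new filesystem mounts cleanly):

umount /export/myfilesystem
cplv -v datavg fslv01        # copies fslv01 into datavg as a new LV (fslv03 in my case)
chfs -a dev=/dev/fslv03 -a log=INLINE /export/myfilesystem
fsck -p /dev/fslv03
mount /export/myfilesystem
rmlv fslv01                  # reclaims the old rootvg copy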

I realize that this method requires unmounting the filesystem that’s being moved, so this maintenance may need to occur during off-hours. Still, it’s nice to know that the option exists to move a filesystem to another volume group should the need arise.

Now I have a clean rootvg and my application data is in datavg.

Higher Availability for VIO Clients: An Alternative

Edit: Some links no longer work.

Originally posted September 13, 2011 on AIXchange

As I’ve noted, VIO server configuration can be tricky. But while I was sitting in on Steve Knudson’s NIM presentation, he shared a unique solution for providing higher availability for VIO clients.

In VIO server environments, automatic failover is typically set up with shared Ethernet adapter failover between the VIO servers. Though this is an effective solution, problems can result if the control channel isn’t properly configured. Another drawback to this method is that, with ever-increasing adapter speeds, it feels wasteful to have one or more 10-Gb network adapters just sitting idle until a VIOS fails.

Steve’s recommendation for better utilizing network adapters is actually spelled out in this document, “Using Virtual Switches in PowerVM to Drive Maximum Value of 10Gb Ethernet.”

The authors, Glenn E. Miller and Kris Speetjens, recommend an alternative to automatic failover. They suggest enabling both VIO servers to be active at the same time, and using network interface backup (NIB) at the VIO client level. This way the administrator can manually choose which LPAR uses which VIO server, and load balance that way. In the process, we end up using all the network adapters that we paid for, which is a good thing.
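
On the client side, NIB is just an EtherChannel in backup mode. Here’s a minimal sketch of creating one, assuming ent0 is the virtual adapter served by the first VIOS, ent1 is served by the second, and 192.168.1.1 is a pingable address outside the frame:

mkdev -c adapter -s pseudo -t ibm_ech \
  -a adapter_names=ent0 -a backup_adapter=ent1 \
  -a netaddr=192.168.1.1

This creates a new ent device (ent2, say); you configure the IP address on its en2 interface, and the channel fails over to ent1 when pings to the netaddr address stop being answered.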

From the document:

“Something that we haven’t pointed out thus far in the discussion is the fact that redundancy does have its drawbacks. The backup adapter is fundamentally unused unless a failure occurs. In the example depicted in Figure 2, there are three physical adapters and their corresponding Ethernet switch ports that are never used except when a failure condition occurs. These ports have associated costs. Within the more common 1-GB environment, it’s not too drastic. However, in the 10-GB environment it’s vastly different. One customer estimated that it cost them $16,000 for each 10 Gb/s connection provided in their data center, taking into account the cost of the Ethernet adapter, cabling and the proportionate cost of the chassis, blade and port of the Ethernet switch. Obviously, 10 GB connectivity is going to be a necessity in the near future as customers continue to consolidate more and more workloads onto smaller, much more powerful systems. However, it may be difficult to justify 40 GB worth of bandwidth when only 10 GB will be utilized.

“A significant benefit to this design is that both VIO servers can be active at the same time. Of course, each individual client LPAR is only using one, but half of the clients could be configured to use VIO Server 1 and the other half to use VIO Server 2 as their primary paths. Each client would failover to its respective secondary path in the case that its primary path was lost. So the customer’s investment in hardware is more effectively utilized.

“Protection against this scenario is accomplished by configuring two VIO servers on each Power Systems frame and assigning resources to the VIO clients from both VIO servers. The ‘classic’ design that allows use of VLAN tagging (Figure 2) uses a control channel to allow the VIO servers to detect a failure and handle Ethernet traffic accordingly. The vSwitch design handles this at the client level by pinging external resources and failing over the Client NIB Etherchannel when a threshold of failed pings is reached.

“The classic design’s advantages are that it requires no configuration at the VIO client level and all clients can be migrated from one VIO server to another with the execution of one command on the VIO server during system maintenance. The disadvantage of the classic design is that only one VIO server is carrying Ethernet traffic at any time, which means a system is only utilizing 50 percent of its available bandwidth at any time. It also means that there is no way to test if the failover link is correct without failing over every VIO client on a frame. The vSwitch design’s advantages are that it allows both VIO servers to carry Ethernet traffic at the same time. This means that administrators are given more granular control over moving Ethernet traffic from one VIO server to another as well as utilizing a higher percentage of bandwidth during normal operations. The disadvantage of the vSwitch design is that it requires every VIO client (which uses Network Interface Backup to verify path integrity) to ping an address outside of the frame to test for failures.”

The document details the pros and cons of this option, as well as explaining how to set it up. It’s well worth reading in its entirety.

So do you see any reasons not to implement this in your environment?

Steve Knudson on NIM

Edit: Some links no longer work.

Originally posted September 6, 2011 on AIXchange

At the recent tech briefing I attended, IBMer Steve Knudson had a great session called “NIM Master Tuning and NIM Master Group Migrations.” (He also covers some of this material in this techdoc.) One thing Steve explained in the session was how to get better NIM server performance when you have several clients enabled for installation. I know I’ll put this information to use when I’m building out new servers; I often end up deploying dozens of LPARs at once across multiple frames.

He pointed out that with all of this activity happening at the same time, you might experience slow processing when enabling the next NIM client install or resetting a NIM client due to the extensive rereading of /etc/exports. In addition, the NIM master will end up un-exporting and re-exporting NIM resources for different sets of NIM clients.

From the techdoc:

“Consider setting global_export=yes. If you perform frequent simultaneous installs, when one install completes, the default behavior of the master is to unexport NFS exports, remove the completed client from the export lists and re-export the filesystems. During this interval, other ‘in-flight’ client installs may see the message, ‘NFS server not responding, still trying’ on the client console.”

As an alternative to the traditional way of exporting NIM resources to each client, you can export NIM resources as read-only for all enabled NIM clients. The NIM master will keep them set to read-only until the last client install completes. With no clients enabled, and no reservations held for any resource, you can run:

nim -o change -a global_export=yes master

If you run a showmount -e before and after making the change, you can see the difference. Before the change the resources were exported to particular clients, while after they’re exported read-only for all users. This is from Steve’s document:

    showmount -e
    export list for bmark29:
    /export/mksysb/image_53ML3 sq07.dfw.ibm.com,sq08.dfw.ibm.com
    /export/53/lppsource_53ML3 sq07.dfw.ibm.com,sq08.dfw.ibm.com
    /export/53/spot_53ML2/usr sq07.dfw.ibm.com,sq08.dfw.ibm.com

    With global_export, exports are read-only for everyone:

    # exportfs
    /export/mksysb/image_53ML3 -ro,anon=0
    /export/53/lppsource_53ML3 -ro,anon=0
    /export/53/spot_53ML3/usr -ro,anon=0

Steve does note the potential security issue here, but unless you’re worried about users getting access to data in your mksysb file or some such thing, I don’t see it as a big deal if others view your AIX install content.

Steve also points out that you can change the max_nimesis_threads attribute from the default of 20 to support a high number of simultaneous installs (16 or more). For example:

nim -o change -a max_nimesis_threads=60 master

Finally, Steve says that while the networking defaults should be fine on a default AIX install, this can be verified by running:

ifconfig -a

When doing so, look for tcp_sendspace, tcp_recvspace, and rfc1323.

Also check that use_isno is set with:

no -a | grep isno
use_isno = 1

As this is a restricted setting in AIX 6.1, the -F flag must be used:

no -F -a | grep use_isno
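
If the interface values do need adjusting, the usual pattern with ISNO enabled is to set them per interface rather than globally. A sketch with illustrative values only (the right numbers depend on your adapters and workload):

chdev -l en0 -a tcp_sendspace=262144 -a tcp_recvspace=262144 -a rfc1323=1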

Steve is a go-to authority on NIM. People still rely upon his NIM basics and advanced slides (from presentations in 2007) to set up their master servers and get in-depth details. It was great listening to Steve’s NIM expertise in person.

Important HMC Fix

Edit: Hopefully none of you are still running this version. Some links no longer work.

Originally posted August 30, 2011 on AIXchange

This information has been circulating for a while, and Anthony English covers the topic here and here. But I want to make sure HMC users are aware of this important update and the need to make sure you have the fix loaded if you’re at V7R7.3.0.

A problem is known to exist when using dual HMCs in one of two environments: either one HMC is at a different level than the other, or both HMCs are at the base HMC V7R7.3.0 level without fixes.

The problem is possible exposure to corruption that could cause you to lose partition profiles.

A fix is available and should be installed immediately on any HMC that might possibly be impacted by this problem.
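
If you’re not sure what level a given HMC is running, it’s quick to check from the HMC command line (a sketch; the GUI shows the same information):

lshmc -V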

If you’re using an HMC and an SDMC, be sure to get the fix for the SDMC as well.

From the IBM technical bulletin:

“This PTF was released July 18, 2011, to correct an issue that may result in partition configuration and partition activation profiles becoming unusable. This is more likely to occur on HMCs that are managing multiple systems. A symptom of this problem is the system may display Recovery and some or all profiles for partitions will disappear. If you are already running HMC V7R7.3.x, IBM strongly recommends installing PTF MH01263 to avoid this issue. If you are planning to upgrade your HMC to the V7R7.3.x code level, IBM strongly recommends that you install PTF MH01263 during the same maintenance window to avoid this issue.”

The efix can be found here. This package includes these fixes:

  • Fixed a problem where managed systems lose profiles and profiles get corrupted, resulting in a Recovery state that prevents the ability to do DLPAR/LPM.
  • Fixed a security vulnerability with the HMC help content.

As noted, this is the statement IBM released in July, before the fix became available. The fix, PTF MH01263, is now out, so be sure to install it.

Again, from IBM:

“Abstract: HMC / SDMC Save Corruption Exposure
Systems Affected: All 7042s
Communicable to Clients: Yes

“Description:
IBM has learned that HMCs running V7R7.3.0 or SDMC running V6R7.3.0 could potentially be exposed to save area corruption (where partition profile data is stored).

“Symptoms include loss of profiles and/or recovery state due to a checksum failure against the profiles in the save area. In addition, shared processor pool names can be affected (processor pool number and configuration are not lost), system profiles lost, virtual ethernet MAC address base may change causing next partition activation to fail or to have different virtual Ethernet MAC addresses, loss of a default profile for all or some of the partitions.

“Partitions will continue to run, but reactivation via profile will fail if the profile is missing or corrupted. All mobility operations and some DLPAR operations will fail if a partition has missing or corrupted profiles.

“Environments using HMCs or SDMCs to control multiple managed systems have the greatest exposure. Triggers for exposure include any of the following operations performed in parallel to any managed system: Live Partition Mobility (LPM), Dynamic LPAR (DLPAR), profile changes, partition activation, rebuild of the managed system, rebooting with multiple servers attached, disconnecting or reconnecting a server, hibernate or resume, or establishing a new RMC connection.

“Recommended Service Actions:
Prevention/Workaround:
There is no real work-around other than limiting the configurations to a single HMC managing a single managed system.

“Customers who have not yet upgraded or installed HMC 7.7.3 should delay the upgrade/install if at all possible until a fix is available.

“Customers who have not yet installed and deployed SDMC 6.7.3.0 should avoid discovering production servers until a fix is available.

“Customers that have 7.7.3 or SDMC 6.7.3.0 deployed should:

  • Immediately do a profile backup operation for all managed servers:

    bkprofdata -m -f

  • Minimize the risk of encountering the problem by using only a single HMC or SDMC to manage a single server via the following options:
  1. Power off dual HMC/SDMC or remove the connection from any dual HMC/SDMC.
  2. Use one HMC per server (remove/add connections as needed if necessary).
  3. A single HMC/SDMC managing multiple servers might be done relatively safely if the operations listed under triggers above are NOT done to two different servers concurrently.

“Recovery:
 NOTE: Recovery will be easiest with a valid backup of the profile data. So it is extremely important to back up profile data prior to an HMC upgrade or after any configuration changes to the save area. If a profile data backup exists this problem can be rectified by restoring using:

    rstprofdata -m -l 3 -f

“In addition to user backups, profile backups can be extracted from the previous save upgrade data (DVD or disk); a backup console data (if available); or pedbg.

“If a good backup does not exist, call your HMC/SDMC support to determine if recovery is possible.

 “Fix:
A fix to prevent this from occurring is due out by the end of July (Editor’s note: We realize this is now available but wanted to include the verbiage for completeness), but the PTF will not fix an already corrupted save area. A follow-up notification will be sent as soon as it is available.”

Please heed the warnings and load this fix as soon as possible if you’re running V7R7.3.0. And don’t run any HMCs at V7R7.3.0 while running others at a lower level.

Thoughts on IBM’s New Support Model

Edit: Any thoughts on changes to your IBM support experience?

Originally posted August 23, 2011 on AIXchange

I assume many of you saw this e-mail recently, but if not, I’ll share it here:

Dear Valued Customer,
   
We wanted to let you know about an upcoming change to our service delivery model. We know that you have come to rely on us for a high-quality remote technical support experience with access to a skilled technical representative, and we can assure you that this model change will not detract from that experience.

Effective Sept. 1, 2011, requests for remote technical support for AIX and Storage software products, entitled to base support, will receive a callback from a technical support representative in lieu of a live call transfer. This change will ensure a specialist with the required skills is assigned to your problem and create a more consistent remote software support experience across all IBM software products.

Clients with enhanced support (formerly called premium support) will continue to have live call transfer by using an assigned DAC (Direct Access Code). If your business requires a live call transfer solution, please consider one of our enhanced support offerings which includes a responsiveness component in addition to many pro-active elements to help promote IT stability.

All IBM clients with technical support contracts can open support requests electronically via IBM’s web based Service Request (SR) tool. This option allows you to provide very detailed information about your issue and environment. Electronic service requests are handled with the same priority as one submitted by phone. Regardless of your call entry choice, the service request will be routed to the appropriate technical support team and they will respond with either a callback or an electronic response.

If you haven’t visited the Support area of our website recently, we invite you to take a fresh look.

http://www.ibm.com/support/entry/portal/Overview

The IBM Support Portal offers increased access to information and solutions that will help to manage your IT environment. You can now customize your IBM Support Portal to meet your specific product information needs and ensure the resources you require are always at your fingertips.

All of us at the IBM AIX and Storage support organization look forward to assisting you with your software technical support needs, and we thank you for doing business with IBM. Please contact your service representative if you have any questions.

Here’s what I was told about how this change is designed to benefit customers:

  • “IBM will be able to better align experts with client needs and more effectively solve customer problems without having to transfer clients to another resource for support. This will improve IBM’s ability to be more responsive on high priority issues.”

Hopefully this will mean that the first person you talk to will be able to solve the issue, instead of having to transfer you to someone else. However, it will also be important when we open calls to give IBM good information so that the right person can call us back.

  • “Maintains high client satisfaction as demonstrated in the SWMA pilot program.”
  • “No change in Service Level Objectives, and we will continue to meet and exceed our 2-hour response time objective on problem submission.”

Keep in mind, this does NOT mean that it will take two hours to get a response, just that we should expect to have heard something back from IBM within two hours in a worst-case scenario.

  • “No change in the world-class service we deliver to our clients.”
  • “A large percentage of clients open support calls electronically (vs. via phone) and are, as a result, already accustomed to callback mode.”

I know it can be convenient to open a call online and have someone call you back, but one scenario that I run into will take some getting used to. For many customers, getting a call directly into their data center is a challenge. In many cases data center phones aren’t configured to take incoming calls originating from outside of the company. And even when they are, many times IT staff don’t know the external number of this phone line. Hopefully your data center offers acceptable cell phone reception and you have a workable callback number to give IBM.

One final point: According to the letter, if we have a severity one (SEV1) problem and a system is down, we should be able to reach a duty manager and see about getting a live transfer instead of waiting for a call back.

So how do you feel about this change? Please register your thoughts in Comments, and let me know what you see as things move forward.

SSD: What’s Holding You Back?

Edit: I cannot remember the last time I had to tolerate spinning rust on a laptop. Cost has come down a ton since I first wrote this.

Originally posted August 16, 2011 on AIXchange

I’ve written about the benefits of solid-state drives (SSD). Perhaps that’s why someone sent me this 3-minute video. The speaker, whose name is Arthur Bergman, gives a rather impassioned — and let’s just say, earthy — endorsement of SSD over spinning disk (or as he calls it, “spinning rust”).

Seriously, beware the swear. This may not be suitable for listening in your workplace.

Some points I transcribed from the video:

  • “Everyone who doesn’t have a SSD in their machine is wasting their life.”
  • He looked at the audience and started comparing the boot times of their machines. On his laptop he could boot in 12 seconds, versus a minute or more with a traditional hard disk. When he multiplied that time savings across everyone who was listening to his talk, he said, “Every time we boot our computer we are wasting a day of time.”
  • He asked if anyone in the room had their production machines running on SSD, and did not find anyone who was, except for one guy whose environment he had built.
  • “Go buy a SSD and put it in your laptop. I keep telling everyone to get a SSD, and I keep getting back that they are too expensive. Actually they are cheaper than drives. Relevant metric is GB per IOPS.”
  • He explained that on an SSD fileserver, they were running an fsck across 8 million files in 9 minutes, and an rsync backup was taking 12 minutes. He was seeing 4 GB/sec random reads and 2.2 GB/sec random writes, both with average latency of 0.1 ms.
  • “If you don’t access your data, don’t get SSD.”
  • He also argues that you’ll save power with SSDs; in his case, they use 1 watt versus 15 watts for traditional disks.
  • “1 SSD is like 44,000 IOPS, one disk drive is like 180 IOPS.”
  • Start small: “You can’t drive a Formula One car, and you are currently on a bicycle, so just get a Ferrari. $1,000 for 600GB.”

As Bergman acknowledges, price keeps a lot of folks away from SSD. But as he also points out, you can start small. On that front, IBM has come out with a SSD solution that’s designed for affordability. Check out this video on IBM Easy Tier.

IBM argues that most operations are performed on a small subset of data. With this in mind, Easy Tier is designed to automatically and dynamically migrate I/O hot spots to SSD from traditional disks. Because these systems move highly active data behind the scenes with no intervention, customers benefit from SSD without having to manually migrate data. IBM claims that Easy Tier should provide 3X throughput with 10-15 percent of the data moved to SSD.

Easy Tier is available on the IBM Storwize V7000 storage subsystem as well as the SVC and the DS8000 product lines.

So what’s your view of SSD? If you don’t use it, why? Is it due to cost, data density per drive or some other factor? Please register your thoughts in Comments. I’m curious to learn what is holding you back.

Built for Speed

Edit: Now they are talking about 5G. Most of this is still applicable.

Originally posted August 9, 2011 on AIXchange

Just what are these strange devices we’re all carrying around these days? Are they phones or PDAs? Are they small computers? (They do have greater processing power available than some of the larger machines I cared for 20-some years ago.) Are they replacements for a cable modem? Does the name “smartphone” do them justice?

On my device, I can play games like Scrabble or hangman. I can shoot birds at pigs. I can take pictures and videos. I can use it as a GPS or a map, complete with turn by turn directions. I can use it as a flashlight. I can use it as an alarm clock.

I can, of course, use it to browse the Internet. I can check my e-mail. I can use it to locate nearby hotels and restaurants. I can use it to track flights and get weather information. I can use it as an ssh client. I can ssh to it. I can use it as an external drive and move data around with it.

I almost forgot–I can also use it to make phone calls and send texts. And I’m almost certainly forgetting several other uses.

I’m not here to argue for particular brands or wireless carriers. For one thing, it wouldn’t be practical. Readers from this blog live around the world, and not everyone has access to every model. Besides, it’s indisputable that all of these devices–iOS on the iPhone, the Android operating system, BlackBerry OS, among others–can do so many things.

Here in the U.S., marketing folks are calling faster network speeds 4G, so that’s what I’ll call them as well. I am shocked at the amazing real-world speeds that I am seeing throughout the country when I get on these 4G networks.

If you’re wondering what sort of speed you have in your hand, check out an app called Speed Test.

As I travel throughout the country installing Power servers, I find more and more locations are enabling 4G, so I get to enjoy these fast speeds while I’m on the road. Depending on my location, I’ve seen 4G speeds in the 16-29 Mbps range. That’s faster than hotel wifi; sometimes I can’t even get that speed at home. As a guy who used to plug in a 14.4Kbps modem to get online while on the road, I’m just amazed.

The best part is I can enable my device as a hot spot, which means that my laptop runs this fast as well, even while I’m on the move in a car. The benefits aren’t just work-specific, either. This summer I’ve learned that letting the kids stream Netflix is a good way to keep them occupied during long road trips. Now the cry from the back seat isn’t “are we there yet?” it’s “how much further until we get 4G?” My children are quickly learning the difference between 3G and 4G. (Side note: Keeping the youngsters entertained gets really tough when you’re, say, traveling California’s Pacific Coast Highway. For hundreds of miles, you’re lucky to have 1X speed, if you have cellular coverage at all.)

So why do you care? If you’re in an area that’s served by these faster speeds, and you can get an unlimited data plan, you should see if it’s worth it to switch over. Keep in mind you don’t have to use your phone to get the faster network speeds on the cellular data network. The data dongles will work fine on your laptop. I appreciate this because I can bypass a company’s network. If you’re a consultant, you understand. So many companies don’t want us connecting to their networks, whether from a network security or a physical connectivity point of view. In many cases, companies simply aren’t set up to accommodate us, so having this alternative can be a big help.

Sure, I’ve been able to get e-mail and download files at customer sites for awhile–I used cellular cards for years–but it’s so much easier and faster now. (Only occasionally do I find myself in computer rooms or buildings where I can’t get cellular coverage.) Basically, I’m carrying my own network wherever I go.

So how much do you love your devices? More importantly, how much do you need them? And how much more do you love and need them in light of the faster speeds? Are there other things that I should be doing when I’m mobile? Let me know in Comments.

Media Makes the Difference

Edit: Some links no longer work.

Originally posted August 2, 2011 on AIXchange

A customer recently called because they couldn’t log in to their machine. A new server was being built, and someone had rebooted the virtual machine. Once the system came back up, no one could ssh or telnet to it, though they were able to ping it across the network.

I was in a location that allowed me to set up webex. This way, we could both see what was going on instead of me simply hearing about it over the phone.

We started by running putty and making an ssh connection to the HMC. From there, we ran vtmenu, chose a frame and selected the LPAR on that frame. We were able to open the console window, and we had a login prompt. However, we couldn’t log in as root. We tried a few different combinations of user IDs and passwords, but no luck. The machine appeared responsive, though. Had someone changed the passwords?

The decision was made to reboot the machine into maintenance mode. This way we could change the root password, log in, and verify network communications.

Because this environment wasn’t virtualized, it wasn’t as easy as simply booting from a virtual optical disk. We also discovered that the NIM server lived on this non-booting LPAR, so booting from NIM to get into maintenance mode wasn’t going to work.

Luckily the disk controller that the CD was attached to was available, so we made the controller and the CD available to this LPAR and had someone load the physical AIX DVD into the drive. We booted the LPAR into SMS mode and then selected the correct CD device to boot the machine. Instead of choosing to install AIX, we started maintenance mode for system recovery. Then we chose to access a root volume group and start a shell.

Now we were logged in as root, and we were able to poke around. The filesystems looked OK after running a df, but when we tried to run the passwd command, we got an error. Everything pointed to a corrupt /etc/passwd file, but when we attempted to look at that file, we found that it didn’t exist. Someone had accidentally wiped it out. However, because /etc/security/passwd still existed, their passwords were still there, and we just needed to get a copy of /etc/passwd back into the system. Once we did so and rebooted the machine, it came right up and we could log in.
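
As an aside, AIX ships account-consistency checkers that would have confirmed this diagnosis quickly. A minimal sketch (run as root; the -n flag reports problems without fixing anything):

ls -l /etc/passwd /etc/security/passwd
pwdck -n ALL
usrck -n ALL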

We did see a few rm -rf commands in .sh_history, but we didn’t find the actual smoking gun to prove that the file was deleted. We did learn though that someone was copying /etc/passwd files around the environment, so it was certainly possible that this person erred when manipulating the files.

So how is your environment set up? Are you taking mksysbs? Are you backing up individual files so that you can recover them if needed? Do you have a NIM server available to boot and restore from? Do you have install media handy that you can boot from? Install media was the key in this case. Although my customer’s problem was fairly trivial and relatively easy to fix, having the install media on hand allowed us to resolve the issue quickly.

Virtualization Webinars Add to AIX Education Offerings

Edit: Many links no longer work.

Originally posted July 26, 2011 on AIXchange

For some time, I’ve informally collected a few go-to resources for AIX pros. For starters, there’s Anthony English’s AIX Down Under blog, Chris Gibson’s AIX blog and the AIX Virtual User Group-USA. And for sure, get on Twitter.

Others who provide good AIX info include Andy Wojo, Nigel Griffiths, Waldemar Mark Duszyk and whoever is behind AIX Mind. (Feel free to let us know in the Comments section.)

Beyond those, have a look at these great AIX movies. And here are some links to the Quicksheet and QuickStart documents that cover AIX and PowerVM.

Finally, courtesy of Anthony English, here’s word of a new webinar series on Power Systems Virtualization from IBM:

“As an IT professional, you may have heard of IBM PowerVM or Power Systems based around the IBM POWER processor. You may even have seen a presentation on it, but have you wondered:

  • What is it like to actually use?
  • What are the key features for POWER and AIX, Linux for Power and IBM i?
  • How will it save me systems administration time and reduce weekend working?
  • What do I need to run it and how do I get started?

“What are we planning to do?

  • Well …. it is best to let the product talk for itself via a series of live lectures and hands-on demonstration of these features.
  • The sessions aim to be about 50 minutes long and roughly once every two to three weeks.

“Who should attend?

  • These webinars are aimed at a technical audience (operators, systems administrators and technical specialists) — people using (or planning to use) IBM’s Power based systems.
  • Primarily customers, but also available to IBMers and IBM Business Partners.”

These webinars are being held during U.K. business hours (hence the British spelling of “virtualisation”). Currently four sessions are listed; replays are available for the first two, which have already taken place.

Session 1: Exploiting Virtualisation on IBM Power Systems with PowerVM
Session 2: VIOS — how to get going
Session 3: Controlling processor resources in virtualised partitions
Session 4: Deeper dive into Active Memory Sharing

If you haven’t figured it out by now, I’m always looking for more tips and tricks and information. So let me know: Who do you follow? And how do you keep your skills and knowledge current?

The Value of IBM Tech Briefings

Edit: Briefings and virtual briefings cannot be beat.

Originally posted July 19, 2011 on AIXchange

Last month I was fortunate enough to attend an IBM technical briefing covering Power Systems and Storage Systems.

This one-day conference covered an array of information. For starters, IBMer Ian Jarman offered some stories and anecdotes about Watson and the IBM Jeopardy! Challenge.

This talk was followed by two simultaneous breakout sessions. Rolf Kocheisen and John Purcell covered IBM Systems Director, showing attendees how to install and use the solution in a live demo. Meanwhile, Bill Wiegand’s presentation, “Simplify Storage Management with Virtualization,” examined storage virtualization solutions, including products like XIV, V7000 and SVC.

After lunch, the conference broke into five different tracks. The storage track featured sessions covering VMware on XIV (Pete Kisich), TSM for Virtual Environments (Greg Van Hise) and Data Deduplication with ProtecTIER (Neville Yates). The AIX and IBM i track covered AIX Performance (Steve Nasypany), Oracle RAC and Oracle 11g on IBM Power Systems (Rebecca Ballough), VIO introduction for IBM i (Allyn Walsh), and IBM Storage Systems on Power (Brian Sherman).

Later sessions covered: Systems Director Management Console (SDMC) (Gary Anderson), Shared Storage Pools and VIO Server Enhancements (Ron Barker), PowerHA for IBM i (Eric Hess), Cloud Computing 101 (Jaqui Lynch), “What’s new in PowerHA for AIX?” (Shawn Bodily), NIM Master Tuning and NIM Master Group Migrations (Steve Knudson), Upgrade Planning for POWER7 Hardware and IBM i 7.1 (Allyn Walsh), and WebSphere Performance and Tuning on Power (Surya Duggirala).

So why did I list all these sessions and presenters after the fact? To illustrate the breadth of information that was presented and the technical “firepower” that delivered it. If you’ve worked on IBM Power Systems for any amount of time, you surely recognize at least some of the names I shared.

The point is, even though this IBM technical briefing is past, there will be others. And if you get a chance to attend an event like this — which is free of charge, by the way — jump at it. It’s a day well-spent. I’ve heard people compare IBM tech briefings to drinking from a firehose — you get so much information that it can be overwhelming — but I’ll take my chances. (Others like to joke that IBM stands for Information Between Meals. IBM teaches you, feeds you, and then moves you along to the next session where you can learn more.)

So reach out to your local IBM reps. They’ll e-mail you with information on upcoming events, and they can also connect with IBM presenters to get you slides from previous sessions. They may even be able to help you bring an event like this to your area. For that matter, if you’re close enough to the IBM Briefing Centers in Rochester, Minn., or Austin, Texas, simply schedule a briefing for your company.

Remember, IBM and other AIX pros have produced so many freely available resources: conferences, blogs and documentation like IBM Redbooks. It’s out there, and all we need to do is ask for it or look for it.

Twitter Yields More AIX Tips

Edit: Links no longer work.

Originally posted July 12, 2011 on AIXchange

Once again, Twitter had some interesting things to tell me when I searched in #aix.

I got a laugh from this Anthony English tweet:

“Found reference to #AIX 5.4 in doco http://t.co/FUQCAuU AIX 5.4 never released – 6.1 & #Power6 took its place.”

Sure enough, check this out:

“Enhanced JFS is the default file system for 64-bit kernel environments. Due to address space limitations of the 32–bit kernel, Enhanced JFS is not recommended for use in 32-bit kernel environments. Support for datasets has been integrated into JFS2 as part of AIX Version 5.4. A dataset is a unit of data administration.”

I wonder when that reference will be changed.

Nigel Griffiths tweeted about his article on keeping VIO servers up to date.

Chris Gibson had a tweet about extending error log size in AIX. As noted here, by default, AIX sets its error log size at 1 MB. However, since it’s a circular log, useful diagnostic information is often overwritten. The size of the log can be increased dynamically by use of the "errdemon" command in AIX.

You’ll see here the current log size with the 1 MB restriction:

# /usr/lib/errdemon -l
Error Log Attributes
--------------------------------------------
Log File                /var/adm/ras/errlog
Log Size                1048576 bytes <<<
Memory Buffer Size      32768 bytes
Duplicate Removal       true
Duplicate Interval      10000 milliseconds
Duplicate Error Maximum 1000

Use this command to reset the maximum log size to 40 MB:

# /usr/lib/errdemon -s 41943040

And here’s how to confirm the maximum size:

# /usr/lib/errdemon -l
Error Log Attributes
--------------------------------------------
Log File                /var/adm/ras/errlog
Log Size                41943040 bytes <<<
Memory Buffer Size      32768 bytes
Duplicate Removal       true
Duplicate Interval      10000 milliseconds
Duplicate Error Maximum 1000

Finally, this tweet linked to a nice way to “retrieve all HBA WWNs on AIX”:

for i in $(lsdev -C | awk '/^fcs/ {print $1}'); do echo "$i\t$(lscfg -vl $i | awk -F. '/Network Address/ {print $NF}')"; done

fcs0    C05092032BFC00C0
fcs1    C05092032BFC00C2
fcs2    C05092032BFC00C4
fcs3    C05092032BFC00C6

As always, it pays to follow AIX pros on Twitter. You’ll find all kinds of interesting facts, tips and tricks.

Migrating Subsystem Storage Data

Edit: This is still relevant.

Originally posted July 5, 2011 on AIXchange

Both Anthony English and I (go here) have recently written articles about migrating data from one storage subsystem to another. Take the time to read them so you can add more tools to your bag of tricks.

I’ve done quite a few migrations lately, and my preferred procedure is pretty simple. Assuming I’m adding a new disk (hdisk1) alongside my existing disk (hdisk0), I like to add the new LUN or hdisk to the volume group, using:

extendvg rootvg hdisk1

Then I run:

mirrorvg -S rootvg hdisk1

After the mirror completes, I verify that the logical volumes have changed from stale to synced and that rootvg is now mirrored. Then I run:

bosboot -ad hdisk1
bootlist -m normal hdisk1

Then I verify my bootlist by running:

bootlist -m normal -o

Then I unmirror the volume group by running:

unmirrorvg rootvg hdisk0
chpv -c hdisk0
reducevg rootvg hdisk0
rmdev -dl hdisk0

Finally, I can remove the mappings, adapters, backing devices or whatever I used in the VIOS to present the LUN to the client.

Were I mirroring some datavg, obviously I’d skip the bosboot and bootlist and chpv commands, but the rest would be the same. Read the two articles and you’ll find other methods you can use, like migratepv (to migrate either the entire physical volume or just one logical volume at a time) or mklvcopy.
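
For comparison, the migratepv approach looks something like this (a sketch; it assumes both disks are already in the same volume group and the target has enough free space):

migratepv hdisk0 hdisk1              # move everything off hdisk0
migratepv -l fslv01 hdisk0 hdisk1    # or move just one logical volume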

As far as data migrations go on AIX, do you have a preference? Do you like to run sync right away? Or do you–as Anthony suggests in his piece–wait until a less busy time? Share your thoughts in Comments.

One quick thing regarding last week’s post: Anthony English pointed out in the comments that on VIO servers you need to run oem_setup_env and become root before running the bosboot and bootlist commands. I’d neglected to mention that I wasn’t running these commands as padmin. I made an assumption, and we all know what happens when you assume.

Protecting Your Data with mirrorios

Edit: The link no longer works.

Originally posted June 28, 2011 on AIXchange

In the “good old days” of AIX administration, companies had standalone servers, and rootvg lived on internal disks. We always had at least one pair of internal disks, mirroring them to one another. In the event of a disk failure, you’d unmirror the disks and then replace the failing/failed disk. This was usually accomplished on the fly with hot swap disks. Typically the end users never even knew there was a problem.

Those who still run servers on physical internal disks still need to make sure that they’re mirrored in the event a disk needs to be replaced. But even with companies that have moved on to virtualization technology, much of this thinking is preserved today with dual VIO servers and virtual SCSI devices. If you lose one physical path to the storage, the VIO client uses the path provided by the redundant VIOS to keep running.

Most customers I work with these days boot their LPARs (I know, I’m supposed to call them virtual servers, but old habits are hard to break) from SAN. The disk protection and physical disk replacement happens on the back end with a SAN team. Although I do see customers where the SAN guy and the AIX admin are the same person, in all cases the data protection occurs behind the scenes as far as AIX is concerned. The hdisk isn’t affected as far as the OS knows.

When installing your VIOS and booting it from internal disks, it’s still a good idea to mirror that disk to the other internal disk that’s assigned to the same bus on the VIOS. With split backplanes and dual VIO servers, this thinking just needs to be taken a step further: be sure to mirror all of the disks on all of your VIO servers, assuming they’re booting locally.

To help in this regard, VIOS has a built-in command called mirrorios. When you run it, you’ll be prompted to reboot your machine when the mirror operation completes. However, that can be deferred by simply running mirrorios -defer instead.

When this command completes, check your bootlist. You’ll find that it hasn’t been updated with the disk that you just mirrored to. To remedy this, you must manually run a bosboot on the new disk you mirrored to, and then update your bootlist to reflect the change. If you’re wondering why mirrorios can’t perform both steps, you’re not alone. This is supposed to be an appliance, after all.
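
Putting it together, the follow-up on a VIOS booted from hdisk0 and mirrored to hdisk1 looks roughly like this (a sketch; run it from oem_setup_env, since bosboot and bootlist are root commands):

bosboot -ad /dev/hdisk1
bootlist -m normal hdisk0 hdisk1
bootlist -m normal -o    # confirm both disks are in the list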

So what other methods do you use to protect your data? Surely you take mksysb images and backup your system using TSM or some other method. This of course, brings up the familiar but important question: Have you tested your restore procedures lately? Are you sure they work?

Finally, have you made sure your VIO servers are set up correctly? As with a high availability cluster, the wrong time to find out that things aren’t set up correctly is when you really need them to work.

Connecting to a Remote HMC

Edit: Some links no longer work. The SDMC never did take over from the HMC.

Originally posted June 21, 2011 on AIXchange

What do the best practices documents tell us about HMC private networks when communicating between HMC and flexible service processor (FSPs)? Is a private network switch or VLAN really needed between the HMC and the FSP? Can an HMC in a remote data center be used to manage machines over regular network links?

This 2007 document, authored by IBMers Ron Barker, Minh Nguyen, and Shamsundar Ashok, states:

“The network connection between the HMC and the FSP can be either private or open on low-end to mid-range servers. Private is preferred, and therefore a best practice. A private network is required for systems that have a BPA, such as the models 590, 595 and 575.

“In an open configuration, the FSP’s IP addresses must be set manually on each managed server. They cannot be DHCP clients of any server other than a managing HMC.

“Addresses can be set using the Advanced System Management Interface (ASMI) on the FSP. This involves directly connecting a laptop to one of the ports on the FSP and using HTTPS to log into one of the two pre-defined IP addresses. The HMC1 port defaults to 192.168.2.147; HMC2 defaults to 192.168.3.147. The systems administrator can login as user ‘admin’ using the default password ‘admin,’ which should be changed during the initial installation for security reasons. If no laptop is available, an ASCII terminal can be used on the native serial port to access the FSP menus in character mode.

“Remember, with POWER7, the addresses have changed to:

Service processor A: HMC1 169.254.2.147, HMC2 169.254.3.147
Service processor B (if installed): HMC1 169.254.2.146, HMC2 169.254.3.146

“Open networks are used for communications between a logical partition and the HMC. This connection is largely to facilitate traffic over the Resource Monitoring and Control (RMC) subsystem, which is the backbone of Service Focal Point (SFP) and required for dynamic resource allocation. The open network also is the means by which remote workstations may access the HMC, and it could be the path by which an HMC communicates with IBM Service through an Internet connection.”

Though HMCs will be going away in favor of SDMC, the transition will be gradual. For the time being we need to keep our HMC skills sharp, and this is one question that frequently arises when customers plan to add new systems to an environment.

Recently I helped a client set up HMC communications over an open network in just this manner. We found that our FSP ports were still running at 100 Mb; the switch ports had been set to 1000/full duplex as requested, and the FSPs wouldn’t link up to the network at that speed.

Once we had the HMC on the open network, we pointed the HMC to the new IP addresses we’d configured on the FSPs on the local machines. That worked. We then added managed systems that were located in a remote data center without a problem. Finally, we did the same thing with an HMC from the remote data center to manage the machines in the local data center.
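
For what it’s worth, this can also be done from the HMC command line with mksysconn (a sketch; the exact flags can vary by HMC level, so check the man page on your system, and substitute the address you configured on the FSP):

mksysconn -o add --ip 10.1.1.147 --passwd <HMC access password>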

While I wouldn’t recommend using a remote HMC for day-to-day tasks if a local one is available to you, this is a viable option when setting up machines.

How is the HMC set up in your environment? Please share your experiences in Comments.

Time’s Practical (and not so Practical) Complexities

Edit: One of the many reasons I moved to Arizona.

Originally posted June 15, 2011 on AIXchange

I devote a considerable amount of time to thinking about time. With family, friends, clients and fellow IT pros sprawled worldwide, I must think before picking up the phone. It’s never fun to be the recipient of a 3 a.m. call because someone incorrectly calculated a time-zone difference.

Even with e-mail, I must remind myself that, in some cases, I shouldn’t expect a reply any time soon since it’s nighttime where the recipient lives. Or I realize that no, I’m not getting messages in the middle of the night, just from other parts of the world.

My calculations are made easier thanks to a tool called kworldclock (here).

It helps me visualize where the sun is shining around the world. I’d like to see it ported to other platforms so more people could use and enjoy it. However, the Android Market has a free app called “daylight world map” that I recently downloaded. It’s almost as good.

Another useful resource is the website, EveryTimeZone.com. I’m sure there are other similar sites out there, and I’d be curious to hear about your favorites.

I’ve also used Windows desktop gadgets that display times (and local weather conditions) in different parts of the world. And with Firefox’s foxclocks extension, times in different locations worldwide can be displayed in your browser.

While the world obviously needs different time zones, I don’t understand why we compound the confusion with Daylight Saving Time. For 20 years I lived in Arizona, the one U.S. state that doesn’t observe DST. I still can’t get over the fact that the rest of the country and other parts of the world burden themselves with it. Nonetheless, having since lived in other areas of the United States, I now spring forward and fall back and take weeks to adjust to the changing hours like everyone else. Who came up with this idea?

Though I am in agreement with many others who’d like to abolish DST, this group would take it a step further and halve the four U.S. time zones.

“Congress appears to have felt we were not having enough of a difficult time so in 2007 they passed a law starting Daylight Savings Time three weeks earlier and ending it one week later. This cost U.S. companies billions to reset automated equipment, put us further out of sync with Asia and Africa time-wise, inconvenienced most of the country, all in the name of unproven studies that claim we save energy.”

I can attest to this. Back in 2007 I was patching machines so computer clocks could accommodate the change. It was like a Y2K flashback. I can only hope I don’t have to go through that again.

More from StandardTime.com:

“The activists here at StandardTime.com have a modest proposal to end Daylight Saving Time that will reap large benefits in addition to ending the semi-annual changing of the clock. It has not escaped our notice that in the United States, Eastern Standard Time is the same as Central Daylight Time and Mountain Standard Time is the same as Pacific Daylight Time. Thus, we propose that The Pacific and Central time zones remain on permanent Daylight Saving Time, and that the Mountain and Eastern time zones remain on permanent standard time.”

I don’t mind planning calls or going to other lengths to facilitate communications with others from around the world. But change my clocks twice a year? Let’s just say I have no time for that.

An AIX Migration Tip Leads the Grab Bag

Edit: How long has it been since you modified tunables? Some links no longer work. I still follow most of those users on Twitter.

Originally posted June 7, 2011 on AIXchange

It’s been awhile since I’ve given you a grab bag of links and tips. I’ll start with a personal experience.

A recent client with a Power server running an Oracle database was migrating from AIX 5.3 to AIX 6.1. Certain settings they specified in /etc/tunables/nextboot took effect when they booted AIX 6, and they couldn’t figure out why Oracle was running so horribly. Jobs that normally ran for a few minutes were taking nearly an hour to process. Luckily, someone noticed some messages regarding changes to restricted tunables. Upon checking /etc/tunables/lastboot.log, they saw:

Setting maxperm% to 30
Warning: a restricted tunable has been modified
Setting maxclient% to 30
Warning: a restricted tunable has been modified
Setting strict_maxperm to 1
Warning: a restricted tunable has been modified

Once they changed /etc/tunables/nextboot and rebooted, Oracle ran like a champ and the machine was fine. So add this to your migration checklist: Try the AIX 6.1 default settings first, then make modifications if needed post-upgrade. And be sure to check the tunable settings that are carried over with a migration.
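
A couple of quick checks help here (a sketch; tuncheck validates a tunables file against the running system, and lastboot.log records what was actually applied at boot):

grep -i "restricted tunable" /etc/tunables/lastboot.log
tuncheck -p -f /etc/tunables/nextboot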

You know that I frequently link to IBMer Nigel Griffiths–follow him on Twitter as @mr_nmon. In addition to sharing some hints and tips about IBM Systems Director, he’s posted a number of entries covering options for monitoring entire physical systems (as opposed to monitoring on a virtual server or a VM by VM basis).

For instance, here are 22 things you should do before setting up Systems Director. And here are eight things you should do once you’re running it. The latter installment tells you, among other things, how your boss can run Director from an iPad.

Nigel also offers systems monitoring tips involving Director, lpar2rrd, topas CEC analyzer, IBM Tivoli Monitoring and Ganglia.

Recent AIX Virtual User Group meetings have also included Systems Director info. Listen to the replays, download the presentation materials and sign up for future meetings here.

As I’ve said before, you’ll find a lot of AIX knowledge on Twitter. @nicolettemcf, @cgibbo, @ibmaix, @aixmag, @aixdownunder are just a few of the users I follow. Search on #AIX and I’m sure you’ll find others you like.

Remote Access: From the Laptop to the Phone

Edit: This is still an issue, attackers still get in and we still need better security and intrusion protection.

Originally posted June 1, 2011 on AIXchange

As I wrote recently, I remotely access machines regularly, whether I’m logging in directly or using a tool like webex to observe or help others with their server configurations.

Given my reliance on remote access, I have an opinion about virtually every option out there. For instance, RSA tokens: It can be a pain if the physical token is in another location when you need to log in to a server, but it’s still a step forward. And the more recent advent of RSA software is another step forward. This way you don’t have to worry about transporting (or forgetting to pack) a physical RSA token. Either way, with RSA, no one else can gain access without knowing the password and having access to the token or the laptop running the software. (Assuming, of course, that the RSA breach earlier this year didn’t compromise the entire system–see here, here and here.)

In contrast, while I have used Gmail, and I do like it, I worry about someone gaining access to my account and deleting and copying my mail. If someone gets my Google password, it’s game over. That hacker could log in from anywhere and do anything. It does happen. I read about a Gmail user who logged into his account and discovered all of his e-mail had been deleted. Even after he verified his identity, Google could only restore a small fraction of his mail. His data was gone. I recently enabled two-factor authentication for my Google account (see here).

While I’ve not had any issues with it, this reviewer found it difficult to manage with the myriad Google apps he was using. So your experience could be different from mine. In my case it was straightforward. Once I enabled it and had the Google authenticator application loaded on my phone, it was a simple matter of logging in to my account as usual, and then, when prompted, entering the security code. For things like mobile Gmail on my phone or instant messaging using pidgin, I needed a new password from the Google account website, but that was all easily done.

Since I always carry my phone, I’d love to see more ways to run authentication software on it. With the continuing migration to smartphones it could become more common, but where would it all end? Once software on our smartphones becomes the norm, would we advance to swiping a fingerprint on a keyboard, looking into a webcam for an iris scan or using voice recognition? Who knows what other authentication mechanisms we will eventually conjure up as we try to keep our systems safe.

As a Term, LPAR isn’t that Logical

Edit: I still say LPAR all the time. Nigel’s link no longer works. People still fight over AS/400 and IBM i.

Originally posted May 24, 2011 on AIXchange

In an AIXchange blog entry last month, when I discussed the new SDMC IBM Redbook, I noted that:

“Section 1.5 shows us how the terminology will evolve. Managed systems are now called servers, frames are power units, LPARs are virtual servers, the hscroot ID becomes the sysadmin ID, partition mobility becomes relocation, etc.”

Nigel Griffiths took this a step further, recently arguing that the time has come to call our partitions virtual servers or virtual machines rather than LPARs:

“So the observant might have noticed a sharp decline in the term LPAR in the last three or four months. Apparently … this change is now recommended within IBM and IBM marketing, so you will see a lot more use of the new terms. This is a change in name that I whole heartedly approve – unlike, for example, RS/6000 to pSeries to System p to Power Systems (which now confuses the world’s fastest general purpose computers with mains electricity power supplies!). Of course, ‘LPAR’ will turn up out of habit on the Internet, in documents and articles for many years to come and be popular with IT luddites now that it is old fashioned. …

“When I think back to it, the Logical Partition (LPAR) name never did make much sense!

“Logical: means shared or pretend or not physical.
“Partition: means a part of the whole and started life as a disk term as a group of sectors.”

Like Nigel, I guess I never gave much thought to the terms “LPAR” and “logical partition.” I was used to them. In my head I always compared an LPAR with a hard or dedicated partition or a standalone server. The dedicated partition would have some sort of dedicated hardware–dedicated processors or dedicated adapters. A logical partition was fully virtualized using virtual or logical devices–disks, network, shared processors, etc. To me it made sense to call it a logical partition since it was using logical devices. Because dedicated and logical partitions could be mixed and matched on a physical frame, I appreciated that this terminology easily differentiated the kind of LPAR we were talking about.

Virtual systems and virtual machines are appropriate terms for this updated technology. But with VM’s history, there is potential for confusion. There’s not only VM, the old mainframe operating system, but there’s VMware, the non-IBM virtualization software. PowerVM is obviously far more powerful than VMware, but again, they have that VM term in common.

I’ll certainly try to call them VMs going forward, but don’t be surprised if I occasionally make reference to LPARs. And I’m sure I won’t be the only one–after all, plenty of customers still tell me about their AS/400 and RS/6000 systems, even though they’ve been on POWER7 servers for some time. Despite the new direction in terminology, I wouldn’t be surprised if we continue to hear about LPAR well into the future.

All that matters really is that everyone understands what we’re talking about. Perhaps going forward we’ll find a way to differentiate a VM with dedicated adapters versus a fully virtualized VM. Or does that distinction even matter anymore? How quickly do you expect your vocabulary to change?

The Hidden Cost of Poor Service

Edit: Modified the Gitomer link, still good information here.

Originally posted May 17, 2011 on AIXchange

How do you respond to poor customer service? Do you flip out, demand to see a manager and cause a scene? Do you demand upgrades? Or do you quietly walk away, telling yourself that you’ll never be back, no matter what?

What do poor attitudes and poor customer service cost you and your company? Or, on the flip side, how much does your company benefit from great attitudes and great customer service? When customers like a company or a product, they’ll spread the word to their friends. People will also let others know if they don’t like a company or product.

As Jeffrey Gitomer puts it: “The one word definition of referral is risk. … When someone gives you a referral, it means they are willing to risk their relationship with the referred person or company. They have enough trust and faith in you to perform in an exemplary manner, and not jeopardize their existing friendship or business relationship.”

Last year I discussed a bad experience I had with a computer manufacturer:

“I placed my order, and waited for my delivery. And waited. And waited some more. Eventually I got an e-mail saying that the ship date had slipped by several weeks. No kidding. Unfortunately, in this case I was counting on the system to arrive by a certain date because I’d already promised my older system to someone else. They didn’t want to wait either.”

While I didn’t publicly name the manufacturer, I’ve since had people ask me for recommendations. Needless to say, I’ve always recommended someone else. And when I’ve had other machines that needed to be replaced, I’ve gone with another vendor.

I don’t think I’m exaggerating when I say that, as a result of this one bad experience, at least six systems were purchased from other vendors, either by me or by people I know. Some of these systems included dual monitors, SSD drives and other hardware upgrades. How much money did this poor customer service cost this company? Odds are I’m not the only one who’s had an issue with this manufacturer. So take my experience and multiply it three, five, even 10 times. Now we’re talking about real money.

Maybe the saddest part is this manufacturer will probably never know what happened. I didn’t mention their name. I didn’t rant about my experience on Twitter. Other than the phone call that I had to make when I was forced to cancel my order, I’ve had no interactions with them. Sure, they still e-mail special offers, but I delete them as soon as I see them. I don’t do business with them anymore.

As a consumer, I have a long memory. I know which restaurants I won’t return to and which airlines I’ll never fly again. And I’m hardly alone in this regard. I know people who refuse to patronize vendors based on bad experiences that happened decades ago.

Of course, mistakes and accidents happen, and some things are beyond our control. Will your company shine when those moments come, or will it lose customers — along with several of those customers’ friends and acquaintances? What are you doing to bolster and/or maintain your company’s reputation for good customer service?

Sometimes the Latest isn’t the Greatest

Edit: I still have a landline, and I still like it.

Originally posted May 10, 2011 on AIXchange

I know I shouldn’t say this, since I work in technology, but I still have a landline phone at home, and I like it.

Sure, I’ve used voice over IP (VOIP) for webinars, and I’ve had different flavors of Cisco and Avaya IP phones on my desk through the years (and probably some others that I don’t recall at the moment). And it’s fine. I can seldom tell the difference between VOIP and traditional landlines. On my PC I use Skype, Google Voice and different kinds of VOIP software. With these solutions, my computer makes for a perfectly acceptable phone.

Still, when you get right down to it, I prefer the voice quality of my regular old landline phone. Tell me I’m a luddite. Remind me — since most people are on cell phones or VOIP these days — that my landline calls go over IP at some point in their journey anyway. I still argue that you can run into issues with latency and jitter with VOIP that you don’t face with the regular old phone system. I also prefer to have a working landline phone in case the Internet goes down or the power goes out (although I’m not sure who I’d call since the rest of you have apparently switched to VOIP).

Maybe it’s because I still do numerous conference calls, but I prefer a landline with a nice, old-school Plantronics headset. Sure, I sacrifice mobility, but I don’t have to worry about dying batteries or the connection getting choppy while I download large files. I deal with these issues plenty when I’m on the road, so I know of what I speak. I’ll be using a laptop and wireless phone with a wireless headset, and eventually, inevitably, the batteries for each will slowly drain. My Bluetooth wireless headset is usually the first to go. While I do have wired headsets that I’ll then plug into the cell phone, I know it’s a matter of time before that phone battery goes next. Then I’ll generally plug the phone into an outlet (rather than spend a few minutes dropping the call and changing the battery).

Are these big hassles? No. But they’re still hassles. Then there’s the sound quality issue. I’m convinced that cell phones still lag behind landlines in that regard.

Getting back to VOIP, it has its own drawbacks. With a VOIP software client, you cannot leave your computer. You cannot reboot your computer. You cannot move large files around without affecting the call quality. If someone else on the network starts using the bandwidth, your call quality can be affected.

Admittedly, I see fewer issues with VOIP than I once did. I also know there are products that will route calls to your cell or VOIP phone, or your home phone. And I further know that it’s 2011. But I’m the guy who still loves — and uses — an IBM Model M keyboard. Even though I work with incredible, cutting-edge technology every day, there’s still a bit of old-school in me.

However, if you’d like to drop me a line and tell me I’m crazy to not be dropping my landline — or if you want to point out some new solutions I should pay more attention to — leave a message in Comments.

Setting Up NPIV

Edit: This is still good stuff.

Originally posted May 3, 2011 on AIXchange

Following up on this recent post, I want to go into greater detail on setting up NPIV (N_port ID virtualization).

With most customers, the first question I get is, “Do I have the hardware to run NPIV?” If you’re running at least POWER6, you have IBM 8 Gb Fibre Channel adapters and your SAN switches are NPIV-capable, you should have what you need.

This document can help you determine if you’re set to use NPIV. If you log into your VIO server, run lsnports and find the value for “fabric” is 1, you’ll know you can safely map virtual adapters to your physical adapters. (Also remember to read the configuration document I referenced in the previous post.)
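
For reference, here’s roughly what that check looks like from the padmin shell. The output below is only illustrative (the adapter name, location code and port counts are invented), but the fabric column is the one that matters:

$ lsnports
name     physloc                        fabric tports aports swwpns  awwpns
fcs0     U78A0.001.DNW1234-P1-C1-T1     1      64     63     2048    2046

A fabric value of 0 typically means the adapter supports NPIV but the fabric it’s cabled to doesn’t, so get the SAN team involved before going further.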

Setup is straightforward. Create a virtual fibre adapter in your VIO server, then create a virtual fibre adapter in your VIO client. Map the virtual adapter in the VIO server to a physical fibre adapter using the vfcmap command and give the virtual worldwide name (WWN) to your SAN team.
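
As a minimal sketch from the padmin shell, assuming the new virtual fibre adapter shows up as vfchost0 and the physical port is fcs0 (your device names will almost certainly differ):

vfcmap -vadapter vfchost0 -fcp fcs0
lsmap -all -npiv

The lsmap -npiv output is also where you confirm, later on, that the client has actually logged in through the mapping; the client’s virtual WWPNs can be read from its partition profile on the HMC.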

Lately I’ve done a number of logical disk migrations for people who initially set up virtual SCSI and want to move to NPIV. Using dynamic LPAR, virtual fibre adapters can be added to the VIO server and client. The virtual adapter is mapped to the physical adapter and WWNs are obtained from the HMC. If you use Live Partition Mobility in your NPIV environment, remember that each virtual fibre adapter is assigned a pair of virtual WWPNs; zone both of them, as the second is used during the actual migration.

NPIV allows you some flexibility as far as using virtual adapters. I’ve seen some environments that have one adapter per VIO server, and others that map a virtual fibre adapter to every physical adapter in their VIO server. Some argue that one virtual adapter per VIO server reduces complexity while providing sufficient redundancy. In many of these environments, the first virtual adapter is mapped to fcs0, the second to fcs1, etc. Whichever method you choose, I believe it’s important to test the setup by rebooting the VIO servers. You need to verify that what you think will happen when you bring down a VIO server is what will actually happen.

I have customers that reuse the same LUN that they were using with vSCSI. In those cases, we unmounted the filesystems, varied off and exported the volume groups, used the rmdev command to remove the disk and the disk’s mappings from both VIO servers, changed the SAN zoning to map to the virtual WWN instead of the VIO servers’ physical WWN, ran cfgmgr in the client LPAR to see the disk directly in the client (importvg -y vgname hdiskX) and mounted the filesystems. It’s almost as if we never made any changes — though you need to be aware of any disk drivers or MPIO software that’s now needed in the client instead of the VIO server.
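
Strung together, that remap looks something like the sketch below. The disk, VTD and mountpoint names are placeholders, and the rmdev steps run from the padmin shell on each VIO server:

umount /data
varyoffvg datavg
exportvg datavg

rmdev -dev vtscsi0
rmdev -dev hdisk4

cfgmgr
importvg -y datavg hdisk4
mount /data

The middle block runs on both VIO servers, and the SAN rezoning to the virtual WWPN happens between it and the final cfgmgr in the client.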

I also have customers that — rather than go through the downtime associated with remapping their disks – are fortunate enough (because they have enough storage) to just create new LUNs. They leave their original vSCSI mappings in place, map their new LUNs via NPIV directly to the client and just use migratepv to move the data from the old disks to the new disks. Then they remove the old vSCSI disks and mappings at their leisure.
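
That flow is even simpler. A sketch, again with placeholder names (hdisk1 is the old vSCSI disk, hdisk2 the newly zoned NPIV LUN), all run in the client LPAR:

cfgmgr
extendvg datavg hdisk2
migratepv hdisk1 hdisk2
reducevg datavg hdisk1
rmdev -dl hdisk1

The filesystems stay mounted the whole time migratepv does its work, which is the entire appeal of this approach.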

One other thing to keep in mind once you complete the move to NPIV: Even though you no longer use vSCSI for your disks, you should still keep a vSCSI adapter on your VIO server and client for virtual optical devices. I know I still want the capability to use virtual .iso images as I always have.

So are you looking forward to an NPIV migration project? And if you’re already up and running, what’s your experience been like? Please share your thoughts in Comments.

The SDMC Evolution

Edit: Did anyone ever run this?

Originally posted April 26, 2011 on AIXchange

The IBM Redbook covering the IBM Systems Director Management Console (SDMC) is now available. Whether you’re making the move from the HMC to the SDMC now or later, this publication will help you with your transition. It’s well worth the download.

The first time I read it, I learned interesting things like:

* “The SDMC is available as a software and a hardware appliance. The software appliance will replace the Integrated Virtualization Manager, and can manage machines from the blades up to the 750 class servers. The hardware appliance is required for management of midrange systems and high-end systems. The SDMC releases can be used alongside the Hardware Management Console during trials and deployment, which eases transition.”

* “The SDMC virtual machine contains Linux as the base operating system. For the software appliance, the client supplied virtualization options for different hypervisors include Red Hat Enterprise Virtualization KVM or VMware ESX/ESXi.”

* Section 1.5 shows us how the terminology will evolve. Managed systems are now called servers, frames are power units, LPARs are virtual servers, the hscroot ID becomes the sysadmin ID, partition mobility becomes relocation, etc.

* “The SDMC incorporates most functions of the Hardware Management Console. This has been done through direct mapping of commands or by replacing functions that are present already in IBM Systems Director. Some functions are not available in the first release of the SDMC, notably the ability to handle system plans.”

Comment: System plans are wonderful tools that I highly recommend, so hopefully this will be fixed very quickly. From what I understand it will be addressed in one of the early service packs.

* “The command-line interface has been mostly kept the same. On the SDMC, most of the commands are just preceded by smcli. This new prefix might require changes to existing scripts that use the Hardware Management Console.”
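
Comment: As a hypothetical illustration, assuming a familiar HMC command like lssyscfg carries over the way the Redbook describes, the change would look like this:

lssyscfg -r lpar -m myserver

becomes:

smcli lssyscfg -r lpar -m myserver

Easy enough at the keyboard, but worth grepping your existing scripts for.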

* “SDMC provides the capability to back up the whole virtual machine onto removable media or a remote FTP server. You can restore using the backup file from the removable media or from a remote FTP server. The restore will be full image deployment and all existing files will be replaced from the backup. Unlike the HMC, SDMC backs up the entire disk instead of individual files. The backup function requires that the SDMC be temporarily shut down to quiesce the disks, but it will be immediately restarted while the disk files are copied to removable media or a remote FTP server. The restore function takes under an hour to complete.”

The SDMC has been a topic of discussion at workshops and IBM Technical University conferences, so hopefully most customers are up on this change. It shouldn’t come as a surprise.

Basically, the SDMC is still an appliance just like the HMC is today. It will run Systems Director code under the covers. The hardware will be the same CR6 that we’re used to, but SDMC will require more memory and disk space. There will be two 500 GB disks running in a RAID 0 setup, so be sure to back up the SDMC; these disks will not be mirrored. Although I’ve heard that existing CR6 machines will ultimately be upgradeable, at GA the machines will be net new. So initially, it will probably make sense to run the HMC and SDMC simultaneously until you get used to the SDMC’s new capabilities.

Those new capabilities are impressive. The SDMC will be able to manage the whole POWER6 and POWER7 lineup, including blade systems. This is a much nicer alternative than the current solution of using and managing each individual blade via IVM. It’s a pain to use the GUI and deal with the frequent timeouts that occur when using the IVM interface. Assuming you have sufficient resources, another thing you’ll be able to do with the SDMC that can’t be done with IVM is create dual VIO servers on your blades.

Finally, the SDMC will support the capability to run live partition mobility operations between blades and standalone servers and back again. This will give customers greater flexibility as far as purchasing hardware and running workloads. With this forgiving infrastructure you’ll be able to move workloads around on the fly, and with dynamic logical partition operations you’ll be able to adjust hardware allocations on the fly.

The SDMC transition will not be a big bang change from the HMC, but it will take some time. The rollout, in fact, is expected to take years. As is standard practice for IBM when introducing updated solutions, the HMC will continue to be supported through this transition, but over time advanced virtualization capabilities will increasingly be brought to the SDMC (and not necessarily the HMC). Customers are encouraged to try out the SDMC, make a transition plan, run it alongside the HMC and get used to it.

As noted, there is a strong thread between the two solutions. The SDMC, like the HMC, is an appliance with user management capabilities and a built-in firewall. The network topology is identical on both solutions. As with the HMC, the SDMC won’t allow admins root access or the capability to install software. Just as larger environments have multiple HMCs now, you’ll be able to run multiple SDMCs. You’ll still need an additional Systems Director server to manage your SDMC stand-alone devices and take advantage of advanced plugins like Active Energy Manager or VMControl.

SDMC availability is planned for May 13.

So what do you think of this change? When do you expect to see an SDMC in your environment?

Getting Started With NPIV

Edit: The link still works. This is still a good comparison.

Originally posted April 19, 2011 on AIXchange

NPIV isn’t new functionality, but plenty of customers are only just now getting started with it. I know this because lately, I’m hearing a lot about NPIV. In response to the numerous queries coming my way, I searched and found this excellent IBM Support document on configuring NPIV:

“N_Port ID Virtualization or NPIV is a Fibre Channel facility allowing multiple N_Port IDs to share a single physical N_Port. This allows multiple Fibre Channel initiators to occupy a single physical port, easing hardware requirements in Storage Area Network design. An NPIV-capable fibre channel HBA can have multiple N_Port IDs, each with a unique identity and world wide port name.”

Compared to using virtual SCSI devices (vSCSI), storage management is greatly simplified with NPIV. NPIV allows a LUN to be zoned directly to a particular client LPAR, rather than presented through the VIOS as a middleman. So with NPIV and SEA, the VIO servers handle the shared Ethernet and NPIV duties. Best of all, there’s no need to map and track LUNs — that duty can be left with the SAN team where it belongs.

In contrast, when using vSCSI with VIO servers, your lsmap -all output can be a mess to manage if a large number of LUNs are being mapped through your VIOS to client LPARs. I’ve seen servers with hundreds of LUNs being presented to the VIOS. In those cases, the AIX admins must manage the subsets of LUNs that are then mapped to individual VIO clients. All that disk-mapping must be tracked, and I’ve seen many different spreadsheets and documents that attempt to do this.

In a typical scenario, two VIO servers will be set up (so that one can be serviced or restarted without these activities impacting the client LPARs). A fibre card or two is usually attached to each VIO server. Then the SAN team can zone the VIO servers to the SAN using the World Wide Name (WWN) information from the physical adapters. This results in a pile of LUNs that AIX admins must map to the appropriate VIO clients. To make all of the LUNs accessible from both VIO servers, each LUN’s reserve policy must be set to no_reserve. So the admins end up doing the mappings twice, once on each VIO server.
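
Per LUN, the mapping work on each VIO server looks roughly like this from the padmin shell (the hdisk, vhost and VTD names here are invented for illustration):

chdev -dev hdisk4 -attr reserve_policy=no_reserve
mkvdev -vdev hdisk4 -vadapter vhost0 -dev lpar1_vtd0
lsmap -vadapter vhost0

Then you repeat it all on the second VIO server, against what you hope is the same LUN.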

On top of that, admins must pay attention to PVIDs or LUN IDs to ensure that the disk that’s mapped on VIOS1 is the same one mapped on VIOS2. Having the no reserve attribute set on the disk can open up a potential disaster if the same LUN is accidentally mapped to different clients. If two different clients are booting from the same LUN, it’s time to look for a mksysb and do a restore.

One plus with vSCSI is that MPIO software only needs to be loaded on the VIO server. The VIO clients usually just use the built-in AIX MPIO software as they have no visibility to the disks other than recognizing that they’re virtual SCSI disks.

From this lengthy explanation on vSCSI, you might have already figured that NPIV, once you have it set up, is much easier to use. And you’re correct. With NPIV, virtual WWN information is created for each client LPAR. The SAN team gives LUNs to the client LPARs directly. Virtual fibre adapters must still be mapped to a particular physical fibre card in the VIO server, but admins don’t need to map and track LUNs or worry about reserve locks on the LUNs. (We do, however, need to remember to load the MPIO software into the client LPARs, because the clients do recognize the disks and the storage subsystems from which they come.)

I’ll have more NPIV info next week, so stay tuned.

IBM Product Preview

Edit: POWER7 blades. SDMC. Those are names I have not heard in a long time.

Originally posted April 12, 2011 on AIXchange

IBM is conducting what it calls a “product preview” today. The subject of this preview is new hardware that is expected to be rolled out later this year.

I received this information during a recent conference call with IBM.

First, there will be a new POWER7 high-performance computing machine, the Model 775. As you’d expect, this machine is based on POWER7 processors, which come with eight cores per socket running at 3.8 GHz. The 775 will be packaged with a Quad Core Module (QCM). A QCM consists of four POWER7 chips; thus, each QCM will have 32 cores.

IBM will then take the QCMs and integrate them into what they call a drawer, or a node. A 2U drawer will have eight QCMs, giving us 256 (8×32) total cores in an efficient, densely packed 2U form factor. You’ll be able to have 2TB of memory per node, and IBM estimates peak performance of 7.8 teraflops in a 2U package.

The machine will have a high-speed interconnect fabric, which allows these 2U nodes to be connected using an optical interconnect. This will provide the capability to connect to a total of four drawers (IBM calls this a supernode) consisting of 1,024 cores. Twelve nodes will fit in a rack, with a maximum of 24 TB per rack and a peak performance of 95 teraflops in the rack. Optical interconnects allow for connecting supernodes together — current plans allow for as many as 512 supernodes (524,288 cores) running at an estimated 15 petaflops at peak performance.

The 775 will be a quiet machine, because it will have fewer fans. It will be water-cooled with 100 percent heat capture. The 775 is expected to be used for climate and weather modeling and prediction, life sciences, nuclear resource management, and financial services.

Here are some other things that grabbed my attention:

* The 795 is expected to have new capabilities that allow for hot node add, hot memory upgrades and repair. You’ll also see concurrent GX adapter add and hot GX adapter repair, along with concurrent system controller repair. The maximum number of partitions that can be created on a frame will increase to 1,000 on the 795, 640 on the 770 and 780, and 320 on the 750. Relatedly, when Active Energy Manager is used with these machines, administrators will be able to set up energy policy definitions by partition rather than by system, so different policies can enable energy savings while maintaining performance.

* New blades are planned. There will be a new single-wide Model 703 blade with a maximum of 16 cores running at 2.4 GHz and 128 GB of memory. There will also be double-wide Model 704 blades, which consist of 32 cores running at 2.4 GHz and a maximum of 256 GB of memory. The 703 is expected to have one hard disk bay, and you can choose either an HDD or an SSD. The 704 will have two disk drive bays, so you can have either two HDDs or four SSDs. Either way, the blades will support both traditional rotating hard drives and SSDs. These new blades are expected to run in BladeCenter H, HT or S chassis.

* The Model 750 will be refreshed with new processor options, including 4-core 3.7 GHz, 6-core 3.7 GHz and 8-core 3.2 and 3.6 GHz options. The 750 will still have four sockets and 512 GB of memory per machine.

* Support for dual VIO servers across all of the POWER7 blades will be enabled through the new Systems Director Management Console (SDMC). SDMC will be used to enable active memory expansion on the blades. There will also be support for running live partition mobility operations between blades and rack servers, which will open up a whole new way to manage workloads. The SDMC will run on familiar CR6 hardware, although with beefier disk and memory requirements.

The SDMC enhancements mean we’ll no longer need to run IVM to manage our blades, and since we’ll be able to run dual VIO servers, this will make blade offerings a much more attractive option to many customers. In addition, because IBM is making the SDMC the next-generation management console for Power systems, the HMC will be phased out over the next several years. (Although the HMC and IVM are expected to be kept current with new Power systems models into 2013, they will not incorporate future advanced management capabilities.) During the transition period, customers will be able to run the SDMC side by side with existing HMCs until they’re ready to switch permanently.

On the call it was stressed that the SDMC is meant to be evolutionary rather than revolutionary. In other words, IBM says it will give customers ample time to make this transition. And really, this shouldn’t come as a surprise. For a while now, I’ve been hearing that “the HMC is going away” at conferences and workshops I’ve attended.

The SDMC will manage POWER6 and POWER7 servers, and there will be a virtual appliance version for small-tier systems. The SDMC will utilize the Systems Director user interface. It will support a superset of HMC capabilities, integrate platform and OS management, and maintain compatibility for CLI and scripting support.

I’ll write more about this in the near future, but rest assured this solution will make it much easier to manage an entire computing environment — including servers and blades — from a central location. In addition, read IBM Systems Magazine for more details about the SDMC (a cover story is planned for the May 2011 issue). And an IBM Redbook on the SDMC is expected to be ready later this month.

* Another interesting option that I saw was an SAS disk-only I/O drawer (the EXP24S) that could house up to 24 SFF drives in a 2U form factor. This I/O drawer would allow you to partition the drawer into four different sets of disks, thus making it easier to present a smaller group of disks in the drawer to different partitions. This could be a nice option if you’re not using a SAN but still need access to more external storage.

* Finally, IBM highlighted a change to the Power systems landing page on its website. Look for this URL: www.ibm.com/power.

So what do you think of these planned solutions? I expect there to be plenty of discussion around the HMC and SDMC as we learn more in the coming months.  Please leave your thoughts in Comments.

Remote Tech Support

Edit: I still use webex all the time.

Originally posted April 5, 2011 on AIXchange

I’ve been using screen and VNC on a daily basis for years — and I’ve been writing about them for quite a while, too. Another tool I like, though I don’t use it all that often, is portmir.

Occasionally I’ll use VNC, screen or portmir to share a session so I can troubleshoot a problem with someone. It’s not the best arrangement — I may not have access to the other user’s network, and/or that other user may be unfamiliar with these tools, or may not have them on their system.

More frequently, I find myself working remotely with customers via VPN. Many customers happily provide me with VPN access so I can help them solve their problems. All I need is a decent network connection, which usually isn’t an issue. Even when I’m traveling in remote areas, I can usually find good wireless or cellular data connections these days.

Remote technical support has its advantages. It’s much quicker than finding time on my calendar to book a flight and schedule a trip to the customer site. And many good VPN clients are available — I’ve used Cisco, Citrix, GreenBow IPSec, OpenVPN and Shrew Soft, among others.

But as far as VPN has come, I still work with plenty of customers who either don’t use it or don’t allow vendors to use it. So what do I do when I can’t get VPN access to networks and machines I need to look at? Or what do I do if a new customer, due to its internal processes, can’t get me access right away? What about customers that use physical RSA SecurID hardware tokens? In that scenario, I have to wait while the token is shipped to me. How do I remotely get access in the meantime?

Fortunately, there are myriad free and fee-based web conferencing solutions. I really like tools like webex, GoToMeeting and Sametime Unyte. Setting up a conference, having both participants connect, and then having the customer share their screen with me is pretty painless. Many of these solutions also allow you to remotely take control of the session. Some also provide audio capabilities, although I find it just as easy to set up a conference call or make a quick phone call. Everyone can talk to each other, and because we can all view the same desktop at the same time, everyone can watch the commands as I run them or the configuration changes as I make them. These solutions work great when I can’t get VPN access, or when I have to wait for it.

This technology can also be used for training or other types of collaboration. Again, I see exactly what they see. Since we’re usually on the phone, I can easily walk a group of customers through whatever they’re working on or whatever issues they’re having.

I think that more support organizations could benefit from these tools. I’d sure love to be able to call IBM Support, share my desktop and have them see exactly what I’m seeing. Obviously this wouldn’t work with hardware problems that keep you from being able to boot up or access a machine, but that sort of thing is less of an issue these days. Usually when I have problems it’s either a configuration issue or I need to modify settings. In those scenarios, sharing a screen with someone generally makes troubleshooting much quicker and easier.

What other tools and techniques do you use for remote access? If you have a tip or use a tool that I’ve not mentioned here, please let me know in Comments.

Watson’s Impact

Edit: It does not seem like it was that long ago, and yet… Some of the links no longer work.

Originally posted March 29, 2011 on AIXchange

The IBM Jeopardy! challenge has ended, the experience succinctly summarized with Ken Jennings’ words after Final Jeopardy: “I, for one, welcome our new computer overlords.”

I heard that the project was originally code-named bluej. IBM code names are meant to be placeholders, but still, I’m glad they went with the name Watson. Hearing that bluej beat Jennings doesn’t have the same ring, and somehow, I’m not sure the Ken Jennings reddit AMA (ask me anything) would have been quite as interesting. (Warning: comments may not be safe for work.)

Here’s another reddit AMA with Watson team members.

Though the buzz has naturally subsided, interest is still strong. From what I understand there remains a huge demand for Watson team members to speak at different events. Really, just about any conference you may attend in the near future — Pulse, Impact, Innovate: SWG Events, STG Technical Conferences, COMMON Minneapolis, Power User Groups, Smarter Computing Summit, LinuxCon, University Events and many more — will feature presentations and demos.

There has, of course, been tons of discussion about what Watson’s victory means for humanity, including, already, a book.

The Jeopardy Archive breaks down Watson’s win (here, here and here).

Recently I saw the webcast, “Beyond Jeopardy!: The Business Implications of IBM Watson.” The participants explore potential real-world uses of this information processing technology (healthcare, for example).

Thanks to Watson, lately I find myself talking about Power systems with non-technical people. Generally, these folks would have a hard time imagining what I do for a living, but because they saw Alex Trebek in the computer room with those servers and computer racks, I found I could more easily explain what I do: “I rack and stack and configure and sell those 750s you saw on Jeopardy!, along with the rest of the IBM Power server product line.”

By the way, Watson may have won the challenge, but it isn’t undefeated. Rush Holt, the New Jersey congressman and a Jeopardy! champ from the 1970s, recently beat a slightly slower Watson version. From CNN:

“After beating Watson $8,600 to $6,200, Holt expressed admiration for the machine, saying the technology has the potential to be extremely useful in situations that require tough decision-making, such as medical diagnosis, air traffic control, and situations that require piecing together bits of knowledge.

“Such technology can also be extremely helpful in emergencies, like an outbreak of a food-borne illness or a natural disaster, said Chris Padilla, vice president of IBM Governmental Programs.

“‘In the modern world, we’re all flooded with information,’ Padilla said. ‘What Watson can do, is go through all of that data, and in response to a natural language question, rank the order of likely responses in terms of what you asked it in the first place.'”

IBM has a webpage filled with interesting facts. For instance, during preliminary sparring matches Watson only used 75 percent of its processing resources. And did you know that a computer with a single processing core takes more than two hours to perform the deep analytics needed to answer a single Jeopardy! clue? Watson, in contrast, holds all the information that it needs to compete on Jeopardy! in about 500 GB of space. You’ll also find flash animations of the machine on stage, a graphic depicting the physical server layout and links to Watson’s architecture, workload, energy, storage and network usage, and more.

Finally, check out IBM’s Watson website and this whitepaper.

I expect Watson will be talked about for quite some time, both for what it did and for what it can potentially do. Are you still following Watson? Did you enjoy seeing POWER7 machines on prime time television? Are your non-technical friends actually interested in what you do now? Feel free to post in Comments.

How Much Memory?

Edit: The link points to POWER8 servers at the time of this writing, but the principle is still the same.

Originally posted March 22, 2011 on AIXchange

When ordering a Power server, the number of sockets you pick and the dual inline memory module (DIMM) size you use matter. Consider the 8233-E8B server, commonly called the Model 750. This would be the same model machine that was selected to build the Watson cluster.

With a new machine you have a number of choices to make. A particularly important one is the amount of memory you want. You can choose from memory kits of different sizes, which will allow for different memory densities on the machine.

According to the facts and features guide, the maximum available memory for a 750 is 512 GB. The guide also notes that the machine supports from one to four sockets.

The number of sockets you choose will tell you how much memory you can order. If you have one socket, you’ll have eight memory slots. The ratio stays the same moving up: with two sockets, you’ll have 16 memory slots, with three sockets, 24 memory slots and with four sockets, 32 slots. If you’re looking to max out the memory on the machine, you’ll want to max out the number of sockets. While other choices can be made here — e.g., CPU clock speed and the number of cores per socket (either six or eight for the 750) — to reach 512 GB on the system, you must choose the largest memory feature, 32 GB, which is packaged as 2x16 GB memory DIMMs.

Of course, other memory features are available for your machines: 8 GB (2x4 GB) or 16 GB (2x8 GB) DIMMs. But once you make your memory size selection, you need to stick with it, because, if you upgrade the machine in the future, 16 GB and 32 GB features won’t mix. They must be the same feature code. Since, in this scenario, you’re trying to max out the machine, the 32 GB (2x16 GB) memory option is the choice. And since you have 32 slots for memory in your 4-socket machine, you can see how 16 GB x 32 slots gives you 512 GB.
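
Pulling that arithmetic into one place, here’s the 750’s maximum memory by socket count when you populate every slot with the largest (16 GB) DIMM:

Sockets   Memory slots   Maximum memory
1         8              128 GB
2         16             256 GB
3         24             384 GB
4         32             512 GB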

So how do I know that you cannot mix DIMM sizes? Here’s a little story, as told to me by someone who was there.

Once upon a time, a Power server was getting a memory upgrade. This system (not a 750) was being boosted from 16 GB to 32 GB total memory. There were 4x2GB DIMMs attached to each processor (eight total), and eight new memory DIMMs were ordered. Since the memory was going to be doubled, it seemed logical to just plug in the eight new DIMMs and power the machine back on. So, the new memory was installed and the machine was rebooted. And then? Error codes started flashing across the LED.

Perhaps the memory was not seated properly? After pulling and reseating all of the memory, the same error came up. At this moment, someone finally thought to check the boxes that the new memory came in. Sure enough, the new memory DIMMs were of a higher density — 8x4GB rather than 8x2GB. Because you can’t mix memory sizes, the machine issued errors. Once the 8x2GB memory was pulled, the machine came right back up with 32 GB.

Here’s something to think about when ordering machines: Do you expect to add more memory in the near future? Sure, this can be difficult to predict, but if you think you’ll eventually upgrade the memory, try to leave yourself some open slots. If you max out your memory with a smaller DIMM size, your only option down the road may be to pull out the smaller DIMMs and replace them with larger DIMMs.

If your machine supports it, think about using Capacity on Demand. That way you’ll have a machine with max memory physically installed, but you’ll only use (and pay for) the activated memory you need now. Should you eventually elect to upgrade, additional physical memory can be activated later. It can help take the guesswork out of future upgrades.

Whatever choices you make, be sure you know what’s installed on your machine — and what you’re adding to your machine — before opening it up.

It’s Lame to Blame

Edit: This is still an issue today.

Originally posted March 15, 2011 on AIXchange

I enjoyed reading this article on some of the turf wars that go on in IT:

“IT pros do battle every day — with cyber attackers, stubborn hardware, buggy software, clueless users, and the endless demands of other departments within their organization. But few can compare to the conflicts raging within IT itself. Programmers wage war with infrastructure geeks. IT staff butts heads with IT management. System admins battle for dominance. And everybody wishes security would just leave them alone.”

I can certainly relate to this. In fact, I cannot count the number of times I’ve seen server guys blame network guys who blame SAN guys who blame operations guys who blame management. At one point or another, any IT group may be viewed as “the enemy.”

From my server-centric point of view, even when problems are addressed, it can be frustrating. Often the server guys will tell the network team about some connectivity or response-time issues. The network team fixes the problem, but they seldom share the solution. It can really seem like they don’t want you to know what they did. You just get those infamous words, “try it now,” and the network problem magically vanishes. Sure, we’re all glad things are fixed, but there’s value in transparency. If we know the specifics about a problem with network connectivity or, say, accessing a LUN, we can remind the network folks or the SAN folks of what they previously did, should it happen again. We could save them time, if they’d just keep us informed.

I don’t mean to paint the server folks as angels. I’m sure I’ve told various users to “try it now” while neglecting to explain the source of the problem. I’m sure many admins, when asked why the time is wrong on the server, or why applications cannot resolve a hostname, or why users cannot login, or why users have the wrong home directory, or with any number of issues, respond with “try it now.”

I suppose part of the reason for turf wars stems from the fact that, in larger organizations, these groups often have different team members with different skill sets, and many times individual team members use only their own hardware, with no cross-training. The network guys work on the network switches, the SAN guys work on the SAN switches, the server guys work on the servers.

So everyone’s isolated — except of course, when there’s a problem. Then everyone must work together. And for the most part, everyone is a professional. Still, there are times when people are more interested in deflecting blame from themselves and their team. No one wants to cop to a mistake. Honestly, these turf wars could make for a great reality show — if only more IT people looked good on camera.

So how do we resolve turf wars? Start by remembering that you’re all in the same organization, and that, despite the many different areas of IT expertise, everyone in IT really has a stake in the computing environment.

Especially as organizations grow, it’s vital that everyone be kept informed of changes to the environment. To accomplish this, these changes must be documented, even if documenting changes typically unleashes unwanted bureaucracy. For instance, a new server is brought onto the raised floor. A ticket is written, and the notifications fly. I’ve seen these new-server tickets reach everyone from the network and SAN team to the backup and monitoring teams. While the bureaucracy frustrates me as much as anyone, these processes and procedures are generally in place for sound reasons. No one is looking to slow down your server build, but other teams do need to be informed of changes that will eventually impact their workloads as well.

That’s why we should play nicer with one another. Give those in the “other group” the benefit of the doubt. Assume we’re all doing our best.

More from the article:

“Down the road someone will ask, ‘Do you know so and so?’ and you’ll say, ‘Yes, he walked out on us and took our passwords with him.’ It’s a small industry. The only things that have meaning in this life are your name and reputation. Lose them and you’ll never get hired again.”

If you’re always combative and causing drama, people may stop working with you, or avoid you until they’re forced to work with you. It really is a small world, and people will remember your interactions with them. It’s really in your best interests to make the effort to get along with others. And that will make everyone’s days that much easier.

One more snippet:

“The most important decisions a CIO faces aren’t about technology per se, but about business outcomes. And that may never enter the mind of an in-the-trenches IT grunt. ‘I’ve had a lot of discussions with a lot of very tech-savvy CIOs,’ he says. ‘But at the end of the day, the business decisions they need to make aren’t based on sexy technology — they’re based on business outcomes. There’s pressure on the CIO from the CEO to deliver business value. The IT guys are focused on the technology in their particular tower.’”

In other words: Managers may have a completely valid reason for denying a new technology that you recommend.

It’s natural to get caught up in our areas of expertise. But remember that we’re not only supporting servers, but applications and the users of those applications. We’re all providing value to an organization.

10 Rules for Admins

Edit: This is still a good list of rules.

Originally posted March 8, 2011 on AIXchange

A few months ago I took a class with IBMer Tommy Todd, who highlighted 10 rules for administrators that he had accumulated over the years. I’ll run down his list, and comment about each rule. Then I’d appreciate your thoughts.

Documentation: Make sure your documentation is up to date. Ask yourself how you’re documenting your systems. I really like to generate a sysplan from the HMC. It shows me a diagram of the physical hardware, where the adapters are assigned, how the LPARs are configured, etc.

Make backups: How are you backing up your machine? Do you back up both the operating system (rootvg) and data (datavg)? Are you periodically running mksysb commands, and have you tested them? Can you restore your machines? Have you tested your disaster/recovery plans? Did you back up your HMC and VIO servers?
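
As a minimal sketch of what that checklist can translate to (the paths are placeholders, and backupios runs from the padmin shell on the VIOS):

mksysb -i /backups/$(hostname).mksysb
savevg -f /backups/datavg.savevg datavg
backupios -file /backups

The HMC has its own backup function under HMC Management, and of course none of this counts until you’ve actually tested a restore.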

Try it three times: Did you fat finger something? Do you have poor typing skills? Did you use the wrong flag? Do you need to go look at the man pages?

Don’t overlook the obvious: Many times the answer will be simple. Recently someone was trying to remove a directory and couldn’t do it. fuser, lsof — nothing showed that the directory was in use. The admin was stumped. It turned out he still had a filesystem mounted on that mountpoint. Once he unmounted the filesystem, he was good to go. How many obvious things have you overlooked?
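
A quick sketch of that gotcha, with /mydir standing in for the stubborn directory:

mount | grep /mydir
umount /mydir
rmdir /mydir

The directory refused to go away not because anything had it open, but because it was an active mountpoint; a simple mount check was the missing step.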

Try it, it might work: I like to log into test machines and try different things; you never know what you’ll learn. For me the best learning is hands-on learning.

Never say never, always avoid always: There will be exceptions, there is usually more than one way to reach the same endpoint. In other words, don’t say “it always works that way,” or “it will never work like that.” The technology does change. Things that didn’t work before do now, and vice versa.

Make a copy before you edit anything: You might have a copy out on a TSM server or backed up somewhere, but what if that backup copy has an issue? It’s nice to have that safety net, but it’s smart to cp /etc/hosts /etc/hosts.orig before making file changes. If you find yourself making changes to /etc/inittab without using chitab, be sure you back it up first.

There’s usually another way to do it: Especially in UNIX, there’s more than one way to do something. The religious wars come up when people believe that theirs is the only way. I like to hear about how other people do things and learn from them. Many times they do things their way because they had issues in the past. We can all learn from others’ mistakes and benefit from their hard-earned knowledge.

Login as yourself, switch to root when it’s needed: With tools like sudo and role based access control (RBAC), do we really need to be logging in and moving around as root? One wrong keystroke can spell disaster when you have super-user authority.

Don’t say, “I’ll go back and fix that later”: There’s no time like the present to fix your issues. If you must “fix that later,” be sure to document it somewhere so you have a reminder to actually come back and fix it later.

Never keep your resume on the system you’re supporting: What if the machine crashes and you don’t have it on a backup server? What will you do then?

Do you abide by all these rules? What are your own rules? Please register your thoughts in Comments.

Debating Support Scenarios

Edit: These are still interesting topics to consider.

Originally posted March 1, 2011 on AIXchange

In a recent post, I said this:

“Troubleshooting and administration are done via the network, from anywhere on the globe. This is great, especially for companies that utilize sun-support scenarios, where different teams in different countries and different time zones support machines during their normal business hours. Provided that good turnover information is being passed on from shift to shift, and calls and trouble tickets are accurately logged in a searchable database, this is a terrific support setup. At least it’s preferable, I think, to having IT staff members carry pagers and get called in the middle of the night to work on problems.”

However, this counter-argument is sometimes made to the follow-the-sun support scenario: If the administrators who built the machines are the same people who will get paged in the middle of the night in the event of a problem, then these admins will be extra careful when configuring their machines in the first place. Ultimately, if extra care is taken up front, there are fewer emergency calls.

Beyond that, some believe that the admin who built the server is the best person to fix it. We do get to know our machines over time. We know how they normally behave, we know where the logs are and when the cron jobs run, and we remember that quick little change we implemented a few days or weeks ago. An administrator who’s servicing an unfamiliar machine on a 3 a.m. call may need some time to get familiar with the applications it runs and its other unique characteristics.

All of this sounds logical, but I feel that the familiarity factor is a bit overrated. These days, many organizations take the time to standardize the look and feel of all their machines so that any team member can log into any machine and get right to work. But let me expound on what I said in the previous post: What I like about the follow-the-sun scenario is that people are actually working on the machines during their normal daylight hours. They’re not sleep-deprived; they’re fresh and alert and able to work on issues during the normal course of their day. And anything that isn’t resolved can be left for those coming in on the next shift.

Of course, in those cases, there’s a need to bring new shift members up to speed on what’s already been tried. But this isn’t all bad, either. Many times I’ve worked with IBM Support on issues that took multiple shifts to resolve. The departing shift members fill in the people coming in, and we continue to troubleshoot problems. Sometimes it helps to have a new set of eyes looking at a problem. A group of people will spend lots of time on an issue, then a new person will come in and immediately spot something that the rest of us overlooked. I’ve seen it happen.

Admittedly, globally dispersed support teams are a luxury available to only a few large companies. The rest of us generally work within individual IT departments.

So how do you deal with support issues? Do you prefer to have the on-call pager for a week at a time?  Do you prefer to have dedicated staff working second and third shifts? Is your after-hours call volume so high that you can only handle a few days of it before exhaustion creeps in? I knew a guy who hated his turn on the pager rotation so much that he would bribe his teammates — to the tune of hundreds of dollars — to take his week for him.

Hopefully you’re on good terms with your IT team and can adjust your schedule when need be. And hopefully your bosses recognize the perils of pager duty and allow you time off after an extended period of night calls. But how does your organization handle this? If you have some solutions — or some horror stories — e-mail me or make a post in Comments.

More on VIOS Installs

Edit: Some links no longer work. It seems this is easier to do now via the HMC.

Originally posted February 22, 2011 on AIXchange

Anthony English offered an intriguing comment about an issue I had during a VIOS installation.

His response: “I prefer to install the VIOS without using physical media at all using the HMC command line and the installios command. Requires an FTP server or NFS mount. I download the VIOS install media from Entitlement Software Support and then install the VIOS without NIM or physical access to a managed system. True bare metal install. This requires an HMC. Don’t think it can be done for an IVM managed system.”

I decided to try the installios command from the HMC as Anthony suggested. I installed a new server. I defined both VIO servers, and I installed my first VIOS from install media as I always do. I defined my second VIOS with appropriate physical and virtual resources and put the VIOS install media into the HMC DVD drive.

I logged into the HMC GUI and selected HMC Management. Then I went to Open Restricted Shell Terminal. On the command line I typed installios. (Obviously, if need be, I could have connected remotely via ssh to the HMC to run installios.)
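
As an aside, installios can also be driven non-interactively with flags instead of prompts. From memory (verify against the man page on your HMC level before trusting this), the invocation looks something like:

installios -s managed_system -p vios2 -r default_profile -i 192.0.2.15 -S 255.255.255.0 -g 192.0.2.1 -d /dev/cdrom

This time I just took the prompts.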

The machine came back with:

“The following objects of type ‘managed system’ were found, please select one.”

I chose the managed system that had the VIO definition created on it.

“The following objects of type ‘virtual I/O server partition’ were found. Please select one.”

I chose the VIO server definition I wanted to load.

“The following objects of type ‘profile’ were found. Please select one.”

I chose the profile I was interested in.

“Enter the source of the installation images [/dev/cdrom]:”

I hit enter to take the default. Then, as prompted, I entered the client’s intended IP address, intended subnet mask, gateway and client speed.

It came back with:

“Please select an adapter you would like to use for this installation. (Warning, the client IP address must be reachable through this adapter!)”

I chose the appropriate adapter, the one that had been configured as my “open” network adapter in this case. I then watched as the adapter information was retrieved and the HMC automatically powered up my LPAR and began the installation.

It came back with what looked like an SMS screen from inside of the VIOS. Here I was able to choose the correct network adapter to use for the installation.

It then prompted me for a language and locale (en_US in my case) and gave me a screen containing a summary of the information I’d entered up to this point. I was given the option to proceed (Enter) or cancel (type Ctrl-C). It then showed me a license agreement screen. Once I accepted, it fired up nimol resources to start loading the other VIOS. It copied booti.chrp.mp.ent.Z, ispot.tar.Z, mksysb, etc., from the CD media to the /extra filesystem on the HMC.

And at this point, it failed. When I looked at the log, I realized why. When prompted for the network card’s speed, I just took the defaults: 100/Full. As I was using a virtual network adapter, these settings were not correct. So I went back through and made sure to select auto/auto for the virtual network adapter. Once I did that, it worked as expected.

I saw:

Connecting to vios2
Connected
Checking for power off
Power off complete
Power on vios2 to open firmware
Power on complete
Client ip address
Server ip address
Gateway ip address
Subnetmask ip address
Getting adapter location codes
Network booting install adapter
Bootp sent over network

It then brought up /var/log/nimol.log and displayed what was happening during the install.

One thing I don’t like about this is not having the option to select the disk I want VIOS installed to. This isn’t a big deal on a fresh build, but it could be a problem if you’re installing to a system where you could potentially overwrite existing data. However, if you don’t plan on creating a NIM server in your environment, this HMC method certainly works fine.

As Anthony noted in his comment, there are multiple options for loading VIO servers. Using the HMC is certainly another worthy option, especially considering how easy it is.

Hey, Cut Me Some Slack!

Edit: This is still relevant information to consider.

Originally posted February 15, 2011 on AIXchange

I recently shared some of my gripes concerning modern data centers, as well as the importance of keeping actual people in mind when designing and constructing these buildings.

On a somewhat related note, another trend I’m seeing is that those nice new, pristine racks become a nightmare once it’s time to service the equipment.

I’ve seen plenty of newly constructed raised floors. These facilities look marvelous. The cables are all color-coordinated and very neatly laid out. They’re cut to precise lengths (no slack) and tied down from the switch, through the cable management trays, into the cable management arms and into the server. The people that tour these places must go away thinking that any company that goes to such lengths to organize its IT equipment must be on top of its entire operation.

The problem is, raised floors aren’t meant to be pretty. When the cabling is that precise, it can actually be a problem. For instance, without the slack, you can no longer slide the drawers forward in the rack to service the components.

IBM has quite a few hot swap parts in their computers. They have great rails that they use to mount their servers in the computer racks, and these rails allow you to easily slide a drawer in and out of the rack. (Aside: I find that the latest design for the HMC and the POWER7 rails is the easiest to install, while the rails for some storage switches, which shall remain nameless, are the worst. The last thing I want to do when installing switches is assemble rails.)

Anyway, here’s the big deal with cables cut to size: When your machine needs service, the only way you can slide it forward and fix it is to unplug everything. The machines have rails for a reason — so you can move them a bit to tinker with them when necessary. If you tie down your cables without leaving slack, you defeat the purpose of having these redundant hot swap parts for the machines.

If I need to unplug everything to service a machine, I have to be careful to avoid bumping into the other servers, and I need to hope that the cables are labeled properly so that they get plugged in correctly when the service is over. When you’re talking about multiple adapters, multiple connections and multiple serial and HMC cables, that’s not a trivial number of connections.

Another interesting thing I see is some people not using any rails. They just put shelves into their racks and stack two or three servers on a shelf. I don’t think this is any better. I still cannot slide the machines out, and worse, if I need to reach one of the bottom machines, I may need to power off multiple physical servers just to get at it.

Cleanliness has its place in the computer room, of course. You should make sure your racks and cabling are clean. But think about what you actually need and, literally, cut your service personnel some slack — make sure there’s enough slack in the cables that each drawer can be easily pulled out when service is needed.

Computer Rooms are Still People Rooms

Edit: We still need to plan for humans in data centers.

Originally posted February 8, 2011 on AIXchange

I travel to customer sites across the country — including customer-owned facilities, outsourcing facilities, disaster/recovery facilities and co-location facilities — and I see plenty of raised floors. But I’m always fascinated by how much these sites cater to machines rather than people.

These days many large data centers are designed as lights-out environments, where people don’t need to go onsite at all. Troubleshooting and administration are done via the network, from anywhere on the globe. This is great, especially for companies that utilize follow-the-sun support scenarios, where different teams in different countries and different time zones support machines during their normal business hours. Provided that good turnover information is being passed on from shift to shift, and calls and trouble tickets are accurately logged in a searchable database, this is a terrific support setup. At least it’s preferable, I think, to having IT staff members carry pagers and get called in the middle of the night to work on problems.

Then there are the colo data centers that many companies now use. Customers have one or more racks that sit next to other customers’ racks, and each is housed behind its own chain link fence. Although the prison aesthetic of these “caged” machines can take some getting used to, again, the concept has its place. Customers can get personalized attention from the staff that typically mans the facility 24-7, and the costs to rent space can be quite reasonable.

However, the reality is that computers still need to be installed and decommissioned on a regular basis. Even if these new facilities are designed to have only a few people working onsite, IT folk are constantly coming in and out. Recently we joked about one large site reminding us of the place in the original Raiders of the Lost Ark where they ended up storing the Ark of the Covenant, just a huge cavernous warehouse full of pallets as far as the eye could see.

The simple, frustratingly overlooked truth is that people need to get stuff done, and they need room to do it. I can’t count the number of times I’ve had to stack cardboard boxes to create a makeshift desk on a raised floor or temporarily cover the perforated floor tiles that are needed to cool the raised floor. The computers may need the AC to do their job, but I can’t do mine if I’m a human popsicle.

It always amazes me to see these new facilities and their state-of-the-art security apparatus: cameras, biometric man traps, retina scanners, voice recognition systems and sensors that weigh people coming and going (to ensure that you’re not walking out with some valued piece of equipment). And yet, many of these same places are constructed without enough conference rooms, lounge areas, work spaces and even bathrooms.

It goes without saying that if these facilities don’t have enough room for people on a daily basis, they’re not equipped to handle a disaster scenario, either. I’ve heard facilities managers say that in the event of a disaster that would bring an influx of IT folks to their site, they’d just get some portable toilets. (I guess that would work as long as it isn’t one of those disasters where it’s really hot or really cold outside; otherwise those Porta-Potty trips could get a little uncomfortable.)

The point is, if you’re in charge of planning and managing a data center, remember us humans. Unlike the computers, we need places to eat. We may need places to sleep. We absolutely need adequate restrooms. I realize that these facilities are built for computers, but as long as computers need people to work on them, these sites must be designed with people in mind.

IBM’s New Software Compatibility Tool

Edit: The link no longer works.

Originally posted February 1, 2011 on AIXchange

IBM has come out with a new software compatibility website.

I learned of this site from a mailing list, which offers this description:

“Clarity is the new tool based on Clearing House data designed to allow users to easily generate custom reports about compatible IBM software combinations…. Using this tool customers may create reports about a product’s compatibility with operating systems, prerequisite software or virtualization environments. They can also generate EOS reports for [IBM] products.”

When you go to the site you’ll find lists of available reports, including:

* Operating systems for a specific product.
* Prerequisites of a specific product.
* Virtualization environments supporting a product.

I was interested in “Products that use a specific operating system,” so I selected “Products supported by AIX 6.1 POWER System.” (Options ranged from the current AIX 7.1 and back as far as AIX 4.3.)

The tool produced a report displaying a list of products that were supported, under headings with names like:

* Information Management (DB2, InfoSphere, Informix)
* Lotus (Domino, Mashups, Quickr)
* Rational (Asset Manager, ClearCase, COBOL)
* Tivoli (Access Manager, Configuration Manager)
* WebSphere (Application Server, Business Monitor)

I didn’t count the number of products listed in the report, but it was several pages worth of information. Besides the software name, it also displayed the versions that were supported. (To my surprise, some of these products — including Tivoli Access Manager, DB2 and Informix — support AIX 4.3.) This is a great way to quickly determine which levels of software are supported by a particular operating system version.

Using checkmarks or greyed-out checkmarks as indicators, this report also broke down each product in this manner:

* “This operating system is supported by all parts of the product.”
* “This operating system is supported by some of the parts of the product.”

Also available on the software compatibility site is a software end-of-service tool. For fun, look at the end-of-service dates on VIOS: you’ll see the general availability date along with the time (e.g., third quarter 2012) the product is expected to reach end of service. And by adjusting the tool’s start dates, you can see in a graphical format when a product came on the market and when it will go end of service.

I’m always interested in new ways to look at data, so if you run your own sample reports, let me know what you find.

An Unusual VIOS Install

Edit: The link to the NIM document still works. Getting the mksysb file is easier now.

Originally posted January 25, 2011 on AIXchange

I had an interesting experience with a VIOS installation recently. I’m curious if anyone else has seen something similar. Maybe I just had a bad day.

When I do a NIM install of a VIOS on a new system, I typically refer to this documentation, which outlines a nice way to get the mksysb file off of the installation DVD to use with the NIM server.

The document states:

“Copy the VIOS mksysb image from the CD to your NIM master:

“Mount the VIOS base CD and copy the VIOS mksysb image from the CD (in /usr/sys/inst.images) to your NIM master:

# mount -o ro -v cdrfs /dev/cd0 /mnt
# cd /mnt/usr/sys/inst.images
# cp mksysb_image /export/mksysb/mksysb_image

“If using VIOS 1.5 or higher media, the mksysb file may be split into two parts. To combine these two parts and copy them to hdisk, run the following:

# cat /mnt/usr/sys/inst.images/mksysb_image \
    /mnt/usr/sys/inst.images/mksysb_image2 > /dir/filename

“*** You can substitute any path you would like to save the combined mksysb image, for ‘/export/mksysb_image’.”
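Before handing the combined file to NIM, it’s worth a quick sanity check that it’s a readable mksysb. Something like the following should list the volume group information from the backup; lsmksysb is the command I’d reach for, though verify its flags at your AIX level:

# lsmksysb -lf /export/mksysb/mksysb_image

If that displays a sensible table of contents, the two parts were concatenated correctly.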

Why would I use NIM to install VIOS if I already have the installation media in hand?

With some IBM Power Systems models, you can get an optional split backplane. This allows you to set up multiple partitions, each with its own disk controller and disks. With some of these models, however, the DVD drive can be accessed by only one of the disk controllers. So when the other partition boots up, it cannot see the DVD. This makes it difficult to load that second VIOS onto the second set of internal disks, since that partition cannot see the VIOS install media.

I’ve seen people load VIOS to one internal disk and then, when that installation finished, pull the disk out of the first controller and put it into the second controller. Then they reload VIOS onto a disk that they put into the original disk controller. This works, but I don’t think it’s a very clean method; you end up with defined devices on your VIOS that are no longer seen by the operating system.

My preference is to load VIOS and then immediately create a client partition and make it my NIM master. Once I have this NIM master, I define the VIOS mksysb image and use it to load any other VIO servers in the environment, exactly what’s laid out in the aforementioned document. With the NIM server there, loading the rest of the client partitions is trivial; at least it usually is.
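For reference, defining the mksysb and a SPOT derived from it looks something like this on the NIM master; the resource names and locations here are made up for illustration:

# nim -o define -t mksysb -a server=master \
    -a location=/export/mksysb/vios.mksysb vios_mksysb
# nim -o define -t spot -a server=master -a source=vios_mksysb \
    -a location=/export/spot vios_spot

With those two resources in place, a standard bos_inst operation installs the second VIOS.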

In this case, however, the issue was that the VIO server I loaded from that media came up at version 2.2.0.0 instead of the 2.2.0.10 I was expecting. Maybe it was bad media, or maybe I just fat-fingered something. At least the solution was simple enough. I just took a mksysb from my clean, newly installed VIOS with the command:

backupios -nomedialib -mksysb -file /mksysb/vios.mksysb

This command excluded the .iso images I’d copied into the /var/vio/VMLibrary (which is my virtual media repository).

I copied that mksysb image over to my NIM server and created my spot. The other VIOS installed as expected.

So has anyone else seen this issue when copying the mksysb file from the install media?

Technical University: Looking Back and Ahead

Edit: Some links no longer work. The links to pictures do, it was fun looking for people from all those years ago.

Originally posted January 18, 2011 on AIXchange

The 2011 schedule for the IBM Systems Technical Conference Series is now out. These worldwide educational events include the IBM Power Systems Technical University, which is set for Oct. 10-14 in Miami, Fla.

I’ve written numerous times about the valuable education available through the IBM Power Systems Technical University events. I figure it’s worth mentioning now because companies typically finalize their annual budgets early in the calendar year — so really, now is the time to make the case to your management about how this educational conference will help you in your job.

If you can’t make time to attend the conference in October, you might also consider the three-day Power Systems Technical Symposium that’s set for April 27-29 in Orlando, Fla.

From the website:

“The IBM Power Systems Technical Symposium is a shorter version of the IBM Power Systems Technical University offered in October. It will focus exclusively on training related to the recent POWER announcements and the potential impact on your data center.”

These conferences are a great place to hear about the technical details behind Power systems, without any marketing fluff. They are technical conferences for technical people. They’re taught by people with real-world experience. The attendees you talk to and network with face the same kinds of challenges that you do. It’s a great place to meet with others in your field.

Besides the technical information, you can usually count on learning about other interesting topics while you’re there.

When I attended the 2010 conference in Las Vegas last October, Steve Squyres was the keynote speaker. Steve is the scientific principal investigator for the Mars Exploration Rover mission, and based on the information he presented that night, I found that space exploration is even more fascinating than I had realized.

I picked up a copy of his book, “Roving Mars,” so that I could learn more. I enjoyed reading what he wrote. I learned a lot from the book, but I also realized that he shared insights with us in his talk that he didn’t cover in his book. I only wish I’d made an audio recording so I could listen to it again. (If anyone has an MP3 of the lecture, please send it my way.)

If you attended the conference, look through these pictures and see if you can find a photo of yourself. I was able to find a shot of the back of my head in one of them.

Many of the speakers flew straight from the event in Las Vegas to the Technical University that was held the following week in Lyon, France, in order to present there as well.

Here’s a set of pictures from France.

I got some behind-the-scenes information from IBM’s Marlin Maddy, who’s in charge of the IBM Technical Conference Series. He told me there were approximately 1,550 attendees in 2010, nearly double the attendance at the 2009 event. There were well over 400 technical sessions, many of which were standing room only.

Venues are selected close to 18 months ahead of time, and the rooms for the various topics were assigned well before the final attendance numbers were in, which is why some of the rooms felt so crowded. Organizers did acquire a couple of additional larger rooms to ease the congestion, but it’s difficult to plan for a crowd when you aren’t sure until the event starts just how many people will attend. Since attendees sign up only for the conference itself, as opposed to specific presentations, it can be hard to plan for the size of room that will be needed.

Event organizers collect attendee feedback from each of the conference sessions so that they can get our thoughts and criticisms while they’re still fresh in our minds, and according to Marlin, overall customer satisfaction with the event was extremely high in 2010.

A large burst of people signed up at the last minute for the 2010 conference, which is unusual; most years there’s a steady stream of sign-ups. They also offered two pre-conference certification classes that had 50 attendees. This was something new, and event organizers felt that it enhanced the value of the conference.

Prizes given away at the 2010 conference included five iPads, five Kindles and five 1TB external drives. Expect more such giveaways this year.

As Marlin put it, “Putting together an event like this takes a great deal of planning and a solid team working together. There are always minor surprises and as a team we just need to adjust to make it all transparent to the customer.”

I thought they did a great job with the event. Let me know what you think in the comments, or send me an e-mail.

Remote HMC Upgrades

Edit: Some links no longer work. I still love remote upgrades.

Originally posted January 11, 2011 on AIXchange

Anthony English’s recent blog entry about remotely upgrading the HMC struck a chord with me. How many times have you found yourself on a cold raised floor to upgrade a machine? Wouldn’t you rather do that work from a warm office (or, in Anthony’s case, the deck of a ship in Sydney Harbor)?

I had an HMC running V7.3.3 that needed to be upgraded to V7.7.2 in order to support a new POWER7 720 machine. What were my options? There’s upgrading the way we’ve done it for years: I could download the .iso images and burn them to a CD, and then put the CDs into the DVD drive.

Or, I could order the CDs from IBM and not have to download or burn anything.

Or, I could avoid burning physical media or visiting a cold computer room by attempting the upgrade over the network.

Since this was my first time using this method, I started here and followed the instructions:

Download options for HMC network install images. You have three options for acquiring HMC network install image files:

  • Download all files simultaneously via link to Download Director.
  • Download the files individually.
  • Download all files via an anonymous FTP process at a command line.

Note that in all cases you must download (or copy) the image files to a server that accepts FTP requests. You cannot download these files directly to the HMC.

I went ahead and downloaded the files to an FTP server. Then I followed this information:

“You can upgrade the HMC remotely by using network install images rather than using a Recovery DVD. The HMC commands involved include saveupgdata, getupgfiles, chhmc and hmcshutdown. The network install images are linked off of IBM FixCentral.”

I downloaded the files:

“Enter these commands:

# ftp
ftp> open ftp.software.ibm.com
  Name: anonymous
  Password: ftp
ftp> cd software/server/hmc/network/v7720
ftp> prompt   (turns off interactive mode)
ftp> bin   (ensures binary transfer mode)
ftp> mget *   (downloads all six files simultaneously)
ftp> bye   (exits FTP after all files have been downloaded)”

These files had been successfully downloaded to my FTP server:

# ls -la
total 5068440
drwxr-xr-x   2 root     system          256 Dec 07 13:25 .
drwxr-xr-x   3 root     system          256 Dec 07 10:07 ..
-rw-r--r--   1 root     system      1531846 Dec 07 10:08 bzImage
-rw-r--r--   1 root     system    672645120 Dec 07 11:03 disk1.img
-rw-r--r--   1 root     system   1133975552 Dec 07 12:25 disk2.img
-rw-r--r--   1 root     system    753831936 Dec 07 13:25 disk3.img
-rw-r--r--   1 root     system           78 Dec 07 13:25 hmcnetworkfiles.sum
-rw-r--r--   1 root     system     33049856 Dec 07 13:28 initrd.gz

I logged into my HMC using ssh as hscroot, and the first command ran fine:

>saveupgdata -r disk

However, I ran into a snag with my second command:

> getupgfiles -h ftpserver -u root --passwd passw0rd -d /fixes/HMCV7R7.2.0
Cannot contact server to obtain files.

A Web search netted me this forum and this question, but, alas, no answer.

So I tried rebooting the HMC. That didn’t help.

I verified I could connect to the FTP server from other machines, but I couldn’t figure out why my HMC wouldn’t connect to the FTP server. Instead of spending more time troubleshooting, I decided to download the files directly from IBM onto the HMC. I ran this command:

>getupgfiles -h ftp.software.ibm.com -u anonymous --passwd ftp -d /software/server/hmc/network/v7720

That worked great, although it took longer to download the files than it would have if I were moving files on the local network. To monitor my progress, I ran the script that’s suggested in this post:

while true ; do
    date
    ls -la /hmcdump
    sleep 60
done

Once the download completes, the /hmcdump filesystem gets unmounted. That tells you that the download is finished.

“The getupgfiles operation will mount a filesystem called /hmcdump and copy the install files into the directory then unmount the filesystem. The following commands will set the HMC to boot from the network install images and allow the upgrade to proceed.”

Once the download completed, I ran:

>chhmc -c altdiskboot -s enable --mode upgrade

>hmcshutdown -r -t now

Then I waited a while until the HMC came back from the upgrade. This was actually the one mildly disconcerting bit of the whole operation; there’s no feedback as to whether anything is happening. How do I know if it’s actually working if I don’t see something telling me it’s working? How do I know that the HMC didn’t have a problem on reboot and it’s not just sitting there, frozen, waiting for someone to physically touch the machine?

In my case it took about 20 minutes for the upgrade to complete, and beyond that I had to wait until I could actually log in to the HMC GUI (although I could ping the HMC and ssh into the command line). I kept getting an error: “Console not Ready. You cannot log on at this time. The console is still initializing and is not yet ready for users to login. Allow the console to finish initializing and then try to login again.”

After the upgrade, I made sure to load the mandatory eFix MH01235, along with MH01243 and MH01244. This brought the HMC to the current level at the time of writing.
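For completeness, fixes can also be applied from the HMC command line with updhmc. I’m writing this one from memory, so treat the flags as an assumption and check the man page before using it; the general shape is:

> updhmc -t s -h <ftp server> -u <user> -p <password> -f <fix file> -r

where -t s pulls the fix from a remote server and -r reboots the HMC once it’s applied.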

I have to agree with Anthony: Remotely upgrading an HMC is a very painless way to go, though I still wish there was a way to get status information during the actual upgrade process. Just don’t ask me how IBM could provide that information from a machine that’s hundreds of miles away and not on the network while the upgrade is taking place.

More From the Tweet Life

Edit: Some links no longer work.

Originally posted January 4, 2011 on AIXchange

If you’re not following along on Twitter, you should be. Most recently, Twitter brought me this update about VIOS Next Generation:

“VIOS Next Generation or ‘NextGen VIOS’ was released on Dec. 9 as VIOS 2.2 SP01. I recently installed it on my test cluster and put it through its paces to see what was included in the nearly 900MB download.

“First of all, pay close attention to the README prior to installing this code. There are more than a few caveats that are important to pay attention to. Some notable ones are: 

  • The reject option of updateios is not supported in this release. Once you install this service pack, you are committed.
  • The new shared storage pool functionality requires 4 GB of RAM in the VIO server.
  • There is a maximum of one (1) VIOS node per shared storage cluster in this release.
  • VIO servers that host shared storage pools may not participate in Live Partition Mobility operations or Partition Suspend/Resume Operations.
  • VIO clients that make use of storage from shared storage pools are not supported for Live Partition Mobility.”

Here’s more I’ve recently gleaned from tracking various AIX enthusiasts on Twitter:

In his blog, Anthony English notes that the deprecation of the bootinfo -s command means we should instead use the getconf command to track disk size. Anthony’s post points to the following techdoc:

“The command /usr/sbin/bootinfo has traditionally been used to find out information regarding system boot devices, kernel versions, and disk sizes. This command has been deprecated in favor of the command /usr/bin/getconf. The bootinfo man page has been removed, and the command is only used in AIX by the booting and software installation utilities. It should not be used in customer-created shell scripts or run by hand.

“The getconf command will report much of the same information that bootinfo will:

“What was the device the system was last booted from?
$ getconf BOOT_DEVICE
hdisk0

“What size is a particular disk in the system?
$ getconf DISK_SIZE /dev/hdisk0
10240

“What partition size is being used on a disk in the system?
$ getconf DISK_PARTITION /dev/hdisk0
16

“Is the machine capable of running a 64-bit kernel?
$ getconf HARDWARE_BITMODE
64

“Is the system currently running a 64-bit or 32-bit kernel?
$ getconf KERNEL_BITMODE
64

“How much real memory does the system have?
$ getconf REAL_MEMORY
524288.”
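If you want that disk size for every disk on the system, a quick shell loop does it. A minimal sketch, assuming your disks all show up under lsdev:

for d in $(lsdev -Cc disk -F name) ; do
    echo "$d: $(getconf DISK_SIZE /dev/$d) MB"
done

Against the 10 GB disk in the example above, this would print ‘hdisk0: 10240 MB’.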

Here’s an interesting story (link not active) about Apple dumping its Xserve rack-mounted servers, and the conjecture that maybe running Snow Leopard Server on an IBM Power 710, 720 or 750, or on a PS700, PS701 or PS702 blade, might be a good option in the future. Of course, getting the OS running on the hardware is a hurdle, but you have to admit that the idea of running yet another operating system on Power Systems servers is intriguing.

Sure, a lot of Twitter is devoted to people sharing what they had for breakfast or where they’re going for the weekend. But if you look around, Twitter can be a valuable resource for the AIX pro (e.g., go to Twitter.com and try searching for #aix, #ibmtechu or #ibmwatson).

So what hashtags or users are you interested in?

Watson Follows in Deep Blue’s Steps

Edit: I cannot believe that it has been this long ago already. Some links no longer work.

Originally posted December 21, 2010 on AIXchange

It wasn’t that long ago when chess master Garry Kasparov took on, and was defeated by, IBM’s Deep Blue supercomputer.

Nearly 14 years after that match-up, another man-vs.-machine competition is being staged, and this one will be hosted on the long-running American television game show “Jeopardy!” In a series of shows that will air Feb. 14-16, two of Jeopardy!’s most successful players will test their knowledge against a cluster of IBM Power 750 machines running IBM DeepQA software, dubbed “Watson.”

A group of us recently met with IBM’s marketing team to get more information about Watson and to discuss the technology behind it. They were quick to praise the efforts of the scientists at IBM Research, under the direction of Dave Ferrucci, as being the brains behind Watson. 

They wouldn’t confirm the number of machines that make up the cluster (saying only it was between one and 100 servers), but they told us that Watson runs IBM DeepQA software on Novell SUSE Linux Enterprise Server 11 that has been compiled for Power. The Power 750 servers, which have been configured with 32 cores and either 256 GB or 128 GB memory each, are connected together over a 10 Gb Ethernet network. Watson connects with 2 TB of clustered storage for a total of 4 TB.

I’m interested in solid-state disks, so I had to ask if Watson used SSD to speed up access to the data. I was told that it uses SAS drives and that disk performance isn’t an issue since, once booted, the entire application and data resides in main memory. Watson receives questions in text form at the same time that human contestants have the questions read to them. Watson physically presses the buzzer and uses a voice synthesizer to “speak” the answers. The machine isn’t connected to the Internet; it relies only on its memory for answers.

As you might imagine, IBM has built a significant Web presence to promote Watson and the DeepQA Project (see this introduction, this slideshow, these press releases and this Twitter feed). There’s also this background about Watson’s road to “Jeopardy!”:

“An IBM executive had proposed that Watson compete on ‘Jeopardy!’, but the suggestion was initially dismissed. While search engines such as Microsoft’s Bing and Google are able to provide search results based on search terms provided, no computer program had been able to answer anything other than the most straightforward of questions, such as ‘What is the capital of Russia?’ In competitions run by the United States government, Watson’s predecessors were able to answer no more than 70 percent of questions correctly and often took several minutes to come up with an answer. To compete successfully on ‘Jeopardy!’, Watson would need to come up with answers in no more than a few seconds, and the problems posed by the challenge of competing on the game show were initially deemed to be insurmountable.

“In initial tests run in 2006 by David Ferrucci, the senior manager of IBM’s Semantic Analysis and Integration department, Watson was given 500 clues from past ‘Jeopardy!’ programs. While the top real-life competitors buzzed in half the time and answered as much as 95 percent of questions correctly, Watson’s first pass could only get about 15 percent right. In 2007, the IBM team was given three to five years and a staff of 15 people to develop a solution to the problems posed. …

 “By 2008, the developers had advanced to the point where Watson could compete with low-level ‘Jeopardy!’ champions. That year, IBM contacted ‘Jeopardy!’ executive producer Harry Friedman about the possibility of having Watson compete as a contestant on the show. The show’s producers readily agreed. …”

 In addition, another American TV show, the acclaimed science series “NOVA,” will feature Watson in a Feb. 9 broadcast. The segment is entitled “The Smartest Machine on Earth.”

Finally, there’s this video, from which I’ll quote:

“We were mainly interested in using ‘Jeopardy!’ as a playing field upon which we could do some science. We wanted the ability to use questions that had not been designed for a computer to answer. ‘Jeopardy!’ really represents natural language. You have to understand the English language and all the nuances and all the regionalisms, slang, and the shorthand to play the game, to get the clues. It’s not just a piece of information.

“In 2009 the producers of ‘Jeopardy!’ watched Watson compete for the first time. Their concern was how do we keep it from becoming a stunt or a gimmick. This was different, this was the notion of knowledge acquired by a computer against knowledge acquired and displayed by the best Jeopardy! players. This could be something important, and we want to be a part of it. Many people are going to watch the ‘Jeopardy!’ show and look at Watson and how it competes in ‘Jeopardy!’ and the curiosity of the computer. They will focus on man versus machine, but the more interesting general challenge is, we are trying to produce a deep question and answering machine which will change the way people interact with computers and machines. We are going to revolutionize many many fields.”

 What do you think? Is this a gimmick? A ploy? Does a cluster of 750s beating humans at “Jeopardy!” make you more likely to purchase a Power Systems server? Does this mean we’ll soon be able to interact with computers the way they did on Star Trek? Hopefully there will still be a way to connect my Model M keyboard to these computers of the future.

#ibmwatson

Virtualization for the Right Reasons

Edit: Some more good discussion. The link does not seem to work.

Originally posted December 14, 2010 on AIXchange

In a recent AIXchange blog entry, I outlined the reasons why some customers have yet to get on board with virtualization. Along those lines comes AIX blogger Waldemar Mark Duszyk, who cautions against virtualizing just for the sake of virtualization.

Here’s Duszyk:

“I do believe that there is room for a VIOS, but not in each and every data center, and especially not because the admin from across the street just put one on line so we have to have it too! If you were the owner of a big, heavy truck capable of loads in excess of 100 tons, would you use it to carry a pillow across your state? You could have used a mail service instead, right? Or if you had 10,000 pillows to transport, you would make sure they are all compressed to fit as many as possible. The point I am making here is this: you would think about how to save.”

I remember watching a television show where it was argued that a diesel-powered school bus that gets six miles to the gallon can sometimes be preferable to an economy car getting 40 miles to the gallon. You may get great mileage taking the children to and from school, but it takes a lot of small cars to transport as many children as the big bus can move in one trip. As Mark says, if you have to transport goods, look for the most economical way to do it. The same mindset applies to computing. Don’t virtualize just because everyone else is; do it to save on floor space, power and cooling costs, and to consolidate workloads.

 Again from Mark:

“Do I think that virtualization is a bad idea? Nope again, except that it is still a very expensive proposition. First, before even thinking about virtualization, the surrounding IT environment must be comfortable with SAN boot, because without it, it will be very difficult if not impossible to fully utilize the processing capacity of the hardware one wants to virtualize. Why? How much will it cost you to buy just one CPU (including its activation costs) + RAM + physical I/O adapters for your planned VIO environment? Now, multiply this number by two if you want to have two VIO servers in the new managed system. The point to remember is this: For VIO to save you money you have to prove that over time you will at least be able to recover the costs associated with VIO implementation. It is already obvious that if you decided to follow the VIO crowd, in order to recover the costs of virtualization, you have to pack into your managed system as many partitions as possible. Welcome to the world of SAN boot! If your partitions cannot boot from SAN you have to provide them with local disks!”

I cannot agree more. We don’t want to use physical disks and physical adapters when we virtualize. We want to boot from SAN and run many LPARs on our frames, and then we can move workloads around by running Live Partition Mobility between our frames. 

Mark also touches on workload partitions (WPARs) as well as Nigel Griffiths’ idea about running workloads and applications inside of WPARs rather than the global AIX instance:

“Use GLOBAL [instances] solely for systems management. Don’t run workloads there, and don’t create any more users than are required. Create WPARs for each workload, and create the necessary users there. Since WPARs are inherently resource efficient, you don’t give up very much by dedicating GLOBAL [instances] to management only. The overhead is certainly much less than creating a separate LPAR for each workload.”
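If you want to experiment with Nigel’s approach, creating and starting a WPAR takes only a couple of commands. A minimal sketch with a made-up WPAR name; mkwpar has many options for filesystems, networks and resource controls that I’m omitting here:

# mkwpar -n appwpar
# startwpar appwpar
# clogin appwpar

From there you create your application users and install the workload inside appwpar, leaving the global instance clean for systems management.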

As I’ve said: Not everyone is virtualizing, and not everyone necessarily wants to virtualize. So what are your reasons for holding back?

Blinky: The Mouse that Roared

Edit: I still love my Model M. I still get freebies at conferences, but I no longer have small children at home that love to see what I brought them.

Originally posted December 7, 2010 on AIXchange

I’m a stickler when it comes to my computer keyboard. If I’m going to be stationed in any one place for an extended period of time, my keyboard is coming with me.

In the past I’ve waxed poetic about my Model M keyboard. With a PS/2 to USB converter, I’ve been able to continue using the same keyboard for so many years that I’ve lost track.

However, I’m far less passionate about my computer mice. I seem to cycle through different iterations without much fanfare or fuss. I certainly don’t miss the old style mouse with the ball inside; I was perfectly happy to join the ranks of the optical mouse users.

Recently, I got a free optical mouse. Well, it turned out it was only “almost” free, but I’m getting ahead of myself. I picked it up at a conference. Anyone who travels to these technical events knows all about the nice freebies that vendors hand out. Over the years I’ve taken home flying disks, foam footballs, Rubik’s Cubes, flashlights, pens and flash drives, along with plenty of other knickknacks I’ve long since lost or given away.

Anyway, this optical mouse was actually nice. Being just the right size for my tastes, I determined it would make a fine addition to my computer bag. I’m always swapping out the mouse that I take with me when I travel. Lately I’ve divided my time between a corded optical mouse and a wireless optical mouse, but since this freebie mouse came with a nice retractable USB cable, I thought I’d try it on my next trip.

So I plug it in, and I’m pleased. Really, it exceeded expectations. The sensitivity was great, it seemed very responsive and, like I said, the size was just right.

But that blinking.

The mouse blinked, and it wouldn’t stop. It even changed colors as it blinked.

Someone must have thought that a computer mouse that could cycle from blue to red and alternate between solid and blinking was a neat idea–and it was, for about three seconds. Then it became annoying, especially in any room with low light. If there was a simple way to stop the blinking, I couldn’t figure it out. But that blinking had to be stopped.

I figured I could just open up the mouse and … do something. I wasn’t sure what, though. So I asked around. I was told that applying black nail polish to the LED would keep the annoying light from escaping. Someone else told me that a piece of black electrical tape would do the trick.

Finally, someone told me to just get some wire cutters and remove the LED entirely. That seemed more my style.

Opening the mouse was fairly simple, especially since I wasn’t overly concerned with breaking my little freebie. So I went to work with the wire cutters and removed the LED.

It was at this point when I learned something, something I probably should have known beforehand. That little red light that you see on the bottom of your optical mouse? It comes from an LED.

“Able to work on almost any surface, the mouse has a small, red light-emitting diode (LED) that bounces light off that surface onto a complementary metal-oxide semiconductor (CMOS) sensor. The CMOS sensor sends each image to a digital signal processor (DSP) for analysis. The DSP, operating at 18 MIPS (million instructions per second), is able to detect patterns in the images and see how those patterns have moved since the previous image. Based on the change in patterns over a sequence of images, the DSP determines how far the mouse has moved and sends the corresponding coordinates to the computer. The computer moves the cursor on the screen based on the coordinates received from the mouse. This happens hundreds of times each second, making the cursor appear to move very smoothly.”

Turns out my little freebie had two LEDs: One made all those annoying lights blink; the other performed the critical task of making the optical mouse itself work. In my haste to solve the problem, I’d removed both LEDs. I’d killed my mouse.

Needless to say, I realized my mistake the moment I reassembled it. So I was off to the Radio Shack to drop $1.50 on a new LED that I could solder onto the circuit board. It works fine now, and that’s how I ended up with my free optical mouse that I only paid a little bit for.

I spend my work week expertly configuring, installing and supporting computers that can be worth millions, and yet I can’t be trusted with a device that some vendor paid maybe a couple of bucks to put their logo on. Go figure.

Those Who Do Without Virtualization

Edit: Most everyone virtualizes these days, although I still know of vendors that prefer you run one big LPAR per frame.

Originally posted November 30, 2010 on AIXchange

Working on virtualized systems as much as I do, and talking to people about virtualization as often as I do, I tend to forget a couple things:

  1. Not all IBM Power Systems users have virtualized systems.
  2. Not all of them use VIOS even while they benefit from other aspects of virtualizing their machines.

It isn’t necessarily that these shops are limited by the constraints of older hardware and operating systems. I know of customers with POWER6 and POWER7 hardware that haven’t yet virtualized their systems. Maybe they lack the time or the resources to virtualize more fully, or maybe they simply lack the skills that come only with hands-on experience.

Customers who aren’t hands-on generally don’t realize that virtualization covers a wide range of functionality. Using workload partitions (WPAR) counts as virtualization. Micropartitioning CPU, where we assign fractions of a CPU to an LPAR and then set up processing entitlements and cap or uncap partitions based on our LPAR’s requirements? That’s virtualization. We use VIOS to virtualize disk, the network or both. NPIV allows us to virtualize our fibre adapters and have our clients recognize the LUNs we provision–and it saves us the effort of having to map them to the VIOS and remap them to the VIOS client LPARs. We use the built-in LHEA to virtualize the network. We could create an LPAR with some dedicated physical adapters and some virtual adapters. We could use active memory sharing and active memory expansion to better utilize our systems’ memory. Power Systems offers many choices and scenarios where it can be said that we’re using virtualized machines.
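Micropartitioning in particular is easy to see in action. As a sketch from the HMC command line, with hypothetical managed system and partition names, adding two-tenths of a processing unit to a running LPAR looks like this:

> chhwres -r proc -m Server-8233-E8B-SN106AAAP -o a -p lpar1 --procunits 0.2

No reboot and no outage; the hypervisor simply adjusts the LPAR’s entitlement on the fly.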

I know some administrators who’ve been unable to convince their management or application vendors of virtualization’s benefits. I know of some IBM i users who are reluctant to get on board with VIOS (though plenty of AIX shops still don’t virtualize, either). Sometimes it’s the vendor that lacks the time, resources or skills for virtualization. For instance, I’ve seen multiple customer sites where tons of I/O drawers are used; the vendor won’t officially support VIOS because the vendor hasn’t tested it, and these customers don’t want to run an unsupported configuration.

I talked to an admin who has experience with configuring logical partitions and setting up dedicated CPUs and dedicated I/O slots in his environment, but he continues to use a dynamic logical partition (DLPAR) operation to move a physical DVD between his different LPARs. It’s the way he’s always done it. He figures that his shop’s lack of virtualization is no big deal, since he has no experience with VIOS and virtual optical media anyway. “You can’t miss what you’ve never had,” is how he put it.
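For what it’s worth, the virtual optical setup he’s missing amounts to a handful of VIOS commands. A sketch with hypothetical names (the vhost adapter and ISO file especially):

$ mkrep -sp rootvg -size 10G
$ mkvopt -name aixcd.iso -file /home/padmin/aixcd.iso -ro
$ mkvdev -fbo -vadapter vhost0
$ loadopt -vtd vtopt0 -disk aixcd.iso

mkrep creates the media repository, mkvopt adds the ISO to it, mkvdev creates a file-backed optical device on the client’s vhost adapter, and loadopt inserts the virtual CD; no DLPAR moves required.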

Others will tell me that they see the writing on the wall. They insist they’ll virtualize, some day.

Are there roadblocks keeping you from virtualizing? Are there complications that prevent you from moving to a fully virtualized environment? I’d like to hear about the challenges you face. Please e-mail me or post in Comments.

IBM’s Virtualization Alternative

Edit: Still some pretty good arguments in favor of PowerVM. Awareness is still an issue.

Originally posted November 23, 2010 on AIXchange

Did you know that when IBM publishes server benchmarks, these workloads always run on virtualized IBM Power Systems machines? The virtualization is built into the hardware and firmware; there is no concept of a non-virtualized, standalone Power machine anymore. Contrast that with other platforms, where simply running the vendor’s virtualization software can degrade performance by 30 percent.

The previous statements come from a recent IBM presentation. As you likely know, IBM has been at this virtualization game for a generation or so. The company developed the hypervisor that would become VM on the mainframe in 1967. In 1973, IBM was doing physical partitioning.

Here’s some more material I gleaned from this training session:

  • IBM Power Systems servers provide up to twice the performance of other virtualization solutions on other platforms. These numbers can be even greater depending on the level of virtualization you employ.
  • IBM Power Systems servers are scalable, both in terms of being capable of accommodating workload spikes and in allowing an enterprise to grow its business.
  • PowerVM technology gives you enterprise quality of service virtualization capabilities with higher performance, more scalability and enterprise security. You can have higher utilization of your machines–around 90 percent–which enables you to consolidate your workloads onto fewer physical servers. You can dynamically move from as little as 1/10th of a core to as many as 256 cores in your LPAR, using all of the resources of your server. You can make dynamic changes to resources like CPU, memory and I/O, and you can add and remove dedicated I/O adapters and storage devices, all without a reboot.
  • Live Partition Mobility allows you to easily move running workloads to other frames in your server environment. You can also use LPM to move workloads between POWER6 and POWER7 machines in your environment.
  • Using IBM Systems Director, VMs can be moved automatically to any physical machine in your environment, based on the criteria that you set up. If you have a busy workload on one machine, and more capacity available on another machine, Director can move that workload, without interruption and without human intervention, to the less busy machine.
  • IBM Power Systems servers are secure by design. No common vulnerability exposures (CVEs) have been reported against PowerVM virtualization by US CERT or by MITRE Corp. In contrast, more than 200 VMware-related vulnerabilities are listed in the U.S. government National Vulnerability Database (NVD). VMware is a third-party software add-on, while PowerVM is integrated into the server firmware. No PowerVM vulnerabilities are currently listed in the NVD. Compare PowerVM virtualization with VMware, for instance.
  • POWER7 servers offer LPM, live application mobility, partition availability priority, first failure data capture, processor instruction retry, alternate processor recovery, dynamic processor deallocation, dynamic processor sparing, extended error handling and I/O adapter isolation.

The presentation featured a detailed comparison of PowerVM and VMware, making IBM’s case that PowerVM virtualization runs workloads more efficiently than VMware, with far superior resource utilization, price/performance, resilience and availability. PowerVM technology outperforms VMware by up to 65 percent on Power 750, running the same Linux workloads and virtualized resources. See this comparison of PowerVM and VMware virtualization performance for more information. In addition, PowerVM on a Power 750 will scale better than VMware with linear scaling that maximizes resource utilization with 4X more virtual CPUs. And compared to a large-tier POWER7 model such as the Power 795, you can have 32X more virtual CPUs than VMware.

Assuming I have my facts right (I borrowed them from the presentation; please correct me in Comments if you disagree with the information), VMware ESX 3.5 allows for four virtual CPUs per VM, 64 GB per VM, 192 VMs on a server, 32 CPU threads on a server and 256 GB on a server. ESX 4.0 allows for eight virtual CPUs per VM, 255 GB per VM, 320 VMs on a server, 64 threads on a server, and 1,024 GB on a server. PowerVM allows for 256 virtual CPUs per VM, 8,192 GB memory per VM, 1,000 VMs on a server, 1,024 threads per server, and 8,192 GB on a server.

With PowerVM technology, you can utilize all CPU cores and all physical memory. Which would you prefer for your enterprise workloads?

Let’s look at flexibility once your VM is running. PowerVM virtualization allows you to make dynamic changes to virtual CPUs, memory and I/O devices, and have integrated LPAR and WPAR support with PowerVM.

None of this is possible with ESX 3.5. With ESX 4.0, you can add but not remove virtual CPUs, and add but not remove memory. You can make only some dynamic I/O device changes, and you get only limited direct access to I/O devices.

The same arguments can be made with Oracle VM Server for SPARC or HP Integrity VM 4.0. Oracle/Sun allows for Sun Logical Domains on UltraSPARC T1/T2 servers only: 32 partitions on a T1 or 128 on a T2. You can add or remove CPU, but only add virtual I/O. You can perform warm migrations with constraints. There’s no support for dedicated I/O. With HP you can have a maximum of eight CPUs and 64 GB of RAM per VM. To do dynamic logical partitioning you need to reboot your LPAR. There’s no support for dedicated I/O. There’s no dynamic CPU sharing.

I still find that some shops simply aren’t aware of all that IBM Power Systems servers and PowerVM technology have to offer, and all that they can do. These customers either aren’t yet virtualizing their systems, or they don’t see the limitations they’re under using other vendors’ solutions. Hopefully comparisons like these will cause them to take a close look at IBM’s alternative.

The Case for High Availability

Edit: Shawn still gives awesome presentations.

Originally posted November 15, 2010 on AIXchange

Recently I attended a session on the IBM PowerHA high-availability solutions. The point was made that, given the reliability and uptime of IBM Power servers, many customers wonder why they even need an HA solution.

IBM’s Shawn Bodily, our PowerHA presenter, described one of his typical customer interactions: First, another IBM representative will tell the customer about the hardware and the systems’ reliability, availability and serviceability (RAS) features. Then a second rep will discuss live partition mobility and how it seamlessly shifts logical partitions from one frame to another.

So after 20 to 30 minutes of hearing about how the hardware never fails, THEN Shawn must step in and explain why the customer should be concerned with high availability and disaster recovery. That’s one tough act to follow.

So why should you care about high availability and disaster recovery? I’m reminded of something I heard at another presentation, this one at an IBM Technical University conference: “What’s the most important thing in the data center?”

I can’t recall the name of the presenter who asked that question, but I definitely know the answer. The most important thing in the data center is the applications that run on the systems. These applications are the reason we buy the systems. Really, we don’t worry about systems going down; we worry about systems going down and losing access to the applications. Or maybe it takes a system failure before we realize just how critical a given application is to the organization. When users can no longer log in, when processing no longer occurs, when the cost of said failure soars by the minute: that’s what we worry about.

A Standish Group study from a few years ago estimated that only about 20 percent of outages are a result of hardware failure. And with today’s Power hardware, one can readily assume that that percentage has diminished even further.

So what else can go wrong? What about something like planned maintenance? Live partition mobility might help if your hardware alerts you to the need for a fix. Then you just move the workload off of the machine, perform the service and move the workload back on. But, as Shawn pointed out, what good is it to move your workload if you need to update the application or OS?

In those scenarios, we might look at multibos updates. Or we might look at using a product like PowerHA to fail our workload to a standby node. Yes, you’ll see an outage while the application is stopped and then restarted, but only a brief one.

The point is, things happen. Certainly we’ve seen our share of natural disasters in recent years. Or what about a simple power outage that knocks out the electricity and air conditioning? What about operator/user/human error? A mistake is made, files get deleted. Things do happen. These are the reasons you should care about high availability and disaster recovery. You may need it. At some point you may need to bring your systems up in another location.

Ask yourself the questions that Shawn asked us: How long can you afford to be without your systems? When your systems are recovered, how much data can you afford to lose? I don’t know any companies that really want to be without their systems for any length of time. I can’t imagine any that would view their data as expendable.

When it comes to high availability and disaster recovery, the time to think about it is now–not after you’re hit with something unexpected.

The Difference Between Busy and Productive

Edit: This is still good stuff.

Originally posted November 9, 2010 on AIXchange

Some time ago I read two articles that got me thinking about the same thing: the difference between being busy and getting things done.

How are you living your day-to-day life? Are you busy running around from one task to another without thinking about what you’re doing? Are you actively looking for ways to automate or eliminate tasks? Are you stressed out?

The author of this piece explains why she stopped working with “busy” people. “It took me a while to realize that there’s a big difference between someone who feels busy and someone who has a lot going on in their business. Busy, my friends, is a cop-out. It’s a euphemism for everything from ‘I’m frantic with deadlines’ to ‘I just don’t wanna’ to ‘I feel bamboozled as to what to do next so I’m checking Twitter obsessively to tell people I’m busy.'”

How are you at prioritizing, making lists and systematically attacking the items that need to be completed? Are you working on goals and are the things you do each day helping you to reach those goals?

I understand we all have things that we need to get done, and many of them need to get done NOW, but that doesn’t necessarily mean that we must become so stressed, so busy, that we lose perspective on what we’re actually trying to accomplish. We can’t get so busy handling service requests, for instance, that we lose sight of the reality that some tasks can be delegated to others.

I can hear you thinking: “But Rob, there have been cutbacks, and now I’m doing the job that two or three people did before.” This is all the more reason to seek assistance. Early in my career, senior staff members would rely on junior staff members to handle the routine requests, which helped to free up senior staff to create simple but highly useful tools like scripts and self-help portals. How much better is it to spend a bit of time setting up a tool that people can use to help themselves?

Always stop and ask yourself: Is this task really important for me to tackle? Can someone else do it? Can we teach someone to do it themselves? Can we give them better tools to help them do their job?

As is noted in this article on “the cult of the busy,”  “By appearing busy, people bother them less, and simultaneously believe they’re doing well at their job. It’s quite a trick. The person who gets a job done in one hour will seem less busy than the guy who can only do it in five. How busy a person seems is not necessarily indicative of the quality of their results. Someone who is better at something might very well seem less busy, because they are more effective. Results matter more than the time spent to achieve them. People who are always busy are time poor. They have a time shortage. They have time debt. They are either trying to do too much, or they aren’t doing what they’re doing very well. [They’re either ineffective with their time or they] don’t know what they’re trying to effect, so they scramble away at trying to optimize for everything, which leads to optimizing nothing.”

So are you busy? Are you effective? What do you plan to do about it?

The Delicate Art of VIOS Configuration

Edit: Setting up SEAs is easier now with built in control channels and HMC GUIs, but it is still something to be aware of. Some links no longer work, and I removed one that appears to be malicious.

Originally posted November 2, 2010 on AIXchange

What’s the quickest way to get to know your network team? Just bring down the entire network.

I actually know of people who have caused network outages by misconfiguring dual VIOS. However, this isn’t another of my scary stories–I just want to tell you how to avoid stirring up your own broadcast network storm.

Start with this example:

mkvdev -sea ent0 -vadapter ent2 -default ent2 -defaultid 2 -attr ha_mode=auto ctl_chan=ent1

When you run this command, make sure that each VIOS is set up to use the same control channel VLAN (ent1 in this case). If not, the two servers will be unable to communicate with one another. And if that happens, each will respond as if the other VIOS is down, and each will attempt to function as the primary server.
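The command on the second VIOS looks the same, assuming identical adapter numbering. What makes one server primary and the other backup is the trunk priority set on the virtual Ethernet adapter in each partition profile (priority 1 on the primary, 2 on the backup), not the mkvdev command itself. Once both SEAs are up, you can confirm which one is active; ent4 here is a hypothetical SEA device name:

$ entstat -all ent4 | grep -i priority
$ entstat -all ent4 | grep -i state

A healthy pair shows one SEA in PRIMARY state and the other in BACKUP.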

From IBM Support:

“A Shared Ethernet Adapter (SEA) can be used to connect a physical network to a virtual Ethernet network. It provides the ability for several client partitions to share one physical adapter. SEA can only be configured on the Virtual I/O Server (VIOS) and requires the POWER Hypervisor and Advanced POWER Virtualization feature. The SEA, hosted on the VIOS, acts as a Layer-2 bridge between the internal and external network.

“One SEA on one VIOS acts as the primary (active) adapter and the second SEA on the second VIOS acts as a backup (standby) adapter. Each SEA must have at least one virtual Ethernet adapter with the
‘Access external network’ flag (previously known as trunk flag) checked. This enables the SEA to provide bridging functionality between the two VIO servers.

“This adapter on both the SEAs has the same PVID, but will have a different priority value. A SEA in ha_mode (Failover mode) might have more than one trunk adapter, in which case all should have the same priority value. The priority value defines which of the two SEAs will be the primary and which will be the backup. The lower the priority value, the higher the priority — e.g. an adapter with priority 1 will have the highest priority. An additional virtual Ethernet adapter, which belongs to a unique VLAN on the system, is used to create the control channel between the SEAs, and must be specified in each SEA when configured in ha_mode. The purpose of this control channel is to communicate between the two SEA adapters to determine when a failover should take place.”

In other words: When setting up VIOS, you must set up a control channel so that the two servers can communicate with one another. You also need to establish one VIOS as the primary server and the other as the backup.

This document states the consequences of misconfiguring your SEAs:

“In this section, you will create the control channel virtual Ethernet adapters on VIOS1 and VIOS2, which will communicate on VLAN ID 12. It is very important to create this adapter on both VIOS partitions before creating SEA adapters to support failover for the same VLAN. Failing to have proper control channel configuration can result in causing a broadcast storm when both SEA adapters are activated on the same VLAN (VLAN ID 2 in this case).

“First you will create the control channel adapters on each VIOS partition. These control channel adapters are used to determine the health of the SEAs and are required to avoid a broadcast storm (which can result when two trunking virtual adapters are available on the same VLAN).”

In another part of this document, we read:

“Failing to have proper control channel configuration can result in causing a broadcast storm when both SEA adapters are activated on the same VLAN (VLAN ID 2 in this case).”

And again:

“When you run the mkvdev -sea command, it is very important that you specify the ha_mode and ctl_chan attributes. If you fail to do this, creation of the primary adapter on VIOS2 could result in a network broadcast storm.”

And again:

“STOP!!! Before you continue to the next step, ask a lab instructor to determine that you have the correct adapter configuration. Failure to properly configure an SEA failover scenario can result in a broadcast storm that can affect the entire lab network.”

A network guy I know recommends enabling BPDU Guard on our Cisco switches to try to address this issue. This website seems to agree with that assessment:

“As a precaution, you can enable Bridge Protocol Data Unit (BPDU) Guard on the switch ports connected to the physical adapters of the SEA. BPDU Guard detects looped Spanning Tree Protocol BPDU packets and shuts down the port. This helps prevent broadcast storms on the network.”

Maybe some networking gurus out there can let us know whether using BPDU Guard is advisable on our VIOS-connected ports.

Even those of us who routinely work with VIOS shouldn’t get cocky, because one wrong move can take out a network. So be careful. The stakes are high.

Scary Tales of IT

Edit: Surely there are more stories I could be told in the time since this was published.

Originally posted October 27, 2010 on AIXchange

Halloween’s coming up, and I’m looking for horror stories. No blood and gore, please–just tales from your life as an IT professional.

We all have these stories, things we’ve been through and things we’ve heard about. But even if your story comes from a friend of a friend, I’d still like to hear it. I feel all these experiences are instructive. They remind us to be on our toes around our machines.

For instance, a guy once told me about one of his coworkers replacing a disk in a rack-mounted server. An aluminum rod in the raised floor snapped, and the rack started to fall on him. Thankfully, the others working on the raised floor at the time were able to catch the rack before it crushed him.

I heard about another guy who stumbled entering another raised floor — and in the process he accidentally pressed the big red button that completely cut off power to the computer room. From what I was told, the IT folk did not have a particularly happy day recovering those machines. And I can tell you first-hand that when you enter this room now, you’ll find a large cover over that big red button.

I have many stories about dropped machines. I even know someone who took pictures. In that case, my friend said his customer unboxed a new 595 while it was still on the truck — ignoring his advice, by the way. They then wheeled the machine onto the semi’s liftgate, which was sloped slightly. I think you know where this is going. The 595 rolled down the slope, tumbled off the truck and landed upside-down on the ground. That story at least has a relatively happy ending; once they got the machine to the raised floor, it powered right up. But the cosmetic damage serves as a vivid reminder of what can happen if you uncrate the machine before taking it off the truck.

I’ve personally been in computer rooms where columns and posts blocked the ramp–providing just enough obstruction to make it impossible to wheel large computer equipment onto the raised floor. My back still aches thinking about the time I had to lug a 4-CEC 570 with multiple drawers of disks up a flight of stairs.

Finally, a friend e-mailed me this story: “In the late 80s our company had three System/38 machines, and we ended up buying a company out of Minnesota that had its own System/38. Of course that box had to be shipped to Phoenix after we bought the company.

“The System/38 was about the size and weight of your typical Fiat 500. While it did have wheels and could be rolled off the truck and into our building with relative ease, we could only fit it onto the elevator to our third-floor data center by standing it up on its end.

“There we were, around 11 p.m. one night, taking advantage of the extra people in the data center at shift change. We gently slid it out of the elevator and into the lobby. With eight people standing around the 1,100-pound behemoth, surely it would be no problem to gently set it back down on its wheels and push it the last 50 feet to its new home.

“But as we nudged it back towards its normal horizontal position, it became apparent that not everyone understood exactly how heavy this thing was, and, it kind of got away from us. We were close, but with about 24 inches left to go, some of us lost our grip (or maybe our nerve, thinking about crushed toes and fingers). The machine slammed into the floor, shooting the remains of one wheel across the lobby.

“It turns out a System/38 can be rolled on three wheels, if you really want it to. So we managed to get it into the data center and give it a trailer-park touch, leveling it with a piece of 2×4 we scrounged up in the parking garage.

“We figured we’d better find out as soon as possible how badly the system was damaged, but upon plugging it in and powering up, it came up normally. We never had any problems, other than the access doors that never quite closed completely after that night.”

I just wish I’d been there to see that. No, on second thought, I’ve done enough heavy lifting as is.

These are a few of my horror stories. Now let’s hear yours. Surely you’ve seen something memorable in your career (and hopefully the statute of limitations has expired by now). Share your tale by sending me an e-mail or making a post in the Comments section.

The Tweet Life

Edit: Twitter is still a thing. Some links no longer work.

Originally posted October 18, 2010 on AIXchange

I’ve said it before, but Twitter offers a lot of value to IT professionals. I’m finding more and more useful information and links from the people I follow.

In fact, just recently, I came across all of this information in a single morning:

First, gmon. It can now play back files.

“gmon allows you to graphically monitor several AIX 5.3TL5+, AIX6, Linux LPARs and/or [VIOS] running on POWER5, POWER6 or POWER7 servers — from a PC or laptop running Windows. gmon has a very high refresh rate [1-4 seconds] and is best used as a demonstration tool or as an educational tool to help you learn and ‘see’ how POWER virtualization works in action ‘real time.’ gmon also now has the ability to playback nmon files — up to 8 nmon files can be played back at once. This new version supports both an interactive monitoring mode (using a small agent installed on AIX, [VIOS] or Linux) and a nmon file(s) playback mode.”

Next, the ent line in vmstat and what it means, from the IBM developerWorks forums.

“In the documentation it says ent is only used if running shared processors, but it doesn’t say what it actually is telling you. I know what pc and ec mean in the stats, but what is it telling me here with ent and a link to any documentation explaining it further would be appreciated. …

“You are correct ent=NNN.N in the top line is the entitled CPU capacity of the logical partition (LPAR) and it is only shown if this is a shared CPU LPAR. This number is the guaranteed CPU time available to the LPAR. If the LPAR is uncapped, you can use more than this number (if available). If capped, then it’s the maximum. Nothing can stop the LPAR getting this much CPU time.”
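To make that concrete, here's roughly what the vmstat header looks like on a shared-processor LPAR (the values are invented for illustration):

System configuration: lcpu=4 mem=8192MB ent=0.50

With ent=0.50 the partition is guaranteed half of a physical processor. In each interval row, pc shows physical processors consumed and ec shows the percentage of that entitlement used.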

Another topic: IBM Systems Director plugins. Here’s an overview with a download link. I counted 11 different plugins.

Speaking of IBM Systems Director, there’s this tutorial on how to discover systems that use a mirrored (or cloned) image.

“Systems that are cloned (or use a mirrored image) and managed by IBM Systems Director must be correctly configured to ensure their successful discovery. To discover cloned systems, they must be
configured in the following ways:  All cloned systems must have a unique identifier (UID). Each cloned Common-Agent managed system must have a Tivoli globally unique identifier (GUID). Any cloned system that uses Secure Shell (SSH) must have a unique Secure Shell (SSH) host key.”
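If you clone an LPAR, one way to give the copy fresh host keys is a quick regeneration. This is a sketch using the standard OpenSSH paths (adjust to your installation):

rm /etc/ssh/ssh_host_*key*                               # discard the cloned keys
ssh-keygen -t rsa -f /etc/ssh/ssh_host_rsa_key -N ""     # generate a fresh host key
stopsrc -s sshd; startsrc -s sshd                        # restart sshd to pick up the change

Repeat the ssh-keygen step for any other key types your sshd_config references; the stopsrc/startsrc pair restarts sshd so the new keys take effect.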

Finally, I found another good article by Anthony English. Here, Anthony discusses useful commands that help you locate free disks and logical volumes on the VIOS that are available to be mapped to client LPARs.

“There are three commands:

lspv -free lets you see which disks are not mapped to a vscsi device

lslv -free shows the logical volumes which aren’t mapped

lspv -size shows all disks with their sizes in megabytes.”
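On a VIOS, the output of the first command looks something like this (the disk name and size are illustrative):

NAME            PVID                                SIZE(megabytes)
hdisk4          none                                51200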

Again, this was just one morning of Twitter-watching for me.

Finding what you’re looking for on Twitter can be as simple as going to twitter.com and searching on a term like AIX, but plenty of applications are also available to help you navigate this terrain. As noted, I like tweetdeck; with it I’ve set up columns that constantly search for tweets containing AIX or #AIX. Of course by doing this, I’ll occasionally be exposed to other things with the letters a-i-x, like this city in France.

Sounds like my kind of town.

VIOS Updates

Edit: There have been a few updates since I first posted this.

Originally posted October 12, 2010 on AIXchange

I first heard about an updated version of the virtual I/O server (VIOS) during a recent IBM conference call. Now it’s official.

We already use VIOS for sharing disks and networks, active memory sharing and live partition mobility. With these just-announced enhancements, we’ll be able to suspend and resume workloads, do more with virtual networks and take advantage of thin storage provisioning and storage pool sharing capabilities.

Here are some announcement highlights, starting with a new feature called suspend/resume.

As I learned in the conference call, suspend/resume is the process of “freezing” an LPAR and saving the complete system state to disk. Then you can restart the workload exactly where it left off, without data loss. The entire LPAR system state is stored in a set of files and can be resumed on either the same server or a different system after migration. After suspension, the server resources are freed up for use by other workloads.

As you can imagine, this feature can make hardware maintenance much easier, because it allows system administrators to perform system updates or CEC upgrades without the need to shut down and restart applications, and without the need to engage application teams to verify that everything is running properly after the restart.

Where live partition mobility allows us to shift resources between physical machines while applications are still running, with suspend/resume, we’ll be able to move workloads to another machine (though obviously with an interruption of services). We’ll also be able to temporarily suspend low-priority or long-running workloads to allow more urgent processes to access server resources.

For debugging or forensics purposes, IBM states that a workload can be temporarily suspended and a copy made for offline analysis for security or performance purposes. I can’t wait to test out this intriguing feature.

More about suspend/resume from the IBM announcement letter:

“Using Suspend/Resume, clients can provide long-term suspension (greater than 5-10 seconds) of partitions, saving partition state (memory, NVRAM and VSP state) on persistent storage, freeing server resources that were in use by that partition, restoring partition state to server resources, and resuming operation of that partition and its applications either on the same server or on a different server.

“Requirements for Suspend/Resume: All resources must be virtualized prior to suspending a partition. If the partition is to be resumed on a different server, then the shared external I/O (disk and LAN) should remain identical. Suspend/Resume works with AIX and Linux workloads when managed by HMC.”

Here’s what the announcement letter says regarding shared storage pools, VIOS grouping and thin provisioning:

“VIOS 2.2 allows the creation of storage pools that can be accessed by VIOS partitions deployed across multiple Power Systems servers so that an assigned allocation of storage capacity can be efficiently managed and shared. … Multiple VIOS 2.2 partitions can utilize a common shared storage pool to more efficiently utilize limited storage resources and simplify the management and integration of storage subsystems.”

During the conference call the presenters mentioned that this would eliminate the need for vscsi devices or NPIV, but I’ll need to do some hands-on testing to understand the functionality better.

“VIOS 2.2 supports highly efficient storage provisioning, whereby virtualized workloads in VMs can have storage resources from a shared storage pool dynamically added or released as required.”

It sounds like the thin provisioning that we’re used to managing on our storage subsystems can now be managed from our VIO servers. I look forward to testing it out.

“When a new VM is created, the amount of physical storage used is less than the amount defined for the virtual workload, resulting in optimal storage utilization across the shared storage pool. Additional
storage is delivered dynamically when workloads expand and released when workloads contract.  This automates optimized storage utilization, has a more cost-efficient use of storage resources and integrates multiple storage subsystems.”
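For orientation, the shared storage pool commands that shipped with VIOS 2.2 look roughly like this sketch. The cluster, pool and LU names are invented, and exact flags vary between fix packs, so check the documentation for your level:

cluster -create -clustername demo_cl -repopvs hdisk2 -spname demo_sp -sppvs hdisk3 -hostname vios1
mkbdsp -clustername demo_cl -sp demo_sp 20G -bd lu01 -vadapter vhost0

The second command carves a thin-provisioned 20 GB logical unit out of the pool and maps it to the client on vhost0, with no per-LUN vscsi disk mapping on the VIOS.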

The last thing they touched on in the training was the enhancements to the virtual networking.

“The virtualized network switch functionality within the VIOS will include support for SNMP, networking QoS, dynamic VLAN and MAC access control lists (ACLs). There will be more sophisticated controls for monitoring and tuning network traffic between virtualized workloads. There will be control over networking QoS (quality of service) rules for specific LPARs and you can fine-tune the performance of network-sensitive workloads. There will be support for MAC based access ACLs to allow administrators to impose higher levels of protection for specific workloads.”

According to the announcement letter, VIOS 2.2 is set for availability on Oct. 15.

New SSD Modules Offer Greater Efficiency

Edit: I cannot remember the last time I did not run SSD in my laptops. Some links no longer work.

Originally posted October 5, 2010 on AIXchange

I’ve been meaning to touch on one other aspect of the recent Power Systems announcements — that being the new solid-state drive (SSD) disk modules.

The new SSD modules are about the same size as a thick credit card. And, according to slides I’ve seen, compared to the 69GB SSD, the new modules give you a better per GB cost, more dense physical packaging and 50 percent less energy and heat per drive, with comparable performance.

As I’ve noted, filemon (filemon -O hot -A -x “sleep 20” -r fmon -o fmon.out) can help us identify which filesystems and physical and logical volumes should be moved to SSD drives and determine the proper physical locations for these relocated files.
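If you haven't tried it, the workflow is just a short trace followed by reading the report. Here's a sketch (the fmon file names are arbitrary):

filemon -O hot -A -x "sleep 20" -r fmon -o fmon.out
more fmon.out

The hot report breaks activity out by files, logical volumes and physical volumes, which maps directly onto the question of what belongs on SSD.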

If you’re just starting to gather information about SSD technology, IBM offers some good introductory material. In particular check out the SSD vs. hard-disk drives comparison.

From IBM:

“Also known as Flash technology, solid-state drive technology eliminates the rotational delay of a spinning platter and of waiting for an arm to move to the correct position. Thus, data is available nearly immediately. Dramatically reducing crippling I/O bottlenecks, an SSD provides 33X to 125X more I/O Operations Per Second (IOPS) than a HDD and works at speeds much closer to those of memory, bridging the HDD performance gap. SSDs are also more efficient than HDD. While SSD operates close to 100 percent capacity, HDD is often limited to 20-50 percent storage capacity in an effort to improve responsiveness.”

There’s also a comparison of internal storage solutions and storage area networks (SANs) as well as a rundown of older SSD disk options, including SAS-connected 69 GB drives and the new double-wide PCIe card which can house up to four 177GB SSD eMLC disk modules.

Again, from IBM:

“eMLC technology stands for “Enterprise Multi-Level Cell” Flash memory technology. IBM is the first server vendor to provide this new SSD technology option which blends enterprise class performance and reliability characteristics with the more cost effective characteristics of MLC Flash storage.”

Here’s one more quote from IBM that I agree with wholeheartedly: “Remember, it’s not a question of if solid-state drives will be part of your computer center, but rather, when.”

Finally, revisit this post, which references Nigel Griffiths’ SSD demonstration video.

Technical University a Training Highlight

Edit: Some links no longer work.

Originally posted September 28, 2010 on AIXchange

They say what happens in Vegas stays in Vegas, but that isn’t always the case. For instance, the IBM Power Systems Technical University 2010 is set for Oct. 18-22 in Las Vegas. And if you attend this conference, odds are you’ll bring back a wealth of new knowledge about AIX, Linux and IBM i systems.

As I’ve noted numerous times (here, here and here), the information available at Technical University conferences is invaluable. I consider this the IBM technical training highlight of the year. These events are well worth planning and budgeting for.

Here’s an overview from IBM:

“This university is an intense, consolidated way for attendees to learn how to reduce operating costs, simplify the IT environment, access current and upcoming solution providers and leverage the newest technology innovation — virtualization with the IBM POWER7 technology.

“IBM Power Systems Technical University will offer hundreds of sessions on extensive topics, multiple training levels (beginner to advanced), best practices, solution center/expo and certification testing. Attendees will hear details behind the latest POWER7 announcements offering improved capabilities: New workload optimizing technologies like IBM TurboCore, IBM Active Memory Expansion and Active Memory Sharing to reduce memory costs, IBM PowerVM and VMControl virtualization software to support up to 1,000 virtual machines, Intelligent energy optimization features, such as IBM POWER7 EnergyScale.

“The more than 300 sessions will feature such topics as:

•    What’s new in AIX 7.1 and IBM i 7.1.
•    Deep-dive sessions covering all 2010 POWER7 announcements.
•    NDA future Trends and Directions sessions for Power, AIX and IBM i.
•    Active Memory Sharing (AMS) and Active Memory Expansion (AME). Taking virtualization to the next level.
•    Best practices for POWER Systems including VIOS, I/O and firmware/microcode currency.
•    Tuning SAP and Oracle in AIX Environments.
•    Fibre Channel over Ethernet (FCoE) and converged network adapters (CNAs) for Power Systems servers.
•    Understanding the Processor Virtualization for POWER6 and POWER7.
•    Migrating from IBM i V5R4 to i6.1/7.1.
•    VIO Server Best Practices and Enhancements.
•    Leveraging Live Partition Mobility to move to POWER7.
•    Technical details of the new high-end POWER7 Systems.
•    How to migrate to POWER7 Hardware.
•    Designing your NIM Environments with Reliability.
•    What’s new in PowerHA SystemMirror.
•    Performance, Capacity Planning Enhancements.”

If you can make it happen, I encourage you to attend next month’s Technical University conference. And, for those readers outside the United States, here’s a worldwide events calendar that includes an IBM Technical University conference in France.

Getting Started With AIX 7.1

Edit: Some links no longer work.

Originally posted September 20, 2010 on AIXchange

Well, I was wrong. After arguing in two posts (here and here) that getting physical media from IBM is preferable to downloading AIX images, I am now among the converted. Sort of.

What happened? AIX 7.1 happened. When it was released on Sept. 10, I just had to get my hands on it. That meant I had to download it. Don’t worry; I also ordered my physical media. But I did download a copy so I could work with it right away.

Glad as I was to get started with the newest AIX version, the experience reminded me why I’m generally reluctant to download these files. I couldn’t unzip one file,
AIX_7.1_Base_Operating_System_TL_7100-00-00_DVD_1_of_2_092010.iso.ZIP.

Instead, I received this error message:

End-of-central-directory signature not found. Either this file is not a zipfile, or it constitutes one disk of a multi-part archive. In the latter case the central directory and zipfile comment will be found on the last disk(s) of this archive.
unzip:  cannot find zipfile directory in one of
AIX_7.1_Base_Operating_System_TL_7100-00-00_DVD_1_of_2_092010.iso.ZIP
or
AIX_7.1_Base_Operating_System_TL_7100-00-00_DVD_1_of_2_092010.iso.ZIP.zip,
and cannot find
AIX_7.1_Base_Operating_System_TL_7100-00-00_DVD_1_of_2_092010.iso.ZIP.ZIP, period.

Since I was able to unzip the AIX_7.1_Base_Operating_System_TL_7100-00-00_DVD_2_of_2_092010.iso.ZIP (disk 2) file with no trouble, I thought maybe the download didn’t complete the first time. So I tried downloading the whole thing again, with the same results. Interestingly though, once I moved the file to an LPAR running RedHat Linux on the same frame, it unzipped just fine.
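One explanation I've since run across is that AIX's native unzip can choke on very large (zip64-format) archives. I can't say for certain that's what bit me, but a workaround worth trying before re-downloading, assuming Java is installed, is to let the jar utility extract the archive instead:

jar -xvf AIX_7.1_Base_Operating_System_TL_7100-00-00_DVD_1_of_2_092010.iso.ZIP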

After completing the download and decompressing the files, I moved the .iso images over to my virtual media repository, booted an LPAR from it and loaded AIX 7.1. I was able to select the edition of AIX I wanted to install, and I was able to navigate through menus and pick different software install options, including my preferred browser and the server packages I wanted to install. It looked like a normal AIX installation.

     1  System Settings:
         Method of Installation…………. New and Complete Overwrite     
         Disk Where You Want to Install…..hdisk0

    2  Primary Language Environment Settings (AFTER Install):
         Cultural Convention…………….English (United States)
         Language ……………………..English (United States)
         Keyboard ……………………..English (United States)
         Keyboard Type………………….Default
    3  Security Model…………………..Default                  
    4  More Options  (Software install options)
    5  Select Edition…………………..express

Install Options

 1.  Graphics Software………………………………………… Yes
 2.  System Management Client Software………………………….. Yes
 3.  Enable System Backups to install any system…………………. Yes
     (Installs all devices)
4.    Import User Volume Groups…………………………………. Yes

Install More Software

 1. Firefox (Firefox CD)………………………………………. No
 2. Kerberos_5 (Expansion Pack)………………………………… No
 3. Server (Volume 2)…………………………………………. No

Nevertheless, I have a couple of nits to pick: First, why is the Manage Editions menu option so high up on the SMIT main menu?

  Software Installation and Maintenance
  Software License Management
  Manage Editions
  Devices
  System Storage Management (Physical & Logical Storage)
  Security & Users
  Communications Applications and Services
  Workload Partition Administration
  Print Spooling
  Advanced Accounting
  Problem Determination
  Performance & Resource Scheduling
  System Environments
  Processes & Subsystems
  Applications
  Installation Assistant
  Electronic Service Agent
  Using SMIT (information only)

Even computer geeks build muscle memory, and we’ve become used to the devices or system storage manager options being only a couple of down arrow keystrokes away. Now we must unlearn years and years of keyboarding. It’s the new options — especially those that probably won’t change often — that should be buried further down the list.

So AIX 7 was loaded, and rebooted. Now we come to my second issue: Having to watch this message on my console as I waited for it to allow me to log in to the system for the first time:

    This is the first time starting Director Agent. Please wait several minutes for the initial setup…

    Stopping The LWI Nonstop Profile…
    Waiting for The LWI Nonstop Profile to exit…
    Waiting for The LWI Nonstop Profile to exit…
    Waiting for The LWI Nonstop Profile to exit…
    Waiting for The LWI Nonstop Profile to exit…
    Waiting for The LWI Nonstop Profile to exit…
    Waiting for The LWI Nonstop Profile to exit…
    Waiting for The LWI Nonstop Profile to exit…
    Waiting for The LWI Nonstop Profile to exit…
    Waiting for The LWI Nonstop Profile to exit…
    Waiting for The LWI Nonstop Profile to exit…
    Waiting for The LWI Nonstop Profile to exit…

It’d be nice if I could start the System Director Agent when I wanted to, or at least have the option of running it in the background while I install systems.

And of course, ssh wasn’t installed by default, but as always it was easy enough to install it after the fact.

Another thing to keep in mind: if you plan on working with versioned WPARs (where AIX 5.2 can run in a WPAR), those vwpar filesets aren’t loaded by default. Be sure you’ve ordered them separately.

Along with the unveiling of AIX 7.1, new VIOS code was released. Read about Anthony English’s experiences with VIOS. (And once you’ve done so, check out Anthony’s comments about migrating the NIM server to AIX 7.1.)

Finally, please share your experiences with AIX 7 in Comments. While I realize that many of you aren’t yet considering migration, I am curious to know if AIX 7 is in your near-term plans.

AIX and Linux

Edit: I still love AIX. The link to the article no longer works.

Originally posted September 14, 2010 on AIXchange

I’ve been exchanging numerous e-mails regarding this article that’s been making the rounds on Twitter. The premise? Linux is now on a par with AIX.

My response? First, note the source: CIO Weekly. Now, it’s fine that C-level executives are seeing the value of Linux. But to suggest that Linux has achieved parity with AIX? I have a hard time believing that actual AIX and Linux administrators would go that far.

The article quotes Jean Staten Healy, IBM’s director of worldwide Linux strategy.

“From Healy’s perspective, Linux is meeting the needs of many CIOs today. She noted that total cost of ownership is a focus for CIOs, but there are other pressures which Linux can help relieve. She noted that virtualization and server consolidation as well as management simplification are key CIO goals in 2010.”

And the money quote?

“‘Linux is on parity with AIX,’ Healy told InternetNews.com in response to a question about how IBM is positioning AIX against Linux. ‘Linux enables choice. I think that’s one of the basic tenants of the faith.'”

Chris Gibson wrote a great article that articulates many of the points I made in the e-mails. Among other things, Chris discusses smit, LVM, mksysb, NIM, multibos, nimadm, concurrent updates, alt_disk_install, savevg, installp, WPARs and IBM support.

To me, Linux has one huge advantage over AIX — its ease of entry. Obtaining a copy of Linux and getting up and running in a test lab is simple. About all you need is an old x86 machine or the capability to create a virtual machine to host one. Then you can play around with the systems at work or home and get comfortable.

On the other hand, if management doesn’t approve a sandbox system for the administrators to learn the ins and outs of AIX, it makes things that much more difficult. It’s great to attend classes or read IBM Redbooks, but these are no substitutes for hands-on work with an operating system and hardware.

So many, many more people have used Linux than IBM Power Systems. And that matters when a company’s UNIX team is asked which operating system they should deploy. They’re unlikely to say AIX if they’ve never previously worked with it (or even worse, if their only AIX experience came many years ago on 3.x or 4.x).

But if you’ve used AIX like I have, you can rattle off its many attributes. I like having my logical volume manager running by default. I like making changes to the system, running cfgmgr and having my new hardware and new LUNs show up automagically. I like making dynamic changes to my running system with no need for a reboot. I like that IBM owns the hardware and the operating system, and that its support team will fix the system when I report a problem. I like having enterprise class hardware and an enterprise class operating system to run my enterprise. I even like AIX’s security through obscurity advantage. How many script kiddies are attacking my AIX machines as opposed to my Linux machines?

Maybe my problem with the article simply stems from the word choice. “Parity” is defined as “the quality or state of being equal or equivalent.” That doesn’t seem accurate to me — even if you believe Linux is as valuable as AIX, they’re very different operating systems. They aren’t the same. But I imagine the perspective does change if you look at this through the eyes of a CIO.

Ask this administrator though, and I’ll always maintain that, unless or until Linux gets the same capabilities, AIX is the superior operating system.

People, Not Resources

Edit: This is still good stuff.

Originally posted September 7, 2010 on AIXchange

Do these statements sound familiar?

“We need to see if we can find a resource for this project.”

“Our storage resource is busy, but our network resource is available.”

“We need to find another resource.”

I hear things like this all the time. And while I recognize that most even moderately sized companies have human resources departments, I really don’t like it when the word “resource” is applied to people.

I am not a resource. I’m a person who happens to have some unique technical skills that might be utilized to help other people get something done. But I’m not a resource. I’m not a machine. I have a name.

I like this Wikipedia entry, especially the last sentence:

“The term human resources can be defined as the skills, energies, talents, abilities and knowledge that are used for the production of goods or the rendering of services. In a project management context, human resources are those employees responsible for executing the activities defined in the project plan. Human resources are considered to be the most important resource in any project.”

That’s my point: The people providing “human resources” are not disposable. They’re not expendable. They’re critical to the organization, and they should be treated that way.

This article outlines 10 ways employers can keep employees happy, including offering flexible work options, communicating openly, recognizing success, explaining the big picture, building trust and, above all, giving employees respect. It made me happy just to read it.

In this world of downsizing and budget cutting, it’s worth remembering that it’s better and less expensive to retain a current employee than it is to recruit a new employee. Over time your people become more skilled, they know how to get things done internally and they know the customers. In IT specifically, they know the systems and have experienced the server issues.

When I was writing this post, I searched on “I am not a resource,” and found, among other things, this.

So I’m not the only person who feels this way. Far from it.

You may think I’m being overly sensitive about employers describing employees as resources. And if you think I am, let me know in Comments. My response, again, comes from Wikipedia:  “A resource is any physical or virtual entity of limited availability that needs to be consumed to obtain a benefit from it.”

Hopefully we don’t need to be consumed to be beneficial.

Plane Talk About Serving Customers

Edit: I have been known to stand up and stretch sooner than I used to. These days I take pictures of receipts instead of using a scanner. Some of the links no longer work.

Originally posted August 31, 2010 on AIXchange

I’m old enough to recall when airline travel was a stand-up comedy staple. Why do comedians talk so much about flying? I imagine the biggest reason is that they’re on the road so much, it’s familiar territory for them.

And, as a consultant, I can relate. In my travels to and from customer sites, I spend considerable time moving through airports, sitting on tarmacs and waiting at car rental counters. And even though I’m no Seinfeld, I have some observations of my own.

For instance, why do airline passengers insist on standing up immediately after a flight? They just jam the aisles and slow down the deplaning. Why can’t they stay seated, and then leave the plane row by row? I understand the need to stretch — we’ve all been sitting for a long time. But give it five more minutes. Once you’re off the plane you can stretch all you want.

Anyone who’s ever flown can probably rattle off a half dozen annoying things about air travel. But at least the part about tracking business expenses and getting reimbursed has become easier for me. That’s because I found a moderately priced portable scanner. Instead of waiting till I get home to scan all my receipts, now I do it the moment I buy something on the road. Really, keeping track of receipts has become a breeze.

Flying on a regular basis, one tends to develop strong preferences regarding particular airlines, hotel chains and car rental agencies. A single experience can turn you into a loyal customer — or a former customer. For me, when it comes to hotels and car rentals, flexibility is the key. I may need to cancel a reservation at the last minute, so I need to know that I can make that happen.

The whole flying experience reminds me once again how critical a role employee attitudes play in business. And that’s something we should all keep in mind. A friend displays in his cubicle a list of the 11 commandments of good customer service. You can easily replace the word “customer” with “user” or whoever it is you work for:

1. Customers are the most important part in any business.
2. Customers are not dependent upon us, we are dependent upon them.
3. Customers are not an interruption of our work, they are the purpose of it.
4. Customers do us a favor when they call, we are not doing them a favor by serving them.
5. Customers are not cold statistics, they are flesh and blood human beings with feelings and emotions like our own.
6. Customers are part of our business, not outsiders.
7. Customers are not there to argue or match wits with.
8. Customers are people who bring us their wants; it is our job to fill those wants.
9. Customers are deserving of the most courteous and attentive treatment we can give them.
10. Customers are the people that make it possible to pay your salary, whatever your role might be in the company.
11. Customers are the life-blood of this and every other business.

On an unrelated note, here are a couple of useful links. First, a presentation on POWER7 blades. It’s a 30MB file, so be patient when downloading.

The whole site, which I’d written about previously, is worthy of your investigation. There’s a lot of good training material here.

Hot Spares and Other Tips and Tricks

Edit: Some links no longer work.

Originally posted August 24, 2010 on AIXchange

I love getting tips and tricks, and hopefully you love it when I share them. For instance, recently while perusing a mailing list, I learned of a simpler way to look up IBM employee contact information from a smartphone. At least for me, this just seems to render better on my phone than the more familiar URL (whois.ibm.com) that many of us have bookmarked on our browsers.

Thanks to the same mailing list, I was reminded of something else: You can still create a physical hot spare disk in a volume group. This capability has been available through the AIX logical volume manager since AIX 5.1, but of course, with the advent of SANs and shared storage, we’re far less reliant on internal and direct-attached disks these days. But even though we don’t need hot spares the way we once did, it’s good to know that this option remains available.

In the days of SSA drawers I used hot spares all the time. Knowing the hot spare would immediately take over when a disk failed was, to put it mildly, reassuring. Then I’d just place a quick service call, and the CE would come replace my old disk or IBM would ship me a disk and I’d replace it myself.

Here is a detailed definition of hot spares.

I’ll also highlight these steps for enabling hot spare support. Although this document uses websm rather than smit, the concepts are still the same. To select your volume group, go to smit lvm > Volume Groups > Set Characteristics of a Volume Group > Change a Volume Group. Then in the smit panel, change Set Hotspare Characteristics to y.

“Beginning with AIX 5.1, you can designate hot spare disks for a volume group to ensure the availability of your system if a disk or disks start to fail. Hot spare disk concepts and policies are described in AIX 5L Version 5.2 System Management Concepts: Operating System and Devices. The following procedures to enable hot spare disk support depend on whether you are designating hot spare disks to use with an existing volume group or enabling support while creating a new volume group.

Enable Hot Spare Disk Support for an Existing Volume Group
The following steps use Web-based System Manager to enable hot spare disk support for an existing volume group.
1.    Start Web-based System Manager (if not already running) by typing wsm on the command line.
2.    Select the Volumes container.
3.    Select the Volume Groups container.
4.    Select the name of your target volume group, and choose Properties from the Selected menu.
5.    Select the Hot Spare Disk Support tab and check beside Enable hot spare disk support.
6.    Select the Physical Volumes tab to add available physical volumes to the Volume Group as hot spare disks.

“At this point, your mirrored volume group has one or more disks designated as spares. If your system detects a failing disk, depending on the options you selected, the data on the failing disk can be migrated to a spare disk without interruption to use or availability.

Enable Hot Spare Disk Support while Creating a New Volume Group
The following steps use Web-based System Manager to enable hot spare disk support while you are creating a new volume group.
1.    Start Web-based System Manager (if not already running) by typing wsm on the command line.
2.    Select the Volumes container.
3.    Select the Volume Groups container.
4.    From the Volumes menu, select New > Volume Group (Advanced Method). The subsequent panels let you choose physical volumes and their sizes, enable hot spare disk support, select unused physical volumes to assign as hot spares, then set the migration characteristics for your hot spare disk or your hot spare disk pool.

“At this point, your system recognizes a new mirrored volume group with one or more disks designated as spares. If your system detects a failing disk, depending on the options you selected, the data on the failing disk can be migrated to a spare disk without interruption to use or availability.”
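If you'd rather skip the GUI, the same setup takes two commands. This is a minimal sketch with example names; the spare must already belong to the volume group and contain no logical volumes:

chpv -hy hdisk4        # designate hdisk4 as a hot spare
chvg -hy -sy datavg    # enable hot spare support and automatic resync for datavg

The chpv command designates hdisk4 as a hot spare, and chvg enables hot spare support (-hy) and automatic synchronization of stale partitions (-sy) for datavg.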

Here’s an update to last week’s blog post about the new POWER7 servers. Here are the supported operating systems: AIX 7.1, AIX 6.1 TL6, AIX V5.3 TL12 SP1 or later, IBM i 7.1, IBM i 6.1 with 6.1.1 MC or later, VIOS 2.2 or later and HMC V7R720 or later.

While this will be true at GA, on 9/30 we should see support for AIX V5.3 TL10 SP5 or later and AIX V5.3 TL11 SP5 or later.

The New POWER7 Servers

Edit: How many of you still have machines you need to upgrade to AIX 7?

Originally posted August 16, 2010 on AIXchange

Following on previous releases of POWER7 servers (the 750, 770 and 780 models) and blades, IBM today announced five new POWER7 servers: the 710, 720, 730, 740 and 795 models. The 710 and 730 are 2U servers; the 720 and 740 are 4U servers. The 795 is the high-end replacement for the Model 595.

Here’s a quick overview of the new servers, all of which come with a standard three-year warranty:

Power 710: This 2U single-socket server comes with four, six or eight cores. It can have a maximum of 64 GB of memory with four low-profile PCIe slots. It runs on 100-240 VAC power.

Power 720: This 4U single-socket server comes with 4, 6 or 8 cores. It can have a maximum of 128 GB of memory with four PCIe cards plus four low-profile PCIe cards. It also runs on 100-240 VAC power.

Power 730: This 2U 2-socket server comes with 8, 12 or 16 cores. It can have a maximum of 128 GB of memory with four low-profile PCIe cards. It runs on 200-240 VAC power.

Power 740: This 4U 2-socket server comes with 4, 8, 12 or 16 cores. It can have a maximum of 256 GB of memory with four PCIe cards plus four low-profile PCIe cards. It also runs on 200-240 VAC power.

For a comparison, the 750 is a 4U 4-socket server with 6, 8, 12, 16, 18, 24 or 32 cores, with up to 512 GB of memory and three PCIe cards and two PCI-X cards.

Power 795: This machine can have 24 to 256 cores running at 3.7, 4.0 or 4.25 GHz. Like the 780, the 795 supports TurboCore mode, where half of the cores in a socket are turned off to allow the remaining “enabled” cores to use the shared cache. While TurboCore mode can be deactivated via the ASMI, remember that the entire system is either in TurboCore or MaxCore mode — you can’t mix and match.

These machines can have 8 TB of DDR3 memory when using 32GB DIMMs, with an aggregate memory bandwidth of 4TB per second.

Here are the supported operating systems: AIX 7.1, AIX 6.1 TL6, AIX V5.3 TL12 SP1 or later, IBM i 7.1, IBM i 6.1 with 6.1.1 MC or later, VIOS 2.2 or later and HMC V7R720 or later.

While this will be true at GA, on 9/30 we should see support for AIX V5.3 TL10 SP5 or later and AIX V5.3 TL11 SP5 or later.

According to IBM, customers who upgrade from a 64-core 5 GHz POWER6 595 to a 64-core 4.25 GHz POWER7 795 can obtain 40 percent greater performance while using 35 percent less energy.

I also found this interesting statement in the IBM materials I received:

“rPerf (Relative Performance) is an estimate of commercial processing performance relative to other IBM UNIX systems. rPerf reflects a single image AIX/Linux workload and is derived from an IBM analytical model which uses characteristics from IBM internal workloads such as TPC, SPEC and other benchmarks. Most Power 795 systems will be used to consolidate multiple workloads leveraging multiple PowerVM partitions of various sizes. Starting with the introduction of the Power 795, a new rPerf estimate will be added that represents multiple partitions of smaller sizes. Single image rPerf estimates will continue to be provided up to a maximum of 64 cores.”

I think this reflects the reality that most of us carve our servers into multiple LPARs rather than run a giant 256-core 8TB single image of AIX on a 795. (Although, I must admit, it would be fun to be the admin on that one.)

Another thing IBM notes is that a 64-core 795 would use 61 percent less power than a 64-core 595.

Finally, I saw how mirrored hypervisor memory will be available to add built-in redundancy:

“(Mirrored hypervisor memory) eliminates system outages due to uncorrectable errors in memory by maintaining two identical copies of the system hypervisor in memory at all times. Both copies are simultaneously updated with any changes, and in the event of a memory failure on the primary copy, the secondary copy will be automatically invoked and a notification sent to IBM via the Electronic Service Agent (ESA).”

In addition to the new hardware, IBM also officially unveiled AIX 7. Here are some key points from that announcement, some of which have been covered previously. (See my earlier AIX 7 post, with accompanying links to Nigel Griffiths and Ken Milberg, here.)

AIX 7 will allow vertical scalability for massive workloads with up to 256 cores/1,024 threads in a single AIX partition. AIX 7 will run AIX 5.2 in a WPAR to simplify consolidation of legacy environments on POWER7. I already know of customers who are excited about taking their old applications that are bound to AIX 5.2 and upgrading them onto POWER7/AIX7 WPARs.
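Based on the announcement material, the mechanics should be roughly this: take a mksysb of the old AIX 5.2 system and hand it to mkwpar. Here's a sketch, assuming the vwpar filesets are installed (the WPAR name and backup path are mine):

mkwpar -n legacy52 -C -B /backups/aix52.mksysb    # create a versioned WPAR from an AIX 5.2 mksysb
startwpar legacy52

If it works the way the announcement describes, the 5.2 environment comes up inside the WPAR without touching the original server.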

AIX 7 will have built in clustering to simplify configuration and management of scale-out workloads and high availability solutions. Its profile-based configuration management will ease the management of pools of AIX systems.

AIX 7 is binary compatible with AIX 6 and AIX 5. Current applications will continue to run; there is no need to recompile applications to work with AIX 7. AIX 7 fully exploits POWER7 processor-based systems, but can also run on systems based on POWER4, POWER5 or POWER6 processors.

Customers can upgrade directly to AIX 7 from AIX 6 and AIX V5; it’s a free upgrade for customers with Software Maintenance Agreements (SWMA).

AIX will have solid-state disk (SSD)-only volume groups, and there are enhancements to the filemon tool to help identify good SSD candidates. This will help you determine which filesystems to put on your more expensive SSD drives.

AIX is available in three different editions:

AIX Standard Edition: Suitable for most UNIX workloads, with vertical scalability up to 256 cores using AIX 7 (or 64 cores using AIX 6).

AIX Enterprise Edition: Simply, this consists of AIX plus enterprise management features. This edition includes AIX Standard Edition plus Systems Director Enterprise Edition and the Workload Partitions Manager for AIX. Vertical scalability up to 256 cores using AIX 7 (64 cores using AIX 6).

AIX Express Edition: This lower priced edition is targeted toward customers with low-end servers or who are looking to consolidate smaller workloads on larger servers. This edition includes most of the functionality of AIX Standard Edition, but vertical scalability is limited to 4 cores and 8GB of memory per core in a single partition. Customers can use multiple AIX Express Edition partitions in a single larger server.

Keep in mind that customers can run any combination of AIX Standard, Express and Enterprise edition on the same server — for example, you could use AIX Standard for a big database instance and AIX Express for 4-core application server instances.

Take the time to look at the updated facts and features documents. This will allow you to determine which POWER7 servers make the most sense in your environment.  Also start thinking about when you should upgrade to AIX 7.

Readers Respond

Edit: Some links no longer work.

Originally posted August 10, 2010 on AIXchange

Recently I questioned why so many people choose to download .iso images rather than order a set from IBM. Some of you were kind enough to offer your thoughts.

Being able to download these images from IBM is nothing new (although the capability to download one DVD image as opposed to multiple CD images is a welcome new twist). Back when I first wrote about this, we weren’t yet able to take advantage of virtual optical media. It makes me laugh to go back and read about the gyrations I once went through to use these disk images for anything other than a source file that I would then need to burn to physical media. The method I described in that post didn’t even allow me to boot from the images to load the OS; the images could only be used to load the AIX code into a NIM server using the bffcreate command.

In the article I mentioned that I had to download the CDs; then I noted:

“On Linux, I can simply run:
    mount -o loop -t iso9660 filename.iso /mnt/iso

“This mounts my CD image on my filesystem. On AIX, mounting an .iso image is a little more involved. First I created my logical volume, in this case:
    /usr/sbin/mklv -y’testlv’ datavg 6

“Then I ran the dd command in order to write the contents of the .iso file to my logical volume:
    dd if=/aixcd1.iso of=/dev/testlv bs=10M

“Then I mounted my .iso image with:
    mount -v cdrfs -o ro /dev/testlv /mnt/iso

“At this point the CD was mounted, and I could run smitty bffcreate.”

Of course these days, with virtual optical media, .iso images can be copied into a virtual media library and loaded and unloaded without the need to create logical volumes and run dd commands.
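On the VIOS, that whole workflow is a handful of commands. Here's a sketch with invented names, assuming a file-backed virtual optical device (vtopt0) has already been created with mkvdev -fbo and mapped to the client:

mkrep -sp rootvg -size 20G                             # create the media repository (one time)
mkvopt -name aix71_dvd1 -file /home/padmin/dvd1.iso    # copy the .iso into the library
loadopt -disk aix71_dvd1 -vtd vtopt0                   # "insert" the virtual DVD
unloadopt -vtd vtopt0                                  # "eject" it when done

The mkrep command creates the media repository (a one-time step), mkvopt copies the .iso into the library, and loadopt/unloadopt act as the virtual insert and eject.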

Another thing that simplifies this process now is the addition of the loopmount command in AIX. Anthony English explains:

     “You can now mount ISO images directly onto an AIX LPAR using the loopmount command. This was introduced in AIX 6.1 TL 4 (use oslevel -s to check your current level). The man page for loopmount provides this example:
    loopmount -i cdrom.iso -o “-V cdrfs -o ro” -m /mnt

So, with all this said, I can certainly understand why people choose to download the .iso images, and that for some in fact it may be their only option. As one reader told me: “I prefer the downloads. It seems easier to me to mount an .iso via virtual means. Though we keep hard copies at the DC just for the reasons you mention.”

Not everyone disagreed with me though. Here’s another comment: “Call me old school also, but I too like for IBM to send me the base media, comes in real handy for booting to maintenance mode for an outage recovery. I can always download/burn .iso (images), but if a server is down, every minute counts….”

Here’s a final comment I liked: “I wish I could download .iso images, because I have more often access to a high bandwidth Internet connection than to a physical media drive. (I work on some systems that are a several miles away from me.)” Without physical access to a machine, an .iso image in a virtualized machine is certainly the way to go. But even in that situation, I’d still want a copy of the media from IBM as well.

Yes, I understand that we can boot from .iso images as if an actual DVD was loaded in a virtual drive.  We can also burn our own copies from the images if we want to. With physical media, we can still load our system even if the environment lacks a NIM server or we don’t have a VIO server running on the server in question. Each method has its pros and cons. As in any case, we need to know what tools we have at our disposal, and then use the most appropriate one for the task at hand.

On an unrelated note, the Central Region Virtual users group hosted another great session — this one covers NIM. Check out the replays (here and here) and download the materials (here and here).

IBM Gets Rolling with Loaner Hardware

Edit: The links no longer work. I guess these days we would just try out workloads in the cloud.

Originally posted August 3, 2010 on AIXchange

Are you a current IBM customer who’s planning on upgrading to POWER6 or POWER7, but would like to try out the machines before buying them? Or maybe you use other operating systems, but want to evaluate IBM hardware running AIX? Or maybe you’ve been reading about the latest virtualization techniques, but don’t have current hardware to test them on?

If you face any of these scenarios, help may be available. Of course, your business partner may have access to machines that you could run some test workloads on. You may be able to work with an IBM Innovation Center to test the hardware.

Or, you could look into the POWER on Wheels program. From IBM:

“Power on Wheels is a revolutionary addition to the Power Loaner Program designed to help quickly determine if Power is right for your server consolidation efforts by providing your client with direct,
hands on access to the newest Power technology in a simple to use package that requires little to no previous AIX or Power skills. Power on Wheels is delivered to a client location in a self contained shipping box. When the box arrives, the client wheels it onto their floor, opens the doors, plugs the box into an electrical outlet and within minutes, the client starts stepping through the graphical user interface to power up and starts running the demo software application.”

Power on Wheels is a POWER7 technology-based server and software demo combination that can be used to demonstrate virtualization, CPU sharing, multiple operating system support, server consolidation, power savings and more. Participants receive a loaner plug-and-play shipping box, which IBM ships to the customer location for three weeks. The solution also features several pre-packaged software solution demos, but customers can add their own applications to test the hardware as well.

Once the shipping container arrives, you would need to provide power and (if desired) network connectivity. Once it’s plugged in on the raised floor, you’d fire up the physical machines and start running the LPARs, monitoring, applications, etc.

To get nominated for the program, contact your IBM Field Technical Sales Specialist (FTSS) or IBM Business Partner. Currently Power on Wheels is available only in North America, but availability is expected soon for European Union members, and a worldwide rollout is being planned.

Right now, IBM is building the fleet of machines that will serve this program. Currently, there are six shipping boxes, two POWER7 and four POWER6 systems. Another six systems are expected to be deployed, and IBM anticipates mid-August availability for the Power on Wheels V2 stack. And around that time, the POWER6 systems should be upgraded to POWER7.

For more on the Power on Wheels program, check out this text and these videos.

AIX7 Open Beta: First impressions

Edit: Some links no longer work, I edited a few of them.

Originally posted July 27, 2010 on AIXchange

Wait no longer to get your hands on the latest AIX code: The AIX7 open beta is up and running.

From IBM:

“IBM today announced an open beta program for AIX 7, the company’s open standards-based UNIX operating system. AIX 7 builds on the capabilities of previous releases of AIX and can fully exploit the performance and energy management capabilities of the new POWER7 servers that began shipping to customers earlier this year.

“AIX 7 provides full binary compatibility for programs created on earlier versions of AIX, including AIX 6, AIX 5, and 32-bit programs created on even earlier versions of AIX. This means that clients can protect previous investments in Power Systems by moving existing applications up to AIX 7 without having to recompile them. Full information on AIX binary compatibility is available here.

“Many clients running prior generations of POWER hardware would like to consolidate on newer, more efficient POWER7 servers, but simply do not have the administrative resources to upgrade a large number of servers. AIX 7 introduces new technology to help simplify consolidation of these older workloads onto new systems. Clients can back up an existing AIX 5.2 environment and restore it inside of a Workload Partition on AIX 7, which can allow them to quickly take advantage of the advances in POWER technology.

“AIX 7 and Power Systems hardware provides support for very large workloads with up to 256 cores/1024 threads in a single AIX logical partition – four times greater than that of AIX 6.”

I installed the open beta when it came out, but I’ve only begun playing with the code. Rather than use physical media, I loaded it on my client LPARs with a virtual optical device. It was a straightforward download, and simple to install and get running.

The new Korn shell is one thing that caught my eye. I cannot count the number of new AIX customers who moan and groan about ksh lacking the tab completion and up arrow/down arrow access to their shell history that they enjoy on their Linux or other UNIX systems. These folks will be pleased to know that they can now access some of the same functionality as the bash shell by running /usr/bin/ksh93. The newly updated ksh93 version is also available on AIX V6.1 with the 6100-06 Technology Level.
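If you want to try it, start the new shell and turn on command-line editing; a minimal sketch (put the set -o line in your .profile or $ENV file to make it stick):

/usr/bin/ksh93
set -o emacs       # Ctrl-P and Ctrl-N now walk the history; ESC ESC completes filenames

If you prefer vi-style editing, use set -o vi instead: press ESC, then k and j to walk the history, and ESC \ to complete a filename.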

Given my recent comments, I was also happy to see ssh running by default after the installation. We’ll see if the same is true of the actual AIX 7 release once we get the actual release media.

In addition, some new menus appear in smit. Here are a few:

* Administer VERSIONED Workload Partitions (located under Workload Partition Administration).
* Live Partition Mobility with Host Ethernet Adapter (HEA) (located under Applications on the main menu).
* AIX Runtime Expert (located under System Environments).
* Change System User Interface (located under System Environments).
* System Cryptographic Operational Mode (located under System Environments).
* Managed Editions (located on the main menu).

Finally, I saw information about using NIM in the Getting Started section. Specifically, this information provides directions on using a NIM server that’s at AIX 6.1 TL5 or higher with the AIX 7 open beta. I’ve yet to test this, but I will soon.

From the documentation:

“A separate download is required for network install support. The NIM image is in tar format and available from the AIX Open Beta website. The image name should be similar to 710_NIM.tar.

“The tar image contains the following:
•    710_1026A_SPOT/ — This is a complete 710 spot environment and can be used for network installing the AIX 7 Open Beta mksysb.
•    710_1026A_SPOT.chrp.64.ent — 710 network boot image.
•    inst.images/ — This install directory contains the bos.sysmgt package and can be used for installing the latest NIM support for AIX 7 open beta.”
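Presumably, getting a 6.1 TL5 master ready amounts to unpacking the tar file and installing the updated NIM support from the inst.images directory. Here's a sketch I haven't run yet (the paths are assumptions):

tar -xvf 710_NIM.tar
installp -agXd ./inst.images bos.sysmgt     # install the latest NIM support for the AIX 7 open beta

After that, the SPOT directory and the open beta mksysb would be defined as NIM resources in the usual way.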

These are things that immediately struck me. No doubt there’s much more to discuss — I know tons of people are talking about it on Twitter. So get in on the buzz — tell me in Comments about your favorite parts of the operating system.

For more on the open beta, check out Ken Milberg’s article in the July IBM Systems Magazine e-newsletter. Chris Gibson, Anthony English and Nigel Griffiths also offer their observations.

The Lines Blur Between Prod and Test

Edit: The links to the webinar resolve but are old and do not seem to work. The first link still lists the speakers at the time of this writing.

Originally posted July 19, 2010 on AIXchange

Recently I was helping a customer implement an IBM PowerHA cluster. We were on the whiteboard going over various failover scenarios. There were going to be two physical servers in the environment, and this question came up: “Are we planning to have one frame be the ‘production’ frame and the other be the ‘test/QA’ frame?”

Not that long ago, implementing a test machine alongside a “prod” machine was a given. Hardware simply wasn’t as reliable back then. So, to protect themselves from hardware failure, companies would install a hot standby backup along with their production machine — just in case. Since that backup box typically sat idle, many companies opted to run test workloads on it. At least this way, that second machine was doing something worthwhile.

However, with the advent of Live Partition Mobility and PowerHA — and with more Reliability, Availability and Serviceability (RAS) built into newer hardware — it’s more or less assumed that machines will stay up. And somewhere between then and now, the distinction between prod and test has started to blur.

Almost three years ago I saw my first Live Partition Mobility demo, and I immediately went from skeptic to true believer.

But even now, I find many customers can’t quite believe what they’re seeing. For instance, a few weeks back I was demonstrating how to move a busy LPAR from one frame to another. The customer had the same skepticism I had back at the beginning: Will it work? Will I drop packets? Is this smoke and mirrors and magic? Yes, it works. No smoke, no mirrors — and no dropped packets.

Because you can quickly and easily move workloads around your environment, you’re freed from the entire concept of “this frame is production” and “that frame is test.” You can concentrate on properly mixing workloads across the environment based on need and available resources. You can create uncapped partitions with proper values for the weights of your partitions. If the machine has free cycles, you can allocate them on a very granular level. If one machine becomes constrained, you can easily shift your workload to another frame that can better handle the load.
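If you haven't played with uncapped weights, note that the weight is just a partition profile attribute, so you can manage it from the HMC command line as well as the GUI. Here's a sketch with a made-up managed system, profile and partition name; the new weight takes effect the next time the partition is activated with that profile:

chsyscfg -r prof -m Server-750 -i "name=default_profile,lpar_name=prodlpar,uncap_weight=192"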

When my customer and I were discussing PowerHA and whether they wanted the capability to fail over multiple LPARs, a comment was made and a light bulb went on in the minds of those present. What if you set things up the “old way,” your production frame dies for some reason, and you need to fail over your prod workload? Should the whole environment fail over at once, or would it be preferable to have half of prod fail over while the other half keeps on processing? After all, in a mixed environment with production LPARs running on different physical machines, losing a frame means failing over only a subset of the environment as opposed to the whole thing.

CPU micro-partitioning, PowerVM server virtualization, Live Partition Mobility and PowerHA are all game changers. When we plan for these technologies, we must also rethink the way our systems are implemented. Though it’s tempting to still think in terms of standalone systems, alternatives are now possible. Rather than separate prod from test, we may find that mixing production with test on the same frame might make perfect sense.

Note: IBM is hosting a pair of webcasts on future trends relating to Power Systems. Register here and here.

Send Me Your Scripts

Edit: The awk link no longer works. The open beta links no longer work.

Originally posted July 13, 2010 on AIXchange

I wish someone would set up a wiki site that would serve as a repository of people’s favorite administrative ksh scripts. I mean, we all have tools and scripts and nifty .profile setups, so why don’t we have a better mechanism for sharing them? Is there really that much intellectual capital that goes into setting up prompts and getting hostnames to show up at the top of an xterm? Are the handy aliases and scripts that you use daily really a matter of national security?

For instance, I stumbled across this post on enumerating columns for awk.

While I don’t have a column command, and the formatting isn’t right for an alias to work in ksh, this is the sort of thing I’m talking about — people sharing tips, tricks, scripts and whatnot this way.
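To show just how little national security is at stake, here's the sort of thing a typical .profile boils down to; a minimal ksh sketch (season the prompt to taste):

HOST=$(hostname)
PS1='${HOST}:${PWD} $ '            # ksh expands PS1 each time it prints the prompt
printf "\033]0;${HOST}\007"        # xterm escape sequence puts the hostname in the title bar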

Each shop has its own script for taking a mksysb. Everyone uses their own crontab entries to manage their systems. More mature and seasoned administrators have more mature scripts. Over time their flaws and bugs have been cleaned up, and the scripts have been enhanced. Wouldn’t it be nice if we could all share in this knowledge?

Actually, if you send your scripts to me, I’ll happily post them. I’ve posted scripts in the past. Just provide the original version and note the modifications and improvements that have been made along the way.

I know. Some will argue that Company X spent the time and money to develop these scripts. It’s proprietary information that shouldn’t be handed to strangers. Still, I have to believe that users of a proprietary operating system have room to share information and knowledge. If the open-source crowd can make this sort of thing work, why can’t we?

We already share plenty of freely available information. For instance, the valuable technical documentation in IBM Redbooks is there for the taking. And more and more blogs and forums are sprouting up where people can get information and ask for help. While we’re at it, shouldn’t we be able to find some quality tools as well?

Maybe someone smarter than me can figure out a way to fill this void.

Speaking of voids, this one, at least, has been filled. I know I’m not the only one who’s been anxiously waiting for the AIX 7 open beta. Well, the wait is over. The website went live today. Here’s the open beta; here’s the open beta forum.

Let me know what you think of it.

WPARs and Other AIX 7 Highlights

Edit: Did you ever do much with WPARs? The links to Nigel’s articles no longer work. The AIX 7 IBM article no longer works.

Originally posted July 6, 2010 on AIXchange

So far, adoption of WPARs has been slow. Customers like the workload isolation and resource flexibility they get with LPARs, so they’re less interested in the WPAR story. In fact I often hear customers tell me they’re happy to get away from the whole WPAR/container concept and get into micropartitioning and LPARs on Power Systems.

IBMer Nigel Griffiths gives some food for thought as to why we should get on board with WPARs. I’ll summarize his arguments:

1. LPARs take only minutes to create, but creating WPARs takes just seconds.
2. An LPAR requires 512 MB to 1 GB of memory to boot AIX. With a WPAR, you need less than 60 MB (yes, I said megabytes).
3. You can install application code — say, 1 GB — in each and every LPAR (40 LPARs = 40 GB), or you can share just one read-only copy for all WPARs (40 WPARs = 1 GB). This not only requires less maintenance, it saves disk space and memory. (If the application is loaded in the Global AIX then there’s only one copy in RAM.)
4. Maintaining one Global AIX via the syncwpar command is much easier than updating, say, 40 copies of AIX.
5. Application mobility is much simpler to organize than LPM.
6. The Global AIX administrator can see and change all WPAR filesystems — e.g., adding a tool to /usr/local/bin can be done by simply issuing the cp command.
7. Rapid cloning is easy and allows you to use “disposable” images that can be created, tinkered with and readily discarded.
8. If you mess up a WPAR, you can fix it via the Global AIX. If you mess up an LPAR, your system may not boot!
9. Backups are much easier and smaller than an LPAR mksysb. A default WPAR backup file is around 75 MB. Of course it’s more if you have applications plus data, but you still don’t need 2 GB of data as you do with an LPAR backup.

I have left out a few of his points — Nigel’s original entry has 12 items — so be sure to check out his full list.
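If you've never touched WPARs, the basic lifecycle behind points 1 and 7 really is just a handful of commands; a sketch with a made-up WPAR name and default settings:

mkwpar -n demowpar        # create a system WPAR; seconds, not minutes
startwpar demowpar        # boot it
clogin demowpar           # log in from the Global AIX
stopwpar demowpar         # shut it down
rmwpar demowpar           # discard the disposable image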

Nigel has another article that covers a couple of new AIX 7 features, one of which is the capability to run AIX 5.2 within workload partitions:

“We all know running AIX 5.2 is pretty dumb (as it’s not under normal support), but it happens. For some reason the code can’t get updated, the ‘if it ain’t broken don’t fix it’ rule sticks for so long that it becomes a nightmare to update, or it’s just not worth the manpower to upgrade a small application. But this also tends to mean it’s on hardware that is costly to maintain, with a large footprint given its lowly computing power, energy hungry, and little or no virtualization. So picking up that AIX image (mksysb) and putting it in an AIX Workload Partition is such a cool idea. It is then running on a much faster POWER7 machine, lower maintenance, sharing resources in virtualization, less energy use and freeing up computer room floor space. It’s win-win-win-win — and you then get AIX 5.2 support (OK, somewhat limited support, as it’s been functionally stable for many years).”

For more general looks at IBM AIX 7, here’s Ken Milberg’s introduction that appears in the IBM Systems Magazine June 2010 issue. And here’s an official AIX 7 preview from IBM.

Finally, here’s my brief preview that I wrote in April.

What have you been reading about regarding AIX 7? Please share your links by posting in Comments.

The Evolution of Education

Edit: The link no longer works.

Originally posted June 29, 2010 on AIXchange

As more companies migrate to IBM Power Systems hardware and the AIX operating system, the need for education grows. It may be hard for us longtime users to imagine, but every day, seasoned pros are just getting started on POWER hardware and AIX.

While I’ve provided customer training, what I do–either through giving lectures on current topics or talking to people informally as their systems get built–doesn’t compare to the educational value of a “traditional” instructor-led class or lab.

With that in mind, check into the IBM Power Systems Test Drive, a series of no-charge remote (read: online) instructor-led classes.

Courses being offered include:

IBM DB2 WebQuery for IBM i (AT91)
IBM PowerHA SystemMirror for IBM AIX (AT92)
IBM PowerHA and Availability Resiliency without Downtime for IBM i (AT93)
Virtualization on IBM Power (AT94)
IBM Systems Director 6.1 for Power Systems (AT95)
IBM i on IBM Power Systems (AT96)
IBM AIX on IBM Power Systems (AT97)

Remote training, of course, saves IT pros and their employers the time and expense of having to travel to an educational opportunity. But is something lost if students, instructor and equipment aren’t in the same room? Not necessarily. Let’s face it: Nowadays a lot of education is remote anyway–when you travel to classes and conferences and do lab exercises, you’re likely logging into machines that are located offsite. By now good bandwidth is the norm, so network capacity shouldn’t be an issue when it comes to training.

Sure, offsite training has its advantages. When you travel somewhere for a class, there are fewer distractions, so you can concentrate on the training. Taking training remotely from your office desk, it’s easy to be sidetracked by your day-to-day responsibilities. (This does cut both ways though–I often see people connect to their employer and work on their laptops during offsite training.)

Offsite training also allows you to meet and network with your peers. I still keep in touch with folks I’ve met at training sessions. If I run into a problem with a machine I’m working on, I have any number of people I can contact for help. Being able to tap into that knowledge with just a call or a text message is invaluable.

While I haven’t taken a remote instructor-led class like the ones IBM offers, I’ve heard positive feedback from those who have. But what about you? I encourage you to post your thoughts on training and education in comments.

Looking Back, Looking Ahead, Staying Put

Edit: The link to the comparison chart no longer works. The links to the datasheets no longer work. How much further have we come since this was written?

Originally posted June 22, 2010 on AIXchange

Sometimes I’ll look at the raw computing power that sits on my desk and think back to the IBM XT systems I used years ago. Like a lot of folks in our industry, I go back a ways. I recently found some old floppy disks, and in the pile were some installation disks for some old programs I remember using under DOS on IBM-compatible machines many years ago. I was around when everyone used Lotus 1-2-3 to create spreadsheets, and WordPerfect for word processing. I was messing around prior to that, when my 300 baud acoustic coupler and my phone line would connect me to bulletin board systems (BBS) where I could communicate with others. I can easily recall the days of VisiCalc on the Apple II computer.

Going back even further, I wrote my first school papers using Stylograph on the OS-9 operating system. I remember being astounded by WYSIWYG and the capability to fully justify my text. It’s safe to say that my teachers were impressed by this cutting-edge technology as well. Back then, most students still wrote their papers by hand.

Technologies and applications come and go, and they’ll continue to do so. Microsoft Office didn’t always dominate user workstations, and I’m sure something will eventually emerge to replace it.

Right now there are free alternatives like OpenOffice, StarOffice, and Google docs (see this chart for highlights). I’ve tried many of these solutions–for instance, I use Gmail quite a bit, and when attachments come my way I often view them using Google docs.

I’ve also played around with IBM’s Lotus Symphony. While I’ve been happy with the results, I’ve yet to really give it a workout.

I guess, at the end of the day, I’m OK with Microsoft Office. Likewise, I keep coming back to Windows. Even as fond as I am of VMware workstation (which allows me to run a copy of Windows inside another operating system) and vncviewer (which allows me to view a remote Windows desktop hosted by another machine), I can’t seem to make the full-time switch to a Linux desktop. I’ve tried, but the need for some new app or utility seems to keep me from moving. Frankly, it’s just easier using what everyone else uses.

Another thing that’s easy (at least in my case) is looking back. Even during this spring’s POWER7 announcements, I got a little nostalgic thinking of the earlier iterations of AS/400 and RS/6000 hardware I once administered. It’s not that I want to go back to using those systems–and I sure wouldn’t trade my laptop and today’s software for a 386 running Wordperfect. It’s just fun to think about how far we’ve come–and how much further we’ll go.

I mean, what kind of computers will I be using in another 20 years? I’m sure I’ll be fine with future technology–provided I can still attach my Model M keyboard.

Speaking of POWER7 hardware, if you’re looking for a quick introduction to the new systems, check out these data sheets for the Model 750, Model 770 and Model 780.

Some Questions for You

Edit: The link no longer works. I do not think anyone uses physical media anymore. With virtual optical and USB flash drives, I cannot remember the last time I had CDs or DVDs, although I still have old media in that format that I should get rid of.

Originally posted June 14, 2010 on AIXchange

While I always try to be available to answer your questions, this week I have some questions for you. First, why isn’t ssh installed and running by default when I load AIX? Honestly, I’ve been asking myself this for years.

Admittedly, my complaint is trivial, since getting ssh running on a newly installed server is quick and easy — especially now that the openssh file sets are included with AIX install media. And in my case, since I typically build up a gold image before deploying it in an environment, all of my images will have it loaded as well.

But it still bugs me.

I mean, ssh is installed by default when I load Linux, so why isn’t it installed for AIX?

On the flip side, why are telnet and FTP enabled by default on AIX? Again, I know it’s not a big deal to go edit /etc/inetd.conf and comment out these unwanted services and restart the daemon. But it just seems like these insecure daemons shouldn’t be running at all. It’s great that they’re included, but to me, a freshly installed AIX server should have ssh enabled and telnet and FTP disabled by default.
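For newer admins, the whole job is two steps, comment and refresh:

vi /etc/inetd.conf        # put a # in front of the telnet and ftp entries
refresh -s inetd          # make inetd re-read its configuration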

One more question on another topic: I keep seeing retweets on Twitter about downloading installation .iso images from IBM. But why would I want to do this?

Even years ago, when IBM offered AIX installation CD images for download, I still preferred to have IBM send me a set of CDs or DVDs. And nothing’s changed. Yes, I can download the .iso images, but I’d rather IBM send me a set.

“But Rob,” you say, “you love virtual optical devices!” Yes, I most certainly do. However, if I have the physical media, I can always run mkvopt in my VIO server and create .iso images.

I can hear you again: “But Rob, it’ll take forever for that physical media to spin and create that copy!” Actually, I think it’s quicker to do this than it is to download an installation image (depending of course on the available bandwidth). If I’m installing AIX on a new machine and I don’t already have a NIM server in the environment, what will I need? Install images burned to optical media. So now I have to download the image and then find some media to burn the images to, etc.? No thanks. I’ll just have IBM send me a set.

Now, if my environment is already built and I just want to stick the .iso image in my virtual media library and either migrate or install my client LPARs or install these filesets to my NIM server, downloading is fine, I guess. But I just like having the physical media on hand in case I need it. I don’t know, maybe I’m old school.

But please make your case in Comments. Get me on the downloading .iso bandwagon. Or tell me why ssh isn’t enabled with AIX, or why telnet and FTP are. Help me sort out my questions.

The Downside of Uptime

Edit: I know we all love continuous uptime, but the time to find out your machine will not boot is during a planned outage, not an unplanned outage.

Originally posted June 8, 2010 on AIXchange

A customer recently performed some scheduled maintenance on a critical server that had an uptime of nearly two years. The customer had created some great scripts that would bring down the application and then connect via ssh to the database server to bring down the database. The application start scripts worked the same way — they’d remotely connect to a database server and bring up the database during the application startup process.

After successfully completing the server maintenance, it was time to bring the application back up. The customer ran the application startup script, but the application didn’t appear to be working properly. After some phone calls to application and database support personnel, it was determined that someone had commented out a line in the startup script. The line that was commented out was the command that would ssh to the database server to start the database, and the application relied on the database in order to work properly.

I’ve said it before: When you make changes to a machine, the changes must be tested. In this case, the timestamp on the changed file was nearly two years old. The change was made, never tested, and forgotten. It may have been a simple case of someone testing something else in the startup process and not wanting the script to contact the database server, then forgetting to uncomment the line once that testing was done. Because the timestamp was so old, it wasn’t a smoking gun; it didn’t stand out during troubleshooting, so it took a while for someone to actually check the script and verify that it did what it was supposed to do. Everyone assumed that a startup script that old hadn’t been changed, and that it would still work the way it had for the last few years.

Although none of us like downtime, especially with resilient servers that “just run,” maintenance windows and application restarts are well worth doing. If we don’t regularly exercise our server shutdowns and startups, we may not uncover a script problem or some other issue until long after the change is made. But by scheduling reboots each month or each quarter, these changes will be more quickly detected and dealt with.

The same holds true with IBM PowerHA clusters. I always like to know that a failover is being regularly tested. The wrong time to find out that something doesn’t work is when the application actually needs to fail over.

Having machines that can stay up for years is a tremendous thing. But there’s nothing like the peace of mind that comes from knowing your machines stop and start the way they’re supposed to.

A New VIO Backup Option

Edit: I updated Chris’ post to reflect that he has moved his archives to gibsonnet.net.

Originally posted June 1, 2010 on AIXchange

I use the VMLibrary almost constantly. Virtual media is faster than optical media, and I can mount virtual media to different client LPARs at the same time.

Typically when I load a new machine in a new environment that doesn’t already have a NIM server, I’ll boot my first VIO server from optical media loaded in the DVD drive. Then I’ll use the mkvopt command to copy the physical media I’ll need — usually including any relevant VIO and AIX CDs — and create .iso images in my virtual media repository.

After I boot from my AIX .iso image and build my first AIX client, I’ll build it as a NIM server. Then I’ll map my AIX .iso image for the smitty bffcreate command to copy the AIX filesets. I’ll use my NIM server to load any other VIO servers in the environment; then I’ll use the NIM server to build the rest of my AIX client LPARs. I can load each system in minutes with NIM, whereas with optical media it takes much longer.
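If you haven't built a virtual media library before, the whole dance on the VIO server looks something like this as padmin (a sketch; the repository size, device names and image name are assumptions, so check lsdev and lsrep on your own system):

mkrep -sp rootvg -size 20G               # create the virtual media repository
mkvopt -name aix61.iso -dev cd0 -ro      # copy the physical DVD in cd0 into the repository
mkvdev -fbo -vadapter vhost0             # create a file-backed virtual optical device, e.g. vtopt0
loadopt -vtd vtopt0 -disk aix61.iso      # load the image into the client's virtual drive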

Many of you rely on virtual media as well. For instance, recently one of my customers was trying to back up his VIO servers, but because of his huge virtual media library, he was getting errors. When he called IBM Support, he was told to move the .iso images off the machine and remove the virtual media library, and then back up the VIO server — a recommendation that wasn’t practical given how much the customer relies upon the virtual media library.

Fortunately, I’d just come across information about an interesting new backupios option, nomedialib, that can exclude the VMLibrary. Chris Gibson e-mailed me about it, explaining that by using the /etc/exclude.rootvg feature of the mksysb command, the -nomedialib flag excludes the contents of the virtual media repository from the backup. When the -nomedialib flag is specified, the backupios command copies the original contents of the /etc/exclude.rootvg file to a repository, appends the /var/vio/VMLibrary string to the /etc/exclude.rootvg file, ensures that the -e flag is passed to the mksysb command and restores the original contents of the /etc/exclude.rootvg file.

Having this -nomedialib flag in your back pocket simplifies the process of backing up VIO servers: you don’t need to move the .iso images off first, and the resulting backup images are smaller.
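Usage is about as simple as it gets; the backup target here is an assumption:

backupios -file /mnt/vios1.mksysb -mksysb -nomedialib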

Speaking of Chris, I recently saw a tweet linking to his blog post about shared Ethernet adapter statistics, which can be displayed per VIO client using the seastat command.

As Chris says: “This is a great way to monitor traffic/activity over a particular SEA. It can be very useful when determining if an SEA is currently being used — i.e., during troubleshooting network connectivity issues between client LPARs and an external network that is bridged via a SEA.”

He closes his post with an IBM link to seastat information.
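One wrinkle worth knowing: per-client accounting has to be enabled on the SEA before seastat will report anything. A sketch, assuming the SEA is ent5:

chdev -dev ent5 -attr accounting=enabled    # turn on per-client statistics for the SEA
seastat -d ent5                             # display statistics for each client MAC address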

I’m continually building new servers for customers, and I’m always looking for new and better ways of doing things. If you have some tips about building a new machine from scratch, please post them in Comments.

Changing User ID Defaults, Using TurboCore Mode

Edit: The link to AIX Down Under no longer works. The link to the whitepaper no longer works.

Originally posted May 25, 2010 on AIXchange

A customer recently asked me about the default user ID length in AIX and how to change it. A quick search brought up the two-part answer.

1. To get the current value, run getconf LOGIN_NAME_MAX or lsattr -El sys0 -a max_logname.
2. To set the size limitation to a new (higher) value, run chdev -l sys0 -a max_logname=# (where # is the new maximum user name length).
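For example, on a system still at the default, the exchange looks something like this (typical output; as the attribute description notes, a new value takes effect at the next boot):

lsattr -El sys0 -a max_logname
max_logname 9 Maximum login name length at boot time True

chdev -l sys0 -a max_logname=16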

Then just a few days later, I noticed a new AIX blog, Anthony English’s AIX Down Under, covering this very topic. Anthony expounds on this subject by explaining the benefits of longer usernames. Check it out.

As for me and this blog, as promised, I can now tell you a bit about working with the new Model 780.

I was looking forward to using the 780 because I wanted to try out TurboCore mode. For a primer on this new feature, I suggest the IBM whitepaper, “Performance Implications of POWER7 Model 780’s TurboCore Mode.”

From IBM: “The POWER7 Model 780 system offers an optional mode called TurboCore which allows the processor cores to execute at a higher frequency — about 7.25 percent higher — and to have more processor cache per core. Higher frequency and more cache often provide better performance. TurboCore is a special processing mode of these systems wherein only four cores per chip are activated. With only four active cores, ease of cooling allows the active cores to provide a frequency faster (~7.25 percent) than the nominal rate. Both the higher frequency and the greater amount of cache per core are techniques for providing better performance. It is not uncommon for a longer running, even multi-threaded workload accessing largely private data to see a performance benefit well in excess of what might be expected from the better frequency alone. Even more complex workloads residing in a partition scoped to the cores and memory of a given processor chip can see similar benefits.”

I’d read that you needed to go into ASMI to make the change to TurboCore mode on the 780, and it turned out to be an extremely simple option to change. So I logged into ASMI, clicked on Performance Setup, selected the TurboCore setting, changed it to enabled and saved the settings. It was almost anticlimactic for me, and the customer is happy with the performance.

In fact, I’ve now installed each of the new models — the 750, 770 and 780 — and my customers are all pleased with the performance of the machines. How about you? Post your impressions of the POWER7 gear in Comments.

Gauging the Benefits of AME

Edit: In the first paragraph I was able to download the audio presentation and the movie. I wonder how long that will last. The whitepaper links no longer work. The report is still available to download but I imagine the link will go away in the future. Instead of downloading the report, I included it at the end of this post.

Originally posted May 17, 2010 on AIXchange

The AIX Virtual User Group-Central Region USA put on another great webinar in April; this one covers Active Memory Expansion (AME). IBMer Nigel Griffiths provides a wealth of information that can help you get up to speed on the topic. The user group Web site has a complete webinar archive, including Nigel’s audio presentation and related materials. You can also watch this movie at IBM developerWorks.

On the subject of AME, check out a couple of whitepapers. The first is the AME Overview and Usage guide. From IBM:

“IBM’s POWER7 systems with AIX feature Active Memory Expansion, a new technology for expanding a system’s effective memory capacity. Active Memory Expansion employs memory compression technology to transparently compress in-memory data, allowing more data to be placed into memory and thus expanding the memory capacity of POWER7 systems. Utilizing Active Memory Expansion can improve system utilization and increase a system’s throughput. This paper provides an overview of POWER7’s Active Memory Expansion technology, as well as guidance on how to deploy and monitor workloads with Active Memory Expansion.

“Active Memory Expansion increases a system’s effective memory capacity. The additional memory capacity made available by Active Memory Expansion enables a system to do more work, leading to an increase in the system’s throughput and utilization. Thus, the value of Active Memory Expansion is that it enables a system to do more work by increasing a system’s effective memory capacity.”

This whitepaper goes into different scenarios that you might consider when thinking about deploying AME, including expanding consolidation by fitting more LPARs onto your frame and increasing LPAR throughput by increasing the effective memory size of a single LPAR.

The second whitepaper is entitled “AME Performance.” Again, from IBM:

“This document introduces the basic concepts of Active Memory Expansion, showing the principles of operation and performance characteristics of this new component of AIX. Active Memory Expansion is available on POWER7 platforms starting with AIX 6.1 TL04 SP2. All computers have a limited amount of Random Access Memory (RAM) in which to run programs. Therefore, one of the perennial design issues for all computer systems is how to make the best use of the entire RAM which is physically available in the system, in order to execute as many programs concurrently as possible, in the limited space available. Active Memory Expansion, a POWER7 feature, supplies a new technique for making better use of RAM: Portions of programs which are infrequently used are compressed into a smaller space in RAM. This, in turn, expands the amount of RAM available for the same or other programs. Among the benefits of Active Memory Expansion, this paper shows the following scenarios and their performance results:

“1. Reducing the physical memory requirement of an LPAR resulting in 111 percent memory expansion.
2. Increasing the effective memory capacity and throughput of a memory constrained LPAR resulting in a 65 percent increase in application throughput.
3. Enabling consolidation of more LPARs onto a system resulting in a 60 percent increase in overall system throughput.”

If you have AIX 6.1 TL04 SP2 on POWER4, POWER5, POWER6 or POWER7 hardware, running the amepat command can give you an idea of AME’s potential benefits. The idea is to run the command while your system is busy and memory is in use.

When I ran amepat on a fairly idle test machine, I received this report: (Download AIXchange 5.18.10 report)

Take the time to investigate whether your computing environment can benefit from AME.

—— Below this was originally a download file ——

#amepat 1

Command Invoked                : amepat 1
Date/Time of invocation        : Thu Apr 29 11:21:23 CDT 2010
Total Monitored time           : 1 mins 6 secs
Total Samples Collected        : 1

System Configuration:
---------------------
Partition Name                 : testlpar
Processor Implementation Mode  : POWER7
Number Of Logical CPUs         : 8
Processor Entitled Capacity    : 0.20
Processor Max. Capacity        : 2.00
True Memory                    : 3.00 GB
SMT Threads                    : 4
Shared Processor Mode          : Enabled-Uncapped
Active Memory Sharing          : Disabled
Active Memory Expansion        : Disabled

System Resource Statistics:               Current
---------------------------          ----------------
CPU Util (Phys. Processors)               0.03 [  2%]
Virtual Memory Size (MB)                  1075 [ 35%]
True Memory In-Use (MB)                   3058 [100%]
Pinned Memory (MB)                         567 [ 18%]
File Cache Size (MB)                      1963 [ 64%]
Available Memory (MB)                     1883 [ 61%]

Active Memory Expansion Modeled Statistics:
-------------------------------------------
Modeled Expanded Memory Size   :   3.00 GB
Average Compression Ratio      :   2.63

Expansion    Modeled True      Modeled              CPU Usage
Factor       Memory Size       Memory Gain          Estimate
---------    -------------     ------------------   -----------
     1.00          3.00 GB         0.00 KB [  0%]   0.00 [  0%]
     1.09          2.75 GB       256.00 MB [  9%]   0.00 [  0%]
     1.20          2.50 GB       512.00 MB [ 20%]   0.00 [  0%]
     1.33          2.25 GB       768.00 MB [ 33%]   0.00 [  0%]
     1.50          2.00 GB         1.00 GB [ 50%]   0.00 [  0%]
     1.71          1.75 GB         1.25 GB [ 71%]   0.00 [  0%]

Active Memory Expansion Recommendation:
---------------------------------------

The recommended AME configuration for this workload is to configure the LPAR with a memory size of 1.75 GB and to configure a memory expansion factor of 1.71.  This will result in a memory gain of 71%. With this configuration, the estimated CPU usage due to AME is approximately 0.00 physical processors, and the estimated overall peak CPU resource required for the LPAR is 0.03 physical processors.

NOTE: amepat’s recommendations are based on the workload’s utilization level during the monitored period. If there is a change in the workload’s utilization level or a change in workload itself, amepat should be run again. The modeled Active Memory Expansion CPU usage reported by amepat is just an estimate.  The actual CPU usage used for Active Memory Expansion may be lower or higher depending on the workload.

Finding the Good in GUI

Edit: The links still work. I do not think I know anyone still running IBM Systems Director.

Originally posted May 11, 2010 on AIXchange

You may be able to teach an old dog new tricks, but getting him to remember them is another matter. More than a year ago I wrote about an IBM Redbook and passed on some tips.

Among other things, I said that by connecting to https://hostname:5336/ibm/console on your AIX 6 machines, you’ll get the IBM Systems Director Console for AIX.

Unfortunately, this old dog continually forgets to do this. Usually I just ssh into a machine and do everything on the command line when I am managing systems.

Recently I was talking with a customer who had just returned from AIX training. He wanted to know how to get the director login. Since we hadn’t installed IBM Systems Director in the environment, I wasn’t sure what he meant.

When he retrieved his class notes, it jarred my memory some. Then he logged in and showed me how he learned to use the GUI to add users, configure his network, manage devices, etc. He also showed me that you could select a system health button and view metrics like system, network and paging space configs, and then scroll down for metrics like CPU utilization and physical and virtual memory. Other listings displayed real-time updates to top processes and filesystem utilization.

Much of it looked like a Web-based front end to smitty, along with other command line tools that I use day in and day out. Before scoffing and telling him not to use a dumbed-down GUI, I had to remind myself that this tool was a good thing, especially for a new AIX administrator. Reducing the learning curve is a good idea. What’s the point of making powerful systems if people can’t use and manage them?

So, if you’re running AIX 6, run lssrc -a | grep pconsole and see if you have it enabled. And if you do, log in and take a look around.

You can also set things up so that users are restricted from accessing the console. From the welcome page when you first log in:

“Use the Console User Authority tool to add new or existing AIX users to the IBM Systems Director Console for AIX and grant them permission to perform tasks using the web console.

“Current AIX users will need to be added using the Console User Authority tool and assigned tasks before they will be able to use the Console. These users will rely on their AIX user account for user-logon security.

“Administrators can use the Console User Authority application to add new users to AIX and the Console and grant them authorizations in one step. If you add a new AIX user using the Console User Authority tool you will still be required to assign that user a password. That user will have to change the password using a command line interface before they can logon to the console for the first time.”

Troubleshooting tips are available here. Here’s an example:

“Issue: When I try to connect to the console URL, I get an Unable to Connect or the page cannot be loaded message.

“Check whether the console subsystem is active by running lssrc -s pconsole. If it is not active, try to start it by running startsrc -s pconsole.

“If it does not start and you get a message such as: The pconsole Subsystem could not be started. The Subsystem’s user id could not be established. Please check the Subsystem’s user id and try again, check that the pconsole user account has not been removed and that the UID matches the UID that owns the files under /pconsole. If the user account is missing, reinstall sysmgt.pconsole.rte so that the account is recreated with the required attributes. If the user account exists, but the UID is incorrect, remove the user account and reinstall sysmgt.pconsole.rte. …

“If the task you are trying to execute fails:

“Many of the tasks in the OS Management category are based on SMIT. If a task fails in the console, try the task in SMIT or smitty to see if it also fails there. Log in to the system on a local terminal or via telnet and try the task in SMIT/smitty. If it also fails in SMIT/smitty, then the problem may be in SMIT or in the commands or scripts executed by SMIT.  Use the Show Command function (F6) to show the command and if possible try to perform the task using the command line to see if the failure is caused by the command or by SMIT/smitty.”

Don’t make the same mistake that I made and ignore something just because it’s a Web-based front end to your system. You — or your newer administrators — just might find uses for this function.

Getting Hands On With POWER7

Edit: The wiki link no longer works.

Originally posted May 4, 2010 on AIXchange

If you haven’t had a chance yet to work on POWER7 hardware, I thought I’d pass along something I’ve learned from my introduction to the 750 and 770 models: When selecting the DVD device from the HMC, the controller that it’s connected to is identified as a RAID controller. This surprised me the first time that I saw it, so I wanted to let you know.

Some other observations about working with these new boxes:

* On the 770, it was nice to be able to “float” the DVD between partitions without worrying about which set of disks it was connected to. On this system, the DVD is connected to its own controller and doesn’t share that controller with disk drives. On the 750, however, the DVD stays with the internal disks and controller, as is the case with older gear.

* In the case of the split backplane on the 750, when you select the RAID adapter, you’ll get the DVD and the first set of four disks, while on the 770, you just get the DVD. Again, I was working with a split backplane, so I selected the second SAS controller and set of disks by selecting the PCI SAS adapter in the HMC, as is done with POWER6 gear.

* On the 770, I was able to select the SAS adapters as you’d expect, but I was able to select the RAID controller for my DVD independent of the disk controllers. My PCI Ethernet cards appeared as PCI-to-PCI bridge devices, while my Ethernet and fibre adapters that were in an expansion drawer showed up as expected.

* I also found that the HMC code that I was running (V7R7.1.0.1) had a slightly different layout on the bottom of the screen. I know of others who’ve seen this same behavior, but I’d appreciate more input. So please let me know what you’re seeing when you install the new code. Previously when I’ve made a selection in the HMC, I’d see one column at the bottom left of the screen in the task pad. The new HMC code defaulted to three columns, so that took a little getting used to. You can, though, customize the display so it appears in the familiar one-column format. Customizing the number of columns was actually possible with the older HMC code, but most of the customers I deal with just used the default one-column setting.

As I continue to load more of these systems (coming soon: the 780), I’ll let you know what else I find. Again, I’d like to hear from you, so please share your own experiences with POWER7 systems by posting in Comments.

On an unrelated note, I was recently on a call where the topic was in-depth information about the new POWER7 based blades: the PS700, PS701 and PS702. The presenters mentioned that a new wiki site was going live with POWER Blade links. Check it out and let me know if other links should be included.

Managing Servers a Unique Challenge for Small Shops

Edit: The first two links do not work. The Netview link no longer works. The first ganglia link no longer works. I still called it an AS/400.

Originally posted April 27, 2010 on AIXchange

It seems like every customer I talk to has a different method for managing their servers.

For a large data center, the challenges are apparent. There are hundreds, even thousands of servers. Some are standalone servers, some have virtual I/O servers with many client LPARs. As the number of servers grows, getting a handle on these environments can be difficult.

However, smaller shops face their own issues. Many manage their servers without the benefit of standard management tools. If you’re in this situation, you should be aware of some of the available options. For starters, built-in tools like syslog and errpt can alert us when problems occur.

We can also roll our own scripts and parse our own logs and manage our own machines without any help from anyone outside of our organization — assuming, of course, that we have the time to work on our scripts.

However, many organizations lacking the time and/or skills to create their own tools want to be able to purchase software to help them with this task. Certainly IBM Systems Director comes to mind, but recently I was asked about other monitoring tools.

As I’m focused heavily on AIX running on POWER servers, my responses were confined to that platform. I immediately thought of Tivoli software, as I had administered Tivoli NetView once upon a time.

According to the Web site, “this system monitoring software (can) manage operating systems, databases and servers in distributed and host environments. It provides a common, flexible and easy-to-use browser interface and customizable workspaces to facilitate system monitoring. It detects and recovers potential problems in essential system resources automatically. It offers lightweight and scalable architecture, with support for IBM AIX, Solaris, Windows, Linux and IBM System z monitoring software.”

I’ve also read about software called Ganglia. I’ve even seen it in action. Though its creators tout it as “a scalable distributed monitoring system for high-performance computing systems such as clusters and grids,” it’s capable of monitoring performance across POWER machines.

Beyond that though, I drew a blank. What other toolsets are out there? What are we relying upon to manage and monitor our systems?

Back when I worked on the AS/400 system, I loved the Robot/Alert and Robot/Console products.

Hopefully I’m not misremembering, but I seem to recall being able to automate the answering of console messages and redirect operator messages to an alphanumeric pager. Back in the early ’90s, this was a handy way to have my machine page me and tell me what was wrong. With a quick glance at my pager, I knew whether the issue required an immediate response or if it could wait a bit. I’m sure the current iteration of the product offers many more powerful features of which I am not aware.

What do you think? What’s the dominant monitoring software package for Power Systems? What are you using? Send me an e-mail or leave a comment. While you’re at it, are there tools you tried and didn’t like? Or, if you had a wish list, what features would you like to see included in monitoring software?

Living in the Future

Edit: I am living even further in the future now. The 2nd ad no longer works.

Originally posted April 20, 2010 on AIXchange

I’ve appropriated the Wil Wheaton line more than once, but once again I’m reminded that we really do live in the future.

Those in the U.S. may recall the 90s-era AT&T television ads (here and here) that talked about what we’d be doing in the future.

Seeing the ads again recently, it struck me that many of their predictions essentially came true. Of course, all the technologies we take for granted today — wireless communication, open road tolls, video conferencing, video on demand, GPS, etc. — were already being planned back then.

On the subject of video conferencing, last month the Wisconsin Midrange Computer Professional Association held its spring technical conference.

At the same time, the OMNI user group hosted its March dinner meeting.

While a panel of industry experts (Aaron Bartell, Alison Butterill, Susan Gantner, Scott Klement, Jon Paris, Mike Pavlak and Trevor Perry) attended the WMCPA conference, the OMNI group arranged to have their round-table discussion broadcast into its meeting. And OMNI attendees could submit questions to the panel at the WMCPA using IM or SMS.

The video link consisted of a laptop on each end, a video camera and Skype. What amazed me was everyone’s ho-hum attitude. Just think about it: We walked into a private room at a restaurant that had enough network bandwidth on its wireless network to handle an audio and video link between two sites in two different states. Using just a video projector and a laptop with an external speaker, we had a very strong and relatively low-cost connection (especially considering that the laptops weren’t even specifically acquired for this event). As I watched how easily everything was set up and taken down, I kept thinking how lucky we are to live when we do.

The content itself was certainly thought-provoking (and I figure I’ll write more about that eventually), but the technology used to facilitate the discussion just astounds me. Not because it’s new, but because not long ago we were only dreaming of this stuff.

POWER7 is just out, yet you can be certain that IBM is already well into developing the next generation of processors, along with the next version of AIX. As far and as fast as we’ve come, things keep moving. Sometimes you just have to stand back and appreciate everything that’s happened — and look forward to everything that’s ahead.

A Look at Today’s POWER7, AIX Announcements

Edit: This was when we first knew about AIX 7.

Originally posted April 12, 2010 on AIXchange

Today IBM is making more announcements around AIX and POWER7. I’ll go through a few highlights here, and, I’m sure, cover these topics in greater depth as time goes on. (Note: Some of the information that follows is copied from materials that I received from IBM.)

POWER Blades
There will be three new POWER blades with three new model numbers (the PS700, PS701 and PS702). I’ll get to their capabilities in a moment, but I want to first note that these new model numbers, across the servers and the blades, make it much easier for me to keep things straight in my head. For instance, talking to people about JS blades could be confusing. Is it a JS20? A JS21, JS22 or JS43? Which is POWER5 and which is POWER6? However, with the naming of these new PS blades, it’s easy to recognize them as POWER7 — P stands for POWER, and the 70X numbering indicates POWER7. (By the same token, a 570 server was a nebulous term, since a POWER5 570 and a POWER6 570 aren’t the same thing. Now though, when a 770 is mentioned, we’re obviously talking about POWER7.)

Anyway, some specs: The PS700 blade is a POWER7 4-core (one socket with four cores per blade) with 4GB to 64GB DDR3 memory, 0-2 SAS disks and a single wide blade form factor.

The PS701 is a POWER7 8-core (one socket with eight cores per blade) with 4GB to 128GB DDR3 memory, 0-1 SAS drives and a single wide blade form factor.

The PS702 is a POWER7 16-core (two sockets with eight cores each, one per blade) with 4GB to 256GB DDR3 memory, 0-2 SAS disks and a double wide blade form factor. Think of this as two PS701s connected together to provide more cores, more available memory and additional disk drives.

AIX Info
AIX 7 — The next AIX version will be binary compatible with AIX 6 and AIX 5. That’s good news for customers running the older versions of the operating system. (These customers will also be interested in the pending withdrawal of AIX 5.3 support, which I’ll detail in a bit.) AIX 7 will provide vertical scalability of up to 1,024 threads and 256 cores in a single partition. That’ll be a fun day when I have the opportunity to build and run those 256-core LPARs.

AIX Profile Manager — Included with AIX 7, AIX Profile Manager is designed to make it easier to create, update and verify AIX configuration properties across multiple systems. Think of this as a follow-on to the AIX Runtime Expert; it will be an IBM Systems Director plug-in.

AIX Profile Manager is designed to ease the task of managing pools of AIX systems. For instance, imagine you have a pool of 40 WebSphere servers. You tune one and you want to propagate those settings to all of the other servers. AIX Profile Manager will allow you to connect to the “source” via IBM Systems Director. Then you can collect the information into an XML file and apply the profile to the other 39 servers.

Withdrawal of AIX 5.3 support — IBM plans to withdraw marketing for AIX 5.3 in April 2011. For those of you who are still on AIX 5.3, now is the time to start thinking about migrating to a more current version of AIX. This advance notice should give everyone ample time to plan upgrades and migrations.

License metric tool — Also soon to come is an IBM license metric tool that can help AIX users simplify license tracking and audit reporting. The metric tool will allow you to periodically collect information about the software you’re running and make it easier to determine how many licenses you’re using. It runs internally, so the collected information won’t be reported back to IBM. Think of it as a solution for self-auditing your environment. This tool is already available for products like DB2 and WebSphere; now it will support AIX as software to be managed.

Cluster-aware AIX is designed to help you easily create clusters of AIX instances for scale-out computing or high availability. It will include built-in event management and monitoring capabilities, and will also have features such as common device naming to help simplify administration. IBM considers this a foundation for future AIX capabilities and the next generation of PowerHA SystemMirror.

AIX 5.2 WPAR is intended to help minimize the effort needed to consolidate old environments on new, more efficient hardware. For shops that still run legacy hardware and AIX 5.2, this WPAR capability will allow you to stay on AIX V5.2 while moving up to POWER7 and retiring the old hardware. All you’ll need to do is back up an existing AIX 5.2 environment using mksysb and restore it inside of an AIX 7 WPAR.
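Based on IBM's description, the restore should boil down to a single mkwpar invocation on the AIX 7 system; a sketch with made-up names (and of course the details could change before general availability):

mkwpar -n wpar52 -C -B /export/mksysb/aix52_server.mksysb    # create a versioned AIX 5.2 WPAR from a mksysb image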

Personally, I can’t wait to test this. I know of several client environments that stay on older hardware due to application dependencies around 5.2. These folks can really benefit from consolidating old workloads onto new hardware.

AIX Express Edition — A new Express Edition of AIX will be priced for smaller workloads. AIX 6 and 7 will have all three editions — Express, Standard and Enterprise edition — while AIX 5.3 will only have Standard edition. Express Edition is intended for two deployment situations:

* When you’re running AIX on entry level servers and blades.
* When you’re consolidating smaller workloads on enterprise servers.

This offering is limited to a 4-core maximum partition size with an 8GB memory per core maximum. There will be flexibility to optimize for multiple workloads as any combination of AIX Editions can run on a single server.

Stay Tuned
As IBM did with AIX6, we can look forward to an open beta for AIX 7 in the next few months. This will give us all a chance to test out the new features and get our environments ready to migrate to the new operating system.