AIX7 Open Beta: First impressions

Edit: Some links no longer work, I edited a few of them.

Originally posted July 27, 2010 on AIXchange

Wait no longer to get your hands on the latest AIX code: The AIX7 open beta is up and running.

From IBM:

“IBM today announced an open beta program for AIX 7, the company’s open standards-based UNIX operating system. AIX 7 builds on the capabilities of previous releases of AIX and can fully exploit the performance and energy management capabilities of the new POWER7 servers that began shipping to customers earlier this year.

“AIX 7 provides full binary compatibility for programs created on earlier versions of AIX, including AIX 6, AIX 5, and 32-bit programs created on even earlier versions of AIX. This means that clients can protect previous investments in Power Systems by moving existing applications up to AIX 7 without having to recompile them. Full information on AIX binary compatibility is available here.

“Many clients running prior generations of POWER hardware would like to consolidate on newer, more efficient POWER7 servers, but simply do not have the administrative resources to upgrade a large number of servers. AIX 7 introduces new technology to help simplify consolidation of these older workloads onto new systems. Clients can back up an existing AIX 5.2 environment and restore it inside of a Workload Partition on AIX 7, which can allow them to quickly take advantage of the advances in POWER technology.

“AIX 7 and Power Systems hardware provides support for very large workloads with up to 256 cores/1024 threads in a single AIX logical partition – four times greater than that of AIX 6.”

I installed the open beta when it came out, but I’ve only begun playing with the code. Rather than use physical media, I loaded it on my client LPARs with a virtual optical device. It was a straightforward download, and simple to install and get running.

The new Korn shell is one thing that caught my eye. I cannot count the number of new AIX customers who moan and groan about ksh lacking the tab completion and up arrow/down arrow access to their shell history that they enjoy on their Linux or other UNIX systems. These folks will be pleased to know that they can now access some of the same functionality as the bash shell by running /usr/bin/ksh93. The newly updated ksh93 version is also available on AIX V6.1 with the 6100-06 Technology Level.
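For anyone trying it out, here is a minimal sketch of getting that behavior in ksh93; the .kshrc setup is just my own convention:

# start the newer shell (or make it your login shell with chuser)
/usr/bin/ksh93
# confirm which ksh you are in; ${.sh.version} only exists in ksh93
echo ${.sh.version}
# turn on command-line editing; put this in the file pointed to by $ENV
# (for example, export ENV=$HOME/.kshrc in .profile) to make it stick
set -o emacs     # Ctrl-P/Ctrl-N (and generally the arrow keys) walk history; Tab or Esc Esc completes filenames
set -o vi        # or use vi keys instead: Esc, then k/j to walk history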

Given my recent comments, I was also happy to see ssh running by default after the installation. We’ll see if the same is true of the actual AIX 7 release once we get the actual release media.

In addition, some new menus appear in smit. Here are a few:

* Administer VERSIONED Workload Partitions (located under Workload Partition Administration).
* Live Partition Mobility with Host Ethernet Adapter (HEA) (located under Applications on the main menu).
* AIX Runtime Expert (located under System Environments).
* Change System User Interface (located under System Environments).
* System Cryptographic Operational Mode (located under System Environments).
* Managed Editions (located on the main menu).

Finally, I saw information about using NIM in the Getting Started section. Specifically, this information provides directions on using a NIM server that’s at AIX 6.1 TL5 or higher with the AIX 7 open beta. I’ve yet to test this, but I will soon.

From the documentation:

“A separate download is required for network install support. The NIM image is in tar format and available from the AIX Open Beta website. The image name should be similar to 710_NIM.tar.

“The tar image contains the following:
•    710_1026A_SPOT/ — This is a complete 710 spot environment and can be used for network installing the AIX 7 Open Beta mksysb.
•    710_1026A_SPOT.chrp.64.ent — 710 network boot image.
•    inst.images/ — This install directory contains the bos.sysmgt package and can be used for installing the latest NIM support for AIX 7 open beta.”
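I haven’t run through this myself yet, but based on the tar contents described above, the rough flow on a 6.1 TL5 (or later) NIM master might look like the sketch below. The directory paths and resource names are my own placeholders, and the SPOT definition in particular should be checked against the beta’s NIM instructions:

mkdir /export/aix7beta && cd /export/aix7beta
tar -xvf 710_NIM.tar                       # extract the SPOT, boot image and inst.images
# update the NIM master's bos.sysmgt filesets with the AIX 7 beta NIM support
installp -acgXd /export/aix7beta/inst.images bos.sysmgt
# define the extracted SPOT as a NIM resource (location syntax is my assumption)
nim -o define -t spot -a server=master -a location=/export/aix7beta/710_1026A_SPOT spot_710beta
# define the downloaded AIX 7 open beta mksysb as a mksysb resource
nim -o define -t mksysb -a server=master -a location=/export/aix7beta/aix7_beta.mksysb mksysb_710beta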

These are things that immediately struck me. No doubt there’s much more to discuss — I know tons of people are talking about it on Twitter. So get in on the buzz — tell me in Comments about your favorite parts of the operating system.

For more on the open beta, check out Ken Milberg’s article in the July IBM Systems Magazine e-newsletter. Chris Gibson, Anthony English and Nigel Griffiths also offer their observations.

The Lines Blur Between Prod and Test

Edit: The links to the webinar resolve but are old and do not seem to work. The first link still lists the speakers at the time of this writing.

Originally posted July 19, 2010 on AIXchange

Recently I was helping a customer implement an IBM PowerHA cluster. We were on the whiteboard going over various failover scenarios. There were going to be two physical servers in the environment, and this question came up: “Are we planning to have one frame be the ‘production’ frame and the other be the ‘test/QA’ frame?”

Not that long ago, implementing a test machine alongside a “prod” machine was a given. Hardware simply wasn’t as reliable back then. So, to protect themselves from hardware failure, companies would install a hot standby backup along with their production machine — just in case. Since that backup box typically sat idle, many companies opted to run test workloads on it. At least this way, that second machine was doing something worthwhile.

However, with the advent of Live Partition Mobility and PowerHA — and with more Reliability, Availability and Serviceability (RAS) built into newer hardware — it’s more or less assumed that machines will stay up. And somewhere between then and now, the distinction between prod and test has started to blur.

Almost three years ago I saw my first Live Partition Mobility demo, and I immediately went from skeptic to true believer.

But even now, I find many customers can’t quite believe what they’re seeing. For instance, a few weeks back I was demonstrating how to move a busy LPAR from one frame to another. The customer had the same skepticism I had back at the beginning: Will it work? Will I drop packets? Is this smoke and mirrors and magic? Yes, it works. No smoke, no mirrors — and no dropped packets.

Because you can quickly and easily move workloads around your environment, you’re freed from the entire concept of “this frame is production” and “that frame is test.” You can concentrate on properly mixing workloads across the environment based on need and available resources. You can create uncapped partitions with proper values for the weights of your partitions. If the machine has free cycles, you can allocate them on a very granular level. If one machine becomes constrained, you can easily shift your workload to another frame that can better handle the load.
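For reference, an active migration from the HMC command line is a one-liner; the frame and partition names below are made up, and most people will drive this from the GUI anyway:

# validate first, then move the running LPAR from one frame to the other
migrlpar -o v -m source_frame -t target_frame -p busy_lpar
migrlpar -o m -m source_frame -t target_frame -p busy_lpar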

When my customer and I were discussing PowerHA and whether they wanted the capability of failing over multiple LPARs, a comment was made — and a light bulb went on in the minds of those present. What if you set things up the “old way,” your production frame dies for some reason, and you need to fail over your prod workload? Should the whole environment fail over at once, or would it be preferable to have half of prod fail over while the other half keeps on processing? After all, in a mixed environment with production LPARs running on different physical machines, losing a frame means only failing over a subset of the environment as opposed to the whole thing.

CPU micro-partitioning, PowerVM server virtualization, Live Partition Mobility and PowerHA are all game changers. When we plan for these technologies, we must also rethink the way our systems are implemented. Though it’s tempting to still think in terms of standalone systems, alternatives are now possible. Rather than separate prod from test, we may find that mixing production with test on the same frame might make perfect sense.

Note: IBM is hosting a pair of webcasts on future trends relating to Power Systems. Register here and here.

Send Me Your Scripts

Edit: The awk link no longer works. The open beta links no longer work.

Originally posted July 13, 2010 on AIXchange

I wish someone would set up a wiki site that would serve as a repository of people’s favorite administrative ksh scripts. I mean, we all have tools and scripts and nifty .profile setups, why don’t we have a better mechanism for sharing them? Is there really that much intellectual capital that goes into setting up prompts and getting hostnames to show up at the top of an xterm? Is the handy alias and script that you use daily really a matter of national security?

For instance, I stumbled across this post on enumerating columns for awk.

While I don’t have a column command, and the formatting isn’t right for an alias to work in ksh, this is the sort of thing I’m talking about — people sharing tips, tricks, scripts and whatnot this way.
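Since that post is gone, here is a hypothetical ksh function along the same lines: it numbers the columns of the first line of input so you know which $N to feed to a longer awk one-liner.

cols () {
    # print "column-number  header" for line 1, then quit
    awk 'NR == 1 { for (i = 1; i <= NF; i++) printf "%3d  %s\n", i, $i; exit }' "$@"
}
# usage: cols myfile.txt    or    df -k | cols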

Each shop has its own script for taking a mksysb. Everyone uses their own crontab entries to manage their systems. More mature and seasoned administrators have more mature scripts. Over time their flaws and bugs have been cleaned up, and the scripts have been enhanced. Wouldn’t it be nice if we could all share in this knowledge?
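As an example of the kind of thing I mean, here is a bare-bones weekly mksysb script and crontab entry. The backup directory is an assumption (typically an NFS mount with enough space), and every shop’s real version will be more elaborate than this:

#!/usr/bin/ksh
# /usr/local/bin/weekly_mksysb.sh -- hypothetical example
BACKUPDIR=/backup/mksysb                          # assumption: NFS-mounted backup area
FILE=$BACKUPDIR/$(hostname)_$(date +%Y%m%d).mksysb
/usr/bin/mksysb -i -e "$FILE"                     # -i rebuilds image.data, -e honors /etc/exclude.rootvg

# crontab entry: every Sunday at 2 a.m.
# 0 2 * * 0 /usr/local/bin/weekly_mksysb.sh > /tmp/weekly_mksysb.log 2>&1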

Actually, if you send your scripts to me, I’ll happily post them. I’ve posted scripts in the past. Just provide the original version and note the modifications and improvements that have been made along the way.

I know. Some will argue that Company X spent the time and money to develop these scripts. It’s proprietary information that shouldn’t be handed to strangers. Still, I have to believe that users of a proprietary operating system have room to share information and knowledge. If the open-source crowd can make this sort of thing work, why can’t we?

We already share plenty of freely available information. For instance, the valuable technical documentation in IBM Redbooks is there for the taking. And more and more blogs and forums are sprouting up where people can get information and ask for help. While we’re at it, shouldn’t we be able to find some quality tools as well?

Maybe someone smarter than me can figure out a way to fill this void.

Speaking of voids, this one, at least, has been filled. I know I’m not the only one who’s been anxiously waiting for the AIX 7 open beta. Well, the wait is over. The website went live today. Here’s the open beta; here’s the open beta forum.

Let me know what you think of it.

WPARs and Other AIX 7 Highlights

Edit: Did you ever do much with WPARs? The links to Nigel’s articles no longer work. The AIX 7 IBM article no longer works.

Originally posted July 6, 2010 on AIXchange

So far, adoption of WPARs has been slow. Customers like the workload isolation and resource flexibility they get with LPARs, so they’re less interested in the WPAR story. In fact I often hear customers tell me they’re happy to get away from the whole WPAR/container concept and get into micropartitioning and LPARs on Power Systems.

IBMer Nigel Griffiths gives some food for thought as to why we should get on board with WPARs. I’ll summarize his arguments:

1. LPARs take only minutes to create, but creating WPARs takes just seconds.
2. An LPAR requires 512 MB to 1 GB of memory to boot AIX. With a WPAR, you need less than 60 MB (yes, I said megabytes).
3. You need a copy of the application code — say, 1 GB — in each and every LPAR (40 LPARs = 40 GB). Or you can share just one read-only copy across all WPARs (40 WPARs = 1 GB). This not only requires less maintenance, it saves disk space and memory. (If the application is loaded in the Global AIX then there’s only one copy in RAM.)
4. Maintaining one Global AIX via the SYNCWPAR command is much easier than updating, say, 40 copies of AIX.
5. Application mobility is much simpler to organize than LPM.
6. The Global AIX administration can see and change all WPAR filesystems — e.g., adding a tool to /usr/local/bin can be done by simply issuing the cp command.
7. Rapid cloning is easy and allows you to use “disposable” images that can be created, tinkered with and readily discarded.
8. If you mess up a WPAR, you can fix it via the Global AIX. If you mess up an LPAR, your system may not boot!
9. Backups are much easier and smaller than an LPAR mksysb. A default WPAR backup file is around 75 MB. Of course it’s more if you have applications plus data, but you still don’t need 2 GB of data as you do with an LPAR backup.

I have left out a few of his points — Nigel’s original entry has 12 items — so be sure to check out his full list.
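If you have never touched WPARs, the basic lifecycle really is as simple as Nigel suggests. From the Global AIX it looks something like this (the WPAR name is just an example):

mkwpar -n testwpar      # create a system WPAR (this is the part that takes seconds, not minutes)
startwpar testwpar      # boot it
clogin testwpar         # log in from the Global AIX
syncwpar testwpar       # keep its software in sync with the Global AIX
stopwpar testwpar       # shut it down
rmwpar testwpar         # throw the "disposable" image away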

Nigel has another article that covers a couple of new AIX 7 features, one of which is the capability to run AIX 5.2 within workload partitions:

“We all know running AIX 5.2 is pretty dumb (as it’s not under normal support), but it happens. For some reason the code can’t get updated, the ‘if it ain’t broken don’t fix it’ rule sticks for so long that it becomes a nightmare to update, or it’s just not worth the manpower to upgrade a small application. But this also tends to mean it’s on hardware that is costly to maintain, with a large footprint given its lowly computing power, energy hungry, and with little or no virtualization. So picking up that AIX image (mksysb) and putting it in an AIX Workload Partition is such a cool idea. It is then running on a much faster POWER7 machine, with lower maintenance, sharing resources in virtualization, using less energy and freeing up computer room floor space. It’s win-win-win-win — and you then get AIX 5.2 support (OK, somewhat limited support, as it’s been functionally stable for many years).”
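As I understand it from the announcement material, the mechanics boil down to restoring a 5.2 mksysb into a versioned WPAR, roughly as sketched below with my own file and WPAR names. I’ll confirm the details once I can actually test it:

# on the AIX 7 system (with the versioned WPAR support installed),
# create a versioned WPAR directly from the old AIX 5.2 mksysb image
mkwpar -C -B /export/backups/oldbox_aix52.mksysb -n oldbox52
startwpar oldbox52
clogin oldbox52 oslevel -r       # should report the 5.2 level inside the WPAR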

For more general looks at IBM AIX 7, here’s Ken Milberg’s introduction that appears in the IBM Systems Magazine June 2010 issue. And here’s an official AIX 7 preview from IBM.

Finally, here’s my brief preview that I wrote in April.

What have you been reading about regarding AIX 7? Please share your links by posting in Comments.

The Evolution of Education

Edit: The link no longer works.

Originally posted June 29, 2010 on AIXchange

As more companies migrate to IBM Power Systems hardware and the AIX operating system, the need for education grows. It may be hard for us longtime users to imagine, but every day, seasoned pros are just getting started on POWER hardware and AIX.

While I’ve provided customer training, what I do–either through giving lectures on current topics or talking to people informally as their systems get built–doesn’t compare to the educational value of a “traditional” instructor-led class or lab.

With that in mind, check into the IBM Power Systems Test Drive, a series of no-charge remote (read: online) instructor-led classes.

Courses being offered include:

IBM DB2 WebQuery for IBM i (AT91)
IBM PowerHA SystemMirror for IBM AIX (AT92)
IBM PowerHA and Availability Resiliency without Downtime for IBM i (AT93)
Virtualization on IBM Power (AT94)
IBM Systems Director 6.1 for Power Systems (AT95)
IBM i on IBM Power Systems (AT96)
IBM AIX on IBM Power Systems (AT97)

Remote training, of course, saves IT pros and their employers the time and expense of having to travel to an educational opportunity. But is something lost if students, instructor and equipment aren’t in the same room? Not necessarily. Let’s face it: Nowadays a lot of education is remote anyway–when you travel to classes and conferences and do lab exercises, you’re likely logging into machines that are located offsite. By now good bandwidth is the norm, so network capacity shouldn’t be an issue when it comes to training.

Sure, offsite training has its advantages. When you travel somewhere for a class, there are fewer distractions, so you can concentrate on the training. Taking training remotely from your office desk, it’s easy to be sidetracked by your day-to-day responsibilities. (This does cut both ways though–I often see people connect to their employer and work on their laptops during offsite training.)

Offsite training also allows you to meet and network with your peers. I still keep in touch with folks I’ve met at training sessions. If I run into a problem with a machine I’m working on, I have any number of people I can contact for help. Being able to tap into that knowledge with just a call or a text message is invaluable.

While I haven’t taken a remote instructor-led class like the ones IBM offers, I’ve heard positive feedback from those who have. But what about you? I encourage you to post your thoughts on training and education in comments.

Looking Back, Looking Ahead, Staying Put

Edit: The link to the comparison chart no longer works. The links to the datasheets no longer work. How much further have we come since this was written?

Originally posted June 22, 2010 on AIXchange

Sometimes I’ll look at the raw computing power that sits on my desk and think back to the IBM XT systems I used years ago. Like a lot of folks in our industry, I go back a ways. I recently found some old floppy disks, and in the pile were some installation disks for some old programs I remember using under DOS on IBM-compatible machines many years ago. I was around when everyone used Lotus 1-2-3 to create spreadsheets, and WordPerfect for word processing. I was messing around prior to that, when my 300 baud acoustic coupler and my phone line would connect me to bulletin board systems (BBS) where I could communicate with others. I can easily recall the days of VisiCalc on the Apple II computer.

Going back even further, I wrote my first school papers using Stylograph on the OS-9 operating system. I remember being astounded by WYSIWYG and the capability to fully justify my text. It’s safe to say that my teachers were impressed by this cutting-edge technology as well. Back then, most students still wrote their papers by hand.

Technologies and applications come and go, and they’ll continue to do so. Microsoft Office didn’t always dominate user workstations, and I’m sure something will eventually emerge to replace it.

Right now there are free alternatives like OpenOffice, StarOffice, and Google docs (see this chart for highlights). I’ve tried many of these solutions–for instance, I use Gmail quite a bit, and when attachments come my way I often view them using Google docs.

I’ve also played around with IBM’s Lotus Symphony. While I’ve been happy with the results, I’ve yet to really give it a workout.

I guess, at the end of the day, I’m OK with Microsoft Office. Likewise, I keep coming back to Windows. Even as fond as I am of VMware workstation (which allows me to run a copy of Windows inside another operating system) and vncviewer (which allows me to view a remote Windows desktop hosted by another machine), I can’t seem to make the full-time switch to a Linux desktop. I’ve tried, but the need for some new app or utility seems to keep me from moving. Frankly, it’s just easier using what everyone else uses.

Another thing that’s easy (at least in my case) is looking back. Even during this spring’s POWER7 announcements, I got a little nostalgic thinking of the earlier iterations of AS/400 and RS/6000 hardware I once administered. It’s not that I want to go back to using those systems–and I sure wouldn’t trade my laptop and today’s software for a 386 running Wordperfect. It’s just fun to think about how far we’ve come–and how much further we’ll go.

I mean, what kind of computers will I be using in another 20 years? I’m sure I’ll be fine with future technology–provided I can still attach my Model M keyboard.

Speaking of POWER7 hardware, if you’re looking for a quick introduction to the new systems, check out these data sheets for the Model 750, Model 770 and Model 780.

Some Questions for You

Edit: The link no longer works. I do not think anyone uses physical media anymore. With virtual optical and usb flash drives I cannot remember the last time I had CDs or DVDs, although I still have old media in that format that I should get rid of.

Originally posted June 14, 2010 on AIXchange

While I always try to be available to answer your questions, this week I have some questions for you. First, why isn’t ssh installed and running by default when I load AIX? Honestly, I’ve been asking myself this for years.

Admittedly, my complaint is trivial, since getting ssh running on a newly installed server is quick and easy — especially now that the openssh file sets are included with AIX install media. And in my case, since I typically build up a gold image before deploying it in an environment, all of my images will have it loaded as well.

But it still bugs me.

I mean, ssh is installed by default when I load Linux, so why isn’t it installed for AIX?

On the flip side, why are telnet and FTP enabled by default on AIX? Again, I know it’s not a big deal to go edit /etc/inetd.conf and comment out these unwanted services and restart the daemon. But it just seems like these insecure daemons shouldn’t be running at all. It’s great that they’re included, but to me, a freshly installed AIX server should have ssh enabled and telnet and FTP disabled by default.
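For anyone who hasn’t done it, the change takes about a minute: comment out the relevant lines in /etc/inetd.conf and refresh the daemon. The entries look roughly like this:

# in /etc/inetd.conf, comment out the services you don't want:
#ftp     stream  tcp6    nowait  root    /usr/sbin/ftpd          ftpd
#telnet  stream  tcp6    nowait  root    /usr/sbin/telnetd       telnetd -a
# then tell inetd to re-read its configuration:
refresh -s inetd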

One more question on another topic: I keep seeing retweets on Twitter about downloading installation .iso images from IBM. But why would I want to do this?

Even years ago, when IBM offered AIX installation CD images for download, I still preferred to have IBM send me a set of CDs or DVDs. And nothing’s changed. Yes, I can download the .iso images, but I’d rather IBM send me a set.

“But Rob,” you say, “you love virtual optical devices!” Yes, I most certainly do. However, if I have the physical media, I can always run mkvopt in my VIO server and create .iso images.
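For the curious, the whole round trip on the VIO server (as padmin) looks roughly like this; repository size, image names and device names are just examples:

mkrep -sp rootvg -size 20G                    # one-time: create the virtual media repository
mkvopt -name aix61_dvd1.iso -dev cd0 -ro      # copy the physical DVD in cd0 into the library
lsrep                                         # confirm the image landed in the repository
loadopt -disk aix61_dvd1.iso -vtd vtopt0      # present it to a client's virtual optical device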

I can hear you again: “But Rob, it’ll take forever for that physical media to spin and create that copy!” Actually, I think it’s quicker to do this than it is to download an installation image (depending of course on the available bandwidth). If I’m installing AIX on a new machine and I don’t already have a NIM server in the environment, what will I need? Install images burned to optical media. So now I have to download the image and then find some media to burn the images to, etc.? No thanks. I’ll just have IBM send me a set.

Now, if my environment is already built and I just want to stick the .iso image in my virtual media library and either migrate or install my client LPARs or install these filesets to my NIM server, downloading is fine, I guess. But I just like having the physical media on hand in case I need it. I don’t know, maybe I’m old school.

But please make your case in Comments. Get me on the downloading .iso bandwagon. Or tell me why ssh isn’t enabled with AIX, or why telnet and FTP are. Help me sort out my questions.

The Downside of Uptime

Edit: I know we all love continuous uptime, but the time to find out your machine will not boot is during a planned outage, not an unplanned outage.

Originally posted June 8, 2010 on AIXchange

A customer recently performed some scheduled maintenance on a critical server that had an uptime of nearly two years. The customer had created some great scripts that would bring down the application and then connect via ssh to the database server to bring down the database. The application start scripts worked the same way — they’d remotely connect to a database server and bring up the database during the application startup process.

After successfully completing the server maintenance, it was time to bring the application back up. The customer ran the application startup script, but the application didn’t appear to be working properly. After some phone calls to application and database support personnel, it was determined that someone had commented out a line in the startup script. The line that was commented out was the command that would ssh to the database server to start the database, and the application relied on the database in order to work properly.

I’ve said it before: When making changes to a machine, the changes must be tested. In this case, the timestamp on the changed file was nearly two years old. So the change was made, it was never tested, and it was forgotten about. It could have been a simple case of testing something else in the script that affected the startup process and not wanting the script to contact the database server, but once that testing was done, uncommenting the line was forgotten. Since the timestamp was so old, it wasn’t a smoking gun. It didn’t stand out during troubleshooting, so it took a while for someone to actually check the script and verify that it did what it was supposed to do. People assumed that a startup script with such an old timestamp hadn’t been changed, and that it would still work the way it had the last time it was used.

Although none of us like downtime, especially with resilient servers that “just run,” maintenance windows and application restarts are well worth doing. If we don’t regularly exercise our server shutdowns and startups, we may not uncover a script problem or some other issue until long after the change is made. But by scheduling reboots each month or each quarter, these changes will be more quickly detected and dealt with.

The same holds true with IBM PowerHA clusters. I always like to know that a failover is being regularly tested. The wrong time to find out that something doesn’t work is when the application actually needs to failover.

Having machines that can stay up for years is a tremendous thing. But there’s nothing like the peace of mind that comes from knowing your machines stop and start the way they’re supposed to.

A New VIO Backup Option

Edit: I updated Chris’ post to reflect that he has moved his archives to gibsonnet.net.

Originally posted June 1, 2010 on AIXchange

I use the VMLibrary almost constantly. Virtual media is faster than optical media, and I can mount virtual media to different client LPARs at the same time.

Typically when I load a new machine in a new environment that doesn’t already have a NIM server, I’ll boot my first VIO server from optical media loaded in the DVD drive. Then I’ll use the mkvopt command to copy the physical media I’ll need — usually including any relevant VIO and AIX CDs — and create .iso images in my virtual media repository.

After I boot from my AIX .iso image and build my first AIX client, I’ll build it as a NIM server. Then I’ll map my AIX .iso image to that LPAR and use the smitty bffcreate command to copy the AIX filesets. I’ll use my NIM server to load any other VIO servers in the environment; then I’ll use the NIM server to build the rest of my AIX client LPARs. I can load each system in minutes with NIM, whereas with optical media it takes much longer.

Many of you rely on virtual media as well. For instance, recently one of my customers was trying to back up his VIO servers, but because of his huge virtual media library, he was getting errors. When he called IBM Support, he was told to move the .iso images off of the machine and remove the virtual media library, and then back up the vio server — a recommendation that wasn’t practical given how much the customer relies upon the virtual media library.

Fortunately, I’d just come across information about an interesting new backupios option, nomedialib, that can exclude the VMLibrary. Chris Gibson e-mailed me about it, explaining that by using the /etc/exclude.rootvg feature of the mksysb command, the -nomedialib flag excludes the contents of the virtual media repository from the backup. When the -nomedialib flag is specified, the backupios command copies the original contents of the /etc/exclude.rootvg file to a repository, appends the /var/vio/VMLibrary string to the /etc/exclude.rootvg file, ensures that the -e flag is passed to the mksysb command and restores the original contents of the /etc/exclude.rootvg file.

Having this -nomedialib flag in your back pocket simplifies the process of backing up VIO servers, because you don’t have to move the .iso images off the machine first, and the resulting backup images are smaller.
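In practice it’s just one extra flag on the backup command (the target file here is an example NFS path):

backupios -file /mnt/viobackup/vios1.mksysb -mksysb -nomedialib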

Speaking of Chris, I recently saw a tweet linking to his blog post about shared Ethernet adapter statistics, which can be displayed per VIO client using the seastat command.

As Chris says: “This is a great way to monitor traffic/activity over a particular SEA. It can be very useful when determining if an SEA is currently being used — i.e., during troubleshooting network connectivity issues between client LPARs and an external network that is bridged via a SEA.”

He closes his post with an IBM link to seastat information.
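If you want to try it, my understanding from Chris’s post is that you enable accounting on the SEA first and then point seastat at it. Something like this as padmin, where ent5 stands in for your SEA device:

chdev -dev ent5 -attr accounting=enabled   # turn on per-client accounting for the SEA
seastat -d ent5                            # show statistics broken down by client MAC/IP
seastat -d ent5 -c                         # clear the statistics when you're done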

I’m continually building new servers for customers, and I’m always looking for new and better ways of doing things. If you have some tips about building a new machine from scratch, please post them in Comments.

Changing User ID Defaults, Using TurboCore Mode

Edit: The link to AIX Down Under no longer works. The link to the whitepaper no longer works.

Originally posted May 25, 2010 on AIXchange

A customer recently asked me about the default user ID length in AIX and how to change it. A quick search brought up the two-part answer.

1. To get the current value, run getconf LOGIN_NAME_MAX or lsattr -El sys0 -a max_logname.
2. To set the size limitation to a new (higher) value, run chdev -l sys0 -a max_logname=# (where # is the new maximum user name length).

Then just a few days later, I noticed a new AIX blog, Anthony English’s AIX Down Under, covering this very topic. Anthony expounds on this subject by explaining the benefits of longer usernames. Check it out.

As for me and this blog, as promised, I can now tell you a bit about working with the new Model 780.

I was looking forward to using the 780 because I wanted to try out TurboCore mode. For a primer on this new feature, I suggest the IBM whitepaper, “Performance Implications of POWER7 Model 780’s TurboCore Mode.”

From IBM: “The POWER7 Model 780 system offers an optional mode called TurboCore which allows the processor cores to execute at a higher frequency — about 7.25 percent higher — and to have more processor cache per core. Higher frequency and more cache often provide better performance. TurboCore is a special processing mode of these systems wherein only four cores per chip are activated. With only four active cores, ease of cooling allows the active cores to provide a frequency faster (~7.25 percent) than the nominal rate. Both the higher frequency and the greater amount of cache per core are techniques for providing better performance. It is not uncommon for a longer running, even multi-threaded workload accessing largely private data to see a performance benefit well in excess of what might be expected from the better frequency alone. Even more complex workloads residing in a partition scoped to the cores and memory of a given processor chip can see similar benefits.”

I’d read that you needed to go into ASMI to make the change to TurboCore mode on the 780, and it was an extremely simple option to change. So I logged into ASMI, clicked on Performance Setup, TurboCore setting, changed it to enabled, and saved the settings. It was almost anticlimactic for me, and the customer is happy with the performance.

In fact, I’ve now installed each of the new models — the 750, 770 and 780 — and my customers are all pleased with the performance of the machines. How about you? Post your impressions of the POWER7 gear in Comments.

Gauging the Benefits of AME

Edit: In the first paragraph I was able to download the audio presentation and the movie. I wonder how long that will last. The whitepaper links no longer work. The report is still available to download but I imagine the link will go away in the future. Instead of downloading the report, I included it at the end of this post.

Originally posted May 17, 2010 on AIXchange

The AIX Virtual User Group-Central Region USA put on another great webinar in April; this one covers Active Memory Expansion (AME). IBMer Nigel Griffiths provides a wealth of information that can help you get up to speed on the topic. The user group Web site has a complete webinar archive, including  Nigel’s audio presentation and related materials. You can also watch this movie at IBM developerWorks.

On the subject of AME, check out a couple of whitepapers. The first is the AME Overview and Usage guide. From IBM:

“IBM’s POWER7 systems with AIX feature Active Memory Expansion, a new technology for expanding a system’s effective memory capacity. Active Memory Expansion employs memory compression technology to transparently compress in-memory data, allowing more data to be placed into memory and thus expanding the memory capacity of POWER7 systems. Utilizing Active Memory Expansion can improve system utilization and increase a system’s throughput. This paper provides an overview of POWER7’s Active Memory Expansion technology, as well as guidance on how to deploy and monitor workloads with Active Memory Expansion.

“Active Memory Expansion increases a system’s effective memory capacity. The additional memory capacity made available by Active Memory Expansion enables a system to do more work, leading to an increase in the system’s throughput and utilization. Thus, the value of Active Memory Expansion is that it enables a system to do more work by increasing a system’s effective memory capacity.”

This whitepaper goes into different scenarios that you might consider when thinking about deploying AME, including expanding consolidation by fitting more LPARs onto your frame and increasing LPAR throughput by increasing the effective memory size of a single LPAR.

The second whitepaper is entitled “AME Performance.” Again, from IBM:

“This document introduces the basic concepts of Active Memory Expansion, showing the principles of operation and performance characteristics of this new component of AIX. Active Memory Expansion is available on POWER7 platforms starting with AIX 6.1 TL04 SP2. All computers have a limited amount of Random Access Memory (RAM) in which to run programs. Therefore, one of the perennial design issues for all computer systems is how to make the best use of the entire RAM which is physically available in the system, in order to execute as many programs concurrently as possible, in the limited space available. Active Memory Expansion, a POWER7 feature, supplies a new technique for making better use of RAM: Portions of programs which are infrequently used are compressed into a smaller space in RAM. This, in turn, expands the amount of RAM available for the same or other programs. Among the benefits of Active Memory Expansion, this paper shows the following scenarios and their performance results:

“1. Reducing the physical memory requirement of an LPAR resulting in 111 percent memory expansion.
2. Increasing the effective memory capacity and throughput of a memory constrained LPAR resulting in a 65 percent increase in application throughput.
3. Enabling consolidation of more LPARs onto a system resulting in a 60 percent increase in overall system throughput.”

If you have AIX 6.1 TL04 SP2 on POWER4, POWER5, POWER6 or POWER7 hardware, running the amepat command can give you an idea of AME’s potential benefits. The idea is to run the command while your system is busy and memory is in use.

When I ran amepat on a fairly idle test machine, I received this report: (Download AIXchange 5.18.10 report)

Take the time to investigate whether your computing environment can benefit from AME.

—— Below this was originally a download file ——

#amepat 1

Command Invoked                : amepat 1
Date/Time of invocation        : Thu Apr 29 11:21:23 CDT 2010
Total Monitored time           : 1 mins 6 secs
Total Samples Collected        : 1

System Configuration:
---------------------
Partition Name                 : testlpar
Processor Implementation Mode  : POWER7
Number Of Logical CPUs         : 8
Processor Entitled Capacity    : 0.20
Processor Max. Capacity        : 2.00
True Memory                    : 3.00 GB
SMT Threads                    : 4
Shared Processor Mode          : Enabled-Uncapped
Active Memory Sharing          : Disabled
Active Memory Expansion        : Disabled

System Resource Statistics:               Current
---------------------------          ----------------
CPU Util (Phys. Processors)               0.03 [  2%]
Virtual Memory Size (MB)                  1075 [ 35%]
True Memory In-Use (MB)                   3058 [100%]
Pinned Memory (MB)                         567 [ 18%]
File Cache Size (MB)                      1963 [ 64%]
Available Memory (MB)                     1883 [ 61%]

Active Memory Expansion Modeled Statistics:
-------------------------------------------
Modeled Expanded Memory Size   :   3.00 GB
Average Compression Ratio      :   2.63

Expansion    Modeled True      Modeled              CPU Usage
Factor       Memory Size       Memory Gain          Estimate
---------    -------------     ------------------   -----------
     1.00          3.00 GB         0.00 KB [  0%]   0.00 [  0%]
     1.09          2.75 GB       256.00 MB [  9%]   0.00 [  0%]
     1.20          2.50 GB       512.00 MB [ 20%]   0.00 [  0%]
     1.33          2.25 GB       768.00 MB [ 33%]   0.00 [  0%]
     1.50          2.00 GB         1.00 GB [ 50%]   0.00 [  0%]
     1.71          1.75 GB         1.25 GB [ 71%]   0.00 [  0%]

Active Memory Expansion Recommendation:
---------------------------------------
The recommended AME configuration for this workload is to configure the LPAR with a memory size of 1.75 GB and to configure a memory expansion factor of 1.71. This will result in a memory gain of 71%. With this configuration, the estimated CPU usage due to AME is approximately 0.00 physical processors, and the estimated overall peak CPU resource required for the LPAR is 0.03 physical processors.

NOTE: amepat's recommendations are based on the workload's utilization level during the monitored period. If there is a change in the workload's utilization level or a change in workload itself, amepat should be run again. The modeled Active Memory Expansion CPU usage reported by amepat is just an estimate. The actual CPU usage used for Active Memory Expansion may be lower or higher depending on the workload.

Finding the Good in GUI

Edit: The links still work. I do not think I know anyone still running IBM Systems Director.

Originally posted May 11, 2010 on AIXchange

You may be able to teach an old dog new tricks, but getting him to remember them is another matter. More than a year ago I wrote about an IBM Redbook and passed on some tips.

Among other things, I said that by connecting to https://hostname:5336/ibm/console on your AIX 6 machines, you’ll get the IBM Systems Director Console for AIX.

Unfortunately, this old dog continually forgets to do this. Usually I just ssh into a machine and do everything on the command line when I am managing systems.

Recently I was talking with a customer who had just returned from AIX training. He wanted to know how to get the director login. Since we hadn’t installed IBM Systems Director in the environment, I wasn’t sure what he meant.

When he retrieved his class notes, it jarred my memory some. Then he logged in and showed me how he learned to use the GUI to add users, configure his network, manage devices, etc. He also showed me that you could select a system health button and view metrics like system, network and paging space configs, and then scroll down for metrics like CPU utilization and physical and virtual memory. Other listings displayed real-time updates to top processes and filesystem utilization.

Much of it looked like a Web-based front end to smitty, along with other command line tools that I use day in and day out. Before scoffing and telling him not to use a dumbed down GUI, I had to remind myself that this tool was a good thing, especially for a new AIX administrator. Reducing the learning curve is a good idea. What’s the point of making powerful systems if people can’t use and manage them?

So, if you’re running AIX 6, run lssrc -a | grep pconsole and see if you have it enabled. And if you do, log in and take a look around.

You can set things up so that users are restricted from accessing the console. From the welcome page when you first log in:

“Use the Console User Authority tool to add new or existing AIX users to the IBM Systems Director Console for AIX and grant them permission to perform tasks using the web console.

“Current AIX users will need to be added using the Console User Authority tool and assigned tasks before they will be able to use the Console. These users will rely on their AIX user account for user-logon security.

“Administrators can use the Console User Authority application to add new users to AIX and the Console and grant them authorizations in one step. If you add a new AIX user using the Console User Authority tool you will still be required to assign that user a password. That user will have to change the password using a command line interface before they can logon to the console for the first time.”

Troubleshooting tips are available here. Here’s an example:

“Issue: When I try to connect to the console URL, I get an Unable to Connect or the page cannot be loaded message.

“Check whether the console subsystem is active by running lssrc -s pconsole. If it is not active, try to start it by running startsrc -s pconsole.

“If it does not start and you get a message such as: The pconsole Subsystem could not be started. The Subsystem’s user id could not be established. Please check the Subsystem’s user id and try again, check that the pconsole user account has not been removed and that the UID matches the UID that owns the files under /pconsole. If the user account is missing, reinstall sysmgt.pconsole.rte so that the account is recreated with the required attributes. If the user account exists, but the UID is incorrect, remove the user account and reinstall sysmgt.pconsole.rte. …

“If the task you are trying to execute fails:

“Many of the tasks in the OS Management category are based on SMIT. If a task fails in the console, try the task in SMIT or smitty to see if it also fails there. Log in to the system on a local terminal or via telnet and try the task in SMIT/smitty. If it also fails in SMIT/smitty, then the problem may be in SMIT or in the commands or scripts executed by SMIT. Use the Show Command function (F6) to show the command and if possible try to perform the task using the command line to see if the failure is caused by the command or by SMIT/smitty.”

Don’t make the same mistake that I made and ignore something just because it’s a Web-based front end to your system. You — or your newer administrators — just might find uses for this function.

Getting Hands On With POWER7

Edit: The wiki link no longer works.

Originally posted May 4, 2010 on AIXchange

If you haven’t had a chance yet to work on POWER7 hardware, I thought I’d pass along something I’ve learned from my introduction to the 750 and 770 models: When selecting the DVD device from the HMC, the controller that it’s connected to is identified as a RAID controller. This surprised me the first time that I saw it, so I wanted to let you know.

Some other observations about working with these new boxes:

* On the 770, it was nice to be able to “float” the DVD between partitions without worrying about which set of disks it was connected to. On this system, the DVD is connected to its own controller and doesn’t share that controller with disk drives. On the 750, however, the DVD stays with the internal disks and controller, as is the case with older gear.

* In the case of the split backplane on the 750, when you select the RAID adapter, you’ll get the DVD and the first set of four disks, while on the 770, you just get the DVD. Again, I was working with a split backplane, so I selected the second SAS controller and set of disks by selecting the PCI SAS adapter in the HMC, as is done with POWER6 gear.

* On the 770, I was able to select the SAS adapters as you’d expect, but I was able to select the RAID controller for my DVD independent of the disk controllers. My PCI Ethernet cards appeared as PCI-to-PCI bridge devices, while my Ethernet and fibre adapters that were in an expansion drawer showed up as expected.

* I also found that the HMC code that I was running (V7R7.1.0.1) had a slightly different layout on the bottom of the screen. I know of others who’ve seen this same behavior, but I’d appreciate more input. So please let me know what you’re seeing when you install the new code. Previously when I’ve made a selection in the HMC, I’d see one column at the bottom left of the screen in the task pad. The new HMC code defaulted to three columns, so that took a little getting used to. You can, though, customize the display so it appears in the familiar one-column format. Customizing the number of columns was actually possible with the older HMC code, but most of the customers I deal with just used the default one-column setting.

As I continue to load more of these systems (coming soon: the 780), I’ll let you know what else I find. Again, I’d like to hear from you, so please share your own experiences with POWER7 systems by posting in Comments.

On an unrelated note, I was recently on a call where the topic was in-depth information about the new POWER7 based blades: the PS700, PS701 and PS702. The presenters mentioned that a new wiki site was going live with POWER Blade links. Check it out and let me know if other links should be included.

Managing Servers a Unique Challenge for Small Shops

Edit: The first two links do not work. The Netview link no longer works. The first ganglia link no longer works. I still called it an AS/400.

Originally posted April 27, 2010 on AIXchange

It seems like every customer I talk to has a different method for managing their servers.

For a large data center, the challenges are apparent. There are hundreds, even thousands of servers. Some are standalone servers, some have virtual I/O servers with many client LPARs. As the number of servers grows, getting a handle on these environments can be difficult.

However, smaller shops face their own issues. Many manage their servers without the benefit of standard management tools. If you’re in this situation, you should be aware of some of the available options. For starters, built-in tools like syslog and errpt can alert us when problems occur.

We can also roll our own scripts and parse our own logs and manage our own machines without any help from anyone outside of our organization — assuming, of course, that we have the time to work on our scripts.
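Just to illustrate what “rolling your own” can look like, here is a hypothetical ksh snippet that mails out anything errpt has logged since the last run; the address and paths are made up:

#!/usr/bin/ksh
LAST=/var/adm/errpt.lastrun
NOW=$(date +%m%d%H%M%y)                # errpt -s expects a mmddHHMMyy timestamp
if [ -f "$LAST" ]; then
    NEW=$(errpt -s "$(cat $LAST)")
    [ -n "$NEW" ] && echo "$NEW" | mail -s "errpt on $(hostname)" admins@example.com
fi
echo "$NOW" > "$LAST"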

However, many organizations lack the time and/or skills to create their own tools and want to be able to purchase software to help them with this task. Certainly IBM Systems Director comes to mind, but recently I was asked about other monitoring tools.

As I’m focused heavily on AIX running on POWER servers, my responses were confined to that platform. I immediately thought of Tivoli software, as I had administered Tivoli NetView once upon a time.

According to the Web site, “this system monitoring software (can) manage operating systems, databases and servers in distributed and host environments. It provides a common, flexible and easy-to-use browser interface and customizable workspaces to facilitate system monitoring. It detects and recovers potential problems in essential system resources automatically. It offers lightweight and scalable architecture, with support for IBM AIX, Solaris, Windows, Linux and IBM System z monitoring software.”

I’ve also read about software called Ganglia. I’ve even seen it in action. Though its creators tout it as “a scalable distributed monitoring system for high-performance computing systems such as clusters and grids,” it’s capable of monitoring performance across POWER machines.

Beyond that though, I drew a blank. What other toolsets are out there? What are we relying upon to manage and monitor our systems?

Back when I worked on the AS/400 system, I loved the Robot/Alert and Robot/Console products.

Hopefully I’m not misremembering, but I seem to recall being able to automate the answering of console messages and redirect operator messages to an alphanumeric pager. Back in the early ’90s, this was a handy way to have my machine page me and tell me what was wrong. With a quick glance at my pager, I knew whether the issue required an immediate response or if it could wait a bit. I’m sure the current iteration of the product offers many more powerful features of which I am not aware.

What do you think? What’s the dominant monitoring software package for Power Systems? What are you using? Send me an e-mail or leave a comment. While you’re at it, are there tools you tried and didn’t like? Or, if you had a wish list, what features would you like to see included in monitoring software?

Living in the Future

Edit: I am living even further in the future now. The 2nd ad no longer works.

Originally posted April 20, 2010 on AIXchange

I’ve appropriated the Wil Wheaton line more than once, but once again I’m reminded that we really do live in the future.

Those in the U.S. may recall the 90s-era AT&T television ads (here and here) that talked about what we’d be doing in the future.

Seeing the ads again recently, it struck me that many of their predictions essentially came true. Of course, all the technologies we take for granted today — wireless communication, open road tolls, video conferencing, video on demand, GPS, etc. — were already being planned back then.

On the subject of video conferencing, last month the Wisconsin Midrange Computer Professional Association held its spring technical conference.

At the same time, the OMNI user group hosted its March dinner meeting.

While a panel of industry experts (Aaron Bartell, Alison Butterill, Susan Gantner, Scott Klement, Jon Paris, Mike Pavlak and Trevor Perry) attended the WMCPA conference, the OMNI group arranged to have their round-table discussion broadcast into its meeting. And OMNI attendees could submit questions to the panel at the WMCPA using IM or SMS.

The video link consisted of a laptop on each end, a video camera and Skype. What amazed me was everyone’s ho-hum attitude. Just think about it: We walked into a private room at a restaurant that had enough network bandwidth on its wireless network to handle an audio and video link between two sites in two different states. Using just a video projector and a laptop with an external speaker, we had a very strong and relatively low-cost connection (especially considering that the laptops weren’t even specifically acquired for this event). As I watched how easily everything was set up and taken down, I kept thinking how lucky we are to live when we do.

The content itself was certainly thought-provoking (and I figure I’ll write more about that eventually), but the technology used to facilitate the discussion just astounds me. Not because it’s new, but because not long ago we were only dreaming of this stuff.

POWER7 is just out, yet you can be certain that IBM is already well into developing the next generation of processors, along with the next version of AIX. As far and as fast as we’ve come, things keep moving. Sometimes you just have to stand back and appreciate everything that’s happened — and look forward to everything that’s ahead.

A Look at Today’s POWER7, AIX Announcements

Edit: This was when we first knew about AIX 7.

Originally posted April 12, 2010 on AIXchange

Today IBM is making more announcements around AIX and POWER7. I’ll go through a few highlights here, and, I’m sure, cover these topics in greater depth as time goes on. (Note: Some of the information that follows is copied from materials that I received from IBM.)

POWER Blades
There will be three new POWER blades with three new model numbers (the PS700, PS701 and PS702). I’ll get to their capabilities in a moment, but I want to first note that these new model numbers, across the servers and the blades, make it much easier for me to keep things straight in my head. For instance, talking to people about JS blades could be confusing. Is it a JS20? A JS21, JS22 or JS43? Which is POWER5 and which is POWER6? However, with the naming of these new PS blades, it’s easy to recognize them as POWER7 — P stands for POWER, and the 70X numbering indicates POWER7. (By the same token, a 570 server was a nebulous term, since a POWER5 570 and a POWER6 570 aren’t the same thing. Now though, when a 770 is mentioned, we’re obviously talking about POWER7.)

Anyway, some specs: The PS700 blade is a POWER7 4-core (one socket with four cores per blade) with 4GB to 64GB DDR3 memory, 0-2 SAS disks and a single wide blade form factor.

The PS701 is a POWER7 8-core (one socket with eight cores per blade) with 4GB to 128GB DDR3 memory, 0-1 SAS drives and a single wide blade form factor.

The PS702 is a POWER7 16-core (one socket x eight cores per blade) with 4GB to 256GB DDR3 memory, 0-2 SAS disks and a double wide blade form factor. Think of this as two PS701s connected together to provide more cores, more available memory and additional disk drives.

AIX Info
AIX 7 — The next AIX version will be binary compatible with AIX 6 and AIX 5. That’s good news for customers running the older versions of the operating system. (These customers will also be interested in the pending withdrawal of AIX 5.3 support, which I’ll detail in a bit.) AIX 7 will provide vertical scalability of up to 1,024 threads and 256 cores in a single partition. That’ll be a fun day when I have the opportunity to build and run those 256-core LPARs.

AIX Profile Manager — Included with AIX 7, AIX Profile Manager is designed to make it easier to create, update and verify AIX configuration properties across multiple systems. Think of this as a follow-on to the AIX Runtime Expert; it will be an IBM Systems Director plug-in.

AIX Profile Manager is designed to ease the task of managing pools of AIX systems. For instance, imagine you have a pool of 40 WebSphere servers. You tune one and you want to propagate those settings to all of the other servers. AIX Profile Manager will allow you to connect to the “source” via IBM Systems Director. Then you can collect the information into an XML file and apply the profile to the other 39 servers.

Withdrawal of AIX 5.3 support — IBM plans to withdraw marketing for AIX 5.3 in April 2011. For those of you who are still on AIX 5.3, now is the time to start thinking about migrating to a more current version of AIX. This advance notice should give everyone ample time to plan upgrades and migrations.

License metric tool — Also soon to come is an IBM license metric tool that can help AIX users simplify license tracking and audit reporting. The metric tool will allow you to periodically collect information about the software you’re running and make it easier to determine how many licenses you’re using. It runs internally, so the collected information won’t be reported back to IBM. Think of it as a solution for self-auditing your environment. This tool is already available for products like DB2 and WebSphere; now it will support AIX as software to be managed.

Cluster-aware AIX is designed to help you easily create clusters of AIX instances for scale-out computing or high availability. It will include built-in event management and monitoring capabilities, and will also have features such as common device naming to help simplify administration. IBM considers this a foundation for future AIX capabilities and the next generation of PowerHA SystemMirror.

AIX 5.2 WPAR is intended to help minimize the effort needed to consolidate old environments on new, more efficient hardware. For shops that still run legacy hardware and AIX 5.2, this WPAR capability will allow you to stay on AIX V5.2 while moving up to POWER7 and retiring the old hardware. All you’ll need to do is back up an existing AIX 5.2 environment using mksysb and restore it inside of an AIX 7 WPAR.

Personally, I can’t wait to test this. I know of several client environments that stay on older hardware due to application dependencies around 5.2. These folks can really benefit from consolidating old workloads onto new hardware.

AIX Express Edition — A new Express Edition of AIX will be priced for smaller workloads. AIX 6 and 7 will have all three editions — Express, Standard and Enterprise edition — while AIX 5.3 will only have Standard edition. Express Edition is intended for two deployment situations:

* When you’re running AIX on entry level servers and blades.
* When you’re consolidating smaller workloads on enterprise servers.

This offering is limited to a 4-core maximum partition size with an 8GB memory per core maximum. There will be flexibility to optimize for multiple workloads as any combination of AIX Editions can run on a single server.

Stay Tuned
As IBM did with AIX 6, we can look forward to an open beta for AIX 7 in the next few months. This will give us all a chance to test out the new features and get our environments ready to migrate to the new operating system.

Spreading the Word of AIX

Edit: How many people do you run into that do not know AIX details? The market share link no longer works.

Originally posted April 6, 2010 on AIXchange

Recently I was on the phone with a customer who had an issue with the memory utilization on one of his LPARs in his blade server. His application needed more memory. He’s a UNIX guy, but not an AIX guy. His company had purchased an AIX system and had no education budget, so he never learned the nuances of AIX. Happens all the time.

As a result, he didn’t know about dynamic logical partitioning and how it would allow him to easily move memory from one LPAR to another without having to take anything offline.

While talking to him, I logged into his network and reduced the memory being used by one of his test LPARs and moved it to the production LPAR. He was happily surprised that this was resolved so quickly, and that it didn’t require an outage.

Of course, none of this is especially new or interesting to AIX administrators. We already know that we can dynamically allocate resources to the different LPARs on our frames. We know we can mix dedicated and virtual adapters in our client partitions if we need to. We know we can micropartition and fractionalize our CPUs. We know all about processor and memory pools and a built-in logical volume manager that allows us to grow and shrink the size of a filesystem on the fly while it’s mounted and running.

And, because we do this stuff daily, we take it for granted.

But take a moment to consider how you came by this understanding and knowledge. How much of it came from formal classroom education? How much of it came on the job, learning from colleagues?

The world knows UNIX. People run HP-UX and Solaris and different Linux distributions, and they think they understand everything that can be done with UNIX. All the while, they jump through hoops replacing disks and patching systems. They don’t know any better. We need to tell them. We need to spread the word.

What makes AIX different? What makes it better? Is it the capability to take a mksysb and clone it to another system? Is it the patch management? Is it the help that IBM support provides?

Admittedly — and proudly — I’m an AIX bigot, and I’m always heartened to see charts like this one that show AIX’s growing market share compared to other UNIX flavors.

Still, I look at the processors IBM sells and the capabilities built into its systems, and I keep thinking that the rest of the world should already know what we know. I have to remind myself that people are busy, and that keeping tabs on the state of AIX and Power servers isn’t everyone’s priority. Then I realize that we, as loyal AIX users, need to make this our priority. We need to show off the unique things people do with AIX systems on a daily basis.

We must spread the word.

An Ethernet Configuration Tip

Edit: I love Dean’s scripts.

Originally posted March 30, 2010 on AIXchange

I just saw this on a mailing list: It’s a tip about (default) Ethernet configs written by Dean Rowswell. This is great information, so, with Dean’s permission, I’m passing it along:

The default value for *physical* Ethernet adapters is to enable largesend. This is *not* the default for Shared Ethernet adapters and Virtual Ethernet adapters.

Enabling largesend can make a BIG difference for some applications.

Example:

largesend Disabled: FTP between two LPARs using Virtual Ethernet and the performance was 194 MB/sec.
largesend Enabled: Same test … 725 MB/sec.

For SEA, the largesend=1 option can be added when it’s created via the mkvdev command.
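
For example, something along these lines works on the VIO server; the adapter names here are placeholders for your own physical, virtual and SEA devices, so treat this as a sketch rather than a cut-and-paste recipe:

# Create the SEA with largesend enabled from the start
mkvdev -sea ent0 -vadapter ent2 -default ent2 -defaultid 1 -attr largesend=1

# Or turn it on for an SEA that already exists (ent5 here), from the padmin shell
chdev -dev ent5 -attr largesend=1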

For Virtual Ethernet Adapters on VIO client LPARs, there is no ODM stanza for this value, so it must be set at reboot time. I wrote an “ena_lg_send_virt_eth.tar” script for making this change on reboot:

#!/bin/ksh
# Created by Dean Rowswell, IBM, March 22, 2010
# This script will enable the largesend tunable for the virtual ethernet interface in order
# to dramatically improve performance of this adapter
#
# This script should run just before the rctcpip entry in the inittab

LOG=/var/adm/ena_lg_send_virt_eth.log

echo "\n=================================================================================" >>${LOG}
echo "Running ena_lg_send_virt_eth on `hostname` at `date`" | tee -a ${LOG}
echo "=================================================================================" >>${LOG}

# Get all of the ethernet interfaces on this system
for INTERFACE in `netstat -in | grep ^en | grep -v link | awk '{print $1}' | sort | uniq`
do
  # Convert the interface name to the adapter device name (en0 -> ent0)
  DEVICE=`echo ${INTERFACE} | sed 's/en/ent/g'`

  # Check all Virtual Adapters for this device
  lsdev -Cc adapter -Sa -s vdevice -F name | grep -w ${DEVICE} >/dev/null 2>/dev/null

  # If a match is found then enable largesend for this interface
  if [ $? -eq 0 ]
  then
    echo "\nCurrent interface config for ${INTERFACE} is:" >>${LOG}
    ifconfig ${INTERFACE} >>${LOG}
    echo "\nENABLE LARGESEND FOR VIRTUAL ETHERNET INTERFACE -> ${INTERFACE} at `date`" >>${LOG}
    ifconfig ${INTERFACE} largesend >>${LOG} 2>>${LOG}
    echo "\nNew interface config for ${INTERFACE} is:" >>${LOG}
    ifconfig ${INTERFACE} >>${LOG}
  fi
done

I placed this script into a /usr/local/scripts directory on the AIX systems. This entry should run just before rctcpip in the inittab.

For AIX 6.1 systems I ran this command:

# mkitab -i platform_agent "lg_send:2:wait:/usr/local/scripts/ena_lg_send_virt_eth # Enable largesend for virt enet adapters"

For AIX 5.3 systems I ran this command:

# mkitab -i srcmstr "lg_send:2:wait:/usr/local/scripts/ena_lg_send_virt_eth # Enable largesend for virt enet adapters"

Two Tales of Customer Service

Edit: Customer service is still a thing. I changed the blizzard link to a working one.

Originally posted March 23, 2010 on AIXchange

Providing excellent customer service is one way to stand out from your competition. The old line about it being far easier to maintain a relationship with an existing customer than it is to start one with a new customer is certainly true.

Recently though, I had an experience that takes this idea in another direction. I ordered a new computer system for my home. My previous interaction with this company was positive. A few years ago this outfit delivered my new system on time, and it’s worked flawlessly. They stayed in contact, through mailings and e-mails, over the years. So when it was time for a new machine, I naturally thought of them first.

I placed my order, and waited for my delivery. And waited. And waited some more. Eventually I got an e-mail saying that the ship date had slipped by several weeks. No kidding. Unfortunately, in this case I was counting on the system to arrive by a certain date because I’d already promised my older system to someone else. They didn’t want to wait either.

So I decided that the new ship date wouldn’t work, and I canceled the order. Or I tried to. When I called the toll-free line, I was put on hold five times. And each time I reached a new agent, that person needed to reestablish my name, address, order number and other information. One agent erroneously told me that it would be impossible to cancel the order this far into the process.

At no time did anyone attempt to solve the problem by locating a similar replacement system that might work instead. No one I talked to even sounded the least bit sorry that the delivery wouldn’t go as promised. Each agent just seemed to be in a hurry to get me off the phone and move onto the next call. Guess which company isn’t getting my business any more?

I’m tempted to say that this is fairly typical customer service these days. However, I’ve also had a very positive experience lately.

In early February, when the east coast was being deluged with snow, I had to fly out of Baltimore. I had an early-evening flight, but, in anticipation of the storm, the airport had already decided a day ahead of time that it was going to shut down early that morning. My only chance was to leave the customer site a day earlier than planned so I could figure out how to get out of town.

Over lunch I returned to my hotel to check out. Even though I was a bit past the official checkout time, I wasn’t charged for an extra night. Moreover, the desk clerk got me a room at a hotel closer to the airport for that evening, taking the time to relay all of my information to the new property. In addition, the airline switched my departure time without charging me extra. And of course, I’m grateful to my customer for understanding my dilemma and letting me leave early in the first place. I made a flight early the next morning, and got out ahead of the storm.

So, do you think I’ll use this hotel chain again, and/or recommend them to others? Do you think I’ll speak highly of the airline? And it’s all due to service people who took the initiative and spent a few minutes to help me.

We should all treat our customers — be they actual customers, co-workers, end users, what have you — as well as I was treated that day. We should all take an extra moment to exceed their expectations and make their day.

If not, you might drive someone into the arms of your competition.

My Trip to the Techdocs Library

Edit: These are still being updated and added to, some of the specific ones I linked to are no longer there.

Originally posted March 15, 2010 on AIXchange

On a recent visit to the IBM Techdocs library I saw some documents that might interest you. I sorted the view by date, but you can also search by topics, authors, etc. Take a look around and see what I might have missed.

Ron Barker covers the capability to view users and tasks from the command line, a capability available with HMC V7R3.4. Read the whole thing to learn more about the lslogon and termtask commands.

Ravi Singh has this concise 39-page reference document on AIX, VIOS and HMC features and functionality.

It answers questions like: When will my operating system be withdrawn from support? What’s the latest technology level for my operating system? Will my operating system work with my level of hardware? Will a 64-bit kernel run with this operating system? Can I run a VIO server with my hardware? It also covers hardware support, virtualization, LPARs, I/O, memory, system management, networking, security, LVM, filesystems, licensed software and more. Some of the information goes as far back as AIX 4.1.

Shawn Bodily has a nice PowerHA cross reference document that shows supported combinations between PowerHA, AIX, VIOS and NPIV. He also produced a document that shows version compatibility for supported versions of HACMP and AIX.

Here’s one more: Katharina Denkinger covers hardening AIX in an SAP environment using AIX Security Expert.

Obviously many other documents, covering other operating systems and technologies, are available. Take the time to look around, and be sure to share what you find in Comments.

Energy Efficiency Another POWER7 Benefit

Edit: The link to the press release still works. Green is an even bigger deal now.

Originally posted March 9, 2010 on AIXchange

More and more companies are becoming sensitive to their energy costs. Though the idea of going green inspires some jokes (if the walls in my datacenter are painted green, does that make me part of the green computing movement?), the reality is that energy costs are high and getting higher.

Rising cost isn’t even the whole story. I’ve seen datacenters that were running at maximum capacity — they literally couldn’t run any more circuits. Obviously consolidation should be a serious consideration in these cases, but even bigger steps are being taken.

A customer recently told me over lunch that some of his company’s largest expenditures are energy costs. In response, they’re planning to get off of the grid completely. The goal is to use solar and wind power, and to generate energy internally by burning biomass. I’ve heard other companies are looking to geothermal for heating and cooling.

An interesting and, perhaps, overlooked aspect of IBM’s recent POWER7 announcement is the energy savings potential of these systems.

From IBM:

“The new systems can deliver four times the performance and four times the virtualization capability for the same price, and are three to four times more energy efficient…

“IBM Power 750 Express, an Energy Star qualified business server for mid-market clients [offers] four times the processing capacity of its predecessor, the IBM Power 550 Express…

“POWER7 technology features ‘Intelligent Threads’ that can dynamically vary based on workload demand. With more threads, POWER7 can deliver more total capacity as more tasks are accomplished in parallel, such as monitoring the energy usage of millions of households by the minute in a smart grid. With fewer threads, those workloads that need very fast individual processing — such as real-time analytics or database transactions — can get the performance they need for maximum benefit. Intelligent Threads work on all POWER7 processors…

“IBM’s POWER7 systems are designed to make dramatically better use of energy. Unique Intelligent Energy technology allows customers to power on and off various parts of the system or to dynamically increase or decrease processor clock speeds based on thermal conditions and system utilization, on a single server or across a pool of multiple servers. POWER7 energy management technologies are integrated from its processor, to firmware, PowerVM virtualization, operating system support, and up to IBM Active Energy Manager software, included in the new IBM Systems Director Standard and
Enterprise Editions. As a result, the system dynamically balances between energy usage and performance and systems utilization based on policy. The result is improved performance per watt — more than two-times better than similar Intel x86-based systems, four times better than Sun SPARC servers and eight times better than similar HP Itanium-based servers.”

As we dig into the details of the POWER7 announcement and get excited about the performance gains we can expect, don’t overlook the greater energy efficiency. It’s one of the biggest benefits we’ll see as we upgrade our hardware.

My Pet Peeves, Twitter Edition

Edit: The things I used to complain about.

Originally posted March 2, 2010 on AIXchange

I’m not looking for work, but I know several people who are, so I like to keep an eye on AIX opportunities and pass along relevant leads to the job seekers I know. One thing that I’ve noticed with AIX job notices that are posted on Twitter: The posts seldom mention a crucial little detail, that being the location of these jobs.

For instance, I’ll see something like:

“AIX Admin wanted, check out this Web site.”

But there’s no mention of where the job is located. You’re forced to visit the Web site — and then, oftentimes, to register on the Web site — to obtain that information.

It shouldn’t be that complicated. Tell us whether you’re looking for a full-time employee or a contractor. Tell us the requirements. Give us the contact information.

And definitely tell us where this job is. Unless an employer is willing to pay significant relocation expenses, an employment opportunity in the U.K. probably won’t interest many job seekers in the U.S., and vice versa. So why not clearly state, “U.K. AIX admin wanted,” so people can easily see if the job is realistically worth pursuing?

Obviously plenty of folks are looking for work these days. If you’re a recruiter, don’t you want to make it as easy as possible for job seekers in your area to reach you?

As I’ve mentioned, I like Twitter and I think it can be a valuable business tool. But here’s another thing I see on Twitter that bugs me: Someone will tweet something, and then I’ll get re-tweets from 20 other people saying the same thing. Maybe it’s on me for not knowing how to filter redundant information, but I know it’s a waste of time to see identical tweets, sometimes over a period of days.

Here’s an example of what I’m talking about. Recently, the aix-l mailing list had a discussion about converting a filesystem from JFS to JFS2.

The discussion concluded with this answer:

“Just found out this can be done on AIX 6.1 with the -T option from alt_disk_copy. So IBM must be supporting it on AIX 6.1 now.”

Later that day, and the next day, I was inundated with tweets about this very same thing. At least some helpfully included pointers to related information on publib.

Certainly it’s good information. I guess I just need a way to better filter it without having to see it 30 times. Seriously, if there’s a way to filter comments in Twitter, please leave a message in the Comments section. I need to know.

IBM Power Systems vs. the Competition

Edit: The first link no longer works. The YouTube video no longer works. The link to the insight tool no longer works. The facts and features link takes you to POWER8 hardware. The link to the replay no longer works.

Originally posted February 23, 2010 on AIXchange

Check out this comparison of IBM Power systems to systems from Sun, Intel and HP.

According to IBM, more than 2,100 users of competitive hardware have eliminated server farms by migrating to Power Systems over the past three years. The company adds:

“IBM is the only major vendor to gain revenue share in the UNIX segment for the past five years (+11.2 points) while both Sun (-1.9 points) and HP (-5.7 points) lost share, according to IDC.

“IBM has helped nearly 1,200 customers migrate from competitive Sun, HP and other UNIX platforms to IBM’s AIX or Linux on Power.”

It also refutes claims from other vendors, in particular Oracle’s announcement that Oracle and Sun SPARC Solaris achieved World Record TPC-C Performance beating IBM’s best results on DB2 with Power 595 Server.

From IBM:

“This is not a straightforward ‘apples to apples’ comparison: The IBM system is a single server while the Sun system is a configuration with a 12-node cluster. The Sun cluster had a total of 384 processor cores and 3,072 threads compared to 64 cores and 128 threads in the IBM system. With 512GB of memory per node the Sun cluster had a total of 6TB of memory, compared to 4TB in the IBM Power 595 system. While the Sun tpmC per core is 20,097, the IBM tpmC per core is 95,080. The IBM result had 4.7 times higher performance. The Sun system was not available until December 2009. The IBM system had been available since December 2008.

“In late August and early September (2009), Oracle ran an advertisement in The Wall Street Journal and The Economist making unsubstantiated superior performance claims about an Oracle/Sun configuration relative to an official TPC-C result from IBM. Oracle has been fined for making claims they could not prove.

“The Transaction Processing Council (TPC) — a non-profit corporation founded to define transaction processing and database benchmarks and to disseminate objective, verifiable TPC performance data to the industry — has determined that Oracle’s advertisement violated TPC policies by not adhering to basic standards of fidelity, candor and due diligence. Oracle, a TPC member, was fined for the violation, required to remove the advertisement from websites and report back to the TPC on their steps for corrective action and future compliance.”

Read the TPC press release and download TPC’s letter to Oracle.

Also, check out this YouTube video, “Sun, HP and x86 users are moving to Power.” It shows how customers can use the IBM migration factory to move to Power systems. The video features a company that migrated more than 20 Sun servers to two Power 570 boxes, saving more than $500,000 annually in software costs. I imagine their electric bill dropped a bit as well.

Relatedly, Oracle and SAP customers can use the IBM insight tool to help determine how they can consolidate their current server farms to IBM Power servers.

All the information linked in this post, incidentally, was published prior to the POWER7 announcement. Of course the new systems make IBM’s consolidation message even more compelling. For example, I want to highlight the rPerf numbers comparing POWER6 and POWER7 systems.

Compare a 12-core 5 GHz POWER6 (rPerf of 111.30) server to a 12-core 3.5 GHz POWER7 (rPerf of 140.75) server. Or compare the respective high-end boxes, a 32-core 4.2 GHz POWER6 570 (rPerf of 193.25) vs. a 48-core 3.5 GHz POWER7 770 (rPerf of 493.37). The numbers for the new systems are significantly better across the board.

Take the time to see why a customer might want to move up to Power Systems.

Note: Don’t forget to check out the replay from the most recent AIX Virtual User Group meeting. The Feb. 18-19 presentation is a two-parter covering POWER7.

More on Migrating to Virtual Storage

Edit: I must have really liked this Redbook. Facts and features seem harder to find; you can search for “smarter systems for a smarter planet” videos. I could not find a working link to the POWER7 article about not validating SPARC. None of the links at the end still seem to work.

Originally posted February 16, 2010 on AIXchange

The now published IBM Redbook, “PowerVM Migration from Physical to Virtual Storage,” contains a wealth of valuable information. Here are a few more highlights.

* Choosing a SAN disk — The authors of the Redbook detail the various methods for determining which SAN disk to use when virtualizing: Physical Volume Identifier (PVID), IEEE Volume Identifiers and Unique Device Identifiers (UDID).

* The new chkdev command — Lots of material here. As the authors state:

“As of Virtual I/O Server Fix Pack 22, a new command has been introduced to assist with the identification of disks and their capabilities. The Virtual I/O Server chkdev command is capable of displaying the same values as mentioned previously (IEEE, UDID, and PVID) but provides some additional information:

#chkdev
NAME:                hdisk0
IDENTIFIER:          2F1135000C50006D9C1F30BST9146802SS08IBM-ESXSsas
PHYS2VIRT_CAPABLE:   YES
VIRT2NPIV_CAPABLE:   NA
VIRT2PHYS_CAPABLE:   NA

* What can be virtualized? Again, I’ll let the authors tell you:

“PHYS2VIRT_CAPABLE: This disk may be virtualized to a logical partition. Once this is performed, this field will change to a value of NA if the mapping is successful. A value of NO indicates this volume may not be virtualized.

“VIRT2NPIV_CAPABLE: If the disk is capable of moving from a virtual SCSI environment to an N_Port ID Virtualization (NPIV) environment, this field will be set to YES, otherwise it will be set to NO. A value of NA means that this disk has already been moved and the Virtual Target Device or VTD as it is abbreviated in the command output, will indicate the mapping.

“VIRT2PHYS_CAPABLE: If the device is capable of moving from a virtual environment to a physical environment and is currently mapped to a VTD then the value here will be YES. A value of NA means the disk is not in use by a VTD, while a value of NO means the disk is not capable of such a move. For further information, please refer to the Virtual I/O Server chkdev manual page.”

* It figures — The Redbook also includes some nice charts (see the Figures section in the table of contents) that help make concepts clear. For example, Figure 2-1 shows the relationship between physical SCSI disk and the target SCSI device on the Virtual I/O Server. Figure 2-2 shows the relationship between physical SCSI disk and the virtual SCSI devices on a client partition.

* NPIV — As we reduce our reliance on physical adapters, and with more customers adopting NPIV, it’s nice to see this topic covered:

“N_Port ID virtualization (NPIV) is a technology that allows multiple logical partitions to access independent physical storage through the same physical Fibre Channel adapter. Each partition is identified by a pair of unique worldwide port names enabling you to connect each partition to independent physical storage on a SAN. Unlike virtual SCSI, only the client partitions see the disk. The
Virtual I/O Server acts only as a pass-through managing the data transfer through the POWER Hypervisor.”

Finally, there’s this interesting note on page 29:

“A POWER Hypervisor has a limit of 32,000 pairs of WWPNs. If you run out of WWPNs, you must obtain an activation code for an additional set of 32,000 pairs.

“Each time you configure a virtual Fibre Channel adapter, whether dynamically or by adding to a partition profile, the HMC obtains a new, non-reusable, pair of WWPNs from the POWER Hypervisor. Therefore, the correct procedure for dynamically allocating a virtual Fibre Channel adapter to an active partition that must keep the configuration across a partition shutdown is to first dynamically allocate the adapter to the partition and then to use the HMC Save Current Configuration feature to save the configuration to a new profile. This new profile then must be used to start the partition after a shutdown. This will ensure that the WWPNs that were allocated during the dynamic operation will be the same ones in the profile. If instead, you dynamically add an adapter and then add an adapter to the
partition profile, the partition will come up with a different pair of WWPNs after a partition shutdown and access to the storage will be lost.”

This summary — and this, and this — barely scratch the surface when it comes to summarizing the valuable information in this publication. So take the time to read this Redbook.

Also, some interesting response to the POWER7 announcement:

POWER7 facts and features

“POWER7 has the most remarkable on-chip cache hardware of any processor on the market.”

Smarter systems for a smarter planet

POWER7 does not validate Sun’s CMT SPARC Processor Architecture

Active Memory Expansion Performance

Active Memory Expansion Overview and Usage

Power your planet with Power Systems

POWER7 is Here

Edit: People still want an iPad. People still run POWER7. Edited the link to the POWER7 Unveiled article at the end.

Originally posted February 8, 2010 on AIXchange

Apple’s just-announced iPad has, predictably, attracted tons of attention. People want to know about the operating system it runs, the hardware specifications and when the hardware will be available. And beyond the hype, they want to know whether the product will prove useful in their daily lives.

Similar questions will be raised and evaluations will be made by IBM customers in response to the company’s POWER7 launch. I’ll share some of the highlights of today’s announcement, but first, a quick point: Considerable work goes on behind the scenes in preparation for these announcement events. Weeks ago I signed a non-disclosure agreement with IBM. In return I received access to relevant training, information and materials. This is standard industry practice — by allowing, in this case, IBMers and business partners to get up to speed on the POWER7 product line prior to the announcement, these groups can then educate customers and answer questions about the new solutions on day one.

To me, one thing that immediately jumps out is that POWER7 can run in a POWER6 compatibility mode, making this the first generation of POWER hardware that supports previous technology levels. For instance, older operating system versions (AIX and IBM i) can run on the new hardware, and live partition mobility can occur between POWER6 servers and the new POWER7 machines. The binary compatibility between POWER6 and POWER7 allows customers to migrate partitions between their current servers and the just-announced ones. In other words, you’ll be able to move your LPARs to and from POWER6 and POWER7 hardware based on your business needs.

Note: Customers that are running POWER5 systems will need to upgrade to POWER6 before upgrading to POWER7.

AIX Support
Here’s a breakdown of AIX support on POWER7:

* AIX 5.3 and AIX 6 TL2 and TL3 will run only in POWER6 and POWER6+ modes.
* AIX 6 TL4 or later will run in POWER7, POWER6 and POWER6+ modes.
* AIX 6 TL5 will include additional POWER7 performance enhancements with improved memory affinity.

Initially, these AIX Levels will be supported on POWER7:

* AIX 5.3 with the 5300-11 Technology Level SP2 or later.
* AIX 6.1 with the 6100-04 Technology Level SP3 or later.
* IBM i 6.1 with 6.1.1 machine code or later.
* VIOS 2.1.2.12 with Fix Pack 22.1 and Service Pack 2 or later.
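
If you want to confirm where an existing environment stands against these minimums, the levels are easy to check from the command line:

# On the AIX LPAR: shows the full technology level and service pack
# (a 6100-04 SP3 system reports a string beginning 6100-04-03)
oslevel -s

# On the VIO server, as padmin: shows the VIOS level and fix pack
ioslevel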

HMC Support
HMC V7 R710 is the minimum level for POWER7 support. If an HMC is used to manage a POWER7 processor-based server, it must be either a CR3 or later model rack-mount HMC or a C05 or later model deskside HMC. If IBM Systems Director is used to manage an HMC, or if the HMC manages more than 254 partitions, the HMC should have 3GB of RAM minimum and be either a CR3 or later rack-mount or a C06 or later deskside model.

Active Memory Expansion
Not to be confused with active memory sharing, active memory expansion allows you to effectively trade some CPU cycles for memory size.

Once you’ve allocated physical memory to an LPAR, the server uses CPU resources to compress memory contents, thus allowing it to function as if it has more memory than it actually does. This reduces the physical memory requirements of existing LPARs and frees up physical memory capacity that can be used to create more LPARs in the same physical memory footprint. The active memory expansion feature is managed by the OS and the hypervisor. The OS compresses and decompresses data based on memory accesses, a process that’s transparent to applications.
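
For sizing, AIX includes a planning tool, amepat, and lparstat picked up a flag for watching compression once the feature is turned on. A quick sketch (the 60 is a monitoring duration in minutes; double-check the exact syntax at your technology level):

# Model how much expansion a running workload could get, and at what CPU cost
amepat 60

# Once expansion is enabled in the partition profile, watch the compression statistics
lparstat -c 2 5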

I’ll have much more on active memory expansion in the near future.

Hardware Availability
General availability for the new Power 750 server is set for Feb. 19. This is a 4U server with four sockets. Customers will be able to select four, six or eight cores running at 3.0 to 3.55 GHz, meaning that a maximum of 32 cores are available on this machine. Maximum memory on the 750 is 512GB.

General availability for the new Power 770 server is set for March 16. This server has a similar form factor to the 570, with up to four 4U nodes. Each node can have 12 or 16 cores running either 3.1 or 3.5 GHz, allowing for a maximum of 64 cores on the 770. Max memory ranges from 512GB with one node to 2TB with all four nodes.

The new Power 780 server is also set for March 16 availability. This machine can also support up to 64 cores, running 3.86 to 4.14GHz. You can have a single enclosure with up to 512GB of memory, or four enclosures with up to 2TB of memory. The 780 features the TurboCore mode — this allows you to “shut off” four of your original eight cores and aggregate the L3 cache from the unused cores. With more L3 cache available to each core, you should have fewer cache misses and thus, increased performance.

TurboCore chips contain twice as much L3 cache per chip. These cores, when run in TurboCore mode, are twice as fast as POWER6 cores, so your applications can potentially see a doubling of the performance per core. Since software is often sold per core, customers that are sensitive to software pricing will see this option as a great way to go.

Speeds and Feeds
The chips themselves are very interesting. Depending on the machine, four, six or eight cores are available per chip (or socket). An 8-core chip with integrated cache and memory controllers has 12 execution units per core, 4-way SMT per core and 32 threads per chip. POWER7 L1 cache consists of 32 KB I cache/32 KB D cache. L2 cache has 256 KB per core, and L3 cache has 32 MB shared between cores on each chip.

As noted, POWER7 runs at 3 to 4.14 GHz. While clock speed is down compared to POWER6, we should see 20-30 percent higher performance in POWER7, due to increased thread counts per core.

More to Come

As these systems become available and I start to install them at customer sites, I’ll continue to let you know about my experiences with POWER7. However, now is the time to start looking at the new features and figuring out how these systems can help your business. IBM and IBM business partners can help you understand how you can benefit from making the move. For more information on the POWER7 announcement, visit IBM Systems Magazine, Power Systems edition.

Gaming Wasn’t Always This Easy

Edit: It is even easier these days. And I still would not advise joining in on games at work.

Originally posted February 2, 2010 on AIXchange

Today’s video game players are spoiled. Connectivity is no longer an obstacle. Many homes have high-speed Internet or broadband service, and in most towns, wireless Internet access points are abundant. And wireless mobile devices can access the Internet from practically anywhere.

On top of that, multiplayer video games are much more accessible. You can play a staggering number of games today, all for the cost of a game console and an Internet connection. (Of course you can also use your PC if you prefer.)

It’s truly amazing if you stop to think about it — especially if you’re like me and you’ve been gaming for awhile. Because in the mid to late 1990s — barely 10 years ago — it wasn’t so easy. Dial-up was the predominant Internet connection option for home PC users. Playing multiplayer games over a modem is unimaginable now; back then, it was an exercise in futility. The network lag was terrible and game play would usually suffer.

Around this time I remember some coworkers purchasing Quake 1. Everyone started playing the game over the LAN, a Novell IPX network that was just starting to migrate to TCP/IP.

Compared to dial-up, LAN games were great. Lag wasn’t an issue, and the capability to play others who sat in the same room or up and down the hall added to the enjoyment. Now if you killed them in the game, you could hear their screams.

So what were we doing gaming at work? Management was actually OK with it as long as we played after hours, when the network wasn’t so busy. It became a Friday afternoon event, complete with delivery pizza. We’d play until all hours of the night. Some of us — particularly if we also spent time playing at home — would greatly improve by the week.

As noted, it was cool watching coworkers erupt in anger and frustration when they met their demises in the games. And it was funny watching some guys on the phone, trying to convince their wives to let them stay longer. But it was also nice getting to know coworkers in a non-work setting.

Perhaps inevitably, eventually people started playing Quake 1 over their lunch hour. Obviously, this wasn’t smart. This was a production network, and people were trying to do their jobs. Having gamers shouting up and down the halls wasn’t exactly conducive to a professional work environment.

I remember two contractors who were on site helping with different projects being invited to play with the employees. I guess they figured that since it was lunch time, and they were asked to join in, there was no harm.

As the battle raged, the noise levels increased. The manager happened by, and, needless to say, he wasn’t pleased. He told the contractors to quickly pack up their things and leave. Then he called their bosses to complain. I don’t recall all of the ramifications of this incident, but I do know we had “free” contractors for quite awhile thereafter. All game play was banned, even after hours, forcing us to return to our sadly slow dial-up Internet connections and their lag.

I recall those days fondly. And, despite how things ended up, I don’t think it was all bad, even for our employer. Of course gaming shouldn’t go on when there’s work to be done; we’re there to do a job. But on the flip side, being able to relax and enjoy the company of your coworkers in a more casual
environment can lead to team bonding and improved morale. I’ve seen it. And even today, I’ve come across offices with a PS3 or an XBOX connected to a TV in a conference room. People still need to blow off steam sometimes.

In any event, unless you have explicit permission, I would advise against playing on the production network during work hours. This goes double if you’re a contractor. Even if the customer is always right, think twice if you’re asked to join their gaming.

VIOS Fixpack Installation Issues

Edit: Hopefully none of you are running this version these days. Updated the link to Chris’ blog. Updated link to Chris’ blog post. Only one of the POWER7 links at the end still works.

Originally posted January 26, 2010 on AIXchange

Chris Gibson, who blogs about AIX at IBM developerWorks, recently wrote about some issues he encountered while trying to install VIOS fixpack 2.1.2.10-FP-22.

I recently had similar issues with this VIOS fixpack. When I ran the updateios command, I still had virtual media attached from the vio server to a vio client. When I tried to use my virtual media repository, I received this message:

$ lsrep
unable to retrieve repository date due to incomplete repository structure

I initially wondered if VIOS no longer supported the direct copying of .iso files into the /var/vio/VMLibrary, and if I should have used the mkvopt command instead. As Chris mentions, I also found that running the rmrep and mkrep commands fixed the problem with my virtual media repository.
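
If you hit the same thing, the fix amounts to rebuilding the repository and re-importing your images. As padmin, something like this; the storage pool, size and file names are placeholders:

# Remove the broken repository (unload any media from vtopt devices first)
rmrep

# Recreate the repository in a storage pool, sized to suit
mkrep -sp rootvg -size 10G

# Import an ISO into the repository as a read-only image
mkvopt -name aix61dvd1 -file /home/padmin/aix61dvd1.iso -ro

# Confirm the repository is healthy again
lsrep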

I had called IBM Support around this time, as I was having other issues loading NetApp filesets into my vio server after running oem_setup_env.

I would get errors:

0503-466 installp: The build date requisite check failed for fileset ios.cli.rte. Installed fileset build date is 0920. Selected fileset does not have a build date, but one is required. installp: Installation failed due to BUILDDATE requisite failure.

We also found lppchk –v errors:

$ ioslevel
2.1.2.10-FP-22

# oslevel -r
6100-03

# lppchk -v
lppchk:  The following filesets need to be installed or corrected to bring
        the system to a consistent state:

ios.cli.rte 6.1.4.1                     (usr: COMMITTED, root: not installed)

At the time Support had me load an efix onto the VIO server, but that’s since been superseded by level VIOS 2.1.2.10-FP22.1.

This just highlights the need to load fixes in a test environment before rolling them straight into production. Of course, some shops lack the necessary test and QA gear for this. Other shops put off upgrades, figuring that if it isn’t broken, don’t mess with it. And anyone can test upgrades and still miss subtle bugs that only appear under certain conditions.

Most people recommend doing as much testing as possible before making changes to production systems. Backing out of a change to production data is usually non-trivial, particularly if the systems are running for several days before a problem is found. At this point you can’t just go to a backup tape; you must also decide what to do with all of the changed data. No administrator wants to see an issue arise and have to wait for a fix to be created and applied to the system.

So what’s your approach to testing? Do you typically take the latest and greatest OS code and fixes and apply it in production, or do you usually lag a bit and make sure that things work in your test and QA environments before running them in production?

On another note, I’m seeing more information about POWER7. Check out this and this.

The Handy mkcd Command

Edit: This is still a useful method to know about.

Originally posted January 19, 2010 on AIXchange

Last week I noted that the soon-to-be completed IBM Redbook, “PowerVM Migration from Physical to Virtual Storage,” lays out the available options for migrating to virtual storage.

In chapter 2 of the Redbook, the authors provide what they call “core procedures”:

“There are a number of core procedures that are used in multiple scenarios of the accompanying chapters. These procedures have been documented fully in this chapter and provide additional notes about the procedures that will not be found in the fully worked through examples in subsequent chapters. Some of the additional notes are about issues such as best practices and some are additional diagnostic methods that may be used.”

“The core procedures are:
Using File Backed Optical devices to perform a restoration
Checking unique disk identification – IEEE, PVID and UDID
Creating a virtual SCSI device
Using virtual Fibre Channel and NPIV”

Having written about file-backed optical storage in the past (here, here and here), this section in particular piques my interest.

As noted in the Redbook: “File Backed Optical devices provide a clean, easy-to-use mechanism to take a backup of either a root or data volume group and restore it to a target device.”

By entering the mkcd command on your source system and then moving the resulting .iso image over to your VIO server, you can boot from this virtual optical device.

Back to the authors:

“To make image files, there are two methods that will be detailed: Using the AIX smitty mkcd command, or the mkcd command line from an AIX shell. Choose whichever method that is appropriate for your environment.”
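
If you do want the command-line route, it boils down to something like this. The target directory is a placeholder; -L asks for DVD-sized images, -S stops mkcd after creating the images (no burning), and -I says where to leave them:

# Create bootable .iso images of rootvg and leave them in /mkcd/cd_images
mkcd -L -S -I /mkcd/cd_images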

Rather than run mkcd on the command line, I ran smitty mkcd and answered no when the system prompted me with, “Use an existing mksysb image?”

  +--------------------------------------------------------------------------+
  |                      Use an existing mksysb image?                       |
  |                                                                          |
  | Move cursor to desired item and press Enter.                             |
  |                                                                          |
  |   1 yes                                                                  |
  |   2 no                                                                   |
  |                                                                          |
  | F1=Help                 F2=Refresh              F3=Cancel                |
  | F8=Image                F10=Exit                Enter=Do                 |
  | /=Find                  n=Find Next                                      |
  +--------------------------------------------------------------------------+

Then I gave it the filesystem in which to store the mksysb image, the CD file structure, and the final CD images:

Back Up This System to CD

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

[TOP]                                                   [Entry Fields]
  CD-R Device                                        []                      +

  mksysb creation options:
   Create map files?                                   no                     +
   Exclude files?                                     no                     +
  Disable software packing of backup?                 no                     +
  Backup extended attributes?                         yes                    +

  File system to store mksysb image                  []                       /
     (If blank, the file system
       will be created for you.)

  File system to store CD file structure             []                       /
     (If blank, the file system
       will be created for you.)

  File system to store final CD images               []                       /
     (If blank, the file system
       will be created for you.)

Finally, I selected:

Advanced Customization Options:
Do you want the CD to be bootable?  Yes
Remove final images after creating CD?  No
Create the CD now?  No

I hit enter and it created my .iso images:

673570816     cd_image_344258.vol1
676298752     cd_image_344258.vol2
676298752     cd_image_344258.vol3
676298752     cd_image_344258.vol4
20981760      cd_image_344258.vol5

I copied the cd images to my target system and then ran my loadopt command as padmin.
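
Note that loadopt assumes the file-backed optical device (vtopt1 in my case) already exists. If it doesn’t, it can be created against the client’s vhost adapter first; the adapter name here is a placeholder:

# Create a file-backed virtual optical device on the client's vhost adapter
mkvdev -fbo -vadapter vhost0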

$ loadopt -disk cd_image_344258.vol1 -vtd vtopt1

$ lsmap -all

VTD                   vtopt1
Status                Available
LUN                   0x8200000000000000
Backing device        /var/vio/VMLibrary/cd_image_344258.vol1

The LPAR booted as expected from the mksysb, and I ran through the restore exercise just to verify that everything worked as expected.

This mkcd command is incredibly handy when you’re building a machine. Maybe it’s not on the network yet. Maybe there’s no NIM server available. No problem. Just make some .iso images and boot from the virtual optical media.

I will plan to revisit topics from this Redbook in a future post, so stay tuned.

Migrating to Virtual Storage

Edit: I love talking about Redbooks.

Originally posted January 12, 2010 on AIXchange

A new Redbook, “PowerVM Migration from Physical to Virtual Storage,” is coming out soon.

The concepts covered here are critical to understand. Whether you’ve been virtualizing for years or you’re just getting ready to take the plunge, you need to know all of the options for migrating physical disks to virtual disks. I believe this Redbook can help.

One of the first things I noticed on pages 6-7 is a handy chart that’s designed to help you choose the appropriate option for migrating a standalone server to a client LPAR on a consolidated frame. These options include:

* Backup to CD/Tape and Restore on Virtual I/O Server managed disk.
* Mirror rootvg disks to SAN disks.
* Cloning rootvg to an external SAN disk using alt_disk_install.
* Other methods such as NIM.

The chart as a whole allows you to navigate the publication based on your migration needs. For instance, the first column is called “Migration Objective.” By clicking the items in this column, you’ll jump to the chapter with information about the specific procedure (e.g., Chapter 3, “Standalone SCSI rootvg to virtual SCSI” on page 45).

The authors provide these additional steps for using the table:

“1. Learn how to use the virtual SCSI and virtual Fibre Channel methods as described in Chapter 2, ‘Core procedures’ on page 9.
2. Choose from what configuration you are migrating (Migration Objective column) – the cells in this column are linked to the actual procedures, thus, clicking the cells is another way to quickly move through the publication.
3. Choose what you are migrating (Volume Group to Migrate column).
4. Choose which procedure (Migration Procedure column) suits your environment and your administrator skills.”

Finally, the authors make these general recommendations for performing any data migration:

“1. Back up the client system: Prior to making any changes, it is recommended that the source standalone and/or dedicated partition be backed up. Some of the migration procedures can be used to perform this backup. All backups require validation to ensure they are restorable.
2. Back up the configuration of the Virtual I/O Server that you will be modifying — the viosbr command has been introduced to the Virtual I/O Server commands for this purpose.
3. It is always a best practice to perform the migration procedure on test systems and data before applying to a production environment.”

One reason I highlight this section is its mention of the viosbr command. This is fairly new, and I only recently tried it for the first time. When I tested the command, I received this error because I had some disks mapped that were defined instead of available:

$ viosbr -backup -file /backups/testbackup
Device "vhost1" is not in AVAILABLE state.
Device "vhost2" is not in AVAILABLE state.

So, to make the viosbr command happy, I deleted the devices it was complaining about:

$ rmvdev -vdev hdisk1
vtscsi0 deleted
$ rmvdev -vdev hdisk2
vtscsi1 deleted
$ rmdev -dev vhost1
vhost1 deleted
$ rmdev -dev vhost2
vhost2 deleted

With these items deleted, I was able to successfully run viosbr. At this point I had a file called testbackup.tar.gz. I went into my /backups directory and ran gunzip and then tar -xvf on the file — this allowed me to view the contents. It was an XML file that contained the information that would be needed to recreate my VIO server.

If you run help viosbr as padmin on your VIO server, you should see all of the available options. You can back up virtual and logical configurations, and list and restore the configurations of the VIO server.
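
For example, a quick sketch of the options I mean; the file names are placeholders:

# Schedule a daily backup and keep the last seven copies
viosbr -backup -file /backups/viosbackup -frequency daily -numfiles 7

# List what a backup file contains, and restore from it
viosbr -view -file /backups/viosbackup.tar.gz
viosbr -restore -file /backups/viosbackup.tar.gz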

I’d take some time in a test lab to get familiar with the different things you can do, including adding viosbr to crontab using the -frequency option. Next week I’ll cover some other information in this Redbook.

Think of Our Users

Edit: The technology may be different, but the principles are the same.

Originally posted January 5, 2010 on AIXchange

When I previously wrote about using my BlackBerry, I mentioned how it helps me in my travels.

I use my phone as my GPS, having it give me turn-by-turn audio directions to the different places I go. I travel quite a bit, and find myself in unfamiliar cities. I usually program the different addresses that I need before leaving, then bring them up on my phone once I reach my destination.

It’s been great for me. Of course I’m taking a chance that the GPS receiver and phone will always work, and that I’ll have network coverage wherever it is I am. However, I haven’t had any of those issues. But the one problem I have experienced is the one I didn’t expect–a failure with the application vendor as the source.

Whenever I drive a long distance, I usually wait to activate GPS. Generally I don’t need directions for the entire trip, just the very end of it. Why use GPS for, say, all 120 miles when I really only need it for the final five?

So recently I’m on the road and getting close to my destination. When I activated the application, I was informed of a critical update. I had no idea what the critical fixes were, and I only received two options: download the new application or quit. I would have preferred to download the update later, but no such luck.

I decided to pull off of the highway and download the application. It was a relatively smooth process. I was worried that I might have to re-enter an authorization code, but the application came right up when I restarted it. However, all of the addresses that I’d entered were lost.

It wasn’t a disaster. I had plenty of time, and I managed to find the one address I needed at that moment and plug it back in. But had I been in a rush and lacking ready access to the address of my destination, I could have been in some trouble.

This experience got me thinking about the testing that went into this code deployment. Did the application vendor really not consider the urgent needs of GPS users? How can you force someone who’s in transit to do an on-the-spot upgrade? At a minimum, users should have the option to perform such upgrades at their convenience. I know I like to upgrade when I have the time to work on any issues that might arise, not when I’m racing down a highway.

And thinking about that got me thinking about planned downtime and our own users. Do we administrators give them adequate notice and advanced warning about the updates we apply and the system changes we make? Do we test and verify in the lab, or do we push these changes without considering the consequences?

A friend keeps analog backups before he goes anywhere. He’ll print out the directions and the addresses he needs–just in case his technology lets him down. Do we admins take similar precautions? Before making any changes, do we make certain we have adequate hard copy documentation and sufficient backups, just in case our upgrades go bad?

The experience with my GPS was only a minor annoyance, but I wonder how many others have landed in a world of hurt by performing forced upgrades on the fly and losing all their information at the same time.

Let’s think of our users–our customers–whenever we make decisions that impact them.

AIX 6 Update

Edit: The replay link worked, the materials did not, I imagine a search engine may be able to find them if you really want them. The chart link does not work either.

Originally posted December 22, 2009 on AIXchange

Another great Central Region Virtual users group meeting was held on Dec. 10. IBM’s Jay Kruemcke covered what’s new in AIX 6 since the release of TL4. If you missed the meeting, listen to the replay here and get the presentation materials here.

In response to a question about which features worked on which processors and versions of AIX, the organizers also posted this handy chart that details the requirements for enabling features like Live Partition Mobility, WPARs, virtual memory, etc.

Here are some other highlights:

* Workload Partition (WPAR) SAN support–According to Kruemcke, WPARs are able to own SAN devices, allowing WPAR administrators to directly manage their own storage, thus reducing administrative effort and increasing flexibility. So instead of running the lspv command inside of a WPAR and having nothing come back, you can map LUNs directly to your WPAR and use them as you would any other hdisk.
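
A hedged sketch of what that looks like; the disk and WPAR names are placeholders, and I’m going from the chwpar device-export syntax as I understand it, so verify it against your own level:

# From the global environment: export a SAN LUN to the WPAR
chwpar -D devname=hdisk5 mywpar

# Inside the WPAR (after clogin mywpar): discover the disk and confirm it shows up
cfgmgr
lspv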

* N_Port ID Virtualization (NPIV) for Blade–The NPIV capabilities that have existed on our standalone servers are now available for our blade servers. I found this definition online: “NPIV enables a single HBA port to register several unique WWNs with the fabric, each of which can be assigned to an individual virtual machine.” I tend to think of NPIV the same way I think of virtual Ethernet, where multiple logical devices use the same physical device.

* Fibre Channel over Converged Ethernet (commonly referred to as FC over Ethernet)–Jay says AIX will provide a native 10Gb solution, and VIOS will facilitate sharing of this adapter.

* LPAR Mobility Phase 2–Says Jay: “VIOS will preserve slots and device names across Live Partition Mobility operation.” Right now when you do a Live Partition Mobility operation, the virtual adapters get renumbered on the destination machine. If you move an LPAR back and forth a few times you can quickly end up needing to re-document which virtual devices are mapped to what.

* VIOS Usability: Lightweight backup–This new VIOS option saves off just the customized data (such as device mappings and other pertinent ODM info) in an XML file. This allows backups to process much more quickly. Instead of saving the entire VIOS operating system, you’re only saving what you need in order to rebuild your environment. The rest can be restored from install media.

Learn much more by listening to the presentation and the Q&A. Although keeping up with the changes in the operating system can be a challenge, these free user group events are another relatively simple way to broaden your knowledge and invest in your career.

The Value of an Open Mind

Edit: Nobody knows it all. Surround yourself with smart people.

Originally posted December 14, 2009 on AIXchange

Recently a customer was trying to repurpose an old POWER5 machine. It had been connected to an HMC, but the customer wanted it to run in standalone mode in a data center that lacked an HMC to connect it to.

I was running hyperterm on an old laptop on the raised floor. (How old was it? The laptop was old enough that it actually had a built-in serial port.) Using a serial cable to connect to the server’s serial port, I was able to connect to the server and look at the service processor menus. I could return it to standalone mode by resetting it to factory defaults. Then I powered on the system, and everything worked great until it booted up. At that point, the hyperterm screen went blank and the system didn’t respond.

I found that strange, although I know that serial ports can become unresponsive when servers are connected to an HMC. Maybe the server somehow still responded as though it was connected to an HMC. I knew it wasn’t a cabling or a laptop issue, since I’d just used them to access the menus over the same serial cables.

I asked a buddy who told me to swap out the serial cable. This struck me as a stupid idea, and I told him so. Nevertheless, I reluctantly tried his suggestion. My buddy was right. He’d recently run into the same issue where one serial cable wouldn’t work when he was using the console.

The point here is it’s best to be open-minded. In IT, a lot of us know quite a bit, but none of us knows it all (even if a few of us think they know it all and are never afraid to say so). You should have access to smart people. Even if you aren’t fortunate enough to work with these folks, you can still find them through user group meetings, discussion forums and various other channels. Networking isn’t just good for finding a job; industry contacts can, at times, help you do your job.

So listen to those around you. Their suggestions can save you time and money. Their ideas — even the “stupid” ones — can be the answer to your problems.

Note: From time to time I hear about resources that may not warrant their own post. Here’s another one.

Check out the section that covers AIX and HACMP lessons and let me know what you think.

The Case for Standardization

Edit: With automated builds and golden images it is even easier to have standards.

Originally posted December 8, 2009 on AIXchange

This fall I attended the RedHat conference. The conference hosted a break fix challenge, and the outcome was interesting.

Consider two particular teams of administrators that participated in the challenge. One team came from a large company that adheres to very strict server build and maintenance standards. Each machine its administrators touch is identical; there are no deviations. Every node admins log in to is the same as every other node. You can see how this makes administration easier. When all of your tools, logs and cron jobs are in the same place, running at the same time, troubleshooting can be simplified. When you work in the same environment every day, you become very good at fixing that set of machines.

Another team came from a large company that manages machines for customers. When something breaks, these admins literally have no idea what they might find. Where are the tools? Where did the customer load the scripts? Which jobs are running when? When these folks get a call, they must spend some time getting familiar with the environment.

You can imagine which team fared better in the break fix challenge. Although I’m sure there are shades of gray when describing these administrators, the guys that constantly go into unknown environments to solve problems prevailed.

But what about administering your environment? Would you rather have the admins from the first team or the second team? You might think that there’s no need for standard server builds. You want to keep your guys sharp. You want them on their toes. You want them to be able to figure out the environment before they go fix things.

While this makes sense when you’re managing machines for customers, when they’re your own machines, I certainly prefer a standardized approach. I know when I’ve worked across large teams where we shared pager duty, it made life easier knowing I could log into a machine and find things where I expected to find them. Solving problems was quicker. I could get back to sleep sooner.

I’ve said it before (here and here), and I continue to believe that companies benefit from having good server build documentation and procedures.

For instance, I was recently at a customer site. The company had had some staff turnover, and the machines were set up in different ways. This caused nothing but headaches for the new guys who were trying to figure out the environment and prioritize the projects.

Standardization does make life easier, both for the teams that we work on now, and those who will follow in our footsteps. So let’s have good documentation, and good server build documentation. It’s fun to sharpen our skills to win a contest, but let’s make sure we’re sharpest where it counts. Let’s build the perfect automated system to turn out identically built servers.

In IT, Right is Might

Edit: This still rings true.

Originally posted December 1, 2009 on AIXchange

Having previously written about management in “A Tale of Two Managers,” I really liked this Computerworld article. Some quotes:

“Geeks are smart and creative, but they are also egocentric, antisocial, managerially and business challenged, victim-prone, bullheaded and credit-whoring. To overcome these intractable behavioral deficits you must do X, Y and Z.”

“… my personal experiences working within IT groups have always been quite good, working with IT pros for whom the negative stereotypes just don’t seem to apply.”

“Few people notice this, but for IT groups respect is the currency of the realm. IT pros do not squander this currency. Those whom they do not believe are worthy of their respect might instead be treated to professional courtesy, a friendly demeanor or the acceptance of authority. Gaining respect is not a matter of being the boss and has nothing to do with being likeable or sociable; whether you talk, eat or smell right; or any measure that isn’t directly related to the work.”

There’s much more, and I encourage you to read it all, but this idea struck me: I’ve worked with arrogant people, but I put up with them and their arrogant attitudes because they were really, really smart. I knew that when I asked a question, I’d get the right answer every time, end of story. Sure, I may have wished these people were wrong occasionally, but it never happened. They were always right, and being right demands respect.

The article expresses it this way:

“While everyone would like to work for a nice person who is always right, IT pros will prefer a jerk who is always right over a nice person who is always wrong. Wrong creates unnecessary work, impossible situations and major failures. Wrong is evil, and it must be defeated. Capacity for technical reasoning trumps all other professional factors, period.”

Fortunately, I have had the privilege of working with a few nice people who also knew their stuff. These people not only knew what they were talking about, they also patiently took the time to help others understand what they already knew. Yes, they were busy — in part because they were constantly bombarded with questions from people who were unable or unwilling to find answers on their own. However, rather than berate their questioners, they took the time to help.

Are you a jerk who’s always right? Are you nice but often wrong? Are you really nice and really good? Be honest.

Migrations Made Simple with NIM

Edit: The migrate DVD is no longer there. The fixpack is no longer there.

Originally posted November 17, 2009 on AIXchange

I had some VIO servers running 1.5.2.5-FP-11.1 SP-01 that I wanted to upgrade to 2.1.2.10-FP-22. The machines were in a remote location and consisted of both blade and “regular” servers. While I could have recruited someone to burn the migration .iso image and physically install the media into each DVD drive, I wanted to keep things simple. I decided to use NIM for the migration.

It helped that I could follow this great document that outlines the necessary steps for conducting a NIM migration.

I started by downloading the migration .iso image from here and the VIOS fixpack from here.

Then I loaded the migration .iso image into a virtual optical library (something I discuss here).

After making the DVD image available to a client LPAR using my virtual optical device, I mounted the DVD in the client LPAR. Then I used NFS to export the mounted DVD from my client LPAR over to my NIM server, which in this case happened to be a stand-alone server. Obviously, if my NIM server had been a client LPAR of my VIO server, I could have used the virtual optical directly to my NIM server, but I had to get a bit more creative here. Ultimately though, I accomplished my goal, which was to be able to mount the DVD image on my NIM server.
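In case it helps anyone attempting the same workaround, the commands boil down to something like this (the host name is a placeholder, and your optical device may not be cd0). On the client LPAR with the virtual optical device:

mount -v cdrfs -o ro /dev/cd0 /mnt
mknfsexp -d /mnt -t ro

And on the NIM server:

mount clientlpar:/mnt /mnt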

I made sure to bring my NIM server to the latest AIX level (6100-04-01-0944 as of this writing) and then followed the instructions in the aforementioned Migrate VIO document.

In my case, I first ran:

“gencopy -d /mnt -t /export/vios all”

After everything was copied from the optical media, I ran these commands:

“nim -o define -t lpp_source -a location=/export/vios -a server=master vios_migration_lpp”

“nim -o define -t spot -a location=/export/spot -a server=master -a source=vios_migration_lpp vios_migration_spot”

The first command defined my lpp source; the second defined my spot.

At this point I logged onto my NIM server so I could set up my NIM client to install using rte. (By the way, here’s more info on NIM.)
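For the record, the rte setup boils down to a bos_inst operation against the client definition, roughly like this (vios1 is a placeholder for the NIM client name, and I’m going from memory on the attributes):

nim -o bos_inst -a source=rte -a lpp_source=vios_migration_lpp -a spot=vios_migration_spot -a accept_licenses=yes -a boot_client=no vios1

With boot_client=no the client isn’t rebooted automatically, which fits what came next: booting the VIO server into SMS myself and pointing it at the NIM server.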

I rebooted the VIO server and went into SMS mode. After verifying that the network installation settings were correct, I booted the VIO server from the NIM server. Then I selected the option to have it migrate my VIO server. Everything came up and ran as expected. After the reboot, it reported that it was at level 2.1.0.0.

Then I loaded the fixpack by NFS mounting my fix directory and running:

updateios -accept -install -dev /nfsmnt/fixes/vios.fixpack.22

At this point, my VIO server reported that it was at level 2.1.2.10-FP-22. That’s where I wanted it.

Physical media is fine; use it all you want. But I’ll take virtual optical and NIM every time.

A ‘Hot’ New Option for Dealing with SSDs

Edit: The links at the end no longer work.

Originally posted November 10, 2009 on AIXchange

For a while now customers have been asking me how to identify the filesystems and physical and logical volumes that are good candidates to move to solid state drives (SSDs). Given these drives’ I/O characteristics and the higher prices they currently command, you should be selective about the data you move from traditional drives to SSDs.

Obviously you don’t want to put seldom-used data on these pricey new drives–just leave that on your older, slower disks. Your SSDs should house critical, heavily used data. Fortunately, figuring out how to get the most from your SSDs just got easier.

The latest AIX releases–5.3 TL11 and 6.1 TL4–include new options that can help you determine the data that’s best suited for SSDs. I updated some of my machines so I could test out these new options, and they seem to work as advertised.

The key new option is found with the filemon tool and its new flag, -O hot, which creates a report that shows you your most frequently accessed data.

To receive my filemon output, I ran:

filemon -O hot -A -x “sleep 20” -r fmon -o fmon.out

Incidentally, -O hot must run in conjunction with the -A flag to enable automated offline mode. It cannot run in realtime mode. I know this because I tried and received this error:

filemon -O hot  -o fmon.out
hot option not supported in realtime mode.

Once I had my output file, I ran trcrpt:

trcrpt -r fmon.out > fmon.rpt

Then I viewed a report showing me hot files, hot logical volumes, and hot physical volumes by running: more fmon.rpt

I intentionally ran the trace at a time when my disks were heavily utilized. Actually, it may make sense for you to run these traces several times, especially during peak workloads. This will give you a good idea about which files and physical and logical volumes may be good candidates for moving to SSDs.

This data will greatly help you make informed decisions about moving filesystems and physical and logical volumes to SSD.
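As for the actual move, if the SSD hdisk is already in the same volume group, something as simple as migratepv can relocate a hot logical volume (the disk and LV names here are placeholders):

migratepv -l hotlv hdisk2 hdisk10

Afterward, lslv -l hotlv should show the partitions sitting on the SSD. Moving an entire filesystem onto a new SSD-backed volume group takes a bit more work, but the filemon data tells you where to focus either way.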

While I do regularly work with and speak to customers, SSD is fairly new technology, so there’s plenty more to be learned about how people are using these new drives. So please enlighten me and your fellow readers by sharing your experiences with SSDs in Comments.

On an unrelated note, I saw these interesting articles about tech support for the International Space Station that I thought I’d pass on.

http://crave.cnet.co.uk/gadgets/0,39029552,49304003,00.htm

http://crave.cnet.co.uk/gadgets/0,39029552,49303917,00.htm

Tools Gold

Edit: This is all still good stuff I use.

Originally posted November 3, 2009 on AIXchange

AIX pros have been using VNC and Screen for a long time. Years ago I wrote about these handy tools.

For a more recent and more in-depth look at Screen’s capabilities, check out this article from IBM developerWorks.
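If you’ve never tried it, the day-to-day workflow is just a handful of commands (the session name is arbitrary):

screen -S buildbox        # start a named session
# ... do your work, then press Ctrl-a d to detach ...
screen -ls                # list your sessions
screen -r buildbox        # reattach later, even from a different login

Being able to drop a long-running job, go home and pick the session back up later is the part that hooks people.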

I literally use Screen every day, yet I’m amazed whenever I find people who still don’t know about it. I guess the methods and tools that I take for granted are always new to someone. Another tool that I use daily is called Synergy.

I can remember showing a then co-worker how Synergy allowed me to have my Windows-connected mouse and keyboard also control a Linux machine via the network. (And, yes, the keyboard was a Model M. Of course it was.)

I had two systems, each with a monitor attached. Using Synergy, I could move my mouse seamlessly between the desktops, and even cut and paste between systems. This works with multiple Windows machines, Linux machines or combinations of each. I had a chuckle a few years later when I visited that co-worker; he was still running Synergy and had been spreading the word. Now others in his office were using it, too.

I always like hearing about how people use their systems. How are they being backed up? How many monitors are there, and are they extended desktops or different machines, each running Synergy? Are VNC and Screen being used? What different things are being done on the command line?

For chatting, I like pidgin.

From Wikipedia:

“Pidgin (formerly named Gaim) is a multi-platform instant messaging client, based on a library named libpurple. Libpurple has support for many commonly used instant messaging protocols, allowing the user to log into various different services from one application.”

For a good ssh/scp/sftp client on Windows, there’s PuTTY, which, as explained here, is “a free implementation of Telnet and SSH for Win32 and UNIX platforms, along with an xterm terminal emulator. It is written and maintained primarily by Simon Tatham.”

Whenever I’m forced to use a Windows workstation, I rely on several options that will give me a UNIX-like experience, including Cygwin, VMware (also here) and Damn Small Linux (DSL).

As explained here, DSL is “a very versatile 50MB mini desktop-oriented Linux distribution. Damn Small is small enough and smart enough to… boot from a business card CD as a live Linux distribution (LiveCD), boot from a USB pen drive, boot from within a host operating system (that’s right, it can run *inside* Windows).”

Which tools make your life easier? Please share your experiences in Comments.

More From the Grab Bag

Edit: The developerworks link no longer works.

Originally posted October 27, 2009 on AIXchange

Every now and again I like to present a grab bag of links that I find to be interesting, even if they don’t warrant their own post.

For instance, some weeks back I wrote that more people are talking about AIX on Power and its industry-best downtime numbers.

Here’s more on that topic.

“Among the customers surveyed by ITIC, IBM’s Power Systems running AIX experienced (this includes older System p and pSeries iron) the least amount of downtime per year, when averaged across all customers using these platforms. AIX shops reported an average of 0.42 Tier 1 incidents per year and 0.34 Tier 2 incidents, and not one customer reported a Tier 3 outage on their AIX boxes. The Power Systems machines (and this includes older System i and iSeries iron) had an average of 0.56 Tier 1 outages per year, 0.44 Tier 2 outages per year, and 0.12 Tier 3 outages. So in 2009 at least, the i platform fared a little worse than the AIX platform running on Power iron.”

I also liked these recent articles:

* From ComputerWorld:

“The National Oceanic and Atmospheric Administration (NOAA) system, which went online in August, is comprised of ‘Stratus’ and its backup, ‘Cirrus,’ two separate supercomputers with about 5,000 IBM POWER6 chips running AIX operating systems.”

* IBM developerWorks has this piece on a script that can gather information about your systems. Then you run the diff command to find the differences between them.

* Here, the reuse of SAN disks is examined:

“The support of these scenarios in which remapping, copying, and reuse of SAN disks is allowed and supported has never been officially documented. There have been some documents and IBM Redbooks that have claimed support for specific scenarios, but they do not list the specific steps or restrictions. The scenarios detailed here guide systems administrators through the steps taken to achieve the specific environment desired. They also attempt to explain why the setup must be followed to achieve the desired results. If the steps are not followed, in some cases the system may not boot.”

* Here’s an IBM Systems Magazine Web Exclusive on using the mkramdisk command:

“One of the most common concerns for system administrators is maximizing disk performance. The AIX command mkramdisk is ideal for producing very high speed I/O by letting the memory do all the work. Database administrators are well aware of the benefits of keeping frequently accessed data in memory in order to reduce the need to retrieve that data from disk. The AIX mkramdisk command allows system administrators to create memory-resident file systems. The performance benefits of using RAM disk can be astonishing. The unload of a large TSM database was reduced from 40 hours on SAN disks down to 10 minutes using RAM disk.”

* Finally, here’s something to look forward to with the POWER7 processors:

“IBM’s Hot Chips presentation on its forthcoming 45nm POWER7 server processor had a wealth of information on the chip… POWER7 will come in 4-, 6-, and 8-core varieties, with the default presumably being the 8-core and the lower-core variants being offered to improve yields.”

If you’ve found interesting articles online, please share your links in Comments.

Migrate When the Time is Right

Edit: The statement of direction is no longer working. The POWER7 link no longer works.

Originally posted October 20, 2009 on AIXchange

I know that many of you have been upgrading from older technology to POWER6 servers. As one customer recently told me, “The machine is working great. Performance is better than I expected.” That seems to be a constant theme: people are happy with the newer hardware when they get it deployed.

Still, I’m sure that many others have been delaying their upgrades. They figure that POWER7 will be here soon enough, so why not just put that upgrade on hold for a while?

As I noted, IBM recently issued a statement of direction that I consider to be good news for these customers. If you’ve been debating whether to order a POWER6 server now or wait until POWER7 comes out, you can have the best of both: POWER6 performance now, and POWER7 when it ships.

From IBM:

“IBM plans to provide an upgrade path from the current IBM Power 595 server with 12X I/O to IBM’s next-generation POWER7 processor-based high-end server. The upgrade is planned as a simple replacement of the processor books and two system controllers with new POWER7 components, within the existing system frame. IBM also plans to provide an upgrade path from the current IBM Power 570 server with 12X I/O to IBM’s next generation POWER7 processor-based modular enterprise server.”

More details are emerging about POWER7 processors.

“‘POWER7 is an 8-core, high performance server chip. A solid chip is a good start. But to win the race, you need a balanced system. POWER7 enables that balance…’ Starke noted that the POWER7 offered ‘multiple optimization points,’ such as improved energy efficiency, upgraded thread performance, dynamic allocation of resources and an ‘extreme’ increase in socket throughput. In addition, the POWER7 provides scalability up to 32 sockets, 32MB on chip eDRAM shared L3, dual DDR3 memory controllers, 100GB/s memory bandwidth per chip (sustained), 360GB/s SMP bandwidth/chip and 256KB L2 per core.

Also:

“Basically, the POWER7 is an 8-, 6, and 4-core chip with 1.2 billion transistors, running at an undisclosed clock speed. A shared L3 cache of up to 32 Mbytes in size will use eDRAM. The POWER7 will scale up to 32 sockets and 1,024 threads. Not surprisingly, it will be backward-compatible with the POWER6.

“Not surprisingly, the performance of the POWER7 exceeds the POWER6 by a significant amount, although IBM has left off actual numerical comparisons. Application comparisons such as integer workloads seem to indicate an improvement of about 20 percent across the board on a per-core basis, and a 4X to 5X performance when compared chip to chip.”

Hardware constantly evolves, and every organization needs to evaluate the tradeoffs when comparing performance to the costs of acquisition and running new machines. Keeping up on all of the benefits of POWER6 and POWER7 compared to older technology that you might be running today can be a challenge. However, by creating an upgrade path that allows your organization to migrate when the time is right, IBM has made it easier to protect your investment.

Changing the padmin .profile

Edit: This should still be useful.

Originally posted October 13, 2009 on AIXchange

Recently I worked with a customer who was frustrated with the VIO server. When he logged in as padmin, he was faced with a $ prompt. When he ran oem_setup_env and became root, he had a # prompt. Whenever he switched users, he had to remember to run set -o vi to get access to his command history. He would log into two VIO servers and couldn’t easily keep track of the one he was working in. For all of the other machines in his environment, he had customized his .profile to his liking. He customized his prompt and the name on his title bar, and would automatically run set -o vi, etc. However, those oddball VIO servers weren’t playing nice.

I know that we’re supposed to treat the VIO server like an appliance. We’re supposed to set it up and let it run. But in this instance, we were still in the middle of setup. We were mapping LUNs to different vhost adapters for different clients, and we were going into oem_setup_env to install additional drivers.

Often I see customers set up their .profile with things like set -o vi, or customize their prompts with things like the user ID they’re logged in with, their current working directory, their hostname, etc.

While you can certainly make changes to the /home/padmin/.profile file, they will usually go away whenever you upgrade your VIO server. That’s fine; since we’re supposed to treat it like an appliance, I understand that we shouldn’t change things. In reality though, I continually find customers who alter the padmin .profile.

Even when customers change the .profile, those changes don’t carry over when running the oem_setup_env command. For the sake of usability, something had to give in this case. Whenever we ran the oem_setup_env command, we had to manually run /.profile. This customer was getting fed up.

After asking around for a more elegant solution, someone reminded me to just use a .kshrc file.

I was told to create a /home/padmin/.kshrc file. I put set -o vi and my other desired prompt settings into that file. Then at the end of the /home/padmin/.profile, I added:

export ENV=/home/padmin/.kshrc
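The .kshrc itself was nothing fancy. A minimal sketch looks like this (the prompt string is just an example, not the customer’s actual one):

# /home/padmin/.kshrc
set -o vi
HOST=$(hostname)
PS1='${LOGNAME}@${HOST}:${PWD} $ '

Because ksh evaluates ENV for every interactive shell, the same settings follow you into the root shell you get from oem_setup_env.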

This allowed the customer’s desired environment to be set up automatically when he logged in as padmin. Then when he switched over to root by running the oem_setup_env command, that was set up as he wanted it.

Yes, I realize that .profile and .kshrc will have to be revisited when this customer upgrades his VIO server. But in the meantime, this technique allowed us to customize the environment to his liking. It helped me keep a customer happy. Maybe it will make you happy as well.

With Virtualization, Nothing Compares to Power

Edit: The first link has been changed to a new location.

Originally posted October 6, 2009 on AIXchange

Although this is written about mainframes, I think that many of its arguments also hold water when discussing Power Systems servers.

From the article on mainframes:

“Joe Clabby, who leads the analyst firm Clabby Analytics, said the trend off mainframes and onto x86 servers running virtualization is actually a step backward.

“‘PCs are not mainframes, and VMware is nowhere near mainframes from an advanced virtualization and provisioning perspective,’ he said. ‘Mainframes are so much more advanced than VMware that it may take that customer 10 years to get what they have now in virtualization, automatic provisioning and workload management.

“Not surprisingly, mainframe kingpin IBM concurs. ‘It’s like comparing a Mini Cooper to a tractor-trailer truck. Sure, the Mini Cooper is more efficient, but if you are moving out of your house, which one is better to have? Which one will be able to carry that baby-grand piano? And how many trips would you need to do the same job with the Mini Cooper?’

“‘They’re buying into a broken model of computing known as distributed computing, where management costs are way out of control and where they’ll scale by adding more blades, racks or servers — all running at maybe 40 percent of utilization because they have to leave headroom for their application and database servers to handle peak workloads,’ Clabby said.

“‘Start adding up all the networking costs, the tons of people they need to manage the distributed environment they’re putting in place, the multiple security licenses, the business continuity measures they’ll have to put in place–they’re out of their minds if they think they’re going to save big money doing this,’ he added.”

Yes, you can run some software hypervisor and virtualize your x86 hardware. But can you combine dedicated and virtual adapters in the same partition the same way you can with Power Systems servers? Is your hypervisor running at the bare metal level, or is it running Linux code? Can you run micropartitions and share CPUs?

Once you look at all of the things that you can do with PowerVM, I don’t know why you’d want to bother with anything else.

The Case for Linux on Power

Edit: It is even easier now to run Linux on Power. The link to the Lx86 site no longer works. The link to the whitepaper no longer works. The link to the supercomputing site does not work, but I added a link to the ibm newsroom that looks like the origin of the information. The statement of direction link no longer works.

Originally posted September 29, 2009 on AIXchange

People occasionally ask me why they should run Linux on Power hardware. Their first inclination is typically to deploy Linux on some inexpensive commodity x86 hardware. They don’t even think about the benefits of running Linux on Power. They believe they’ll need to recompile their binaries to even get them to run. They assume the extra effort will outweigh the benefits.

As I’ve noted in “Lx86 Works as Advertised,” you can run unmodified x86 binaries on top of Linux on Power by using Lx86. From IBM’s PowerVM Lx86 for x86 Linux applications Web site:

“PowerVM Lx86 supports the installation and running of most 32-bit x86 Linux applications on any System p or BladeCenter model with POWER6, POWER5+ or POWER5 processors, or IBM Power Architecture technology-based IBM BladeCenter blade servers. It creates an x86 Linux application environment running on POWER processor-based systems by dynamically translating x86 instructions to Power Architecture instructions and caching them to enhance performance, as well as mapping x86 Linux system calls to Linux on POWER system calls. Best of all, no native porting or application upgrade is required for running most x86 Linux applications.”

Recently I was at a session where Linux on Power was discussed. And a few benefits of running Linux on Power were raised that I think are worth noting.

  1. IBM PowerVM virtualization technology provides the option of dynamically optimizing the mix of processor, memory, disk and network resources. This means that I can add just another LPAR to my frame to handle that Linux workload. Instead of going out and buying and managing another server, I just give it the necessary LUNs, shared Ethernet, memory and CPU allocation, and away I go.
  2. Reliability matters. The IBM whitepaper “IBM Power Platform Reliability, Availability and Serviceability,” is well worth reading. I’ll highlight this point: “The base reliability of a computing system is, at its most fundamental level, dependent upon the intrinsic failure rates of the components that comprise it. Very simply, highly reliable servers are built with highly reliable components.” Do you want your Linux machines running on commodity hardware, or do you want them running on reliable hardware?
  3. Device drivers don’t matter so much. Often you may find that there are no drivers for your hardware. However, this is less of an issue with Linux on Power, especially in a virtualized environment. Using a VIO server, how you get to the storage or the network isn’t a concern; your VIO server handles that. You’re just using a virtual adapter in your client LPAR. This allows you to use enterprise hardware without waiting for a device driver.
  4. Power is less expensive overall. This is highlighted in an article on supercomputingonline.com titled, “IBM Power Systems Again Trump Latest Intel Offering, Highlights Total Systems Value.” From the article: “According to an International Technology Group (ITG) report, three-year costs for use of Power servers ranged from 25-33 percent less than those built around commodity x86 servers in an SAP solution-based environment. Conversely, the three-year cost for the use of x86 servers was 48 percent higher than for Power servers in the retail industry, 49 percent higher in manufacturing, and 33 percent higher in energy. Costs outlined in the report include hardware acquisition and maintenance, systems software licenses and support, including operating systems and virtualization tools, personnel and facilities.”


Obviously in some cases, running Linux natively on x86 hardware is the best choice. However, instead of dismissing the idea out of hand, take the time to research whether it might make sense to add a few LPARs to an existing enterprise frame.

On an unrelated note, check out this recent IBM statement of direction:

“IBM plans to provide an upgrade path from the current IBM Power 595 server with 12X I/O to IBM’s next-generation POWER7 processor-based high-end server. The upgrade is planned as a simple replacement of the processor books and two system controllers with new POWER7 components, within the existing system frame. IBM also plans to provide an upgrade path from the current IBM Power 570 server with 12X I/O to IBM’s next generation POWER7 processor-based modular enterprise server.”

This is good news if you’ve been debating whether to order a POWER6 server now or wait until POWER7 comes out. Now you can have the best of both–POWER6 performance now, and POWER7 when it ships.

Finding a Hidden Gem

Edit: AIX is still a hidden gem, and people still need access to test machines.

Originally posted September 14, 2009 on AIXchange

While wandering around Chicago recently, I discovered a little museum along the Chicago River. Admission to the McCormick Tribune Bridgehouse & Chicago River Museum was only $3, so I figured, “why not?” I went down the stairs (the museum is below street level) and found myself face to face with the gears that are used to raise and lower the Michigan Avenue Bridge. Then, as I climbed some steps, I found myself standing inside the Bridgehouse. From the two-story structure I could look out the windows and watch the tourists go by.

The place was air conditioned. The staffers were friendly. The restrooms were clean. And I was the only visitor. It is a relatively new museum, and no one seems to know about it yet.

From the website:

“The McCormick Tribune Bridgehouse & Chicago River Museum is located in the southwest bridge tower of the Michigan Avenue Bridge. A charming space with stunning views… The historic bridge tower is itself a relic of another time, when bridge tenders operated the opening and closing of the bridge from the narrow tower. Visitors to the Bridgehouse Museum are treated to a rare look at the interworkings of the bridge that is lifted by two 100 horse powered engines.”

I wanted to run out and let greater Chicago know what it was missing. The staffers told me how they’d occasionally head to the street level and hand out fliers, but most people would keep walking. They weren’t interested in learning more about the bridges or the river. They preferred taking bus and boat tours.

This little museum actually makes me think of AIX. Both are cool, and both are sometimes overlooked by the world around them.

Consider other UNIX systems. I work with Linux. I’ve always argued that what makes Linux easy to learn is that it’s easy to get. Nearly everyone has access to an x86 machine, and you can download and play around with any of several distributions.

And before there was the ubiquitous Linux, there was Solaris. Back when I was in school, Solaris was running all over in universities, so students used it. And because they had access, they could easily make the transition to using Solaris in the business world.

In contrast, getting your hands on Power Systems hardware and copies of AIX can be difficult if you don’t already work for a company that has it in-house. It’s a vicious circle. People aren’t exposed to AIX, so they don’t know much about it. Then, when they do get to touch it, it’s foreign to them.

Without the hardware at home, and without access to the test lab or sandbox machines, becoming proficient with AIX can be a challenge. Sure, employers can send folks to formal classes, and I encourage that. But once classes conclude, IT pros still need access to test hardware to continue to learn.

Even the high quality of AIX and Power hardware, in a sense, keeps users away from the systems. At many sites the systems are installed by a consultant, and they run with little care or feeding. So people become afraid to touch them, since no one wants to be responsible for causing an outage by fiddling with an enterprise application running on enterprise hardware.

On the other hand, people who do get to know AIX really, really like it, as I blogged about in “AIX Tops in ITIC User Survey.” And efforts like the IBM Academic Initiative help bridge this gap for educators and students.

I guess most tourists in Chicago will continue to flock to the popular and well-known bus and boat rides. That’s fine.

As for AIX, I’ll continue to beat this drum and tell people to find out what they’re missing. At least in the computer world, we AIX users appreciate our hidden gem.

IVM to HMC Migrations

Edit: I cannot imagine anyone needs to do this now, but you never know what is still running out there all these years later.

Originally posted September 8, 2009 on AIXchange

Over lunch recently, a customer told me about the frustrations he incurred while trying to migrate his server from the Integrated Virtualization Manager (IVM) environment to the Hardware Management Console (HMC) environment.

From the IBM Redpaper, “Integrated Virtualization Manager on IBM System p5”:

Section 5.2.4: “There is no officially announced procedure to migrate from an IVM environment to the HMC. If an HMC is connected to the system, the IVM interface will be disabled immediately, effectively making it just a Virtual I/O Server partition. The managed system goes into recovery mode. After recovery completes, the HMC shows all of the LPARs without a profile. You have to create one profile for each LPAR.”

So it can be done, as I figured it could be. In an effort to find specifics, I e-mailed a few people I thought might know how to do this. IBMer Janel Barfield, who has worked with and given training on POWER virtualization technology, sent me the reply that follows. I’ve not had time to test this, so while I expect it would work, I don’t know for sure that it will. Perhaps a reader might be willing to try it out and report back. If not, I’ll fire up a test box and give it a try.

Before connecting your HMC to your server, log into the IVM and back up your profile data. Start at the Service Management menu. Select Backup/Restore and then, when the Backup/Restore page is displayed, select Generate Backup. Alternatively, you can do this from the Virtual I/O Server by entering this command:

bkprofdata -o backup -f /home/padmin/profile.bak

Source: IBM PowerVM Virtualization Managing and Monitoring Redbooks publication, section 5.2.3.

You may have to store the profile backup file on a CD to be able to restore it on the HMC.

From the HMC, restore the profile data from an ssh session. Load the diskette or CD containing your configuration data into your HMC drive. Then enter this command:

migrcfg -t 1 -m [system-name] -f [filename]

Source: Appendix C in the Hardware Management Console V7 Handbook.

Why might this procedure be useful? If you wish to try out Power servers but don’t know if you’ll need an HMC, you can just use the IVM on a standalone machine. However, once your environment grows and you add more Power servers, then having an HMC manage them all might make more sense. So in the event you eventually decide to migrate from the IVM to the HMC, backup files save you the hassle of having to manually recreate the partition profiles from the GUI.

In the same vein, hopefully you saw the IBM Systems Magazine case study on a customer that used scripting in the creation of LPARs.

From the article:

“Now, thanks to VIO and some crafty coding, the company can re-create all of its LPARs at the same time.

“It’s Fortman’s scripts that do much of the heavy lifting, especially when it comes to automatically building the VIO client LPARs on the HMC. For example, in TTI’s production environment, the scripts query and extract all VIO client LPAR information. They then transform that information so each LPAR can then be rebuilt from the command line of the HMC.” You can see an example of Fortman’s scripting.

“Fortman adds: ‘We basically extract all of the information about our environment on a daily basis and turn that into executable commands that we send off-site with all of our other DR information. Then, at the DR site, we pull out these scripts and use them to auto-build our entire VIO client LPAR infrastructure.’

“This is in contrast to using the HMC Web interface, which requires users to manually navigate through this process. In the case of TTI, someone can remotely run the scripts in the HMC and have all of the LPARs automatically populated based on the information included in the scripts.”

In each case, the goal is to try to avoid having to use the GUI to recreate the profile information. And, in both cases, it appears we can do just that.

Active Memory Sharing Takes the Guesswork out of Partitioning

Edit: The links seem to work for now. I think this is my first reference to seeing a link from twitter that I shared with readers, I do that all the time these days.

Originally posted September 1, 2009 on AIXchange

Like I wrote in AIXchange a few weeks ago, I was minding my own business when someone tweeted a link to a new IBM developerWorks article by Chris Gibson, “Configuring Active Memory Sharing: From a customer’s experience.”

I’d seen the movies that Nigel Griffiths had done based on this topic:

http://download.boulder.ibm.com/ibmdl/pub/systems/power/community/aix/AIX_Movies/AMS_RegularPaging.wmv

http://download.boulder.ibm.com/ibmdl/pub/systems/power/community/aix/AIX_Movies/AMS_Concepts.wmv

http://download.boulder.ibm.com/ibmdl/pub/systems/power/community/aix/AIX_Movies/AMS_Setup2.wmv

http://download.boulder.ibm.com/ibmdl/pub/systems/power/community/aix/AIX_Movies/AMS_Monitor2.wmv

I’d also read the IBM Redpaper, “PowerVM Virtualization Active Memory Sharing.”

Chris’ article, though, is yet another great source of information about this additional functionality. PowerVM Active Memory Sharing allows you to virtualize memory on your frame. You can overcommit physical memory and allow the memory to flow between logical partitions and paging space.

Virtualized memory operations aren’t as instantaneous as virtualizing your CPU. While your micro-partitioned CPU makes changes on a millisecond basis, memory sharing uses paging space, so it’s obviously orders of magnitude slower as it reads and writes to disk. So if you have several logical partitions that are all starved for real memory at the same time, this isn’t for you. However, it is great if your LPARs don’t simultaneously demand memory. When you create an LPAR and allocate some amount of memory, it’s possible, even likely, to guess wrong. Allocate too much memory and it goes to waste. Allocate too little and you might run into performance issues. With Active Memory Sharing, however, the system determines which LPAR actually needs the memory. This makes more efficient use of your resources.
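As a side note, once an LPAR is actually running from a shared memory pool, you can confirm it from inside AIX. If I’m remembering the output correctly, lparstat reports the memory mode on AMS-capable levels:

lparstat -i | grep -i "memory mode"

On a shared memory partition this should come back as Shared rather than Dedicated.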

From the Redpaper:

“Active Memory Sharing allows overcommitment of memory resources. Since logical memory is mapped to physical memory depending on logical partitions’ memory demand, the sum of all logical partitions’ logical memory can exceed the shared memory pool’s size. Each logical partition is allowed to use all assigned logical memory. When the cumulative usage of physical memory reaches the pool’s size, the hypervisor can transparently steal memory from a shared memory partition and assign it to another shared memory partition. If the removed memory page contains data, it is stored on a paging device and the memory page content is cleared before it is assigned to another logical partition. If the newly assigned page contained data, it is restored from the disk device. Since paging disk activity has a cost in terms of logical memory access time, the hypervisor keeps track of memory usage to steal memory that will likely not be used in the near future. The shared memory partition’s operating system cooperates with the hypervisor by providing hints about page usage and by freeing memory pages to limit hypervisor paging.”

Read Chris’ article and the Redpaper, and watch the movies. Think about the different ways that this functionality can be useful in your environment. Look at the list of requirements.

Again, from the Redpaper:

“In order to use the Active Memory Sharing feature of IBM PowerVM, the following are the minimum requirements:

“An IBM Power System server based on the POWER6 processor
Enterprise PowerVM activation
Firmware level 340_075
HMC version 7.3.4 service pack 2 (V7R3.4.0M2) for HMC managed systems
Virtual I/O Server Version 2.1.0.1-FP21 for both HMC and IVM managed systems
AIX 6.1 TL 3

“Shared memory partitions can be created on a system as soon as the shared memory pool has been defined. In order to be defined as a shared memory partition, the logical partition must meet the following requirements:

  • Use shared processors.
  • Use virtual I/O, including: virtual Ethernet adapters, virtual SCSI adapters, virtual Fibre Channel adapters, virtual serial adapters

The operating system running in the logical partition can be either AIX, IBM i, or Linux.”

Start planning now for any necessary upgrades so that you can take advantage of this new way to manage your machines.

Revisiting IBM Systems Director

Edit: They tried. The workshop link is not active. The schedule / ibmquicklinks link is not active. The links to download and install are no longer active.

Originally posted August 25, 2009 on AIXchange

As complexity increases, we find that we need better tools to manage our machines. If we’re only responsible for one or two machines, then logging in and leaving topas or vmstat running in a few xterm windows might make sense. Managing these smaller environments probably isn’t very challenging. However, if you’re dealing with many different physical and virtual machines, logical partitions, the HMC, VIO servers, etc., management and administration can become much more complex and cumbersome. Then add to the puzzle the physical x86 servers running VMware and providing the opportunities to run vmotion or live partition mobility. Now you need to track where LPARs are physically running.

IBM’s solution for dealing with these challenges is IBM Systems Director.

I recently attended an IBM Systems Director workshop where I listened to lectures and participated in hands-on lab exercises with Director 6.1. I had seen 6.1 before, and I’ve read announcements and articles about it, but I hadn’t spent much time working in depth with it. If your only exposure to Systems Director is the 5.2 version, check into this workshop (link not active). It can help you see the benefits of using IBM Systems Director 6.1 to manage your computing environment. As IBM says, this is a workshop, not a class.

This is from the Web site:

“IBM is presenting these workshops to give participants the information needed to be successful in the implementation, use and maintenance of an IBM Systems Director management environment. This includes the use of IBM systems management tools and utilities including IBM Systems Director.”

Go view the schedule and see if a workshop is headed to your area. It’s well worth a moment of your time.

More from IBM:

“The workshop will include discussions, presentations, videos, demonstrations and hands-on activities. Topics will vary based on time considerations and attendee interest. There is no cost for participation in the workshop.”

In my workshop we had access to a laptop loaded with VMware. It had a total of three virtual machines preinstalled: one ran Systems Director, another a Windows operating system with a Systems Director agent running on it, and the third ran agentless. Among other tasks, we were able to log on and run discovery to find devices, request access to machines so that Systems Director could log in, run inventory on the systems to get more information about them, set up alerts based on criteria that we were interested in, manage updates for the different systems, remotely install Systems Director agents on systems we wanted to manage, set up Systems Director user access, etc.

If you can’t get to a workshop yourself, you can still browse various resources devoted to System Director, including frequently asked questions and links to videos and the CD that are handed out during the workshops.

Go here and here to download and install the actual IBM Systems Director product.

Here’s IBMer Greg Hintermeister describing the installation process in an article (link not active) he wrote earlier this year:

“IBM Systems Director integrates the embedded AIX console tasks for extending AIX operating system management. From the Power Systems Management summary page, select ‘AIX systems’ under the operating systems category. From here, right-click an operating system to select the AIX web console to launch. One new task in the AIX console is the health task, which shows system configuration values and graphs of key performance metrics, as well as the top processes and file systems in use.”

I’d heard for some time that IBM Systems Director 6.1 is new and improved compared to the 5.2 version. Now I know, thanks to the workshop. I encourage you to look more closely at the product–download it, install it, use it. Then post your own thoughts in Comments.

AIX Tops in ITIC User Survey

Edit: The developerworks and VUG links no longer work, but the survey information is still there.

Originally posted August 18, 2009 on AIXchange

To the surprise of some, users think highly of AIX. This recent IT Business Edge blog post cites an annual ITIC survey on enterprise server platforms.

“IBM AIX won in virtually every major category, from uptime, to time to patch, in every type of problem report. It appears that IBM missed the memo that UNIX was dead.”

Blogger Rob Enderle, president of Enderle Group, a technology advisory firm, goes on to say:

“It is amazing how often something we think of as dead seems to come back. Then, like the old Monty Python movie ‘Holy Grail’ and the ‘bringing out the dead’ scene, it not only isn’t dead but it seems to be feeling more alive every moment. This was true of the mainframe and now it appears to be true of AIX, which did surprisingly well.”

Even more striking about the results–this is the second straight year that AIX topped the field in this survey.

This doesn’t surprise me, since I use AIX every day. Of course, not everyone shares my feelings.

On Twitter I see comments like: “HP-UX isn’t so bad… I used to manage HP-UX, AIX, Solaris, Dynix, Redhat and SCO… AIX was my least favorite.”

Or: “Fixing Windows Itanium and IBM AIX issues in our product. Wish people stopped using those platforms.”

Even in this economy, I continually see job openings for people with AIX skills being posted to job sites and Twitter, and I don’t think this is likely to change in the near term.

I’m convinced that the people who are tweeting about how much they hate AIX haven’t actually spent much time using it. Yes, it differs from other operating systems. And yes, there is a learning curve — though many avenues are available to anyone seeking more AIX knowledge (starting with the IBM Redbooks site, the IBM developerWorks site and the IBM AIX Virtual User Group site). But in my experience, once people learn about all the capabilities of AIX, they tend to love it.

At least AIX is getting some positive recognition. I’m especially glad to see it being recognized for things like uptime and time to patch. Hopefully with reports like this one, word will continue to spread.

Confessions of a Model M Bigot

Edit: Still one of my favorite topics. I am still a Model M Bigot. Mechanical keyboards seem to have made a bit of a comeback. The PC world link no longer works. The article talking about why Model M went away no longer works.

Originally posted August 11, 2009 on AIXchange

Some years ago I took a job that required a move across the country. I was going to live in the new area for a few months while I waited for my house to sell. Once it sold, my family would join me. I brought clothes, a laptop and other essentials. I also took along the IBM Model M keyboard–and not just one. I brought one for the office, and one for the apartment I was going to stay in.

A while back I saw an article that mentioned the Model M. Of course, googling “Model M” will lead you to many other pages dedicated to this keyboard. Search for Model M, and then try to search for whatever keyboard model you use today. I wonder if you’ll find as many devoted users. Ask yourself if your current keyboard–the one you use for hours every day–fills anyone with as much passion as people show for the old Model M.

I ended up working in a cube farm for a while, and many of my cube-dwelling neighbors didn’t appreciate the amount of noise that came from my keyboard. Unlike today’s whisper-quiet keyboards, the Model Ms announced their presence. Still, I say you can take your DVORAK, ergonomically sound, cheap throwaway free keyboards that come with your new PC and throw them all away. Make some noise and use a real keyboard.

There was a time when I remember the Model M being the only keyboard out there; it shipped with IBM PCs and I saw them all over. According to this article, price pressures forced manufacturers to look for ways to cut costs. Bundling lower cost keyboards with machines made sense.

For those of you who are younger, you probably did not see widespread Model M deployments, and have quite possibly just never seen one before. I understand that you just don’t know any better. But to anyone who started out on the Model M and moved to something else, I can only ask, “What were you thinking?”

As I said, when you type on a Model M, you know you’re typing. Those around you know you’re typing. There’s no question that typing is being done.

The Model M has a tactile feel that I haven’t seen duplicated on any other keyboard. And they’re indestructible. I’ve used the same Model M for many years, and I expect my fingers will succumb to arthritis before this keyboard gives out.

Model Ms have PS/2 connectors, which makes it harder to connect them to newer machines with USB connections for their keyboard and mouse. However, getting a PS/2-to-USB adapter resolves that issue, and it’s well worth the effort. Just be sure that you get an active converter as opposed to a passive adapter, as is pointed out on clickykeyboards.com. From Wikipedia:

“Most fans of the Model M especially prize its feel and sound. Unlike the common (but cheaper) dome switch design, the Model M’s buckling spring design gives users obvious tactile (a distinctive resistance as the keys are depressed) and aural (a characteristic, loud “click-clack”) feedback. Many users report that they can type faster and more accurately on the Model M than other keyboards.

“In addition, the Model M keyboard is less susceptible to dirt and wear and tear; while dirt will interfere with proper operation of a dome switch keyboard, the design of a buckling spring keyboard is such that any dirt that falls between the cracks usually fails to make it into the spring mechanism. Failure of the mechanism to operate properly would require a large amount of accumulation, which is unlikely to occur.

“There are some drawbacks to the Model M design. Because the keyboard is so heavy, it is not as portable as many modern keyboards. The keys are noisy enough to be inappropriate in a location (such as a public library) where noise is an issue. Also, liquids spilled on the keyboard would not drain out, and would remain in the keyboard with potential to cause a short circuit.”

If you’d like to get your own Model M, try clickykeyboards.com or pckeyboard.com. And for the ultimate keyboard with a trackball and a pointing stick, you might like the On the Ball models, which are sold on clickykeyboards.com.

You could argue that the prices are high. However, understand that once you buy one, you’ll be able to use it for many years to come.

If I need to spend any amount of time working on a machine, I will plan on connecting a Model M keyboard to it. If you must sit near me, I apologize in advance for the noise that will come from my keyboard.

The Cost of Unprotected Data

Edit: This is still an interesting dilemma. Maybe more organizations would just be more cloud centric and not worry about any of it these days.

Originally posted August 4, 2009 on AIXchange

I was recently talking with someone who works for a non-profit organization. He was tasked with being the AIX administrator. He didn’t necessarily want those extra duties. He didn’t have a background in systems administration, and he had no training or real inclination to do the job. Still, he did his best to administer the non-profit’s machines on top of his normal duties.

Most of us who’ve worked on computers for a while understand that backups are critical. If we haven’t experienced the horror stories first-hand, we’ve certainly heard them. We know that the capability to restore your system–on different hardware that, preferably, is in a different location–is essential.

The non-profit stored its mksysb and savevg images on another machine. However, the organization ran out of disk space on this secondary machine and decided it would recover some disk space by moving the backup images to the production machine. In short, they gambled that nothing would go wrong with the production machine, and they lost everything. Disaster struck on that primary machine. The filesystems were erased, and the data and backups vanished. No offsite tapes. No way to recover the data.

All the while, those in charge at the non-profit maintained that there was no budget to hire a system administrator. They were doing their best.

I know of another company that figured, because it had primary and backup sites, it didn’t need to worry about backups. The problem was, due to the way the machines were configured, when data was deleted at the primary site, it was deleted at the backup site as well. Redundant machines are great, but you must have a way to access older data if your current data becomes lost or corrupted.

I don’t expect people juggling sysadmin duties with a “regular” job to get all of these issues. Hopefully though, the importance of backups is universally understood. Data must be protected.

Along those lines, backups are useless if they’ve never been tested. Be sure to verify that you can restore your data, however you choose to back it up.
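Even a quick sanity check beats nothing. For mksysb and savevg images, something along these lines will at least confirm that an image exists and is readable (the paths are just examples):

mksysb -i /backup/rootvg.mksysb
savevg -i -f /backup/datavg.savevg datavg
lsmksysb -l -f /backup/rootvg.mksysb
listvgbackup -l -f /backup/datavg.savevg

The real test, of course, is periodically restoring to spare hardware or a test LPAR and making sure the data is actually usable.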

I keep wondering, what’s the solution for organizations that cannot afford administrators? Do we ask consultants with experience to step in, pro bono, or at least at a reduced rate, to help non-profits manage their machines?

If nothing else, it might be nice for these part-time administrators to have someone that they can turn to when they have questions. These folks can benefit from Internet forums, IRC channels etc., although it can be hard for seasoned professionals to give strangers more than general advice about machines in which they have no vested interest.

In this case, had the non-profit asked whether moving these images back to the primary machine was a good idea, they would have received a resounding “no,” and disaster could have been averted. To organizations that claim that they cannot afford proper help, I have to wonder about the value of that lost data. I’m willing to bet that trying to recreate data–or being forced to live without it–is far more expensive than finding the appropriate assistance in the first place.

Twitter: A Toy and a Tool

Edit: Pretty crazy to see how this evolved, and who ended up using it, and for what purposes.

Originally posted July 28, 2009 on AIXchange 

You may not be able to say much in only 140 characters, but you can still do plenty via Twitter.

I’m a late adopter when it comes to social networking. I dabble with LinkedIn but continue to resist MySpace and Facebook. When Twitter first launched, I went to the website and wondered, “What am I going to do with this?” But since I started “tweeting” and, specifically, running TweetDeck, I’ve found ways to make Twitter useful for me.

For those similarly wary of social networking, here’s the rundown on Twitter, courtesy of Wikipedia:

“Twitter is a free social networking and micro-blogging service that enables its users to send and read each others’ updates, known as tweets. Tweets are text-based posts of up to 140 characters, displayed on the author’s profile page and delivered to other users — known as followers — who have subscribed to them. Senders can restrict delivery to those in their circle of friends or, by default, allow open access. Users can send and receive tweets via the Twitter Web site, Short Message Service (SMS) or external applications.”

Through Twitter, I’ve watched people from all walks of life as they broadcast their thoughts and ideas. Many link to stories they find on Slashdot or Digg, though I don’t follow these folks for long, as they aren’t providing anything new or valuable to me.

I’ve followed companies. At least some of them seem to recognize Twitter’s potential as a customer service tool. I watched employees from an airline help stranded travelers and inform customers about new cities or services that they’re adding. I also watched Tony Hawk hide skateboards and tell people where to find them. And of course, in the aftermath of the recent Iran elections, Twitter became the primary means for Iranians to tell the world about the protests and violence.

I agree with those who say there are more followers than tweeters. In my case, I don’t spend much time tweeting. I figure people don’t care about what I’m eating or that I’m stuck in traffic. What I find powerful about Twitter is the ability to search.

Numerous Twitter-related clients are available for the PC or your smartphone. With TweetDeck, I can continuously run searches for things that interest me (AIX, POWER6, IBM, etc.). Whenever a tweeter comments on these topics, I’ll see it. Often I’ll check the person’s profile and recent tweets. If they interest me, I’ll follow the tweeter. Over time you’ll find information that you might not have otherwise uncovered.

Google helped me find ways to automate sending Twitter posts from the command line using “curl” and “split” on a Linux machine. To test the functionality, I downloaded information from different Web sites and used the split command to divide it into several files. Then I published them on Twitter. I was doing things like downloading weather forecasts or RSS feeds, splitting them into 140-character chunks, and tweeting them. People more creative than I am could find other uses for splitting data using cron and simple text parsing.

I’d read that someone posted the text of “Moby Dick” to Twitter. I wanted to do something similar, but on a much smaller scale, so I found the text of Dr. Seuss’s “The Cat in the Hat” and tweeted that file.

First I split it into 140-character chunks:

split -b 140 catinhat.txt cathat

This created files in my directory with names like cathataa, cathatab, cathatac, etc. To send them out via Twitter, I just ran:

for i in a b c d e f g h i j k l m n o p q r s t u v w x y; do curl -u
username:password -d "status=`cat cathata$i`"
http://twitter.com/statuses/update.xml; done

Where the username and password were obviously real ones.

While for me this was simply an exercise in automatically splitting and posting large files, anyone following me on Twitter would have been notified of my activity. And anyone searching for the text of “The Cat in the Hat” would have been directed to my Twitter page. Granted, using this service to tweet “Moby Dick” and “The Cat in the Hat” is silly, but as I said, more and more people will find interesting things to do with Twitter.

Some argue that Twitter is a fad, and that it will soon be replaced by something else. That may be true. Nothing lasts forever. For now I’ll use it until the next big thing comes along. Then I’ll migrate to that next big thing–in due time, of course. I am a late adopter, after all.

Learn by Studying, Then by Doing

Edit: Edited some links to Tom’s articles and the location of nstress.

Originally posted July 20, 2009 on AIXchange

As I noted in last week’s AIXchange blog entry, Ken Milberg has a soon-to-be-released book on understanding AIX performance. Blogging about Ken’s book got me thinking about performance, and how challenging it is for AIX administrators to become well-versed in this area.

In 2003 I worked for IBM in Boulder, Colo. We were invited to attend an onsite class taught by Tom Farwell. These AIX performance classes were going to be held twice a week over five weeks, 10 classes in total. Intrigued by the subject matter, I attended the first class. It felt like there were a hundred people in that conference room. But by the end of the course, only a handful of administrators remained. I often wonder about that. AIX performance is critical for an administrator to understand, yet it can be a challenging topic to master. Some, I imagine, dropped out due to a heavy workload or commitments outside of work. Others didn’t want to put forth the effort necessary to understand the concepts, or felt there was too much detail that they didn’t need to understand.

Later that year at IBM Technical University in Miami, Tom was a presenter at some of the sessions. I remember sitting in the crowd and being struck by the huge amount of interest from the attendees. During breaks people would bring their laptops to the front and ask Tom to take a look (by virtue of their wireless connections) at their production systems that were having issues. Simply by having the admins run a few commands, Tom quickly deduced what was wrong in many cases.

He was the first person I heard apply queueing theory principles to AIX performance, and Tom has authored several articles in IBM Systems Magazine that are worth reading. Some samples are “Examining vmstat” and “It’s All in the Networking.” A Google search should yield several others.

In addition to Tom’s writing, the “AIX 5L Practical Performance Tools and Tuning Guide” Redbooks publication and other performance-themed IBM Redbooks publications can help you gain knowledge on this topic.

Once you’ve done your reading and studying, get a test machine to practice on. Start by running commands from nstress as explained on IBM developerWorks, including ncpu, ndisk, nfile and nmem to simulate different types of workloads on your machine.
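
To give you a feel for what a practice session might look like, here is a minimal sketch. The flags are the ones I remember the nstress tools accepting, so treat them as assumptions and check each tool’s usage output (run it with no arguments) before you rely on them:

ncpu -p 4 -s 300                  # assumed flags: burn CPU with 4 processes for 300 seconds
nmem -m 512 -s 300                # assumed flags: exercise roughly 512 MB of memory for 300 seconds
ndisk64 -f /testfs/bigfile -R -r 80 -b 4k -M 8 -t 300
                                  # assumed flags: random I/O, 80 percent reads, 4k blocks, 8 processes

While a test runs, watch the box from another session with vmstat 5, iostat 5 or nmon so you learn what the numbers look like under load.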

Spend the time with the AIX tools and commands that can help you pinpoint the causes of system problems and show you how the output appears when the machine is under load.

Better still, get your actual application running and find ways to break your test box. Troubleshooting a live, misbehaving machine is the best way I know to prove you really understand how to diagnose and fix problems.

Get the knowledge from the books, and then get the hands-on experience from the machines. Both acts will help you understand AIX performance–and, accordingly, allow you to become better at your job.

‘Driving the Power of AIX’

Edit: powertco.com link is not active. Updated the author page. The developerworks page no longer works.

Originally posted July 14, 2009 on AIXchange

I’ve known Ken Milberg for about three years. You may know him from the numerous articles he’s authored for IBM Systems Magazine. Ken is the president of PowerTCO, a New York-based IBM business partner. He’s also the founder and leader of the NY Metro Power AIX/Linux User Group.

http://ibmsystemsmag.com/authors/ken-milberg/

http://www.powertco.com/ (link not active)

http://www.ibm.com/developerworks/wikis/pages/viewpage.action?pageId=104533263

In a recent promotional e-mail for his user group, Ken mentioned his book, “Driving the Power of AIX.” If you’ve read this blog for any length of time, you should know that I’m keenly interested in AIX performance. It’s critical for administrators to understand this area, yet it can be challenging to master. Some admins don’t understand basic performance concepts, while others fail to grasp important details.

It’s funny — some of the problem stems from the excellent products we use. Our machines run great most of the time, which is wonderful. As a result though, it’s easy to get out of practice when it comes to diagnosing machines that are behaving badly. Plus, it’s easy to call IBM Support. These experts deal with misbehaving systems day in and day out, so they can often uncover problems more quickly than we do. It’s an awfully tempting crutch to use.

But back to the book. Ken sent me a self-published copy. This version — while not identical to the one that MC Press is releasing later this year — gives me a good feel for what’s coming.
   
Just scanning the table of contents, I can see the interesting topics that Ken will cover in the book, including his tuning methodology and philosophy, and tuning CPU, memory, disk I/O and network performance. The closing sections are specifically devoted to tuning Oracle and Linux on Power machines. In spelling out the differences between tuning AIX 5 and AIX 6 systems, he notes that the AIX 6 defaults are much more reasonable for the typical AIX workload compared to earlier releases.

Whether Ken’s discussing the AIX Virtual Memory Manager, persistent and working memory, why cio is a good choice for your file system mount option or why he prefers nmon to topas, he makes it understandable to readers. Each chapter concludes with a quiz and a summary section so you can digest what you’ve just read. Don’t skip over the chapter summaries — I found several valuable tidbits in there. I also enjoyed his anecdotes throughout the text. Clearly, Ken’s been in the trenches tuning real-world production machines.

Useful command names and options are listed throughout the book, though I would like to have seen Ken offer even more examples of how these commands are used. Nevertheless, the version of the book that I read is a useful addition to my library, and I plan on getting Ken’s new edition when it’s released.

Incidentally, if you’re going to be in New York City on Aug. 6, you may want to drop by the NY Metro Power AIX/Linux User Group meeting. Ken’s doing a technical presentation based on his upcoming book, and he’ll distribute free copies. It’s part of an IBM technology book fair hosted by MC Press.

And if you can’t get to the meeting, check out a local user group, or a virtual one. As I’ve said before, user groups are a valuable resource for IT professionals.

https://robmcnelly.com/user-groups-still-going-and-still-worth-your-time/

IBM Hardware Fits in the Big Picture

Edit: Still seems to be a reasonable discussion to have.

Originally posted July 7, 2009 on AIXchange

I’ve worked for numerous companies that were IBM customers. For several years I worked for IBM in Boulder, Colo. I currently work for an IBM business partner. Through all this time, I’ve been happily using IBM racks with IBM servers. Sure, there were other servers running other operating systems that used different racks in the various computer rooms, but the IBM servers all lived in IBM racks.

I was recently on a raised floor at a server co-location company and watched IBM customer engineers move a POWER5 Model 570 from an IBM rack to a non-IBM rack. To install the server, they had to carefully measure its required depth. To make the flex cable on the front of the machine fit into the rack, they had to modify the side of the rack due to some protruding metal on the side doors that was interfering with the flex cable. Obviously, the whole operation took longer than a regular rack install into a standard IBM rack would have taken.

While they got the server to work in the non-IBM rack, it wasn’t an ideal situation. The power distribution units in the back of the rack were rotated compared to the IBM rack, and the plugs were oriented so that when the servers were plugged in, power supplies and PCI cards couldn’t be swapped without removing the entire PDU first.

So why go to this trouble? These non-IBM racks are taller and have a smaller footprint than the previously used IBM racks. When you’re paying by the square foot, you want to squeeze in as much equipment as you can. By using extra space at the top of the rack, the co-locator could fit more equipment into its floor space. I can certainly understand that with several taller, narrower racks, you can wedge more racks into a row of hardware, and that can translate into more revenue for the operators of the shared environment. (This might not matter so much in a customer-owned data center, but in this shared environment, it’s critical.)

One of the facility managers said that IBM just doesn’t get co-location. He argues that if every other vendor can make its equipment fit into these racks, why can’t IBM?

My take is that IBM engineering designs IBM racks with IBM hardware in mind. They’re rugged. They’re built to fit without modification, and they’re built so that parts can be easily hot-swapped in and out of the machines. If IBM racks don’t fit nicely into your cookie cutter server room that was designed with non-IBM servers and racks in mind, if they’re wider than other enclosures, then maybe it’s time to rethink the design of the room. The problem as I see it is that many customers think that everything is the size of an x86 machine. However, they should plan for equipment of other sizes. It’s not necessary to redesign the computer room–but maybe designating an aisle for nonstandard-sized racks would give them some flexibility.

The thing is, the greater serviceability of IBM solutions is well worth the sacrifice of some floor space. IBM engineering designed and built its racks with IBM servers in mind. Using PowerVM and IBM Power Systems servers, you can consolidate multiple large workloads onto a much smaller footprint. And isn’t creating the smallest footprint the goal here? Yes, IBM racks might take up an extra square foot or two compared to non-IBM racks, but this should be easily offset by an overall reduction in server hardware.

I believe IBM keenly understands co-location. They’ve given me hardware that I can easily carve into multiple LPARs per physical frame, which allows me to consolidate workloads. The technology is very forgiving and flexible–as workloads grow and change, it’s very easy to use a dynamic LPAR operation and make the adjustment.

Ultimately of course, the customer is always right. So if you need to make your server fit into a smaller footprint on your raised floor, you can and should do so. But first, reconsider what you’re giving up by making this change. Saving physical space doesn’t always equate to saving money.

SSD is Something to See

Edit: I cannot believe that the demo video is still available for download, and the index still works! Inconceivable. It’s also amazing how cheap commercial SSDs are; I try to run them in everything these days, and would not go back to spinning rust for anything.

Originally posted June 30, 2009 on AIXchange

Nigel Griffiths has another great demonstration video, this one a discussion of the recently announced SSD drives. I’ll give you some highlights here, but watching the entire clip is well worth the 22 minutes.

Nigel mentions that these drives evenly spread data throughout the memory cells on the device, and all of the cells can be accessed at once. He also states that you shouldn’t attempt to defragment an SSD because doing so unnecessarily consumes some of the finite number of writes and shortens the drive’s lifespan.

These disks currently ship with 69 GB usable out of a 128-GB capacity. This spare capacity allows the disk to actively search for suspect memory cells, mark them as bad and move data away from them without losing any usable capacity, since only about half of the raw disk space is exposed for your use. Compared to traditional drives, SSDs can access every memory cell at once, so rotational latency and seek times aren’t issues. Nigel shows random I/O per second (IOPS) in the range of 28,000, as opposed to 125-200 IOPS with a traditional disk drive.

He also notes that SSDs use less power than a traditional disk. In his example he compares 330 watts for five SSDs vs. 7,700 watts in a single DS4700 system. Disk per disk, he says SSDs require about half of the power of traditional disk subsystems.

In the demo Nigel covers the different disk carriers, machines and expansion drawers that the drives can be used in, going from 2.5-inch small form factor disks for blades to 3.5-inch SAS carriers. He also displays various pictures–including some of a dismantled SSD–to provide a more complete visual perspective.

To take advantage of SSD, you must update your machine firmware and make sure that HMC, AIX and VIOS are at supported levels. In the demo Nigel connects an SSD directly to his LPAR, and then runs lsconf to show the disk in his machine. Then, after running cfgmgr, you see that the disk doesn’t immediately appear. Nigel goes into the menus–DIAG > task selection > RAID array manager > IBM SAS disk array manager–to create an array candidate pdisk and format it to 528-byte sectors.

Once you take these steps, you’ll be able to select the solid state drive and format the disk. At this point, you can see a pdisk, which can be put into an array. From the same IBM SAS disk array manager menu, you can create a SAS disk array by selecting the controller, RAID level, stripe size and the pdisk you just created. You can then go back and list the array, which will show you the hdisk and pdisk mappings that you just created.

If this doesn’t seem clear, watch the demo. It should make more sense once you see it performed on a live system.

Now when he runs lsconf, the hdisk appears, and it can be used like any other disk — you can add it to a volume group, create filesystems, etc.

Nigel runs some tests to demonstrate SSD speed. To prepare for testing, he unmounts his test /SSD filesystem and remounts it with the cio option. By running a tool called ndisk64, he performs some random I/O tests, runs iostat and can see 11,000 IOPS and 44 MB/sec.
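
If you want to reproduce that first test on a box of your own, a bare-bones sketch might look like this (the /SSD mount point and the hdisk2 device name are placeholders for whatever exists in your environment):

umount /SSD
mount -o cio /SSD
mount | grep SSD                  # confirm that cio appears in the options column
iostat -D hdisk2 5                # extended per-disk statistics every 5 seconds during the test

Drive the actual I/O with ndisk64 from the nstress package, as Nigel does, and compare the IOPS and throughput numbers you see against a traditional disk.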

For the second test, he uses the raw device and gets 27,000 IOPS to the disk. He then mentions the rumors he’s heard about even faster times being recorded in the labs.

So why do we care about SSD? Customers with intense performance requirements can put heavily accessed data onto an SSD. Nigel also brings up using SSD in conjunction with active memory sharing and paging spaces, as SSD will provide much faster access than regular disks. (However, I’ve also recently heard arguments that the slower write times vs. read times might be an issue with this application. I’ll dig a bit further into any drawbacks of this use for SSD.)

His other demos cover topics like active memory sharing, the Integrated Virtualization Manager, partition mobility, WPARs, nmon, etc. An index is available so you can browse around for the topics that interest you.

YouTube also has some relevant material. Search on “ssd vs. hdd laptop” and watch demonstrations where people put two identical laptops (save for the disks) side by side and boot them. Of course, these disks aren’t the same as those in your Power server, but you can still see the performance gains–those laptops with SSD are much faster than those with traditional drives.

As the technology evolves and the prices come down, I expect more customers to examine SSD’s benefits, in both their servers and their personal machines.

Communication Matters, Even for Techies

Edit: Presentation skills are still valuable to work on.

Originally posted June 23, 2009 on AIXchange

I often wonder how many readers of this blog are hard-core technical people. You live and work heads-down on the raised floor, doing the “real” work. You generally don’t deal with bean counters or management types because your immediate supervisor shields you from all that business-specific nonsense.

That’s an easy attitude to have. Machines are the heart and soul of any business, and what you do is critical to your enterprise’s survival. On top of that, you’ve worked hard to get where you are. You’ve spent many years acquiring skills, education and experience. You read Redbooks. You get continual technical education. You hammer away on test machines. Your problem-solving skills are top notch. But there is more to your job than machines. Do you ever consider your “soft” skills?

I constantly hear that knowledge is power. But the inability to effectively share knowledge saps much of that power. We need to articulate what we know. We need to write accurate documentation, compose coherent e-mail messages and communicate effectively using instant messenger programs. We need to convey ideas during staff meetings and conference calls. We need to get in front of groups and give presentations.

I heard those messages when I was going to school, but I didn’t think they applied to me. I was going to be crawling around raised floors, pulling cables and physically installing servers into racks. Giving presentations? That was for management and sales and marketing.

It turns out that I was very wrong. Technical skills are valuable, but so is the ability to explain technical concepts to non-technical people. We need to inform others that, sometimes, downtime is essential, because we need that window to apply patches or do other important work. We need to articulate that the tools we need benefit the business, and aren’t merely fancy toys that we want to play with. We need to be able to convince higher-ups to budget for things that will make our jobs easier.

Even for the technical, communication is part of the job. And communicating is something that even the technical can learn.

Of course for most people, technical or not, getting in front of one’s supervisor, a group of coworkers or a roomful of people can be unnerving. Jerry Seinfeld had a great bit about public speaking: “According to most studies, people’s number one fear is public speaking. Number two is death. Death is number two. Does that sound right? This means to the average person, if you go to a funeral, you’re better off in the casket than doing the eulogy.”

If public speaking spooks you more than your mortality, you can conquer those fears. One option to consider is Toastmasters International. From the Web site:

“Most Toastmasters meetings are comprised of approximately 20 people who meet weekly for an hour or two. Participants practice and learn skills by filling a meeting role, ranging from giving a prepared speech or an impromptu one to serving as timer, evaluator or grammarian.”

It’s a given that we need to grow our technical skills, but it’s just as important to develop our presentation and communication skills.

Power is Everywhere — Except in the Public Eye

Edit: This post is back from the dead. Power still needs more love. They may not be in as many consoles these days. They currently run the fastest supercomputers in the world. The marketwire link no longer works.

Originally posted June 16, 2009 on AIXchange

You catch a TV advertisement for a PC. At the end of the ad, you hear the familiar music, and then see the “Intel Inside” logo. Chances are, any PC you use at home or work displays that same logo. Even non-technical folks recognize that their computers are running an Intel chip. So why isn’t IBM Power Systems — the name and the logo — similarly ubiquitous?

According to a 2006 press release, “microchips based on the Power Architecture are the electronic brain of devices large and small, and are inside automotive safety systems, printers, routers, servers and the world’s most powerful supercomputers.”

http://www-03.ibm.com/press/us/en/pressrelease/20213.wss

Most consumers don’t realize that their Nintendo Wii, Microsoft Xbox and Sony PlayStation consoles have Power inside. Why doesn’t IBM let the world know that these people are using Power chips in their hardware? It’s a simple message that needs to be delivered.

IBM Power Systems use IBM Power chips instead of x86 Intel chips. They run a range of operating systems, from AIX to IBM i (formerly OS/400 and i5/OS) and Linux on Power (the Linux operating system that’s been compiled to run on Power chips). With a program called Lx86, you can run unmodified x86 Linux binaries so you don’t necessarily need to recompile Linux applications to run on Power Servers.

https://robmcnelly.com/lx86-works-as-advertised/

Power chips are in Mars Rovers and orbiters. They’re in blades, midrange servers and large enterprise servers. They currently run at 4-5 GHz; the bigger machines can have up to 4 TB of memory. A water-cooled Power Systems machine can have up to 448 POWER6 cores per frame.

Raj Desai, vice president of IBM Global Engineering Solutions, said: “With Power-based processors in all three major game consoles, in 50 percent of automobile models worldwide, in 60 percent of the world’s fastest computers, and in 100 percent of the systems on Mars, Power is truly the most versatile computing platform in the solar system.”

http://www.marketwire.com/press-release/Ibm-NYSE-IBM-757462.html

Power is literally out of this world. That it’s not more aggressively advertised just kills me. Everyone knows about Intel inside. Why isn’t there “IBM inside” or “Power inside”? Hearing about the technology will only make people more interested in it. Instead, many IBM salespeople spend half their time with customers just explaining what Power is.

Most of us love these systems. We work on them all the time, and we already know they’re great. But why isn’t the message reaching the world at large? I’m still waiting to hear the Power jingle and see the logo appear on my game consoles.

Clock Synching is Worth Your Time

Edit: Hopefully by now you have everything in sync.

Originally posted June 9, 2009 on AIXchange

How many physical machines or virtual partitions do you have in your environment? One? Ten? Hundreds? How often do you verify that the time is set correctly on each of your servers? I’ve seen customers set the time on a machine using the date command, and then forget about it.

Why should you care whether your machines keep accurate time? In many environments, one server hosts a database while another hosts the application. If either doesn’t maintain accurate time, problems can result. For instance, the database might think it’s getting requests from the future, or from the past. Have you ever tried to troubleshoot two machines whose timestamps aren’t in sync? Good luck trying to figure out what happened at a given point in time. Maybe you’ve tried running an HACMP failover on machines with clocks that are off, or maybe you’ve tried live application mobility on machines keeping different times. Each of these scenarios can introduce problems into your environment due to timing issues, and in some cases the operation will refuse to work at all. Therefore, setting up network time protocol (NTP) is something we must do to keep our clocks in sync.

From Wikipedia: “Clock synchronization is a problem from computer science and engineering which deals with the idea that internal clocks of several computers may differ. Even when initially set accurately, real clocks will differ after some amount of time due to clock drift. Clock drift refers to several related phenomena where a clock does not run at the exact right speed compared to another clock.”

Some people will run the “ntpdate -u <time_server>” command from cron at some interval to keep clocks in sync. However, I would argue that we should set up the xntpd daemon instead. If you’re unsure how to do this, keep reading.
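
For reference, the cron approach usually amounts to a single crontab entry along these lines (the time server name is a placeholder, and once an hour is just an example interval):

0 * * * * /usr/sbin/ntpdate -u <time_server> >/dev/null 2>&1

It works, but it steps the clock in periodic jumps rather than disciplining it continuously, which is one more reason I prefer running the daemon.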

First, determine which time server you will use. Many times a time server is already set up in your environment, so you can use that one. You’ll need to edit the /etc/ntp.conf file and make changes. On a default AIX machine, at the end of the file you should find:

#   Broadcast client, no authentication.
#
broadcastclient
driftfile /etc/ntp.drift
tracefile /etc/ntp.trace

Comment out the broadcastclient stanza, and replace it with:

server <time_server>

where <time_server> is the name of the time server you plan to use.

The NTP Web site, ntp.org, offers this as a starting point:

server 0.pool.ntp.org
server 1.pool.ntp.org
server 2.pool.ntp.org

However, if you look at that Web site, you’ll also find that you could use a sub-zone of pools based on continent or country.

According to ntp.org: “As pool.ntp.org will assign you timeservers from all over the world, time quality will not be ideal. You get a bit better result if you use the continental zones (For example europe, north-america, oceania or asia.pool.ntp.org), and even better time if you use the country zone (like ch.pool.ntp.org in Switzerland) – for all these zones, you can again use the 0, 1 or 2 prefixes, like 0.ch.pool.ntp.org. Note, however, that the country zone might not exist for your country, or might contain only one or two timeservers. If you know timeservers that are really close to you (measured by network distance, with traceroute or ping), time probably will be even better.”

After you set up your /etc/ntp.conf file the way you want it, save it and run startsrc -s xntpd.
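
As a quick sanity check, you can confirm that the subsystem actually started:

lssrc -s xntpd                    # the subsystem should show as active
lssrc -ls xntpd                   # long listing; shows the sys peer and stratum once the daemon syncs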

Once your daemon is running, check your output with ntpq -p.

After some period of time (ntp.org says it could take up to 30 minutes), you should see:

ntpq -p
     remote           refid      st t when poll reach   delay   offset    disp
==============================================================================
+ip-72-167-54-20 193.5.216.14     2 u   41   64  377    63.26   -9.092    5.14
*rrcs-64-183-55- 130.149.17.8     1 u   41   64  377    84.99   -1.517    7.90
mirror          131.188.3.222    3 u   41   64  377     8.44   -9.533    5.33

Again from ntp.org: “The IP addresses will be different, because you’ve been assigned random timeservers. The essential thing is that one of the lines starts with an asterisk (*), this means your computer gets the time from the Internet.”

Once you set up everything, be sure to go into /etc/rc.tcpip and uncomment the line:

start /usr/sbin/xntpd "$src_running"

This enables xntpd to restart after you reboot your machine.

Keeping all of your clocks accurate and in sync is a best practice. It should be one more step on your new server build checklists, if it isn’t already.

Remember the Alternatives to GUI

Edit: I cannot remember the last time I encountered IVM. The publib link no longer works, but your search engine will lead you to this link.

Originally posted June 2, 2009 on AIXchange

Most readers of this blog are command line-savvy. Had we wanted to use a GUI on our servers, we would have become Windows administrators. Why gain console access using a browser to log in to the Hardware Management Console (HMC) or the Integrated Virtualization Manager (IVM) GUI when you can just as easily ssh directly to your IVM server (using the padmin userid) or HMC (as hscroot)?

In the recent AIXchange post about live partition mobility, I was on a blade, so I was able to run the mkvt command to open a console window. If you’re unsure about your partition names and numbers, then run the lssyscfg command (I use the -F flag to give me the name and lpar_id fields; you can choose other fields that might interest you as well):

# lssyscfg -r lpar -F name,lpar_id
aix6san1,6

#mkvt -id 6

If someone left the console running elsewhere and the console is already in use, then use:

#rmvt -id 6

Next, run your mkvt command on the command line. From your HMC, you can ssh in as hscroot and use the vtmenu command. From publib:

http://publib.boulder.ibm.com/infocenter/systems/scope/hw/index.jsp?topic=/iphcx/vtmenu.htm

“vtmenu is a Perl script which displays a list of partitions, opening a virtual terminal on the one selected. If more than one managed systems exists, a list of them is displayed first. After a managed system is selected, a list of all logical partitions on that managed system is displayed. The vtmenu command does not accept any flags or parameters and ignores all that are specified.”

vtmenu displays the managed systems connected to your HMC, and it prompts you to choose which managed system to connect to. Once you connect, it displays a list of the LPARs from which you can select.

Partitions On Managed System:

Enter Number of Running Partition (q to quit):
Enter Number of Running Partition (q to quit): 1
Opening Virtual Terminal On Partition lpar1 . . .
Open in progress
Open Completed.

Once you select the session you want to connect to, it should behave exactly as any other console window to which you’re accustomed. 

You can also use the mkvterm -m <managed_system> -p <lpar_name> command if you know the machine name and the LPAR name. I find vtmenu to be useful if you do not know that information off the top of your head. If you need to get the machine name, try lssyscfg -r sys -F name, then use lssyscfg -r lpar -m <managed_system> -F name to get a list of LPAR names. If someone else is using a console, or you left a console running somewhere else, you can use the rmvterm -m <managed_system> -p <lpar_name> command.
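
Putting those pieces together, a typical session from the HMC might look like this sketch. The managed system and LPAR names here are made up; substitute whatever lssyscfg reports in your environment:

lssyscfg -r sys -F name
lssyscfg -r lpar -m Server-8203-E4A-SN0612345 -F name
mkvterm -m Server-8203-E4A-SN0612345 -p testlpar1
rmvterm -m Server-8203-E4A-SN0612345 -p testlpar1   # clears the console if it was left open elsewhere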

In any event, when you are done using a console, you can type ~~. in order to cleanly exit, and you will get a message that says Terminate session? [y/n]. Answer with y and you will go back to the vtmenu screen or to the command line, depending on what method you used to create the console.

Of course, the GUI is always an option, but don’t forget that you can also get console sessions using ssh with your HMC or IVM.

Keep it Simple? If Only

Edit: The links still work. I still like simplicity when I can find it.

Originally posted May 26, 2009 on AIXchange

In a recent e-mail exchange, a friend and I were debating which is the better characteristic of a computer system: the capability to scale up or to scale out. Would you rather have one machine that you can grow and virtualize, or a room full of many small machines? Which solution costs less and which is easier to administer? Is a cluster better than a beefy, powerful machine? Which philosophy is best for customers? The two of us hold different perspectives on these matters.

Somewhere during our discussion, he sent me a link to this article from The Register. I enjoyed reading it, and I’ll be on the lookout for someone to implement the ideas it raises. In particular, this paragraph got me thinking:

“The other lesson to learn from the venerable AS/400–one that DEC and HP managers didn’t get about their own minis and a lesson that Big Blue, to its chagrin, eventually forgot–is that an integrated system is not about itself. It’s about the applications that run atop it. You make the system invisible, easy, beloved, reliable, and you get as many real-world, enterprise-class applications as possible on the machine and you help software houses make money by helping them push complete, turnkey systems to customers who are sick of thinking about computers.”

The idea of a turnkey system for customers that don’t want to think about a computer is exactly right. We’re surrounded by technology, but we don’t necessarily need to know how it works under the covers. How many of us have a DVR that runs Linux as the operating system? Do we care that Linux is running under the covers? We just want to record our favorite shows so we can watch them at our convenience. It’s not like we want or need root access to our DVRs. (Of course, if you really want that, it’s out there.)

What about smart phones, video game consoles, computer electronics in our cars, etc.? Sure, some folks want to hack their own hardware, but most of us just want the technology to work. My point is, you shouldn’t have to know how to administer and secure your hardware. It should be secure and work out of the box.

I think people are drawn to simple products that simply work. You plug them in and there’s no learning curve. You don’t need to know that the operating system is running–it just does what it’s supposed to do, and does it well.

And what applies to individual consumers certainly applies to businesses. If your core business is manufacturing widgets, do you really want computer engineers on site taking care of your machines, or would you rather have a turnkey system that just sits in the corner and works? To have that turnkey system, someone has to write the code and make sure that it’s working, and someone might be needed to come on site for the initial setup and install, but after that, what’s wrong with a machine that just runs?

While some of us enjoy the ins and outs of servers–new installs, performance monitoring, patching, keeping up on the latest trends in the industry, trying out new things–just as many people want to be able to plug in a machine and have it work. They’re tired of feeling dumb because they don’t think to scan for viruses or avoid opening every attachment that comes their way. They want to spend their time thinking about their business instead of their business computer.

So where does that leave me? Still busy. The complexity seems to continue to grow in computer rooms, and plenty of people still need help understanding and setting up their servers. However, I have to wonder if it’s too farfetched to think that, someday, that complexity will be hidden well enough that customers can expect to get their machine, plug it in, and have it just work.

Take the Initiative

Edit: The VUG link no longer works. Edited the link to the IBM Academic Initiative. Added current link to the FAQ. The links to courses no longer work. The link to the discounts no longer works. I still think it could be useful to pair up with your local schools to help learn more about IBM Power.

Originally posted May 18, 2009 on AIXchange

In a previous AIXchange entry titled, “Some New Virtual Disk Techniques,” I said that I usually learn something new whenever I attend or download the Central Region Virtual User Group meetings from developerWorks.

For instance, at the most recent meeting, Janel Barfield gave an excellent presentation on Power Systems Micro-Partitioning. Before Janel’s presentation began, IBMer Linda Grigoleit took a few minutes to cover material about the IBM Academic Initiative, which is available to high school and university faculty.

From IBM:

“Who can join? Faculty members and research professionals at accredited institutions of learning and qualifying members of standards organizations, all over the globe. Membership is granted on an individual basis. There is no limit on the number of members from an institution that can join.”

Check out the downloadable AIX and IBM i courses and imagine a high school or college student taking these classes. With this freely available education, these students would be well on their way to walking in the door of an organization and being productive team members from the beginning of their employment. Think about the head start you would have had if you had been able to study these Power Systems AIX or IBM i course topics at that age.

Although, as I said in a previous AIXchange entry titled “You Have to Start Somewhere,” I like the idea of employees starting out in operations or on help desks, the Academic Initiative is a great way for people to get real-world skills on real operating systems.

Instructors also benefit from the program, as IBM offers them discounts on certification tests, training and either discounted hardware or free remote access to the Power System Connection Center.

There’s more. From IBM:

“The Academic Initiative Power Systems team provides vouchers for many IBM instructor-led courses to Academic Initiative members at no cost.

“The IBM Academic Initiative hosts an annual Summer school event for instructors. Each summer this very popular event features topics for those new to IBM i platform.”

Maybe it’s time you get involved. Go to your local high school or university. Find the instructors who would be interested in learning and teaching this technology. Get them to sign up with the Academic Initiative and get involved. With your skills and experience, you could help them get started, and your ongoing assistance would be appreciated by instructors and students alike.

Another Case for Using the Command Line

Edit: I updated the links.

Originally posted May 12, 2009 on AIXchange

Live partition mobility is a topic I’ve covered previously in an AIXchange blog post titled, “Getting Hands-On with Live Partition Mobility.” More recently, Chris Gibson wrote a great Web exclusive article for IBM Systems Magazine titled, “Using the Command-Line Interface for LPM,” that talks about performing live partition mobility migrations from the command line.

To make sense of articles like Chris’, I like to take them to the lab. Basically I run through many of the same commands that he lists in his article. So if you read both his article and this post, you’ll have two different perspectives of doing migrations from the command line on two different sets of blades.

I ran the commands that he provided in my environment, which uses two blades. (In the output below, my command line entries start with #; the output is on the next line.)

Blade 1:
#lssyscfg -r sys -F name
Server-7998-61X-SN100BB8

Blade 2:
#lssyscfg -r sys -F name
Server-7998-61X-SN1004DAA

I have many partitions running on each blade, so I’ll list only the VIOS partition on each blade (10-0BB8A and vios_server, respectively) and the partition (aix6san1) that I migrated:

Blade 1:
#lssyscfg -r lpar -F name,state
10-0BB8A,Running
aix6san1,Running

Blade 2:
#lssyscfg -r lpar -F name,state
vios_server,Running

Blade 1:
#lslparmigr -r lpar
name=10-0BB8A,lpar_id=1,migration_state=Not Migrating
name=aix6san1,lpar_id=6,migration_state=Not Migrating

Blade 2:
#lslparmigr -r lpar
name=vios_server,lpar_id=1,migration_state=Not Migrating

Chris explains that by using the lslparmigr command with the proper parameters, we can learn some virtual adapter mappings that can be used during the migration. My source system (Server-7998-61X-SN100BB8) needs to know the server that’s being migrated to (Server-7998-61X-SN1004DAA) as well as its IP or hostname (blade12), the LPAR name that’s migrating (aix6san1) and the appropriate attributes (suggested_virtual_scsi_mappings in this case).

#lslparmigr -r virtualio -t Server-7998-61X-SN1004DAA --ip blade12
--filter lpar_names=aix6san1 -F suggested_virtual_scsi_mappings

2/vios_server/1

Again, using commands Chris provided, I ran the command to verify that the migration would run (-o v means validate only; -t is the name of the destination managed system; --ip is the IP address or hostname of the HMC or IVM managing the destination system; and --id is the ID of the partition on which to perform the operation). The command ran for a bit, and then returned no output. Using the echo $? command, we can see the return code, which confirms that the command successfully executed.

#migrlpar -o v -t Server-7998-61X-SN1004DAA --ip blade12 --id 6
#echo $?
0

By running the same command from before and changing the option -o v to -o m, I started the migration. And, using the techniques listed in Chris’ article, I could verify the successful migration of my LPAR.

# migrlpar -o m -t Server-7998-61X-SN1004DAA --ip blade12 --id 6

In one instance I migrated from one blade to the other, but didn’t receive verification that it was ready to migrate back. I ran the command from the second blade, and the migration process hung. By using the -o r (recover) option, I was able to clear out the hung operation. (In one instance, I also had to add the --force parameter.)

As Chris states, whether you want to automate live partition mobility from the command line or manually perform a migration from the VIOS command line instead of using the GUI, scripts are a great option for AIX admins. Creating a script would allow you to use cron if, for instance, you want to move a partition at a particular time of day. This provides you with a predefined action that’s ready to go; all you have to do is run the script and the partition will move. This can be much simpler than logging in and pointing and clicking on different GUI screens.
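
As a rough illustration, the script could be as small as a wrapper around the migrlpar command shown earlier, with a crontab entry to fire it at the time you want. The system name, IP and partition ID below come from my test environment, and the schedule and file locations are simply made up, so adjust everything for your own setup:

#!/usr/bin/ksh
# move_lpar.sh: kick off a partition migration non-interactively (a sketch, not a hardened script)
migrlpar -o m -t Server-7998-61X-SN1004DAA --ip blade12 --id 6

A matching crontab entry to move the partition at 6 p.m. every day might then be:

0 18 * * * /home/padmin/move_lpar.sh >> /home/padmin/lpm.log 2>&1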

We are not Windows administrators. For the most part we do our work from command lines. While point-and-click can seem easier at times, it’s still important to remember that most of the things you can do in the GUI can also be done from the command line. There will be situations where running partition migrations directly from the command line makes the most sense.

A Look at Recent IBM Announcements

Edit: The link to the announcements and overview and blade no longer work. The link to the updated Redbook works. The link to the facts and features no longer works.

Originally posted May 5, 2009 on AIXchange

IBM made some new announcements last week. Take the time to get familiar with this information.

From the overview:

“IBM announced new Power Systems blades and servers, virtualization software and POWER6 technology, all designed to help companies build, manage and maintain a 21st century dynamic infrastructure.”

Highlights include:

  • New 4-core JS23 and 8-core JS43 blades using 4.2 GHz POWER6+ processors.
  • Virtual tape support for IBM i and BladeCenter implementations. One thing that customers wanted was a way to natively back up their IBM i images directly to a tape drive — this announcement will help address that issue. Customers will be able to use an LTO4 SAS-attached single tape drive, model TS2240. Once they upgrade VIOS, firmware on the blade and PTFs for IBM i (in the May 22 timeframe), they’ll be able to use the tape drive as if it were natively attached.
  • Power 520 Express and 550 Express servers with POWER6+ processors. The 520 has 2- and 4-core 4.7 GHz models, and the 550 will have 2-, 4-, 6- or 8-core 5.0 GHz models.


According to the overview, “The POWER6+ processor has also been shipping in the 3.6 GHz model of the Power 560, the 4.4 and 5.0 GHz models of the 16-core Power 570 and the 4.2 GHz model of the 32-core Power 570 since they were announced last October.”

In addition, IBM has announced 69 GB solid state drives (SSD) to be used with Power Systems. I hear that you should expect performance increases when you put your “hot” data onto SSD drives versus traditional drives. Using SSD allows you to use fewer drives (removing mechanical seeks and waits frees you from worrying about spindle count). This reduces power consumption and takes up less physical space.

Another thing I hear is that, to get the most out of the SSD drives, you want many random reads and a low percentage of sequential/predictable reads, and you want a higher percentage of reads than writes.

More from the overview:

“PowerVM, IBM’s virtualization solution, is now enabled for the first time with Active Memory Sharing, an advanced technology that allows memory to automatically flow from one logical partition to another for increased utilization and flexibility of memory usage.” IBM published a Redpaper titled “PowerVM Virtualization Active Memory Sharing” on this topic.

This gives you the option of telling your LPARs that they have more physical memory than they actually do. Basically, you’ll be paging memory to a paging device. So if all your LPARs try to do this at the same time, you could face the same kind of contention and poor performance that you’d have on a single LPAR that is memory-constrained and constantly using paging space. Therefore, you wouldn’t want to do this with LPARs that all need simultaneous access to memory. However, it is helpful if you have individual LPARs that need additional memory at different times. Think of this as another way to better utilize your server. With Micro-Partitioning we can better utilize CPUs; with Active Memory Sharing, we can better utilize our memory across our servers. Active Memory Sharing will work across AIX 6.1, Linux and IBM i 6.1 partitions that use VIOS and shared processors and have virtual I/O. Of course, they must be POWER6 processor-based systems with the Enterprise Edition of PowerVM.

Look over all of the announcement material, including the new facts and features documentation. Then contact your business partner or IBM to help you determine how this new technology can help your business.

Documentation Worth the Effort

Edit: I am still talking about diet and exercise. The link to peruseit.com no longer works. The link to sydiproject seems to work; that site has a link to https://networklore.com/

Originally posted April 27, 2009 on AIXchange

I like to eat. That’s a good thing too, since going too long without eating will kill you. Of course, eating too much and/or exercising too little isn’t a healthy way to live.

Hopefully we’re making better choices these days about what and where we eat. Making meals at home instead of eating out is a good place to start. Of course, a home-cooked meal requires preparation. Perhaps you need to reference a cookbook or dig up an old recipe so you can remember the correct measurements for the different ingredients.

Building a server has its parallels to making a meal. Just as recipes and cookbooks are essential to food preparation, we in IT rely on documentation to properly build a server. As an old coworker would point out, how will we know what’s in our “golden image” unless we document all of the software that we’ve loaded? How will we track the special requests that come in during the build process if we don’t make note of them?

Some argue that the system documents itself. If you want to know what’s loaded, you can find out by logging on and running various commands. That might work under normal circumstances, but what if the server dies and you need to rebuild it? Referencing the information that’s on the dead machine might be a challenge then.
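
A lightweight middle ground is to have each machine dump that information to files that you store somewhere else. Here is a minimal sketch; the /docs directory is an assumption, so point it at wherever you actually keep documentation:

oslevel -s  > /docs/oslevel.$(hostname).txt        # OS level, technology level and service pack
lslpp -Lc   > /docs/filesets.$(hostname).txt       # every installed fileset
lsfs        > /docs/filesystems.$(hostname).txt    # filesystem layout
lsvg -o | lsvg -i -l > /docs/lvm.$(hostname).txt   # online volume groups and their logical volumes

Run from cron and copied off the box, output like this won’t replace real build documentation, but at least it survives the death of the server it describes.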

Others prefer automated tools (e.g., this and this), or even address their documentation needs with the IBM System Planning Tool.

These tools certainly can be useful, but again, think big picture here. Sure, when it comes to your environment, you can probably create a file system in your sleep. But when a newly promoted junior admin joins the team, can you point the newcomer to the instructions that specify how you want it done? How will new admins learn without carefully compiled documentation?

All environments have unique requirements. One organization might prefer that CIO mount options are used for particular file systems. Another may want raw logical volumes, or have some other requirement when the server gets built. Isn’t it important to have server build standards that you can compare to these requests?
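
One way to make those standards concrete is to show exactly what the end state should look like. For a filesystem that must be mounted with concurrent I/O, for example, the standard might include a sample /etc/filesystems stanza along these lines (the names are hypothetical):

/oradata:
        dev             = /dev/oradatalv
        vfs             = jfs2
        log             = /dev/loglv01
        mount           = true
        options         = cio,rw
        account         = false

A new admin can then compare a freshly built server against the documented standard instead of guessing at the intent.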

While it’s true that putting out fires takes priority over washing the walls, those walls still need to be cleaned.

Documentation and server build standards are critical. It’s one more thing that we must be sure we make time for.

NIM Benefits, Power Systems Whitepapers

Edit: A mention of Systems Director. Yikes. The literature page link still works, but there is not much there anymore.

Originally posted April 21, 2009 on AIXchange

I’m a big NIM fan. I’m also a big fan of installing VIOS (along with regular AIX LPARs) using NIM, which is described in the IBM Redbooks publication, “IBM BladeCenter JS21: The POWER of Blade Innovation.” One reason I like NIM is you can use it to quickly get a system up and running (as I explain in previous IBM Systems Magazine articles here and here).

When I’m doing a new server build, especially for a large deployment across an enterprise, I rely on NIM. I can install once, create a “golden image,” take a mksysb of it, and then clone that image using NIM. I mean, why install the same tools and software 30 times across 30 LPARs?
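
If you haven’t scripted this before, the heart of the clone step is only a couple of NIM operations. This is a bare sketch with hypothetical resource and client names; the SPOT, the mksysb file location and the client definitions all have to exist in your NIM environment already:

nim -o define -t mksysb -a server=master \
    -a location=/export/mksysb/golden_aix61.mksysb golden_mksysb   # register the golden image as a NIM resource

nim -o bos_inst -a source=mksysb -a mksysb=golden_mksysb \
    -a spot=spot_6100 -a accept_licenses=yes lpar01                # push a BOS install of that image to a client

Repeat the bos_inst operation against each client (or loop over a list of them) rather than installing and customizing 30 LPARs by hand.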

Speaking of NIM’s benefits, I recently found on a mailing list this great tip on using NIM with VIOS 2.1:

    The install may hang after the message: “Mounting File Systems.”

    Why? VIOS 2.1 requires more than the 1GB of memory typical for VIOS 1.5.

    According to the release notes, each LHEA *port* assigned to the VIO server requires a minimum of 612MB of memory — and no, that isn’t a typo. This is in addition to the base memory that VIOS 2.1 itself requires (typically 512MB).

    Example:

    If you have two LHEAs assigned to VIOS 2.1, the VIOS will need a minimum of:

    612 MB + 612 MB + 512 MB = 1,736MB
    (LHEA) + (LHEA) + (Base) = Minimum RAM

On an unrelated note, if you’re ever asked to articulate the benefits of Power Systems, the IBM Power Systems Literature page is a good place to start. This page includes three different whitepapers that detail how Power Systems are designed for serviceability, availability and reliability. You’ll also find papers on using Performance Manager (PM) for Power Systems, why you should upgrade to IBM i 6.1, and how to manage Power Systems with Systems Director 6.1.

Using ASMI as an Alternative to HMC

Edit: Not sure this would work these days.

Originally posted April 14, 2009 on AIXchange

Sometime back, I wrote a blog entry about using the Advanced System Management Interface (ASMI) to access POWER5 processor-based machines for customers that don’t have an HMC.

In the era of POWER6 processors, ASMI remains an option for smaller shops with one or two computers and no compelling reason to have an HMC in their environment. For instance, one of my customers recently wanted to connect a dumb terminal to a model 8203-E4A.

For this environment, that’s a completely workable solution. The admins are only a short drive away from the computer room, so if things ever get so bad that someone requires console access, they’re willing to come in to do the work as opposed to trying to log in over the network.

The customer had a very old dumb terminal that it wanted to reuse with this machine. How old was it? It was so old that nobody could remember how to change the settings. They couldn’t even find anything on Google. That’s one old dumb terminal.

However, with your Windows laptop, a serial connection and Hyperterm, you can easily access the vty0 serial device and use it as the console. Set your connection speed to 19200–although if you’re plugging into a green-screen terminal and you can’t remember how to change the settings, you should slow down the server’s serial port speed rather than try to speed up the terminal.

That’s where ASMI comes in. If you go in through AIX and smitty, you can’t change the speed on the vty device. Only ASMI gives you this option.

According to the Redpaper, “IBM Power Systems 520 Technical Overview”:

“Service processor Eth0 or HMC1 port is configured as 169.254.2.147 with netmask 255.255.255.0”

In my case, when I changed the IP address on my laptop, I went with another address on the 169.254.2.x network, plugged the network cable between my Ethernet port and the server’s HMC1 port, and was able to access ASMI at https://169.254.2.147. I logged in with the admin ID and changed my serial port speed to 9600. Then I went into smitty and changed the TERM type to something that the dumb terminal recognized. All was well.

Would an HMC or a newer dumb terminal onsite make it easier to gain console access? Certainly. However, I strongly believe that customers know what’s best for them. I’ll lay out the pros and cons, but ultimately the customer decides how to run the business. This console will only be used infrequently, if ever. Reusing still-working equipment made perfect sense here.

Reverting to an Older VIOS Level

Edit: VIOS 1.5! VIOS 2.1! POWER6 blade! Changed the old publib link for lpcfgop to a knowledgecenter link.

Originally posted April 7, 2009 on AIXchange

Recently I needed to replicate a customer environment. So I took my blade, which was successfully running VIOS version 2.1, and loaded VIOS 1.5 with the latest service pack. I assumed I could just reload it from NIM and overwrite my current VIOS installation, and everything would be fine. However, after the reload, when I edited the TCP/IP information and logged into the Integrated Virtualization Manager (IVM), I was surprised to find that all of my partition information was still there.

When I attempted to remove partitions, I received errors:

rmsyscfg -r lpar -n i5os

[VIOSE01050502-0145] The current virtual adapter configuration for the management partition is not compatible with the requested configuration. Depending upon where this product was acquired, contact service representative or the approved supplier.

According to lsmap, the disks were still mapped; I’d assumed that they would no longer be there after the reload:

lsmap -all
SVSA            Physloc                                      Client Partition ID
--------------- -------------------------------------------- ------------------
vhost0          U7998.61X.060F4CA-V1-C11                     0x00000000

VTD                   NO VIRTUAL TARGET DEVICE FOUND

SVSA            Physloc                                      Client Partition ID
--------------- -------------------------------------------- ------------------
vhost1          U7998.61X.060F4CA-V1-C13                     0x00000000

VTD                   NO VIRTUAL TARGET DEVICE FOUND

SVSA            Physloc                                      Client Partition ID
--------------- -------------------------------------------- ------------------
vhost2          U7998.61X.060F4CA-V1-C15                     0x00000000

VTD                   NO VIRTUAL TARGET DEVICE FOUND

I could remove these vhosts by running this command:

rmvdev -vtd vhost0

That seemed to clean up the output when I ran lsmap. However, when I rebooted, the mapping information returned.

I called support, and was told that my LPAR information was stored in NVRAM, and VIO was just rereading and reloading this information.

Support had me run:

lpcfgop -o clear

[VIOSW01040F00-0055] You requested to clear all partition configuration data on the managed system and set the configuration back to its original state. Do you want to continue (0 = no, 1 = yes)?

Then I restarted the VIO server.

shutdown -restart
Shutting down the VIO Server could affect Client Partitions. Continue [y|n]?

y

The lpcfgop command can also be found here.

After running lpcfgop and rebooting, my VIOS was in a pristine state, running the older VIO version:

ioslevel
1.5.2.5-FP-11.1 SP-01

It isn’t every day that administrators are faced with reverting to an older VIOS level (though if you ever have issues with a VIOS 2.x upgrade, you may need to go back). Just remember, should you ever need to do it, that you’ll first need to clear your partition configuration information.
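For anyone who lands in the same spot, the whole cleanup really is just the handful of commands above, run from the padmin shell:

lpcfgop -o clear    # wipe the partition configuration data held in NVRAM
shutdown -restart   # reboot the VIO server so it comes back up clean
ioslevel            # confirm the level you reverted to
lsmap -all          # verify the old mappings are finally gone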

Living in the Future

Edit: If I thought 2009 was the future, what do I think I am living now? This was when I was first getting used to GPS, I do not know how to live without it now. Wil Wheaton’s blog has moved.

Originally posted March 31, 2009 on AIXchange

I’ve owned a BlackBerry smartphone for a while. I’m happy with it, to the point that the iPhone vs. BlackBerry vs. Windows Mobile wars don’t interest me. The BlackBerry works just fine.

I was, however, unhappy to find that my cellular provider disabled the phone’s built-in GPS. While this issue was remedied with the purchase of a small GPS “puck” that can be connected via Bluetooth, I now must be sure to have both the phone and the puck if I plan on using GPS.

I’m directionally challenged. One thing I liked about living in Denver is that if I knew where the mountains were, I knew which way was west. Once I left the Rockies, I found that printing directions from MapQuest or Google Maps helped me get where I was going. Of course, this isn’t a perfect solution. I’ve been undone by getting the wrong address, making a wrong turn or having the meeting location change while I was en route.

Between these experiences and the fact that I’m constantly traveling to different locations and customer sites, I decided that I needed a GPS. Plus, using them in rental cars sold me on the functionality.

Initially I wasn’t sure which device I wanted, but then I discovered that my phone could run GPS mapping software. It talks to my puck and uses the cellular data network to access the back-end servers that do the heavy lifting of calculating directions and routing. The phone gives both audio and on-screen directions to the addresses that I enter. Although I can only use this combination in cellular coverage areas, and I cannot be talking on the phone when it needs to use the data link to access the back-end servers, I find that this accounts for 95 percent of my usage patterns, so it works fine.

My first software package was a monthly subscription plan. The directions were accurate and I was happy with it. Eventually though, I realized that there was only one audio option–the phone’s built-in speaker. It was audible, but because cars can be noisy, I wanted to route the audio through the car stereo. (If you have an older vehicle that has a cassette tape player, you can get a cassette adapter and run the audio from your phone’s built-in jack through the car stereo. Otherwise, you can use small radio transmitters that will broadcast over FM to your car radio.)

Unfortunately, my software didn’t support audio from the phone’s built-in jack, only the built-in speaker. And after multiple interactions with support, I learned that the solution provider didn’t plan on changing this any time soon. I was hardly the only one seeking this function. The provider’s support forums were overflowing with posts from frustrated users/soon-to-be former customers who wanted this functionality added. I too wound up turning to a new GPS software solution. Now I can route the audio through the car stereo or an earpiece.

As I think about all the things I can do from my phone–make calls, check e-mail, play MP3s, update my calendar, use GPS mapping software, browse the Internet, ssh to servers–I realize that we are living in the future. Wil Wheaton (you might know him from his books, his blog, the movie “Stand by Me” or the television series “Star Trek: The Next Generation”) explains it well when he talks about attending a MacWorld conference many years ago:

“… Tim (Jenison, who’s considered the father of desktop video) had this little slab of RAM that was about the size of a credit card. ‘One day,’ he said, ‘you’ll be able to put a whole album on something this size.’

“… the way Tim presented this thing to us — not like it was something awesome that could happen but that it was something awesome that would happen — made quite an impression on me. It was at that moment that I became truly aware of how rapidly the world was changing, and how lucky I was to be living in it.

“I wasn’t mature enough to consider it then, but I wonder if people have felt the way I did throughout history, just for different reasons: mechanical flight, telegraphs, telephones, atomic energy and weapons, home computers, stuff like that. …”

Even in our jobs, where on a daily basis we work with technologies that were not long ago the stuff of fiction (virtualization, Capacity on Demand, hot swap of server components and live partition mobility, to name a few), it’s still exciting to think of what tomorrow will bring. A decade from now, who knows what amazing things we’ll be able to do with our servers and our phones?

IT Has its Mysteries

Edit: The link where authorities might step in no longer works, I found a working link to the same article.

Originally posted March 24, 2009 on AIXchange

Some people love a good mystery. Others enjoy a challenging puzzle. Working in IT, many times the task at hand involves solving mysteries and working out puzzles.

In the computing world, some people want to know “whodunnit” so that they can affix blame. Others want to know who did it so they can educate and enlighten that person or persons, and help them avoid repeating their mistake in the future. I suppose if it’s a serious problem, authorities might step in so they can prosecute.

The puzzle-solving starts when the problems are first reported. Either the help desk will get a call or an admin will notice something changed via reporting software or system alerts. What is causing the system to behave this way? What has changed in the environment? Who made the change that caused the problem? What new software was installed?

Once a mystery is solved, you’re often left with a mess to clean up. If you don’t have good policies in place, or if developers or junior level administrators have root access to machines, one simple mistake can cause problems.

Recently an administrator was trying to install the OpenSSH server from the IBM expansion pack CD. When he tried the installation, he would get an error:

RSA key generation failed

instal: Failed while executing the ./openssh.base.server.post_i script.

As a result, when an admin tried to run ssh-keygen, they would get “PRNG not seeded” error messages.

In this case, the /dev/urandom file was somehow missing from the machine, and the randomctl -l command was used to re-create it.

After running this command, he was able to install the openssh.server filesets without any problems. It was pretty obvious who had deleted the file, and some education was in order.
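If you hit the same “PRNG not seeded” error, a quick check before retrying the install might look like this. Consider it a sketch; randomctl -l is simply the command support pointed us to here:

ls -l /dev/random /dev/urandom   # both should exist; in our case /dev/urandom was gone
randomctl -l                     # reload the random number kernel extension and re-create the devices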

Do you have the tools in place to know who’s logging in to your machines, what commands they’re running and what changes are occurring on the machine? Do you have file-level backups so that you can recover individual files or, if things get really fouled up and you have to rebuild the system, conduct full system backups?
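On AIX, even the stock tools will answer the who-logged-in part of that question; knowing what commands they ran takes more, such as the audit subsystem or process accounting. A minimal starting point:

who -u        # who is logged in right now
last | more   # recent login history, read from /var/adm/wtmp

It won’t tell you everything, but it’s better than pure guesswork when the mystery starts.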

If the answer to any of the preceding is no, why not? Do you like mysteries and puzzles that much?

Knowledge Flows From Mistakes

Edit: I still think that there is much to be learned when we make mistakes.

Originally posted March 17, 2009 on AIXchange

The above image is the work-safe version of a popular problem-determination flowchart. (The more widely circulated NSFW version can be found by searching on “flowchart no problem” or “problem solving flowchart.”)

Why bring this up in the first place? The point of the flowchart seems to center around blaming mistakes on others, or, if people don’t know about the mistakes, being sure we don’t tell anyone about them. It’s fun in theory, but hopefully in real life, we aren’t looking to hide things or blame others.

Hopefully the environments in which we work have test labs and other places where people can figure things out. Even with test labs though, people will make mistakes. We all do, we all have and we all will. It’s a fact of life. Hopefully with experience we make fewer mistakes, but I bet that many of the good habits that you have formed over the years are the direct result of yours or someone else’s mistake, coupled with the desire to not repeat that same mistake.

Of course, we must own up to our mistakes. Recently someone accidentally pulled the wrong power cords loose in a computer room. Those cords fed a critical SAN switch that was being used by a ton of machines. However, the guy who pulled the cord didn’t report what happened, leaving it to the SAN administrators to figure out what had gone wrong.

I remember watching a guy pull up one of the tiles on a raised floor at the same instant that the power went out and the UPS kicked on. The look on everyone’s face was priceless. He hadn’t touched anything or done anything, but for whatever reason, that exact moment was when the power went out. Would he have told anyone about it if he had been the cause of the outage, or would he have covered up the tile and gone to hide in his office?

In our work environments, mistakes need to be, if not tolerated, at least accepted, to the extent that people are allowed to understand and learn from them. The point of root-cause analysis isn’t to assign blame, but to figure out what went wrong and how it can be done better in the future. Sometimes that brings new procedures to bear. Sometimes it leads to better documentation. In all cases, if done right, it should reduce the likelihood of the same situation recurring.

In test labs I’d go as far as to say that mistakes should be encouraged. Some of the best learning comes from trying and failing, multiple times, to get something to work. All of the effort that went into gaining that hard-earned knowledge is much more valuable than simply going step by step through someone else’s documentation. Yes, you can learn that way. But when you actually figure things out for yourself, you’re in a much better position to really fix things when unexpected problems arise.

The flowchart is worth a laugh. It may even be worth printing out and displaying in your work area. But don’t live by it. In fact, when it comes to your job, do the opposite.

Karma and the Home PC

Edit: The troubleshooting link no longer works. Changed the tshirt link from thinkgeek to amazon.

Originally posted March 10, 2009 on AIXchange

As I noted in a recent AIXchange blog post, IT pros are natural targets for family members and friends looking for free assistance with their home computers.

When cornered, some of us will say that we can’t help because we don’t actually work with Windows. It’s a good excuse, and one I’ve used on occasion. In reality though, if you have experience with any operating system, you can probably help with a home PC (even if you don’t want to).

Certainly, I’m no Windows expert. However, the concept of maintenance and rescue isn’t foreign to me, and it probably isn’t to you, either.

I can boot from AIX media or a NIM server and get a machine into maintenance mode. I can resolve problems and get machines running again. Admittedly, it’s nice to know that IBM support is available if there’s an issue. They can usually help resolve whatever problem customers are seeing. If not, as a last resort, I can always reload from a backup.

The same can be said for a Linux rescue. You boot from media, chroot and fix whatever problem you are seeing. And–assuming you have a support contract for an enterprise distribution of Linux–many times you can call for support. Again, if there’s a problem, you can restore from a backup.
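The mechanics of that Linux rescue are worth spelling out, since the pattern is nearly the same every time. Here’s a rough sketch, assuming the damaged system’s root filesystem happens to be /dev/sda2 (yours will differ):

mount /dev/sda2 /mnt           # mount the broken system’s root filesystem from the live media
mount --bind /dev /mnt/dev     # make the live environment’s device nodes visible inside it
mount --bind /proc /mnt/proc
chroot /mnt /bin/bash          # now you’re “inside” the broken system and can repair it

Fix what needs fixing, exit the chroot, unmount and reboot.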

With Windows, it can be a little trickier. People ask us for help because they don’t want to pay for repairs unless absolutely necessary. So I go into it knowing that the machine is old and that its warranty and/or support has expired. I’m on my own.

Often I can Google for the symptoms they’re seeing. Microsoft has plenty of information to help people fix computers that won’t start. And numerous other sites give basic troubleshooting information.

In several instances, I’ve been able to boot from XP media, get into a recovery console, run chkdsk and solve the problem. For me, it’s akin to running fsck on file systems to fix UNIX machines. Other times I’ve had to run fixmbr. And other times, such as when BIOS and Windows administrator passwords were lost, I moved the hard disk into another machine (since I could not change the boot list in the BIOS) and booted from a Linux Live CD (like Ubuntu or Knoppix). Then I used Linux to mount the Windows partition and Samba to share the data with another machine on the network so that the data could be recovered before the machine was rebuilt.
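For the live CD approach, the data-recovery part comes down to a couple of commands once you’re booted. A sketch, assuming the Windows partition shows up as /dev/sda1 and the live CD ships ntfs-3g (most recent Knoppix and Ubuntu images do):

sudo mkdir -p /mnt/windows
sudo mount -t ntfs-3g -o ro /dev/sda1 /mnt/windows   # mount read-only; the goal is just to copy data off
# from here, point Samba at /mnt/windows or copy the files over the network with scp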

In one sad case, nothing that I tried worked; the hard drive had a mechanical problem. In that instance, having a vendor recover the data was more costly than it was worth.

Just like in the IT world, in the home PC world, peace of mind can be obtained through regular backups. It’s never a good feeling when your machine won’t start, but if your data is stored (on external USB disk, CDs, etc.), a bad situation will be much less bad.

Ultimately, if I have the time, I’ll try to help folks who come to me. I don’t consider myself a Windows guy, but I am a computer guy. Really, for an IT pro, booting from media and running chkdsk isn’t a major imposition. But to the average home PC user who doesn’t really understand computers, it’s a big help.

So when your friends come calling, you could point to the t-shirt. But in my case, I agree with Earl J. Hickey: I want karma on my side. At the very least, I like knowing that I helped someone out. It never hurts to do a good deed.

More From the AIX Grab Bag

Edit: The first link no longer works, though this one looks similar. The publib link for mount options no longer works. The online course no longer works. I added the original ending back into the post.

Originally posted March 3, 2009 on AIXchange

Though I’ve yet to try running HMC code in VMware, it sounds interesting. I cannot count the number of times people have asked me if there’s a way to play around with HMCs and Power Systems servers before buying them. Relatedly, I often hear how difficult it can be to find a sandbox system to learn on.

If you’ve had success in this area, please share your experience in comments. From reading through the comments on the original posting, it looks like some people got it to work while others had issues, so your mileage may vary.

I occasionally post information like this to spark ideas and, hopefully, generate some discussion. With that in mind, here are some other links that should appeal to AIX professionals.

I’m always interested in things that people do with their hardware. I found that running Mac OS X on a Dell Inspiron Mini 9 was interesting as well.


These items were mentioned on a mailing list.

    1) Thanks to LL for sharing this obscure gem:

    mount -o log=NULL

    DISABLES metadata logging. Performance gains are very significant. NOT a general purpose feature/option.

    2) Thanks to Nigel G. for sharing this gem:

    The AIX mount command has a “noatime” option. This avoids the need to update access times on files and can give you a considerable performance boost if you have lots of files being accessed.

    More mount options can be found here, and a quick sketch of trying noatime follows this list. Feel free to share others that you think would be useful.
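If you want to experiment with noatime yourself, here’s that sketch; /testfs is just a stand-in for some non-critical filesystem, and as with log=NULL, measure before and after rather than assuming a win:

mount -o noatime /testfs            # use it for a one-time mount
chfs -a options=noatime /testfs     # record it in /etc/filesystems so it persists across remounts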

This online course offers some good reference information about the command line interface for the HMC.

A recent issue of AIX EXTRA, IBM Systems Magazine’s monthly e-mail newsletter for AIX professionals, featured this article that covers how you do things in AIX.

And finally, here’s a post I wrote about the consolidation of Power Systems running both IBM i and AIX.

No, I Will Not Fix Your Computer

Edit: Had to change the tshirt link from thinkgeek to amazon.

Originally posted February 24, 2009 on AIXchange

You’ve probably seen this. It’s perfect weekend attire for the IT professional. Maybe you don’t go as far as wearing a T-shirt that says, “No, I will not fix your computer.” But do you tell people to leave you alone when they ask for help?

If you work in IT, the world assumes that you can fix home computers. It doesn’t matter if you’re the network guy, the mainframe guy or a Web designer–people only understand that you work on computers, so you must naturally know how to fix their broken machines.

In an ideal world, home PC owners would fix their computers–or at least they’d know how to boot from media and run chkdsk. But in reality, people bring their computer issues to IT pros, either those they know or those they don’t.

With that in mind, here’s a story I’ve been following. Maybe you heard that the mayor of Racine, Wis., was recently arrested in a Dateline-esque sting.

Disturbing as it is that child pornography was found on his computer, what’s interesting is the way he got caught. The (now former) mayor apparently “brought a personal computer to city hall and asked city workers to fix a problem with it. Whatever the workers found was turned over to Racine police, who in turn passed it on to the state task force.”

I also read that the technician who worked on this machine in August 2007 copied and saved the files, and then went to the mayor and suggested he delete them. It was more than a year before the computer tech turned the files over to the police.

If you ever do need to take your computer in for service, use caution. You presumably don’t have illegal files or images on your PC, but you may have personal data and files that you don’t want others to see. And it’s not uncommon for computer technicians to find, and make copies of, material on customer machines. Keep that in mind when you’re seeking help with your computer.

People really would be best off fixing their own computers, but in reality, their choices are generally limited to trusting strangers in a repair shop or trying their luck with the IT guy that they know. When you’re that IT guy, and people come to you, what do you say? Do you try to help out, or do you just point to the T-shirt?

We Can Learn from End Users

Edit: This still holds up, and Nick Burns is still a jerk.

Originally posted February 17, 2009 on AIXchange

I spent a year in high school working as a busboy at Sizzler Steakhouse. You might think that anyone can do that job–and admittedly, the stress level wasn’t high compared to other jobs I’ve had. But when you go to a restaurant, do you notice the busboys? If you were one, sure; otherwise, probably not.

And yet, how the busboy does his menial job can set the tone for your whole dining experience. If your table isn’t wiped down properly, you may conclude that the whole restaurant, including the kitchen, is dirty, or that the employees don’t take pride in their work. You may even end up taking your dollars elsewhere.

Sometimes the restaurant was busy–often it wasn’t. During the slow times, our manager would ask us to make sure the dishes were washed, dried and put away, and that floors were cleaned. When things got really slow, we cleaned the walls. Sometimes we’d wonder, “what’s the point of cleaning a wall?” But we learned that, without a good scrub-down, walls can get quite dirty over time.

The point is, customer service matters. It certainly matters in IT. So how can we “clean the walls” in our IT environments and do our jobs better? For starters, we should change the way we think about our customers–the end users. In some help-desk environments, end users are commonly belittled as “lusers” for their lack of technical expertise. But we should take a moment to step into their shoes. These people might not be technical, but that doesn’t mean they aren’t intelligent. They just don’t share our background.

A co-worker of mine once snapped at a nurse when she had problems logging into her workstation. She responded by asking him if he’d like to come up the hall with her and fix an IV or administer some drugs. Touché. The nurse was just as knowledgeable and passionate about healthcare as my co-worker was about technology. Working with computers was important, but it was only a small part of her job. She just needed to enter data and to print some reports. She didn’t care about drivers, passwords or proper startup/shutdown sequences. Once we showed her how to do what she needed to do, she was fine, and we didn’t hear from her again.

End users may not know computers, but they know when they’re running slowly. How often do you take the time to actually sit down with your end users and find out how things are working from their perspective? I’ve had users who were printing out reports from one system and retyping the data into another. How easy would it be to save folks from that effort and aggravation? Just leave the raised floor and take a walk. Find people in other departments that use your systems and ask them for feedback. Ask them if you can look over their shoulder while they use your machine sometime.

End users are our customers. If they weren’t using the data we store and process, there would be no need for us. And if we have a better understanding of users’ problems and frustrations, if we show them better ways to do things, the entire organization benefits.

It’s true that there’s never much time for “cleaning walls.” We’re always told to do more with less. But we should make the time for our end users, our customers. We can learn from them.

A Tale of Two Managers

Edit: Had to find a link to the article, it is not the exact same place but it looks like the same story.

Originally posted February 10, 2009 on AIXchange

Do good managers need to watch their backs? That’s the subject of this recent article. The author tells the story of two managers. One gave credit to those under him and above him. Understandably, he was well-liked. The second manager took credit for work that he and his team didn’t do. He’d also yell and threaten and harass, and scheme behind closed doors. At the end of the article (spoiler alert), the nice manager was let go.

I’ve had good managers and bad ones. I assume most of you can relate.

Certainly, I’ve been around long enough to understand that shouting and screaming doesn’t constitute leadership. I’m reminded of “Real Genius.” Toward the end of this 1980s era movie, a character named Lazlo is working at a computer, and things get a little tense. People are looking to him to get something done. As I recall, he says something about not working well under pressure. I don’t find anything wrong with this admission. Lazlo’s like a lot of smart folks I’ve known over the years. They like things orderly and calm, and in a crisis they’ll do what it takes to solve the problem.

Think back to tense moments in your career. Machines are down, response time is slow, people cannot log in, applications won’t start. Whatever the circumstance, was anything resolved more quickly, or made better, with a manager waving his arms and telling everyone that the sky is falling? Does yelling or screaming ever help pinpoint a problem’s root cause or develop better procedures to ensure that the same kind of outage doesn’t recur in the future?

I distinctly remember a group of us trying to troubleshoot a problem while a non-technical manager sat in the room. We were on hold with IBM support, discussing potential causes and possible solutions, and the manager would threaten and complain about the money being lost, as if we needed to be reminded that it was a critical situation. On top of that, he slowed the process by asking basic technical questions, rather than letting us technical people do our jobs and get things working again.

Good managers don’t yell, they motivate. Good managers try to keep the political and non-technical issues to a minimum, so you can concentrate on your job. Good managers care about you. I’ve had great managers who would lobby for us to get offsite education. I’ve had managers who would go to bat for us with upper management, and try to help us come up with acceptable work/life balance solutions. These are the managers who give credit where it’s due. I still keep in touch with these people, and I’d be happy to work with them again.

Over the years I have certainly heard warnings about different shops to avoid: the management is inflexible, or they shout and rave and act like children. Bad managers might be able to bully some employees, but word gets around. Before you know it, you have a reputation — and a workplace that people try to steer clear of.

I hope we aren’t to the point where the bad managers are winning at the expense of good managers. I hope that the teams we assemble know not just how to get the job done, but how to have fun. I hope that people work together for a common good, rather than simply putting in the time and collecting a paycheck. I also have hope for the bad managers, that they’ll learn that there’s a better way.

Finally, I hope that you–if you haven’t already–find a good job, working with and for good people.

You Have to Start Somewhere

Edit: I still believe in promoting from within and ongoing training and learning.

Originally posted February 3, 2009 on AIXchange

As I’ve mentioned in previous AIXchange blog entries here and here, I believe that companies should recruit and promote from the inside, and provide employees with offsite education and training.

I like to see companies that have an operations or network operations department promote from there. Organizations can easily spot the go-getters and quick learners when they’re already working in these level-one jobs. (Of course, for companies that outsource much of their level-one support tasks, promoting from the ranks is a challenge when the desktop support and operations people work for someone else.)

Detractors say that promoting the “good” employees causes your level-one organization to suffer. Once the talented employees leave, levels two and three start getting more off-shift phone calls. But I see this as an opportunity to better train the remaining staff members.

Just recently, I saw an organization do exactly what I’m talking about. The experienced AIX administrator left the company. Rather than recruit an experienced administrator to take his place, they promoted from within. For a new server deployment, they brought in a business partner’s consultant. While the new administrator did the hands-on work, the consultant watched and offered instructions and advice. They tackled the new environment and determined what could be done with the old environment to improve operations.

The new guy took copious notes and asked many questions. Yes, it took longer compared to having the consultant just step in and do the work, but when the consultant left, the new administrator knew exactly what was involved in setting up the environment. He had hands-on training. He’d taken notes so he understood how to do things. He was well prepared for some future offsite training that will help him round out his newly acquired knowledge.

Again, detractors will point out that with internal promotions come raises. But bringing in an experienced administrator costs money, too. In a perfect world, there would be a veteran admin to train every newbie. But again, mentors can be found with business partners or even among contacts made at offsite conferences or educational opportunities.

Then there’s the experience factor. Sure, a newly promoted staffer won’t know what a long-time admin pro knows. I still argue for promoting from within. Your IT people know your environment, they know your users and they know your company. They know how things really get done. This knowledge is also valuable.

While companies must manage a learning curve with new administrators, give it a few years, maybe even a few months, and it will be hard to tell the new admin from today’s more experienced one. Of course it is possible that a well-trained employee will move on to another company, but plenty of these people stay put, because they see an employer that gave them a chance to advance.

Once you’ve been around awhile, it’s easy to forget how much you’ve learned. None of us came into this industry knowing nearly as much as we do now. We all had to start somewhere.