Building Virtual Environments

Edit: Some links no longer work.

Originally posted November 8, 2016 on AIXchange

This IBM developerWorks page offers helpful information about building virtual environments. While it hasn’t been updated in a while, the content is certainly relevant.

There are four sections, covering “pre-virtualization,” planning and design, implementation, and management and administration. The information that follows is excerpted in bits and pieces:

“Virtualization is a large subject, so this section will assume you know the basics and that you have at least done your homework in reading the two Advanced POWER Virtualization Redbooks. …

Skill Up on Virtualization
* You need to invest time and practice before starting a Virtualization implementation because misunderstandings can cost time and effort to sort out – the old saying “do it right the first time” applies here.
* There is no quick path. …

Assess the virtualization skills on hand
* It is not recommended to start virtualization with only one trained person due to the obvious risk of that person becoming “unavailable”
* Some computer sites run with comparatively low-skilled operations staff and bring in consultants or technical specialists for implementation work – in which case, you may need to check the skill levels of those people. …

Practice makes perfect
* In the pSeries, the same virtualization features are available from top to bottom — this makes having a small machine on which to practice, learn and test a realistic proposition. The current bottom-of-the-line p505 is available at relatively low cost, so if you are preparing to run virtualization on larger machines, you can get experience for a tiny cost.
* Also, in many sites machines in the production computer room have to undergo strict “lock down” and process management – a small test machine does not have to be run this way, and I have seen system administrators and operations run a “crash and burn” machine under their desk to allow more flexibility.”

There’s a nice list of different scenarios that cover small machines, “ranches” of machines, production, etc. The last list mentions a dual VIO server, although I would argue that is the rule and not the exception:

“When to go Dual Virtual IO Server and when not to?

This is impossible to answer, but here are a few thoughts:

* The Virtual IO Server is running its own AIX internal code like the LVM and device drivers (virtual device drivers for the clients and real device drivers for the adapters). Some may argue that there is little to go wrong here. Adding a second Virtual IO Server complicates things, so only add a second one if you really need it.
* Only add a second Virtual IO Server for resilience if you would normally insist on setting up a high-availability environment. Typically, this would be on production machines or partitions. But if you are going to have an HACMP setup (to protect from machine outage, power supply, computer room or even site outage), then why would you need two Virtual IO Servers? If the VIO Server fails, you can do an HACMP failover to the other machine.
* If this is a less-critical server, say one used for developers, system test and training, then you might decide the simplicity of a single VIO Server is OK, particularly if these partitions have scheduled downtime for updates to the Virtual IO Server. Plan on scheduled maintenance. Also note the VIO Server and VIO Clients start quickly, so the downtime is far less than with older standalone systems.”

VIO server sizing info is one area where the content is old. Nigel Griffiths has updated information, noting, among other things, that the VIO server must be monitored as workloads increase.
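
If you want a quick way to do that monitoring yourself, the standard AIX lparstat and iostat commands work on the VIO Server (padmin users can reach a root shell with oem_setup_env). This is only a minimal sketch; the interval and count values are arbitrary:

```
# Sample CPU use on the VIO Server: 12 samples, 5 seconds apart.
# Watch %entc (entitled capacity consumed) and physc (physical processors used);
# values consistently near or above 100% entc suggest the VIOS needs more entitlement.
lparstat 5 12

# Disk and adapter activity over the same window, to see whether virtual
# SCSI traffic is keeping the VIOS busy.
iostat 5 12
```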

Back to the original link. This is found in the section headed, “Common Mistakes”:

“Priorities in Emergencies
Network I/O is very high priority (dropped packets due to neglect require retransmission and are thus painfully slow) compared to disk I/O, because the disk adapters will just finish and then sit and wait if neglected due to high CPU loads. This means that if a Virtual IO Server is starved of CPU power (something that should be avoided), it will deal with the network as a priority. For this reason some people consider splitting the Virtual IO Server into two: one for networks and one for disks, so that disks do not get neglected. This is only a worst-case scenario, and we should plan to guarantee this starvation does not happen. …

Virtual IO Server below a whole CPU
For excellent Virtual IO Server responsiveness, giving the VIO Server a whole CPU is a good idea as it results in no latency waiting to get scheduled on to the CPU. But on small machines, say 4 CPUs, this is a lot of computer power compared to the VIO client LPARs (i.e. 25%). If you decide to give the VIO Server say half a CPU (Entitled Capacity = 0.5), then be generous: never make the VIO Server capped, and give it a very large weight factor.”
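
To make that concrete, here is a hedged sketch of how a fractional entitlement, uncapped mode and a high weight might be set for a VIO Server partition profile from the HMC command line. The managed system and partition names below are placeholders, and the attribute names should be checked against your HMC release:

```
# Hypothetical example: give the VIO Server 0.5 processing units, leave it
# uncapped, and set a high uncapped weight so it wins CPU when the shared
# pool is contended. "Server-9117-MMA-SN10XXXXX" and "vios1" are placeholders.
chsyscfg -r prof -m Server-9117-MMA-SN10XXXXX \
  -i 'name=default,lpar_name=vios1,desired_proc_units=0.5,sharing_mode=uncap,uncap_weight=255'
```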

These excerpts are from the “Implementation” section:

“* In most large installations the configuration is an iterative process that will not quite match the initial design, so some modifications may have to be made. …
* Also, opportunities may appear to add the flexibility of a pool of resources that can be assigned later, once real-life performance has been monitored for a few weeks.”

Finally, from the “Management/Administration” section:

“Maintain VIO Server Software
* New and very useful functions appear in the latest VIO Server software, which makes updating it worthwhile.
* Take careful note that this may require firmware updates too, and it is worth scheduling these in advance of VIO Server software updates.
* There are also fixes for the VIO Server to overcome particular problems.

It is worth making a “read only” HMC or IVM user account for people to take a look at the configuration and know they can’t “mess it up”.

I often get people claiming that their Virtual I/O resource is not available when they create a new LPAR, and 90% of the time it is due to mistakes on the HMC. The IVM features automation of these setup tasks and is much easier. Also, recent versions of the HMC software cross-check that the VIO Server and VIO client virtual resources all match up.

It is strongly recommended that the HMC, system firmware and VIO Server software are all kept up to date to make the latest VIO features and user interface advances available and to remove known and fixed problems with early releases.”
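
Two of those quoted points translate directly into commands: creating a read-only HMC account, and keeping the VIO Server software current. The sketch below is hedged; the user name, description and update directory are placeholders, and the option names should be verified against your HMC and VIOS levels:

```
# On the HMC: create a user limited to the viewer task role so it can look
# at the configuration but not change it ("readonly" is a placeholder name).
mkhmcusr -u readonly -a hmcviewer -d "Read-only account for viewing LPAR config"

# On the VIO Server (as padmin): check the current VIOS level, then apply
# an update from a directory holding the fix pack (placeholder path).
ioslevel
updateios -dev /home/padmin/update -install -accept
```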

At the very end of the document there are a few examples of how to get configuration data from the HMC, create an LPAR using the command line, and create LPARs using a configuration file.
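
Since the original page may no longer be reachable, here is a rough, hedged sketch of what those HMC examples look like in practice. All names, attribute values and the file path are placeholders, and the exact attribute list accepted by mksyscfg varies with the HMC release:

```
# List managed systems, then the LPARs and profiles on one of them.
lssyscfg -r sys -F name
lssyscfg -r lpar -m Server-9117-MMA-SN10XXXXX -F name,lpar_id,state
lssyscfg -r prof -m Server-9117-MMA-SN10XXXXX

# Create a small shared-processor LPAR from the command line (placeholder values).
mksyscfg -r lpar -m Server-9117-MMA-SN10XXXXX \
  -i 'name=testlpar1,profile_name=default,lpar_env=aixlinux,min_mem=1024,desired_mem=4096,max_mem=8192,proc_mode=shared,min_proc_units=0.1,desired_proc_units=0.5,max_proc_units=2.0,min_procs=1,desired_procs=2,max_procs=4,sharing_mode=uncap,uncap_weight=128'

# Create several LPARs at once from a file containing one configuration line each.
mksyscfg -r lpar -m Server-9117-MMA-SN10XXXXX -f /tmp/lpar_definitions.txt
```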

Again, some of this information is dated. But overall, there’s lots of good advice.