Debating Support Scenarios

Edit: These are still interesting topics to consider.

Originally posted March 1, 2011 on AIXchange

In a recent post, I said this:

“Troubleshooting and administration are done via the network, from anywhere on the globe. This is great, especially for companies that utilize follow-the-sun support scenarios, where different teams in different countries and different time zones support machines during their normal business hours. Provided that good turnover information is being passed on from shift to shift, and calls and trouble tickets are accurately logged in a searchable database, this is a terrific support setup. At least it’s preferable, I think, to having IT staff members carry pagers and get called in the middle of the night to work on problems.”

However, this counter-argument is sometimes made to the follow-the-sun support scenario: If the administrators who built the machines are the same people who will get paged in the middle of the night in the event of a problem, then these admins will be extra careful when configuring their machines in the first place. Ultimately, if extra care is taken up front, there are fewer emergency calls.

Beyond that, some believe that the admin who built the server is the best person to fix it. We do get to know our machines over time. We know how they normally behave, we know where the logs are and when the cron jobs run, and we remember that quick little change we implemented a few days or weeks ago. An administrator who’s servicing an unfamiliar machine on a 3 a.m. call may need some time to get familiar with the applications it runs and its other unique characteristics.

All of this sounds logical, but I feel that the familiarity factor is a bit overrated. These days, many organizations take the time to standardize the look and feel of all their machines so that any team member can log into any machine and get right to work. But let me expound on what I said in the previous post: What I like about the follow-the-sun scenario is that people are actually working on the machines during their normal daylight hours. They’re not sleep-deprived; they’re fresh and alert and able to work on issues during the normal course of their day. And anything that isn’t resolved can be left for those coming in on the next shift.

Of course, in those cases, there’s a need to bring new shift members up to speed on what’s already been tried. But this isn’t all bad, either. Many times I’ve worked with IBM Support on issues that took multiple shifts to resolve. The departing shift members fill in the people coming in, and we continue to troubleshoot problems. Sometimes it helps to have a new set of eyes looking at a problem. A group of people will spend lots of time on an issue, then a new person will come in and immediately spot something that the rest of us overlooked. I’ve seen it happen.

Admittedly, globally dispersed support teams are a luxury available to only a few large companies. The rest of us generally work within individual IT departments.

So how do you deal with support issues? Do you prefer to have the on-call pager for a week at a time? Do you prefer to have dedicated staff working second and third shifts? Is your after-hours call volume so high that you can only handle a few days of it before exhaustion creeps in? I knew a guy who hated his turn on the pager rotation so much that he would bribe his teammates, to the tune of hundreds of dollars, to take his week for him.

Hopefully you’re on good terms with your IT team and can adjust your schedule when need be. And hopefully your bosses recognize the perils of pager duty and allow you time off after an extended period of night calls. But how does your organization handle this? If you have some solutions — or some horror stories — e-mail me or make a post in Comments.