Edit: Still good stuff
Originally posted January 17, 2017 on AIXchange
I recently discovered this post on the UNIX & Linux Forums. While it’s from 2013, “The Most Incomplete Guide to Performance Tuning” has some great, and still relevant, ideas.
For starters, this is from the section called “What Does Success Mean?”
“The problem is that fast is a relative term. Therefore it is absolutely imperative that you agree with your customer on exactly what fast means. Fast is not “I don’t believe you could squeeze any more out of it even if I threaten to fire you”. Fast is something measurable – kilobytes, seconds, transactions, packets, queue length – anything which can be measured and thus quantified. Agree with your customer about this goal before you even attempt to optimize the system. Such an agreement is best laid down in writing and is called a Service Level Agreement (SLA). If your customer is internal, a mail exchange should be sufficient. Basically it means that you won’t stop your efforts before measurement X is reached and, in turn, the customer agrees not to pester you any more once that goal is indeed reached.
A possible SLA looks like this:
Quote: The ABC-program is an interactive application. Average response times are now at 2.4 seconds and have to be reduced to below 1.5 seconds on average. Single responses taking longer than 2.5 seconds must not occur.
This can be measured, and it will tell you – and your customer – when you have reached the agreed target.
By contrast, here’s a typical example of work that is not covered by an SLA, a graveyard of hundreds of uncounted, wasted man-hours:
Quote: The ABC-program is a bit slow, but we can’t afford a new system right now, therefore make it as fast as possible without replacing the machine or adding new resources.
The correct answer to such an order is: “If the system is not important enough for you to spend any money on upgrading it, why should it be important enough for me to put any serious work into it?””
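That example SLA is concrete enough to turn into an automated check. Here’s a minimal sketch in Python: the 1.5 and 2.5 second thresholds come straight from the quoted example, while the sample data and the function name are placeholders I made up.

    from statistics import mean

    # Thresholds taken from the example SLA quoted above.
    AVG_TARGET_SECONDS = 1.5   # average response time must drop below 1.5 seconds
    MAX_TARGET_SECONDS = 2.5   # no single response may take longer than 2.5 seconds

    def sla_met(response_times):
        """Return True if the measured response times satisfy both SLA targets."""
        return (mean(response_times) < AVG_TARGET_SECONDS
                and max(response_times) <= MAX_TARGET_SECONDS)

    # Hypothetical measurements, e.g. pulled from an application log.
    samples = [1.2, 1.4, 0.9, 2.1, 1.3]
    print("average %.2fs, worst %.2fs, SLA met: %s"
          % (mean(samples), max(samples), sla_met(samples)))

The script itself isn’t the point; the point is that both numbers are measurable, so both you and the customer know exactly when the work is done.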
This is from the section, “What Does Performance Mean?”
“Another all-too-common misconception concerns the meaning of “performance”, especially its confusion with speed. Performance is not just about being fast. It’s about being fast enough for a defined purpose under an agreed set of circumstances.
The difference between performance and speed can be illustrated with this analogy: We have a Ferrari, a large truck, and a Land Rover. Which is fastest? Most people would say the Ferrari, because it can travel at over 300kph. But suppose you’re driving deep in the country on narrow, winding, bumpy roads? The Ferrari’s speed would be reduced to near zero. So the Land Rover would be the fastest, as it can handle this terrain with relative ease, at near the 100kph limit. Right? But suppose, then, that we have a 10-tonne truck which can travel at barely 60kph along these roads. If each of these vehicles is carrying cargo, it seems clear that the truck can carry many times the cargo of the Ferrari and the Land Rover combined. So again: which is the “fastest”? It depends on the purpose (the amount of cargo to transport) and the environment (the roads available). This is the difference between “performance” and “speed”. The truck may be the slowest vehicle, but if delivering a lot of cargo is part of the goal it might still be the one finishing the task fastest.
There is a distinct difference between fast and fast enough. Most of us work for demanding customers, under economic constraints. We have to accommodate not only their wishes, which are usually easy to satisfy (throw more hardware at the task), but also their wallet, which is usually empty. Every system is a trade-off between what a customer wants and what they are willing to pay for. This is another reason why SLAs are so important. You can attach a price tag to the work the customer is ordering, so they know exactly what they’re getting.”
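The vehicle analogy boils down to simple arithmetic: effective throughput is cargo times speed, not speed alone. In this quick sketch the country-road speeds and the truck’s ten tonnes come from the quote; the cargo capacities for the Ferrari and the Land Rover are numbers I invented purely for illustration.

    # "Performance vs. speed" as arithmetic: throughput = cargo x speed.
    # Country-road speeds and the truck's 10-tonne load come from the analogy;
    # the Ferrari and Land Rover cargo figures are assumed for illustration.
    vehicles = {
        # name: (speed on narrow country roads in km/h, cargo in tonnes)
        "Ferrari":    (20, 0.1),    # "reduced to near zero" on these roads
        "Land Rover": (100, 0.5),
        "Truck":      (60, 10.0),
    }

    for name, (speed_kph, cargo_tonnes) in vehicles.items():
        throughput = speed_kph * cargo_tonnes   # tonne-kilometres per hour
        print("%-10s %3d km/h x %5.1f t = %6.1f t-km/h"
              % (name, speed_kph, cargo_tonnes, throughput))

The truck is the slowest vehicle on the road and still wins by an order of magnitude, which is exactly the author’s point.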
This is from the section, “Work Like You Walk—One Step at a Time”:
“When you tune a system, change one parameter, then monitor again and see what impact that had, or whether it had any impact at all. Even if you have to resort to sets of (carefully crafted) parameter changes, do one set, then monitor before moving on to the next set.
Otherwise you run into the problem that you don’t really know what you are measuring, or why. For example, suppose you change the kernel tuning on a system while, at the same time, your colleague has dynamically added several GB of memory to that system. To make matters “worse”, the guy from storage is in the process of moving the relevant disks to another, faster subsystem. In the end, your system’s response time improved by 10%.
Great! But how? If you need to gain another 5%, where would you start? If you had known that adding 1GB of memory improved the response time by 3%, that adding 3GB more was responsible for most of the rest, that the disk change brought absolutely nothing, and that the kernel tuning brought around 0.5%, you could start by adding another 3GB and then check whether that still has a positive impact. Maybe it won’t, but it’s a promising place to start. As it is, you only know that something you, or your colleagues, did caused the effect, and you have learned little about your problem or your system.”
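That discipline is easy to encode: one change, one measurement, one recorded delta. Here’s a rough sketch of the bookkeeping in Python; measure_response_time() and the change callables are placeholders for whatever measurement and tuning mechanism you actually use, not real tooling.

    # "One step at a time": apply a single change, re-measure, record the delta,
    # and only then move on. measure_response_time() and the apply_change
    # callables stand in for your real measurement and tuning mechanisms.
    def tune_one_at_a_time(measure_response_time, changes):
        """changes is a list of (description, apply_change) pairs, applied in order."""
        history = []
        baseline = measure_response_time()
        print("baseline: %.2fs" % baseline)
        for description, apply_change in changes:
            apply_change()                    # exactly one change...
            after = measure_response_time()   # ...then measure again
            history.append((description, baseline - after))
            print("%s: %.2fs (gained %.2fs)" % (description, after, baseline - after))
            baseline = after                  # new baseline for the next step
        return history

Now when somebody asks where the next 5% is coming from, the history tells you which change actually bought you something and which one was noise.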
And this is from the conclusion:
“Always remember that, as a SysAdmin, you do not operate in a vacuum. You are part of a complex environment which includes network admins, DBAs, storage admins and so on. Most of what they do affects what you do. Have a lousy SAN layout? Your I/O performance will suffer. Have a lousy network setup? Your faster-than-light machine may look like a slug to the users. There is much to be gained if you provide these people with the best information you can glean from your system, because the better the service you offer to them, the better the service you can expect back from them! The network guy will love you if you tell him not only a hostname but also a port, some connection data, interface statistics and your theory about possible reasons for network problems. The storage admin will adore you if you turn out to be a partner in getting the best storage layout possible, instead of being only a demanding customer.
Unix is all about small, specialized entities working together in an orchestrated effort to get something done. The key point is that the utility itself might be small and specialized, but its interface is usually very powerful and generalized. What works for Unix utilities also works for people working together: improve your “interface” by providing better and more meaningful data, and you will see that others are better able to pool their efforts with yours toward a common goal.”
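On that last point, even a small script makes you a better “interface” for the network admin than a hostname and a complaint. This sketch uses the third-party psutil module; on an AIX LPAR you might pull the same details from netstat and entstat instead, but the idea is the same: hand over ports, connection state, and interface counters, not just a name.

    # Sketch: gather the kind of detail a network admin can actually act on:
    # established connections with ports and owning PIDs, plus per-interface
    # traffic and error counters. Requires the third-party psutil package;
    # listing other users' connections may need elevated privileges.
    import psutil

    def network_snapshot():
        print("== established connections ==")
        for conn in psutil.net_connections(kind="inet"):
            if conn.status == psutil.CONN_ESTABLISHED and conn.raddr:
                print("  %s:%s -> %s:%s (pid %s)"
                      % (conn.laddr.ip, conn.laddr.port,
                         conn.raddr.ip, conn.raddr.port, conn.pid))

        print("== per-interface counters ==")
        for nic, c in psutil.net_io_counters(pernic=True).items():
            print("  %s: %d bytes out, %d bytes in, %d errors, %d drops"
                  % (nic, c.bytes_sent, c.bytes_recv,
                     c.errin + c.errout, c.dropin + c.dropout))

    if __name__ == "__main__":
        network_snapshot()

Attach output like that, plus your timestamps and your theory, to the ticket and you have just made the network guy’s day.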
There’s a lot more, so take the time to read the whole thing.