The Danger of Defaults

Edit: Some links no longer work

Originally posted April 11, 2017 on AIXchange

A friend who was in the midst of a migration project recently asked me what I knew about TCPTR. Short answer: not much. So I went searching and found this definition:

Configures or displays TCP Traffic Regulation (TR) policy information to control the maximum incoming socket connections for ports.

That led me to this more detailed explanation:

TCP network services and subsystems running on AIX automatically and transparently take advantage of this powerful DoS mitigation technology using simple administrative tuning. This new feature provides a simplified approach to increased network security by leveraging centralized management and firewall-based customization. In addition to providing effective service-level and system-level TCP DoS mitigation, IBM AIX TCP Traffic Regulation provides system-wide TCP connection resource diversity across source Internet protocol addresses initiating connections.

That jarred my memory, so I went back to this article:

Over the weekend, a client implemented security hardening on their production LPARs. They used AIX 6.1 Security Expert. Apart from some users who had been locked out due to weak passwords, testing went well … until about 9am Monday, when some users reported they couldn’t log in.

I forwarded all that to my friend, but in the meantime, he’d figured out his issue. The details are pretty interesting:

The TCPTR functionality in AIX regulates the number of connections on certain ports. If you run AIXPert and choose the high settings, it enables this functionality.

The application that hits the database on our server generates a lot of connections, more than TCPTR allows by default. So TCPTR was dropping connections. It doesn’t log this; in fact, you can’t even enable logging for it (I asked IBM).

We turned it off and our problem went away.

Here is a basic rundown of what happened:

  • We were working on a migration project from Oracle 9 on Solaris 9 to Oracle 11 on AIX 7.1.
  • We had done preliminary migrations and testing with a small number of users.
  • On the weekend of the cutover, things were looking good. Database exports/imports went fine.
  • On Monday morning things were still looking good. No complaints from the users.
  • About mid-morning, we started getting reports of some users experiencing slowness and/or disconnects.
  • We began troubleshooting. We found errors in the Oracle logs like “TNS:packet writer failure” and “TNS:lost contact.”
  • This led us to believe that we were dealing with an Oracle issue.
  • We spent a good part of the day reviewing and changing Oracle settings including TNS name resolution settings, etc.
  • Later in the day, after doing some web searches, one of the guys stumbled across this article.
  • We checked our systems, and sure enough, tcptr was enabled.
  • We disabled tcptr, and the issue cleared.
  • Upon some further investigation of my notes from six years ago when we first rolled out these new LPARs, it looks like we decided to use the AIXPert tool to enable some hardening of the AIX systems.
  • We must have used the AIXPert “high” setting, which enables TCPTR.
  • We have been running all this time without any issue, because the number of connections to our systems never exceeded the restrictions that tcptr puts in place by default.
  • For this new database we migrated, a large number of client connections are made, which exceeded the default settings for TCPTR.
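
One way to catch this kind of problem before users do is to watch how many established connections each listening port carries and compare that with whatever maximum the TCPTR policy sets. The following is only a rough sketch in Python, not anything from the original troubleshooting. It assumes netstat -an style output in which the local address ends in ".port" or ":port"; the sample text, the function names and the limit of 2 are all made up for illustration. Real limits would come from the policy that the tcptr command displays, and keying them by single port rather than by port range is a simplification.

```python
import re
from collections import Counter

# Hypothetical sample in the style of "netstat -an" output (assumed format).
SAMPLE = """\
Proto Recv-Q Send-Q  Local Address          Foreign Address        (state)
tcp4       0      0  10.1.1.5.1521          10.1.1.20.51234        ESTABLISHED
tcp4       0      0  10.1.1.5.1521          10.1.1.21.51300        ESTABLISHED
tcp4       0      0  10.1.1.5.1521          10.1.1.22.40022        ESTABLISHED
tcp4       0      0  10.1.1.5.22            10.1.1.30.60001        ESTABLISHED
tcp4       0      0  *.1521                 *.*                    LISTEN
"""


def count_established_by_port(netstat_text):
    """Count ESTABLISHED TCP connections per local port."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Skip headers, non-TCP lines, and anything too short to parse.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        if fields[5] != "ESTABLISHED":
            continue
        # The port is whatever follows the last '.' or ':' in the local address.
        port = re.split(r"[.:]", fields[3])[-1]
        if port.isdigit():
            counts[int(port)] += 1
    return counts


def ports_over_limit(counts, limits):
    """Return {port: count} for ports whose count exceeds the given limit."""
    return {
        port: n
        for port, n in counts.items()
        if port in limits and n > limits[port]
    }


if __name__ == "__main__":
    counts = count_established_by_port(SAMPLE)
    # The limit below is an invented number, not a TCPTR default.
    for port, n in ports_over_limit(counts, {1521: 2}).items():
        print(f"port {port}: {n} established connections exceeds the limit")
```

Run periodically against real netstat output, a check like this would at least show when connection counts are climbing toward a configured maximum, which matters here because TCPTR itself does not log the connections it drops.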

I see this story as a cautionary tale. We’re taking chances when we accept a tool’s defaults without fully understanding what is being changed under the covers. But what can we really do about this? I always argue for using test and development LPARs and doing testing whenever possible. Yet this environment had been running with these rules in production for years without an impact, until the usage scenario changed and more connections started coming in.

Obviously this isn’t an isolated issue, as at least two customers that we know about have run into it. Now I throw the question out to my readers. Have you experienced this or at least heard about it? Moreover, what should we be doing to protect ourselves and our environments?