How Does Your Database Rate?

Edit: Do you ever check these?

Originally posted March 1, 2016 on AIXchange

The website db-engines.com rates “database management systems according to their popularity.”

The list has been around for a few years, and as this InfoWorld article notes, “It isn’t forensically precise, nor is it meant to be; it’s intended to give a sense of trends over time.”

The left navigation bar breaks out rankings by database type, including relational databases (IBM’s DB2 is fifth on that list), key-value stores, and document stores. You can also see how prevalent open source databases have become.

Here’s how the rankings are calculated:

“The DB-Engines Ranking is a list of database management systems ranked by their current popularity. We measure the popularity of a system by using the following parameters:

* Number of mentions of the system on websites, measured as the number of results in search engine queries. At the moment, we use Google and Bing for this measurement. In order to count only relevant results, we search for <system name> together with the term database, e.g. “Oracle” and “database”.

* General interest in the system. For this measurement, we use the frequency of searches in Google Trends.

* Frequency of technical discussions about the system. We use the number of related questions and the number of interested users on the well-known IT-related Q&A sites Stack Overflow and DBA Stack Exchange.

* Number of job offers in which the system is mentioned. We use the number of offers on the leading job search engines Indeed and Simply Hired.

* Number of profiles in professional networks in which the system is mentioned. We use the internationally most popular professional network, LinkedIn.

* Relevance in social networks. We count the number of Twitter tweets in which the system is mentioned.

We calculate the popularity value of a system by standardizing and averaging the individual parameters. These mathematical transformations are made in a way that preserves the distance between individual systems. That means that when system A has twice as large a value in the DB-Engines Ranking as system B, it is twice as popular when averaged over the individual evaluation criteria.

The DB-Engines Ranking does not measure the number of installations of the systems, or their use within IT systems. It can be expected that an increase in the popularity of a system as measured by the DB-Engines Ranking (e.g., in discussions or job offers) precedes correspondingly broad use of the system by some period of time. Because of this, the DB-Engines Ranking can act as an early indicator.”
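
The site doesn’t publish its exact formula, but the ratio-preserving property it claims (twice the value on every criterion means twice the score) is what you get from averaging in log space, i.e., a geometric mean. Here’s a minimal Python sketch of that reading; the parameter values below are made up for illustration:

```python
import math

def popularity(values):
    """Ratio-preserving average: a geometric mean computed in log space."""
    logs = [math.log(v) for v in values]
    return math.exp(sum(logs) / len(logs))

# Hypothetical per-parameter values (search hits, job postings, tweets, ...).
# Every parameter for system_a is exactly twice the value for system_b.
system_a = [200.0, 50.0, 80.0]
system_b = [100.0, 25.0, 40.0]

print(popularity(system_a) / popularity(system_b))  # -> 2.0 (up to rounding)
```

A plain arithmetic mean of raw counts would let one large-scale parameter (say, search-engine hits) dominate the others, which is presumably part of why the values are standardized before averaging.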

A blog post went into further detail about the rankings:

“1) The Ranking uses the raw values from several data sources as input. For example, we count the number of Google and Bing results, the number of open jobs, the number of questions on Stack Overflow, the number of profiles on LinkedIn, the number of Twitter tweets, and more.

2) We normalize those raw values for each data source. That is done by dividing them by the average of a selection of the leading systems in each source. That is necessary to eliminate the bias from the changing popularity of the sources themselves. For example, LinkedIn increases the number of its members every month, and therefore the raw values for most systems increase over time. This increase, however, is due to the growing adoption of LinkedIn and not necessarily an increased popularity of a specific system on LinkedIn. To give another example: an outage of Twitter would reduce the raw values for most systems in that month, but obviously has nothing to do with their popularity. For that reason, we use a selection of the best systems in each data source as a ‘benchmark’.

3) The normalized values are then delinearized, summed over all data sources (with the sources weighted), re-linearized, and scaled. The result is the final score of the system.

The normalization step is the key to understanding the December results: the top three systems in the ranking (Oracle, MySQL and SQL Server) all increased their scores. Oracle and MySQL gained a formidable 16 and 11 points, respectively. As a consequence, the benchmark increased, leading to potentially fewer points for many other systems.

Why are we not using all systems as the benchmark for a data source? Well, we continuously add new systems to our ranking. Those systems typically have a low score (assuming we are not missing major players), so each newly added system would reduce the benchmark and inflate the scores of most of the other systems.

Conclusion: it is important to understand the score as a relative value that has to be compared to other systems. Only that guarantees a fair and unbiased score by eliminating the influence of the data sources’ own usage trends.”
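
Read together, steps 1 through 3 describe a concrete pipeline: divide each raw count by a per-source benchmark, take logs (one plausible reading of “delinearize”), form a weighted sum, exponentiate, and scale. The sketch below illustrates that pipeline in Python; the system names, counts, weights, benchmark size, and scaling constant are all assumptions for illustration, not DB-Engines’ actual parameters:

```python
import math

# Hypothetical raw counts per data source for a handful of systems.
raw = {
    "SystemA": {"search": 9000.0, "jobs": 400.0, "tweets": 1200.0},
    "SystemB": {"search": 4500.0, "jobs": 200.0, "tweets": 600.0},
    "SystemC": {"search": 300.0, "jobs": 15.0, "tweets": 40.0},
}
weights = {"search": 0.5, "jobs": 0.3, "tweets": 0.2}  # assumed source weights
TOP_N = 2      # benchmark = average of the leading TOP_N systems per source
SCALE = 100.0  # arbitrary scaling constant for the final score

def benchmark(source):
    """Step 2's 'benchmark': the average of the top-N systems in one source."""
    top = sorted((counts[source] for counts in raw.values()), reverse=True)[:TOP_N]
    return sum(top) / len(top)

def score(system):
    """Steps 2-3: normalize per source, delinearize (log), weight and sum,
    then re-linearize (exp) and scale."""
    total = 0.0
    for source, weight in weights.items():
        normalized = raw[system][source] / benchmark(source)
        total += weight * math.log(normalized)
    return SCALE * math.exp(total)

for name in raw:
    print(f"{name}: {score(name):.1f}")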

While my work almost exclusively involves supporting systems with databases that run on AIX, I still find it worthwhile to learn more about other systems and databases. It’s good to know what else customers are working with.