Database Benchmarking – How to Benchmark Databases Effectively

benchmarkingdatabase

I see lots of discussions flying around about performance of db 'x' or that moving from 'x' to 'y' improved our site performance.

I'm yet to see proper benchmarking that works across different types of databases.

  1. Is it possible to write a meaningful benchmark that could be used across multiple db types, such as Relational, Document-oriented, etc.

  2. How would you go about designing such a benchmark?

Best Answer

Short answer

Yes, you can write a meaningful benchmark of a studied case, if you do it with care, and understand that if it's relevant for the particular case, it may not be for other cases. This is equally true when comparing the databases of the same type (relational database vs. another relational database) or the databases of different types.

No, you cannot write a benchmark which will magically prove that a specific database is ways better than another in every case, for every application.

Long answer

It's definitely possible to say that "moving from a database to another improved our site performance".

  1. You measure the performance of the previous database through profiling or runtime statistics by gathering enough information about the queries and how fast they are.

  2. You move the application to the new database.

  3. You do the same measures.

  4. You compare.

For example, if the full list of 3 182 432 products loaded in 2.834 s. on an old database and loads in 0.920 s. on a new database, given that in both cases, the application has an empty cache, it's a win: the new database improved your site performance concerning this query.

Now, as any performance metric, it's biased:

  • Agreed, the new query is faster. But wait, your DBA didn't know how to use the database you had before, so the query which loads all the products is not optimized. If you rewrite it like that, you would be able to load those products in 0.855 s. instead of 2.834.

  • Ok, you have a better result. But don't you think that it's unfair to compare a database with fresh data just flushed to a 10 years old database for which the last maintenance plan was run three years ago? By the way, don't you think you should have updated the database product at least once during the last four years?

  • Some queries are faster. Some are slower. How do you compute the average result to know that you gained performance overall when moving to the new database? Ok, the time you load all 3 182 432 products is faster. But does it matter, while the query is executed on the website only in a rare case when an administrator is performing some specific task he performed only two times in the last ten years? On the other hand, executing all queries on the home page for a fresh user wastes 0.281 s. with the new database, when it was 0.207 s. with the old database. This result matters much more, especially since those queries cannot be cached for a long time, and are executed tens of thousands of times per day.

  • Both databases must be tested on the same servers, same hardware, same structure. For example, you can't test one database on a single hard drive, and the other one on a RAID1 of two SSDs. When you migrate a large project to a new database, there are chances that you'll just host the new database on hundred other newly deployed rack servers, when the previous database will still remain on the previous machines.

To summarize, you can benchmark the database queries of an application, and obtain precise metrics. But then, you have to give a meaning to numbers. At this state, it's tempting to say that you gained site performance: otherwise, management would be angry to learn that you've spent thousands of dollars and months of work just to make things slower.

The most terrible mistake is to take those conclusions from the benchmarks and to conclude some stupidity like "Microsoft SQL Server is three times faster than Oracle": saying this is like saying that "Java is better than PHP". Define better. Better in what cases? For what sort of applications? For what team of developers?

More you interpret and generalize, more the thing becomes irrelevant and meaningless.

The query select [...] you can find in the revision #832 in file ProductFactory.cs, line 117 executes under 0.5 s. with the new database when tested under the conditions specified in the non-functional requirements annex M, case 3. This allows passing the non-functional requirement 527 (see page 80, revision 9). The same requirement was not satisfied with the previous database, when the test results was in the range 0.9..1.3 s. in the same conditions.

is meaningful for a developer, and precise enough to know what was tested, how, and what was the results. This answers your question number 2.

Sadly, it doesn't make any sense for the management. Instead:

Migrating our product from MySQL to the newest version of Microsoft SQL Server improved the overall performance of our product by five, reducing at the same time the costs by two and the environmental footprint by three. We believe that migrating all our applications to Microsoft SQL Server the next year will give even better results and increase our market competitiveness.

is a pure marketing jibber-jabber, and, technically, doesn't mean anything, but surprisingly has a value for the management and marketing departments.

Finally, can we compare different types of databases? I would say it's totally possible. Let's say I have a website hosting large photos. Those photos are stored in varbinary(max) in Microsoft SQL Server 2005 (so I can't use filestream). I am concerned about the performance when loading those photos, so I decide to store the photos as files instead, using the file system as my new database. First, those files are stored on the same machine than the database. I profile the new solution, and obtain the result that shows that in my case, files are loaded 4% faster from file system than from Microsoft SQL Server. The benchmark is very clear. Now I can think about deploying a dedicated server optimized for direct file storage, rather than using the server optimized for Microsoft SQL Server.

Related Topic