Database – How to test variable values which differ due to floating point inaccuracy caused by different DBMS types

database, integration-tests, test-automation, testing

I am working on a program that operates on floating point values fetched from different database types: we currently support 12 different DBMS (for example, two of them are Sqlite3 and MariaDB).

My code applies some business logic to the values fetched from the database: a floating point score is calculated and the values are ordered by that score. I have written tests for this business logic. Because the values differ by roughly 10^-6 depending on which database they are fetched from, the ordering outcome depends on the database type.

For production, I believe an ordering difference caused by a 10^-6 discrepancy is acceptable (especially because we state that data integrity and quality are the user's responsibility for our product). Also, our tests for fetching values from the databases only check up to a precision of 10^-5.

What is the most effective way to test this automatically, namely ordering changes caused by small floating point inaccuracies?
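To make the failure mode concrete, here is a minimal, made-up illustration (the group names and numbers are invented, not taken from our data) of how a roughly 10^-6 disagreement between two backends can flip a ranking:

```python
# Two backends agree on the scores only up to ~1e-6, which is enough
# to change the order when two scores are nearly tied.
scores_sqlite = {"group_a": 0.4999995, "group_b": 0.5000000}
scores_mariadb = {"group_a": 0.5000002, "group_b": 0.5000000}

ranking_sqlite = sorted(scores_sqlite, key=scores_sqlite.get, reverse=True)
ranking_mariadb = sorted(scores_mariadb, key=scores_mariadb.get, reverse=True)

print(ranking_sqlite)   # ['group_b', 'group_a']
print(ranking_mariadb)  # ['group_a', 'group_b']
```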

Note: some clarification, as requested.

  1. The SQL query is an aggregation with some GROUP BY clauses (different aggregation functions are tested) and two different time period filters. In this example the aggregation function was the mean of a column. So two dataframes with the same columns are fetched from the database.

  2. These two dataframes are joined on the GROUP BY columns, and the difference of the metric column between the two time periods is calculated.

  3. The score is the difference multiplied by its z-score (a rough sketch of the pipeline is shown after this list).

  4. The 10^-6 difference comes from the way Sqlite3 and MariaDB calculate the mean.

  5. A database of each DB type is created and populated at the start of each test, using the same CSV files as input.
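A rough sketch of that pipeline, assuming pandas and assuming the z-score is taken over the differences (all names here are illustrative, not our actual code):

```python
import pandas as pd

def compute_scores(df_period_1: pd.DataFrame, df_period_2: pd.DataFrame,
                   group_cols: list[str], metric_col: str) -> pd.DataFrame:
    """Join the two aggregated result sets and score the change in the metric."""
    # Both inputs are already aggregated (e.g. mean of a column per group).
    joined = df_period_1.merge(df_period_2, on=group_cols,
                               suffixes=("_p1", "_p2"))

    # Difference of the metric between the two time periods.
    joined["diff"] = joined[f"{metric_col}_p2"] - joined[f"{metric_col}_p1"]

    # Score = difference weighted by its z-score (see point 3 above).
    z = (joined["diff"] - joined["diff"].mean()) / joined["diff"].std()
    joined["score"] = joined["diff"] * z

    # The ordering of this result is what differs between DBMS backends.
    return joined.sort_values("score", ascending=False)
```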

Best Answer

I see the following alternatives:

  1. Tailor your test data to avoid the problem (so no two score values differ by a value as small as 10^-6)

  2. Write a special comparison function for the tests which tolerates the differences (for this, it is better to fetch the values from the DB with the highest available precision, not with a truncated precision like 10^-5)

  3. Provide different test assessment data for each DBMS (or at least each group of DBMS where you expect differences).

  4. Store real numbers not as floating point, but in a database-independent format (like a fixed point format, or a string type with a fixed number of decimals), and make sure all calculations are done with exactly the same internal precision.

Whether option 1 is acceptable is a question of the acceptable overall risk and of the gory details of the business requirements and the scoring. There are systems where this is fine, and others where it is not.
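If you go that route, a cheap guard is a sanity check on the fixture itself, so nobody later adds rows whose expected scores are closer together than the cross-DBMS noise. The helper name and the 10^-4 gap below are just an illustrative sketch:

```python
def assert_scores_well_separated(expected_scores, min_gap=1e-4):
    """Fail fast if the test fixture contains expected scores too close to be
    ordered reliably across DBMS backends (cross-backend noise is ~1e-6)."""
    ordered = sorted(expected_scores)
    for lower, upper in zip(ordered, ordered[1:]):
        assert upper - lower >= min_gap, (
            f"Expected scores {lower} and {upper} are only "
            f"{upper - lower:.2e} apart; adjust the test data instead."
        )
```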

Option 2 can be tricky, but it is probably what you would also do in manual testing. It may, however, come with a certain risk of overlooking certain kinds of regressions.
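A minimal sketch of such a comparison, assuming the scores end up in an item-to-score mapping and that an absolute tolerance around 10^-5 is acceptable (both are assumptions, not taken from the question): accept orderings that differ only where the underlying scores are indistinguishable within the tolerance, and fail everything else.

```python
import math

def assert_same_ranking_within_tolerance(ranked_a, ranked_b, scores, tol=1e-5):
    """Accept orderings that differ only where the underlying scores are
    within `tol` of each other; flag everything else as a real regression.
    `scores` maps item -> score from one of the two runs."""
    assert set(ranked_a) == set(ranked_b), "Different items, not just reordered"
    for pos, (item_a, item_b) in enumerate(zip(ranked_a, ranked_b)):
        if item_a == item_b:
            continue
        # A swap is only acceptable if the two scores are indistinguishable
        # at the chosen tolerance.
        if not math.isclose(scores[item_a], scores[item_b],
                            rel_tol=0.0, abs_tol=tol):
            raise AssertionError(
                f"Ordering differs at position {pos}: {item_a!r} vs {item_b!r} "
                f"with scores {scores[item_a]} / {scores[item_b]}"
            )
```

When you only compare the score values themselves rather than the resulting order, `pytest.approx` with an absolute tolerance does the same job with less code.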

Option 3 will require more maintenance effort for the assessment data. Whether it is manageable depends on the number of tests which would require such differentiated data, and on the number of DBMS groups for which different data would become necessary.
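In a pytest-based suite this typically ends up as one expected-result file per backend (or backend group); the backend list, file layout and fixtures below are purely hypothetical:

```python
import json
import pytest

DBMS_BACKENDS = ["sqlite3", "mariadb"]  # ... plus the other backend groups

@pytest.fixture(params=DBMS_BACKENDS)
def backend(request):
    return request.param

@pytest.fixture
def expected_ranking(backend):
    # One expected-result file per backend group, e.g.
    # tests/expected/ranking_sqlite3.json (hypothetical layout)
    with open(f"tests/expected/ranking_{backend}.json") as f:
        return json.load(f)
```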

Option 4 may be feasible for financial systems, where all the numbers involved have a defined number of two or three decimals, and for systems where the calculations are done exclusively on the client side, and none at the database side.
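In Python, for example, `decimal.Decimal` with one explicit context gives you such database-independent arithmetic, provided the values are stored and fetched as exact decimal strings (or DECIMAL columns) and all calculations happen on the client side; the precision below is only an example:

```python
from decimal import Decimal, getcontext

getcontext().prec = 28  # one explicit precision for every client-side calculation

def mean(values: list[str]) -> Decimal:
    """Average values that were stored/fetched as exact decimal strings,
    so every DBMS backend feeds identical inputs into identical arithmetic."""
    decimals = [Decimal(v) for v in values]
    return sum(decimals, Decimal(0)) / Decimal(len(decimals))

# Identical inputs and identical arithmetic give a reproducible result
# regardless of which DBMS the strings came from:
print(mean(["0.10", "0.20", "0.40"]))  # 0.2333333333333333333333333333
```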
