Database – How to test variable values which differ due to floating point inaccuracy caused by different DBMS types

database, integration-tests, test-automation, testing

I am working on a program that operates on floating point values fetched from different database types: we currently support 12 different DBMS (for example, two of them are Sqlite3 and MariaDB).

My code applies some business logic to the values fetched from the database: a floating point score is calculated and the values are ordered by that score. I have written tests for this business logic. Because the values differ by roughly 10^-6 depending on which database they are fetched from, the ordering outcome depends on the database type.

For production, I believe an ordering difference caused by a 10^-6 discrepancy is acceptable (especially because we state that data integrity and quality are the user's responsibility for our product). Also, our tests for fetching values from the databases only check up to a precision of 10^-5.

What is the most effective way to test this automatically, namely ordering changes caused by small floating point inaccuracies?
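To make the failure mode concrete, here is a minimal, made-up illustration (the group names and numbers are invented, not taken from our data) of how a roughly 10^-6 disagreement between two backends can flip a ranking:

```python
# Two backends agree on the scores only up to ~1e-6, which is enough
# to change the order when two scores are nearly tied.
scores_sqlite = {"group_a": 0.4999995, "group_b": 0.5000000}
scores_mariadb = {"group_a": 0.5000002, "group_b": 0.5000000}

ranking_sqlite = sorted(scores_sqlite, key=scores_sqlite.get, reverse=True)
ranking_mariadb = sorted(scores_mariadb, key=scores_mariadb.get, reverse=True)

print(ranking_sqlite)   # ['group_b', 'group_a']
print(ranking_mariadb)  # ['group_a', 'group_b']
```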

Note: some clarification, as requested.

  1. The SQL query is an aggregation with some GROUP BY clauses (different aggregation functions are tested) and two different time period filters. In this example the aggregation function was the mean of a column. So two dataframes with the same columns are fetched from the database.

  2. These two dataframes are joined on the GROUP BY columns, and the difference of the metric column between the two time periods is calculated.

  3. The score is the difference multiplied by its z-score (a rough sketch of the pipeline is shown after this list).

  4. The 10^-6 difference comes from the way Sqlite3 and MariaDB calculate the mean.

  5. A database of each DB type is created and populated at the start of each test, using the same CSV files as input.
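A rough sketch of that pipeline, assuming pandas and assuming the z-score is taken over the differences (all names here are illustrative, not our actual code):

```python
import pandas as pd

def compute_scores(df_period_1: pd.DataFrame, df_period_2: pd.DataFrame,
                   group_cols: list[str], metric_col: str) -> pd.DataFrame:
    """Join the two aggregated result sets and score the change in the metric."""
    # Both inputs are already aggregated (e.g. mean of a column per group).
    joined = df_period_1.merge(df_period_2, on=group_cols,
                               suffixes=("_p1", "_p2"))

    # Difference of the metric between the two time periods.
    joined["diff"] = joined[f"{metric_col}_p2"] - joined[f"{metric_col}_p1"]

    # Score = difference weighted by its z-score (see point 3 above).
    z = (joined["diff"] - joined["diff"].mean()) / joined["diff"].std()
    joined["score"] = joined["diff"] * z

    # The ordering of this result is what differs between DBMS backends.
    return joined.sort_values("score", ascending=False)
```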

Best Answer

I see the following alternatives:

  1. Tailor your test data to avoid the problem (so no two score values differ by a value as small as 10^-6)

  2. Write a special comparison function for the tests which tolerates the differences (for this, it is better to fetch the values from the DB with the highest available precision, not with a truncated precision like 10^-5)

  3. Provide different test assessment data for each DBMS (or at least each group of DBMS where you expect differences).

  4. Store real numbers not as floating point, but in a database-independent format (like a fixed point format, or a string type with a fixed number of decimals), and make sure all calculations are done with exactly the same internal precision.

Whether option 1 is acceptable is a question of the acceptable overall risk and of the gory details of the business requirements and the scoring. There are systems where this is fine, and others where it is not.
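If you go that route, a cheap guard is a sanity check on the fixture itself, so nobody later adds rows whose expected scores are closer together than the cross-DBMS noise. The helper name and the 10^-4 gap below are just an illustrative sketch:

```python
def assert_scores_well_separated(expected_scores, min_gap=1e-4):
    """Fail fast if the test fixture contains expected scores too close to be
    ordered reliably across DBMS backends (cross-backend noise is ~1e-6)."""
    ordered = sorted(expected_scores)
    for lower, upper in zip(ordered, ordered[1:]):
        assert upper - lower >= min_gap, (
            f"Expected scores {lower} and {upper} are only "
            f"{upper - lower:.2e} apart; adjust the test data instead."
        )
```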

Option 2 can be tricky, but it is probably what you would also do in manual testing. It may, however, come with a certain risk of overlooking certain kinds of regressions.
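A minimal sketch of such a comparison, assuming the scores end up in an item-to-score mapping and that an absolute tolerance around 10^-5 is acceptable (both are assumptions, not taken from the question): accept orderings that differ only where the underlying scores are indistinguishable within the tolerance, and fail everything else.

```python
import math

def assert_same_ranking_within_tolerance(ranked_a, ranked_b, scores, tol=1e-5):
    """Accept orderings that differ only where the underlying scores are
    within `tol` of each other; flag everything else as a real regression.
    `scores` maps item -> score from one of the two runs."""
    assert set(ranked_a) == set(ranked_b), "Different items, not just reordered"
    for pos, (item_a, item_b) in enumerate(zip(ranked_a, ranked_b)):
        if item_a == item_b:
            continue
        # A swap is only acceptable if the two scores are indistinguishable
        # at the chosen tolerance.
        if not math.isclose(scores[item_a], scores[item_b],
                            rel_tol=0.0, abs_tol=tol):
            raise AssertionError(
                f"Ordering differs at position {pos}: {item_a!r} vs {item_b!r} "
                f"with scores {scores[item_a]} / {scores[item_b]}"
            )
```

When you only compare the score values themselves rather than the resulting order, `pytest.approx` with an absolute tolerance does the same job with less code.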

Option 3 will require more maintenance effort for the assessment data. Whether it is manageable depends on the number of tests which would require such differentiated data, and on the number of DBMS groups for which different data would become necessary.
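In a pytest-based suite this typically ends up as one expected-result file per backend (or backend group); the backend list, file layout and fixtures below are purely hypothetical:

```python
import json
import pytest

DBMS_BACKENDS = ["sqlite3", "mariadb"]  # ... plus the other backend groups

@pytest.fixture(params=DBMS_BACKENDS)
def backend(request):
    return request.param

@pytest.fixture
def expected_ranking(backend):
    # One expected-result file per backend group, e.g.
    # tests/expected/ranking_sqlite3.json (hypothetical layout)
    with open(f"tests/expected/ranking_{backend}.json") as f:
        return json.load(f)
```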

Option 4 may be feasible for financial systems, where all the numbers involved have a defined number of two or three decimals, and for systems where the calculations are done exclusively on the client side, and none at the database side.
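In Python, for example, `decimal.Decimal` with one explicit context gives you such database-independent arithmetic, provided the values are stored and fetched as exact decimal strings (or DECIMAL columns) and all calculations happen on the client side; the precision below is only an example:

```python
from decimal import Decimal, getcontext

getcontext().prec = 28  # one explicit precision for every client-side calculation

def mean(values: list[str]) -> Decimal:
    """Average values that were stored/fetched as exact decimal strings,
    so every DBMS backend feeds identical inputs into identical arithmetic."""
    decimals = [Decimal(v) for v in values]
    return sum(decimals, Decimal(0)) / Decimal(len(decimals))

# Identical inputs and identical arithmetic give a reproducible result
# regardless of which DBMS the strings came from:
print(mean(["0.10", "0.20", "0.40"]))  # 0.2333333333333333333333333333
```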
