There are two aspects to the question, and each has slightly different concerns.
Is a correctly normalized database the only consideration when attempting to deliver a data store with solid data integrity, and that is easily reported on?
No. Data integrity also demands constraints.
Primary key constraints uniquely identify each record. This helps guard against duplicates, but does not necessarily prevent them: with a surrogate key, for instance, the same data can still appear under two different key values.
Foreign key constraints help ensure that related data is kept in sync: a record in a "customer phone numbers" table should have a corresponding "customer," for example. Orphaned records and missing data harm data integrity.
Field/column constraints can help ensure data is valid. For example, perhaps a phone number is stored in a VARCHAR column but should hold only digits, with no letters or formatting. A constraint can guarantee that if the data exists, it meets the arbitrary criteria that make it valid for the given schema.
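As a rough illustration, here is how those three kinds of constraints might look in SQL (a minimal sketch in PostgreSQL syntax; the tables and the digits-only rule are invented for the example):

```sql
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY           -- PK: uniquely identifies a record
);

CREATE TABLE customer_phone (
    phone_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL
        REFERENCES customer (customer_id),    -- FK: no orphaned phone numbers
    phone       VARCHAR(15) NOT NULL
        CHECK (phone ~ '^[0-9]+$')            -- CHECK: digits only, no formatting
);
```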
If so, does more normalization tend to lead to more reportable data?
Normalization tends to lead to less reportable data. The reason is that a typical RDBMS schema is designed around an ORM, that is, around "application objects." What looks like one object may require several tables:
A class that uses inheritance (i.e. has subclasses) requires, in practice, one table per level of the hierarchy, because the subclass's data members are not applicable to the superclass and should have their own table (see the sketch after this list).
Related objects may have their own table. A customer with multiple phone numbers may hold them as a List<String> in the application, but in the schema the phone numbers may live in their own table, forming a 0..* relationship.
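To make that concrete, here is a minimal sketch of how a single Customer object with a subclass and a phone list might spread across tables (PostgreSQL-flavored DDL; all names are invented for illustration):

```sql
-- One "Customer" object in the application becomes three tables.
CREATE TABLE person (                    -- superclass data members
    person_id INTEGER PRIMARY KEY,
    name      VARCHAR(100) NOT NULL
);

CREATE TABLE customer (                  -- subclass data members, one row per person
    person_id      INTEGER PRIMARY KEY REFERENCES person (person_id),
    account_number VARCHAR(20) NOT NULL
);

CREATE TABLE customer_phone (            -- the List<String> of phone numbers, 0..*
    person_id INTEGER NOT NULL REFERENCES customer (person_id),
    phone     VARCHAR(15) NOT NULL
);
```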
Reports are often record-based, where the "record" is specific to the report. They often denormalize data to give a view of a specific table with related data mixed in. This is normally at odds with normalization and ORM.
What this means is that an object you use quite easily in the application may explode into many tables in the database schema, adding relationships of varying cardinality. This requires joins when writing a report query, some of which may be complex or require subqueries. I have seen report SQL queries with ten or more joins, correlated subqueries, aggregates, and other intermediate-to-advanced SQL features, which add complexity and can harm query performance.
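Even the tiny three-table sketch above already needs two joins and an aggregate to produce one report row per customer; a real report scales this up considerably:

```sql
-- One report row per customer: name, account, and a phone count.
SELECT p.person_id,
       p.name,
       c.account_number,
       COUNT(cp.phone) AS phone_count
FROM person p
JOIN customer c ON c.person_id = p.person_id
LEFT JOIN customer_phone cp ON cp.person_id = c.person_id
GROUP BY p.person_id, p.name, c.account_number;
```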
The typical way I have seen this dealt with professionally is to keep a separate set of denormalized reporting tables built for your reports, populated by triggers or stored procedures. This is more work during persistence, but it saves a lot of time and hair pulling when writing SQL for your reports.
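A hedged sketch of that approach (PostgreSQL trigger syntax, version 11+; the reporting table and function names are invented, and other dialects spell triggers differently):

```sql
-- Denormalized reporting table: one wide row per customer.
CREATE TABLE customer_report (
    person_id      INTEGER PRIMARY KEY,
    name           VARCHAR(100),
    account_number VARCHAR(20)
);

-- Re-sync the wide row whenever the normalized customer row changes.
CREATE FUNCTION refresh_customer_report() RETURNS trigger AS $$
BEGIN
    INSERT INTO customer_report (person_id, name, account_number)
    SELECT p.person_id, p.name, NEW.account_number
    FROM person p
    WHERE p.person_id = NEW.person_id
    ON CONFLICT (person_id) DO UPDATE
        SET name = EXCLUDED.name,
            account_number = EXCLUDED.account_number;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER customer_report_sync
AFTER INSERT OR UPDATE ON customer
FOR EACH ROW EXECUTE FUNCTION refresh_customer_report();
```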
You can also use application code: when you save an object and have it and its related objects in memory, construct a query that inserts or updates a record in your reporting tables. This may be easier and give faster run-time performance than relying on triggers.
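In SQL terms, that application-side write can boil down to a single upsert per save (again PostgreSQL's ON CONFLICT; the literal values stand in for fields read from the object in memory):

```sql
-- Issued by application code right after the Customer object is saved.
INSERT INTO customer_report (person_id, name, account_number)
VALUES (42, 'Jane Doe', 'ACCT-0042')   -- illustrative values from the in-memory object
ON CONFLICT (person_id) DO UPDATE
    SET name = EXCLUDED.name,
        account_number = EXCLUDED.account_number;
```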
It sounds like you are asking whether you need to split the master table up because it has so many columns. The short answer is no. A longer answer follows below.
In a simple way of looking at databases, it's about groups of data and what those groups represent. A relational database adds relationships, or links, between those groups. So, when you think about what goes in a table, or a group of tables, you are working out aspects of some "entity" by way of the data that represents it. As you already know, this is done with a master table and one or more child tables.
As an example, take an employee. We have a name, date of birth, id number(s), address, phone number, hire date, and so on. Some of these things are 1 to 1, some are 1 to n. That's how we generally break out where things go from a design point of view. That gives us a master table and some number of child tables.
Most modern databases will not have a problem with a master table having what you think of as a large number of columns. Ten, twenty, a hundred; none of that matters to the database. So, from a logical design point of view there is no reason to break up that master table. However, the real world and theory do not always get along. I must stress that these times are RARE; often a different way of looking at the data will let you express it another way and avoid the issue. If you find yourself with a really huge number of columns, ask why they are there and try to look at them from another point of view.
Back to the employee example: phone number. There could be several numbers here. So do I make a column for home phone, another for cell phone, and yet another for work extension? I may, or I may make a child table for it. That table would have three fields: id, category, number. So, what looked like three columns in the master table has in fact moved to a child table. Same data, different way of looking at it.
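A sketch of that child table (the three fields follow the description above; the employee master table and the types are my assumptions):

```sql
-- Master table, reduced to the minimum for the example.
CREATE TABLE employee (
    employee_id INTEGER PRIMARY KEY,
    name        VARCHAR(100) NOT NULL
);

-- Child table replacing three would-be phone columns: id, category, number.
CREATE TABLE employee_phone (
    employee_id INTEGER     NOT NULL REFERENCES employee (employee_id),
    category    VARCHAR(10) NOT NULL,   -- 'home', 'cell', 'work'
    number      VARCHAR(15) NOT NULL,
    PRIMARY KEY (employee_id, category)
);
```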
I think this may be close to home for your case, though I admit that I am guessing here. I could see a set of stats that represent a character looking like they are part of the master table when really they could easily be expressed as a child table. You mentioned basic attributes and skills, both of which would raise a red flag for me if they are part of the master table. Again, I am making some assumptions based on my RPG experiences with both computer and pen&paper games.

As your game evolves you may well find a reason to add a new skill; you may even add a new base attribute. You may change or throw out an existing skill or base attribute. Do you want your database to change too? Ideally, no: you don't want to have to go back and change your database structure, then the database layer code, then the system logic layer, and then the UI layer. You want changes like that to affect as little code as possible. So, maybe you have a skill table in which each row is a skill. Likewise, a base attribute table with each attribute as a row. Note that I say row, not column, in these cases, just like with the employee phone number.
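A hedged sketch of the row-per-skill idea (every name here is invented; adjust to your schema):

```sql
-- Adding or retiring a skill is now an INSERT or DELETE, not a schema change.
CREATE TABLE game_character (
    character_id INTEGER PRIMARY KEY,
    name         VARCHAR(50) NOT NULL
);

CREATE TABLE skill (
    skill_id INTEGER PRIMARY KEY,
    name     VARCHAR(50) NOT NULL UNIQUE
);

CREATE TABLE character_skill (
    character_id INTEGER NOT NULL REFERENCES game_character (character_id),
    skill_id     INTEGER NOT NULL REFERENCES skill (skill_id),
    rank         INTEGER NOT NULL DEFAULT 0,   -- assumed: some per-character skill level
    PRIMARY KEY (character_id, skill_id)
);
```

A base attribute table would follow the same pattern.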
I feel myself beginning to ramble, so now I stop.
So, that's the longer answer. To summarize: no, you don't need to break up a master table because it has many columns, HOWEVER you do want to analyze those columns to make sure they can't be expressed another way, like child table rows.
Hope that helps.
If the location is being represented by a zipcode (fine for the cases where most distance estimates are within a tolerance for the same zipcode), then I'd impose a CHECK constraint on the fields Zip1 and Zip2 requiring Zip1 <= Zip2.
Of course this imposes a corresponding burden on the middleware logic to understand the convention that keeps each pair to a single record: the smaller zipcode goes in Zip1. This is mainly a concern when INSERTing rows, as queries can stay agnostic about which field is first, i.e. WHERE (Zip1 = myZip1 AND Zip2 = myZip2) OR (Zip1 = myZip2 AND Zip2 = myZip1).
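A sketch of the constraint and an order-agnostic lookup (the table and column names are invented; I'm assuming zipcodes stored as fixed-length strings):

```sql
CREATE TABLE zip_distance (
    zip1     CHAR(5) NOT NULL,
    zip2     CHAR(5) NOT NULL,
    distance NUMERIC NOT NULL,
    PRIMARY KEY (zip1, zip2),
    CHECK (zip1 <= zip2)   -- the convention: smaller zipcode goes first
);

-- INSERTs honor the convention, e.g. with LEAST/GREATEST:
INSERT INTO zip_distance
VALUES (LEAST('10001', '90210'), GREATEST('10001', '90210'), 2446);  -- illustrative distance

-- Queries can stay agnostic about which field is first:
SELECT distance
FROM zip_distance
WHERE (zip1 = '10001' AND zip2 = '90210')
   OR (zip1 = '90210' AND zip2 = '10001');
```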
However, normalization is not violated by storing both orderings in the compound key. That approach uses something less than twice the number of records, but these look to be fairly small rows, so I'm not sure you can justify the extra logic complexity on efficiency grounds (least of all in terms of speed).