Database – single table w/ extra columns vs multiple tables which duplicate schema

databasedatabase-designrelational-database

I am working on a project where at some point, I needed to make a decision on whether or not, in the database, I should have a single table with multiple columns that not every record uses, or multiple tables with duplicated schema.

I am creating a Sports information application which can handle multiple sports. We can handle NBA,NHL,MLB,NFL for example. Each sport has very similar concepts – Teams, Schedules, Injuries, Player info..

Our data source of course does not give us each piece of data in the same schema. Each sport has a different Schema to which we receive data from our vendor.

Because there wasn't enough time (client demands) to do upfront analysis of the data feeds to determine commonalities, I hedged my bet and took the 'safe bet' and made individual separate tables for each sport instead of one set of tables that all sports used.

The result is duplicated schema in several of the tables, and so duplicated interfaces to the database (e.g. stored procs) as well. I have something like NBA_Game, NFL_Game, NBA_Team,NFL_Team, etc.. each table might have a few properties that the other one doesn't, and several that are shared. it goes on for say 5-10 tables across 4 or 5 sports. I'm still not sure if this is entirely a bad thing – the alternative, having a single set of tables which had properties on it that not all sports would use, might have on its own been unwieldy as well.

Has anyone that has done this run into pitfalls of this kind of design and could share their experience here? Things that might help me to know now instead of learning the hard way down the road? Have you done it the other way, having one big table/set of tables, with columns that not every record would use? What pitfalls did you run into doing that?

Is there some alternative such as table inheritance that you have used in the past that worked better?

Thanks

Best Answer

Ultimately, it comes down to use and architecture.

Architecture

Does the system handle "any sport"? Is the idea that you put on your architecture astronaut hat, and build a generic system that can handle any future type of sport that may not even exist today?

If so, obviously having dynamically-named tables is a huge pain, so it would make sense to have a schema that supports n sports, if required.

That said, I have a very strong bias against this approach: this is almost always more work and leads to poorer results. Making a separate UI, schema, etc, for each sport will ultimately lead to better user experience and easier to maintain code, even if it does mean some superficial amount of duplication (how to avoid/minimize this is a separate question).

How do you handle players that play multiple sports? Do they get two entries (eg, you treat as different people) or are you trying to do something specific with them?

Use

So let's assume you don't do sports dynamically (eg, if someone wants to add a new sport, it requires development effort to add it).

Is there ever a time where you're displaying players (or any other object you mentioned) from more than one sport at a time?

I could see this for a search function, where you could search by player or team name (regardless of sport), but beyond that I can't imagine many use cases.

If you never need to do this, then your approach is perfectly fine. You can stop reading here.

Alternative Schemas

Views

I'm a fan of KISS. In 15+ years of software development, I continue to fall back to the philosophy "build the simplest thing that works".

So my initial reaction, assuming a cross-sport search function is really the only use case, is to create views:

SELECT PlayerName, 'NFL' as [Sport], TeamName FROM NFL_Players JOIN NFL_Teams ... 
UNION  
SELECT PlayerName, 'NHL' as [Sport], TeamName FROM NHL_Players JOIN NHL_Teams ... 
UNION ....

Of course, if you add a new sport, you have to add to the view. It may also be useful to include other common information but really that's dependent on what needs to get shown.

I'd try to keep all the sport-specific stuff in the View definition, so the search code doesn't need to have much or any specific code (besides maybe knowing how to link to /nhl/players/player-name vs /nfl/... or however your app does that).

Table Inheritence

Table inheritance can work, but is pretty complex. I don't have a ton of experience with it, and in fact, I think every time I've been involved in evaluating it we ended up doing something simpler (like I'm suggesting here).

So personally, I have yet to find why this would be useful, but maybe there is a convincing use case (that I don't know of) that justifies the complexity (eg, Table inheritance solves the use case better than any other solution).

Separate tables for sport-specific attributes

You could do a single players table which has attributes common to all players of all sports, and then another set of tables like nhl_players_details that contains a playerId and columns with additional info on the player. If there are a ton of common attributes, or you have many uses of "all players from all sports" then this may make sense.

Key value pairs for sport-specific attributes

Completely alternative approach: have a players table (again, with common attributes like name) and then a player_data table that has PlayerId, Sport, Attribute, Value. The attribute names entered would be sport-specific. This allows you to essentially add new attributes without modifying the schema (your code would still have to know to load/display them of course). The drawback is you lose some integrity: value would typically be a string field, so your app code would have to be resilient and handle potential failures converting the string value to a specific data type (like integer).

This concept can of course apply to Teams, Games, etc.

Related Topic