Storing Multi-select and Single-select data together

database-designrelational-database

I've got a database with some multi-select fields. To keep it simple, let's just say those fields are MatchGender and Interests. I want users to be able to find people who match (hence Gender is a multi-select; they can choose to match against one or both genders) and interests/hobbies. Along with those fields, there are several "single-select" fields such as the user's city, state, their gender, etc…

I'm trying to figure out the best way to store this data so it's easily searchable. Do I store the multi-select data in a single field that is comma delimited? Do I store it in a separate table with a UserID? Pros/Cons of each?

By the way, this is a SQL Server back-end database that will be accessed through an ASP web interface.

Best Answer

Make an Interest table, and then another table (UserInterest) holding the foreign keys of both User and Interest

What you're describing is a many-to-many relationship between User and Interest. If you go through the process of database normalization for your data model you will see that holding a comma-separated list of values fails on two points:

  1. Each column should hold only one data item, but a comma-separated list is multiple items.
  2. Each Interest is a repeated data item in User. If you needed to change something about it (maybe correcting a spelling error, or changing the name to something more descriptive), then if you hold the Interests as strings in the User table, you'd need to run a massive update statement on User.

The combination of holding everything in User and holding comma-separated lists compounds the second issue; if you did need to change the name of an Interest, you wouldn't be able to run a very simple update - you'd have to do some sort of string manipulation to find every instance of an Interest and modify it.

With the structure I have suggested, if you ever need to change the name of an Interest, you will update one row in the Interest table.

Also, this way means you'll never need to do string manipulation to match up interests between two people. If you had the interests of Person A as "fishing, baking, paper craft" and Person B as "tobogganing, paper craft, fishing" you'd have to do some unpleasant data-wrangling to realise they shared two-thirds of their interests. Instead, you can just do a set of joins - and with aggregation functions in SQL you could even do things like figuring out the number of matched interests, percentage of matched interests, etc.

Finally, you'll thank yourself later if you ever need to build any reporting, business intelligence, or other down-stream systems over this data. Or you'll remain in the good books of whoever it is who does need to build those things.

Related Topic