MySQL – too many columns

database-design, MySQL, optimization, sql

I'm creating a table with 30-50 columns, and it will hold about 200K rows. Is it recommended to store this data in separate tables? Are there performance issues when you have this many columns?

I'll explain a bit about the table. I have to store all sports games from the past 10 years (basketball, baseball, football, hockey), and for each of these I need to keep additional data. Some of this data lets me reuse fields across sports. For example, every game has a home team, an away team, and an event date.

However, for each of these games I'm also storing sport-specific stats such as first downs, strikeouts, and three-pointers. Obviously, each of these applies only to some of the rows in the table, so I end up with a lot of NULL fields in each row.

I can give more specifics if necessary. Thanks in advance for any general advice.

Best Answer

To elaborate on RichardOD's answer, you generally have three options when dealing with subtyping, and which you choose depends on what you need to do with the data in question.

The first option is the one you're currently using: keep all columns related to the different types in one table, with flags and NULLs used to indicate which type a given record is. It is the simplest way to manage subtyping, and it generally works well when you have only a few types or when the types don't differ much. In your case, it seems like the types can vary quite a bit.
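A minimal sketch of this first option, assuming a `sport` column as the type flag. It uses Python's built-in `sqlite3` module as a runnable stand-in for MySQL, so the DDL is SQLite's dialect; the table name, column names and sample rows are all hypothetical and only illustrate the shape of the data.

```python
import sqlite3

# Option 1 (a sketch): one wide table for every sport. The "sport" column is
# the type flag, and each sport-specific column is NULL on other sports' rows.
# Table/column names and sample rows are hypothetical; the DDL is SQLite,
# standing in for MySQL.
SCHEMA = """
CREATE TABLE game (
    id             INTEGER PRIMARY KEY,
    sport          TEXT NOT NULL
                   CHECK (sport IN ('basketball', 'baseball', 'football', 'hockey')),
    home_team      TEXT NOT NULL,
    away_team      TEXT NOT NULL,
    event_date     TEXT NOT NULL,
    first_downs    INTEGER,  -- football only
    strikeouts     INTEGER,  -- baseball only
    three_pointers INTEGER   -- basketball only
);
"""

def build_single_table() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO game (sport, home_team, away_team, event_date,"
        " first_downs, strikeouts, three_pointers) VALUES (?, ?, ?, ?, ?, ?, ?)",
        [
            ("football", "Bears", "Packers", "2009-10-04", 21, None, None),
            ("baseball", "Cubs", "Mets", "2009-07-12", None, 9, None),
            ("basketball", "Bulls", "Knicks", "2009-12-01", None, None, 11),
        ],
    )
    conn.commit()
    return conn

def null_stat_cells(conn: sqlite3.Connection) -> int:
    """Count the sport-specific cells left NULL across all rows."""
    return conn.execute(
        "SELECT SUM((first_downs IS NULL) + (strikeouts IS NULL)"
        " + (three_pointers IS NULL)) FROM game"
    ).fetchone()[0]
```

Even with only three stat columns, two of the three stat cells in every row are NULL; with one column per stat across four sports, that ratio only gets worse.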

The second option is to keep a central table that contains all of the columns common to the subtypes, and to have one-to-one relationships with other tables that contain the type-specific details of those types.
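A rough sketch of the second option, again using Python's `sqlite3` as a stand-in for MySQL. The one-to-one relationship comes from making each detail table's primary key also a foreign key to `game.id`. All names are hypothetical, only football and baseball detail tables are shown (basketball and hockey would follow the same pattern), and `list_games`/`game_details` are illustrative helpers for the "grid first, details on click" use case.

```python
import sqlite3

# Option 2 (a sketch): a central "game" table holds the columns every sport
# shares; each sport has a detail table in a one-to-one relationship with it,
# because the detail table's primary key is also a foreign key to game.id.
# Table/column names are hypothetical; the DDL is SQLite, standing in for MySQL.
SCHEMA = """
CREATE TABLE game (
    id         INTEGER PRIMARY KEY,
    sport      TEXT NOT NULL,
    home_team  TEXT NOT NULL,
    away_team  TEXT NOT NULL,
    event_date TEXT NOT NULL
);
CREATE TABLE football_game (
    game_id     INTEGER PRIMARY KEY REFERENCES game(id),
    first_downs INTEGER NOT NULL
);
CREATE TABLE baseball_game (
    game_id    INTEGER PRIMARY KEY REFERENCES game(id),
    strikeouts INTEGER NOT NULL
);
"""

# Which detail table/column holds each sport's extra stat (hypothetical).
DETAILS = {"football": ("football_game", "first_downs"),
           "baseball": ("baseball_game", "strikeouts")}

def build_central_table() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    return conn

def add_game(conn, sport, home, away, date, stat):
    """Insert the common row and its sport-specific row in one transaction."""
    table, column = DETAILS[sport]  # names come from the whitelist, never user input
    with conn:  # commits both inserts together, or rolls both back
        cur = conn.execute(
            "INSERT INTO game (sport, home_team, away_team, event_date)"
            " VALUES (?, ?, ?, ?)", (sport, home, away, date))
        conn.execute(f"INSERT INTO {table} (game_id, {column}) VALUES (?, ?)",
                     (cur.lastrowid, stat))
    return cur.lastrowid

def list_games(conn):
    """The 'big grid': common fields only, so no joins are needed."""
    return conn.execute(
        "SELECT id, sport, home_team, away_team, event_date"
        " FROM game ORDER BY event_date").fetchall()

def game_details(conn, game_id):
    """Fetch one game's sport-specific stat once the user clicks on it."""
    (sport,) = conn.execute("SELECT sport FROM game WHERE id = ?",
                            (game_id,)).fetchone()
    table, column = DETAILS[sport]
    row = conn.execute(f"SELECT {column} FROM {table} WHERE game_id = ?",
                       (game_id,)).fetchone()
    return {column: row[0]}
```

The design choice here is that the grid query never touches the detail tables, while each detail table has only NOT NULL columns for its own sport.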

The third option is to not think of the different types as subtypes at all and just keep each type's records in its own table. There would be no shared table holding the common data, so each table would repeat some of the same columns.
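A small sketch of the third option, once more with Python's `sqlite3` standing in for MySQL and hypothetical table and column names. It shows the trade-off: each per-sport table is self-contained, but any cross-sport listing has to `UNION ALL` over every table and repeat the shared columns in each branch.

```python
import sqlite3

# Option 3 (a sketch): no shared table at all. Each sport has its own table,
# and the common columns (home_team, away_team, event_date) are repeated.
# Table/column names are hypothetical; the DDL is SQLite, standing in for MySQL.
SCHEMA = """
CREATE TABLE football_game (
    id INTEGER PRIMARY KEY, home_team TEXT NOT NULL, away_team TEXT NOT NULL,
    event_date TEXT NOT NULL, first_downs INTEGER NOT NULL
);
CREATE TABLE baseball_game (
    id INTEGER PRIMARY KEY, home_team TEXT NOT NULL, away_team TEXT NOT NULL,
    event_date TEXT NOT NULL, strikeouts INTEGER NOT NULL
);
"""

# A cross-sport query needs one branch per sport table.
ALL_GAMES_ON = """
SELECT 'football' AS sport, home_team, away_team FROM football_game WHERE event_date = ?
UNION ALL
SELECT 'baseball', home_team, away_team FROM baseball_game WHERE event_date = ?
"""

def build_separate_tables() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn

def games_on(conn: sqlite3.Connection, date: str):
    """List every game on a date, whatever the sport."""
    return conn.execute(ALL_GAMES_ON, (date, date)).fetchall()
```

This is cheap if such cross-sport queries are rare, and increasingly awkward if they are common, which is the distinction the answer draws next.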

Now, each option has its place. You'd use the first option when there aren't many differences between the types. You'd use the second option if you need to work with the common fields independently of the type-specific fields; for example, if you wanted to list all sports games in a big grid with general information and then let users click to see the type-specific details of a game. You'd use the third option when the types aren't really related at all and you're only storing them together out of convenience; dissimilar schemas shouldn't be merged, even if they share a few fields.

So think about what you need to do with the data, how it fits into the three options, and decide for yourself which is best. If you can't decide, update your question with details about how you plan to use the data, and I or someone else should be able to help you further.