MySQL: multiple tables or one table with many columns

database-designMySQL

So this is more of a design question.

I have one primary key (say the user's ID), and I have tons of information associated with that user.

Should I have multiple tables broken down into categories according to the information, or should I have just one table with many columns?

The way I used to do it was to have multiple tables, so say, one table for application usage data, one table for profile info, one table for back end tokens etc. to keep things looking organized.

Recently some one told me that it's better not to do it that way and having a table with lots of columns is fine. The thing is, all those columns have the same primary key.

I'm pretty new to database design so which approach is better and what are the pros and cons?

What's the conventional way of doing it?

Best Answer

Any time information is one-to-one (each user has one name and password), then it's probably better to have it one table, since it reduces the number of joins the database will need to do to retrieve results. I think some databases have a limit on the number of columns per table, but I wouldn't worry about it in normal cases, and you can always split it later if you need to.

If the data is one-to-many (each user has thousands of rows of usage info), then it should be split into separate tables to reduce duplicate data (duplicate data wastes storage space, cache space, and makes the database harder to maintain).

You might find the Wikipedia article on database normalization interesting, since it discusses the reasons for this in depth:

Database normalization is the process of organizing the fields and tables of a relational database to minimize redundancy and dependency. Normalization usually involves dividing large tables into smaller (and less redundant) tables and defining relationships between them. The objective is to isolate data so that additions, deletions, and modifications of a field can be made in just one table and then propagated through the rest of the database via the defined relationships.

Denormalization is also something to be aware of, because there are cases where repeating data is better (since it reduces the amount of work the database needs to do when reading data). I'd highly recommend making your data as normalized as possible to start out, and only denormalize if you're aware of performance problems in specific queries.