Database Design – Why Using MySQL for a Dictionary Website Is a Bad Idea

data structuresdatabase-designmongodbMySQLrelational-database

I'm planning to design and set up a database to store dictionary entries (usually single words) and their meaning in another language. So, for example, the table Glossary must have entry and definition and each table record has a reference to the id of a record stored in Tag (Each entry must have a tag or category).

Since my data has a structure, I thought using a SQL database (like MySQL) is not a bad idea; but people say MongoDB is much better for performance.

At the client side, the application must be able to provide a search box with autocomplete which consumes a REST API provided by the backend. Is it safe to go with MySQL in such a scenario? or should I use MongoDB or ElasticSearch of any other solution for this? Hundred thousands of records are supposed to be stored and accessed in this way.

Best Answer

I can't tell you why it's a bad idea. I can tell you a bunch of reasons why a relational database is a good idea though.

  1. Remember that not everyone consults a dictionary for a definition. More times than not, a dictionary is used to find the correct spelling. This means you're not just finding a needle in a haystack, you are searching the haystack for needles that are similar to the one described by the user (if I may use an idiom).

    You won't just be doing primary key look-ups. You'll be doing keyword searches

  2. Words can be related, either in meaning or spelling (read, read, red and reed)

    Whenever you see the word "related" think "Relational Database"

  3. If you need speed, you need caching on top of your relational database, not a broken relational data model

  4. A properly normalized database speeds up primary key look-ups and searches since there is simply fewer bits to sift through.

  5. The people who say normalized databases are slower are referring to the 0.1% of cases where this is true. In the other 99.9% of cases they haven't actually worked with a truly normalized database to see the performance first hand, so ignore them. I have worked with a normalized database. Love it. Don't want to go back. And I'm not a database guy. I'm a C#/JavaScript/HTML/Ruby guy.

  6. Words have an origin. In fact, many words in the same language can have the same origin, which is another word in a different language. For instance, résumé (the thing we upload to recruiters websites so we can get incessant phone calls and e-mails for the next 7 years) is a French word.

  7. A dictionary also defines what kind of word it is (noun, verb, adjective ect). This isn't just a piece of text: "noun" it has meaning as well. Plus with a relational database you can say things like "give me all the nouns for the English language" and since a normalized database will be utilizing foreign keys, and foreign keys have (or should have) indexes, the lookup will be a snap.

  8. Think of how words are pronounced. In English especially, lots of words have the same pronunciation (see my example above with read and reed, or read and red).

    The pronunciation of a word is, itself, another word. A relational database would allow you to use foreign keys to any pronunciations. That information won't be duplicated in a relational database. It gets duplicated like crazy in a no-SQL database.

  9. And now let's talk about plural and singular versions of words. :) Think "boat" and "boats". Or the very fact that a word is "singular" or "plural".

  10. Oh! And now let's talk about past tense, present tense, future tense and present participle (to be honest, I don't know what the crap "present participle" is. I think it has something to do with words ending in "ing" in English or something).

    Look up "run" and you should see the other tenses: ran, runs, running

    In fact, "tense" is another relationship itself.

  11. English doesn't do this so much, but gender is another thing that defines a word. Languages like Spanish have suffixes the define whether the subject of the noun is male or female. If you need to fill in the blanks for a sentence, gender is extremely important in many languages.

    Since you can't always rely on language conventions to determine gender (in Spanish, words ending in "o" are masculine/male, but that's not true for all words), you need an identifying value: Male or Female. This is another relationship that a normalized database handles gracefully even at millions of records.

With all the twisted rules and relationships between words, and even different languages, it's hard for me to imagine this data store as a "document store" like a no-SQL solution provides. There are so many and such a large variety of relationships between words and their components that a relational database is the only sensible solution.

Related Topic