MongoDB Schema – Recommended Schema for a Quiz-Engine Scenario

document-databases, mongodb, nosql

I'm working on a quiz engine for learning a foreign language. The engine shows users four images simultaneously and then plays an audio file. The user has to match the audio to the correct image. Below is my MongoDB document structure. Each document consists of an image file reference and an array of references to audio files that match that image. To generate a quiz instance I select four documents at random, show the images and then play one audio file from the four documents at random.

The next step in my application development is to decide on the best document schema for storing user guesses. There are several requirements to consider:

- I need to be able to report statistics at the user level (for example, total correct answers, total guesses, and mean accuracy).
- I need to be able to query images based on the user's learning progress. For example, select 4 documents where the guess count is > 10 and accuracy is <= 0.50.
- The schema needs to be optimized for fast quiz generation.
- The schema must not cause future scaling issues with respect to document size. Assume 1 million users who make an average of 1,000 guesses each.

Given all of this as background information, what would be the recommended schema? For example, would you store each guess in the Image document or perhaps in a User document (not shown) or a new document collection created for logging guesses? Would you recommend logging the raw guess data or would you pre-compute statistics by incrementing counters within the relevant document?

Schema for Image Collection:

{
    _id: "505bcc7a45c978be24000005",
    date: 2012-09-21 02:10:02 UTC,
    imageFileName: "BD3E134A-C7B3-4405-9004-ED573DF477FE-29879-0000395CF1091601",
    random: 0.26997075392864645,
    user: "2A8761E4-C13A-470E-A759-91432D61B6AF-25982-0000352D853511AF",
    audioFiles: [
        {
            audioFileName: "C3669719-9F0A-4EB5-A791-2C00486665ED-30305-000039A3FDA7DCD2",
            user: "2A8761E4-C13A-470E-A759-91432D61B6AF-25982-0000352D853511AF",
            audioLanguage: "English",
            date: 2012-09-22 01:15:04 UTC
        },
        {
            audioFileName: "C3669719-9F0A-4EB5-A791-2C00486665ED-30305-000039A3FDA7DCD2",
            user: "2A8761E4-C13A-470E-A759-91432D61B6AF-25982-0000352D853511AF",
            audioLanguage: "Spanish",
            date: 2012-09-22 01:17:04 UTC
        }
    ]
}

Best Answer

I would store each guess as a separate document in a user guesses collection. The structure of each document would be as follows:

Guess
- userId
- selectedImageId
- correctImageId

You may also store more information about the corresponding quiz, such as the images shown and the audio played. To generate statistics, you will need to run Map/Reduce over this collection. For example, to get the total and correct guesses per user, your Map/Reduce output document structure would be:

UserStats
- userId
- guesses
- correct

The map function may look like this:

function() {
    emit(this.userId, { guesses: 1, correct: this.selectedImageId === this.correctImageId ? 1 : 0 });
}

And the reduce function:

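function(key, values) {
  // Return the same shape that map emits. MongoDB stores the key
  // (the userId) as the _id of the output document.
  var result = { guesses: 0, correct: 0 };
  values.forEach(function(value) {
    result.guesses += value.guesses;
    result.correct += value.correct;
  });
  return result;
}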
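To check what these two functions produce without a MongoDB server, the pass can be simulated in plain JavaScript. This is only a sketch under assumptions: `runMapReduce` is a hypothetical stand-in for the engine, the sample guesses are made up, and the map step returns its key/value pair instead of calling `emit`.

```javascript
// Hypothetical sample guesses; field names follow the Guess structure above.
const guesses = [
  { userId: "u1", selectedImageId: "img1", correctImageId: "img1" },
  { userId: "u1", selectedImageId: "img2", correctImageId: "img3" },
  { userId: "u2", selectedImageId: "img4", correctImageId: "img4" },
];

// Map step: one { guesses, correct } value per guess, keyed by user.
function mapGuess(guess) {
  const correct = guess.selectedImageId === guess.correctImageId ? 1 : 0;
  return [guess.userId, { guesses: 1, correct: correct }];
}

// Reduce step: sum the values emitted for one user.
function reduceStats(key, values) {
  const result = { guesses: 0, correct: 0 };
  values.forEach(function (value) {
    result.guesses += value.guesses;
    result.correct += value.correct;
  });
  return result;
}

// Stand-in for the engine: group emitted values by key, then reduce each group.
function runMapReduce(docs) {
  const grouped = new Map();
  for (const doc of docs) {
    const [key, value] = mapGuess(doc);
    if (!grouped.has(key)) grouped.set(key, []);
    grouped.get(key).push(value);
  }
  const output = {};
  for (const [key, values] of grouped) {
    output[key] = reduceStats(key, values);
  }
  return output;
}

const userStats = runMapReduce(guesses);
console.log(userStats);
// { u1: { guesses: 2, correct: 1 }, u2: { guesses: 1, correct: 1 } }
```

Mean accuracy for a user is then `correct / guesses`, computed when the stats are read.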

Note that MongoDB does not re-run Map/Reduce automatically as the source collection is updated, so you will need to schedule or trigger it yourself.
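An alternative the question itself raises is to pre-compute statistics by incrementing counters as each guess is recorded, so nothing has to be re-run later. The sketch below models that idea in memory. The helper names `recordGuess` and `weakImages`, the per-(user, image) key, and the collection name in the comment are all assumptions for illustration, not part of the answer above.

```javascript
// In-memory stand-in for a stats collection keyed by (userId, imageId).
// In MongoDB, each recordGuess call would be roughly one upsert such as:
//   db.userImageStats.updateOne(
//     { userId: userId, imageId: imageId },
//     { $inc: { guesses: 1, correct: isCorrect ? 1 : 0 } },
//     { upsert: true })
// (the collection name userImageStats is hypothetical).
const stats = new Map();

function recordGuess(userId, correctImageId, selectedImageId) {
  const key = userId + "|" + correctImageId;
  const entry = stats.get(key) ||
    { userId: userId, imageId: correctImageId, guesses: 0, correct: 0 };
  entry.guesses += 1;
  if (selectedImageId === correctImageId) entry.correct += 1;
  stats.set(key, entry);
  return entry;
}

// Images a user has guessed more than minGuesses times
// with accuracy at or below maxAccuracy.
function weakImages(userId, minGuesses, maxAccuracy) {
  return [...stats.values()]
    .filter((e) => e.userId === userId)
    .filter((e) => e.guesses > minGuesses && e.correct / e.guesses <= maxAccuracy)
    .map((e) => e.imageId);
}

// Usage: 12 guesses on img1 (4 correct) and 12 on img2 (all correct).
for (let i = 0; i < 12; i++) {
  recordGuess("u1", "img1", i < 4 ? "img1" : "img9");
  recordGuess("u1", "img2", "img2");
}
const weak = weakImages("u1", 10, 0.5);
console.log(weak); // [ 'img1' ]
```

Keeping one small counter document per user and image keeps document size bounded regardless of how many guesses are made, at the cost of losing the raw guess history unless it is logged separately.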

Related Solutions

Database – Why is NoSQL better for this scenario

What advantages would I get from using NoSQL?

NoSQL will scale better as the number of users grows.

Traditional RDBMSs don't really scale out well. Mostly, all you can do is throw bigger machines at the problem. They aren't really suited for distributed systems (e.g., the cloud).

NoSQL is (under given circumstances) better at handling hierarchical structures like documents/JSON.

The key point to understand is that these storage mechanisms are key-value based and thus can retrieve data that is stored together very fast, as opposed to data that is "merely related" (what RDBMSs were built for).

In your case, that would mean that you can, for example, retrieve all records for a certain user very quickly. In a traditional relational database you would either have to denormalize your schema for performance or keep the schema clean but potentially suffer performance penalties caused by joins or heavy aggregations.

Look at it this way: why is a hash map (a key-value store) fast? You can retrieve items from a hash map in almost O(1), as the hash translates directly to a memory address (simplified). Looking up a binary index, in contrast, takes O(log(n)).
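A rough way to see the difference is to store the same records in a hash map and in a sorted array, then count how many elements a binary search has to inspect. This is only an illustration: the sorted array stands in for a tree-style index, and the record shape and the `binarySearch` helper are made up for the example.

```javascript
// Build n records, stored both in a hash map and in an array sorted by id.
const n = 1024;
const byId = new Map();
const sorted = [];
for (let id = 0; id < n; id++) {
  const record = { id: id, name: "user" + id };
  byId.set(id, record);
  sorted.push(record);
}

// Binary search over the sorted array, counting inspected elements.
function binarySearch(records, id) {
  let low = 0;
  let high = records.length - 1;
  let steps = 0;
  while (low <= high) {
    const mid = Math.floor((low + high) / 2);
    steps += 1;
    if (records[mid].id === id) return { record: records[mid], steps: steps };
    if (records[mid].id < id) low = mid + 1;
    else high = mid - 1;
  }
  return { record: null, steps: steps };
}

// Hash map: a single keyed lookup, with no search loop.
const viaHash = byId.get(1000);

// Sorted index: at most floor(log2(n)) + 1 inspections (11 for n = 1024).
const viaIndex = binarySearch(sorted, 1000);

console.log(viaHash.name, viaIndex.record.name, viaIndex.steps);
```

Both lookups return the same record; only the amount of work needed to find it differs, and the gap grows slowly with n.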

For your case, MongoDB or CouchDB might be good solutions, as they're already based on JSON.

In my opinion, using a NoSQL solution here is a good choice. You want to retrieve all the activities of a user as a feed. If they're properly written to your data storage, then NoSQL should, in theory, excel at this, without the need to join anything or worry about proper indexes. @Earlz also mentioned that you have no ACID guarantee for NoSQL databases. This makes NoSQL fast, and you probably don't need ACID properties for your application. Give it a try!

Moreover, there's a good article from Martin Fowler on the subject. He's made a nice diagram that I really like:

[Diagram from Martin Fowler's article; image not available]

Go check out his pages to read some deep thoughts about NoSQL.

A good pattern for multi language in MongoDB

I use the following pattern for text that should be indexed in all the languages:

{
"id":"sdsd",
"title":{"languages":{"en":0,"fr":1},"texts":["this is the title in English","Celui ci c'est le titre en francais"]}
}

var object = db.coleccion.findOne({ id: "xxxxx" });

// now, if we want the text in English

print(object.title.texts[object.title.languages["en"]]);

// the object.title.languages map gives the array index of each language,
// so the client can access a given text directly

// this layout also lets us index the translated texts in MongoDB
// (ensureIndex in older shells):

db.coleccion.createIndex({ "title.texts": 1 });

We can also wrap the code that obtains the text in a specific language in a class.
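A minimal sketch of such a wrapper might look like the following. The class name `LocalizedText`, its `get` method, and the fallback-language behaviour are assumptions added for illustration; only the document shape comes from the pattern above.

```javascript
// Wraps a field stored as { languages: { code: index }, texts: [...] }.
class LocalizedText {
  constructor(field) {
    this.languages = field.languages;
    this.texts = field.texts;
  }

  // Array index for a language code, or undefined if the code is unknown.
  indexFor(code) {
    return Object.prototype.hasOwnProperty.call(this.languages, code)
      ? this.languages[code]
      : undefined;
  }

  // Text for languageCode, falling back to fallbackCode when it is missing.
  get(languageCode, fallbackCode) {
    let index = this.indexFor(languageCode);
    if (index === undefined && fallbackCode !== undefined) {
      index = this.indexFor(fallbackCode);
    }
    return index === undefined ? undefined : this.texts[index];
  }
}

// Hypothetical document in the shape shown above.
const doc = {
  id: "sdsd",
  title: {
    languages: { en: 0, fr: 1 },
    texts: ["this is the title in English", "Celui ci c'est le titre en francais"],
  },
};

const title = new LocalizedText(doc.title);
console.log(title.get("fr"));       // the French text
console.log(title.get("de", "en")); // no German text, so the English one
```

Callers then never touch the index map directly, so the storage layout can change without affecting them.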
