MongoDB Schema – Recommended Schema for a Quiz-Engine Scenario

Tags: document-databases, mongodb, nosql

I'm working on a quiz engine for learning a foreign language. The engine shows users four images simultaneously and then plays an audio file. The user has to match the audio to the correct image. Below is my MongoDB document structure. Each document consists of an image file reference and an array of references to audio files that match that image. To generate a quiz instance I select four documents at random, show the images and then play one audio file from the four documents at random.
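The quiz-generation step described above can be sketched in plain JavaScript. This is only an illustration under assumptions: the `images` array is hypothetical in-memory data standing in for the image collection, and `sample` and `buildQuiz` are made-up helper names. It picks one of the four shown images first and then one of its audio files, which is one possible reading of "at random".

```javascript
// Hypothetical in-memory stand-ins for documents in the image collection.
const images = [
  { _id: "img1", audioFiles: [{ audioFileName: "a1" }, { audioFileName: "a2" }] },
  { _id: "img2", audioFiles: [{ audioFileName: "b1" }] },
  { _id: "img3", audioFiles: [{ audioFileName: "c1" }] },
  { _id: "img4", audioFiles: [{ audioFileName: "d1" }, { audioFileName: "d2" }] },
  { _id: "img5", audioFiles: [{ audioFileName: "e1" }] },
];

// Return `count` distinct items chosen at random
// (partial Fisher-Yates shuffle on a copy of the input).
function sample(items, count) {
  const pool = items.slice();
  const n = Math.min(count, pool.length);
  for (let i = 0; i < n; i++) {
    const j = i + Math.floor(Math.random() * (pool.length - i));
    const tmp = pool[i];
    pool[i] = pool[j];
    pool[j] = tmp;
  }
  return pool.slice(0, n);
}

// Build one quiz instance: four images plus one audio file
// taken from one of those four images.
function buildQuiz(allImages) {
  const shown = sample(allImages, 4);
  const correct = shown[Math.floor(Math.random() * shown.length)];
  const audio =
    correct.audioFiles[Math.floor(Math.random() * correct.audioFiles.length)];
  return {
    shownImageIds: shown.map(function (doc) { return doc._id; }),
    correctImageId: correct._id,
    audioFileName: audio.audioFileName,
  };
}

const quiz = buildQuiz(images);
console.log(quiz);
```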

The next step in my application development is to decide on the best document schema for storing user guesses. There are several requirements to consider:

  1. I need to be able to report statistics at the user level (for example, total correct answers, total guesses, mean accuracy, etc.).
  2. I need to be able to query images based on the user's learning progress. For example, select 4 documents where guess count is >10 and accuracy is <=0.50.
  3. The schema needs to be optimized for fast quiz generation.
  4. The schema must not cause future scaling issues with respect to document size. Assume 1 million users who each make an average of 1,000 guesses.
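To make requirement 2 concrete, here is a minimal sketch of the selection rule in plain JavaScript. The `imageStats` array and its field names (`imageId`, `guesses`, `correct`) are assumptions made for illustration; they describe hypothetical per-user, per-image counters, not an existing part of the schema.

```javascript
// Hypothetical per-user, per-image counters (assumed shape, sample data).
const imageStats = [
  { imageId: "img1", guesses: 12, correct: 4 },
  { imageId: "img2", guesses: 20, correct: 15 },
  { imageId: "img3", guesses: 8, correct: 1 },
  { imageId: "img4", guesses: 11, correct: 5 },
  { imageId: "img5", guesses: 30, correct: 15 },
  { imageId: "img6", guesses: 14, correct: 7 },
  { imageId: "img7", guesses: 40, correct: 10 },
];

// True when the image has more than `minGuesses` guesses and an
// accuracy (correct / guesses) at or below `maxAccuracy`.
function needsPractice(stat, minGuesses, maxAccuracy) {
  return stat.guesses > minGuesses &&
    stat.correct / stat.guesses <= maxAccuracy;
}

// Take the first four images matching "guess count > 10 and accuracy <= 0.50".
const candidates = imageStats
  .filter(function (stat) { return needsPractice(stat, 10, 0.5); })
  .slice(0, 4)
  .map(function (stat) { return stat.imageId; });

console.log(candidates); // [ 'img1', 'img4', 'img5', 'img6' ]
```

At the scale in requirement 4, 1 million users times 1,000 guesses each is on the order of 10^9 raw guesses, which is the figure any candidate schema has to accommodate.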

Given all of this as background information, what would be the recommended schema? For example, would you store each guess in the Image document or perhaps in a User document (not shown) or a new document collection created for logging guesses? Would you recommend logging the raw guess data or would you pre-compute statistics by incrementing counters within the relevant document?


Schema for Image Collection:

{
    _id: "505bcc7a45c978be24000005",
    date: ISODate("2012-09-21T02:10:02Z"),
    imageFileName: "BD3E134A-C7B3-4405-9004-ED573DF477FE-29879-0000395CF1091601",
    random: 0.26997075392864645,
    user: "2A8761E4-C13A-470E-A759-91432D61B6AF-25982-0000352D853511AF",
    audioFiles: [
        {
            audioFileName: "C3669719-9F0A-4EB5-A791-2C00486665ED-30305-000039A3FDA7DCD2",
            user: "2A8761E4-C13A-470E-A759-91432D61B6AF-25982-0000352D853511AF",
            audioLanguage: "English",
            date: ISODate("2012-09-22T01:15:04Z")
        },
        {
            audioFileName: "C3669719-9F0A-4EB5-A791-2C00486665ED-30305-000039A3FDA7DCD2",
            user: "2A8761E4-C13A-470E-A759-91432D61B6AF-25982-0000352D853511AF",
            audioLanguage: "Spanish",
            date: ISODate("2012-09-22T01:17:04Z")
        }
    ]
}

Best Answer

I would store each guess as a separate document in a user guesses collection. The structure of each document would be as follows:

Guess
- userId
- selectedImageId
- correctImageId

You may also store more information about the corresponding quiz, such as the images shown and the audio played. To generate statistics, you will need to run Map/Reduce over this collection. For example, to get each user's total and correct guesses, your Map/Reduce output document structure would be:

UserStats
- userId
- guesses
- correct

The map function may look like this:

function() {
    emit(this.userId, { guesses: 1, correct: this.selectedImageId === this.correctImageId ? 1 : 0 });
}

And the reduce function:

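function(key, values) {
  // Return the same shape that map emits, because MongoDB may feed
  // reduce output back into reduce. The key (userId) is stored as the
  // _id of the output document, so it is not repeated here.
  var result = { guesses: 0, correct: 0 };
  values.forEach(function(value) {
    result.guesses += value.guesses;
    result.correct += value.correct;
  });
  return result;
}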
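To see what these two functions compute without a running database, the following is a small in-memory simulation in plain JavaScript. It is a sketch, not MongoDB's implementation: `mapGuess`, `reduceStats` and `runMapReduce` are hypothetical names, the guess documents are made-up sample data, and the map step returns its key/value pair instead of calling `emit`. The reduce step returns only the counters.

```javascript
// Hypothetical sample guess documents.
const guesses = [
  { userId: "u1", selectedImageId: "a", correctImageId: "a" },
  { userId: "u1", selectedImageId: "b", correctImageId: "c" },
  { userId: "u2", selectedImageId: "d", correctImageId: "d" },
  { userId: "u1", selectedImageId: "e", correctImageId: "e" },
];

// Same logic as the map function, returning the pair instead of emitting it.
function mapGuess(guess) {
  return {
    key: guess.userId,
    value: {
      guesses: 1,
      correct: guess.selectedImageId === guess.correctImageId ? 1 : 0,
    },
  };
}

// Sum the counters for one user.
function reduceStats(key, values) {
  const result = { guesses: 0, correct: 0 };
  values.forEach(function (value) {
    result.guesses += value.guesses;
    result.correct += value.correct;
  });
  return result;
}

// Group mapped values by key, then reduce each group.
function runMapReduce(docs) {
  const groups = new Map();
  docs.forEach(function (doc) {
    const pair = mapGuess(doc);
    if (!groups.has(pair.key)) groups.set(pair.key, []);
    groups.get(pair.key).push(pair.value);
  });
  const out = {};
  groups.forEach(function (values, key) {
    out[key] = reduceStats(key, values);
  });
  return out;
}

const userStats = runMapReduce(guesses);
console.log(userStats);
// u1: 3 guesses, 2 correct; u2: 1 guess, 1 correct
```

Mean accuracy for a user then follows as `correct / guesses`, for example 2 / 3 for `u1` in this sample.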

Note that MongoDB does not re-run Map/Reduce automatically when the source collection is updated, so you will need to schedule or trigger it yourself.
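One possible way to keep the statistics current between runs, and a different technique from Map/Reduce, is to increment pre-computed counters each time a guess is logged. The sketch below only builds the arguments for such an update; the `userStats` collection name, the field names, and the `buildStatsUpdate` helper are assumptions for illustration.

```javascript
// Build the filter, update and options for one logged guess.
// Names here are hypothetical; adapt them to your own collections.
function buildStatsUpdate(guess) {
  const isCorrect = guess.selectedImageId === guess.correctImageId;
  return {
    filter: { _id: guess.userId },
    // $inc adds to the counters, creating them if they do not exist yet.
    update: { $inc: { guesses: 1, correct: isCorrect ? 1 : 0 } },
    // upsert creates the stats document on the user's first guess.
    options: { upsert: true },
  };
}

const op = buildStatsUpdate({
  userId: "u1",
  selectedImageId: "a",
  correctImageId: "b",
});
console.log(JSON.stringify(op));

// With a live connection, this could be applied in the shell as:
// db.userStats.updateOne(op.filter, op.update, op.options)
```

The trade-off is that counters answer only the questions they were designed for, while the raw guess documents can still be re-aggregated later for new statistics.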
