MongoDB – Is This a Good Schema Design?

I'm working on a Meteor app which lets users create events and assign crew members to them. I've played around with MongoDB before and learned a few things along the way; for instance, my first attempt went overboard with embedded objects and quickly became sluggish and hard to maintain. Several people therefore suggested trying a more traditional design approach, and that's what I've done this time.

I've split my collections up with manual refs in mind:

Events

_id : ObjectID,
propA: "Some prop",
propB: "Some other prop",
employees: [
    {employee_id: ObjectID, crew_id: ObjectID},
    {employee_id: ObjectID, crew_id: ObjectID},
]

Employees

_id : ObjectID,
propA: "Some prop",
crew_id: ObjectID

Crews

_id : ObjectID,
propA: "Some prop"

The reasoning is that the Events collection will become quite substantial during the application's life cycle, and thus I consider duplication to be a bad thing.
Another noteworthy feature is that an employee must belong to a certain crew, but on an actual event can work in any crew.

Initially I was quite happy with this design, but I have since found some problems, which have led me here.

1) When listing an event I have to manually join refs in order to show the actual properties of the employees and crews (such as name), it's a bit tedious but what worries me most is performance: Rather than running one query I have to run three (first get the event, then loop through the employee array and join each employee with the Employees collection, and finally join each crew with the Crews collection).

2) My biggest problem so far though is what I believe is known as deep querying. Before adding an employee to an event I have to know if he/she is already assigned, which requires querying the Events collection for an event with a given ID and also checking whether the given employee ID exists in its employees array. So far I've been unable to do this in a single query.

So my question boils down to this: is my design fine as it is, or should I change it given my use cases? Alternatively, if I keep my current design, how should I tackle problems 1 and 2? (That is, manual joins and refs are fine, but is performance a real possible showstopper down the line?)

Best Answer

Why are you using a non-relational database in this case?

You said:

When listing an event I have to manually join refs in order to show the actual properties of the employees and crews (such as name), it's a bit tedious but what worries me most is performance:

and

My biggest problem so far though is what I believe is known as deep querying

[emphasis added]

You already have some strong indicators that a non-relational database is not the right fit for your needs. Use a relational database instead. Changing the schema in this case won't compensate for issues within the toolset.

Try your design out with a relational database and see if the issues go away. That will provide the proof you need that you haven't picked the right tool to use in this case.

In a comment, you added that you're stuck with this particular environment and are seeking alternative schema designs.

From what I have seen, the best way to compensate in this case is to denormalize all of your tables (collections, in MongoDB terms). Instead of 3 tables, you'll have 1 table where you lay out the crew and the employees per event.
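As a rough illustration of what "1 table" could look like, here is a minimal sketch that folds the three collections from the question into one document per event. The field names `name`, `employee_name`, `crew_name` and `status`, and the helper `buildEventDoc`, are assumptions made for the example; they are not part of the original schema.

```javascript
// Sketch only: builds one denormalized event document from the three
// collections in the question. `name`, `employee_name`, `crew_name`
// and `status` are hypothetical fields chosen for illustration.
function buildEventDoc(event, employees, crews) {
  // Index employees and crews by id so each lookup is O(1).
  const employeeById = new Map(employees.map((e) => [String(e._id), e]));
  const crewById = new Map(crews.map((c) => [String(c._id), c]));

  return {
    _id: event._id,
    propA: event.propA,
    propB: event.propB,
    status: "active", // assumed flag, used later to detect double assignment
    employees: event.employees.map((ref) => {
      const employee = employeeById.get(String(ref.employee_id));
      const crew = crewById.get(String(ref.crew_id));
      return {
        employee_id: ref.employee_id,
        employee_name: employee ? employee.name : null, // copied, not referenced
        crew_id: ref.crew_id,
        crew_name: crew ? crew.name : null, // crew worked on this event
      };
    }),
  };
}
```

With a document shaped like this, listing an event needs a single read and no manual joins. The cost is that a renamed employee or crew has to be updated in every event document that embeds the old name.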

That does raise some challenges with making sure you don't double-assign employees, but having a status flag on the event should help control for that. Essentially, you'll query to see if employee ABC is already assigned to an event (or crew) that is still active.
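A minimal sketch of that check, assuming each event document carries a `status` field and an `employees` array of `{employee_id, crew_id}` entries. The helper names are hypothetical. MongoDB's dot notation matches against the fields of array elements, so the lookup fits in one selector; combining `$ne` with `$push` turns "check, then add" into a single conditional update.

```javascript
// Sketch only: helpers that build MongoDB selectors/modifiers.
// `status: "active"` is an assumed flag; adjust to the real schema.

// Matches any active event whose employees array already contains
// an entry for the given employee.
function activeAssignmentSelector(employeeId) {
  return { status: "active", "employees.employee_id": employeeId };
}

// Builds a conditional update: the selector only matches the event if
// no entry in its employees array has this employee_id, so the $push
// is skipped when the employee is already assigned.
function assignIfAbsent(eventId, employeeId, crewId) {
  return {
    selector: { _id: eventId, "employees.employee_id": { $ne: employeeId } },
    modifier: {
      $push: { employees: { employee_id: employeeId, crew_id: crewId } },
    },
  };
}

// Possible use in Meteor server code (Events is the app's collection):
//   const busy = Events.findOne(activeAssignmentSelector(employeeId));
//   const op = assignIfAbsent(eventId, employeeId, crewId);
//   const changed = Events.update(op.selector, op.modifier);
//   // changed === 0 would mean the employee was already on the event
```

Because the condition and the write travel in one update, two users assigning the same employee at the same moment should not both succeed, which a separate "find, then update" pair cannot guarantee.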
