My favorite answer is as what the first sentence in this thread suggested. Use an Adjacency List to maintain the hierarchy and use Nested Sets to query the hierarchy.
The problem up until now has been that the coversion method from an Adjacecy List to Nested Sets has been frightfully slow because most people use the extreme RBAR method known as a "Push Stack" to do the conversion and has been considered to be way to expensive to reach the Nirvana of the simplicity of maintenance by the Adjacency List and the awesome performance of Nested Sets. As a result, most people end up having to settle for one or the other especially if there are more than, say, a lousy 100,000 nodes or so. Using the push stack method can take a whole day to do the conversion on what MLM'ers would consider to be a small million node hierarchy.
I thought I'd give Celko a bit of competition by coming up with a method to convert an Adjacency List to Nested sets at speeds that just seem impossible. Here's the performance of the push stack method on my i5 laptop.
Duration for 1,000 Nodes = 00:00:00:870
Duration for 10,000 Nodes = 00:01:01:783 (70 times slower instead of just 10)
Duration for 100,000 Nodes = 00:49:59:730 (3,446 times slower instead of just 100)
Duration for 1,000,000 Nodes = 'Didn't even try this'
And here's the duration for the new method (with the push stack method in parenthesis).
Duration for 1,000 Nodes = 00:00:00:053 (compared to 00:00:00:870)
Duration for 10,000 Nodes = 00:00:00:323 (compared to 00:01:01:783)
Duration for 100,000 Nodes = 00:00:03:867 (compared to 00:49:59:730)
Duration for 1,000,000 Nodes = 00:00:54:283 (compared to something like 2 days!!!)
Yes, that's correct. 1 million nodes converted in less than a minute and 100,000 nodes in under 4 seconds.
You can read about the new method and get a copy of the code at the following URL.
http://www.sqlservercentral.com/articles/Hierarchy/94040/
I also developed a "pre-aggregated" hierarchy using similar methods. MLM'ers and people making bills of materials will be particularly interested in this article.
http://www.sqlservercentral.com/articles/T-SQL/94570/
If you do stop by to take a look at either article, jump into the "Join the discussion" link and let me know what you think.
Looks like you need a database model similar to this:
This model has the following properties:
- Essentially, each recipe is a series of steps.
- Each step has its order relative to other steps of the same recipe (STEP_NO), a unit (mass, volume, count...), a quantity in that unit etc.
- A particular step is connected either to an ingredient (when INGREDIENT_ID is non-NULL) or to another recipe (when SUBRECIPE_ID is non-NULL).1
- Other than that, the STEP is a fairly standard junction table implementing many-to-many relationship, which means the same ingredient can be used in multiple recipes (or even multiple steps of the same recipe) and also a recipe can be a "sub-recipe" of multiple other recipes.
- This is essentially a directed graph. The data model itself will not prevent cycles - they should be avoided at the client code level and possibly detected by triggers.
1 If MySQL supported CHECK constraints (which it doesn't), you could ensure that one (but not both) of them is non-NULL like this:
CHECK (
(INGREDIENT_ID IS NULL AND SUBRECIPE_ID IS NOT NULL)
OR (INGREDIENT_ID IS NOT NULL AND SUBRECIPE_ID IS NULL)
)
As it stands, you'll need a trigger for that.
Best Answer
Here is a link to a pretty advanced one:
but if you really want to code it yourself I would go with a third relational table.
cheers, Mike