My favorite answer is as what the first sentence in this thread suggested. Use an Adjacency List to maintain the hierarchy and use Nested Sets to query the hierarchy.
The problem up until now has been that the coversion method from an Adjacecy List to Nested Sets has been frightfully slow because most people use the extreme RBAR method known as a "Push Stack" to do the conversion and has been considered to be way to expensive to reach the Nirvana of the simplicity of maintenance by the Adjacency List and the awesome performance of Nested Sets. As a result, most people end up having to settle for one or the other especially if there are more than, say, a lousy 100,000 nodes or so. Using the push stack method can take a whole day to do the conversion on what MLM'ers would consider to be a small million node hierarchy.
I thought I'd give Celko a bit of competition by coming up with a method to convert an Adjacency List to Nested sets at speeds that just seem impossible. Here's the performance of the push stack method on my i5 laptop.
Duration for 1,000 Nodes = 00:00:00:870
Duration for 10,000 Nodes = 00:01:01:783 (70 times slower instead of just 10)
Duration for 100,000 Nodes = 00:49:59:730 (3,446 times slower instead of just 100)
Duration for 1,000,000 Nodes = 'Didn't even try this'
And here's the duration for the new method (with the push stack method in parenthesis).
Duration for 1,000 Nodes = 00:00:00:053 (compared to 00:00:00:870)
Duration for 10,000 Nodes = 00:00:00:323 (compared to 00:01:01:783)
Duration for 100,000 Nodes = 00:00:03:867 (compared to 00:49:59:730)
Duration for 1,000,000 Nodes = 00:00:54:283 (compared to something like 2 days!!!)
Yes, that's correct. 1 million nodes converted in less than a minute and 100,000 nodes in under 4 seconds.
You can read about the new method and get a copy of the code at the following URL.
http://www.sqlservercentral.com/articles/Hierarchy/94040/
I also developed a "pre-aggregated" hierarchy using similar methods. MLM'ers and people making bills of materials will be particularly interested in this article.
http://www.sqlservercentral.com/articles/T-SQL/94570/
If you do stop by to take a look at either article, jump into the "Join the discussion" link and let me know what you think.
Storing "Simple" Repeating Patterns
For my PHP/MySQL based calendar, I wanted to store repeating/recurring event information as efficiently as possibly. I didn't want to have a large number of rows, and I wanted to easily lookup all events that would take place on a specific date.
The method below is great at storing repeating information that occurs at regular intervals, such as every day, every n days, every week, every month every year, etc etc. This includes every Tuesday and Thursday type patterns as well, because they are stored separately as every week starting on a Tuesday and every week starting on a Thursday.
Assuming I have two tables, one called events
like this:
ID NAME
1 Sample Event
2 Another Event
And a table called events_meta
like this:
ID event_id meta_key meta_value
1 1 repeat_start 1299132000
2 1 repeat_interval_1 432000
With repeat_start being a date with no time as a unix timestamp, and repeat_interval an amount in seconds between intervals (432000 is 5 days).
repeat_interval_1 goes with repeat_start of the ID 1. So if I have an event that repeats every Tuesday and every Thursday, the repeat_interval would be 604800 (7 days), and there would be 2 repeat_starts and 2 repeat_intervals. The table would look like this:
ID event_id meta_key meta_value
1 1 repeat_start 1298959200 -- This is for the Tuesday repeat
2 1 repeat_interval_1 604800
3 1 repeat_start 1299132000 -- This is for the Thursday repeat
4 1 repeat_interval_3 604800
5 2 repeat_start 1299132000
6 2 repeat_interval_5 1 -- Using 1 as a value gives us an event that only happens once
Then, if you have a calendar that loops through every day, grabbing the events for the day it's at, the query would look like this:
SELECT EV.*
FROM `events` EV
RIGHT JOIN `events_meta` EM1 ON EM1.`event_id` = EV.`id`
RIGHT JOIN `events_meta` EM2 ON EM2.`meta_key` = CONCAT( 'repeat_interval_', EM1.`id` )
WHERE EM1.meta_key = 'repeat_start'
AND (
( CASE ( 1299132000 - EM1.`meta_value` )
WHEN 0
THEN 1
ELSE ( 1299132000 - EM1.`meta_value` )
END
) / EM2.`meta_value`
) = 1
LIMIT 0 , 30
Replacing {current_timestamp}
with the unix timestamp for the current date (Minus the time, so the hour, minute and second values would be set to 0).
Hopefully this will help somebody else too!
Storing "Complex" Repeating Patterns
This method is better suited for storing complex patterns such as
Event A repeats every month on the 3rd of the month starting on March 3, 2011
or
Event A repeats Friday of the 2nd week of the month starting on March 11, 2011
I'd recommend combining this with the above system for the most flexibility. The tables for this should like like:
ID NAME
1 Sample Event
2 Another Event
And a table called events_meta
like this:
ID event_id meta_key meta_value
1 1 repeat_start 1299132000 -- March 3rd, 2011
2 1 repeat_year_1 *
3 1 repeat_month_1 *
4 1 repeat_week_im_1 2
5 1 repeat_weekday_1 6
repeat_week_im
represents the week of the current month, which could be between 1 and 5 potentially. repeat_weekday
in the day of the week, 1-7.
Now assuming you are looping through the days/weeks to create a month view in your calendar, you could compose a query like this:
SELECT EV . *
FROM `events` AS EV
JOIN `events_meta` EM1 ON EM1.event_id = EV.id
AND EM1.meta_key = 'repeat_start'
LEFT JOIN `events_meta` EM2 ON EM2.meta_key = CONCAT( 'repeat_year_', EM1.id )
LEFT JOIN `events_meta` EM3 ON EM3.meta_key = CONCAT( 'repeat_month_', EM1.id )
LEFT JOIN `events_meta` EM4 ON EM4.meta_key = CONCAT( 'repeat_week_im_', EM1.id )
LEFT JOIN `events_meta` EM5 ON EM5.meta_key = CONCAT( 'repeat_weekday_', EM1.id )
WHERE (
EM2.meta_value =2011
OR EM2.meta_value = '*'
)
AND (
EM3.meta_value =4
OR EM3.meta_value = '*'
)
AND (
EM4.meta_value =2
OR EM4.meta_value = '*'
)
AND (
EM5.meta_value =6
OR EM5.meta_value = '*'
)
AND EM1.meta_value >= {current_timestamp}
LIMIT 0 , 30
This combined with the above method could be combined to cover most repeating/recurring event patterns. If I've missed anything please leave a comment.
Best Answer
The proper answer is really both, and not either or.
Setting aside for a moment the issue of no end date for recurrence: what you want is a header that contains recurrence rules for the whole pattern. That way if you need to change the pattern, you've captured that pattern in a single record that can be edited without risking update anomalies.
Now, joining against some kind of recurrence pattern in SQL is going to be a great big pain in the neck. Furthermore, what if your rules allow you to tweak (edit, or even delete) specific instances of this recurrence pattern?
How do you handle this? You have to create an instance table with one row per recurring instance with a link (foreign key) back to the single rule that was used to create it. This let's you modify an individual child without losing sight of where it came from in case you need to edit (or delete) the entire pattern.
Consider a calendaring tool like Outlook or Google Calendar. These applications use this approach. You can move or edit an instance. You can also change the whole series. The apps ask you which you mean to do whenever you go into an editing mode.
There are some limitations to this. For example, if you edit an instance and then edit the pattern, you need to have a rule that says either (a) new parent wins or (b) modified children always win. I think Outlook and Google Calendar use approach (a).
As for why having each instance recorded explicitly, the only disastrous thing I can think of would be that if you didn't have the link back to the original recurrence pattern you would have a heck of a time cancelling the whole series in one action.
Back to no end date - This might be a case of discretion being the better part of valour and using some kind of rule of thumb that imposes a practical limit on how far into the future you extend such a series - or alternatively you could just not allow that kind of rule in a pattern. Force an end to the pattern and let the rule's creator worry about extending it at whatever future point it becomes necessary.