Google Drive
If you're trying to make your own version of Google Docs, I suggest you take a look at the Google Realtime API. Google recently released it with the intent of letting other developers use the same tools they use for realtime collaboration. This would save you development time and get you a working product sooner.
You could easily take the data that is in the document and push it into your database at regular intervals, or have the database itself be a 'participant' in the exchange, simply listening to and logging all the changes. The API also lets you define your own data structures, which are then usable in the Realtime API, so you are free to extend it as you see fit.
Non-Google Drive
So according to your research, Google Drive isn't an option. That's fine, but it's going to be tougher and possibly not work as well, depending on how much you put into it.
Here's a general strategy I would use to accomplish this problem:
- Have the server be the communication multiplexer. Each person talks to the server, and the server sends that information out to everyone else. This way the server always has the most up-to-date view of the document.
- Find a third-party algorithm/module for conflict resolution. Conflict resolution is tough, and is something that still isn't perfect. Doing it yourself could easily increase the scope of the project to be far too large. If you can't use a third-party algorithm, I would suggest that you only allow one user to edit an area at a time: a user must obtain a lock before editing an area, or you risk destroying another user's work, which will get very old, very fast.
- When a new user joins, give them the most recent document and automatically start streaming the commands to them. The server has the most recent view and thus can dish it out automatically.
- Back up to the database at certain intervals. Decide how often you want to back up (every 5 minutes, or maybe every 50 changes). This lets you maintain the backup you desire.
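To make the strategy concrete, here is a minimal sketch of the server-as-multiplexer idea, including per-area locks and a backup every N changes. The names (`DocumentServer`, `apply_change`, and so on) are illustrative, not a real API, and the "database" is just an in-memory list standing in for real storage.

```python
class DocumentServer:
    """Sketch: the server holds the authoritative document and relays changes."""

    def __init__(self, backup_every=50):
        self.doc = []                  # authoritative document state
        self.clients = []              # callbacks used to push changes to users
        self.locks = {}                # area -> user holding the edit lock
        self.backup_every = backup_every
        self.changes_since_backup = 0
        self.backups = []              # stand-in for a real database

    def join(self, client_callback):
        # A new user registers a callback and gets the current document;
        # from now on they receive the change stream automatically.
        self.clients.append(client_callback)
        return list(self.doc)

    def acquire_lock(self, area, user):
        # Only one user may edit an area at a time.
        owner = self.locks.get(area)
        if owner is None or owner == user:
            self.locks[area] = user
            return True
        return False

    def release_lock(self, area, user):
        if self.locks.get(area) == user:
            del self.locks[area]

    def apply_change(self, sender, change):
        # Apply the change, then relay it to everyone except the sender.
        self.doc.append(change)
        for client in self.clients:
            if client is not sender:
                client(change)
        self.changes_since_backup += 1
        if self.changes_since_backup >= self.backup_every:
            self.backups.append(list(self.doc))   # "write to the database"
            self.changes_since_backup = 0
```

In use, each connected user would talk to the server over a socket; here a plain callback stands in for that connection.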
Problems: This isn't a perfect solution, so here are some issues you might face.
- Throughput of the server could bottleneck performance.
- Too many people reading/writing could overload the server.
- People may fall out of sync if a message is lost, so you may want to synchronize at regular points. This means sending out the whole document again, which can be costly, but otherwise people might not have the same document and not know it.
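One common way to detect the lost-message problem is to number every change, so a client can notice a gap and request a full resync. A minimal sketch, with an illustrative `Client` class (not a real API):

```python
class Client:
    """Sketch: detect dropped changes via sequence numbers."""

    def __init__(self):
        self.doc = []
        self.last_seq = 0
        self.needs_resync = False

    def receive(self, seq, change):
        # Each broadcast change carries a sequence number; a gap means
        # something was lost and local state can no longer be trusted.
        if seq != self.last_seq + 1:
            self.needs_resync = True
        else:
            self.doc.append(change)
            self.last_seq = seq

    def resync(self, full_doc, seq):
        # The server periodically (or on request) sends the whole document.
        self.doc = list(full_doc)
        self.last_seq = seq
        self.needs_resync = False
```

This keeps the expensive full-document transfer to the cases where it is actually needed, rather than resending everything on a timer.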
I'm going to use Python-like pseudocode based around your test cases to (hopefully) help explain this answer a little, rather than just expositing. But in short: This answer is basically just an example of small, single-function-level dependency injection / dependency inversion, instead of applying the principle to an entire class/object.
Right now, it sounds like your tests are structured as such:
    def test_3_minute_confirm():
        req = make_request()
        sleep(3 * 60)  # Baaaad
        assertTrue(confirm_request(req))
With implementations something along the lines of:
    def make_request():
        return { 'request_time': datetime.now(), }

    def confirm_request(req):
        return req['request_time'] >= (datetime.now() - timedelta(minutes=5))
Instead, try forgetting about "now" and tell both functions what time to use. If the time is going to be "now" most of the time, you can do one of two main things to keep the calling code simple:
- Make time an optional argument that defaults to "now". In Python (and PHP, now that I reread the question) this is simple: default "time" to None (null in PHP) and set it to "now" if no value was given.
- In other languages without default arguments (or in PHP if it becomes unwieldy), it might be easier to pull the guts of the functions out into a helper that does get tested.
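The first option (a time argument defaulting to "now") might look like this in Python; None stands in for "use the current time":

```python
from datetime import datetime, timedelta

def make_request(when=None):
    # Callers that don't care about time get "now" automatically.
    if when is None:
        when = datetime.now()
    return {'request_time': when}

def confirm_request(req, check_time=None):
    # Tests can pass a fixed check_time; production code omits it.
    if check_time is None:
        check_time = datetime.now()
    return req['request_time'] >= (check_time - timedelta(minutes=5))
```

Existing callers keep working unchanged, while tests pass explicit times and stay fast and deterministic.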
For example, the two functions above rewritten in the second version might look like this:
    def make_request():
        return make_request_at(datetime.now())

    def make_request_at(when):
        return { 'request_time': when, }

    def confirm_request(req):
        return confirm_request_at(req, datetime.now())

    def confirm_request_at(req, check_time):
        return req['request_time'] >= (check_time - timedelta(minutes=5))
This way, existing parts of the code don't need to be modified to pass in "now", while the parts you actually want to test can be tested much more easily:
    def test_3_minute_confirm():
        current_time = datetime.now()
        later_time = current_time + timedelta(minutes=3)
        req = make_request_at(current_time)
        assertTrue(confirm_request_at(req, later_time))
It also creates additional flexibility in the future, so less needs to change if the important parts need to be reused. Two examples that come to mind: a queue where requests might be created moments too late but should still carry the correct time, or confirmations that have to run against past requests as part of data sanity checks.
Technically, thinking ahead like this might violate YAGNI, but we're not actually building for those theoretical cases - this change is just for easier testing, and happens to support them.
Is there any reason you would recommend one of these solutions over another?
Personally, I try to avoid any sort of mocking as much as possible, and prefer pure functions as above. This way, the code being tested is identical to the code being run outside of tests, instead of having important parts replaced with mocks.
We've had one or two regressions that no one noticed for a good month because that particular part of the code was not often exercised in development (we weren't actively working on it), was released slightly broken (so there were no bug reports), and the mocks used were hiding a bug in the interaction between helper functions. A few slight modifications made the mocks unnecessary and let a whole lot more be easily tested, including the interaction behind that regression.
Change the system date and time for the test, using exec(). This can work out if you are not afraid to mess up with other things on the server.
This is the one that should be avoided at all costs for unit testing. There's really no way of knowing what kinds of inconsistencies or race conditions might be introduced to cause intermittent failures (for unrelated applications, or co-workers on the same development box), and a failed/errored test might not set the clock back correctly.
This is an old idea that has popped up an unbelievable number of times over the last several decades. LabVIEW (I think that's the name) is the best-known current version.
I worked with one such system, called RIPPEN, from Orincon (as I recall), around 1998-1999. (Orincon was bought by Lockheed-Martin in 2003. The product appears to be dead and gone now.) The biggest drawback I noticed was that about 20% of my CPU time went to moving data around between processing blocks. Considering that I was doing significant number crunching (coordinate transformations from az/el/range to X/Y/Z, then applying a coordinate-system rotation: BUNCHES of trig and matrix/vector multiplication), 20% is a LOT of CPU time.
Around this time, I went to a function at my old university, and mentioned the product to an old friend, who was a grad student when I was there and is now a full professor. His answer was that he'd done a similar thing quite a number of years earlier.