PHP and MySQL – Identifying a Race Condition

MySQLPHP

I have a friend arguing this is a race condition, but I'm struggling to understand how.

This is for a web application using PHP and MySQL where users can login and create tickets, these tickets have tasks which users can add. Each task has a time spend variable.

When the user submits a task, it adds a row with the task data to a table. After the entry has been added, it calculates the total time spent for all tasks for this particular ticket, and updates a field in a different table with that value.

The argument my friend puts forth is that with many users, there will be race condition issues as users are simultaneously adding tasks for tickets.

But my defence is that as long as the update for total time spent is run after the insert query, it will still update the field correctly regardless of how many users are adding tasks at the same time.

Best Answer

If you don't use transactions, then yes, there is a minor race condition that can result in having the incorrect value for the total time spent for a brief while. Consider for example the following timing of events:

[task 1 added]
[total selected]
                [task 2 added]
[total updated]
                [total selected]
                [total updated]

Now the client 1 selects the total amount, but a new task is added after this and for a brief while, the total contains an incorrect value. However, soon after the client 2 will update the total to the correct value.

Can the tasks be deleted in addition to adding them? If so, the situation might become more complex and requires further analysis.

You should really use transactions in this use case. This ensures that the updating of the total and the addition of the task happen simultaneously when viewed from outside of the transaction. Using a transaction with a proper isolation level eliminates the minor race condition.

You should note, however, that while transactions help in this use case there are various transaction isolation levels and all potential race conditions in don't disappear unless your database truly supports the SERIALIZABLE isolation level and you are actually using it. Many databases have the default isolation level set to something else than SERIALIZABLE and do not actually support the true SERIALIZABLE isolation level while they silently accept it and offer something worse. To read about transaction isolation levels, see e.g. http://www.postgresql.org/docs/9.1/static/transaction-iso.html (this is for PostgreSQL, not MySQL)

For an example of the case when the race condition can occur despite using transactions, see this timing:

[transaction 1 begins]
                       [transaction 2 begins]
[task 1 added]
                       [task 2 added]
[total selected]
                       [total selected, doesn't yet see transaction 1]
[total updated]
                       [total updated, waits for lock of transaction 1]
[transaction 1 commits]
                       [transaction 2 commits]

Now transaction 2 updates the total but doesn't yet see the row added by transaction 1, and thus the total ends up containing only the value of task 2 but not the value of task 1. This happens if you're not using the SERIALIZABLE isolation level. If, on the other hand, you are using the SERIALIZABLE isolation level on recent PostgreSQL, one of the transactions would fail due to serialization failure and you would need to be prepared to retry.

To fix this without using SERIALIZABLE, you need to select a locking row for update (the SQL command SELECT ... FOR UPDATE):

[transaction 1 begins]
                       [transaction 2 begins]
[select for update]
                       [select for update, waits for transaction 1]
[task 1 added]
[total selected]
[total updated]
[transaction 1 commits]
                       [select for update wait done]
                       [task 2 added]
                       [total selected, doesn't yet see transaction 1]
                       [total updated, waits for lock of transaction 1]
                       [transaction 2 commits]

I'm not sure if MySQL supports the true SERIALIZABLE isolation level or if it's just snapshot isolation. There was at least some work put into it by Michael Cahill, but I'm not sure if that work got ever merged to the MySQL distribution.

Do note that for transactions to work properly, you need to use a transactional database engine. I.e. InnoDB instead of MyISAM.

Related Solutions

PHP MySQL – How to Approach Complex Queries

I come across this kind of requirement all the time. Let's say you have a date range you want to check (ie, find free units for) which is defined with a start and an end date. In the database, for the sake of making it clear what we're talking about, lets say existing reservations have a first and a last date (of the reservation).

So there's six possible cases for any given reservation (|xxxx|) when compared to the date range you're checking ([----]):

[--|xxxx|--] It can be entirely contained within the check range. This means the date range is not available for this unit.
|xx[xx|----] It can straddle the start of the check range. Again, this unit is not available.
[----|xx]xx| It can straddle the end of the check range. Not available.
|xx[----]xx| It can completely contain the check range. Again, not available.
|xxxx|[----] It can preceed the check range. This unit may be available!
[----]|xxxx| It can entirely follow the check range. This unit may be available!

You'll notice that in all the cases where the unit is not available, the first date is always before* the end date, AND the last date is always after* the start date. This is not true for the two cases where the ranges don't overlap. So! We have a distinguishing condition.

(* or equal to)

In this situation, units that are in use will have at least one existing reservation where first <= 'end' AND last >= 'start'. MySQL has a bunch of really nice grouping and aggregation functions we can use here to actually get MySQL to do all the heavy lifting for us. For instance, we could do something like this:

SELECT Units.*, SUM(IF(ReservationID IS NULL, 0, First <= 'end' AND Last >= 'start')) AS ConflictingReservations
FROM Units LEFT JOIN Reservations USING (UnitID)
GROUP BY UnitID
HAVING ConflictingReservations = 0

In the above query (which I admit I haven't actually tested cough, but the idea is sound, honest!) we left join the Units table to the Reservations table so that if a unit has never been reserved it'll still show in the results. From there, we group the rows by the UnitID and create a derived column called "ConflictingReservations" which contains the sum of a logical comparison. As mentioned previously, if that logical comparison is TRUE (which in MySQL has the value '1') it means the specific reservation we're checking overlaps the check range. Summing all the "hits" will give us the number of reservations for a particular unit that overlap with the check range.

Note: we guard the logical comparison against the case where a unit has NEVER been booked before and therefore the left join returns a row with a NULL values in the First and Last columns. If the ReservationID is NULL (and therefore the First and Last dates would be null, forcing the entire ConflictingReservations derived column to NULL rather than 0) we count that as a non-match/false, ie: 0.

The final step says "throw away all rows (ie, units) that have a nonzero number of conflicting reservations." This should leave you with a list of rows from the Units table that have no reservations within the check range (ie, that are available to book in that range.)

If I've managed to get edge cases wrong (as I did with the zero reservations case) or syntax wrong etc, let me know and I'll edit the answer!

Php – Algorithm for tracking progress of controller method running in background

Although it would be near impossible to track % complete accurately due to an undetermined number of links and keywords it is possible to show a rough status via depth. For example the first depth would be the url/s processed from the top level.

(100/Total Pages) * Pages Processed = % current status

Total Pages = Select count() from master_links
Pages Processed = Select count() from master_links where processed=true. When you have processed the page simply set the flag in the db.

(This could similarly be done by populating an array with your db values and using the index value as your pages processed)

Note: You can only get the status for each level. Do not start crawling your sub_links until all the master_links are crawled - this will also allow you to avoid duplicate url crawls and should have a minimal impact on the total time.

The squares in the diagram below represent the pages which need to be processed. Inside each box is the percentage complete if you were processing them left to right. This is for illustrative purposes the percentage would be based on this:

Interative image

Your output would show percentage complete of that level:
e.g. Master Links 40% complete
or
e.g. Master Links 100%
Sub Links 49.8%

This should still give you enough info to indicate the progress, after all you cannot guess the actual density of keywords and links...

Best Answer

Related Solutions

PHP MySQL – How to Approach Complex Queries

Php – Algorithm for tracking progress of controller method running in background

Related Topic