Initialize array in amortized constant time — what is this trick called

algorithmsterminology

There is this data structure that trades performance of array access against the need to iterate over it when clearing it. You keep a generation counter with each entry, and also a global generation counter. The "clear" operation increases the generation counter. On each access, you compare local vs. global generation counters; if they differ, the value is treated as "clean".

This has come up in this answer on Stack Overflow recently, but I don't remember if this trick has an official name. Does it?

One use case is Dijkstra's algorithm if only a tiny subset of the nodes has to be relaxed, and if this has to be done repeatedly.

Best Answer

The aforementioned approach requires that each cell be able to hold a number large enough to hold the number of times the array may need to be reinitialized, which is a substantial space penalty. If a slot is capable of holding at least one value which will never be legitimately written, one may avoid having any other (non-constant) space penalty at the expense of adding an O(Wlg(N)) time penalty, where W is the number of distinct array slots written between clearing operations and N is the size of the array. For example, suppose one will be storing integers from -2,147,483,647 to 2,147,483,647 (but never -2,147,483,648) and one wants blank array items to read as zero. Start by filling the array with -2,147,483,648 (call that value B). When reading an array slot for the application, report a value of B as zero. Before writing array slot I, check whether it held B and if so and I is greater than one, store a zero to slot I/4 after performing a similar check for that location (and, if it held B, I/16, etc).

To clear the array, start with I equal to 0 or 1, depending upon the array base (the algorithm as described will work for either). Then repeat the following procedure: If item I is B, increment I and, if doing so yields a multiple of four, divide by four (terminate if the divide yields a value of 1); if item I is not B, store B there and multiply I by four (if I starts at zero, multiplying by four will leave it zero, but since item 0 will be blank, I will get incremented).

Note that one could replace the constant "four" above with other numbers, with larger values generally requiring less work tagging, but smaller values generally requiring less work clearing; since array slots that are tagged have to be cleared, a value of three or four is almost certainly optimal; since the value four is certainly close to optimal, is better than two or eight, and is more convenient than any other number, it would seem the most reasonable choice.

Related Topic