There is this data structure that trades performance of array access against the need to iterate over it when clearing it. You keep a generation counter with each entry, and also a global generation counter. The "clear" operation increases the generation counter. On each access, you compare local vs. global generation counters; if they differ, the value is treated as "clean".
This has come up in this answer on Stack Overflow recently, but I don't remember if this trick has an official name. Does it?
One use case is Dijkstra's algorithm if only a tiny subset of the nodes has to be relaxed, and if this has to be done repeatedly.
Best Answer
The aforementioned approach requires that each cell be able to hold a number large enough to hold the number of times the array may need to be reinitialized, which is a substantial space penalty. If a slot is capable of holding at least one value which will never be legitimately written, one may avoid having any other (non-constant) space penalty at the expense of adding an
O(Wlg(N))
time penalty, whereW
is the number of distinct array slots written between clearing operations andN
is the size of the array. For example, suppose one will be storing integers from -2,147,483,647 to 2,147,483,647 (but never -2,147,483,648) and one wants blank array items to read as zero. Start by filling the array with -2,147,483,648 (call that valueB
). When reading an array slot for the application, report a value ofB
as zero. Before writing array slotI
, check whether it heldB
and if so andI
is greater than one, store a zero to slotI/4
after performing a similar check for that location (and, if it heldB
,I/16
, etc).To clear the array, start with
I
equal to 0 or 1, depending upon the array base (the algorithm as described will work for either). Then repeat the following procedure: If itemI
isB
, incrementI
and, if doing so yields a multiple of four, divide by four (terminate if the divide yields a value of 1); if itemI
is notB
, storeB
there and multiplyI
by four (ifI
starts at zero, multiplying by four will leave it zero, but since item 0 will be blank,I
will get incremented).Note that one could replace the constant "four" above with other numbers, with larger values generally requiring less work tagging, but smaller values generally requiring less work clearing; since array slots that are tagged have to be cleared, a value of three or four is almost certainly optimal; since the value four is certainly close to optimal, is better than two or eight, and is more convenient than any other number, it would seem the most reasonable choice.