In C#, when a user creates a List<byte> and adds bytes to it, there is a chance it runs out of space and needs to allocate more. It then allocates a new array of double (or some other multiple of) the previous size, copies the bytes over and drops the reference to the old array. I know the list grows exponentially because each allocation is expensive, and this limits it to O(log n) allocations, whereas just adding, say, 10 extra items each time would result in O(n) allocations.
However, for large array sizes there can be a lot of wasted space, maybe nearly half the array. To reduce memory I wrote a similar class, NonContiguousArrayList, which uses List<byte> as the backing store while the list holds less than 4MB, then allocates additional 4MB byte arrays as NonContiguousArrayList grows in size.
Unlike List<byte>, these arrays are non-contiguous, so no data is copied around when the list grows, just one additional 4MB allocation. When an item is looked up, the index is divided by 4M to get the index of the array containing the item, then taken modulo 4M to get the index within that array.
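Roughly, the lookup works like this (a simplified sketch of my class; ChunkSize, _small and _chunks are just the names I use here, and bounds checks are omitted):

    private const int ChunkSize = 4 * 1024 * 1024;      // 4MB per chunk
    private List<byte> _small = new List<byte>();        // backing store while total size < 4MB
    private List<byte[]> _chunks = new List<byte[]>();   // 4MB chunks once the list outgrows _small

    public byte this[long index]
    {
        get
        {
            if (_chunks.Count == 0)
                return _small[(int)index];

            int chunkIndex = (int)(index / ChunkSize);   // which 4MB array holds the item
            int offset     = (int)(index % ChunkSize);   // position within that array
            return _chunks[chunkIndex][offset];
        }
    }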
Can you point out problems with this approach? Here is my list:
- Non-contiguous arrays don't have cache locality, which can result in bad performance. However, at a 4MB block size it seems like there would be enough locality for good caching.
- Accessing an item is not quite as simple; there's an extra level of indirection. Would this get optimized away? Would it cause cache problems?
- Since growth is linear after the 4MB limit is hit, there can be many more allocations than there would ordinarily be (say, a maximum of 250 extra allocations for 1GB of memory). No memory is copied after 4MB, but I'm not sure whether the extra allocations are more expensive than copying large chunks of memory.
Best Answer
At the scales you mention, the concerns are quite different from the ones you have listed.

Cache locality and data element access pattern

Although YourList[k] and YourList[k+1] have a high probability of being consecutive in memory (only a one-in-four-million chance of straddling a chunk boundary), that fact will not help performance if you access your list completely randomly, or in large unpredictable strides, e.g. while (true) { index += random.Next(1024); DoStuff(YourList[index]); }
Interaction with the GC system
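Every 4MB chunk is far above the 85,000-byte large-object threshold, so each chunk is allocated on the Large Object Heap, which is collected only together with generation 2 and is not compacted by default. A small sketch that makes this visible (illustrative only):

    using System;
    using System.Runtime;

    class LohDemo
    {
        static void Main()
        {
            byte[] chunk = new byte[4 * 1024 * 1024];    // one 4MB chunk

            // Objects of 85,000 bytes or more go on the Large Object Heap,
            // which the runtime treats as part of generation 2.
            Console.WriteLine(GC.GetGeneration(chunk));  // prints 2

            // The LOH is not compacted unless you opt in for the next collection.
            GCSettings.LargeObjectHeapCompactionMode =
                GCLargeObjectHeapCompactionMode.CompactOnce;
            GC.Collect();
        }
    }

So the chunks' lifetime and fragmentation behaviour follow LOH rules rather than those of the ordinary small-object heap.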
Overhead of address offset calculations
To illustrate why:
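Roughly, a single indexed read in the chunked design breaks down like this (a sketch; ChunkSize and the chunk table are illustrative names, and a flat array read is shown for contrast):

    // Chunked lookup: three address-calculation steps before the data is touched.
    static byte ChunkedRead(byte[][] chunks, long index)
    {
        const int ChunkSize = 4 * 1024 * 1024;
        int chunkIndex = (int)(index / ChunkSize);   // step 1: integer divide
        int offset     = (int)(index % ChunkSize);   // step 2: modulo
        byte[] chunk   = chunks[chunkIndex];         // step 3: extra dereference of the chunk table
        return chunk[offset];                        // step 4: the actual memory load
    }

    // Flat array lookup: only the memory load.
    static byte FlatRead(byte[] array, long index)
    {
        return array[index];
    }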
The last step still takes the lion's share of time.
Personal suggestion

I suggest providing a CopyRange function, which would behave like the Array.Copy function but would operate between two instances of your NonContiguousByteArray, or between one instance and an ordinary byte[]. These functions can make use of SIMD code (C++ or C#) to maximize memory bandwidth utilization, and then your C# code can operate on the copied range without the overhead of repeated dereferencing or address calculation.
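A minimal managed-only sketch of such a CopyRange (no SIMD; it assumes the ChunkSize/_chunks layout from the question, and the names are illustrative):

    // Copies 'count' bytes starting at 'sourceIndex' in the chunked list into a
    // contiguous destination array, one whole chunk segment at a time.
    public void CopyRange(long sourceIndex, byte[] destination, int destinationIndex, int count)
    {
        const int ChunkSize = 4 * 1024 * 1024;
        while (count > 0)
        {
            int chunkIndex = (int)(sourceIndex / ChunkSize);
            int offset     = (int)(sourceIndex % ChunkSize);

            // Copy up to the end of the current chunk as one contiguous block.
            int bytesThisChunk = Math.Min(count, ChunkSize - offset);
            Buffer.BlockCopy(_chunks[chunkIndex], offset, destination, destinationIndex, bytesThisChunk);

            sourceIndex      += bytesThisChunk;
            destinationIndex += bytesThisChunk;
            count            -= bytesThisChunk;
        }
    }

Because each Buffer.BlockCopy moves a contiguous run, the per-byte address arithmetic is paid once per chunk segment instead of once per element.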
Usability and interoperability concerns

You will not be able to use NonContiguousByteArray with any C#, C++ or foreign-language libraries that expect contiguous byte arrays, or byte arrays that can be pinned. If you control the code on the receiving side, however, you can still hand over the individual chunks: for a range of bytes starting at (3 * 1024 * 1024) and ending at (5 * 1024 * 1024 - 1), the access will span across chunk[0] and chunk[1]. You can then construct an array (size 2) of byte arrays (each 4MB), pin these chunk addresses and pass them to the underlying code.
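One way the pinning part could look (a sketch; NativeProcess and its library are hypothetical stand-ins for whatever underlying code consumes the data):

    using System;
    using System.Runtime.InteropServices;

    // Hypothetical native entry point, declared only so the sketch is complete.
    [DllImport("nativelib")]
    static extern void NativeProcess(IntPtr first, int firstLength, IntPtr second, int secondLength);

    // Pin the two chunks that cover the requested range and pass their addresses on.
    static void PassRangeToNativeCode(byte[] chunk0, byte[] chunk1)
    {
        GCHandle h0 = GCHandle.Alloc(chunk0, GCHandleType.Pinned);
        GCHandle h1 = GCHandle.Alloc(chunk1, GCHandleType.Pinned);
        try
        {
            NativeProcess(h0.AddrOfPinnedObject(), chunk0.Length,
                          h1.AddrOfPinnedObject(), chunk1.Length);
        }
        finally
        {
            h1.Free();   // unpin as soon as the call returns
            h0.Free();
        }
    }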
Your class will not be able to support the IList<byte> interface efficiently: Insert and Remove will just take too long to process because they will require O(N) time. About the only thing it can support cheaply is IEnumerable<byte>, i.e. it can be scanned sequentially and that's it.
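Sequential scanning itself is easy to provide; a sketch of an iterator, again assuming the _chunks layout from the question plus a hypothetical _count field holding the total number of bytes stored (the small-list case is omitted):

    // Yields every byte in order, walking one 4MB chunk at a time.
    public IEnumerator<byte> GetEnumerator()
    {
        long remaining = _count;
        foreach (byte[] chunk in _chunks)
        {
            int inThisChunk = (int)Math.Min(remaining, (long)chunk.Length);
            for (int i = 0; i < inThisChunk; i++)
                yield return chunk[i];
            remaining -= inThisChunk;
        }
    }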