I'll expand my comment:
... if you're adding or removing elements, you want a list (or other flexible data structure). Arrays are only really good when you know exactly how many elements you need at the start.
A Quick Breakdown
Arrays are good when you have a fixed number of elements that is unlikely to change, and you wish to access it in a non-sequential fashion.
- Fixed Size
- Fast Access - O(1)
- Slow Resize - O(n) - needs to copy every element to a new array!
Linked-Lists are optimized for quick additions and removals at either end, but are slow to access in the middle.
- Variable Size
- Slow Access at middle - O(n)
- Needs to traverse each element starting from the head in order to reach the desired index
- Fast Access at Head - O(1)
- Potentially fast access at Tail
- O(1) if a reference is stored at the tail end (as with a doubly-linked list)
- O(n) if no reference is stored (same complexity as accessing a node in the middle)
Array Lists (such as List<T>
in C#!) are a mixture of the two, with fairly fast additions and random access. List<T>
will often be your go-to collection when you're not sure what to use.
- Uses an array as a backing structure
- Is smart about its resizing - allocates the double of its current space when it runs out of it.
- This leads to O(log n) resizes, which is better than resizing every time we add/remove
- Fast Access - O(1)
How Arrays Work
Most languages model arrays as contiguous data in memory, of which each element is the same size. Let's say we had an array of int
s (shown as [address: value], using decimal addresses because I'm lazy)
[0: 10][32: 20][64: 30][96: 40][128: 50][160: 60]
Each of these elements is a 32-bit integer, so we know how much space it takes up in memory (32 bits!). And we know the memory address of the pointer to the first element.
It's trivially easy to get to the value of any other element in that array:
- Take the address of the first element
- Take the offset of each element (its size in memory)
- Multiply the offset by the desired index
- Add your result to the address of the first element
Let's say our first element is at '0'. We know our second element is at '32' (0 + (32 * 1)), and our third element is at 64 (0 + (32 * 2)).
The fact that we can store all these values next to each other in memory means our array is as compact as it can possibly be. It also means that all our elements need to stay together for things to continue working!
As soon as we add or remove an element, we need to pick up everything else, and copy them over to some new place in memory, to make sure there are no gaps between elements, and everything has enough room. This can be very slow, especially if you're doing it every time you want to add a single element.
Linked Lists
Unlike arrays, Linked Lists don't need all their elements to be next to each other in memory. They are composed of nodes, that store the following info:
Node<T> {
Value : T // Value of this element in the list
Next : Node<T> // Reference to the node that comes next
}
The list itself keeps a reference to the head and tail (first and last nodes) in most cases, and sometimes keeps track of its size.
If you want to add an element to the end of the list, all you need to do is get the tail, and change its Next
to reference a new Node
containing your value. Removing from the end is equally simple - just dereference the Next
value of the preceding node.
Unfortunately, if you have a LinkedList<T>
with 1000 elements, and you want element 500, there's no easy way to jump right to the 500th element like there is with an array. You need to start at the head, and keep going to the Next
node, until you've done it 500 times.
This is why adding and removing from a LinkedList<T>
is fast (when working at the ends), but accessing the middle is slow.
Edit: Brian points out in the comments that Linked Lists have a risk of causing a page fault, due to not being stored in contiguous memory. This can be hard to benchmark, and can make Linked Lists even a bit slower than you might expect given just their time complexities.
Best of Both Worlds
List<T>
compromises for both T[]
and LinkedList<T>
and comes up with a solution that's reasonably fast and easy to use in most situations.
Internally, List<T>
is an array! It still has to jump through the hoops of copying its elements when resizing, but it pulls some neat tricks.
For starters, adding a single element doesn't usually cause the array to copy. List<T>
makes sure there's always enough room for more elements. When it runs out, instead of allocating a new internal array with just one new element, it will allocate a new array with several new elements (often twice as many as it currently holds!).
Copy operations are expensive, so List<T>
cuts down on them as much as possible, while still allowing fast random access. As a side effect, it may end up wasting slightly more space than a straight-up array or linked list, but it's usually worth the tradeoff.
TL;DR
Use List<T>
. It's normally what you want, and it seems to be correct for you in this situation (where you're calling .Add()). If you're unsure of what you need, List<T>
is a good place to start.
Arrays are good for high-performance, "I know I need exactly X elements" things. Alternatively, they're useful for quick, one-off "I need to group these X things I've already defined together so I can loop over them" structures.
There are a number of other collection classes. Stack<T>
is like a linked list that only operates from the top. Queue<T>
works as a first-in-first-out list. Dictionary<T, U>
is an unordered, associative mapping between keys and values. Play with them and get to know the strengths and weaknesses of each. They can make or break your algorithms.
Best Answer
Iterating through a List is (slightly) slower than a plain array due to a few factors:
Bounds checks: This is likely to be the biggest factor; every single indexer access to the List is going to do a bounds check. Bounds checking on a raw array can often be trivially optimized out of a loop by the JIT.
Method call costs: The indexer on a List is a method call. This might get inlined, but it also might not.
An extra indirection: In order to get to the array inside the List, you need to dereference the List reference. With an array, you have a reference directly to the necessary data.
Potentially an extra copy of the element: When you use the List indexer, you get a copy of the element back. With an array, you can access the element directly in memory. Depending on what the rest of your code looks like, you can avoid making a copy of that data.
When using the Enumerator to iterate, your costs are mostly dominated by the aforementioned bounds check as well as the "version" check to make sure you haven't modified the list while iterating over it.