C# – Should I Use a List or an Array?

arrayclistwinforms

I'm working on a windows form to calculate UPC for item numbers.

I successfully create one that will handle one item number/UPC at a time, now I want to expand and do it for multiple item numbers/UPCs.

I have started and tried using a list, but I keep getting stuck. I created a helper class:

public class Codes
{
    private string incrementedNumber;
    private string checkDigit;
    private string wholeNumber;
    private string wholeCodeNumber;
    private string itemNumber;

    public Codes(string itemNumber, string incrementedNumber, string checkDigit, string wholeNumber, string wholeCodeNumber)
    {
        this.incrementedNumber = incrementedNumber;
        this.checkDigit = checkDigit;
        this.wholeNumber = wholeNumber;
        this.wholeCodeNumber = wholeCodeNumber;
        this.itemNumber = itemNumber;
    }

    public string ItemNumber
    {
        get { return itemNumber; }
        set { itemNumber = value; }
    }

    public string IncrementedNumber
    { 
        get { return incrementedNumber; }
        set { incrementedNumber = value; } 
    }

    public string CheckDigit
    {
        get { return checkDigit; }
        set { checkDigit = value; }
    }

    public string WholeNumber
    {
        get { return wholeNumber; }
        set { wholeNumber = value; }
    }

    public string WholeCodeNumber
    {
        get { return wholeCodeNumber; }
        set { wholeCodeNumber = value; }
    }

}

Then I got started on my code, but the issue is that the process is incremental, meaning I get the item number from a gridview via checkboxes and put them in the list. Then I get the last UPC from the database, strip the checkdigit, then increment the number by one and put it in the list. Then I calculate the checkdigit for the new number and put that in the list. And here I already get an Out of Memory Exception.
Here is the code I have so far:

List<Codes> ItemNumberList = new List<Codes>();


    private void buttonSearch2_Click(object sender, EventArgs e)
    {
        //Fill the datasets
        this.immasterTableAdapter.FillByWildcard(this.alereDataSet.immaster, (textBox5.Text));
        this.upccodeTableAdapter.FillByWildcard(this.hangtagDataSet.upccode, (textBox5.Text));
        this.uPCTableAdapter.Fill(this.uPCDataSet.UPC);
        string searchFor = textBox5.Text;
        int results = 0;
        DataRow[] returnedRows;
        returnedRows = uPCDataSet.Tables["UPC"].Select("ItemNumber = '" + searchFor + "2'");
        results = returnedRows.Length;
        if (results > 0)
        {
            MessageBox.Show("This item number already exists!");
            textBox5.Clear();
            //clearGrids();
        }
        else
        {
            //textBox4.Text = dataGridView1.Rows[0].Cells[1].Value.ToString();
            MessageBox.Show("Item number is unique.");
        }
    }

    public void checkMarks()
    {

        for (int i = 0; i < dataGridView7.Rows.Count; i++)
        {
            if ((bool)dataGridView7.Rows[i].Cells[3].FormattedValue)
            {
                {
                    ItemNumberList.Add(new Codes(dataGridView7.Rows[i].Cells[0].Value.ToString(), "", "", "", ""));
                }
            }
        }
    }

    public void multiValue1()
    {
        _value = uPCDataSet.UPC.Rows[uPCDataSet.UPC.Rows.Count - 1]["UPCNumber"].ToString();//get last UPC from database
        _UPCNumber = _value.Substring(0, 11);//strip out the check-digit
        _UPCNumberInc = Convert.ToInt64(_UPCNumber);//convert the value to a number

        for (int i = 0; i < ItemNumberList.Count; i++)
        {
            _UPCNumberInc = _UPCNumberInc + 1;
            _UPCNumberIncrement = Convert.ToString(_UPCNumberInc);//assign the incremented value to a new variable
            ItemNumberList.Add(new Codes("", _UPCNumberIncrement, "", "", ""));//**here I get the OutOfMemoreyException**
        }

        for (int i = 0; i < ItemNumberList.Count; i++)
        {
            long chkDigitOdd;
            long chkDigitEven;
            long chkDigitSubtotal;
            chkDigitOdd = Convert.ToInt64(_UPCNumberIncrement.Substring(0, 1)) + Convert.ToInt64(_UPCNumberIncrement.Substring(2, 1)) + Convert.ToInt64(_UPCNumberIncrement.Substring(4, 1)) + Convert.ToInt64(_UPCNumberIncrement.Substring(6, 1)) + Convert.ToInt64(_UPCNumberIncrement.Substring(8, 1)) + Convert.ToInt64(_UPCNumberIncrement.Substring(10, 1));
            chkDigitOdd = (3 * chkDigitOdd);
            chkDigitEven = Convert.ToInt64(_UPCNumberIncrement.Substring(1, 1)) + Convert.ToInt64(_UPCNumberIncrement.Substring(3, 1)) + Convert.ToInt64(_UPCNumberIncrement.Substring(5, 1)) + Convert.ToInt64(_UPCNumberIncrement.Substring(7, 1)) + Convert.ToInt64(_UPCNumberIncrement.Substring(9, 1));
            chkDigitSubtotal = (300 - (chkDigitEven + chkDigitOdd));
            _chkDigit = chkDigitSubtotal.ToString();
            _chkDigit = _chkDigit.Substring(_chkDigit.Length - 1, 1);
            ItemNumberList.Add(new Codes("", "",_chkDigit, "", ""));
        }

Is this the right way to go about it, using a list, or should I be looking at a different way?

Best Answer

I'll expand my comment:

... if you're adding or removing elements, you want a list (or other flexible data structure). Arrays are only really good when you know exactly how many elements you need at the start.

A Quick Breakdown

Arrays are good when you have a fixed number of elements that is unlikely to change, and you wish to access it in a non-sequential fashion.

  • Fixed Size
  • Fast Access - O(1)
  • Slow Resize - O(n) - needs to copy every element to a new array!

Linked-Lists are optimized for quick additions and removals at either end, but are slow to access in the middle.

  • Variable Size
  • Slow Access at middle - O(n)
    • Needs to traverse each element starting from the head in order to reach the desired index
  • Fast Access at Head - O(1)
  • Potentially fast access at Tail
    • O(1) if a reference is stored at the tail end (as with a doubly-linked list)
    • O(n) if no reference is stored (same complexity as accessing a node in the middle)

Array Lists (such as List<T> in C#!) are a mixture of the two, with fairly fast additions and random access. List<T> will often be your go-to collection when you're not sure what to use.

  • Uses an array as a backing structure
  • Is smart about its resizing - allocates the double of its current space when it runs out of it.
    • This leads to O(log n) resizes, which is better than resizing every time we add/remove
  • Fast Access - O(1)

How Arrays Work

Most languages model arrays as contiguous data in memory, of which each element is the same size. Let's say we had an array of ints (shown as [address: value], using decimal addresses because I'm lazy)

[0: 10][32: 20][64: 30][96: 40][128: 50][160: 60]

Each of these elements is a 32-bit integer, so we know how much space it takes up in memory (32 bits!). And we know the memory address of the pointer to the first element.

It's trivially easy to get to the value of any other element in that array:

  • Take the address of the first element
  • Take the offset of each element (its size in memory)
  • Multiply the offset by the desired index
  • Add your result to the address of the first element

Let's say our first element is at '0'. We know our second element is at '32' (0 + (32 * 1)), and our third element is at 64 (0 + (32 * 2)).

The fact that we can store all these values next to each other in memory means our array is as compact as it can possibly be. It also means that all our elements need to stay together for things to continue working!

As soon as we add or remove an element, we need to pick up everything else, and copy them over to some new place in memory, to make sure there are no gaps between elements, and everything has enough room. This can be very slow, especially if you're doing it every time you want to add a single element.

Linked Lists

Unlike arrays, Linked Lists don't need all their elements to be next to each other in memory. They are composed of nodes, that store the following info:

Node<T> {
    Value : T      // Value of this element in the list
    Next : Node<T> // Reference to the node that comes next
}

The list itself keeps a reference to the head and tail (first and last nodes) in most cases, and sometimes keeps track of its size.

If you want to add an element to the end of the list, all you need to do is get the tail, and change its Next to reference a new Node containing your value. Removing from the end is equally simple - just dereference the Next value of the preceding node.

Unfortunately, if you have a LinkedList<T> with 1000 elements, and you want element 500, there's no easy way to jump right to the 500th element like there is with an array. You need to start at the head, and keep going to the Next node, until you've done it 500 times.

This is why adding and removing from a LinkedList<T> is fast (when working at the ends), but accessing the middle is slow.

Edit: Brian points out in the comments that Linked Lists have a risk of causing a page fault, due to not being stored in contiguous memory. This can be hard to benchmark, and can make Linked Lists even a bit slower than you might expect given just their time complexities.

Best of Both Worlds

List<T> compromises for both T[] and LinkedList<T> and comes up with a solution that's reasonably fast and easy to use in most situations.

Internally, List<T> is an array! It still has to jump through the hoops of copying its elements when resizing, but it pulls some neat tricks.

For starters, adding a single element doesn't usually cause the array to copy. List<T> makes sure there's always enough room for more elements. When it runs out, instead of allocating a new internal array with just one new element, it will allocate a new array with several new elements (often twice as many as it currently holds!).

Copy operations are expensive, so List<T> cuts down on them as much as possible, while still allowing fast random access. As a side effect, it may end up wasting slightly more space than a straight-up array or linked list, but it's usually worth the tradeoff.

TL;DR

Use List<T>. It's normally what you want, and it seems to be correct for you in this situation (where you're calling .Add()). If you're unsure of what you need, List<T> is a good place to start.

Arrays are good for high-performance, "I know I need exactly X elements" things. Alternatively, they're useful for quick, one-off "I need to group these X things I've already defined together so I can loop over them" structures.

There are a number of other collection classes. Stack<T> is like a linked list that only operates from the top. Queue<T> works as a first-in-first-out list. Dictionary<T, U> is an unordered, associative mapping between keys and values. Play with them and get to know the strengths and weaknesses of each. They can make or break your algorithms.