Efficiency of C# Dictionaries Explained

C# dictionaries are a simple way to check whether something exists, among other things. I have a question about how they work, though. Say that instead of a dictionary I use an ArrayList, and instead of calling ContainsKey (or the equivalent method in another language) I loop through the ArrayList to check whether something is in it (or perform a binary search if the data is sorted, or something similar). What's the difference in efficiency? Does ContainsKey use some more efficient technique than looping through the keys and checking whether the one I'm searching for exists?

If, say, I had written a hash function tailored to the kind of data I have, designed specifically for that data set, then yes, that hash function would indeed be faster than looping through the data. But dictionaries are general. The ContainsKey method is not specific to the data it gets; it's a general search method.

Basically, what I am asking is this: dictionaries are helpful to programmers. They include methods that help with many things, they associate strings with integers (keys and values), and much more. But as far as efficiency goes, what do they offer? What's the difference between a dictionary and an ArrayList of structs (string, int)?

Best Answer

You've got to dig a bit to see how Dictionary is implemented in C#. It's not as obvious from the name as Java's HashMap (a hash table), TreeMap (a sorted tree), or ConcurrentSkipListMap (a skip list).

If you dig down into the "Remarks" section:

The Dictionary generic class provides a mapping from a set of keys to a set of values. Each addition to the dictionary consists of a value and its associated key. Retrieving a value by using its key is very fast, close to O(1), because the Dictionary class is implemented as a hash table.

And there we have it: it's a hash table. Note that I've linked the Wikipedia article there; it's a fairly good read. You may wish to read the section on collision resolution. It is possible to get a pathological data set where lookup degrades to O(N) (for example, everything you insert lands on the same hash value or index in the hash table for some reason, and you're left with a linear scan through the colliding entries).
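To make the pathological case concrete, here is a minimal sketch written as a top-level C# program. The `ConstantHashComparer` class and its `EqualsCalls` counter are hypothetical names invented for this illustration: the comparer deliberately gives every key the same hash code, so every entry collides and a lookup has to compare against the colliding entries one by one.

```csharp
using System;
using System.Collections.Generic;

// Two dictionaries with the same keys: one uses the default int hashing,
// the other uses a deliberately bad comparer that makes every key collide.
var comparer = new ConstantHashComparer();
var good = new Dictionary<int, string>();
var bad = new Dictionary<int, string>(comparer);

for (int i = 0; i < 1000; i++)
{
    good[i] = "value " + i;
    bad[i] = "value " + i;
}

// Both dictionaries still give correct answers; only the work per lookup differs.
Console.WriteLine(good.ContainsKey(999)); // True
Console.WriteLine(bad.ContainsKey(999));  // True

// Count how many key comparisons a single failed lookup needs when all keys collide.
comparer.EqualsCalls = 0;
bool found = bad.ContainsKey(-1);
Console.WriteLine($"found={found}, key comparisons={comparer.EqualsCalls}");

// Hypothetical comparer for illustration only: a constant hash code
// puts every key in the same bucket.
sealed class ConstantHashComparer : IEqualityComparer<int>
{
    public int EqualsCalls;

    public bool Equals(int x, int y)
    {
        EqualsCalls++;
        return x == y;
    }

    public int GetHashCode(int obj) => 0;
}
```

With a reasonable hash function the number of comparisons per lookup stays small no matter how many entries there are; with the constant hash it grows with the size of the dictionary.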

While Dictionary is a general-purpose solution, you shouldn't be passing around concrete types (such as Dictionary); you should be passing around interfaces. In this case, that interface is IDictionary (docs). Given that interface, you are perfectly capable of writing your own dictionary implementation that does things optimally for the data you have.
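As a rough sketch of what passing the interface looks like, the helper below depends only on `IDictionary<string, int>`, so a hash-based or a tree-based implementation (or one you write yourself) can be swapped in without touching it. `CountOrZero` and the sample data are made up for this example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: it only knows about the IDictionary<TKey, TValue>
// interface, not about any concrete dictionary class.
static int CountOrZero(IDictionary<string, int> counts, string word) =>
    counts.TryGetValue(word, out int n) ? n : 0;

// Two different implementations behind the same interface.
IDictionary<string, int> hashed = new Dictionary<string, int> { ["apple"] = 3 };
IDictionary<string, int> sorted = new SortedDictionary<string, int> { ["apple"] = 3 };

Console.WriteLine(CountOrZero(hashed, "apple")); // 3
Console.WriteLine(CountOrZero(sorted, "apple")); // 3
Console.WriteLine(CountOrZero(sorted, "pear"));  // 0
```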

As for the efficiency of the various lookup/contains approaches:

  • Walking an unsorted list: O(N)
  • Binary search of a sorted array: O(log N)
  • Sorted tree: O(log N)
  • Hash table: O(1) on average
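The four approaches in the list can be sketched side by side with standard .NET collections. The sample names are invented, and the collection sizes here are far too small to show a timing difference; the point is only which API corresponds to which strategy.

```csharp
using System;
using System.Collections.Generic;

string[] names = { "delta", "alpha", "charlie", "bravo" };
string target = "charlie";

// Unsorted list, O(N): Contains compares against each element in turn.
var list = new List<string>(names);
bool inList = list.Contains(target);

// Sorted array, O(log N): sort once, then halve the search range at each step.
// The same comparer must be used for sorting and for searching.
string[] sorted = (string[])names.Clone();
Array.Sort(sorted, StringComparer.Ordinal);
bool inSortedArray = Array.BinarySearch(sorted, target, StringComparer.Ordinal) >= 0;

// Sorted tree, O(log N): SortedDictionary keeps its keys in a search tree.
var tree = new SortedDictionary<string, int>(StringComparer.Ordinal);

// Hash table, O(1) on average: Dictionary hashes the key to find its bucket.
var hash = new Dictionary<string, int>(StringComparer.Ordinal);

for (int i = 0; i < names.Length; i++)
{
    tree[names[i]] = i;
    hash[names[i]] = i;
}

bool inTree = tree.ContainsKey(target);
bool inHash = hash.ContainsKey(target);

Console.WriteLine($"{inList} {inSortedArray} {inTree} {inHash}"); // True True True True
```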

For most people, the hash table is what they want.

You may find that the SortedDictionary is what you want instead:

The SortedDictionary<TKey, TValue> generic class is a binary search tree with O(log n) retrieval, where n is the number of elements in the dictionary. In this respect, it is similar to the SortedList<TKey, TValue> generic class. The two classes have similar object models, and both have O(log n) retrieval.
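A small sketch of what the sorted variant buys you: lookups still go by key, but enumeration comes back in key order. The names and scores are invented sample data, and an ordinal comparer is passed explicitly so the ordering does not depend on the current culture.

```csharp
using System;
using System.Collections.Generic;

// Keys are inserted out of order on purpose.
var scores = new SortedDictionary<string, int>(StringComparer.Ordinal)
{
    ["mallory"] = 1,
    ["alice"] = 3,
    ["bob"] = 2,
};

// Enumerating the keys yields them in sorted order.
string order = string.Join(", ", scores.Keys);
Console.WriteLine(order); // alice, bob, mallory

// Retrieval by key works the same way as with Dictionary.
bool hasBob = scores.TryGetValue("bob", out int bobScore);
Console.WriteLine($"{hasBob} {bobScore}"); // True 2
```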

Though, again, if the data structure isn't an ideal fit for your data, you are given the tools (the interfaces) to write one that works best for your data.

The dictionary itself is an abstract data type. You give me a Dictionary and I know what I can do with it, and all the tools are there for me to use by the nature of it being a Dictionary. If you gave me an ArrayList, I would find myself writing my own code for searching, inserting, or deleting items from the list. This wastes my time and also means a greater likelihood of bugs as I copy the code again and again from spot to spot.
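For comparison, here is roughly what the question's ArrayList of (string, int) pairs looks like next to a Dictionary. This is a sketch under assumptions: value tuples stand in for the struct, and `TryFind` is a hypothetical hand-written helper of the kind you would end up maintaining yourself.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hand-written linear search: every caller needs this (or a copy of it).
static bool TryFind(ArrayList items, string name, out int count)
{
    foreach (object item in items)
    {
        // ArrayList stores plain objects, so each element must be cast back.
        var pair = ((string Name, int Count))item;
        if (pair.Name == name)
        {
            count = pair.Count;
            return true;
        }
    }
    count = 0;
    return false;
}

// ArrayList of (string, int) pairs, as described in the question.
var list = new ArrayList { ("apple", 3), ("pear", 5) };
bool foundInList = TryFind(list, "pear", out int fromList);

// The same data in a Dictionary: the lookup is already provided.
var dict = new Dictionary<string, int> { ["apple"] = 3, ["pear"] = 5 };
bool foundInDict = dict.TryGetValue("pear", out int fromDict);

Console.WriteLine($"{foundInList} {fromList}"); // True 5
Console.WriteLine($"{foundInDict} {fromDict}"); // True 5
```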
