Algorithms Array Loops – Add Unique Items to an Array Without Many Comparisons

Please bear with me: I want this to be as language-agnostic as possible because of the languages I am working with (one of which is called PowerOn). However, most languages support for loops and arrays.

Say I have the following list in an array:

0x  0  Foo
1x  1  Bar
2x  0  Widget
3x  1  Whatsit
4x  0  Foo
5x  1  Bar

Anything with a 1 should be uniquely added to another array, with the following result:

0x  1  Bar
1x  1  Whatsit

Keep in mind this is a very elementary example. In reality, I am dealing with tens of thousands of elements in the old list. Here is what I have so far.

Pseudo Code:

For each element in oldlist
 For each element in newlist
  Compare oldlist.element with newlist.element
  If they are equal, break out of the newlist loop
 End
 If the end of newlist was reached with nothing equal to oldlist.element, add oldlist.element to newlist
End
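For reference, the pseudocode above can be sketched in Python roughly as follows. The `(flag, name)` tuple layout, the function name `unique_flagged_naive`, and the check for a flag of 1 are assumptions taken from the example data, not from the pseudocode itself. In the worst case the inner loop scans all of newlist for every element of oldlist, so the number of comparisons grows with len(oldlist) times len(newlist), which is O(n^2) when most values are unique.

```python
def unique_flagged_naive(oldlist):
    """Collect each flagged name once, using a linear scan of newlist."""
    newlist = []
    for flag, name in oldlist:
        if flag != 1:          # assumption: only rows flagged 1 are wanted
            continue
        found = False
        for existing in newlist:
            if existing == name:
                found = True   # already present, stop scanning newlist
                break
        if not found:
            newlist.append(name)
    return newlist


# Example data from the question, as hypothetical (flag, name) tuples.
records = [
    (0, "Foo"), (1, "Bar"), (0, "Widget"),
    (1, "Whatsit"), (0, "Foo"), (1, "Bar"),
]
print(unique_flagged_naive(records))  # ['Bar', 'Whatsit']
```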

Is there a better way of doing this? Algorithmically, is there any room for improvement? And as a bonus question, what is the O notation for this type of algorithm (if there is one)?

Best Answer

If you keep your newlist sorted, then you can use binary search to determine whether a duplicate exists. This will only make a difference once newlist gets large; if it never grows beyond (say) 64 entries, you may not see any improvement. It may also not help if inserting into the middle of an array is expensive: shifting all the elements may eat up much of the time you save. You could mitigate this somewhat by keeping multiple lists, maybe your starting list and a separate list of the elements you've added. You'll have to search each separately, but with binary search that should be manageable. Then, after your loops have finished, merge the new elements into the second list.
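A minimal sketch of the sorted-list idea, assuming Python and its standard `bisect` module; the `(flag, name)` tuples and the function name `unique_flagged_sorted` are hypothetical. The lookup takes about log2(len(newlist)) comparisons instead of a full scan, but `list.insert` still shifts the elements after the insertion point, which is the cost mentioned above. Note that the result comes out in sorted order rather than first-seen order.

```python
import bisect


def unique_flagged_sorted(oldlist):
    """Collect each flagged name once, keeping newlist sorted for binary search."""
    newlist = []  # invariant: always in ascending order
    for flag, name in oldlist:
        if flag != 1:          # assumption: only rows flagged 1 are wanted
            continue
        # Binary search for the leftmost position where name could go.
        pos = bisect.bisect_left(newlist, name)
        if pos < len(newlist) and newlist[pos] == name:
            continue           # duplicate found, nothing to add
        newlist.insert(pos, name)  # keeps the list sorted; shifts later items
    return newlist


records = [
    (0, "Foo"), (1, "Bar"), (0, "Widget"),
    (1, "Whatsit"), (0, "Foo"), (1, "Bar"),
]
print(unique_flagged_sorted(records))  # ['Bar', 'Whatsit']
```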

Another improvement might be to traverse your oldlist after inserting a new value into newlist and remove any duplicates of that value. It's hard to say, though, without knowing anything about the data.
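One possible reading of that suggestion, sketched in Python under the same hypothetical `(flag, name)` layout: work on a copy of the flagged values, and each time a value is added to newlist, strip every remaining occurrence of it. Each pruning pass is still a full traversal of what is left, so this is likely to pay off only when the data contains many duplicates.

```python
def unique_flagged_by_pruning(oldlist):
    """Collect each flagged name once by pruning duplicates from the input."""
    # Work on a copy so the caller's list is left untouched.
    remaining = [name for flag, name in oldlist if flag == 1]
    newlist = []
    while remaining:
        value = remaining[0]
        newlist.append(value)
        # Drop this value everywhere, so it is never compared again.
        remaining = [name for name in remaining if name != value]
    return newlist


records = [
    (0, "Foo"), (1, "Bar"), (0, "Widget"),
    (1, "Whatsit"), (0, "Foo"), (1, "Bar"),
]
print(unique_flagged_by_pruning(records))  # ['Bar', 'Whatsit']
```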