If you keep your newlist sorted, then you can use binary search to determine whether a duplicate exists. This only makes a difference once your newlist starts to get large, so if it never grows longer than (say) 64 entries you may not see any improvement. It may also not help if it's expensive to insert into the middle of an array -- shifting all the elements may eat up much of your time savings. You could mitigate this somewhat by keeping multiple lists, say your starting list and a separate list of the elements you've added. You'll have to search each separately, but with binary search that should be manageable. Then, after your loops have finished, merge the new elements into the second list.
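The sorted-list idea above can be sketched in C++; insert_unique is a hypothetical helper name, and the element type is assumed to be int for illustration. The binary search (std::lower_bound) makes the duplicate check O(log n), while the insert itself still pays the O(n) shifting cost noted above.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Insert `value` into the sorted vector `newlist` only if it is not
// already present. Returns true if the value was inserted.
bool insert_unique(std::vector<int> &newlist, int value) {
    // lower_bound: O(log n) search for the first element >= value
    auto it = std::lower_bound(newlist.begin(), newlist.end(), value);
    if (it != newlist.end() && *it == value)
        return false;              // duplicate found, nothing to do
    newlist.insert(it, value);     // O(n) shift -- the cost noted above
    return true;
}
```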
Another improvement might be to traverse your oldlist after inserting a new value into newlist and remove any duplicates. Hard to say, though, without knowing anything about the data.
I hope it's ok to answer my own question.
I believe I have found the optimal (without overcomplicating the problem) data structure for my problem. There was at least minor idiocy on my part for not recognising this earlier. The data doesn't need to be accessed by (x, y, z) but instead by (x, y, range of z (say 0 - 3)). This gives a C++ struct as follows:
struct node {
    struct node *next;
    int zGroup;
    int z;
    // ~50 bytes of misc data
};
I can then address this through a 3D dynamic array (vectors):
vector< vector < vector < node* > > > Data;
Any given Data[x][y][zGroup] points to the first element of a linked list, the entirety of which is needed every time one element of it is needed. No value of this array is NULL; every one contains a linked list of at least one element.
The third dimension of the array - the zGroup - has jagged dimensions, but with dynamic arrays this isn't an issue. Given the data and the computations being performed on it, I know that the maximum x and y values are set when the file is read and do not change, and neither does the number of z-groups on any given (x, y) line. The actual z-values of nodes may change, but they will remain inside the same z-groups, giving a constant-sized, fully populated array.
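A minimal sketch of how this layout might be traversed, assuming the outer dimensions are already built; count_nodes and the misc placeholder are illustrative names, not from the original:

```cpp
#include <vector>

struct node {
    node *next;
    int zGroup;
    int z;
    char misc[50];   // placeholder for the ~50 bytes of misc data
};

// Walk the entire linked list stored at Data[x][y][zGroup].
// The whole list is visited, since every element of it is needed
// whenever any one element is needed.
int count_nodes(const std::vector<std::vector<std::vector<node*>>> &Data,
                int x, int y, int zGroup) {
    int n = 0;
    for (const node *p = Data[x][y][zGroup]; p != nullptr; p = p->next)
        ++n;
    return n;
}
```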
With the way the file is structured, it is also easy enough to page it in and out of memory if I need to do so with much larger data sets.
Best Answer
Programming languages:
Array[1] uses an implicit mapping between the index 1 and a specific array element. This mapping is language-specific. Many languages start at 0, some at 1. Some languages allow starting at another offset. Some languages implement sparse arrays. Some languages don't have general-purpose arrays as a fundamental data structure and use lists and mappings instead.
Linguistics:
The word "first" is an ordinal. So, when you say "the first element", you don't mean the element number 1, but you mean the element that is at the start of the ordered sequence of elements.
Every programmer will therefore map "first" to what is really the first element in the mapping he knows (e.g. Array[0] if the indexing is zero-based and Array[1] if indexes start at 1).
Algorithms:
Many algorithms use the word "first" in a language-neutral description. So I think your question is definitely relevant in the scope of software engineering, even if it's about linguistics.