Algorithms – Detecting Duplicates in a Subset of Elements

algorithms

I have a set of numbers say : 1 1 2 8 5 6 6 7 8 8 4 2…

I want to detect the duplicate element in subsets(of given size say k) of the above numbers…
For example :
Consider the increasing subsets(for example consider k=3)

Subset 1 :{1,1,2}

Subset 2 :{1,2,8}

Subset 3 :{2,8,5}

Subset 4 :{8,5,6}

Subset 5 :{5,6,6}

Subset 6 :{6,6,7}

….

….
So my algorithm should detect that subset 1,5,6 contains duplicates..

My approach : 1)Copy the 1st k elements to a temporary array(vector) 2) using #include file in C++ STL…using unique() I would determine if there's any change in size of vector..
Any other clue how to approach this problem..

Best Answer

Take a subset, add each element into a hash table. If the hash table already contains a value - print out saying there is a duplicate.

A hash table will only allocate 1 memory block per entry. Hence its easy to find if the number already exists. Since these are just regular entries your table will be something like this:

struct Hashtable{
   int number;
};

static struct Hashtable hashTable[10];

int getHash(int x){ return x; }

void addHash(int number)
{
    if(hashTable[number].number == -1)
    { /*OK*/
        hashTable[number].number = number;
    }else{
        printf("DUPLICATE FOUND! BAILING OUT!");
    }
}

void initTable(){ for(int i = 0; i < 10; i++){hashTable[i].number = -1;}}
Related Topic