C++ Algorithms – Finding Unique Elements in Vectors

algorithmsc

I am trying find the number of unique elements in a vector compared against multiple vectors using C++. The vectors are in sorted order and it can be of size 2,000,000.

Suppose I have,

v1: 5, 8, 13, 16, 20
v2: 2, 4, 6, 8
v3: 20
v4: 1, 2, 3, 4, 5, 6, 7
v5: 1, 3, 5, 7, 11, 13, 15

The number of unique elements in v1 is 1 (i.e. number 16).

I tried two approaches.

Added vectors v2,v3,v4 and v5 into a vector of vector. For each element in v1, checked if the element is present in any of the other vectors.
Combined all the vectors v2,v3,v4 and v5 using merge sort into a single vector and compared it against v1 to find the unique elements.

Note: sample_vector = v1 and all_vectors_merged contains v2,v3,v4,v5

//Method 1
unsigned int compute_unique_elements_1(vector<unsigned int> sample_vector,vector<vector<unsigned int> > all_vectors_merged)
{
    unsigned int duplicate = 0;
    for (unsigned int i = 0; i < sample_vector.size(); i++)
    {
        for (unsigned int j = 0; j < all_vectors_merged.size(); j++)
        {
            if (std::find(all_vectors_merged.at(j).begin(), all_vectors_merged.at(j).end(), sample_vector.at(i)) != all_vectors_merged.at(j).end())
            {
                duplicate++;
            }
        }
    }
    return sample_vector.size()-duplicate;
}

// Method 2
unsigned int compute_unique_elements_2(vector<unsigned int> sample_vector, vector<unsigned int> all_vectors_merged)
{
    unsigned int unique = 0;
    unsigned int i = 0, j = 0;
    while (i < sample_vector.size() && j < all_vectors_merged.size())
    {
        if (sample_vector.at(i) > all_vectors_merged.at(j))
        {
            j++;
        }
        else if (sample_vector.at(i) < all_vectors_merged.at(j))
        {
            i++;
            unique ++;
        }
        else
        {
            i++;
            j++;
        }
    }
    if (i < sample_vector.size())
    {
        unique += sample_vector.size() - i;
    }
    return unique;
}

Of these two techniques, I see that Method 2 gives faster results.

1) Method 1: Is there a more efficient way to find the elements than running std::find on all the vectors for all the elements in v1.

2) Method 2: Extra overhead in comparing vectors v2,v3,v4,v5 and sorting them.

How can I do this in a better way?

[edit]
Vectors are in sorted order.

Best Answer

Use hash tables. Elements are the key, number of occurrences are the values.

Best Answer

Related Solutions

Algorithms – How Can Deterministic Linear Time Selection Algorithm Be Linear?

C++ – the point of using lists over vectors, in C++

Related Topic