Graph Traversal – Improving Breadth First Search and Adjacency List Creation

arraycgraphgraph-traversal

We are given an array of integers where all elements are between 0-9. have to start from the 1st position and reach end in minimum no of moves such that we can from an index i move 1 position back and forward i.e i-1 and i+1 and jump to any index having the same value as index i.

Time Limit : 1 second

Max input size : 100000

I have tried to solve this problem use a single source shortest path approach using Breadth First Search and though BFS itself is O(V+E) and runs in time the adjacency list creation takes O(n2) time and therefore overall complexity becomes O(n2). is there any way i can decrease the time complexity of adjacency list creation? or is there a better and more efficient way of solving the problem?

int main(){

    vector<int> v;
    string str;
    vector<int> sets[10];

    cin>>str;
    int in;
    for(int i=0;i<str.length();i++){
        in=str[i]-'0';
        v.push_back(in);
        sets[in].push_back(i);
    }

    int n=v.size();
    if(n==1){
        cout<<"0\n";
        return 0;
    }
    if(v[0]==v[n-1]){
        cout<<"1\n";
        return 0;
    }


    vector<int> adj[100001];
    for(int i=0;i<10;i++){
        for(int j=0;j<sets[i].size();j++){
            if(sets[i][j]>0)
                adj[sets[i][j]].push_back(sets[i][j]-1);
            if(sets[i][j]<n-1)  
                adj[sets[i][j]].push_back(sets[i][j]+1);
            for(int k=j+1;k<sets[i].size();k++){
                if(abs(sets[i][j]-sets[i][k])!=1){
                    adj[sets[i][j]].push_back(sets[i][k]);
                    adj[sets[i][k]].push_back(sets[i][j]);
                }
            }
        }
    }

    queue<int> q;
    q.push(0);

    int dist[100001];
    bool visited[100001]={false};
    dist[0]=0;
    visited[0]=true;

    int c=0;
    while(!q.empty()){
        int dq=q.front();
        q.pop();
        c++;

        for(int i=0;i<adj[dq].size();i++){
            if(visited[adj[dq][i]]==false){
                dist[adj[dq][i]]=dist[dq]+1;
                visited[adj[dq][i]]=true;
                q.push(adj[dq][i]);
            }
        }

    }

    cout<<dist[n-1]<<"\n";

    return 0;
}

Best Answer

Key insight: there's no need to eagerly construct the adjacency list. In fact, there's no need to ever construct the adjacency list; you just need a way to find the next nodes. With this insight, we can construct an efficient version of the search.

Lazy edges

For any given node, finding indices i+1 and i-1 is trivial. Finding nodes that have the same value merely requires a lookup in the sets datastructure you've already created.

The rest of your breadth first search still applies equally; you mark off nodes you've already seen to make sure you don't revisit them.

Grouped values

Note that there's really no reason to return to a value once you've left it. Any path that reaches an X-valued index can reach any other X-valued index on the next step. That means your maximum path is 19 (each value can occur at most twice for a 20 nodes i.e. 19 edges along the path).

However, it it also means you don't want the following to occur:

  • you're at index K with value X, and iterate over all other indices with value X, marking them as visited, and adding them to the queue
  • you eventually reach each one of those X-valued indices in the queue, and iterate over everything it can reach, i.e. all those other X's - even though they've all been visited, and you already know you're not going to add them to the queue.

So you're doing quadratic work (each instance of a value iterates over each instance of the same value) and it's obviously pointless to boot. Once you process an X-valued index and thus visit all other X's, you know you'll never need to follow an X->X edge again. Easy fix? Just empty the list of indices by that value once you've processed one.

This new algorithm is linear time:

  • reading the input is linear
  • building sets is linear (each index is inserted once)
  • each index can be "processed" (added to the queue) at most once, so that's linear.
  • each index can be "visited" (checking the visited array) at most three times: once by i-1, once by 1+1, and once by some other identically valued index.

If you really want, you can possibly shave off a bit of the constant factor here. After all, the final path will pass through each value once (and when it does, touching one or two indexes for that value). You can therefore construct a much, much smaller graph of how values can reach each other; solving that for the shortest paths is essentially free (at most 10 nodes, after all), and a shortest path through the indexes must be the shortest path through the values (proof left to reader, but note that any shortcut via i+1/i-1 costs at least a step, but can never win more than a step). Then, when you solve the bigger graph of indices, you don't need to bother taking any steps that conflict with what you already know of the solution; i.e. don't enqueue any indices that have a value other than your own or the "next" value along any shortest path in the value graph. (In essence, this optimization is A*).

Similarly, you could segment your queue by value and distance, and skip processing the rest of the current value with greater distance once you've found a way to the next.

Such trickery is almost certainly not worth it, but hey, it's possible :-).


Coding Style: I'd suggest naming your variables after what they represent. Your code is a little hard to follow because it contains things like sets (of what? why? - oh it's a way to lookup indices having a value), and a queue q (again, what's this mean? oh it's the reachable set in order of distance). Give meaningful concepts (that you repeat in the code) like adj[dq][i] a name. It's the neighbor you're going to visit. This algorithm is so simple, it's eventually clear, but there's no reason it shouldn't be clear right away.


TL;DR

  • Don't precompute edges, instead compute reachability when needed.
  • Empty the set of indices you use to compute value-to-value reachability after using it once to avoid quadratic behavior.
  • The result is O(N).