Big O Complexity – Problem on Calculating Big O Complexity

Tags: algorithms, big-o, java

I have this function, and I have to calculate its time complexity in Big O notation:

public void print(ArrayList<String> operations, ArrayList<LinkedHashSet<String>> setOfStrings) {
    int numberOfStrings = 0;
    int numberOfLetters = 0;
    // The string to look for, e.g. "foo" for the operation "print foo".
    String toPrint = operations.get(1);
    for (Iterator<LinkedHashSet<String>> iteratorSets = setOfStrings.iterator(); iteratorSets.hasNext();) {
        LinkedHashSet<String> subSet = iteratorSets.next();
        if (subSet.contains(toPrint)) {
            // Found the set holding toPrint: sum the lengths of all its strings.
            for (Iterator<String> iterator = subSet.iterator(); iterator.hasNext();) {
                numberOfLetters = numberOfLetters + iterator.next().length();
            }
            numberOfStrings = subSet.size();
            break; // stop at the first matching set
        }
    }
}

The method does the following:

For example, given the operation print foo, I first have to find where foo is:

Inside setOfStrings, I can have this situation:

        position 1 : [car, tree, hotel]
        ...
        position n : [lemon, coffee, tea, potato, foo]

When I find the string foo, I have to save the number of strings at that position and the total number of letters across those strings, so in this case I will save:
```
   5 (number of strings)   23 (sum of the number of letters)
```

Some considerations:

For the ArrayList of operations, I always read one specific position, so I don't iterate. That is always O(1).
For the ArrayList<LinkedHashSet<String>>, I have to iterate, so the complexity in the worst case is O(n).
The check if (subSet.contains(toPrint)) will be O(1), because a hash set looks its elements up by hash code.
The iteration inside the hash set, done with for (Iterator<String> iterator = subSet.iterator(); iterator.hasNext();), will be O(m), because I have to iterate over the entire hash set to sum the letters of each word.

So in conclusion, I think the time complexity of this algorithm is O(n) * O(m).

Are all of these considerations correct?
Thanks.

Best Answer

It is a bit more complicated than that. The worst-case complexity is O(M * N), and the best-case complexity is O(N).

There are two worst-case scenarios:

when every subset contains the toPrint String, or
when the subsets contain strings that all hash to the same value.

(The second one is extremely unlikely, unless someone deliberately populates the data structure with data with that property. But it is still a case that needs to be considered in a thorough complexity analysis.)

The best-case scenario is when strings in the subsets hash "nicely" AND the probability of the contains test returning true tends to zero.

Finally, N is the size of setOfStrings, and M is the average size of a subset.
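
To illustrate the second worst-case scenario, here is a minimal sketch of how strings that all hash to the same value can be constructed on purpose. It relies on the documented String.hashCode formula, under which "Aa" and "BB" collide. The class name CollidingStrings and the helper colliding are made up for this example. How much slower contains actually becomes on such data depends on the collision handling of the JDK version in use.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class CollidingStrings {
    // "Aa" and "BB" have the same hashCode (2112), and so does every
    // string built by concatenating the same number of these two blocks.
    static List<String> colliding(int blocks) {
        List<String> result = new ArrayList<>();
        result.add("");
        for (int i = 0; i < blocks; i++) {
            List<String> next = new ArrayList<>();
            for (String prefix : result) {
                next.add(prefix + "Aa");
                next.add(prefix + "BB");
            }
            result = next;
        }
        return result;
    }

    public static void main(String[] args) {
        // 2^4 = 16 distinct strings that all land in the same hash bucket.
        Set<String> subSet = new LinkedHashSet<>(colliding(4));
        Set<Integer> hashes = new LinkedHashSet<>();
        for (String s : subSet) {
            hashes.add(s.hashCode());
        }
        // prints "16 strings, 1 distinct hash code(s)"
        System.out.println(subSet.size() + " strings, " + hashes.size() + " distinct hash code(s)");
    }
}
```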

Related Solutions

Algorithm Complexity – Problems Calculating Big-O Complexity

The first one looks linear to me; given an input array, a result array twice as large is created, and then every element of the input is traversed to produce two elements of the result. Assuming constant access time for arrays in Java (which is true) and constant array/list length checking (also true AFAIK), given N input elements, all N input elements are traversed, 3N input accesses are made, and 2N assignments are made. This is O(N) plain and simple; if this was the code from the test and you said O(N), you are right and your prof is wrong. The fact that there are multiple steps happening for each iteration of the loop is immaterial; it's not an O(NlogN) function just because each element is accessed three times and the list happens to have 1000 elements.

The second one also appears linear. Given an input collection, the loop traverses half of it, and swaps each of those elements with the "mirrored" element on the other half (basically reversing the collection). Again, neither the fact that 3 reads/writes are made to list elements, nor the fact that only half the items are traversed by the outer loop, makes this a logarithmically-based algorithm; it, like the first, is O(N)-complexity, provided that the collection is constant-time to access.
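
The original code for this method is not shown here, so the following is only a hypothetical stand-in for the kind of loop described: it walks half of an array and swaps each element with its mirror. The name reverse and the int[] element type are assumptions. The loop body does a constant amount of work and runs N/2 times, which is why it is O(N).

```java
import java.util.Arrays;

class ReverseSketch {
    // Hypothetical reconstruction: swap each element in the first half
    // with its "mirrored" element in the second half.
    static void reverse(int[] a) {
        for (int i = 0; i < a.length / 2; i++) {
            int mirror = a.length - 1 - i;
            int tmp = a[i];       // constant-time read
            a[i] = a[mirror];     // constant-time write
            a[mirror] = tmp;      // constant-time write
        }
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4, 5};
        reverse(data);
        System.out.println(Arrays.toString(data)); // prints [5, 4, 3, 2, 1]
    }
}
```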

In fact, I cannot see any way that either of these would be O(logN). Both of them might be O(NlogN), if there's something you're not telling us; that the input collection which you are showing us as an array is actually a Map, using a red-black tree for its internal structure and thus providing log(N) access time. Then the complexity of both of these would be O(NlogN), because for each of the elements traversed by the loop, you spend some constant multiple of logN time reading or writing to elements of the collection.

However, if the problems are exactly as you have stated, your prof is wrong, you are right, and you should make a stink about it to him and to anyone above him that you can get to care, depending on how much of your grade this test represents.

EDIT FROM UPDATE: OK, now we see that the prof marked the wrong two questions wrong:

public static void mystery3(List<String> list) {
   for (int i = 0; i < list.size() - 1; i += 2) {
      String first = list.remove(i);
      list.add(i + 1, first);
   }
}

This one basically swaps each pair of elements; it does so by removing and then re-adding each element to a mutable List collection. Now, if it were a true swap, with a temp variable like you'd use with an array, it would indeed be linear. However, with a List, when you remove or add items in the middle of the collection, the List class automatically rearranges the existing items in the array that it uses behind the scenes to store the elements, by "shifting" each element above the removed element to the next lower index. A removal of element in index X from a collection of N elements will result in N-(X+1) shifts. Same thing in reverse when an element is inserted in between two existing elements; the existing elements to the right of the insertion point are each shifted one place further right to make room.

So, for N elements, it performs N/2 traversals, but N inserts/removals, each of which results in N-(X+1) shifts. As X will, at some point, have the value of every index in the list, this produces a triangular number of total value assignments, N(N-1)/2, which makes it O(N²) complexity. This is a very common breakdown for a quadratic-complexity algorithm; for each index, do something with each higher (or lower) index. Most of the quadratic sorts (e.g. SelectionSort, InsertionSort, BubbleSort) behave in this general way.
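
To make the triangular sum concrete, here is a minimal sketch that only counts the element shifts an array-backed list (such as ArrayList) would perform for mystery3. It is a counting model, not the real List code, and the names ShiftCount and shiftsForMystery3 are made up for this example.

```java
class ShiftCount {
    // Counts the element shifts an array-backed list of size n would perform
    // while running the mystery3 loop.
    static long shiftsForMystery3(int n) {
        long shifts = 0;
        for (int i = 0; i < n - 1; i += 2) {
            // remove(i): the elements at indexes i+1 .. n-1 each move one place left.
            shifts += n - (i + 1);
            // add(i + 1, first) on the now (n-1)-element list:
            // the elements at indexes i+1 .. n-2 each move one place right.
            shifts += (n - 1) - (i + 1);
        }
        return shifts;
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 100, 1000}) {
            // prints 45, 4950 and 499500 shifts, i.e. n(n-1)/2 each time
            System.out.println(n + " elements -> " + shiftsForMystery3(n) + " shifts");
        }
    }
}
```

Multiplying the input size by 10 multiplies the shift count by roughly 100, which is the quadratic growth described above.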

public static void mystery4(List<String> list) {
   for (int i = 0; i < list.size() - 1; i += 2) {
      String first = list.get(i);
      list.set(i, list.get(i + 1));
      list.set(i + 1, first);
   }
}

This one performs a similar pair-swapping operation on the elements of the List. However, this implementation is a true "temp swap" as mentioned above. The algorithm swaps pairs, but does so without changing the number of elements in the List; instead, a simple variable first is used to store the value that is then overwritten in its current position in the List. So, for N/2 traversals, 2N reads/writes are performed, making this algorithm, which does exactly the same thing as the previous one, linear instead of quadratic.

Now, you said you marked this as linear. That's the right answer. Where it might not seem so (and this will be important elsewhere) is that strings in Java are immutable. Once created, they're never changed "in-place"; if the value of a string is modified, it results in the creation of a new String object in memory, and the old one, if nothing else references it, is orphaned for the garbage collector to deal with. So, the modification of a String is an operation linearly time-bound to the total number of characters in the string, because each of those characters must be copied from its current location into the proper new location in the new String. If that were happening, then this would be a linear operation of linear operations, producing a quadratic total time-complexity.

However, that's not happening in this algorithm. Java Strings are also "reference types", meaning that in an assignment of a string from one variable to another (including elements of an array), without any modification, the string is not "cloned"; the memory reference to the string, which is basically a number, is copied between the two variables, and both of them then refer to the same String object. So, the swapping being done is only rearranging these "pointers" actually contained in the array, not creating and deleting the actual strings in memory. That's a constant-time operation, so the dominant term for the Big-Oh notation remains linear.
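
A small sketch of that point: it reruns the same pair swap as mystery4 and then checks with == that the list still holds the very same String object rather than a copy. The class name SwapSketch and the sample data are made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SwapSketch {
    // Same pair swap as mystery4: only references are moved around.
    static void mystery4(List<String> list) {
        for (int i = 0; i < list.size() - 1; i += 2) {
            String first = list.get(i);
            list.set(i, list.get(i + 1));
            list.set(i + 1, first);
        }
    }

    public static void main(String[] args) {
        String a = new String("alpha"); // a distinct object we can track by identity
        List<String> list = new ArrayList<>(Arrays.asList(a, "beta", "gamma", "delta", "epsilon"));
        mystery4(list);
        System.out.println(list);             // prints [beta, alpha, delta, gamma, epsilon]
        System.out.println(list.get(1) == a); // prints true: same object, no characters copied
    }
}
```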

Java – Complexity of ArrayList of LinkedHashSet

Your main question is whether you could have a collision, but it's first important to determine where you might have a collision. An ArrayList does not have collisions because it is an ordered list on top of an array. Your concern there would be more about how often it has to extend the capacity of the underlying array. You would feel this when you are reading the input only, though.

Where you might have collisions is in the LinkedHashSets. Since hash collisions are a function of the capacity of the underlying hash store, the hashing algorithm, and the data itself, there is always a possibility of a collision, and there is no way to know whether you'll have one except to run all of your data through. In the end, since these are all different strings, and the hashing algorithm for strings built into Java is pretty effective, the remaining factor that determines how many collisions you could potentially have is the capacity of the underlying hash store.

A default LinkedHashSet has an initial capacity of 16 and a load factor of 0.75. This means that once the number of elements exceeds 12 (16 * 0.75), it will double its capacity to accommodate more elements. If we can assume that hashes are evenly distributed, a larger capacity means a lower probability of a collision. Resizing a LinkedHashSet is actually the most expensive activity. In order to resize the underlying hash store, it essentially must iterate through the current elements, rehash them, and re-insert them into the larger hash store. Your best strategy to avoid collisions and rehashing is to allocate large LinkedHashSets when you instantiate them. What "large" is depends on how much data you are inputting and how much of it will be combined. If you know the size and general shape of your data beforehand, you could try to estimate a proper initial size large enough to accommodate the largest set (needed size / 0.75 to account for the load factor). This essentially becomes the old time vs. space trade-off, as larger initial sizes mean more memory consumed.
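
As a rough sketch of that sizing rule, assuming you can estimate the largest set in advance: the helper capacityFor and the figure of 1000 expected elements are made up for this example, while the two-argument LinkedHashSet constructor (initial capacity, load factor) is part of the standard library.

```java
import java.util.LinkedHashSet;
import java.util.Set;

class PresizedSet {
    static final float LOAD_FACTOR = 0.75f;

    // Smallest initial capacity that lets expectedSize elements fit
    // without exceeding capacity * load factor (needed size / 0.75, rounded up).
    static int capacityFor(int expectedSize) {
        return (int) Math.ceil(expectedSize / (double) LOAD_FACTOR);
    }

    public static void main(String[] args) {
        int expected = 1000; // assumed estimate of the largest set
        Set<String> words = new LinkedHashSet<>(capacityFor(expected), LOAD_FACTOR);
        for (int i = 0; i < expected; i++) {
            words.add("word" + i);
        }
        // prints "requested capacity 1334 for 1000 elements"
        System.out.println("requested capacity " + capacityFor(expected) + " for " + words.size() + " elements");
    }
}
```

The cost is the memory for the larger table up front, which is the time vs. space trade-off mentioned above.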

Other general suggestions

StringTokenizer is a legacy class whose use is discouraged in new code, and string.split(" ") is generally recommended in its place. However, since you are getting your input from the console, I would suggest disposing of the buffered reader completely and wrapping System.in in a Scanner. Scanner has a next() method that fetches the next token, implicitly tokenizing the input for you with less effort.

Also, while you are using ArrayList and LinkedHashSet for their performance characteristics, it is probably still better to rely on their interfaces (List and Set respectively) than those specific implementations.

Third, instead of writing out all of the code to get an iterator and then looping over it, you can use the for-each construct to loop over any Iterable.

Fourth, you don't need boolean flags when you can just as easily get the same information from a null check. We can eliminate two variables right off the bat.

Here's a slightly rewritten version incorporating those suggestions:

Main.java

import java.io.*;
import java.util.*;

public class Main
{
    public static void main(String[] args) throws IOException {
        Scanner input = new Scanner(System.in);
        List<Set<String>> setOfStrings = new ArrayList<>();

        while (input.hasNext()) {
            Set<String> tokenSet = new LinkedHashSet<>();
            tokenSet.add(input.next());
            setOfStrings.add(tokenSet);
        }

        SetMerger sm = new SetMerger();
        sm.mergeSets("foo", "bar", setOfStrings);
        sm.mergeSets("bar", "baz", setOfStrings);
        System.out.println(setOfStrings.toString());
    }
}

SetMerger.java

import java.util.*;

public class SetMerger
{
    public void mergeSets(String toMerge, String fromMerge, List<Set<String>> stringSets) {
        Set<String> setToMerge = null, targetSet = null;

        for (Set<String> subset : stringSets) {
            if(subset.contains(toMerge)) {
                if(subset.contains(fromMerge)) {
                    return;
                }

                setToMerge = subset;
                break;
            }
        }

        for (Set<String> subset : stringSets) {
            if(subset.contains(fromMerge)) {
                targetSet = subset;
                break;
            }
        }

        if(setToMerge != null && targetSet != null) {
            targetSet.addAll(setToMerge);
            stringSets.remove(setToMerge);
        }
    }
}
