C# – Algorithm to optimize grouping

algorithmsclinq

I would like to know if there's a known algorithm or best practice way to do the following:

I have a collection with a subcollection, for example:

R1 R2 R3
-- -- --
M  M  M
N  N
L  L
   A

What i need is an algorithm to get the following result:

R1, R2: M N L
R2: A
R3: M

This is -not- what i want, it has more repeating values for R than the above:

R1, R2, R3: M
R1, R2: N L
R2: A

I need to group in way that i get the most optimized groups of R.
The least amount of groups of R the better so i get the largest sub collections.

Another example (with the most obvious result):

R1 R2 R3
-- -- --
M  M  A
V  V  B
L  L  C

Should result in:

R1, R2: M V L
R3: A B C

I need to do this in LINQ/C#.

Any solutions? Tips? Links?

Best Answer

I think since you want the least total R's, you want to reverse your K/V and use the values as keys so your initial list becomes:

  • M: R1, R2, R3
  • N: R1, R2
  • L: R1, R2
  • A: R2

Then you can say the count of members intersecting ( http://msdn.microsoft.com/en-us/library/system.linq.enumerable.intersect.aspx ) of M and N is 2, which means you get n-(2 sets * 2 members) number of R's, and if you intersect L you get n-(3*2) number of R's. and if you intersect A as well you get n-(4*1) number of R's, so obviously your best choice is the second intersection set. Unfortunately this is a naive implementation with O(n!) time because you have to interset all possible sets to find the greatest set of intersections which will yield the largest minimization of R's to portray all sets.

After you find the best optimization in this technique where the largest intersection count * sets crossed, remove all the R's in the intersection set from the sets you intersected, then repeat the terribly inefficient combinatoric intersection comparisons against the sets as they are left. Rinse and repeat the comparisons to find largest intersections and remove them from the sets until no Rs are left in any sets.

There is surely a more efficient comparison technique than to intersect each set combination as I'm suggesting. I'm just suggesting an algorithm that should work (I believe) which is likely the naive approach.