Algorithms – Creating Cluster Groups Based on Two Criteria


I would like to group a population based on two criteria. I will use an analogy to simplify my question.

Let's say I want n groups. I want to populate those groups based on each person's age and weight, so that all groups have about the same total age and are evenly balanced by weight (that is, each group has about the same number of heavy and light people).

What kind of algorithm could I use to automate this process? Is there a simple Excel formula or some other method?
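One simple heuristic for the age/weight version is a stratified greedy assignment, sketched below. This is not a standard named tool, just an illustration under assumptions: `weight_cutoff` is a hypothetical threshold separating "heavy" from "light" people, and `Person` and `balanced_groups` are made-up names. Heavy and light people are dealt out separately so each group gets nearly the same number of each. Within each stratum, the oldest people are placed first, each into the group with the smallest total age so far (the "largest first" greedy rule used for multiway number partitioning).

```python
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    age: int
    weight: float


def balanced_groups(people, n, weight_cutoff):
    """Split people into n groups with similar total age and a similar
    mix of heavy and light members (greedy, stratified heuristic)."""
    groups = [[] for _ in range(n)]
    age_totals = [0] * n
    # Stratify: handle heavy and light people separately so each
    # stratum is spread evenly across the groups.
    heavy = [p for p in people if p.weight >= weight_cutoff]
    light = [p for p in people if p.weight < weight_cutoff]
    for stratum in (heavy, light):
        counts = [0] * n  # members of this stratum already in each group
        # Oldest first, so large values are placed while groups are still open.
        for p in sorted(stratum, key=lambda x: x.age, reverse=True):
            # Among groups with the fewest members of this stratum,
            # pick the one with the smallest total age so far.
            i = min(range(n), key=lambda g: (counts[g], age_totals[g]))
            groups[i].append(p)
            counts[i] += 1
            age_totals[i] += p.age
    return groups, age_totals
```

For example, `balanced_groups(people, 3, weight_cutoff=80)` returns three lists of `Person` objects plus their total ages. The per-stratum counts never differ by more than one between groups; age balance is only approximate, which is usually acceptable for a heuristic like this.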

UPDATE

Here is the motivation for this statistical analysis. I would like to set up database partitioning that gives the best performance. I need to store a large amount of data that is grouped by county, and I do not know ahead of time what the best number of partitions would be. Partitions should be uniform, so that each contains about the same number of rows. A partition should hold the data rows for one or more counties. Each county will be ranked on the frequency, and possibly the volume, of its updates. Partitions should be built so that frequently updated county data is distributed uniformly across them.

It does not seem as if there is a simple way to do this, so what kind of algorithm would work? I probably would not use VBA; most likely I would write the analysis program in Perl. Are there any ready-made statistical tools that do this type of analysis?
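A possible starting point for the county problem is a greedy two-dimensional load-balancing heuristic, sketched below in Python rather than Perl. Assumptions: each county is described by a hypothetical `(row_count, update_count)` pair, and "balanced" means no partition carries much more than its share of either rows or updates. Both measures are converted to fractions of their totals so they are comparable. Counties are then placed largest first, each into the partition whose worse dimension stays lowest after adding it. The function name `partition_counties` is made up for illustration.

```python
def partition_counties(counties, n):
    """Greedy two-criteria balancing.

    counties: dict mapping county name -> (row_count, update_count)
    Returns (parts, loads): n lists of county names, and for each
    partition its [row share, update share] of the totals.
    """
    total_rows = sum(r for r, _ in counties.values()) or 1
    total_upd = sum(u for _, u in counties.values()) or 1
    # Express each county as a share of total rows and total updates,
    # so both criteria are on the same 0..1 scale.
    shares = {c: (r / total_rows, u / total_upd) for c, (r, u) in counties.items()}
    parts = [[] for _ in range(n)]
    loads = [[0.0, 0.0] for _ in range(n)]  # [row share, update share]
    # Place the "biggest" counties first, judged by their larger share.
    for c in sorted(shares, key=lambda k: max(shares[k]), reverse=True):
        r, u = shares[c]
        # Pick the partition whose worse dimension stays lowest after adding c.
        i = min(range(n), key=lambda p: max(loads[p][0] + r, loads[p][1] + u))
        parts[i].append(c)
        loads[i][0] += r
        loads[i][1] += u
    return parts, loads
```

With five counties and `n=2`, for instance, a county with many rows but few updates tends to end up alongside a small, frequently updated one, which is the kind of mix the question asks for. The result is a heuristic, not a guaranteed optimum.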

Let me clarify what I mean by n groups. I will pick a number of groups (partitions) and plug it into the formula, analysis tool, or custom program. Then I will repeat the process with a different number of groups until I find, by trial and error, the number of partitions that yields the best performance.
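The trial-and-error loop described above can be partly automated. The sketch below sweeps candidate values of n and reports how evenly rows spread across partitions for each one. It assumes per-county row counts are known (the `sizes` list is hypothetical input). It also uses a simple serpentine ("snake draft") assignment, swapped in here for illustration, in which counties are dealt largest first in the order 0..n-1, then n-1..0, and so on. A balance score is only a cheap proxy: it can rule out poor values of n, but actual performance still has to be measured with real database benchmarks.

```python
def snake_assign(sizes, n):
    """Deal items (largest first) to n partitions in serpentine order
    0, 1, ..., n-1, n-1, ..., 1, 0, 0, 1, ... and return partition totals."""
    order = sorted(range(len(sizes)), key=lambda i: sizes[i], reverse=True)
    totals = [0] * n
    for k, i in enumerate(order):
        rnd, pos = divmod(k, n)
        # Even rounds go left to right, odd rounds go right to left.
        p = pos if rnd % 2 == 0 else n - 1 - pos
        totals[p] += sizes[i]
    return totals


def imbalance(totals):
    """Largest partition divided by the mean partition (1.0 = perfectly even)."""
    mean = sum(totals) / len(totals)
    return max(totals) / mean


def sweep(sizes, candidates):
    """Map each candidate partition count n to its imbalance score."""
    return {n: round(imbalance(snake_assign(sizes, n)), 3) for n in candidates}
```

For example, `sweep([500, 400, 300, 300, 200, 100], [2, 3])` scores n=3 as perfectly even and n=2 as about 11% above the mean in its largest partition. The values of n that score well are the ones worth benchmarking.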

Maybe there is a name for this type of analysis? Something that I could try to research via a search engine?

Best Answer

@Anony-Mousse, usually (or rather in its simplest form) "cluster analysis" is used to build clusters of similar objects.

I would suggest that @dabest1 look into biclustering, although the Wikipedia article on it seems a bit weak at the time of this post.

I have discussed biclustering in another post on CV.

To further help you in your research, here are a few links that will help get you started in Biclustering from the aforementioned post:

HTH!
