Here is what I would do:
- Divide the space into a regular lattice of cubes. The the length of the side of each cube should be half the minimum distance between points.
- For the first point, see which cube it is in. Add a flag to the cube that contains the point to indicate that that cube contains a point of that color.
- For the next point, see which cube is in. If none of the adjacent cubes are flagged with a point of that color, flag the cube with the color of the point like in (2). Otherwise throw the point away.
- If you still have unprocessed points, go to 3.
Advantages
The advantages of this approach are:
- It is fast
- It is easy to understand and to implement
Limitations
This algorithm has several limitations:
Use of Cubes
The algorithm will throw away too many points because it uses cubes instead of spheres. The question suggests that this is OK.
It it is a problem, however, this limitation can be overcome.
One way is to add the coordinates of the point to the flags. Then, when you test points in step (3) and you find that there is already a point in an adjacent cube, check the distance to that point before you decide whether to throw the point away.
Of course, this will slow things down / add complexity / increase the memory requirements of the algorithm.
Order of Testing Points
The number of points you end up with will depend on the order in which you test the points. To prove this, imagine the situation where you have just 3 red points, and that the cubes they fall into happen to be in a row:
+-----+-----+-----+
| | . | |
|. | | |
| | | . |
+-----+-----+-----+
In this case, you could either throw away the point in the centre cube or the two points in the adjacent cubes.
To improve on the original algorithm, then, you could use the cubes as a basis to count the points of a given color that are tooo close to each other, then throw away the points that have the most points too close. I have not explored this in much depth: I'd need to work through a set of scenarios on paper to figure out the best way to do this.
Memory Use
Another potential problem is that the algorithm could use up quite a lot of memory.
A sensible first step would be to limit the number of cubes created to the minimum by calculate the bounding cuboid for all the points and then only using enough cubes to contain the bounding cuboid.
Another improvement would be to divide the problem into larger cubic regions that contain sub-sets of the full set of points, and processing each of these regions separately. Naturally, special care will need to be taken at the boundaries of thes regions.
My thoughts, based on not knowing your actual algorithms:
- Do the scoring in your application code, unless it is very simple
- Keep running information that will help you generate scores quickly
- For example, if you need an average, you can keep the sum and count and calculate the average as needed. (Note, for simple things like average, you could just use the Database)
- Also, take a look at moving averages (http://en.wikipedia.org/wiki/Moving_average) they can allow you to store very little data about the past and still display historically useful information
- I wouldn't worry too much about caching data, rather i would think about keeping the running information and being able to calculate the result quickly. However, if you have a lot of viewers for the same values, you might start caching.
If the total number of data points is 250k, you can do a fair bit of math on that in a very small amount of time.
I am not sure what you are planning to do for the web app architecture, but i would probably write the application you describe as a single page web app with as few server API points to get data as JSON. Then I would do the graphic/display all on the browser in JavaScript using something like D3, Raphael or another chart library.
Best Answer
Just sort the players by team and number them starting from 0, so e.g.
Then reverse the bits in each number to fill out your bracket sheet