Java – OPTICS Clustering algorithm. How to get the best epsilon

Tags: algorithm, cluster-analysis, data-mining, java, optics-algorithm

I am implementing a project which needs to cluster geographical points. The OPTICS algorithm seems to be a very good fit. It needs just two input parameters (MinPts and Epsilon): respectively, the minimum number of points needed to form a cluster, and the distance threshold used to decide whether two points can be placed in the same cluster.
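Because the points are geographical, the epsilon-neighborhood is usually measured with a great-circle distance rather than a flat one. Below is a minimal sketch of what the two parameters do, assuming (lat, lon) coordinates in degrees, the haversine formula with a mean Earth radius of 6371 km, and the common convention that a point counts toward its own neighborhood. The class, method names and coordinates are hypothetical and not taken from javaml or any other library.

```java
import java.util.ArrayList;
import java.util.List;

public class GeoNeighborhood {
    // Mean Earth radius in kilometres (a common approximation, assumed here).
    static final double EARTH_RADIUS_KM = 6371.0;

    // Great-circle distance between two (lat, lon) points in degrees, via the haversine formula.
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double phi1 = Math.toRadians(lat1);
        double phi2 = Math.toRadians(lat2);
        double dPhi = Math.toRadians(lat2 - lat1);
        double dLambda = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dPhi / 2) * Math.sin(dPhi / 2)
                 + Math.cos(phi1) * Math.cos(phi2) * Math.sin(dLambda / 2) * Math.sin(dLambda / 2);
        return 2 * EARTH_RADIUS_KM * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
    }

    // Indices of all points within epsKm of point i (point i itself is included).
    static List<Integer> neighbors(double[][] pts, int i, double epsKm) {
        List<Integer> result = new ArrayList<>();
        for (int j = 0; j < pts.length; j++) {
            if (haversineKm(pts[i][0], pts[i][1], pts[j][0], pts[j][1]) <= epsKm) {
                result.add(j);
            }
        }
        return result;
    }

    // A point is a "core" point if its epsilon-neighborhood holds at least minPts points.
    static boolean isCore(double[][] pts, int i, double epsKm, int minPts) {
        return neighbors(pts, i, epsKm).size() >= minPts;
    }

    public static void main(String[] args) {
        // Hypothetical coordinates: three points a few hundred metres apart, one far away.
        double[][] pts = {
            {45.0000, 9.0000}, {45.0010, 9.0010}, {45.0020, 9.0000}, {45.5000, 9.5000}
        };
        System.out.println(isCore(pts, 0, 1.0, 2)); // true: two other points lie within 1 km
        System.out.println(isCore(pts, 3, 1.0, 2)); // false: the nearest point is tens of km away
    }
}
```

The linear scan in neighbors() is O(n) per query; a spatial index would replace it in a real implementation.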

My problem is that, because the spacing of the points varies so widely, I can't set a fixed epsilon.
Just look at the image below.

[Image: the problem]

The same point structure at a different scale produces very different results. Suppose we set MinPts = 2 and epsilon = 1 km.
On the left, the algorithm would create 2 clusters (red and blue), but on the right it would create a single cluster containing all of the points (red), whereas I would like to obtain 2 clusters on the right as well.
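The scale effect can be reproduced with a tiny example. This sketch assumes planar coordinates in km and uses hypothetical points: two tight groups 2.5 km apart, then the same layout shrunk by a factor of 4. It counts connected components of the "within epsilon" graph, which is what DBSCAN with MinPts = 2 produces when no point is isolated; it is an illustration of the problem, not an OPTICS implementation.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class ScaleDemo {
    // Number of connected components when two points are linked if they are within eps.
    // With MinPts = 2 and no isolated points this equals the number of DBSCAN clusters.
    static int countClusters(double[][] pts, double eps) {
        int n = pts.length;
        boolean[] visited = new boolean[n];
        int clusters = 0;
        for (int start = 0; start < n; start++) {
            if (visited[start]) continue;
            clusters++;
            Deque<Integer> stack = new ArrayDeque<>();
            stack.push(start);
            visited[start] = true;
            while (!stack.isEmpty()) {
                int p = stack.pop();
                for (int q = 0; q < n; q++) {
                    double d = Math.hypot(pts[p][0] - pts[q][0], pts[p][1] - pts[q][1]);
                    if (!visited[q] && d <= eps) {
                        visited[q] = true;
                        stack.push(q);
                    }
                }
            }
        }
        return clusters;
    }

    // Same shape, different scale: multiply every coordinate by factor.
    static double[][] scale(double[][] pts, double factor) {
        double[][] out = new double[pts.length][2];
        for (int i = 0; i < pts.length; i++) {
            out[i][0] = pts[i][0] * factor;
            out[i][1] = pts[i][1] * factor;
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical layout in km: two groups whose closest points are 2.5 km apart.
        double[][] left = {{0, 0}, {0.5, 0}, {0, 0.5}, {3, 0}, {3.5, 0}, {3, 0.5}};
        double[][] right = scale(left, 0.25);
        System.out.println(countClusters(left, 1.0));  // 2
        System.out.println(countClusters(right, 1.0)); // 1: the gap shrinks to 0.625 km
    }
}
```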

So my question is: is there any way to compute the epsilon value dynamically to get this result?

EDIT 05 June 2012 3.15pm:
I thought I was using the OPTICS implementation from the javaml library, but it seems to actually be a DBSCAN implementation.
So the question now is: does anybody know of a Java implementation of the OPTICS algorithm?

Thank you very much, and excuse me for my poor English.

Marco

Best Answer

The epsilon value in OPTICS is solely to limit the runtime complexity when using index structures. If you do not have an index for acceleration, you can set it to infinity.
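To make the "set it to infinity" point concrete, here is a brute-force OPTICS sketch with no index, so epsilon can simply be Double.POSITIVE_INFINITY. It assumes planar coordinates, counts a point in its own neighborhood when computing the core distance, and replaces the usual priority queue with a linear scan for the next seed (O(n²) overall). Class and field names are hypothetical; this is an illustration, not ELKI's or any library's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class OpticsSketch {
    final double[][] pts;     // planar (x, y) coordinates, e.g. km on a local projection
    final int minPts;
    final double eps;         // may be Double.POSITIVE_INFINITY when no index is used
    final double[] reach;     // reachability per point; POSITIVE_INFINITY means "undefined"
    final boolean[] processed;
    final List<Integer> order = new ArrayList<>(); // the cluster ordering OPTICS produces

    OpticsSketch(double[][] pts, int minPts, double eps) {
        this.pts = pts;
        this.minPts = minPts;
        this.eps = eps;
        this.reach = new double[pts.length];
        Arrays.fill(reach, Double.POSITIVE_INFINITY);
        this.processed = new boolean[pts.length];
    }

    double dist(int a, int b) {
        return Math.hypot(pts[a][0] - pts[b][0], pts[a][1] - pts[b][1]);
    }

    // minPts-th smallest distance within eps (the point itself counts, at distance 0);
    // infinity if p is not a core point.
    double coreDistance(int p) {
        double[] within = new double[pts.length];
        int count = 0;
        for (int q = 0; q < pts.length; q++) {
            double d = dist(p, q);
            if (d <= eps) within[count++] = d;
        }
        if (count < minPts) return Double.POSITIVE_INFINITY;
        Arrays.sort(within, 0, count);
        return within[minPts - 1];
    }

    void run() {
        for (int p = 0; p < pts.length; p++) {
            int current = processed[p] ? -1 : p;
            while (current != -1) {
                processed[current] = true;
                order.add(current);
                double core = coreDistance(current);
                if (core < Double.POSITIVE_INFINITY) {
                    // Lower the reachability of unprocessed neighbors (the "seeds").
                    for (int o = 0; o < pts.length; o++) {
                        double d = dist(current, o);
                        if (!processed[o] && d <= eps) reach[o] = Math.min(reach[o], Math.max(core, d));
                    }
                }
                // Next point: the unprocessed seed with the smallest reachability.
                current = -1;
                double best = Double.POSITIVE_INFINITY;
                for (int o = 0; o < pts.length; o++) {
                    if (!processed[o] && reach[o] < best) {
                        best = reach[o];
                        current = o;
                    }
                }
            }
        }
    }

    public static void main(String[] args) {
        // Hypothetical points in km: two tight groups 2.5 km apart.
        double[][] pts = {{0, 0}, {0.5, 0}, {0, 0.5}, {3, 0}, {3.5, 0}, {3, 0.5}};
        OpticsSketch optics = new OpticsSketch(pts, 2, Double.POSITIVE_INFINITY);
        optics.run();
        for (int p : optics.order) {
            System.out.printf("point %d  reachability %s%n", p, optics.reach[p]);
        }
    }
}
```

For these points the ordering is 0 to 5, and the only large reachability value (2.5, at point 3) marks the gap between the two groups. Scaling every coordinate by a positive factor scales every reachability value by the same factor, so that jump stays visible at any scale; no epsilon had to be chosen.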

To quote Wikipedia on OPTICS:

The parameter ε is strictly speaking not necessary. It can be set to a maximum value. When a spatial index is available, it does however play a practical role when it comes to complexity.

What you describe looks much more like DBSCAN than OPTICS. In OPTICS you should not need to choose epsilon (the authors should have called it max-epsilon!); your cluster extraction method takes care of that. Are you using the Xi extraction proposed in the OPTICS paper?
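For comparison, the OPTICS paper also describes a simpler extraction that cuts the reachability plot at a fixed threshold (its ExtractDBSCAN-Clustering procedure). It is sketched below, assuming the reachability and core distances are already listed in OPTICS order; the arrays are hypothetical values and the names are illustrative. Note this is not Xi: the threshold is still an absolute distance (and must not exceed the epsilon used for the OPTICS run), whereas Xi looks for relative steep drops and rises in the plot, which is what makes it suit points at different scales. The advantage here is only that one OPTICS run can be cut at several thresholds without re-running.

```java
import java.util.Arrays;

public class ThresholdExtraction {
    static final int NOISE = -1;

    // reach[] and core[] are given in OPTICS order; returns one label per position (NOISE = -1).
    static int[] extract(double[] reach, double[] core, double threshold) {
        int[] labels = new int[reach.length];
        int clusterId = NOISE;
        int nextId = 0;
        for (int i = 0; i < reach.length; i++) {
            if (reach[i] > threshold) {
                if (core[i] <= threshold) {
                    // A reachability jump followed by a core point starts a new cluster.
                    clusterId = nextId++;
                    labels[i] = clusterId;
                } else {
                    labels[i] = NOISE;
                }
            } else {
                // Reachable below the threshold: stays in the current cluster.
                labels[i] = clusterId;
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        // Hypothetical OPTICS output in cluster order; infinity means "undefined".
        double inf = Double.POSITIVE_INFINITY;
        double[] reach = {inf, 0.5, 0.5, 2.5, 0.5, 0.5};
        double[] core = {0.5, 0.5, 0.5, 0.5, 0.5, 0.5};
        System.out.println(Arrays.toString(extract(reach, core, 1.0))); // [0, 0, 0, 1, 1, 1]
        System.out.println(Arrays.toString(extract(reach, core, 3.0))); // [0, 0, 0, 0, 0, 0]
    }
}
```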

minPts is much more important. You should try a value of at least 5 or 10, not 2. With 2, you are essentially performing single-linkage clustering!
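The single-linkage remark can be checked directly. With epsilon at infinity, the core distance is the minPts-th smallest distance from a point (the point itself counting at distance 0), so for minPts = 2 it is just the nearest-neighbor distance. Since d(p, o) is never smaller than that for any other point o, the reachability max(core(p), d(p, o)) collapses to the plain distance, and OPTICS then expands points essentially in the order of Prim's minimum-spanning-tree algorithm, which is the structure single-linkage clustering is built on. A small sketch under those assumptions (planar coordinates; the points and names are hypothetical):

```java
import java.util.Arrays;

public class MinPtsDemo {
    static double dist(double[] a, double[] b) {
        return Math.hypot(a[0] - b[0], a[1] - b[1]);
    }

    // Core distance with eps = infinity: minPts-th smallest distance from p, p itself included.
    static double coreDistance(double[][] pts, int p, int minPts) {
        double[] d = new double[pts.length];
        for (int q = 0; q < pts.length; q++) d[q] = dist(pts[p], pts[q]);
        Arrays.sort(d);
        return d[minPts - 1];
    }

    // True if every reachability max(core(p), d(p, o)) equals the plain distance d(p, o).
    static boolean reachabilityEqualsDistance(double[][] pts, int minPts) {
        for (int p = 0; p < pts.length; p++) {
            double core = coreDistance(pts, p, minPts);
            for (int o = 0; o < pts.length; o++) {
                if (o == p) continue;
                double d = dist(pts[p], pts[o]);
                if (Math.max(core, d) != d) return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        double[][] pts = {{0, 0}, {0.5, 0}, {0, 0.5}, {3, 0}, {3.5, 0}, {3, 0.5}};
        System.out.println(reachabilityEqualsDistance(pts, 2)); // true: plain distances, single-linkage
        System.out.println(reachabilityEqualsDistance(pts, 5)); // false: core distance dominates
    }
}
```

With a larger minPts the core distance acts as a density-based floor on reachability, so a single stray point between two groups can no longer chain them together the way it does with minPts = 2.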

The example you gave above should work fine once you increase minPts!

Re: edit: As you can see in the Wikipedia article, ELKI has a proper OPTICS implementation, and it is written in Java.