Algorithms – Extracting Blob Outer Perimeter/Contour Length

algorithms

I have labelled the connected components in a binary image and found their areas and bounding boxes. The components are not necessarily filled and may contain holes. I wish to identify the component that most resembles a pupil. For this, I would also like to extract (only) their outer perimeter lengths for calculating circularity, since these are good features for pupil detection.
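
For reference, the circularity measure I have in mind is the usual isoperimetric quotient, which is 1 for a perfect disc and smaller for everything else - roughly:

```cpp
// Isoperimetric quotient: 4*pi*area / perimeter^2, equal to 1.0 for a perfect disc.
double circularity(double area, double perimeter)
{
    const double pi = 3.14159265358979323846;
    return 4.0 * pi * area / (perimeter * perimeter);
}
```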

I plan to implement this sequentially first and then move the algorithm to CUDA, so it should be parallelisable to some extent. I should note that this work is for my thesis, and I am not asking you to solve anything for me, just to provide some feedback on my research so far.

I investigated tons of articles for this problem, but it seems most of them are concerned with connected component labelling and not feature extraction. Still, I found three candidates, plus two of my own design:

  1. The Marching Squares algorithm: It sounds promising (and embarrassingly parallel), but without modification it appears to extract all contours, including inner ones, which will likely overestimate perimeter lengths. However, since I am looking for the pupil, a homogeneously colored area, it will likely not overestimate the pupil's perimeter. The overestimation might also yield bad results for other, irregularly shaped blobs, which should be fine as long as they are then not selected.

  2. The Chain Code algorithm (used by OpenCV's findContours function): Seems pretty good as well, and parallel solutions do exist, but I worry it might fail if the stopping criterion is not good enough (see here, at the bottom, near Jacob's stopping criterion). However, it should be able to extract only the outer contour and give good approximations.

  3. Convex Hull algorithms: While parallel solutions exist, I worry that a hull might make a blob appear more circular than it really is if its points are scattered in a way that favors this. It should give good results for the pupil blob, though.

  4. Algorithm 1: Launch threads that trace from each side of the blob's bounding box towards the opposite side. When a thread "hits" a pixel with the blob's label, it marks it as visited and the hit is counted. When the next side is traced, visited pixels are ignored, the new hits are counted, and so on; the total count is returned.

  5. Algorithm 2: I also tried counting the number of pixels that have a background pixel in their Moore neighborhood, but this overestimates the perimeter if enough holes are present (a rough sketch of what I tried is shown below).
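
A sketch of that attempt, assuming label 0 is background and the label image is stored row-major:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Counts pixels of `target` that have at least one non-`target` pixel (background or
// another label) in their 8-neighborhood. Pixels bordering holes are counted too,
// which is exactly the overestimation described above.
std::size_t boundaryPixelCount(const std::vector<int32_t>& labels,
                               int width, int height, int32_t target)
{
    auto at = [&](int x, int y) -> int32_t {
        if (x < 0 || y < 0 || x >= width || y >= height) return 0; // outside = background
        return labels[static_cast<std::size_t>(y) * width + x];
    };

    std::size_t count = 0;
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x) {
            if (at(x, y) != target) continue;
            bool onBoundary = false;
            for (int dy = -1; dy <= 1 && !onBoundary; ++dy)
                for (int dx = -1; dx <= 1; ++dx)
                    if ((dx != 0 || dy != 0) && at(x + dx, y + dy) != target) {
                        onBoundary = true;
                        break;
                    }
            if (onBoundary) ++count;
        }
    return count;
}
```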

I would appreciate some suggestions before I try to code everything, since I am on a schedule. Again, I'm just asking for advice, not solutions.

Best Answer

I assume that you are not pursuing GPU computation just for the sake of it. In other words, you are willing to consider CPU or "traditional" techniques, compare multiple approaches, and finally choose whatever gives the highest performance, rather than being obliged to use the GPU just to have a thesis.


I find that the Chain Code (contour tracing) algorithm is sufficient for the task, provided you smooth the contour coordinates afterwards.
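
To make the discussion below concrete, here is a minimal sketch of the kind of tracer I mean - a Moore-neighbor follower over a label image that stops when the start pixel is re-entered heading towards the same second contour pixel. This is only an illustration (not what findContours does internally, and not production code); it assumes a row-major label image and a starting pixel that is the first pixel of its component in raster order.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct Pt { int x, y; };
inline bool samePt(Pt a, Pt b) { return a.x == b.x && a.y == b.y; }

// 8-neighbor offsets in clockwise order (y grows downwards): W, NW, N, NE, E, SE, S, SW.
static const int DX[8] = { -1, -1,  0,  1, 1, 1, 0, -1 };
static const int DY[8] = {  0, -1, -1, -1, 0, 1, 1,  1 };

// Traces the outer contour of the component `target`. `b0` must be the first pixel of
// that component in raster order, so that its west neighbor does not belong to it.
// Returns the contour pixels in clockwise order.
std::vector<Pt> traceOuterContour(const std::vector<int32_t>& labels,
                                  int width, int height, Pt b0, int32_t target)
{
    auto at = [&](int x, int y) -> int32_t {
        if (x < 0 || y < 0 || x >= width || y >= height) return 0; // outside = background
        return labels[static_cast<std::size_t>(y) * width + x];
    };
    // Scan the 8-neighborhood of `b` clockwise, starting at the background pixel `c`.
    // Returns the first component pixel found; `c` becomes the background pixel that
    // was examined just before it, so the next scan resumes from there.
    auto step = [&](Pt b, Pt& c) -> Pt {
        int start = 0;
        for (int d = 0; d < 8; ++d)
            if (b.x + DX[d] == c.x && b.y + DY[d] == c.y) { start = d; break; }
        Pt prev = c;
        for (int i = 0; i < 8; ++i) {
            const int d = (start + i) % 8;
            const Pt n{ b.x + DX[d], b.y + DY[d] };
            if (at(n.x, n.y) == target) { c = prev; return n; }
            prev = n;
        }
        return b; // isolated pixel: no neighbor belongs to the component
    };

    std::vector<Pt> contour{ b0 };
    Pt c{ b0.x - 1, b0.y };              // west neighbor of the start, known background
    const Pt b1 = step(b0, c);
    if (samePt(b1, b0)) return contour;  // single-pixel component
    Pt b = b1;
    while (true) {
        Pt cNext = c;
        const Pt nb = step(b, cNext);
        if (samePt(b, b0) && samePt(nb, b1)) break; // back at the start, heading the same way
        contour.push_back(b);
        b = nb;
        c = cNext;
    }
    return contour;
}
```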


In terms of concurrency and parallelism

If a collection of starting points for contour tracing is supplied, then contour tracing can be run independently by each agent working on one starting point. This is possible because contour tracing is non-destructive on the image or label matrix - it only reads neighbor values around the current position.
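
For instance, with the tracer sketched above and one starting point per component, each trace can simply be dispatched as an independent task - a sketch using std::async, reusing `Pt` and `traceOuterContour` from the previous snippet:

```cpp
#include <cstdint>
#include <future>
#include <utility>
#include <vector>

// One independent tracing task per starting point. The label image is only read,
// so the tasks need no synchronization.
std::vector<std::vector<Pt>> traceAll(const std::vector<int32_t>& labels,
                                      int width, int height,
                                      const std::vector<std::pair<Pt, int32_t>>& starts)
{
    std::vector<std::future<std::vector<Pt>>> tasks;
    for (const auto& s : starts)
        tasks.push_back(std::async(std::launch::async, traceOuterContour,
                                   std::cref(labels), width, height, s.first, s.second));

    std::vector<std::vector<Pt>> contours;
    for (auto& t : tasks)
        contours.push_back(t.get());
    return contours;
}
```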

However, contour tracing is not data-parallel. In other words, it cannot be vectorized on a SIMD or SPMD machine.

  • Contours can have drastically different lengths. So, some agents will finish early, some will finish late. They will not have coherent flow-control patterns.
  • Also, the memory access patterns will be scattered all over the image. It will also be very wasteful on CPU caches - from each cache line, maybe only up to three pixel values will actually be meaningful for the contour tracing algorithm.

Is there a divide-and-conquer algorithm for contour tracing of a single blob?

I don't know. Please leave a comment if you find one, because I'd like to learn about that too. I haven't done a serious literature search so I may be ignorant. (Of course you should conceal it if it will form the backbone of your thesis.)

On the other hand, I have implemented a tile-based algorithm for connected-component labeling, based on labeling each tile independently, following up with a "seam stitching" process, and finishing off with a final label assignment. It is not data-parallel (not SIMD/SPMD), but it is highly parallelizable - the image can be divided into hundreds or thousands of tiles.
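
To give an idea of the structure only (this is a sketch, not my actual implementation): once each tile has been labeled with globally unique label values by some sequential labeler, the seam stitching boils down to a union-find pass along the tile borders, followed by the final label assignment.

```cpp
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Minimal union-find used to merge labels that meet across tile seams.
// It must be constructed large enough to index every label value in use.
struct DSU {
    std::vector<int32_t> parent;
    explicit DSU(std::size_t n) : parent(n) { std::iota(parent.begin(), parent.end(), 0); }
    int32_t find(int32_t a) { while (parent[a] != a) a = parent[a] = parent[parent[a]]; return a; }
    void unite(int32_t a, int32_t b) { parent[find(a)] = find(b); }
};

// `labels` already holds per-tile labels that are globally unique (each tile writes into
// its own disjoint label range); 0 is background. Only horizontal seams are stitched
// here; vertical seams are handled the same way with x and y swapped.
void stitchHorizontalSeams(std::vector<int32_t>& labels, int width, int height,
                           int tileHeight, DSU& dsu)
{
    for (int y = tileHeight; y < height; y += tileHeight)        // first row below each seam
        for (int x = 0; x < width; ++x) {
            int32_t below = labels[static_cast<std::size_t>(y) * width + x];
            if (below == 0) continue;
            for (int dx = -1; dx <= 1; ++dx) {                   // 8-connectivity across the seam
                int nx = x + dx;
                if (nx < 0 || nx >= width) continue;
                int32_t above = labels[static_cast<std::size_t>(y - 1) * width + nx];
                if (above != 0) dsu.unite(below, above);
            }
        }
    // Final label assignment: replace every label by its representative.
    for (int32_t& l : labels)
        if (l != 0) l = dsu.find(l);
}
```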


Regarding performance.

In a previous project where I used contour tracing, I was pleasantly surprised to find that contour tracing is, in general, much faster than connected-component labeling for the type of images I processed, because contour tracing does not perform nearly as many memory operations.

Note that, in general, contour tracing and connected-component labeling aren't direct substitutes for each other - they give outputs in different representations, despite the outputs corresponding to the same blob. You may have to run both: label the whole image first, sample the "contour starting points", and then trace out the contour of each component.
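
As an example of sampling the starting points: the first pixel of each label in raster order works well, because no pixel of the same label can lie above it or to its left on the same row - which is exactly the precondition of the tracer sketched earlier (a sketch, reusing `Pt` from above; label 0 is assumed to be background):

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// First pixel of every non-background label in raster order. Each such pixel is a
// valid starting point for outer-contour tracing.
std::unordered_map<int32_t, Pt> contourStartingPoints(const std::vector<int32_t>& labels,
                                                      int width, int height)
{
    std::unordered_map<int32_t, Pt> starts;
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x) {
            int32_t l = labels[static_cast<std::size_t>(y) * width + x];
            if (l != 0 && starts.find(l) == starts.end())
                starts.emplace(l, Pt{ x, y });
        }
    return starts;
}
```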


Finding nested connected components.

There is an enhanced contour tracing algorithm which resolves the nesting relationships between connected components.

  • Given the contour of the outer connected component,
  • Convert the chain code sequence into a sequence of markers.
    • There are two types of markers:
      1. Marks the leftmost edge (begin X-position) of a horizontal run of pixels
      2. Marks the rightmost edge (end X-position) of a horizontal run of pixels
        • Which of the two is generated depends on the orientation of the current chain code element.
  • Sort the marker sequence by vertical Y-position (major), and then by X-position (minor).
  • The sorted marker sequence then becomes a run-length raster descriptor of the blob surrounded by the contour.
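
As a small illustration of the last two steps only (generating the markers from the chain code is deliberately left out, since that is where the orientation-dependent cases live):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct Marker { int y, x; bool isLeftEdge; };  // begin / end X-position of a horizontal run
struct Run    { int y, xBegin, xEnd; };        // inclusive horizontal run of blob pixels

// Sort the markers by Y (major) and X (minor), then pair them up into runs. Assumes the
// marker sequence is well formed, i.e. left and right markers alternate within each row.
std::vector<Run> markersToRuns(std::vector<Marker> markers)
{
    std::sort(markers.begin(), markers.end(), [](const Marker& a, const Marker& b) {
        return a.y != b.y ? a.y < b.y : a.x < b.x;
    });
    std::vector<Run> runs;
    for (std::size_t i = 0; i + 1 < markers.size(); i += 2)
        runs.push_back(Run{ markers[i].y, markers[i].x, markers[i + 1].x });
    return runs;
}
```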

Smoothing the contour coordinates into floating point.

Smoothing the contour coordinates changes them from discrete (integer) positions to floating point. After that, you will find that summing the pairwise Euclidean distances is good enough for most purposes.

The exact details of the smoothing aren't important - for example, you can use Gaussian smoothing, applied to the sequences of contour X and Y coordinates just like a 1D convolution. Keep in mind the circular (periodic / wraparound) nature of the sequence.
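
A concrete sketch of what I mean, reusing the `Pt` contour points from the tracer above - a circular 1D Gaussian over the X and Y sequences, then the perimeter as the sum of pairwise distances:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Smooths the closed contour with a circular 1D Gaussian and returns the perimeter as
// the sum of Euclidean distances between consecutive smoothed points (including the
// closing segment back to the first point).
double smoothedPerimeter(const std::vector<Pt>& contour, double sigma)
{
    const int n = static_cast<int>(contour.size());
    if (n < 2) return 0.0;

    // Normalized 1D Gaussian kernel with radius ~3*sigma.
    const int radius = std::max(1, static_cast<int>(std::ceil(3.0 * sigma)));
    std::vector<double> kernel(2 * radius + 1);
    double norm = 0.0;
    for (int k = -radius; k <= radius; ++k)
        norm += kernel[k + radius] = std::exp(-0.5 * k * k / (sigma * sigma));
    for (double& w : kernel) w /= norm;

    // Convolve the X and Y coordinate sequences, wrapping around (the contour is periodic).
    std::vector<double> xs(n), ys(n);
    for (int i = 0; i < n; ++i) {
        double sx = 0.0, sy = 0.0;
        for (int k = -radius; k <= radius; ++k) {
            int j = (i + k) % n;
            if (j < 0) j += n;
            sx += kernel[k + radius] * contour[j].x;
            sy += kernel[k + radius] * contour[j].y;
        }
        xs[i] = sx;
        ys[i] = sy;
    }

    // Perimeter of the smoothed closed polygon.
    double perimeter = 0.0;
    for (int i = 0; i < n; ++i) {
        const int j = (i + 1) % n;
        perimeter += std::hypot(xs[j] - xs[i], ys[j] - ys[i]);
    }
    return perimeter;
}
```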
