Opencv – Detect banana or apple among the bunch of fruits on a plate with > 90% success rate. (See image)

computer visionimage processingopencv

enter image description here enter image description here

Hi! I'm kinda new to OpenCV and Image processing. I've tried following approaches until now, but I believe there's gotta be a better approach.

1). Finding color range (HSV) manually using GColor2/Gimp tool/trackbar manually from a reference image which contains a single fruit (banana)
with a white background. Then I used inRange(), findContour(),
drawContour() on both reference banana image & target
image(fruit-platter) and matchShapes() to compare the contours in the
end.

It works fine as long as the color range chosen is appropriate. (See 2nd image). But since these fruits doesn’t have uniform solid color, this approach didn't seem like an ideal approach to me. I don't want to hard-code the color-range (Scalar values) inside inRange().

2). Manual thresholding and contour matching.

Same issue as (1). Don't wanna hard-code the threshold value.

3). OTSU thresholding and canny edge detection.

Doesn't work well for banana, apple and lemon.

4). Dynamically finding colors. I used the cropped banana reference
image. Calculated the mean & standard deviation of the image.

Don't know how to ignore the white background pixels in my mean/std-dev calculation without looping through each x,y pixels. Any suggestions on this are welcome.

5). Haar Cascade training gives inaccurate results. (See the image below). I believe proper training might give better results. But not interested in this for now.

enter image description here

Other approaches I’m considering:

6). Using floodfill to find all the connected pixels and
calculating
the average and standard deviation of the same.

Haven't been successful in this. Not sure how to get all the connected pixels. I dumped the mask (imwrite) and got the banana (from the reference banana image) in black & white form. Any suggestions on this are welcome.

7). Hist backprojection:- not sure how it would help me.

8). K-Means , not tried yet. Let me know, if it’s better than step
(4).

9). meanshift/camshift → not sure whether it will help. Suggestions are welcome.

10). feature detection — SIFT/SURF — not tried yet.

Any help, tips, or suggestions will be highly appreciated.

Best Answer

Answers to such generic questions (object detection), especially to ones like this that are very active research topics, essentially boil down to a matter of preference. That said, of the 10 "approaches" you mentioned, feature detection/extraction is probably the one deserving the most attention, as it's the fundamental building block of a variety of computer vision problems, including but not limited to object recognition/detection.

A very simple but effective approach you can try is the Bag-of-Words model, very commonly used in early attempts at fast object detection, with all global spatial relationship information lost.

Late object detection research trend from what I observed from annual computer vision conference proceedings is that you encode each object by a graph that store feature descriptors in the nodes and store the spatial relationship information in the edges, so part of the global information is preserved, as we can now match not only the distance of feature descriptors in feature space but also the spatial distance between them in image space.

One common pitfall specific to this problem you described is that the homogeneous texture on banana and apple skins may not warrant a healthy distribution of features and most features you detect will be on the intersections of (most commonly) 3 or more objects, which in itself isn't a commonly regarded "good" feature. For this reason I suggest looking into superpixel object recognition (Just Google it. Seriously.) approaches, so the mathematical model of class "Apple" or "Banana" will be a block of interconnecting superpixels, stored in a graph, with each edge storing spatial relationship information and each node storing information concerning the color distribution etc. of the neighborhood specified by the superpixel. Then recognition will be come a (partial) graph matching problem or a problem related to probabilistic graphical model with many existing research done w.r.t it.

Related Topic