I am working on a project where I have to detect a known picture in a scene in "real time" in a mobile context (that means I'm capturing frames using a smartphone camera and resizing each frame to 150×225). The picture itself can be rather complex. Right now I'm processing each frame in about 1.2 s on average (using OpenCV). I'm looking for ways to improve both the processing time and the overall accuracy. My current implementation works as follows:
- Capture the frame
- Convert it to grayscale
- Detect the keypoints and extract their descriptors using ORB
- Match the descriptors (2-NN, object → scene) and filter them with the ratio test
- Match the descriptors (2-NN, scene → object) and filter them with the ratio test
- Remove non-symmetrical matches, i.e. keep only pairs found in both directions by the two previous steps
- Compute the matching confidence (percentage of matched keypoints out of the total keypoints)
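The filtering steps above can be sketched in plain Python, using simple tuples instead of `cv2.DMatch` objects. The function names and the 0.75 ratio threshold are assumptions for illustration, not values from the post:

```python
# Sketch of the ratio test, symmetry filter, and confidence score described
# above. Each 2-NN result is a tuple (query_idx, best_train_idx, best_dist,
# second_dist); real code would read these fields from cv2.DMatch pairs.

RATIO = 0.75  # typical Lowe's ratio threshold (assumed, not from the post)

def ratio_filter(knn_matches):
    """Keep a match only if its best distance is clearly below the second best."""
    return {q: t for (q, t, d1, d2) in knn_matches if d1 < RATIO * d2}

def symmetric_matches(obj_to_scene, scene_to_obj):
    """Keep only pairs matched consistently in both directions."""
    return [(q, t) for q, t in obj_to_scene.items()
            if scene_to_obj.get(t) == q]

def matching_confidence(symmetric, total_object_keypoints):
    """Percentage of object keypoints that survived all the filters."""
    return 100.0 * len(symmetric) / total_object_keypoints

# Tiny worked example: object keypoints 0 and 1 match scene keypoints 5 and 7
# symmetrically; object keypoint 2 fails the ratio test (30 is not < 0.75*32).
obj2scene = ratio_filter([(0, 5, 10.0, 40.0), (1, 7, 12.0, 50.0), (2, 9, 30.0, 32.0)])
scene2obj = ratio_filter([(5, 0, 10.0, 40.0), (7, 1, 12.0, 50.0)])
good = symmetric_matches(obj2scene, scene2obj)
print(good, matching_confidence(good, 3))
```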
My approach might not be the right one, but the results are OK even though there's a lot of room for improvement. I already noticed that SURF extraction is too slow, and I couldn't get homography estimation to work (it might be related to ORB). All suggestions are welcome!
Best Answer
Performance is always an issue on mobiles :)
There are a few things you can do. The post OpenCV: C++ and C performance comparison covers generic ways to improve processing time.
And some specifics for your project:
EDIT
Brad Larsen's question is illuminating: if the matcher alone takes 900 ms, then that's the problem! Check this post by Andrey Kamaev, How Does OpenCV ORB Feature Detector Work?, where he explains the possible combinations between descriptors and matchers. Try the FLANN-based uchar matcher.
And also, I suppose you get an awful lot of detections (hundreds or thousands) if matching takes that long. Try to limit the number of detections, or keep only the n strongest keypoints.