I'm trying to get a depth map with an uncalibrated method.
I can obtain the fundamental matrix by finding correspondent points with SIFT and then using cv2.findFundamentalMat
. I then use cv2.stereoRectifyUncalibrated
to get the homography matrices for each image. Finally I use cv2.warpPerspective
to rectify and compute the disparity, but this doesn't create a good depth map. The values are very high so I'm wondering if I have to use warpPerspective
or if I have to calculate a rotation matrix from the homography matrices I got with stereoRectifyUncalibrated
.
I'm not sure of the projective matrix with the case of homography matrix obtained with the stereoRectifyUncalibrated
to rectify.
A part of the code:
#Obtainment of the correspondent point with SIFT
sift = cv2.SIFT()
###find the keypoints and descriptors with SIFT
kp1, des1 = sift.detectAndCompute(dst1,None)
kp2, des2 = sift.detectAndCompute(dst2,None)
###FLANN parameters
FLANN_INDEX_KDTREE = 0
index_params = dict(algorithm = FLANN_INDEX_KDTREE, trees = 5)
search_params = dict(checks=50)
flann = cv2.FlannBasedMatcher(index_params,search_params)
matches = flann.knnMatch(des1,des2,k=2)
good = []
pts1 = []
pts2 = []
###ratio test as per Lowe's paper
for i,(m,n) in enumerate(matches):
if m.distance < 0.8*n.distance:
good.append(m)
pts2.append(kp2[m.trainIdx].pt)
pts1.append(kp1[m.queryIdx].pt)
pts1 = np.array(pts1)
pts2 = np.array(pts2)
#Computation of the fundamental matrix
F,mask= cv2.findFundamentalMat(pts1,pts2,cv2.FM_LMEDS)
# Obtainment of the rectification matrix and use of the warpPerspective to transform them...
pts1 = pts1[:,:][mask.ravel()==1]
pts2 = pts2[:,:][mask.ravel()==1]
pts1 = np.int32(pts1)
pts2 = np.int32(pts2)
p1fNew = pts1.reshape((pts1.shape[0] * 2, 1))
p2fNew = pts2.reshape((pts2.shape[0] * 2, 1))
retBool ,rectmat1, rectmat2 = cv2.stereoRectifyUncalibrated(p1fNew,p2fNew,F,(2048,2048))
dst11 = cv2.warpPerspective(dst1,rectmat1,(2048,2048))
dst22 = cv2.warpPerspective(dst2,rectmat2,(2048,2048))
#calculation of the disparity
stereo = cv2.StereoBM(cv2.STEREO_BM_BASIC_PRESET,ndisparities=16*10, SADWindowSize=9)
disp = stereo.compute(dst22.astype(uint8), dst11.astype(uint8)).astype(np.float32)
plt.imshow(disp);plt.colorbar();plt.clim(0,400)#;plt.show()
plt.savefig("0gauche.png")
#plot depth by using disparity focal length `C1[0,0]` from stereo calibration and `T[0]` the distance between cameras
plt.imshow(C1[0,0]*T[0]/(disp),cmap='hot');plt.clim(-0,500);plt.colorbar();plt.show()
Here are the rectified pictures with the uncalibrated method (and warpPerspective
):
Here are the rectified pictures with the calibrated method:
I don't know how the difference is so important between the two kind of pictures. And for the calibrated method, it doesn't seem aligned.
The disparity map using the uncalibrated method:
The depths are calculated with : C1[0,0]*T[0]/(disp)
with T from the stereoCalibrate
. The values are very high.
———— EDIT LATER ————
I tried to "mount" the reconstruction matrix ([Devernay97], [Garcia01]) with the homography matrix obtained with "stereoRectifyUncalibrated", but the result is still not good. Am I doing this correctly?
Y=np.arange(0,2048)
X=np.arange(0,2048)
(XX_field,YY_field)=np.meshgrid(X,Y)
#I mount the X, Y and disparity in a same 3D array
stock = np.concatenate((np.expand_dims(XX_field,2),np.expand_dims(YY_field,2)),axis=2)
XY_disp = np.concatenate((stock,np.expand_dims(disp,2)),axis=2)
XY_disp_reshape = XY_disp.reshape(XY_disp.shape[0]*XY_disp.shape[1],3)
Ts = np.hstack((np.zeros((3,3)),T_0)) #i use only the translations obtained with the rectified calibration...Is it correct?
# I establish the projective matrix with the homography matrix
P11 = np.dot(rectmat1,C1)
P1 = np.vstack((np.hstack((P11,np.zeros((3,1)))),np.zeros((1,4))))
P1[3,3] = 1
# P1 = np.dot(C1,np.hstack((np.identity(3),np.zeros((3,1)))))
P22 = np.dot(np.dot(rectmat2,C2),Ts)
P2 = np.vstack((P22,np.zeros((1,4))))
P2[3,3] = 1
lambda_t = cv2.norm(P1[0,:].T)/cv2.norm(P2[0,:].T)
#I define the reconstruction matrix
Q = np.zeros((4,4))
Q[0,:] = P1[0,:].T
Q[1,:] = P1[1,:].T
Q[2,:] = lambda_t*P2[1,:].T - P1[1,:].T
Q[3,:] = P1[2,:].T
#I do the calculation to get my 3D coordinates
test = []
for i in range(0,XY_disp_reshape.shape[0]):
a = np.dot(inv(Q),np.expand_dims(np.concatenate((XY_disp_reshape[i,:],np.ones((1))),axis=0),axis=1))
test.append(a)
test = np.asarray(test)
XYZ = test[:,:,0].reshape(XY_disp.shape[0],XY_disp.shape[1],4)
Best Answer
TLDR; Use StereoSGBM (Semi Global Block Matching) for images with smoother edges and use some post filtering if you want it even smoother
OP didn't provide original images, so I'm using
Tsukuba
from the Middlebury data set.Result with regular StereoBM
Result with StereoSGBM (tuned)
Best result I could find in literature
See the publication here for details.
Example of post filtering (see link below)
Theory/Other considerations from OP's question
The large black areas of your calibrated rectified images would lead me to believe that for those, calibration was not done very well. There's a variety of reasons that could be at play, maybe the physical setup, maybe lighting when you did calibration, etc., but there are plenty of camera calibration tutorials out there for that and my understanding is that you are asking for a way to get a better depth map from an uncalibrated setup (this isn't 100% clear, but the title seems to support this and I think that's what people will come here to try to find).
Your basic approach is correct, but the results can definitely be improved. This form of depth mapping is not among those that produce the highest quality maps (especially being uncalibrated). The biggest improvement will likely come from using a different stereo matching algorithm. The lighting may also be having a significant effect. The right image (at least to my naked eye) appears to be less well lit which could interfere with the reconstruction. You could first try brightening it to the same level as the other, or gather new images if that is possible. From here out, I'll assume you have no access to the original cameras, so I'll consider gathering new images, altering the setup, or performing calibration to be out of scope. (If you do have access to the setup and cameras, then I would suggest checking calibration and using a calibrated method as this will work better).
You used
StereoBM
for calculating your disparity (depth map) which does work, butStereoSGBM
is much better suited for this application (it handles smoother edges better). You can see the difference below.This article explains the differences in more depth:
Without any explicit intrinsic camera parameters, specifics about the camera setup (like focal distance, distance between the cameras, distance to the subject, etc.), a known dimension in the image, or motion (to use structure from motion), you can only obtain 3D reconstruction up to a projective transform; you won't have a sense of scale or necessarily rotation either, but you can still generate a relative depth map. You will likely suffer from some barrel and other distortions which could be removed with proper camera calibration, but you can get reasonable results without it as long as the cameras aren’t terrible (lens system isn't too distorted) and are set up pretty close to canonical configuration (which basically means they are oriented such that their optical axes are as close to parallel as possible, and their fields of view overlap sufficiently). This doesn't however appear to be the OPs issue as he did manage to get alright rectified images with the uncalibrated method.
Basic Procedure
findFundamentalMat
stereoRectifyUncalibrated
andwarpPerspective
StereoSGBM
The results are much better:
Matches with ORB and FLANN
Undistorted images (left, then right)
Disparity
StereoBM
This result looks similar to the OPs problems (speckling, gaps, wrong depths in some areas).
StereoSGBM (tuned)
This result looks much better and uses roughly the same method as the OP, minus the final disparity calculation, making me think the OP would see similar improvements on his images, had they been provided.
Post filtering
There's a good article about this in the OpenCV docs. I'd recommend looking at it if you need really smooth maps.
The example photos above are frame 1 from the scene
ambush_2
in the MPI Sintel Dataset.Full code (Tested on OpenCV 4.2.0):