It is not possible to give an unambiguous solution to this problem. However, here's how I would extract the different solutions:
1) Solve for the camera position and direction using the P3P (Perspective-Three-Point) algorithm from the original RANSAC paper, which gives up to four feasible solutions (with the points in front of the camera).
2) Project a ray with the camera position as origin having (X,Y) as projection in the camera and calculate its intersection with the plane.
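Step 2 above, the ray-plane intersection, can be sketched as follows. This is a generic sketch with illustrative names (not code from the answer), assuming NumPy is available and the plane is given as a point and a normal:

```python
import numpy as np

def ray_plane_intersection(origin, direction, plane_point, plane_normal):
    """Return the point where origin + t*direction meets the plane, or None."""
    origin = np.asarray(origin, dtype=float)
    direction = np.asarray(direction, dtype=float)
    # The intersection satisfies dot(normal, origin + t*direction - plane_point) = 0.
    denom = np.dot(plane_normal, direction)
    if abs(denom) < 1e-12:
        return None  # ray is (nearly) parallel to the plane
    t = np.dot(plane_normal, np.asarray(plane_point, dtype=float) - origin) / denom
    return origin + t * direction
```

For example, a ray from the origin along +z hits the plane z = 5 at (0, 0, 5).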
This gives you two sets, each of three equations in 3 variables:
a*x0+b*y0+c*z0 = x0'
a*x1+b*y1+c*z1 = x1'
a*x2+b*y2+c*z2 = x2'
d*x0+e*y0+f*z0 = y0'
d*x1+e*y1+f*z1 = y1'
d*x2+e*y2+f*z2 = y2'
Just use whatever method of solving simultaneous equations is easiest in your situation (it isn't even hard to solve these "by hand"). Then your transformation matrix is just ((a,b,c)(d,e,f)).
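For instance, with NumPy the two 3x3 systems can be solved directly; the point values below are made up purely for illustration:

```python
import numpy as np

# Rows are the three 3D points (x_i, y_i, z_i); values are illustrative.
P = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
xp = np.array([0.7, -0.1, 0.2])  # observed x0', x1', x2'
yp = np.array([0.1, 0.8, -0.3])  # observed y0', y1', y2'

abc = np.linalg.solve(P, xp)   # solves the first set for (a, b, c)
def_ = np.linalg.solve(P, yp)  # solves the second set for (d, e, f)
T = np.vstack([abc, def_])     # the 2x3 matrix ((a, b, c), (d, e, f))
```

Each `solve` call handles one set of three equations; stacking the two solutions gives the 2x3 transformation matrix from the text.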
...
Actually, that is over-simplified and assumes a camera pointed at the origin of your 3D coordinate system and no perspective.
For perspective, the transformation matrix works more like:
( a, b, c, d ) ( x )   ( xt )
( e, f, g, h ) ( y ) = ( yt )
( i, j, k, l ) ( z )   ( zt )
               ( 1 )
( xv, yv ) = ( xc+s*xt/zt, yc+s*yt/zt ) if md < zt;
but the 3x4 matrix is more constrained than 12 degrees of freedom, since we should have
a*a+b*b+c*c = e*e+f*f+g*g = i*i+j*j+k*k = 1
a*a+e*e+i*i = b*b+f*f+j*j = c*c+g*g+k*k = 1
So you should probably have 4 points, giving 8 equations to cover the 6 variables for camera position and angle plus 1 more for the scaling of the 2-D view points, since we'll be able to eliminate the "center" coordinates (xc, yc).
So if you have 4 points and transform your 2-D view points to be relative to the center of your display, then you can get 14 simultaneous equations in 13 variables and solve.
Unfortunately, six of the equations are not linear. Fortunately, all of the variables in those equations are restricted to values between -1 and 1, so solving the system should still be feasible.
Best Answer
Alright, I came here looking for an answer and didn't find something simple and straightforward, so I went ahead and did the dumb but effective (and relatively simple) thing: Monte Carlo optimisation.
Very simply put, the algorithm is as follows: Randomly perturb your projection matrix until it projects your known 3D coordinates to your known 2D coordinates.
Here is a still photo from Thomas the Tank Engine:
Let's say we use GIMP to find the 2D coordinates of what we think is a square on the ground plane (whether or not it is really a square depends on your judgment of the depth):
I get four points in the 2D image: (318, 247), (326, 312), (418, 241), and (452, 303). By convention, we say that these points should correspond to the 3D points (0, 0, 0), (0, 0, 1), (1, 0, 0), and (1, 0, 1); in other words, a unit square in the y = 0 plane.
Projecting each of these 3D coordinates into 2D is done by multiplying the 4D vector [x, y, z, 1] with a 4x4 projection matrix, then dividing the x and y components by z to actually get the perspective correction. This is more or less what gluProject() does, except gluProject() also takes the current viewport into account and takes a separate modelview matrix into account (we can just assume the modelview matrix is the identity matrix). It is very handy to look at the gluProject() documentation because I actually want a solution that works for OpenGL, but beware that the documentation is missing the division by z in the formula.
Remember, the algorithm is to start with some projection matrix and randomly perturb it until it gives the projection that we want. So what we're going to do is project each of the four 3D points and see how close we get to the 2D points we wanted. If our random perturbations cause the projected 2D points to get closer to the ones we marked above, then we keep that matrix as an improvement over our initial (or previous) guess.
Let's define our points:
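The original code is not shown here, but in Python (a reconstruction in the spirit of the text) the correspondences read off above might look like:

```python
# 2D pixel coordinates marked in the image, and the 3D points they
# should correspond to (a unit square in the y = 0 plane).
pixel_points = [(318, 247), (326, 312), (418, 241), (452, 303)]
world_points = [(0, 0, 0), (0, 0, 1), (1, 0, 0), (1, 0, 1)]
```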
We need to start with some matrix, identity matrix seems a natural choice:
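A plain nested-list 4x4 identity matrix is enough as a starting guess (again, a sketch rather than the author's exact code):

```python
# 4x4 identity matrix as the initial projection matrix guess.
mat = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
```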
We need to actually implement the projection (which is basically a matrix multiplication):
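One possible implementation, assuming a 720x576 viewport; the row/column conventions and viewport mapping here are my assumptions and may differ from the original code:

```python
def project(mat, x, y, z, width=720, height=576):
    # Multiply the matrix with the column vector [x, y, z, 1], computing
    # only the rows we need. The divisor comes from the last matrix row
    # (the homogeneous coordinate; the prose calls it z). Then map into
    # the viewport, flipping y because image coordinates count from the top.
    p = (x, y, z, 1.0)
    xt = sum(mat[0][i] * p[i] for i in range(4))
    yt = sum(mat[1][i] * p[i] for i in range(4))
    wt = sum(mat[3][i] * p[i] for i in range(4))
    if abs(wt) < 1e-12:
        wt = 1e-12  # avoid dividing by zero for degenerate matrices
    return (width / 2 + width * xt / wt / 2,
            height - (height / 2 + height * yt / wt / 2))
```

With the identity matrix, the 3D origin lands in the middle of the viewport, as you would expect.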
This is basically what gluProject() does; 720 and 576 are the width and height of the image, respectively (i.e. the viewport), and we subtract from 576 to account for the fact that we counted y coordinates from the top while OpenGL typically counts them from the bottom. You'll notice we're not calculating z; that's because we don't really need it here (though it could be handy to ensure it falls within the range that OpenGL uses for the depth buffer).
Now we need a function for evaluating how close we are to the correct solution. The value returned by this function is what we will use to check whether one matrix is better than another. I chose to go by the sum of squared distances, i.e.:
To perturb the matrix, we simply pick an element to perturb by a random amount within some range:
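One way to write it, where `amount` is the half-width of the random range (the uniform distribution is my choice, not necessarily the original's):

```python
import copy
import random

def perturb(mat, amount):
    # Deep-copy the matrix, then nudge one randomly chosen element by a
    # uniform random offset in [-amount, amount].
    out = copy.deepcopy(mat)
    i = random.randrange(4)
    j = random.randrange(4)
    out[i][j] += random.uniform(-amount, amount)
    return out
```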
(It's worth noting that our project() function doesn't actually use mat[2] at all, since we don't compute z, and since all our y coordinates are 0, the mat[*][1] values are irrelevant as well. We could use this fact and never try to perturb those values, which would give a small speedup, but that is left as an exercise...)
For convenience, let's add a function that does the bulk of the approximation by calling perturb() over and over again on what is the best matrix we've found so far:
Now all that's left to do is to run it:
I find this already gives a pretty accurate answer. After running for a while, the matrix I found was:
with an error of around 2.6e-5. (Notice how the elements we said were not used in the computation have not actually been changed from our initial matrix; that's because changing these entries would not change the result of the evaluation, and so the change would never get carried along.)
We can pass the matrix into OpenGL using glLoadMatrix() (but remember to transpose it first, and remember to load your modelview matrix with the identity matrix):
Now we can, for example, translate along the z axis to get different positions along the tracks:
For sure this is not very elegant from a mathematical point of view; you don't get a closed-form equation that you can just plug your numbers into for a direct (and accurate) answer. However, it does allow you to add additional constraints without having to worry about complicating your equations; for example, if we wanted to incorporate height as well, we could use that corner of the house and say (in our evaluation function) that the distance from the ground to the roof should be so-and-so, and run the algorithm again. So yes, it's a brute force of sorts, but it works, and works well.