This is the standard problem known as "Multidimensional Scaling".
There are lots of varieties, so I recommend reading the wikipedia page on the subject (it's not so long), but in your case, it looks like plain PCA (Principal component analysis) is the way to go.
In particular, a (squared) distance matrix D is closely related to B, the product of the transposed position matrix X with itself:
B = -0.5 * J D J
where J = I - 1/n
(i.e. 1-1/n on the diagonal, -1/n elsewhere)
B = X^T X
(by construction)
PCA will give you the eigenvalues and eigenvectors such that
B = U^T L U
with the columns of U being the eigenvectors, and the diagonal of L being the eigenvalues.
Then it's just a question of multiplication:
X = L^(1/2) U
If you're not used to math this may sound a little scary, but PCA is almost certainly available as a library whatever platform you're on, and then it's just a question of a multiplying a diagonal matrix with matrix, which is easy to do manually if necessary. PCA will reconstruct a full-dimensional space, so if you want just the 2 or 3 most significant dimensions, simply crop the resultant matrix. More significant dimensions are those with the highest eigenvalue; most PCA algorithms list them in order, so you can just take the top few rows.
There are various documents online that explain this in more detail; alas wikipedia falls short here. See, for instance: Wickelmaier, F. (2003). An introduction to MDS. Reports from the Sound Quality Research Unit (SQRU), No. 7 pages 9-11 for more info.
The above is a general pattern you might apply in many languages. To verify this works, I tested it using Eigen which is the fastest linear algebra library I know of that can express this kind of thing simply.
I'll walk you through it:
#include <iostream>
#include <Eigen/Core>
#include <Eigen/Eigenvalues>
using namespace Eigen;
using namespace std;
#define DBG(s) (cout<< #s<<":\n"<<(s)<<"\n\n" )
int main(int argc, char* argv[])
{
The above is just boilerplate referring to Eigen and simple console output function.
auto const n = 4;//4 points.
MatrixXd points(3, n);
points <<
0.0, 4.0, 2.0, 3.0,
3.0, 1.0, 0.0, -4.0,
0.01, -0.1, -0.05, -0.01;
MatrixXd D(n, n);
for (int i = 0; i < n; i++)
D.col(i) = (points.colwise() - points.col(i)).colwise().squaredNorm().transpose();
DBG(D);
// 0 20.0121 13.0036 58.0004
//20.0121 0 5.0025 26.0081
//13.0036 5.0025 0 17.0016
//58.0004 26.0081 17.0016 0
I'm initializing the input (squared) distance matrix D with real distances to make sure they sort of make sense. Notice that the third dimensions values are all really small - I intend to make a 2D reconstruction and I'd like data that's not quite perfect :-).
MatrixXd J = MatrixXd::Identity(n, n) - MatrixXd::Constant(n, n, 1.0 / 4.0);
DBG(J);
// 0.75 -0.25 -0.25 -0.25
//-0.25 0.75 -0.25 -0.25
//-0.25 -0.25 0.75 -0.25
//-0.25 -0.25 -0.25 0.75
MatrixXd B = -0.5 * J * D * J;
DBG(B);
// 14.0648 -0.940469 0.561906 -13.6862
//-0.940469 4.06641 -0.436719 -2.68922
// 0.561906 -0.436719 0.0626563 -0.187844
// -13.6862 -2.68922 -0.187844 16.5633
Note that B has a non-zero diagonal (you worried about this in the comments below).
SelfAdjointEigenSolver<MatrixXd> eigendecomp(B);
DBG(eigendecomp.eigenvalues());
//-1.77725e-015
// 0.000568306
// 5.61749
// 29.139
Note that Eigen returns eigenvalues in ascending order; also note that the first eigenvalue is negative. This is a problem, because we're going to take the square root. Of course, this is just a numeric artifact; since our input data was 3D, it makes sense that the least-significant dimention of a 4D representation is just noise. It's tiny too. In general, you need to make sure that you pick the right number of dimensions to use, because too few and you'll introduce error; too many and you'll run into negative eigenvalues. Note that if any of these negative eigenvalues were large, that would indicate your input is bad, such as that it violates the triangle inequality that A to C via B is always longer than A to C directly.
Your input above has such a violation: points 0 and 2 are the same (have 0 distance), yet point 0 is at distance 0.4 from point 1, but point 2 is at distance 0.2. Clearly regardless of dimensionality or where you put the points, this is impossible. That's going to cause negative eigenvalues that aren't just numerical noise. There are various hacks to mitigate that, but let's get back on track and use those eigenvalues:
MatrixXd X = eigendecomp.eigenvalues().bottomRows(2).cwiseSqrt().asDiagonal()
* eigendecomp.eigenvectors().transpose().bottomRows(2);
That's just the wordy way of saying X = L^(1/2) U
while only considering the most significant 2 dimensions.
DBG(X.transpose() * X);
// 14.0647 -0.940508 0.562078 -13.6863
//-0.940508 4.06638 -0.436623 -2.68925
// 0.562078 -0.436623 0.0622356 -0.187691
// -13.6863 -2.68925 -0.187691 16.5632
Right, so we can verify that the decomposition works: this is quite similar to B. It's not exactly the same because we chopped off two dimensions, after all.
DBG(X);
//-0.999577 1.99533 -0.232164 -0.763594
// 3.61463 0.291588 0.0912992 -3.99751
These points turn out not to be very similar to the original points we started from. That's not actually surprising; after all, distances are invariant under rotation and translation, so you'll find that these points are zero-centered, and they're arbitrarily rotated compared to the points we started with. You could actually reconstruct the rotation matrix effectively used with another PCA, but let's not :-).
MatrixXd D2(n, n);
for (int i = 0; i < n; i++)
D2.col(i) = (X.colwise() - X.col(i)).colwise().squaredNorm().transpose();
DBG(D2);
// 0 20.0121 13.0028 58.0004
//20.0121 0 5.00187 26.0081
//13.0028 5.00187 0 17.0008
//58.0004 26.0081 17.0008 0
return 0;
}
As a final verification, note that the distances these new points generate are pretty much identical to the distances we started with. Numerical inaccuracies and more importantly our 3D -> 2D reduction mean it's not completely perfect, but you should be able to see that D is almost the same as D2.
Note for when your data really isn't very clean and/or doesn't actually represent Euclidean distances: MDS (using PCA) solves the optimization problem of minimizing the squared error of the distance. That's all fine and dandy, but notice that errors in large distances are (over-)emphasized here (12 - 02 = 1, but 112 - 102 = 21, so MDS will try 21 times as hard to fix the second error). If your distances aren't perfect, PCA will try to make the "most significant" i.e. largest distance fit best. This is often actually exactly the wrong thing to do; usually, you care more about local structure than global structure. IF your distance matrix is inaccurate and you cannot reconstruct a good overall position matrix, you might want to look at techniques that at least reconstruct local neighbourhoods best. State of the art there (AFAIK) is t-SNE: you won't get an absolutely accurate map from that technique, but you will get the right things close to each other, at least.
Good luck - sounds like a fun problem!
Best Answer
First, since we only care about the y-distance and will be drawing a horizontal line, we only need to think about the y-coordinates of the points and the y-coordinate that defines the line. The distance between a point and our line will be the absolute difference between the y-coordinate of the point and the y-coordinate that defines the line.
So, rephrasing the problem, we have a set of numbers, y_1 to y_n, and need a number, z, that minimizes the aggregate of the absolute differences between z and the points y_1 to y_n. Instead of minimizing the aggregate, we can just minimize the sum and get the correct result (aggregate = sum / number_of_points).
It turns out that it is the median that does this, not the mean.
https://math.stackexchange.com/questions/113270/the-median-minimizes-the-sum-of-absolute-deviations
For intuition, have points at y-coordinates 10, 10, 10 and 110. The median is 10, aggregate distance is (0+0+0+100)/4=25. The mean is 140/4=35, aggregate distance is (25+25+25+75)/4=37,5. In fact moving the line any distance, d, away from y-coordinate 10 towards 100 increased the distance to 3 points (with d) while only decreasing the distance to 1 point (with d) and hence increasing the aggregate.
(If we would take squares of distances then the mean would be the correct answer)