I'm developing the backend for a dating app, in which each user has:
- a profile of his/her characteristics
- a profile of the ideal match's characteristics
There are dozens of characteristics like gender, height, looks and so on.
Some characteristics are strings, others are numbers or arrays.
Each characteristic is assigned an importance factor ranging from 0 to 4, where 0 means not important at all and 4 means absolutely necessary.
So a user's match object looks like this:
{
  {
    gender: 'female',
    importance: 4
  },
  {
    eyeColor: ['blue', 'green'],
    importance: 2
  },
  {
    ethnicity: [],
    importance: 0
  },
  heightMin: 150,
  heightMax: 200,
  heightImportance: 3,
  ....
}
The data is stored in MongoDB and the backend is written in Node.js.
I'm new to data science. I only know that there are formulas for measuring similarity or distance between vectors, such as Euclidean distance or cosine similarity, but I'm not sure which method (if any) is the most relevant in these circumstances.
Appreciate your hints.
Best Answer
Identify the different kinds of characteristics
Your sample data illustrates very well that different kinds of characteristics need to be handled in different ways:
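For instance, reading the sample data in the question, one plausible split is: a single required value (gender), a list of acceptable values (eyeColor, ethnicity), and a numeric range (heightMin/heightMax). The sketch below assumes a hypothetical normalized preference shape (`{ value }`, `{ anyOf }`, `{ min, max }`); the function name `kindOf` and the kind labels are illustrative and not taken from the answer.

```javascript
// Hypothetical helper: decide how a preference should be compared,
// based on the shape of the (assumed, normalized) preference object.
function kindOf(pref) {
  // A list of acceptable values, e.g. eyeColor: ['blue', 'green']
  if (Array.isArray(pref.anyOf)) return 'set';
  // A numeric interval, e.g. heightMin: 150, heightMax: 200
  if (typeof pref.min === 'number' || typeof pref.max === 'number') return 'range';
  // A single required value, e.g. gender: 'female'
  if (pref.value !== undefined) return 'exact';
  return 'unknown';
}

console.log(kindOf({ value: 'female', importance: 4 }));
console.log(kindOf({ anyOf: ['blue', 'green'], importance: 2 }));
console.log(kindOf({ min: 150, max: 200, importance: 3 }));
```

Normalizing every characteristic into one of a few shapes like these means each kind needs only one comparison rule, no matter how many characteristics are added later.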
Define a scoring function
Once all the characteristics are properly categorized in this way, you are ready to build a general scoring function that:
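A minimal sketch of one such function follows. It assumes the ideal-match profile has been normalized into per-characteristic preferences of the hypothetical shapes `{ value }`, `{ anyOf }` or `{ min, max }`, each with an `importance` from 0 to 4. The weighting is an assumption made for illustration: importance 0 is ignored, importance 4 is a hard requirement, and everything else contributes in proportion to its importance.

```javascript
// Returns true when a profile value satisfies one preference.
// The preference shapes ({ value }, { anyOf }, { min, max }) are assumed.
function satisfies(pref, actual) {
  if (actual === undefined || actual === null) return false;
  if (Array.isArray(pref.anyOf)) {
    // An empty list is read as "no restriction" (assumption).
    return pref.anyOf.length === 0 || pref.anyOf.includes(actual);
  }
  if (pref.min !== undefined || pref.max !== undefined) {
    const min = pref.min ?? -Infinity;
    const max = pref.max ?? Infinity;
    return actual >= min && actual <= max;
  }
  return pref.value === actual;
}

// Scores a profile against an ideal; the result lies in [0, 1].
function score(ideal, profile) {
  let earned = 0;
  let possible = 0;
  for (const [name, pref] of Object.entries(ideal)) {
    if (pref.importance === 0) continue; // not important at all
    const ok = satisfies(pref, profile[name]);
    if (pref.importance === 4 && !ok) return 0; // absolutely necessary
    possible += pref.importance;
    if (ok) earned += pref.importance;
  }
  // With no weighted preferences at all, every profile is acceptable.
  return possible === 0 ? 1 : earned / possible;
}

const ideal = {
  gender: { value: 'female', importance: 4 },
  eyeColor: { anyOf: ['blue', 'green'], importance: 2 },
  ethnicity: { anyOf: [], importance: 0 },
  height: { min: 150, max: 200, importance: 3 },
};
const profile = { gender: 'female', eyeColor: 'brown', height: 170 };
console.log(score(ideal, profile)); // 7 of 9 weighted points are satisfied
```

Unlike a plain Euclidean or cosine measure, this kind of weighted rule handles strings, lists and ranges uniformly and lets a single mandatory mismatch veto the match.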
Improve performance
You then have to complement your scoring with:
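One possible complement, assumed here for illustration rather than taken from the answer, is a database-side pre-filter: let MongoDB discard candidates that fail an absolutely necessary (importance 4) characteristic, so that only the remaining profiles are scored in Node.js. The sketch below only builds the filter object using the standard `$in`, `$gte` and `$lte` query operators; the preference shapes and the function name `buildMandatoryFilter` are hypothetical.

```javascript
// Builds a MongoDB query filter from the mandatory (importance 4)
// preferences of an ideal-match profile. Preference shapes are assumed.
function buildMandatoryFilter(ideal) {
  const filter = {};
  for (const [name, pref] of Object.entries(ideal)) {
    if (pref.importance !== 4) continue;
    if (Array.isArray(pref.anyOf)) {
      // An empty list means "no restriction", so add no condition.
      if (pref.anyOf.length > 0) filter[name] = { $in: pref.anyOf };
    } else if (pref.min !== undefined || pref.max !== undefined) {
      const range = {};
      if (pref.min !== undefined) range.$gte = pref.min;
      if (pref.max !== undefined) range.$lte = pref.max;
      filter[name] = range;
    } else {
      filter[name] = pref.value;
    }
  }
  return filter;
}

const filter = buildMandatoryFilter({
  gender: { value: 'female', importance: 4 },
  height: { min: 150, max: 200, importance: 4 },
  eyeColor: { anyOf: ['blue', 'green'], importance: 2 },
});
console.log(JSON.stringify(filter));
```

The resulting object could be passed to a `find` call on the profiles collection, ideally backed by indexes on the fields that are most often mandatory.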
Future improvements
You could think of the following, but at a later stage:
score(ideal1, profile2) with score(ideal2, profile1)
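Putting the two directional scores together could look like the sketch below. How to combine them is not specified, so the geometric mean used here is an assumption: it stays in the same range as the inputs and drops to zero as soon as either side is a non-match. The names `mutualScore` and `scoreFn`, and the `{ ideal, profile }` user shape, are hypothetical.

```javascript
// Combines the two one-directional scores into a single mutual score.
// scoreFn(ideal, profile) is assumed to return a number in [0, 1].
function mutualScore(userA, userB, scoreFn) {
  const aLikesB = scoreFn(userA.ideal, userB.profile);
  const bLikesA = scoreFn(userB.ideal, userA.profile);
  // Geometric mean: zero if either direction is zero (an assumed choice).
  return Math.sqrt(aLikesB * bLikesA);
}

// Toy scoring function used only to demonstrate the combination.
const toyScore = (ideal, profile) => (ideal.want === profile.kind ? 1 : 0.25);

const userA = { ideal: { want: 'b' }, profile: { kind: 'a' } };
const userB = { ideal: { want: 'x' }, profile: { kind: 'b' } };
console.log(mutualScore(userA, userB, toyScore));
```

Taking the minimum of the two scores would be a stricter alternative, while an arithmetic mean would let a strong one-sided interest compensate for a weak one.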