Python: Cosine Similarity m * n matrices

cosine-similaritynumpypythonvector

I have two M X N matrices which I construct after extracting data from images. Both the vectors have lengthy first row and after the 3rd row they all become only first column.
for example raw vector looks like this

1,23,2,5,6,2,2,6,2,
12,4,5,5,
1,2,4,
1,
2,
2
:

Both vectors have a similar pattern where first three rows have lengthy row and then thin out as it progress. Do do cosine similarity I was thinking to use a padding technique to add zeros and make these two vectors N X N. I looked at Python options of cosine similarity but some examples were using a package call numpy. I couldn't figure out how exactly numpy can do this type of padding and carry out a cosine similarity. Any guidance would be greatly appreciated.

Best Answer

If both arrays have the same dimension, I would flatten them using NumPy. NumPy (and SciPy) is a powerful scientific computational tool that makes matrix manipulations way easier.

Here an example of how I would do it with NumPy and SciPy:

import numpy as np
from scipy.spatial import distance

A = np.array([[1,23,2,5,6,2,2,6,2],[12,4,5,5],[1,2,4],[1],[2],[2]], dtype=object )
B = np.array([[1,23,2,5,6,2,2,6,2],[12,4,5,5],[1,2,4],[1],[2],[2]], dtype=object )

Aflat = np.hstack(A)
Bflat = np.hstack(B)

dist = distance.cosine(Aflat, Bflat)

The result here is dist = 1.10e-16 (i.e., 0).

Note that I've used here the dtype=object because that's the only way I know to be able to store different shapes into an array in NumPy. That's why later I used hstack() in order to flatten the array (instead of using the more common flatten() function).