Every once in a while, I get to manipulate a `csr_matrix`, but I always forget how the parameters `indices` and `indptr` work together to build a sparse matrix.
I am looking for a clear and intuitive explanation of how `indptr` interacts with both the `data` and `indices` parameters when defining a sparse matrix using the notation `csr_matrix((data, indices, indptr), [shape=(M, N)])`.
I can see from the scipy documentation that the `data` parameter contains all the non-zero data, and the `indices` parameter contains the columns associated with that data (as such, `indices` is equal to `col` in the example given in the documentation). But how can we explain the `indptr` parameter in clear terms?
Best Answer
Maybe this explanation can help in understanding the concept:

- `data` is an array containing all the non-zero elements of the sparse matrix.
- `indices` is an array mapping each element in `data` to its column in the sparse matrix.
- `indptr` then maps the elements of `data` and `indices` to the rows of the sparse matrix. This is done with the following reasoning:
  - `indptr` is an array containing M+1 elements.
  - The slice `[indptr[i]:indptr[i+1]]` gives the positions of the elements to take from `data` and `indices` corresponding to row i. So suppose `indptr[i]=k` and `indptr[i+1]=l`: the data corresponding to row i would be `data[k:l]` at columns `indices[k:l]`. This is the tricky part, and I hope the following example helps in understanding it.

EDIT: I replaced the numbers in `data` by letters to avoid confusion in the following example.

Note: the values in `indptr` are necessarily non-decreasing, because the next cell in `indptr` (the next row) refers to the next values in `data` and `indices` corresponding to that row. Two equal consecutive values mean that the row has no stored elements.
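As a rough illustration of the slicing rule, here is a minimal pure-Python sketch. The 3x4 matrix, the letter values, and the helper names `row_entries` and `to_dense` are all made up for this example; they are not part of scipy. It only mimics how `indptr` selects slices of `data` and `indices` for each row.

```python
# Hypothetical 3x4 matrix, with letters standing in for the non-zero values:
#   row 0: [a, 0, b, 0]
#   row 1: [0, 0, 0, 0]   (empty row)
#   row 2: [0, c, d, e]
data = ["a", "b", "c", "d", "e"]   # non-zero values, stored row by row
indices = [0, 2, 1, 2, 3]          # column of each value in data
indptr = [0, 2, 2, 5]              # M + 1 = 4 entries for M = 3 rows
n_rows, n_cols = 3, 4


def row_entries(i):
    """Return the (column, value) pairs stored for row i."""
    k, l = indptr[i], indptr[i + 1]
    return list(zip(indices[k:l], data[k:l]))


def to_dense():
    """Rebuild the full matrix as a list of lists, with 0 for empty cells."""
    dense = [[0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for col, value in row_entries(i):
            dense[i][col] = value
    return dense


for i in range(n_rows):
    print(i, row_entries(i))
```

Row 1 is empty because `indptr[1]` and `indptr[2]` are both 2, so the slice `data[2:2]` selects nothing. With numeric values in `data`, the same three arrays could be passed to scipy as `csr_matrix((data, indices, indptr), shape=(3, 4))`.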