R – Heatmap of Microarray Data using Pearson Distance

dendrogram, heatmap, r

I have been trying to generate a heatmap in R for some microarray data. Following online instructions, I have mostly succeeded in producing one, but it does not do exactly what I want. I would like to cluster the data based on Pearson distance rather than Euclidean distance, but I have run into some difficulties.

Using heatmap.2 (from the gplots package), I use the following code to make my initial heatmap:

heatmap.2(Test402,trace="none",density="none",scale="row", ColSideColors=c("red","blue")[data.test.factors],col=redgreen,labRow="",hclustfun=function(x) hclust(x,method="complete"))

Test402 is a matrix with 402 rows (genes) and 31 columns (patients), and data.test.factors indicates the outcome group each patient belongs to. Using hclustfun works fine here: the heatmap responds to changes in the clustering method and overall behaves as expected. The problem is that the clustering is based on Euclidean distance, and I would like to change that to Pearson distance. So I attempted the following:

heatmap.2(Test402,trace="none",density="none",scale="row", ColSideColors=c("red","blue")[data.test.factors],col=redgreen,labRow="",hclustfun=function(x) hclust(x,method="complete"), distfun=function(x) as.dist((1-cor(x))/2) )

The above command fails. I believe that is because Test402 needs to be a square matrix. So, looking at some additional advice, I tried the following:

cU = cor(Test402)
heatmap.2(cU,trace="none",density="none",scale="row", ColSideColors=c("red","blue")[data.test.factors],col=redgreen,labRow="",hclustfun=function(x) hclust(x,method="complete"), distfun=function(x) as.dist((1-x)/2) )

That works, BUT here is the problem. The heatmap, rather than showing the original expression values in Test402, now only displays the correlations. This is NOT what I want! I want the heatmap to show the original expression values and only the dendrogram to cluster differently; I don't want to change what data is actually represented in the heatmap. Is this possible?

Best Answer

Ok...I think you are simply confused about how cor and dist operate. From the documentation on dist:

This function computes and returns the distance matrix computed by using the specified 
    distance measure to compute the distances between the rows of a data matrix.

And from the documentation on cor:

If x and y are matrices then the covariances (or correlations) 
    between the columns of x and the columns of y are computed.

See the difference? dist (and dist objects, which is what heatmap.2 is assuming it's getting) assume that you've calculated the distance between rows, while using cor you are essentially calculating the distance between columns. Adding a simple transpose to your distance function allows this (non-square) example to run for me:

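library(gplots)  # provides heatmap.2

# Non-square example: 20 rows x 5 columns of random values
TEST <- matrix(runif(100),nrow=20)
heatmap.2(t(TEST), trace="none", density.info="none",
            scale="row",
            labRow="",
            hclustfun=function(x) hclust(x,method="complete"),
            # cor() works on columns, so transpose to get distances between rows
            distfun=function(x) as.dist((1-cor(t(x)))/2))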
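
To see why the transpose matters, here is a minimal base-R sketch that checks the dimensions involved. The matrix m is a hypothetical stand-in for Test402 (20 genes by 5 patients of random values), and pearson_dist is just an illustrative name for the same distance function used above; neither comes from the original question.

```r
set.seed(1)
# Hypothetical stand-in for Test402: 20 genes (rows) x 5 patients (columns)
m <- matrix(runif(100), nrow = 20)

# cor() correlates columns, so this is patient-by-patient
dim(cor(m))      # 5 5

# Transposing first gives the gene-by-gene matrix that row clustering needs
dim(cor(t(m)))   # 20 20

# Pearson-based distance between the rows of x, in the form hclust() expects
pearson_dist <- function(x) as.dist((1 - cor(t(x))) / 2)

# heatmap.2 applies distfun to x for the rows and to t(x) for the columns,
# so the same function has to return a row-wise distance in both cases
row_hc <- hclust(pearson_dist(m), method = "complete")     # clusters the 20 genes
col_hc <- hclust(pearson_dist(t(m)), method = "complete")  # clusters the 5 patients

length(row_hc$order)   # 20
length(col_hc$order)   # 5
```

If you would rather keep the clustering outside the plotting call, the two trees can be converted with as.dendrogram() and passed to heatmap.2 through its Rowv and Colv arguments, which leaves the plotted expression values untouched.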