I have been trying to generate a heatmap in R for some microarray data. Following online instructions, I have mostly managed to produce one, but it does not do exactly what I want. I would like to cluster the data based on Pearson distance rather than Euclidean distance, but I have run into some difficulties.
Using heatmap.2 (from the gplots package), I make my initial heatmap with the following code:
heatmap.2(Test402,trace="none",density="none",scale="row", ColSideColors=c("red","blue") [data.test.factors],col=redgreen,labRow="",hclustfun=function(x) hclust(x,method="complete"))
Test402 is a matrix with 402 rows (genes) and 31 columns (patients), and data.test.factors indicates the outcome group each patient belongs to. Using hclustfun works fine here: the heatmap responds to changes in the linkage method and overall works. The problem is that the clustering distance is always Euclidean, and I would like to change it to Pearson distance. So I attempted the following:
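For reference, here is a minimal base-R sketch of what the row clustering above amounts to, assuming heatmap.2's default `distfun = dist`: the dendrogram is built from Euclidean distances between rows (genes). The matrix `x` is simulated stand-in data with the same 402 x 31 shape as `Test402`, not the real expression values.

```r
# Simulated stand-in for Test402: 402 genes (rows) x 31 patients (columns)
set.seed(1)
x <- matrix(rnorm(402 * 31), nrow = 402, ncol = 31)

# dist() measures Euclidean distance between rows by default
d_euc <- dist(x)

# Same linkage as the hclustfun in the question
hc_rows <- hclust(d_euc, method = "complete")

# One dendrogram leaf per gene
length(hc_rows$order)  # 402
```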
heatmap.2(Test402,trace="none",density="none",scale="row", ColSideColors=c("red","blue")[data.test.factors],col=redgreen,labRow="",hclustfun=function(x) hclust(x,method="complete"), distfun=function(x) as.dist((1-cor(x))/2) )
The above command fails, which I took to mean that Test402 needs to be a square matrix. So, following some additional advice, I tried this:
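One way to see what that `distfun` actually returns is a quick check on a simulated matrix of the same shape (the values are made up; only the dimensions matter). Because `cor()` correlates columns, the resulting `dist` object describes the 31 patients, not the 402 genes the row dendrogram needs. The helper name `pearson_cols` is hypothetical.

```r
# Simulated matrix with the same shape as Test402 (hypothetical data)
set.seed(1)
x <- matrix(rnorm(402 * 31), nrow = 402, ncol = 31)

# The distance function from the failing call, wrapped in a hypothetical helper
pearson_cols <- function(x) as.dist((1 - cor(x)) / 2)

d <- pearson_cols(x)
attr(d, "Size")  # 31: one entry per column (patient), not per row (gene)
nrow(x)          # 402: the row dendrogram needs this many leaves
```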
cU = cor(Test402)
heatmap.2(cU,trace="none",density="none",scale="row", ColSideColors=c("red","blue")[data.test.factors],col=redgreen,labRow="",hclustfun=function(x) hclust(x,method="complete"), distfun=function(x) as.dist((1-x)/2) )
That works, BUT here is the problem: rather than showing the original expression values in Test402, the heatmap now displays only the correlations. This is NOT what I want! I want the original values displayed and only the dendrogram clustered differently; I don't want to change what data is actually represented in the heatmap. Is this possible?
Best Answer
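OK... I think you are simply confused about how `cor` and `dist` operate. From the documentation on `dist`: it computes the distances between the rows of a data matrix. And from the documentation on `cor`: when given a matrix, it computes the correlations between its columns. See the difference?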
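A tiny base-R illustration of that difference, using a made-up 5 x 3 matrix: `dist()` gives one entry per pair of rows, `cor()` one entry per pair of columns, and transposing first makes `cor()` work on rows.

```r
# Made-up 5 x 3 matrix (no constant rows or columns, so cor() has no NAs)
m <- matrix(c(1, 2, 3,
              2, 4, 7,
              0, 1, 0,
              5, 3, 1,
              4, 4, 6), nrow = 5, byrow = TRUE)

attr(dist(m), "Size")  # 5: distances between the 5 rows
dim(cor(m))            # 3 3: correlations between the 3 columns
dim(cor(t(m)))         # 5 5: transposing first correlates the rows instead
```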
`dist` (and `dist` objects, which is what `heatmap.2` assumes it is getting) assumes that you've calculated the distance between rows, while with `cor` you are essentially calculating correlations between columns. Adding a simple transpose to your distance function, i.e. `cor(t(x))` instead of `cor(x)`, allows this (non-square) example to run for me:
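Below is a sketch of the transposed distance function plugged into the original call. It makes a few assumptions. `Test402` and `data.test.factors` are replaced by simulated data of the same shape (a 402 x 31 matrix and a two-level factor). The helper name `pearson_dist` is hypothetical. The `heatmap.2` call only runs if gplots is installed. Since heatmap.2 applies `distfun` to `x` for the rows and to `t(x)` for the columns, the same function yields gene-wise and patient-wise Pearson distances, while the colours still show the (row-scaled) original values.

```r
# Hypothetical helper: Pearson distance between ROWS (transpose so cor() correlates rows)
pearson_dist <- function(x) as.dist((1 - cor(t(x))) / 2)

# Simulated stand-ins for Test402 and data.test.factors (not real data)
set.seed(42)
Test402 <- matrix(rnorm(402 * 31), nrow = 402, ncol = 31)
data.test.factors <- factor(rep(c("A", "B"), length.out = 31))

d_rows <- pearson_dist(Test402)     # 402 genes
d_cols <- pearson_dist(t(Test402))  # 31 patients, as heatmap.2 does for columns

# Pearson correlation ignores per-row centring and scaling,
# so scale = "row" does not change the row distances
scaled <- t(scale(t(Test402)))
stopifnot(isTRUE(all.equal(as.vector(d_rows), as.vector(pearson_dist(scaled)))))

if (requireNamespace("gplots", quietly = TRUE)) {
  # Heatmap colours come from Test402 itself; only the dendrograms use pearson_dist
  gplots::heatmap.2(Test402,
                    trace = "none",
                    density.info = "none",
                    scale = "row",
                    ColSideColors = c("red", "blue")[data.test.factors],
                    col = gplots::redgreen,
                    labRow = rep("", nrow(Test402)),  # blank row labels
                    hclustfun = function(x) hclust(x, method = "complete"),
                    distfun = pearson_dist)
}
```

Because `(1 - r) / 2` maps correlations in [-1, 1] onto [0, 1], perfectly correlated genes end up at distance 0 and perfectly anti-correlated genes at distance 1.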