R – Boxplot with axis limits in R

boxplot r

I have data in tab-delimited format with nearly 400 columns of values, e.g.:

         X             Y         Z               A       B               C  
        2.34          .89       1.4             .92     9.40            .82
        6.45          .04       2.55            .14     1.55            .04
        1.09          .91       4.19            .16     3.19            .56
        5.87          .70       3.47            .80     2.47            .90

Now I want to visualize the data using box plots. Since it is difficult to view 400 box plots in a single PDF, I want to split them into groups of 50 (i.e. 50 x 8). Here is the code I used:

boxplot(data[1:50],xlab="Samples",xlim=c(0.001,70),log="xy", 
        pch='.',col=rainbow(ncol(data[1:50])))

but I got the following error:

In plot.window(xlim = xlim, ylim = ylim, log = log, yaxs = pars$yaxs)
: nonfinite axis limits [GScale(-inf,4.4591,2, .); log=1]

I want to view the box plots for all 400 samples, 50 at a time, in 8 separate PDFs. Please help me get a better visualization.
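For reference, the 50-per-file split described in the question could be sketched roughly as follows. This is only an illustration under stated assumptions: the data are simulated, the output directory and the `boxplots_NN.pdf` file names are hypothetical, and the log axis is applied to y only. A log axis cannot show zero or negative values, and the `-inf` lower limit in the error is consistent with such values being present, so the sketch falls back to a linear axis when a chunk is not strictly positive.

```r
# Simulated stand-in for the real table: 400 columns of positive values
set.seed(1)
data <- as.data.frame(matrix(rlnorm(400 * 20), ncol = 400))

chunk.size <- 50
starts  <- seq(1, ncol(data), by = chunk.size)
out.dir <- tempdir()   # assumption: write the PDFs to a temporary directory
files   <- character(0)

for (i in seq_along(starts)) {
  cols  <- starts[i]:min(starts[i] + chunk.size - 1, ncol(data))
  chunk <- data[cols]
  file  <- file.path(out.dir, sprintf("boxplots_%02d.pdf", i))  # hypothetical name

  pdf(file, width = 12, height = 6)
  # A log y axis is only valid when every plotted value is > 0
  use.log <- all(unlist(chunk) > 0, na.rm = TRUE)
  boxplot(chunk, xlab = "Samples", log = if (use.log) "y" else "",
          pch = ".", las = 2, cex.axis = 0.6, col = rainbow(length(cols)))
  dev.off()

  files <- c(files, file)
}
```

With 400 columns this writes eight files of 50 box plots each; a final chunk shorter than 50 would be handled by the `min()` in the column range.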

Best Answer

Others have already pointed out that actual boxplots are not going to work well. However, there is a very efficient way to visually scan all of your variables: Simply plot their distributions as an image (i.e. heatmap). Here is an example showing how it is really quite easy to get the gist of 400 variables and 80,000 individual data points!

# Simulate some data
set.seed(12345)
n.var = 400
n.obs = 200
data  = matrix(rnorm(n.var*n.obs), nrow=n.obs)

# Summarize data
breaks = seq(min(data), max(data), length.out=51)
histdata = apply(data, 2, function(x) hist(x, plot=FALSE, breaks=breaks)$counts)

# Plot
dev.new(width=4, height=4)
image(1:n.var, breaks, t(histdata), xlab='Variable Index', ylab='Histogram Bin')


This will be most useful if all your variables are comparable, or are at least sorted into rational groups. hclust and the heatmap functions can also be helpful here for more complicated displays. Good luck!
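As one possible reading of the hclust suggestion, the sketch below reorders the variables so that columns with similar histograms sit next to each other before plotting. It reuses the simulated data from the example above. Clustering on the raw bin counts with the default Euclidean distance and complete linkage is an assumption made here for illustration, not something the answer specifies.

```r
# Same simulated data and histogram summary as in the example above
set.seed(12345)
n.var <- 400
n.obs <- 200
data  <- matrix(rnorm(n.var * n.obs), nrow = n.obs)

breaks   <- seq(min(data), max(data), length.out = 51)
histdata <- apply(data, 2, function(x) hist(x, plot = FALSE, breaks = breaks)$counts)

# Cluster the variables (columns of histdata) by their bin counts
hc  <- hclust(dist(t(histdata)))
ord <- hc$order   # left-to-right order of variables in the dendrogram

# Same image as before, with columns rearranged into clustered order
image(seq_len(n.var), breaks, t(histdata[, ord]),
      xlab = "Variable (clustered order)", ylab = "Histogram Bin")
```

With purely random normal data the reordering changes little, since every column has roughly the same distribution. It becomes informative when groups of variables differ in location or spread.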