An Alternative to R pairs with reshape2 and ggplot2

Base plot function: “pairs”

You may be aware of the pairs plotting function in R. Here is an example with the Boston data set, which ships with the MASS package. We will use just a few variables, since there isn’t room here to show the plots for all of them.

library(MASS)  # provides the Boston data set
miniBoston <- Boston[, c("crim", "lstat", "rm", "medv")]

Let us inspect this new data frame:

> head(miniBoston)
     crim lstat    rm medv
1 0.00632  4.98 6.575 24.0
2 0.02731  9.14 6.421 21.6
3 0.02729  4.03 7.185 34.7
4 0.03237  2.94 6.998 33.4
5 0.06905  5.33 7.147 36.2
6 0.02985  5.21 6.430 28.7

Let’s plot this with the pairs function now:

pairs(miniBoston)

This produces the following plot:
[Figure: MiniBostonPairs, the scatterplot matrix produced by pairs(miniBoston)]

This is great for seeing associations between all the variables. But when you have many variables and a lot of data, the plot takes a long time to generate and wastes a lot of space, since each pair of variables is plotted twice (once on each side of the diagonal). That is a poor trade when all you are interested in is how one specific variable is related to all the others.

Sensible Alternative with reshape2 and ggplot2

If you do not have reshape2 and ggplot2 installed already, install them first:

install.packages("reshape2")
install.packages("ggplot2")
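
If you run this often, you may prefer to install only what is missing. The following is a minimal sketch of that idea, not the only way to do it; the variable names pkgs and to_install are just illustrative.

```r
# Packages this post relies on
pkgs <- c("reshape2", "ggplot2")

# Keep only the ones that are not installed yet
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
to_install <- pkgs[!is_installed]

# Install the missing ones, if any
if (length(to_install) > 0) {
  install.packages(to_install)
}
```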

Let us say we are interested in how the variable medv is related to the other variables. We proceed with:

library(reshape2)
meltBoston <- melt(miniBoston, "medv")

If you inspect meltBoston, you see:

> str(meltBoston)
'data.frame':	1518 obs. of  3 variables:
 $ medv    : num  24 21.6 34.7 33.4 36.2 28.7 22.9 27.1 16.5 18.9 ...
 $ variable: Factor w/ 3 levels "crim","lstat",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ value   : num  0.00632 0.02731 0.02729 0.03237 0.06905 ...

> head(meltBoston)
  medv variable   value
1 24.0     crim 0.00632
2 21.6     crim 0.02731
3 34.7     crim 0.02729
4 33.4     crim 0.03237
5 36.2     crim 0.06905
6 28.7     crim 0.02985
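
If it is not obvious what melt did here, a tiny example may help. The sketch below uses a made-up data frame (toy, with columns y, a and b, none of which come from the Boston data) to show that melt keeps the id column and stacks every other column into variable/value pairs. The same logic explains the 1518 rows above: Boston has 506 rows, and three columns were stacked.

```r
library(reshape2)

# Made-up data: one id column (y) and two measured columns (a, b)
toy <- data.frame(y = c(10, 20, 30),
                  a = c(1, 2, 3),
                  b = c(4, 5, 6))

# Keep y as the id; stack a and b underneath each other
long <- melt(toy, id.vars = "y")

print(long)
nrow(long)   # 3 rows x 2 stacked columns = 6
```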

Now we can plot like this:

library(ggplot2)
ggplot(data = meltBoston, aes(x = value, y = medv)) + 
  geom_point(size = 0.3, shape = 1) + 
  facet_wrap(~ variable, 
             ncol = 3, 
             scales = "free_x")

You get a plot like this, which uses the space more effectively:

[Figure: BostonGgplot, medv plotted against crim, lstat and rm in three side-by-side panels]
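
If you want to repeat this for other data frames, the melt and plot steps can be wrapped in a small function. This is only a sketch: plot_against is a hypothetical helper name, not something provided by reshape2 or ggplot2, and it assumes every column other than the target is numeric. The example uses the built-in mtcars data so that it runs without MASS.

```r
library(reshape2)
library(ggplot2)

# Hypothetical helper: scatter one target column against every other column
plot_against <- function(df, target, ncol = 3) {
  long <- melt(df, id.vars = target)            # columns: <target>, variable, value
  names(long)[names(long) == target] <- "target"  # fixed name for use in aes()
  ggplot(long, aes(x = value, y = target)) +
    geom_point(size = 0.3, shape = 1) +
    facet_wrap(~ variable, ncol = ncol, scales = "free_x") +
    ylab(target)                                # label the axis with the real name
}

# Example: mpg against three other columns of mtcars
p <- plot_against(mtcars[, c("mpg", "wt", "hp", "disp")], "mpg")
print(p)
```

Setting scales = "free_x" matters here because the stacked columns are usually on very different scales; without it, all panels would share one x axis.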
