Installing lme4a in R (on Mac OS X 10.6.4)

Douglas Bates has posted draft chapters of a new book, lme4: Mixed-effects Modeling with R, and I wanted to experiment with the development version of the package, lme4a. See this post for a note on how lme4a differs from lme4. lme4a currently cannot be installed automatically with a command like

install.packages("lme4a", repos="http://R-Forge.R-project.org")

so I installed it from an svn checkout, following these posts. It requires the latest version of R (2.12.0); Rcpp 0.8.8.1 or later (from R-Forge, via svn checkout; I obtained Rcpp 0.8.9.3 today); RcppArmadillo; and the packages minqa and MatrixModels, which are available from CRAN. (RcppArmadillo is also available from CRAN, but I installed it from the Rcpp SVN repository instead.)

Here are the steps I followed (on Mac OS X 10.6.4):

  1. I downloaded and installed the most recent Mac binary for R, R 2.12.0.
  2. I updated all my packages in R using update.packages()
  3. I installed some dependencies for lme4a in R:
    install.packages(c("minqa", "MatrixModels"))
    
  4. I checked out the rcpp repository (which contains both Rcpp and RcppArmadillo) and the lme4 repository with svn, then installed Rcpp, RcppArmadillo, and lme4a (all commands entered in Terminal):
    cd [your R sources directory]
    svn checkout svn://svn.r-forge.r-project.org/svnroot/rcpp
    svn checkout svn://svn.r-forge.r-project.org/svnroot/lme4
    cd rcpp/pkg
    sudo R CMD INSTALL Rcpp # without sudo, I couldn't get permission to access a necessary directory
    sudo R CMD INSTALL RcppArmadillo
    cd ../..
    cd lme4/pkg
    sudo R CMD INSTALL lme4a
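
Before running the final R CMD INSTALL, it can help to confirm from within R that the prerequisites listed above are in place. The following is a minimal sketch, not part of the original instructions: check_prereqs() is a hypothetical helper, and a minimum version of "0" (meaning any version) is assumed wherever the post names no minimum.

```r
# Minimum versions named in the post; the post gives no minimum for
# RcppArmadillo, minqa or MatrixModels, so "0" accepts any installed version.
required <- c(Rcpp = "0.8.8.1", RcppArmadillo = "0",
              minqa = "0", MatrixModels = "0")

# Hypothetical helper: TRUE for each package that is installed and at least
# the required version, FALSE otherwise.
check_prereqs <- function(required) {
  sapply(names(required), function(pkg) {
    # system.file() returns "" when the package is not installed
    if (!nzchar(system.file(package = pkg))) return(FALSE)
    packageVersion(pkg) >= required[[pkg]]
  })
}

getRversion() >= "2.12.0"   # R itself must be 2.12.0 or later
check_prereqs(required)
```

Any FALSE entry points to a package that still needs to be installed or updated before lme4a will build.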
    

Missing data and data aggregation in R

Faceted barplot with doubled bars

Note the doubled bars in facets 2 and 19, caused by missing rows in the data frame.

I was puzzled about why my bar plots (made with the R package ggplot2 and geom_bar()) showed doubled bars (see facets 2 and 19).

When I looked at the data behind the plot, it turned out that the data frame had simply omitted the rows for cells with no data (e.g., there is no row for subj.new 2 in some experimental conditions):


library(plyr)  # provides ddply()
dat.subj <- ddply(mono, c("is.creak","response","subj.new"), function(d) data.frame(mean.log.rt=mean(d[,"log.rt"])))

> head(dat.subj)
  is.creak response subj.new mean.log.rt
1        0       T4        1  0.20970491
3        0       T4        3 -0.35065706
4        0       T4        4 -0.02450301
5        0       T4        5 -0.20948722
6        0       T4        6  0.72335601
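
Omitted cells like this are easy to miss by eye in head() output. One way to spot them before plotting is to cross-tabulate the grouping variables and look for zero counts. This is a minimal base-R sketch on a small hypothetical data frame, toy, with made-up values standing in for the post's mono data.

```r
# Toy data (hypothetical): subject 2 has no trials in condition "T4".
toy <- data.frame(
  response = c("T4", "T4", "T4", "T1", "T1", "T1"),
  subj.new = c(1, 1, 3, 1, 2, 3),
  log.rt   = c(0.10, 0.30, -0.35, 0.20, 0.50, -0.10)
)

# table() includes every combination of the observed values,
# so an empty cell shows up as a count of 0.
counts <- with(toy, table(response, subj.new))
print(counts)

# List the empty cells as a data frame.
cells <- as.data.frame(counts)
missing.cells <- cells[cells$Freq == 0, c("response", "subj.new")]
print(missing.cells)
```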

Then I learned about the .drop argument for ddply() from this post. Read the post for more information on how other data aggregation functions behave with respect to missing data.

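By default, ddply() uses .drop = TRUE, which drops combinations of the grouping variables that have no data. Setting .drop = FALSE keeps them, so the missing row now appears, with NaN as its value (the mean of an empty vector is NaN).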

dat.subj <- ddply(mono, c("is.creak","response","subj.new"), function(d) data.frame(mean.log.rt=mean(d[,"log.rt"])), .drop=FALSE)

> head(dat.subj)
  is.creak response subj.new mean.log.rt
1        0       T4        1  0.20970491
2        0       T4        2         NaN
3        0       T4        3 -0.35065706
4        0       T4        4 -0.02450301
5        0       T4        5 -0.20948722
6        0       T4        6  0.72335601
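
For readers not using plyr, the same idea can be sketched in base R by swapping in aggregate() plus expand.grid() and merge() for ddply(.drop = FALSE). This is an alternative technique, not the method used above, and it again uses the hypothetical toy data frame. Unlike ddply(), the filled-in cell comes back as NA rather than NaN.

```r
# Toy data (hypothetical): subject 2 has no trials in condition "T4".
toy <- data.frame(
  response = c("T4", "T4", "T4", "T1", "T1", "T1"),
  subj.new = c(1, 1, 3, 1, 2, 3),
  log.rt   = c(0.10, 0.30, -0.35, 0.20, 0.50, -0.10)
)

# aggregate() only returns groups that occur in the data,
# so the (T4, subject 2) cell is silently absent.
agg <- aggregate(log.rt ~ response + subj.new, data = toy, FUN = mean)
names(agg)[names(agg) == "log.rt"] <- "mean.log.rt"

# Build every combination of the grouping variables ...
all.cells <- expand.grid(response = unique(toy$response),
                         subj.new = unique(toy$subj.new),
                         stringsAsFactors = FALSE)

# ... and left-join the means onto it; cells with no data become NA.
filled <- merge(all.cells, agg, by = c("response", "subj.new"), all.x = TRUE)
print(filled)
```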

Here is the revised plot, which now displays correctly.

The plot shows missing data correctly when the data frame indicates missing data explicitly with NaN

Postscript: one last fix. In further analysis I computed standard errors over subjects with sd() during aggregation. Because the subject-level data frame, dat.subj, now includes rows with missing values, sd() returns NA unless it is told to ignore them, like this:

sd(d[,"mean.log.rt"], na.rm = TRUE)
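
A related detail when turning that SD into a standard error: the sample size should also count only the non-missing values, otherwise the NaN rows inflate n and shrink the standard error. A minimal sketch, with se() as a hypothetical helper and made-up subject means mirroring the output shown earlier:

```r
# Hypothetical subject means for one condition; subject 2 is NaN, as
# ddply(..., .drop = FALSE) produces for an empty cell.
mean.log.rt <- c(0.20970491, NaN, -0.35065706, -0.02450301, -0.20948722, 0.72335601)

# Without na.rm, the single NaN makes the whole result missing.
print(sd(mean.log.rt))

# Hypothetical helper: standard error of the mean that ignores missing
# values in both the SD and the sample size.
se <- function(x) {
  n <- sum(!is.na(x))          # count only subjects that contributed data
  sd(x, na.rm = TRUE) / sqrt(n)
}

print(se(mean.log.rt))
```

Inside a ddply() call, se(d[,"mean.log.rt"]) could then replace a hand-written sd()/sqrt() expression.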