Some resources for R help (especially for GLMMs)

A quick pointer to some on-line resources for help on stats, especially on GLMMs (from the R user community):

  • The GLMM Wiki is a resource especially for researchers working with GLMMs and includes a FAQ from R-sig-ME.
  • RSeek lets you restrict searches by category (support lists, functions, code, and blogs); it indexes R-sig-ME and appears to be up to date.
  • R Site Search also includes R-sig-ME through 2010.
  • The R-lang archives are searchable, too.

/bin/sh: latex: command not found: setting R path to include TeX binaries path

When integrating statistical analysis in R with writing up documents in LaTeX, two useful functions for generating LaTeX tables are latex() from the Hmisc package and xtable() from the xtable package.

You also need R to be able to access the TeX binaries! Sometimes it can't, because the PATH environment variable that R sees doesn't include the directory containing them. This is especially likely if R doesn't inherit PATH from the shell environment, e.g. if you launch R from Finder in Mac OS X.

Here, we can see that the R path does not include the path to TeX binaries:

Sys.getenv("PATH") # Print the PATH that R sees
#                                          PATH 

So if you try to call an R function that calls TeX binaries, it will give an error message:

library(Hmisc) # Load Hmisc to get latex()
x <- matrix(1:6, nrow=2, dimnames=list(c('a','b'),c('c','d','this that'))) # From latex() examples
latex(x) # Printing the result runs the latex binary
#/bin/sh: latex: command not found
#Error in system(cmd, intern = TRUE, wait = TRUE) : 
#  error in running command
#sh: xdvi: command not found

To explicitly set the R path, you can create a text file called .Rprofile in your user directory (for Mac OS X, this is in the ~/ directory). In this text file, you can add to the path so that R can access LaTeX commands (following this post):

Sys.setenv(PATH=paste(Sys.getenv("PATH"),"/usr/texbin",sep=":")) # this adds /usr/texbin to the R path

This should work if you have some TeX Live distribution on a UNIX system.

(In .Rprofile you can also explicitly set the entire path, not just adding on /usr/texbin to the existing path, and define any functions you would like to always have in your R environment, etc.)
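If the line above is run more than once in a session, it keeps appending the same directory. A minimal sketch of a guarded version follows; the helper name add_to_path is hypothetical (it is not part of base R), and /usr/texbin is simply the directory used above, so adjust it to wherever your TeX binaries live.

```r
# Hypothetical helper (not part of base R): append `dir` to a PATH-style
# string only if it is not already one of the entries.
add_to_path <- function(dir, path = Sys.getenv("PATH")) {
  sep <- .Platform$path.sep                       # ":" on Unix-alikes
  entries <- strsplit(path, sep, fixed = TRUE)[[1]]
  if (dir %in% entries) return(path)              # already present: unchanged
  paste(c(entries, dir), collapse = sep)
}

# In .Rprofile: safe to run repeatedly
Sys.setenv(PATH = add_to_path("/usr/texbin"))
```

Because the helper returns the path unchanged when the directory is already listed, sourcing .Rprofile again does not lengthen PATH.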

Sys.getenv("PATH") # Print the PATH again after editing .Rprofile and restarting R
#                                                      PATH 

x <- matrix(1:6, nrow=2, dimnames=list(c('a','b'),c('c','d','this that')))
latex(x) # Now the latex binary is found
#This is pdfTeX, Version 3.1415926-1.40.11 (TeX Live 2010)
# restricted \write18 enabled.
#entering extended mode

Now R can access the TeX binaries!
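A quick way to confirm this without typesetting anything is base R's Sys.which(), which returns the full path of each executable it finds on R's PATH, or an empty string if it finds none. The sketch below only reports what it finds; the choice of binaries to check and the message wording are my own.

```r
# Look up TeX binaries on the PATH that R sees.
tex_bins <- Sys.which(c("latex", "pdflatex"))
print(tex_bins)   # full paths, or "" for binaries R cannot find

not_found <- names(tex_bins)[tex_bins == ""]
if (length(not_found) > 0) {
  message("Not on R's PATH: ", paste(not_found, collapse = ", "))
} else {
  message("R can find all the TeX binaries checked.")
}
```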

Formatting numbers for printing in R: rounding and trailing zeroes

In journal articles, descriptive statistics results are often presented as a table of means, with standard deviation/standard error given in parentheses following the mean. Here is how you can prepare the print formatting when working in R: it’s rather trivial, but I wasted a sufficient amount of time on it that I thought it was worth mentioning. (I follow this by using latex() from the Hmisc package to generate LaTeX output of such a table.)

Suppose I have a vector of means for percent correct from 5 different experimental conditions, and a vector of the corresponding standard errors, and both are sorted in the same order:

all.res.corr$per.correct # Vector of mean percent correct, comes from a data frame called all.res.corr
#[1] 52.53561 60.51282 64.13105 66.38177 67.46439
all.res.corr$se.per.corr # Vector of standard errors, from same data frame
#[1] 2.409955 2.763525 2.831450 2.909737 2.902924
  1. First, use round() to round the numbers to two decimal places.
    round(all.res.corr$per.correct, 2)
    #[1] 52.54 60.51 64.13 66.38 67.46
    round(all.res.corr$se.per.corr, 2)
    #[1] 2.41 2.76 2.83 2.91 2.90
  2. Then use formatC(), which gives access to C-like formatting options, to print both vectors with exactly 2 digits after the decimal point. With format = "f", the digits argument sets the number of digits printed after the decimal point. This is a crucial step! If you skip it, trailing zeroes won't print: although the output of round() in the preceding code block appears to preserve the trailing zero in 2.90, the zero is lost when the number is converted to a character string. See this post for an alternative using sprintf() (which may be familiar from many other programming languages).
    formatC(round(all.res.corr$per.correct,2), digits = 2, format = "f")
    #[1] "52.54" "60.51" "64.13" "66.38" "67.46" # note conversion to character type
    formatC(round(all.res.corr$se.per.corr,2), digits = 2, format = "f")
    #[1] "2.41" "2.76" "2.83" "2.91" "2.90"
  3. Finally, use paste() to concatenate the vectors for publication, with parentheses in the appropriate places. The code below concatenates its arguments in order: the formatted mean vector all.res.corr$per.correct, a space and an opening parenthesis (, the formatted SE vector all.res.corr$se.per.corr, and a closing parenthesis ). The separator sep is set to the empty string so that nothing is inserted between the concatenated arguments.
    (per.all.res.corr.print <- paste(formatC(round(all.res.corr$per.correct,2), digits = 2, format = "f"), " (", formatC(round(all.res.corr$se.per.corr,2), digits = 2, format = "f"), ")", sep="")) # n.b. Parentheses around an assignment command print the assigned value
    #[1] "52.54 (2.41)" "60.51 (2.76)" "64.13 (2.83)" "66.38 (2.91)" "67.46 (2.90)"
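The steps above can also be collapsed into a single sprintf() call, the alternative mentioned in step 2. The sketch below is self-contained: the vectors means and ses are stand-ins (my names, not the original data frame) holding the values printed earlier, and the first two calls show where the trailing zero goes missing.

```r
# Stand-in vectors holding the values shown earlier
means <- c(52.53561, 60.51282, 64.13105, 66.38177, 67.46439)
ses   <- c(2.409955, 2.763525, 2.831450, 2.909737, 2.902924)

# The trailing zero is lost when a rounded number becomes a string ...
as.character(round(ses[5], 2))
#[1] "2.9"
# ... but kept when the number of decimals is fixed explicitly
formatC(round(ses[5], 2), digits = 2, format = "f")
#[1] "2.90"

# sprintf() rounds, pads, and pastes in one step; "%.2f" means
# fixed notation with 2 digits after the decimal point
(means.print <- sprintf("%.2f (%.2f)", means, ses))
#[1] "52.54 (2.41)" "60.51 (2.76)" "64.13 (2.83)" "66.38 (2.91)" "67.46 (2.90)"
```

sprintf() is vectorized over its arguments, so no explicit loop or paste() call is needed.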

Missing data and data aggregation in R

Faceted barplot with doubled bars

Note the doubled bars in facet 2, 19. This is because of missing rows in the data frame.

I was puzzled why my bar plots (using the R package ggplot2 and geom_bar()) were showing up with doubled bars (see facet 2/19).

When I looked at the data behind the plot, it turned out that rows for combinations with no data had been silently dropped from the data frame (e.g. for some experimental conditions there is no row for 2; note the jump from 1 to 3 in the output below):

dat.subj <- ddply(mono, c("is.creak","response",""), function(d) data.frame(mean.log.rt=mean(d[,"log.rt"])))

> head(dat.subj)
  is.creak response mean.log.rt
1        0       T4        1  0.20970491
3        0       T4        3 -0.35065706
4        0       T4        4 -0.02450301
5        0       T4        5 -0.20948722
6        0       T4        6  0.72335601

Then I learned about the .drop argument for ddply() from this post. Read the post for more information on how other data aggregation functions behave with respect to missing data.

By default, ddply() uses .drop = TRUE, which drops combinations of the grouping variables that do not occur in the data. So I set .drop = FALSE, and now the missing row appears with NaN.

dat.subj <- ddply(mono, c("is.creak","response",""), function(d) data.frame(mean.log.rt=mean(d[,"log.rt"])), .drop=FALSE)

> head(dat.subj)
  is.creak response mean.log.rt
1        0       T4        1  0.20970491
2        0       T4        2         NaN
3        0       T4        3 -0.35065706
4        0       T4        4 -0.02450301
5        0       T4        5 -0.20948722
6        0       T4        6  0.72335601
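The same contrast can be reproduced on toy data without plyr. This is a base-R sketch of a different route to a similar result, not the ddply() call above: aggregate() with a formula drops empty combinations, whereas tapply() computes one mean per cell of the full grid of factor levels, so an empty cell shows up as an explicit NA rather than a missing row. The data frame toy and its columns are invented for illustration.

```r
# Invented data: condition "b" has no observation for subject 2
toy <- data.frame(
  cond = factor(c("a", "a", "b")),
  subj = factor(c(1, 2, 1)),
  rt   = c(0.2, 0.4, 0.6)
)

# aggregate() silently drops the empty cond/subj combination (3 rows)
dropped <- aggregate(rt ~ cond + subj, data = toy, FUN = mean)

# tapply() keeps every combination of factor levels; empty cells are NA
cell_means <- tapply(toy$rt, list(cond = toy$cond, subj = toy$subj), mean)
full <- as.data.frame(as.table(cell_means), responseName = "mean.rt")
full  # 4 rows; the row for cond "b", subj "2" has mean.rt NA
```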

Here’s the revised plot, which prints correctly.

The plot shows missing data correctly when the data frame indicates missing data explicitly with NaN

Postscript: the last thing I needed to fix was the standard error over subjects in further data analysis, which I calculated with the sd() function during aggregation. Because the data frame for subjects, dat.subj, now included rows with missing data, I needed to tell sd() to ignore missing values, like this:

sd(d[,"mean.log.rt"], na.rm = TRUE)
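When that standard deviation is turned into a standard error, the divisor should also count only the non-missing values. A minimal sketch, with a hypothetical helper se() and made-up numbers:

```r
# Hypothetical helper: standard error of the mean, ignoring NA/NaN
se <- function(x) {
  n <- sum(!is.na(x))             # is.na() is TRUE for NaN as well as NA
  sd(x, na.rm = TRUE) / sqrt(n)
}

x <- c(0.21, NaN, -0.35, -0.02)   # made-up subject means, one missing
sd(x)                  # not a usable number: the missing value propagates
sd(x, na.rm = TRUE)    # computed from the 3 non-missing values
se(x)                  # divides by sqrt(3), not sqrt(4)
```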

R: do not have the nlme and lme4 packages simultaneously loaded

I noticed that when I tried to display the summary of an lmer object using display() from the arm package, I got this error:

Error in UseMethod("fixef") :
no applicable method for 'fixef' applied to an object of class "mer"

And I found from this post that you should not have the nlme and lme4 packages loaded at the same time (lmer() comes from lme4). To detach a package, for instance nlme, you can do this (see R FAQ 5.2):