When combining statistical analysis in R with writing up documents in LaTeX, two useful functions for generating LaTeX tables are latex() from the Hmisc package and xtable() from the xtable package.
You also need R to be able to access the TeX binaries! Sometimes R can’t do this, because the PATH environment variable that R sees doesn’t include the directory containing the TeX binaries. This is especially likely if R doesn’t inherit the PATH from the shell environment, e.g. if you launch R from Finder in Mac OS X.
Here, we can see that the R path does not include the path to TeX binaries:
Sys.getenv("PATH")
# PATH
# "/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin"
So if you try to call an R function that calls TeX binaries, it will give an error message:
library(Hmisc) # Load Hmisc to load latex()
x <- matrix(1:6, nrow=2, dimnames=list(c('a','b'),c('c','d','this that'))) # From latex() examples
latex(x)
#/bin/sh: latex: command not found
#Error in system(cmd, intern = TRUE, wait = TRUE) :
# error in running command
#sh: xdvi: command not found
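Before calling latex(), it can help to check whether R can see the TeX tools at all. Here is a minimal sketch using base R's Sys.which(); the tool names checked are just the ones that appear in the error above, and the variable names are made up for illustration.

```r
# Sys.which() looks up executables on the PATH that R itself sees.
# It returns a named character vector; an entry is typically ""
# when no executable of that name was found.
tex_tools <- Sys.which(c("latex", "pdflatex", "xdvi"))
print(tex_tools)

# Names of the tools that R could not locate
missing_tools <- names(tex_tools)[!nzchar(tex_tools)]

if (length(missing_tools) > 0) {
  message("Not found on R's PATH: ", paste(missing_tools, collapse = ", "))
} else {
  message("All TeX tools were found on R's PATH.")
}
```

If anything is reported as missing, R's PATH probably needs adjusting.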
To explicitly set the R path, you can create a text file called .Rprofile in your user directory (for Mac OS X, this is the ~/ directory). In this text file, you can add to the path so that R can access LaTeX commands (following this post):
Sys.setenv(PATH=paste(Sys.getenv("PATH"),"/usr/texbin",sep=":")) # this adds /usr/texbin to the R path
This should work if you have a TeX Live distribution on a UNIX system. (In .Rprofile you can also explicitly set the entire path, rather than just appending /usr/texbin to the existing path, and define any functions you would like to always have in your R environment, etc.)
Sys.getenv("PATH")
# PATH
# "/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/texbin"
library(Hmisc)
x <- matrix(1:6, nrow=2, dimnames=list(c('a','b'),c('c','d','this that')))
latex(x)
#This is pdfTeX, Version 3.1415926-1.40.11 (TeX Live 2010)
#restricted \write18 enabled.
#entering extended mode
Now R can access the TeX binaries!
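The Sys.setenv() line above appends /usr/texbin unconditionally, so it gets added again every time the line is re-run in the same session. A slightly more defensive variant for .Rprofile might look like the sketch below. The helper add_to_path() is a hypothetical name, not part of R or any package, and /usr/texbin is simply the directory used in this post; your TeX installation may live elsewhere.

```r
# Hypothetical helper (not part of base R): append a directory to PATH
# only if it exists and is not already listed.
# Returns the resulting PATH invisibly.
add_to_path <- function(dir) {
  sep <- .Platform$path.sep  # ":" on Unix-alikes, ";" on Windows
  current <- strsplit(Sys.getenv("PATH"), sep, fixed = TRUE)[[1]]
  if (dir.exists(dir) && !(dir %in% current)) {
    Sys.setenv(PATH = paste(c(current, dir), collapse = sep))
  }
  invisible(Sys.getenv("PATH"))
}

# The directory used in this post; adjust for your own installation.
add_to_path("/usr/texbin")
```

Calling add_to_path() twice with the same directory leaves PATH unchanged the second time, and a directory that doesn't exist is silently skipped.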
When installing Matlab 2008b Student Version on Mac OSX 10.6.6, I got an exception error that said something like:
Cant load library: /Applications/MATLAB_R2008bSV.app/bin/maci64/libinstutil.jnilib
It turns out that Matlab 2008b is 32-bit, but the installer was running as 64-bit because of the Java preferences. Installation worked after I followed the notes from this Matlab Central post.
Using the Java Preferences application (in /Applications/Utilities) go to the General tab, and at the bottom, under Java Applications, drag to reorder “Java SE 6 32-bit” to come before “Java SE 6 64-bit”. I only needed to do this while I was trying to run the activation program that is required before Matlab will run; after that you can reset the Java preferences to their original values.
In journal articles, descriptive statistics are often presented as a table of means, with the standard deviation or standard error given in parentheses after each mean. Here is how you can prepare the print formatting when working in R: it’s rather trivial, but I wasted enough time on it that I thought it was worth mentioning. (I follow this by using latex() from the Hmisc package to generate LaTeX output of such a table.)
Suppose I have a vector of means for percent correct from 5 different experimental conditions, and a vector of the corresponding standard errors, and both are sorted in the same order:
all.res.corr$per.correct # Vector of mean percent correct, comes from a data frame called all.res.corr
# 52.53561 60.51282 64.13105 66.38177 67.46439
all.res.corr$se.per.corr # Vector of standard errors, from same data frame
# 2.409955 2.763525 2.831450 2.909737 2.902924
- First, use round() to round the numbers to two decimal places.
round(all.res.corr$per.correct,2)
# 52.54 60.51 64.13 66.38 67.46
round(all.res.corr$se.per.corr,2)
# 2.41 2.76 2.83 2.91 2.90
- Then use formatC() to access C-like formatting options and format the two vectors to print exactly 2 digits after the decimal point. With format = "f", the digits argument sets the number of digits printed after the decimal point. This is a crucial step! If you don’t do it, trailing zeroes won’t print, even though the output in the preceding code block makes it look as if round() preserves the trailing zero. The zero gets lost in the numeric-to-character conversion. See this post for an alternative using sprintf() (which may be familiar from many other programming languages).
# digits is named here because the second positional argument of formatC() is width
formatC(round(all.res.corr$per.correct,2), digits = 2, format = "f")
# "52.54" "60.51" "64.13" "66.38" "67.46" # note conversion to character type
formatC(round(all.res.corr$se.per.corr,2), digits = 2, format = "f")
# "2.41" "2.76" "2.83" "2.91" "2.90"
- Finally, use paste() to concatenate the vectors for publication, with parentheses in the appropriate places. The code below concatenates its arguments: the formatted mean vector all.res.corr$per.correct, a space and an opening parenthesis (, the formatted SE vector all.res.corr$se.per.corr, and a closing parenthesis ). The separator sep is set to an empty string so that nothing is inserted between the concatenated arguments.
(per.all.res.corr.print <- paste(formatC(round(all.res.corr$per.correct,2), digits = 2, format = "f"),
  " (",
  formatC(round(all.res.corr$se.per.corr,2), digits = 2, format = "f"),
  ")", sep=""))
# n.b. Parentheses around an assignment command print the assigned value
# "52.54 (2.41)" "60.51 (2.76)" "64.13 (2.83)" "66.38 (2.91)" "67.46 (2.90)"
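For comparison, here is a sketch of the sprintf() alternative mentioned above, which rounds, pads, and pastes in a single call. The vectors means and ses are stand-ins I've made up for the two data frame columns, with values copied from the output shown earlier.

```r
# Stand-ins for all.res.corr$per.correct and all.res.corr$se.per.corr,
# with values copied from the output shown earlier in this post
means <- c(52.53561, 60.51282, 64.13105, 66.38177, 67.46439)
ses   <- c(2.409955, 2.763525, 2.831450, 2.909737, 2.902924)

# Why round() alone is not enough: the trailing zero disappears
# once the number is converted to character.
as.character(round(ses[5], 2))
# "2.9"

# "%.2f" prints exactly two digits after the decimal point;
# sprintf() is vectorised over both inputs.
mean_se <- sprintf("%.2f (%.2f)", means, ses)
mean_se
# "52.54 (2.41)" "60.51 (2.76)" "64.13 (2.83)" "66.38 (2.91)" "67.46 (2.90)"
```

The result is a character vector just like the paste() version, so it can be dropped into a matrix or data frame and handed to latex() in the same way.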