Jonathan Olmsted

RcppTN v0.1-4

A new release of RcppTN is up at https://github.com/olmjo/RcppTN. For general information on the package, see here. Installation instructions are on the github page.

Changes in this release are minimal and focus on a more complete C++ API.

From NEWS:

RcppTN NEWS File
RcppTN 0.1-4
------------
* Added vtn1() to C++ API.

Questions and comments are welcomed on github or by email (see below).

R Packages in Subdirectories on Travis CI

Travis CI is a really nice tool for automated builds of projects. It supports several languages (e.g., C++, Python, Ruby) and lets users configure the build environment before the build is tested. Effectively, you give Travis CI a set of instructions for setting up a virtual machine. The VM follows these instructions and then attempts to build your project, all in nearly real time. What I like most about this setup is that it simultaneously tests the build and your explicit assumptions about dependencies. Officially, R isn’t “supported”, but the platform can be, and has been, used for R package development.

  • Yihui Xie wrote a post about this last year, but I did not really appreciate the potential of the system then.

  • I had noticed the Travis CI build status badge on a few of Hadley Wickham’s github repos, but had not really dug into it.

  • Then, after looking at the github repo for Dirk Eddelbuettel and co.’s new header-only R package for the Boost libraries, I realized I wanted to look into Travis CI more.

So, I did.

In particular, I wanted to start testing builds of RcppTN beyond my local machine in the lead up to an initial CRAN release.

Simple Tweak for Subdirectory Builds

Installing an R package that is published in a github repo is made insanely simple with Hadley Wickham’s devtools package, and it even handles R packages that live in a subdirectory rather than at the root of the repo. The instructions here show use of that feature.

However, even though Travis CI has tight github integration (e.g. testing the build on every commit), all of the examples I had seen used a Travis CI configuration (i.e. .travis.yml) for an R package whose contents were at the root of the github repo. Not so for RcppTN!

I have not taken the time to think about the cleanest way to generalize the “fix”, but instead of the default

Sample Config
# Sample .travis.yml for R projects.
#
# See README.md for instructions, or for more configuration options,
# see the wiki:
#   https://github.com/craigcitro/r-travis/wiki

language: c

before_install:
  - curl -OL http://raw.github.com/craigcitro/r-travis/master/scripts/travis-tool.sh
  - chmod 755 ./travis-tool.sh
  - ./travis-tool.sh bootstrap
install:
  - ./travis-tool.sh install_deps
script: ./travis-tool.sh run_tests

on_failure:
  - ./travis-tool.sh dump_logs

notifications:
  email:
    on_success: change
    on_failure: change

I needed the following:

Example of Change
install:
  - cd ./myRpkg
  - ../travis-tool.sh install_deps
script: ../travis-tool.sh run_tests

on_failure:
  - ../travis-tool.sh dump_logs

From this, it’s clear that the Travis CI build system is less like make and more like typing the commands manually at a terminal. In make, each line of a recipe is run in its own shell, so a change of working directory does not carry over from one line to the next. So, with make

Make Example 1
target :
	cd src ; python ./script.py

is different from

Make Example 2
target :
	cd src
	python ./script.py

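In the first, script.py runs inside src. In the second, the cd happens in a shell that exits immediately, so python looks for ./script.py in the original directory. With Travis CI, though, it’s just like typing things in a shell: once you’ve cd-ed into a directory, you stay there until you explicitly navigate elsewhere.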

Getting Started

If you need information on getting started with Travis CI and R, see here.

RcppTN v0.1-3

With the release of RcppTN version 0.1-3 (see https://github.com/olmjo/RcppTN), the package has reached a new level of feature-completeness (which is not to say it’s complete). Installation instructions are on the github page.

Features

  • Fast R-level RNG from arbitrary Truncated Normal distributions (respecting R’s RNG seed).
  • Calculation of expectations and variances for arbitrary Truncated Normal distributions.
  • And, most importantly, the same functionality is exposed at the C++ level.

More Detail

From NEWS:

RcppTN 0.1-3
------------

* Added etn() function for calculation of 
  theoretical expectations from Truncated Normal distributions.
* The rtn() function warns on NA return values.
* Checks for valid inputs include a finite mean and finite sd of the 
  parent Normal distribution. Warns on invalid.
* Checks for correctly sized inputs. Stops on invalid.
* Added vtn() function for calculation of 
  theoretical variances from Truncated Normal distributions.

RcppTN 0.1-2
------------

* The rtn() function checks for valid inputs and returns NA for bad inputs.

RcppTN 0.1-0
------------

* initial public release

From ChangeLog:

2013-12-04  Jonathan Olmsted <jpolmsted@gmail.com>

  * src/vtn1.cpp, src/vtn.cpp, src/vtnRcpp.cpp, R/vtn.R: Add
  variance calculation functionality.
  * R/rtn.R, R/etn.R: Errors on wrongly sized inputs.
  * src/rtnRcpp.cpp: Memory bug-inducing scope issue fixed.
  * src/check1.cpp: Parens added for compiler happiness.

2013-12-02  Jonathan Olmsted <jpolmsted@gmail.com>

  * DESCRIPTION: version 0.1-3
  * R/rtn.R: R warning on NA return values. Negligible effect on
  performance.
  * src/check1.cpp: Check for finite mean. Check for finite sd.
  * src/etn.cpp: Add expectation calculation at the C++ level.
  * R/etn.R: Add expectation calculation at the R level.

2013-11-27  Jonathan Olmsted <jpolmsted@gmail.com>

  * ChangeLog: ChangeLog added. Previous changes not explicitly
  documented beyond commits.
  * src/rtn.cpp: rtn() checks for valid inputs and returns NA_REAL
  for invalid inputs.
  * vignettes/using.Rnw: shows new NA return values.
  * DESCRIPTION: version 0.1-2

Rcpp Updates: the sample function

Two posts on the rcpp-gallery came out of some work I had to do using an accept-reject sampler. I came to the project after the sampler had already been implemented in R. Given the performance of the code as it was, the job was going to take too long, even on powerful machines. Trying some go-to R tricks didn’t result in enough of a performance boost, so I decided to implement it in C++ using Rcpp.

The sampler itself used R’s sample() function, so I went looking for the corresponding functionality in C++ (assuming Rcpp would have exposed it). And, while much of R’s random number generation is easily accessed, there was no clean way to hook into the C code underlying sample().

Christian Gunning addressed this by contributing a patch to RcppArmadillo which exposes much of the functionality of R’s sample(). We write about it here.

None of the examples in the above link really allow the Rcpp implementation to shine, so I put together an example accept-reject sampler that runs in C++-land thanks to Rcpp. You can see the results with fully working code here.

Altogether, you may not need to use Rcpp if all you want to do is call sample() once. But, if you have to repeatedly make calls to sample(), the code and the documentation exist for you to implement it readily in Rcpp.

Installing Rmpi

After I had to search for the same website several times this past year, setting up Rmpi again on my work machine (a Mac Pro) was the final straw. Thanks to this and likely other pages over the last two years, I use the following for Rmpi with my MacPorts install of OpenMPI:

Rmpi Installation Code
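# configure.args elements are pasted, space-separated, into --configure-args
install.packages("Rmpi",
                 type = "source",
                 configure.args = c(
                   "--with-Rmpi-include=/opt/local/include/openmpi/",
                   "--with-Rmpi-libpath=/opt/local/lib/openmpi/",
                   "--with-Rmpi-type=OPENMPI"
                 ))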

@ Princeton

If you happen to be setting up Rmpi on one of the TIGRESS systems at Princeton, I suggest a slightly different solution. It has the same effect, but is a bit more self-documenting, which makes it more obvious to users how to edit the setup as their environment changes.

Assuming you use Bash as a shell (and if you don’t know what that means, on the TIGRESS systems, it means you use Bash), add the following to .bashrc.

.bashrc Addition
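## MPI START ##
## OpenMPI 1.4.5 built with GCC; point MPI_ROOT elsewhere to switch.
MPI_ROOT=/usr/local/openmpi/1.4.5/gcc/x86_64
export MPI_ROOT
## Prepend to (rather than overwrite) any existing library path.
LD_LIBRARY_PATH=$MPI_ROOT/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}
export LD_LIBRARY_PATH
PATH=$MPI_ROOT/bin:$PATH
export PATH
## MPI END ##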

This sets several environment variables. It is not a flexible approach: it requires OpenMPI version 1.4.5 (compiled with GCC). Should you want something else, you need to edit these values accordingly.

Deviates from a Truncated Normal Distribution via Rcpp

Some edits on April 24, 2013

Motivation

Much of the code that I use for my dissertation project is in C++. I need to generate random variables from a Truncated Normal Distribution, and I could not find anything in the Boost libraries or the like. So, I had to implement my own generator. This led me to an old article: Robert (1995).

After implementing the sampler described in the paper, I decided to compare its performance to that provided by the truncnorm package in R. While this involved me wrapping some functions around my C++ code and creating some function definitions in R, it wasn’t much additional work.

OpenMPI + InfiniBand

This post has been edited and the updates are noted below.

I’ve been working on a periodic rebuild of the Beowulf cluster that we use in the star lab, and I stumbled upon a strange message while setting up the OpenMPI implementation of MPI. In case it matters (I can’t imagine it will), the cluster is being built on Fedora 16.

RNG Performance with Rcpp

Background

Much of the work in my dissertation focuses on generalizing the models used in Political Science for the estimation of ideal points. Despite some shortcomings, there hasn’t been much development of the original model using the canonical roll call voting data. Instead, the innovation has occurred by using novel kinds of data (e.g. speech text or campaign contributions).

In working on some of the shortcomings in how the canonical approach is used in practice, it became necessary to implement the mechanics of the Bayesian inference problem (which uses MCMC methods) in C++. Now, since R is a much nicer interface than C++, the real goal is to perform only the “hard” operations in C++ and keep the rest in R. Enter Rcpp, of course.

Motivation

I was making some speed improvements to my C++ code (which heavily uses Rcpp) and I had to make the decision of how to call the RNG in C++. I could use the old R API with something like Rf_rnorm(0.0,1.0) which would return a double or I could use rnorm(1, 0.0, 1.0)[0] which would also return a double. Now, this second option is not really in the spirit of the syntactic sugar provided by Rcpp. The new API would just as easily let me construct an Rcpp::NumericVector of length N populated with N draws from the Standard Normal distribution with an rnorm(N, 0.0, 1.0) expression. However, short of rewriting a lot of code, I wanted (and needed) to take single draws many times.

The important question became: how does performance differ between taking draws one by one via the old R API and via the new sugary Rcpp API? And, while I was at it, I decided to throw in comparisons to the new Rcpp API used as intended (vectorized) and to calling the whole thing from R.

Translated Exponential Distribution

The first thing I did on this Day of Independence was run a 5k in Brighton, NY. I was quite happy with the result. As someone who has only recently learned how to run intelligently (despite thinking I already knew how), I’m beginning to see my average pace get faster and faster, and this is rewarding.

But who titles a post about running “Translated Exponential Distribution”? Not even me. So, let’s move on.

The second thing I did today was fix some C++ code for random number generation from a Truncated Normal Distribution. The naive Accept-Reject algorithm is great in some cases and terrible in others. None of the R packages that offered an implementation really met my needs, so I implemented the sampler described in Robert (1995) (gated).

Although there is an R interface for my own end-use, everything important happens in C++ via Rcpp. For some parameter settings (see the paper, of course), the algorithm calls for simulating from a Translated Exponential distribution. Personally, I’d never come across this distribution (which maybe should be a point of embarrassment?). Google wasn’t terribly helpful, although I probably could have derived the solution if I felt so inclined. But, that’s more work than the problem really merits given the simplicity of the relationship.

ggplot Snippet

I had no idea where on my computer to save the following R snippet to use with ggplot. And, there is no chance I’d remember it because I’ve only used it a handful of times (and rarely “combine” figures within R). Hopefully, this will be a useful storage location.

Below, the objects “A” and “B” must already exist as ggplot objects, e.g. the output of a ggplot(data) + ... expression.

ggplot Snippet
library(grid)  # grid.newpage(), pushViewport(), viewport(), grid.layout()
grid.newpage()
pushViewport(viewport(layout = grid.layout(1, 2)))
vplayout <- function(x, y)
  viewport(layout.pos.row = x, layout.pos.col = y)
print(A, vp = vplayout(1, 1))
print(B, vp = vplayout(1, 2))