Jonathan Olmsted

Rcpp Updates: the sample function

| Comments

Two posts on the rcpp-gallery came out of some work I had to do using an accept-reject sampler. I came to the project after the sampler had already been implemented in R. Given the performance of the code as it was, the job was going to take too long, even on powerful machines. Trying some go-to R tricks didn’t result in enough of a performance boost, so I decided to implement in C++ using Rcpp

The sampler itself used R’s sample() function, so I went looking for the corresponding functionality in C++ (assuming Rcpp would have exposed it). And, while much of R’s random number generation is easily accessed, there was no clean way to hook into the C code underlying sample().

Christian Gunning addressed this by contributing a patch to RcppArmadillo which exposes much of the functionality of R’s sample(). We write about it here.

None of the examples in the above link really allow the Rcpp implementation to shine, so I put together an example accept-reject sampler that runs in C++-land thanks to Rcpp. You can see the results with fully working code here.

Altogether, you may not need to use Rcpp if all you want to do is call sample() once. But, if you have to repeatedly make calls to sample(), the code and the documentation exist for you to implement it readily in Rcpp.

Installing Rmpi

| Comments

After having to search for the same website several times this past year, re-setting Rmpi up on my work machine (Mac Pro) was the final straw. Thanks to this and likely others over the last two years, I use the following for Rmpi with my macports install of OpenMPI:

Rmpi Installation Code
1
2
3
4
5
6
install.packages("Rmpi", configure.args="
--with-Rmpi-include=/opt/local/include/openmpi/
--with-Rmpi-libpath=/opt/local/lib/openmpi/
--with-Rmpi-type=OPENMPI
"
)

@ Princeton

If you happen to be setting up Rmpi on one of the TIGRESS systems at Princeton, I suggest a slightly different solution. It has the same effect, but is a bit more self-documenting which allows for users to more obviously edit the setup as their environment changes.

Assuming you use Bash as a shell (and if you don’t know what that means, on the TIGRESS systems, it means you use Bash), add the following to .bashrc.

.bashrc Addition
1
2
3
4
5
6
7
8
## MPI START ##
MPI_ROOT=/usr/local/openmpi/1.4.5/gcc/x86_64/
export MPI_ROOT
LD_LIBRARY_PATH=/usr/local/openmpi/1.4.5/gcc/x86_64/lib64
export LD_LIBRARY_PATH
PATH=/usr/local/openmpi/1.4.5/gcc/x86_64/bin:$PATH
export PATH
## MPI END ##

This sets the values of several environmental variables. This is not a flexible approach. It is requiring you to use OpenMPI version 1.4.5 (compiled withg GCC). Should you want something else, you need to edit these values accordingly.

Deviates from a Truncated Normal Distribution via Rcpp

| Comments

Some edits on April 24 2013

Motivation

Much of the code that I use for my dissertation project is in C++. I need to generate random variables from a Truncated Normal Distribution and I could not find anything in the Boost libraries or the like. So, I had to implement my own generator. This led me an old article: Robert (1995).

After implementing the sampler described in the paper, I decided to compare its performance to that provided by the truncnorm package in R. While this involved me wrapping some functions around my C++ code and creating some function definitions in R, it wasn’t much additional work.

OpenMPI + InfiniBand

| Comments

This post has been edited and the updates are noted below.

I’ve been working on a periodic rebuilding of the Beowulf cluster that we use in the star lab. I stumbled upon a strange message in setting up the OpenMPI implementation of MPI. In case it matters, but I can’t imagine it will, the cluster is being developed on Fedora Core 16.

RNG Performance with Rcpp

| Comments

Background

Much of the work in my dissertation focuses on generalizing the models used in Political Science for the estimation of ideal points. Despite some shortcomings, there hasn’t been much development of the original model using the canonical roll call voting data. Instead, the innovation has occurred by using novel kinds of data (e.g. speech text or campaign contributions).

In working on some of the shortcomings in how the canonical approach is used in practice, it became necessary to implement mechanics of the Bayesian inference problem—which uses MCMC methods—in C++. Now, since R is a very nice interface relative to C++, the real goal is to just perform the “hard” operations in C++ and keep the rest in R. Enter Rcpp, of course.

Motivation

I was making some speed improvements to my C++ code (which heavily uses Rcpp) and I had to make the decision of how to call the RNG in C++. I could use the old R API with something like Rf_rnorm(0.0,1.0) which would return a double or I could use rnorm(1, 0.0, 1.0)[0] which would also return a double. Now, this second option is not really in the spirit of the syntactic sugar provided by Rcpp. The new API would just as easily let me construct an Rcpp::NumericVector of length N populated with N draws from the Standard Normal distribution with an rnorm(N, 0.0, 1.0) expression. However, short of rewriting a lot of code, I wanted (and needed) to take single draws many times.

The important question to me became, how does performance differ when taking draws one-by-one via the old R API and the new sugary Rcpp API. And, while I was at it, I decided to throw in comparisons to the new Rcpp API used as intended and calling the whole thing from R.

Translated Exponential Distribution

| Comments

The first thing I did on this Day of Independence was run a 5k in Brighton, NY. I was quite happy with the result. As someone who has only recently learned how to run intelligently (regardless of the fact that I thought I knew how to run), I’m beginning to see my average pace get faster and faster. And, this is rewarding.

But who titles a post about running “Translated Exponential Distribution”? Not even me. So, let’s move on.

The second thing I did today was fix some C++ code for random number generation from a Truncated Normal Distribution. The naive Accept-Reject algorithm is great in some cases and terrible in others. None of the R packages that offered an implementation really met my needs, so I implemented the sampler described in Robert (1995) (gated).

Although there is an R interface for my own end-use, everything important happens in C++ via Rcpp. For some parameter settings (see the paper, of course), the algorithm calls for simulating from a Translated Exponential distribution. Personally, I’d never come across this distribution (which maybe should be a point of embarrassment?). Google wasn’t terribly helpful, although I probably could have derived the solution if I felt so inclined. But, that’s more work than the problem really merits given the simplicity of the relationship.

ggplot Snippet

| Comments

I had no idea where on my computer to save the following R snippet to use with ggplot. And, there is no chance I’d remember it because I’ve only used it a handful of times (and rarely “combine” figures within R). Hopefully, this will be a useful storage location.

Below, the objects “A” and “B” must otherwise exist as the output of a ggplot(data) + ... expression.

ggplot Snippet
1
2
3
4
5
6
grid.newpage()
pushViewport(viewport(layout = grid.layout(1, 2)))
vplayout <- function(x, y)
viewport(layout.pos.row = x, layout.pos.col = y)
print(A, vp = vplayout(1, 1))
print(B, vp = vplayout(1, 2))

New Site Launch

| Comments

Over the last few weeks, I’ve been porting my old static site which I created with a pretty bare-bones templater (htp, to be specific) to a new static site where I use jekyll. In the future, I might consider switching over to jekyll bootstrap, but the marginal improvement will likely be much smaller from that.

The reason I switched was that the old templater was so inelegant and clunky that I ended up not updating content as often as I should have. This switch should remove some of that friction.