Jonathan P. Olmsted
Blog >  

RNG Performance with Rcpp

22 Aug 2012

Background

Much of the work in my dissertation focuses on generalizing the models used in Political Science for the estimation of ideal points. Despite some shortcomings, there hasn’t been much development of the original model using the canonical roll call voting data. Instead, the innovation has occurred by using novel kinds of data (e.g. speech text or campaign contributions).

In working on some of the shortcomings in how the canonical approach is used in practice, it became necessary to implement mechanics of the Bayesian inference problem—which uses MCMC methods—in C++. Now, since R is a very nice interface relative to C++, the real goal is to just perform the “hard” operations in C++ and keep the rest in R. Enter Rcpp, of course.

Motivation

I was making some speed improvements to my C++ code (which heavily uses Rcpp) and I had to make the decision of how to call the RNG in C++. I could use the old R API with something like Rf_rnorm(0.0,1.0) which would return a double or I could use rnorm(1, 0.0, 1.0)[0] which would also return a double. Now, this second option is not really in the spirit of the syntactic sugar provided by Rcpp. The new API would just as easily let me construct an Rcpp::NumericVector of length N populated with N draws from the Standard Normal distribution with an rnorm(N, 0.0, 1.0) expression. However, short of rewriting a lot of code, I wanted (and needed) to take single draws many times.

The important question to me became, how does performance differ when taking draws one-by-one via the old R API and the new sugary Rcpp API. And, while I was at it, I decided to throw in comparisons to the new Rcpp API used as intended and calling the whole thing from R.

Design

I use the inline, Rcpp, and rbenchmark packages for the test. The first two packages are used to facilitate use of C++ code. The last provides a nice infrastructure to time and compare the evaluation of expressions. Each of 4 approaches is used to take 100,000 draws from a Standard Normal distribution:

  1. loop over 100,000 singleton draws from the old R API in C++ via Rcpp (src1)

  2. loop over 100,000 draws singleton draws from the new Rcpp API in C++ via Rcpp (src2)

  3. a single call to the new Rcpp API asking for a vector of 100,000 draws in C++ via Rcpp (src3)

  4. a single call to the standard RNG in R asking for a vector of 100,000 draws

Now, I was certain that (3) would outperform (2), but I didn’t really have any a priori expectations beyond that. You can see my source below for the specifics.

Results

The results are below, lazily formatted as comments in R code. Like we might expect, using the new Rcpp-provided API as intended (Rcpp/Inline/New RNG API/No Loop) is faster than looping over it many times while taking singleton draws (Rcpp/Inline/New RNG API/Loop). Interestingly, though, the new Rcpp-provided API is the fastest method considered. Looping over the old R API (Rcpp/Inline/Old RNG API/Loop) is about 25\% slower. The old R API method and the straight R approach (Straight R) are very close, with the call to the RNG beginning in R (Straight R) being slightly slower. Both of these two approaches outperform the loop over the new Rcpp API (Rcpp/Inline/New RNG API/Loop).

For my purposes, I just needed to see that Rcpp/Inline/Old RNG API/Loop outperformed Rcpp/Inline/New RNG API/Loop, but I couldn’t find any of these benchmarks anywhere online, so I thought I’d put up the full version. I always love it when I come across pareto optimal-type results like this. If you are working in C++, Rcpp/Inline/New RNG API/No Loop not only saves development time (i.e. one line of code, not several), but it saves computational time.

##                              test replications elapsed relative
## 3 Rcpp/Inline/New RNG API/No Loop         1000  16.398 1.000000
## 1    Rcpp/Inline/Old RNG API/Loop         1000  20.191 1.231309
## 4                      Straight R         1000  23.163 1.412550
## 2    Rcpp/Inline/New RNG API/Loop         1000  34.193 2.085193

Edit: Previous results that were posted here and have since been replaced had no optimization performed by the compiler for the three Rcpp-based functions I used. The results that are currently displayed above were compiled with -O3. The rank ordering of the original results does not change. However, some of the discussion of the results reflects new relative performance. Additionally, the number of replications was increased as a modest attempt to make the relative performance results more stable.

Source

library(inline)
library(Rcpp)
library(rbenchmark)

src1 <- "
int N_ = as<int>(n) ;
Rcpp::NumericVector draws(N_) ;
RNGScope scope ;
for(int i = 0; i < N_; i++) {
  draws(i) = Rf_rnorm(0.0, 1.0) ;
}
return draws ;
"

src2 <- "
int N_ = as<int>(n) ;
Rcpp::NumericVector draws(N_) ;
RNGScope scope ;
for(int i = 0; i < N_; i++) {
  draws(i) = rnorm(1, 0.0, 1.0)[0] ;
}
return draws ;
"

src3 <- "
int N_ = as<int>(n) ;
Rcpp::NumericVector draws(N_) ;
RNGScope scope ;
draws = rnorm(N_, 0.0, 1.0) ;
return draws ;
"

GetDraws1 <- cxxfunction(signature(n = "integer"),
                         body = src1,
                         plugin = "Rcpp"
                         )

GetDraws2 <- cxxfunction(signature(n = "integer"),
                         body = src2,
                         plugin = "Rcpp"
                         )

GetDraws3 <- cxxfunction(signature(n = "integer"),
                         body = src3,
                         plugin = "Rcpp"
                         )


.n <- 1e5

benchmark("Rcpp/Inline/Old RNG API/Loop" = {set.seed(1); GetDraws1(.n)},
          "Rcpp/Inline/New RNG API/Loop" = {set.seed(1); GetDraws2(.n)},
          "Rcpp/Inline/New RNG API/No Loop" = {set.seed(1); GetDraws3(.n)},
          "R" = {set.seed(1); rnorm(.n)},
          columns=c("test",
            "replications",
            "elapsed",
            "relative"
            ),
          order="relative",
          replications=1000
          )