Background
Much of the work in my dissertation focuses on generalizing the models used in Political Science for the estimation of ideal points. Despite some shortcomings, there hasn’t been much development of the original model using the canonical roll call voting data. Instead, the innovation has occurred by using novel kinds of data (e.g. speech text or campaign contributions).
In working on some of the shortcomings in how the canonical approach is used in practice, it became necessary to implement mechanics of the Bayesian inference problem—which uses MCMC methods—in C++. Now, since R is a very nice interface relative to C++, the real goal is to just perform the “hard” operations in C++ and keep the rest in R. Enter Rcpp, of course.
Motivation
I was making some speed improvements to my C++ code (which heavily
uses Rcpp) and I had to make the decision of how to call the RNG in
C++. I could use the old R API with something like Rf_rnorm(0.0,1.0)
which would return a double or I could use rnorm(1, 0.0, 1.0)[0]
which would also return a double. Now, this second option is not
really in the spirit of the syntactic sugar provided by Rcpp. The new
API would just as easily let me construct an Rcpp::NumericVector of
length N populated with N draws from the Standard Normal distribution
with an rnorm(N, 0.0, 1.0) expression. However, short of rewriting a
lot of code, I wanted (and needed) to take single draws many times.
The important question to me became, how does performance differ when taking draws one-by-one via the old R API and the new sugary Rcpp API. And, while I was at it, I decided to throw in comparisons to the new Rcpp API used as intended and calling the whole thing from R.
Design
I use the inline, Rcpp, and rbenchmark packages for the
test. The first two packages are used to facilitate use of C++
code. The last provides a nice infrastructure to time and compare the
evaluation of expressions. Each of 4 approaches is used to take
100,000 draws from a Standard Normal distribution:
loop over 100,000 singleton draws from the old R API in C++ via Rcpp (
src1)loop over 100,000 draws singleton draws from the new Rcpp API in C++ via Rcpp (
src2)a single call to the new Rcpp API asking for a vector of 100,000 draws in C++ via Rcpp (
src3)a single call to the standard RNG in R asking for a vector of 100,000 draws
Now, I was certain that (3) would outperform (2), but I didn’t really have any a priori expectations beyond that. You can see my source below for the specifics.
Results
The results are below, lazily formatted as comments in R code. Like we might expect, using the new Rcpp-provided API as intended (Rcpp/Inline/New RNG API/No Loop) is faster than looping over it many times while taking singleton draws (Rcpp/Inline/New RNG API/Loop). Interestingly, though, the new Rcpp-provided API is the fastest method considered. Looping over the old R API (Rcpp/Inline/Old RNG API/Loop) is about 25\% slower. The old R API method and the straight R approach (Straight R) are very close, with the call to the RNG beginning in R (Straight R) being slightly slower. Both of these two approaches outperform the loop over the new Rcpp API (Rcpp/Inline/New RNG API/Loop).
For my purposes, I just needed to see that Rcpp/Inline/Old RNG API/Loop outperformed Rcpp/Inline/New RNG API/Loop, but I couldn’t find any of these benchmarks anywhere online, so I thought I’d put up the full version. I always love it when I come across pareto optimal-type results like this. If you are working in C++, Rcpp/Inline/New RNG API/No Loop not only saves development time (i.e. one line of code, not several), but it saves computational time.
## test replications elapsed relative
## 3 Rcpp/Inline/New RNG API/No Loop 1000 16.398 1.000000
## 1 Rcpp/Inline/Old RNG API/Loop 1000 20.191 1.231309
## 4 Straight R 1000 23.163 1.412550
## 2 Rcpp/Inline/New RNG API/Loop 1000 34.193 2.085193
Edit: Previous results that were posted here and have since been
replaced had no optimization performed by the compiler for the three
Rcpp-based functions I used. The results that are currently
displayed above were compiled with -O3. The rank ordering of the
original results does not change. However, some of the discussion of
the results reflects new relative performance. Additionally, the
number of replications was increased as a modest attempt to make the
relative performance results more stable.
Source
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 | |