Background
R dogma holds that for loops are bad because they are slow, but this is not the case in C++. I had never programmed a line of C++ until last week, when my beloved firstborn started university and enrolled in an introductory C++ course, so I thought I would try to learn some and see whether it would speed up Passing Bablok regression.
Passing Bablok Regression
As mentioned in the past, the field of Clinical Chemistry has a peculiar devotion to Passing Bablok regression… and hey, why not?
Here is the code for a minimal implementation of Passing Bablok regression as discussed in this paper. This is the scale-invariant version.
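As a reading aid for the code below (my own summary of the estimator; consult the paper for the details): for every pair of points i < j, compute the pairwise slope

S_ij = (y_i - y_j) / (x_i - x_j)

The slope estimate b is then a shifted median of the sorted S values, where the offset K = floor(neg/2), with neg the number of negative pairwise slopes, shifts the median to account for those negative slopes. The intercept estimate is a = median(y - b*x).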
PB.reg <- function(x, y){
  lx <- length(x)
  l <- lx*(lx - 1)/2
  k <- 0
  S <- rep(NA, l)
  # Pairwise slopes for every pair of points i < j
  for (i in 1:(lx - 1)) {
    for (j in (i + 1):lx) {
      k <- k + 1
      S[k] <- (y[i] - y[j])/(x[i] - x[j])
    }
  }
  S.sort <- sort(S)
  N <- length(S.sort)
  neg <- length(subset(S.sort, S.sort < 0))
  K <- floor(neg/2)
  # Slope: the K-shifted median of the sorted pairwise slopes
  if (N %% 2 == 1) {
    b <- S.sort[(N + 1)/2 + K]
  } else {
    b <- sqrt(S.sort[N/2 + K]*S.sort[N/2 + K + 1])
  }
  # Intercept: the median residual at slope b
  a <- median(y - b * x)
  return(as.vector(c(a, b)))
}
So let’s make some fake data and see what we get:
set.seed(314)
x <- seq(1, 100, length.out = 100)
y <- 1.1*x + 20 + x*rnorm(100, 0, 0.10)
reg <- PB.reg(x, y)
round(reg, 2)
## [1] 18.90 1.14 |
Just to sanity check, we can get the coefficients from the mcr package.
library(mcr)
reg.mcr <- mcreg(x, y)
round(reg.mcr@glob.coef, 2)
## [1] 18.90 1.14 |
Ok, looks good.
Rcpp
The Rcpp package permits compiling and executing C++ code from within R, which can be good for computationally intensive tasks. Here is my child-like attempt at a recapitulation of the R script above.
#include <iostream>
#include <vector>
#include <cmath>
#include <limits>
#include <algorithm>
#include <numeric>
#include "Rcpp.h"
using namespace std;
using namespace Rcpp;

// [[Rcpp::plugins("cpp11")]]
// [[Rcpp::export]]
vector<double> PB(vector<double> x, vector<double> y){
  int lx = x.size();
  int l = lx*(lx - 1)/2;
  int k = -1;
  vector<double> S(l, numeric_limits<double>::quiet_NaN());
  vector<double> sortS;
  vector<double> neg;
  double b;
  double a;
  vector<double> aVec(x.size());
  vector<double> coef(2);
  // Calculate the components of the S Matrix
  for(int i = 0; i < lx - 1; i++){
    for(int j = i + 1; j < lx; j++){
      k += 1;
      S[k] = (y[i] - y[j])/(x[i] - x[j]);
    }
  }
  // Sort the S Matrix
  sort(S.begin(), S.end());
  // Delete any undefined values from S
  for(vector<double>::iterator it = S.begin(); it != S.end(); ++it){
    if (!isnan(*it)) sortS.push_back(*it);
  }
  // Put all the negative values into a vector called neg
  for(vector<double>::iterator it = S.begin(); it != S.end(); ++it){
    if ((*it) < 0) neg.push_back(*it);
  }
  // Calculate N and K
  int K = int(floor(float(neg.size()/2)));
  int N = sortS.size();
  // Calculate the slope
  if (N % 2 == 1) {
    b = S[static_cast<int>((N + 1)/2 + K - 1)];
  } else {
    b = sqrt(S[static_cast<int>(N/2 + K - 1)]*S[static_cast<int>(N/2 + K)]);
  }
  // Make a vector aVec of the estimates of the intercept
  for(int i = 0; i < x.size(); i++){
    aVec[i] = y[i] - b*x[i];
  }
  sort(aVec.begin(), aVec.end());
  // Calculate the median from the sorted values of aVec
  if(x.size() % 2 == 1){
    a = aVec[static_cast<int>((x.size() + 1.0)/2.0 - 1.0)];
  } else {
    a = (aVec[static_cast<int>(x.size()/2.0 - 1.0)] + aVec[static_cast<int>(x.size()/2.0)])/2.0;
  }
  // Report results
  coef[0] = a;
  coef[1] = b;
  return coef;
}

int main(){
  return 0;
}
Saving this to a local directory as PB_Reg.cpp, we can compile it and make the function available in the R session as follows:
library(Rcpp)
sourceCpp("PB_Reg.cpp")
And run it:
round(PB(x,y),2) |
## [1] 18.90 1.14 |
Looks good! But is it faster? Like, really faster? Let's find out, and let's also compare it against what the compiler package produces:
library(compiler)
PB.reg.cmpfun <- cmpfun(PB.reg)

library(rbenchmark)
bmark <- benchmark(
  Rscript = PB.reg(x, y),
  `Compiler Package` = PB.reg.cmpfun(x, y),
  Rcpp = PB(x, y),
  replications = 100,
  order = "relative"
)
knitr::kable(bmark)
|   | test             | replications | elapsed | relative | user.self | sys.self | user.child | sys.child |
|---|------------------|--------------|---------|----------|-----------|----------|------------|-----------|
| 3 | Rcpp             | 100          | 0.024   | 1.000    | 0.024     | 0.000    | 0          | 0         |
| 2 | Compiler Package | 100          | 0.214   | 8.917    | 0.214     | 0.000    | 0          | 0         |
| 1 | Rscript          | 100          | 0.217   | 9.042    | 0.213     | 0.004    | 0          | 0         |
So the C++ code is about 9x faster than the R source here. Not bad, but in this case it was probably not worth the effort. Oh well, I learned something. Incidentally, the compiler package did not help much in this case, as it is only marginally faster than raw R source; I have seen better results from compiler in other cases.
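For what it's worth, here is the sort of toy case where I would expect cmpfun() to show a bigger gain: a function dominated by scalar arithmetic in a tight loop. The slow.sum function below is made up for illustration, and note that recent versions of R byte-compile functions automatically through the JIT, which narrows the gap:

library(compiler)
library(rbenchmark)

# A deliberately loop-heavy function: scalar arithmetic, nothing vectorized
slow.sum <- function(n) {
  s <- 0
  for (i in 1:n) {
    s <- s + i^2
  }
  s
}
slow.sum.cmp <- cmpfun(slow.sum)

benchmark(
  interpreted = slow.sum(1e5),
  compiled = slow.sum.cmp(1e5),
  replications = 100,
  order = "relative"
)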
Final thought
Speaking of speed:
“Free yourself, like a gazelle from the hand of the hunter, like a bird from the snare of the fowler.”
Proverbs 6:5
Comments

The dogma to avoid for loops in R exists because, most of the time, it is better to rely on vectorized operations. For this function, a vectorized approach in R is much faster than the looped version:
PB.reg.vectorized <- function(x, y) {
xdiffs <- outer(x, x, "-")
xdiffs <- xdiffs[lower.tri(xdiffs)]
ydiffs <- outer(y, y, "-")
ydiffs <- ydiffs[lower.tri(ydiffs)]
S <- ydiffs / xdiffs
S.sort <- sort(S)
N <- length(S.sort)
neg <- length(subset(S.sort,S.sort < 0))
K <- floor(neg/2)
if (N %% 2 == 1) {
b <- S.sort[(N+1)/2+K]
} else {
b <- sqrt(S.sort[N / 2 + K]*S.sort[N / 2 + K + 1])
}
a <- median(y - b * x)
return(as.vector(c(a,b)))
}
library(microbenchmark)
set.seed(314)
x <- seq(1, 100, length.out = 100)
y <- 1.1* x + 20 + x*rnorm(100,0,0.10)
microbenchmark(
original = PB.reg(x, y),
vectorized = PB.reg.vectorized(x, y)
)
# Unit: microseconds
# expr min lq mean median uq max neval
# original 1824.6 2007.95 3380.896 2269.0 3780.20 59922.6 100
# vectorized 518.0 571.60 763.019 612.6 957.75 3125.0 100
In the vectorized version, the loop is replaced by calls to outer to create two matrices of pairwise differences. Matrix operations are vectorized in R, so this is faster even though more than half of each matrix goes unused (a small illustration follows below). Of course, the C++ version would be faster still.
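To make the indexing concrete, here is what outer() and lower.tri() produce on a toy four-point vector (named v here to avoid clobbering the x from above):

v <- 1:4
d <- outer(v, v, "-")  # 4 x 4 matrix of all pairwise differences v[i] - v[j]
d[lower.tri(d)]        # the 6 unique i > j differences, taken column by column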
Wow – cool… that’s awesome. Thanks for the quick reply and working code.
I’m also quite new to C++, although in my case I’ve taken to it out of necessity.
I’ve found the armadillo library (provided by the RcppArmadillo package) to be helpful while I’m getting started. As an example:
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]

// [[Rcpp::export]]
arma::vec PB_ARMA(arma::vec const &x, arma::vec const &y){
  arma::uword lx = x.n_rows;
  arma::uword l = lx * (lx - 1) / 2;
  int k = -1;
  double b = arma::datum::nan;
  double a = arma::datum::nan;
  // Calculate and sort the components of the S Matrix
  arma::vec S(l);
  S.fill(arma::datum::nan);
  for(arma::uword i = 0; i < lx - 1; i++){
    for(arma::uword j = i + 1; j < lx; j++){
      k += 1;
      S(k) = (y(i) - y(j)) / (x(i) - x(j));
    }
  }
  // Calculate the number of negative elements as well as the number of
  // elements with finite values.
  arma::uvec idxNeg = arma::find(S < 0);
  arma::uvec idxFinite = arma::find_finite(S);
  arma::uword K = idxNeg.n_rows / 2;
  arma::uword N = idxFinite.n_rows;
  // Remove the non-finite values (if any) and sort the result
  S = arma::sort(S(idxFinite));
  // Calculate the slope
  if ((N % 2) == 1) {
    b = S((N+1)/2 + K - 1);
  } else {
    b = sqrt(S(N/2 + K - 1) * S(N/2 + K));
  }
  // Make a vector aVec of the estimates of the intercept
  arma::vec aVec = arma::sort(y - b * x);
  // Calculate the median from the sorted values of aVec
  a = arma::median(aVec);
  // Report results
  arma::vec coef = {a, b};
  return coef;
}
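For anyone following along, this version compiles and runs the same way as the earlier one; the file name PB_ARMA.cpp below is just my choice:

library(Rcpp)
sourceCpp("PB_ARMA.cpp")  # assumes the code above was saved as PB_ARMA.cpp
round(PB_ARMA(x, y), 2)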
I do agree that, in the absence of a major bottleneck, involving C++ in one's R code is probably overkill, outside of an R package anyway.
The lack of a basic element-wise addition operator for std::vector had me baffled as well, but I’ve also seen some reasonable arguments that ‘+’ could be interpreted as a concatenation operator (which had never occurred to me). So I guess the designers decided that the behaviour of ‘+’ on vectors was ambiguous and that other libraries would implement their own operators where the context made the intention obvious (e.g. a linear algebra library).
I can definitely skip sorting aVec, good catch. Taking it one step further, aVec becomes unnecessary and the parameter 'a' can be calculated as 'a = arma::median(y - b*x);'.
A strange one, but vectors are a generic container that can contain types such as strings, stacks or queues (not to say that they should) and in these cases I understand why someone would get the impression that the addition operator implies concatenation.
If you can rely on vectorization, R is normally fast enough. Looping over pre-allocated matrices is also very fast, sometimes with no time penalty at all. However, when I've had to implement algorithms that involve recursion or that cannot be vectorized properly, Rcpp + Armadillo can give you huge gains. Last week I sped up some code 200x by moving it to Rcpp, because it involved recursion plus some odd calculations.
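As a footnote to the pre-allocation point, here is a toy sketch (my own, with made-up function names) contrasting a vector grown inside a loop with one allocated up front:

library(microbenchmark)

# Growing the result forces a reallocation and copy on every iteration
grow <- function(n) {
  v <- c()
  for (i in 1:n) v <- c(v, i * 2)
  v
}

# Pre-allocating writes into memory that already exists
prealloc <- function(n) {
  v <- numeric(n)
  for (i in 1:n) v[i] <- i * 2
  v
}

microbenchmark(grow(1e4), prealloc(1e4), times = 20)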