Rcpp Notes From a Newbie
Intro
You probably already know that C++ has speed advantages over R, and that Rcpp is the package for exposing fast, efficient C++ code to R. For me, the motivation to learn Rcpp (and C++) stems from xgboost and lightgbm, two prominent machine learning libraries used in Kaggle competitions. At their core, both are written in C++. However, both also have R and Python interfaces, which, in my opinion, is a huge part of their popularity. For years I’ve had a nagging desire to understand how this works and how I can build my own cross-platform models. Here I outline some notes as I get my hands dirty with C++ and Rcpp.
Prerequisites
- Learn R
- Learn C++ (I found Frank Mitropoulos’s course on Udemy fantastically helpful.)
Setup
I’ll assume you have R, RStudio, Rcpp, and a C++ compiler installed…
- Open a fresh R session and run Rcpp::evalCpp("1+1"). If R doesn’t return 2 in the console, something’s wrong.
- In RStudio, File > New File > C++, and save as cpp_functions.cpp
- In RStudio, File > New File > R Script, and save as r_functions.R
- By default, RStudio prefills cpp_functions.cpp with starter code and leaves r_functions.R empty.
Let’s modify these files as follows:
// cpp_functions.cpp
//--- Load header files --------------------------------------
#include <Rcpp.h>
# r_functions.R
#---- Load Rcpp ---------------------------------------------
library(Rcpp) # version 1.0.0
1. Hello World
Create a C++ function hello_world() that simply prints “Hello World” and call it from R.
// cpp_functions.cpp
#include <Rcpp.h>
//---- 1. Hello World ---------------------------------------------
// Print "Hello World!" to the console
// [[Rcpp::export]]
void hello_world(){
  Rcpp::Rcout << "Hello World!" << std::endl;
}
# r_functions.R
#---- Load Rcpp ---------------------------------------------
library(Rcpp) # version 1.0.0
#---- Compile cpp_functions.cpp ---------------------------------------------
sourceCpp('cpp_functions.cpp')
#---- 1. Hello World ---------------------------------------------
# Create a function hello_world() that prints "Hello World" to the console
hello_world()
# What does hello_world() return?
foo <- hello_world()
foo # NULL
A few notes about this…
- In order to expose our C++ hello_world() function to R, we have to put the special tag // [[Rcpp::export]] just above the function definition.
- We use Rcpp::Rcout instead of the more common std::cout, as recommended by Dirk Eddelbuettel (Rcpp’s primary author and maintainer).
- In r_functions.R we use Rcpp’s sourceCpp() function to compile our C++ code and expose it to R.
2. User Input
Create a function hello_master() that prompts the user with “Enter your name”. After entering your name and hitting enter, “Hello Master your_name!” should be printed to the console.
// cpp_functions.cpp
//---- 2. Hello Master ---------------------------------------------
// Prompt the user to enter their name
// Print "Hello Master <user's name>!"
// [[Rcpp::export]]
void hello_master(){
  Rcpp::Environment base = Rcpp::Environment("package:base");
  Rcpp::Function readline = base["readline"];
  // Prompt the user to enter a string
  std::string mystring = Rcpp::as<std::string>(readline("Enter your name: "));
  Rcpp::Rcout << "Hello Master " << mystring << "!" << std::endl;
}
# r_functions.R
#---- 2. User Input ---------------------------------------------
hello_master()
Notes
- In this example, C++ is actually calling base R’s readline() function to get the user input
3. Add Numbers
Create a function add_numbers(a, b) that adds two numbers a and b.
// cpp_functions.cpp
//---- 3. Add Numbers ---------------------------------------------
// Add two numbers and return the result
// [[Rcpp::export]]
double add_numbers(double a, double b){
  // Add two numbers and return the result
  return a + b;
}
# r_functions.R
#---- 3. Add Numbers ---------------------------------------------
# Add two numbers and return the result
add_numbers(a = 1, b = 1) # 2
add_numbers(1L, 1L) # 2
class(add_numbers(1L, 1L)) # numeric, not integer!
add_numbers(1, NA_integer_) # NA
add_numbers(1) # Error in add_numbers(1) : argument "b" is missing, with no default
Notes
- When we add two integers, Rcpp casts them to doubles automatically and returns a double
- When we try adding a number and NA_integer_, we get back NA_real_
- We get a nice error message if we forget one of the arguments
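The first two notes have a plain C++ counterpart. The following is a minimal standalone sketch (no Rcpp, so it says nothing about how Rcpp converts R objects internally): int arguments passed to a function that takes doubles are promoted implicitly, and a NaN operand propagates through the addition. R's NA_real_ is itself a NaN with a special payload, which is why it can survive the sum. The helper nan_propagates() is a hypothetical name used only for illustration.

```cpp
#include <cmath>
#include <limits>
#include <type_traits>

// Same signature as the exported function, minus the Rcpp attribute
double add_numbers(double a, double b) {
    return a + b;
}

// The return type is double no matter which argument types the caller used
static_assert(std::is_same<decltype(add_numbers(1, 1)), double>::value,
              "int arguments are promoted to double");

// Hypothetical helper: true if adding a NaN to a number yields a NaN
bool nan_propagates() {
    const double nan = std::numeric_limits<double>::quiet_NaN();
    return std::isnan(add_numbers(1.0, nan));
}
```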
4. Random Number Generation
Create a function roll_die() that returns a random integer between 1 and 6.
// cpp_functions.cpp
//---- 4. Random Number Generation ---------------------------------------------
// Simulate rolling a fair die
// [[Rcpp::export]]
int roll_die(){
  // Returns a random integer between 1 and 6
  // Create a vector of possible values
  Rcpp::IntegerVector vals = Rcpp::IntegerVector::create(1, 2, 3, 4, 5, 6);
  // Roll the die
  int result = Rcpp::as<int>(Rcpp::sample(vals, 1));
  // Return the result
  return result;
}
# r_functions.R
#---- 4. Random Number Generation ---------------------------------------------
# Simulate rolling a fair die
roll_die() # 3
roll_die() # 1
roll_die() # 4
set.seed(2016); roll_die() # 2
set.seed(2016); roll_die() # 2
set.seed(2016); roll_die() # 2
Notes
- Rcpp’s sample() pays attention to the random seed in R so that we can get reproducible results
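For comparison, here is a rough standalone sketch of the same idea using only the C++ standard library. It is an analogue, not what Rcpp does: the generator here is a std::mt19937 that the caller seeds explicitly, whereas the Rcpp version relies on R's seed. The name roll_with_seed() is hypothetical, and the values it produces should not be expected to match what R returns for the same seed.

```cpp
#include <random>

// Hypothetical standalone analogue of roll_die(): the caller owns the generator
int roll_die(std::mt19937 &gen) {
    std::uniform_int_distribution<int> die(1, 6);  // inclusive on both ends
    return die(gen);
}

// Hypothetical helper: seed a fresh generator, then roll once
int roll_with_seed(unsigned int seed) {
    std::mt19937 gen(seed);
    return roll_die(gen);
}
```

Calling roll_with_seed() twice with the same seed gives the same roll, which mirrors the set.seed(2016); roll_die() pattern in the R script.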
5. Function Prototypes
- Create a function called hi_mom() that prints “hi mom” to the console
- Create a function called hi_dad() that prints “hi dad” to the console
- Modify hi_mom() so that after printing “hi mom”, the function randomly decides whether to call hi_dad() with 50% probability
- Modify hi_dad() so that after printing “hi dad”, the function randomly decides whether to call hi_mom() with 50% probability
// cpp_functions.cpp
//---- 5. Function Prototypes ---------------------------------------------
void hi_mom();
void hi_dad();
// [[Rcpp::export]]
void hi_mom(){
  // Print "hi mom" to the console
  // Then randomly decide whether to call hi_dad()
  Rcpp::Rcout << "hi mom" << std::endl;
  if(Rcpp::runif(1)[0] > 0.5) hi_dad();
}
// [[Rcpp::export]]
void hi_dad(){
  // Print "hi dad" to the console
  // Then randomly decide whether to call hi_mom()
  Rcpp::Rcout << "hi dad" << std::endl;
  if(Rcpp::runif(1)[0] > 0.5) hi_mom();
}
# r_functions.R
#---- 5. Function Prototypes ---------------------------------------------
hi_mom()
hi_dad()
Notes
- If we exclude the function prototypes void hi_mom() and void hi_dad(), then when we run sourceCpp('cpp_functions.cpp') we get the error: use of undeclared identifier 'hi_dad'. When the C++ compiler reaches the body of hi_mom(), it sees a call to another function named hi_dad, but at that point in the file hi_dad hasn’t been declared yet (since it’s defined below hi_mom). The prototypes void hi_mom() and void hi_dad() simply tell the C++ compiler these functions exist even though we haven’t defined them yet.
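The same mechanism can be sketched in standalone C++ without Rcpp. In this hypothetical, deterministic variant, a countdown parameter (calls_left, my own invention) replaces the coin flip, and the functions return the text instead of printing it so the result is easy to check.

```cpp
#include <string>

// Prototype: tells the compiler hi_dad exists before its definition appears
std::string hi_dad(int calls_left);

// Hypothetical deterministic variant: a countdown replaces the coin flip
std::string hi_mom(int calls_left) {
    std::string out = "hi mom\n";
    if (calls_left > 0) out += hi_dad(calls_left - 1);  // compiles thanks to the prototype
    return out;
}

std::string hi_dad(int calls_left) {
    std::string out = "hi dad\n";
    if (calls_left > 0) out += hi_mom(calls_left - 1);  // hi_mom is already defined above
    return out;
}
```

Note that only the hi_dad prototype is strictly required in this sketch, because hi_mom is fully defined before hi_dad calls it. Declaring both up front is still a reasonable habit, since it keeps the code working if the definitions are later reordered.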
6. Pass by Value vs Reference
Create functions add_one(int x), add_two(int &x), add_three(Rcpp::IntegerVector x), … with slightly different implementations. Observe how, when we call these functions from R, some of them actually change the value of the variable we pass into them.
// cpp_functions.cpp
//---- 6. Pass by value/reference ---------------------------------------------
// [[Rcpp::export]]
int add_one(int x){
  x = x + 1;
  return x;
}
// [[Rcpp::export]]
int add_two(int &x){
  x = x + 2;
  return x;
}
// [[Rcpp::export]]
Rcpp::IntegerVector add_three(Rcpp::IntegerVector x){
  x = x + 3;
  return x;
}
// [[Rcpp::export]]
Rcpp::IntegerVector add_four(Rcpp::IntegerVector x){
  x = clone(x);
  x = x + 4;
  return x;
}
# r_functions.R
#---- 6. Pass by value/reference ---------------------------------------------
x <- 1L; cbind(add_one(x), x) # 2 1
x <- 1L; cbind(add_two(x), x) # 3 1
x <- 1L; cbind(add_three(x), x) # 4 4
x <- 1; cbind(add_three(x), x) # 4 1 (type conversion)
x <- 1L; cbind(add_four(x), x) # 5 1
Notes
- By default, Rcpp passes objects from R to C++ by reference, so any changes you make to the input parameter in C++ are reflected in R. This is why add_three(x = 1L) results in changing the value of x from 1L to 4L. However, if Rcpp has to coerce the input from one type to another, it works on a converted copy and the original object is not modified. When we call add_one(x = 1L), Rcpp converts x from an integer vector to a plain old int, thus x in R’s environment is not modified. Similarly, add_two(x = 1L) and add_three(x = 1) both involve type conversions: the former converts an integer vector to an int, and the latter coerces a double vector to an IntegerVector. Lastly, add_four(x = 1L) uses Rcpp’s clone() function to force a copy so that the original x variable is not modified.
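For contrast, here is a minimal sketch of how the same four functions behave in standalone C++, with std::vector<int> standing in for Rcpp::IntegerVector (an assumption made for illustration, since the two types are not equivalent). The main difference from the Rcpp results: in plain C++, add_two(int &x) does modify the caller's variable, because no conversion from an R object sits in between.

```cpp
#include <vector>

// By value: the function works on its own copy
int add_one(int x) {
    x = x + 1;
    return x;
}

// By reference: the function modifies the caller's variable
int add_two(int &x) {
    x = x + 2;
    return x;
}

// By reference on a container: the caller's elements change in place
void add_three(std::vector<int> &x) {
    for (int &v : x) v += 3;
}

// Copy first (same idea as clone()): the caller's vector is untouched
std::vector<int> add_four(const std::vector<int> &x) {
    std::vector<int> out(x);
    for (int &v : out) v += 4;
    return out;
}
```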
7. Mean of a Vector
Create a function my_mean(x, na_rm = false) that returns the mean of a vector. my_mean() should behave just like base R’s mean().
// cpp_functions.cpp
//---- 7. Mean of a Vector ---------------------------------------------
// [[Rcpp::export]]
double my_mean(Rcpp::NumericVector x, bool na_rm = false){
  double s = 0;
  double count = 0;
  double val_i = 0;
  bool hasPosInf = false;
  bool hasNegInf = false;
  // Loop through x
  for(int i = 0; i < x.size(); i++){
    val_i = x[i];
    // Check if val_i is NaN...
    if(R_IsNaN(val_i)){
      if(na_rm){
        continue;
      } else{
        return R_NaN;
      }
    }
    // Check if val_i is NA...
    if(R_IsNA(val_i)){
      if(na_rm){
        continue;
      } else{
        return R_NaReal;
      }
    }
    // Check if val_i is +Inf
    if(val_i == R_PosInf){
      hasPosInf = true;
      if(hasNegInf) return R_NaN;
    }
    // Check if val_i is -Inf
    if(val_i == R_NegInf){
      hasNegInf = true;
      if(hasPosInf) return R_NaN;
    }
    // Update the current sum and count
    s += val_i;
    count++;
  }
  // Special cases
  if(hasPosInf) return R_PosInf;
  if(hasNegInf) return R_NegInf;
  if(count == 0) return R_NaN;
  // Calculate the mean
  double result = s/count;
  return result;
}
# r_functions.R
my_mean(c(1, 2, 3)) # 2
my_mean(c(1, NA, 3)) # NA
my_mean(c(1, NaN, 3)) # NaN
my_mean(c(1, NA, 3), na_rm = TRUE) # 2
my_mean(c(1, NaN, 3), na_rm = TRUE) # 2
my_mean(NA_real_, na_rm = TRUE) # NaN
my_mean(c(1, Inf)) # Inf
my_mean(c(1, -Inf)) # -Inf
my_mean(c(1, -Inf, Inf)) # NaN
Notes
- We can’t use a dot “.” in C++ variable names, so I’ve changed na.rm to na_rm
- Use R_IsNaN() to check for NaN
- Use R_IsNA() to check for NA
- The constants R_NaN, R_NaInt, R_NaReal, R_NaString, R_PosInf, and R_NegInf correspond to R’s NaN, NA_integer_, NA_real_, NA_character_, Inf, and -Inf
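To see which parts of my_mean() are specific to R, here is a rough standalone sketch using only the standard library. The name plain_mean() and the use of std::vector<double> are assumptions for illustration. Two things stand out. First, std::isnan() cannot tell R's NA from NaN, since NA_real_ is a NaN with a particular payload, which is why the Rcpp version needs R_IsNA() and R_IsNaN(). Second, IEEE 754 arithmetic already produces Inf, -Inf, or NaN for the infinite cases, so this sketch does not track them with separate flags.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical plain C++ mean; with na_rm = true, NaN entries are skipped
double plain_mean(const std::vector<double> &x, bool na_rm = false) {
    double sum = 0.0;
    std::size_t count = 0;
    for (double v : x) {
        if (std::isnan(v)) {
            if (na_rm) continue;
            return v;  // propagate the NaN as is
        }
        sum += v;
        ++count;
    }
    if (count == 0) return std::nan("");  // nothing left to average
    return sum / count;  // Inf + (-Inf) is NaN under IEEE 754 rules
}
```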