cppally is a high-performance header-only library providing a rich C++20 API for advanced R data manipulation. Leveraging C++20 Concepts, custom R-based classes, templated functions and Single-Instruction-Multiple-Data (SIMD) vectorisation, cppally enables type-safety, performance, flexible templates and readable code.
For information on using cppally, see the vignette Getting started with cppally.
Acknowledgements
I first want to thank the authors and contributors of the fantastic cpp11 R package, without which I would not have been inspired to write this package. I’d also like to thank the authors and contributors of Rcpp for developing this ecosystem that has laid much of the groundwork for C++ and R integration.
Installation
Install the CRAN release
install.packages("cppally")
or the development version
pak::pak("NicChr/cppally")
Basic usage: Sum using a C++ template
template <RMathType T>
[[cppally::register]]
r_dbl cpp_sum(r_vector<T> x){
  r_size_t n = x.length();
  r_dbl out(0);
  for (r_size_t i = 0; i < n; ++i){
    out += x.get(i);
  }
  return out;
}
Register the C++ function to R
cpp_source(code = '
  #include <cppally.hpp>
  using namespace cppally;
  template <RMathType T>
  [[cppally::register]]
  r_dbl cpp_sum(r_vector<T> x){
    r_size_t n = x.length();
    r_dbl out(0);
    for (r_size_t i = 0; i < n; ++i){
      out += x.get(i);
    }
    return out;
  }
')
cpp_sum(1:5)
#> [1] 15
NA values are handled like in R. In this case NA is returned if the vector contains one or more NA values.
cpp_sum(c(1, NA, 3))
#> [1] NA
Design choices
Templates
cppally makes heavy use of templates for powerful generic programming. While this offers a flexible framework for writing generic functions, it comes at the cost of slower compile times and larger binary sizes.
Users can write their own templates and optionally register them to R. There are two main limitations to be aware of. The first is that templates must be written in header files if they are to be used across multiple compilation units. The second is that template specialisations cannot be called from R, so when calling C++ template functions from R, we always rely on automatic deduction from the function inputs. A workaround is discussed in the main vignette, Getting started with cppally.
Scalar R types and custom methods
cppally offers R-based C++ scalar types that are NA-aware. To achieve this, multiple methods such as binary arithmetic operators have been written to ensure NA is propagated correctly. While every attempt has been made to make this as fast as possible, it adds some overhead and in some cases can prevent effective vectorisation (e.g. via SIMD instructions). If you find that this slows things down too much, you can work with the underlying C/C++ types using unwrap_t<> and unwrap().
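To make the trade-off concrete, here is a minimal plain C++ sketch of an NA-aware integer. It is not cppally's implementation: na_int, sum_na_aware and sum_raw are hypothetical names. The only fact borrowed from R is that NA_integer_ is stored as the smallest int value and that an overflowing integer sum yields NA.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical NA-aware 32-bit integer. R marks NA_integer_ with the
// smallest int value, which this sketch mirrors.
struct na_int {
    static constexpr std::int32_t na = std::numeric_limits<std::int32_t>::min();
    std::int32_t value;
    constexpr bool is_na() const { return value == na; }
};

// Addition that propagates NA; an overflowing sum also becomes NA, as in R.
constexpr na_int operator+(na_int a, na_int b) {
    if (a.is_na() || b.is_na()) return na_int{na_int::na};
    const std::int64_t wide = static_cast<std::int64_t>(a.value) + b.value;
    if (wide > std::numeric_limits<std::int32_t>::max() || wide <= na_int::na) {
        return na_int{na_int::na};
    }
    return na_int{static_cast<std::int32_t>(wide)};
}

// NA-aware sum: every iteration pays for the NA and overflow checks.
inline na_int sum_na_aware(const std::vector<na_int>& x) {
    na_int out{0};
    for (na_int v : x) out = out + v;
    return out;
}

// Sum over raw values: no per-element branch, but the NA sentinel would be
// treated as an ordinary number, so the caller must rule NA out first.
inline std::int64_t sum_raw(const std::vector<std::int32_t>& x) {
    std::int64_t out = 0;
    for (std::int32_t v : x) out += v;
    return out;
}

inline void demo() {
    assert((na_int{2} + na_int{3}).value == 5);
    assert((na_int{2} + na_int{na_int::na}).is_na());
    assert(sum_raw({1, 2, 3}) == 6);
}
```

The branch inside operator+ is the kind of per-element work that can stop a compiler from vectorising the loop, which is the situation where dropping down to the underlying types may pay off.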
Automatic protection
Like the excellent cpp11 package, cppally also handles automatic protection for R objects. For more information, see Automatic Protection.
ALTREP
For performance reasons, ALTREP materialisation is eager by default, which means that ALTREP vectors are materialised on construction. To preserve ALTREP compact representations, one can enable the package-wide ‘CPPALLY_PRESERVE_ALTREP’ flag. This can be done through cppally::use_preserve_altrep_flag() or cppally::cpp_source(..., preserve_altrep = TRUE). You can also manually add the ‘-DCPPALLY_PRESERVE_ALTREP’ flag to Makevars.
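The difference between a compact representation and a materialised one can be pictured with a small plain C++ sketch. Nothing here uses R or cppally: compact_seq, sum_compact and sum_materialised are hypothetical names, and real ALTREP classes are considerably more involved.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical compact sequence, loosely analogous to how R can hold 1:n
// without storing every element.
struct compact_seq {
    int first;
    std::size_t n;

    // Element computed on demand; no buffer exists.
    int get(std::size_t i) const {
        assert(i < n);
        return first + static_cast<int>(i);
    }

    // Materialisation: allocate and fill a contiguous buffer.
    std::vector<int> materialise() const {
        std::vector<int> out(n);
        std::iota(out.begin(), out.end(), first);
        return out;
    }
};

// Summing through get() needs no extra memory but goes through an accessor.
inline long long sum_compact(const compact_seq& s) {
    long long out = 0;
    for (std::size_t i = 0; i < s.n; ++i) out += s.get(i);
    return out;
}

// Summing a materialised buffer is a plain loop over contiguous memory.
inline long long sum_materialised(const std::vector<int>& x) {
    return std::accumulate(x.begin(), x.end(), 0LL);
}
```

Both functions return the same total; the choice is between paying for the buffer once up front or keeping the small representation and paying on each access.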
Using both cppally and the R C API
Using the R C API alongside cppally is strongly discouraged for the following reasons.
If R throws an error via Rf_error() a ‘longjmp’ will occur, meaning C++ destructors won’t run and memory that should have been released will not be released.
Furthermore, due to the way cppally caches vector names, using R C API functions like Rf_setAttrib() will set the vector’s names without informing cppally, leading to synchronisation issues. cppally needs to keep the names cache in sync with the R names attribute and the only way it can do that is by detecting changes to the names via cppally::r_vector::set_names() or cppally::attr::set_attr() (or cppally::set_old_names()).
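The synchronisation problem can be sketched without R or cppally. In the plain C++ below, raw_object stands in for an R object whose attributes anyone can change, and cached_vector is a hypothetical wrapper that caches names; neither reflects cppally's actual internals.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

using names_t = std::vector<std::string>;

// Stand-in for an R object whose names can be changed by any code.
struct raw_object {
    names_t names;
};

// Hypothetical wrapper that keeps its own copy of the names.
class cached_vector {
    raw_object& obj_;
    names_t cache_;
public:
    explicit cached_vector(raw_object& obj) : obj_(obj), cache_(obj.names) {}

    const names_t& names() const { return cache_; }

    // Going through the wrapper updates the object and the cache together.
    void set_names(names_t nms) {
        obj_.names = nms;
        cache_ = std::move(nms);
    }

    bool in_sync() const { return cache_ == obj_.names; }
};

inline void demo() {
    raw_object obj{names_t{"a", "b"}};
    cached_vector v(obj);
    v.set_names(names_t{"x", "y"});
    assert(v.in_sync());
    obj.names = names_t{"p", "q"};      // change made behind the wrapper's back
    assert(!v.in_sync());
    assert(v.names().front() == "x");   // the wrapper still reports old names
}
```

The direct assignment to obj.names plays the role of a call such as Rf_setAttrib(): the underlying object changes, but the wrapper has no way to notice.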
Attributes
Attribute manipulation is possible and helpers can be found in the attr namespace via cppally::attr::
Views
To avoid the overhead associated with automatic protection entirely, one can use view types such as r_str_view, a non-owning class for R strings. For more information on views, see Automatic Protection.
Opt-in copy-on-modify
Copy-on-modify can be enabled via cppally::use_copy_on_modify() or by setting the CPPALLY_COPY_ON_MODIFY Makevars flag directly. When this is enabled, all in-place modifications check that the object being modified isn’t referenced or owned by another object. If it is referenced, a copy is taken first before modifying, otherwise it directly modifies.
This safety check is inherently single-threaded which effectively disables almost all parallelisation. Enable this if prevention of accidental modification is a high concern. On the other hand, leaving it disabled may be preferable when performance is important.
By default, copy-on-modify is disabled and hence all element setting is done in-place via r_vector::set(). It is up to the user to ensure that a fresh vector is created before further manipulation or that it’s safe to modify the existing vector.
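The two behaviours can be contrasted in a small plain C++ sketch. It uses std::shared_ptr reference counts as a stand-in for R's own reference tracking, and cow_vector, set and set_in_place are hypothetical names, so treat it as an analogy rather than cppally's mechanism.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical vector handle; copying the handle shares the same buffer.
class cow_vector {
    std::shared_ptr<std::vector<double>> data_;
public:
    explicit cow_vector(std::vector<double> v)
        : data_(std::make_shared<std::vector<double>>(std::move(v))) {}

    double get(std::size_t i) const {
        assert(i < data_->size());
        return (*data_)[i];
    }

    // Copy-on-modify: if another handle shares the buffer, copy it first.
    void set(std::size_t i, double value) {
        assert(i < data_->size());
        if (data_.use_count() > 1) {
            data_ = std::make_shared<std::vector<double>>(*data_);
        }
        (*data_)[i] = value;
    }

    // In-place modification: no check, so every sharing handle sees the change.
    void set_in_place(std::size_t i, double value) {
        assert(i < data_->size());
        (*data_)[i] = value;
    }
};

inline void demo() {
    cow_vector a(std::vector<double>{1.0, 2.0});
    cow_vector b = a;            // b shares a's buffer
    b.set(0, 9.0);               // checked: b copies before writing
    assert(a.get(0) == 1.0);
    cow_vector c = a;            // c shares a's buffer
    c.set_in_place(1, 7.0);      // unchecked: a is modified as well
    assert(a.get(1) == 7.0);
}
```

In this analogy the use_count() check is only reliable while no other thread is creating or dropping handles, which gives some intuition for why such a check and parallel code do not mix well.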
Lossy coercion
Any coercion that results in complete information loss is an error (partial is allowed, e.g. double -> int).
For example, string -> int may not be possible without complete information loss
#> Error:
#> ! Implicit NA coercion detected from r_str to r_int, please ensure data can be coerced without complete loss of information
This is in contrast to R which returns an NA with a warning
as.integer("a")
#> Warning: NAs introduced by coercion
#> [1] NA
The benefit of cppally’s approach is that when registering C++ functions to R, inputs can be supplied flexibly without unexpected behaviour.
Let’s say you have a function foo that expects an r_int but you give it an r_dbl without realising - this will implicitly coerce to r_int without throwing an error.
foo(1.2345)
#> [1] 1
The double 1.2345 was implicitly converted to 1, an example of partial lossy coercion. cppally allows this.
What cppally doesn’t allow is total lossy coercion, which can result in ambiguity. Take the following example of counting the occurrence of a value in a vector.
x <- c(rep(0, 20), rep(1, 30), rep(NA, 40))
count_val(x, 0)
#> [1] 20
count_val(x, 1)
#> [1] 30
count_val(x, NA)
#> [1] 40
# value can also implicitly coerce a string to a double
count_val(x, "0")
#> [1] 20
count_val(x, "1")
#> [1] 30
So far so good. But what happens if we implicitly coerced to NA and then counted occurrences?
count_val(x, as.double("1. 0")) # Wrong
#> Warning in count_val(x, as.double("1. 0")): NAs introduced by coercion
#> [1] 40
as.double("1. 0") was coerced to NA, count_val() then counted the number of NA values and returned 40, even though we didn’t ask for a count of NA, so the result should have been 0.
This is the main issue with allowing total lossy coercion to NA - count_val() can’t distinguish between a true NA and an NA that has been produced from a lossy coercion.
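One way to remove that ambiguity is to make the coercion itself fail. The plain C++ sketch below is an illustration only: na_dbl, coerce_to_dbl and this count_val are hypothetical stand-ins written with the standard library, not cppally's types or its count_val() from the example above. An empty std::optional plays the role of NA.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical NA-aware double: an empty optional plays the role of NA.
using na_dbl = std::optional<double>;

// Strict string -> double coercion. A genuine NA input stays NA, but a string
// that cannot be parsed in full raises an error instead of becoming NA.
inline na_dbl coerce_to_dbl(const std::optional<std::string>& s) {
    if (!s) return std::nullopt;
    std::size_t pos = 0;
    double out = 0.0;
    bool ok = false;
    try {
        out = std::stod(*s, &pos);
        ok = (pos == s->size());   // reject leftover characters such as " 0"
    } catch (const std::logic_error&) {
        ok = false;                // nothing parseable, or value out of range
    }
    if (!ok) throw std::domain_error("implicit NA coercion from string to double");
    return out;
}

// Count how many elements equal `value`; NA matches only NA.
inline std::size_t count_val(const std::vector<na_dbl>& x, const na_dbl& value) {
    return static_cast<std::size_t>(std::count(x.begin(), x.end(), value));
}

inline void demo() {
    const std::vector<na_dbl> x{0.0, 1.0, std::nullopt};
    assert(count_val(x, coerce_to_dbl(std::string("1"))) == 1u);
    assert(count_val(x, coerce_to_dbl(std::nullopt)) == 1u);   // a genuine NA
    // coerce_to_dbl(std::string("1. 0")) throws, so NA is never counted by accident.
}
```

Because the failed coercion never produces a value, the counting function only ever sees an NA that the caller actually supplied.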
With cppally this ambiguity is impossible.
count_val(x, "1. 0")
#> Error:
#> ! Implicit NA coercion detected from r_str to r_dbl, please ensure data can be coerced without complete loss of information
64-bit integers
On the C++ side, 64-bit integers are fully supported, including vectors. To return 64-bit integers to R we need the bit64 package to be loaded. cppally delegates the handling of 64-bit integer vectors to bit64 by marking them with the “integer64” class.
as_int64(.Machine$integer.max) + 1L
#> integer64
#> [1] 2147483648
Please note that other signed 64-bit integer types like int64_t, R_xlen_t and cppally’s r_size_t will convert to 64-bit integer vectors when returned to R.
Using R’s NULL
The cppally version of R’s R_NilValue is r_null which is of type r_sexp. In an attempt to avoid the use of additional meta-programming tactics to deal with r_null, we allow vectors to be able to contain r_null which makes programming with R attributes easier. This means r_vector<T> objects can be r_null. To detect this, use the is_null() member function.
Useful Makevars flags
Because cppally is a template-heavy library, binary sizes can sometimes get large. This is primarily an issue on Windows, which will throw a compiler error if a single .o file gets too big. In this case you may want to consider adding the following flag to Makevars.win
To benefit from OMP SIMD vectorisation and parallelisation, it is recommended to add these flags to Makevars
And these flags to Makevars.win (including the Windows-specific binary size flags)
C++20 and RStudio
At the moment C++20 is not fully supported in RStudio, so I would recommend using VS Code with the C/C++ for Visual Studio Code extension. Positron may also be an option, but since I haven’t used it, I can’t speak to its capabilities.
I personally use VS Code for C++ code and RStudio for R code and package development. You can also use VS Code (or Positron) for both, but I haven’t written R code in VS Code myself, so I can’t say much about it.
To get VS Code’s IntelliSense to work correctly, you will likely need to set some parameters in c_cpp_properties.json.
My JSON file looks like this:
{
  "configurations": [
    {
      "name": "Win32",
      "includePath": [
        "${workspaceFolder}/**",
        "${workspaceFolder}/src",
        "${workspaceFolder}/inst/include",
        "C:/Program Files/R/R-4.*/include",
        "${env:LOCALAPPDATA}/R/win-library/4.*/cpp11/include",
        "${env:LOCALAPPDATA}/R/win-library/4.*/Rcpp/include",
        "${env:LOCALAPPDATA}/R/win-library/4.*/cppally/include"
      ],
      "defines": [
        "_DEBUG",
        "UNICODE",
        "_UNICODE",
        "STRICT_R_HEADERS"
      ],
      "compilerPath": "C:\\rtools45\\x86_64-w64-mingw32.static.posix\\bin\\g++.exe",
      "cppStandard": "gnu++20",
      "intelliSenseMode": "gcc-x64"
    }
  ],
  "version": 4
}
As your R installation path may differ, you can find the exact path with
normalizePath(Sys.getenv("R_HOME"), winslash = "/")
Your R libraries can be found with
The compiler bundled with RTools is likely found here
cxx <- system2(file.path(R.home("bin"), "R"),
c("CMD", "config", "CXX20"), stdout = TRUE)
cxx_bin <- trimws(strsplit(cxx, " ")[[1]][1])
Sys.which(cxx_bin)
#> g++
#> "C:\\rtools45\\X86_64~1.POS\\bin\\G__~1.EXE"
Once you have both paths, set compilerPath and the R include path in c_cpp_properties.json accordingly.