Let’s briefly show some of the capabilities of cppally, from its custom C++ scalars and vectors to its use of templates and concepts.
Registering R functions
To make a C++ function available to R we use the [[cppally::register]] tag.
#include <cppally.hpp>
using namespace cppally;
[[cppally::register]]
void hello_world(){
  print("Hello World!");
}
After tagging our functions we want to make them available to R. To do that we have a few routes.
Registering C++ functions outside of a package context
After writing our hello world program in foo.cpp we can use cpp_source() to compile and register the function to R.
cpp_source(file = "src/foo.cpp")
Now the function is available in R
hello_world()
#> Hello World!
Similarly we can use the helper cpp_eval to run simple expressions and return the result without needing to include cppally.hpp and register the function.
cpp_eval('print("Hello World Again!")')
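#> Hello World Again!
Note: For the rest of the examples it is assumed that the following code (the include and using-directive from the hello world example) is always included beforehand.
#include <cppally.hpp>
using namespace cppally;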
Registering C++ functions inside a cppally-linked package
Since cppally is header-only, we can include the headers directly into our own package.
General steps to using cppally in a package
- Create a package (if you haven’t already done so) using usethis::create_tidy_package()
- Run cppally::use_cppally()
- Run cppally::document()
This automatically adds the package content needed to start working with cppally. For continuous development, use cppally::load_all() to compile and register cppally-tagged functions, including our hello world function.
Note: We aim to integrate cppally registration into
the devtools framework for ease-of-use.
C++ types
cppally offers a rich set of R types in C++ that are NA-aware. This
means that common arithmetic and logical operations will account for
NA in a similar fashion to R.
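To make "NA-aware" concrete, here is a minimal standalone sketch of R-style three-valued logic in plain C++17. It does not use cppally: the alias Lgl and the functions na_and() and na_or() are hypothetical names introduced only for this illustration, with std::nullopt standing in for NA.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for a logical scalar (not cppally's r_lgl):
// std::nullopt plays the role of NA.
using Lgl = std::optional<bool>;

// NA-aware AND: a known FALSE decides the result, otherwise NA propagates.
constexpr Lgl na_and(Lgl a, Lgl b) {
    if ((a && !*a) || (b && !*b)) return false;
    if (!a || !b) return std::nullopt;
    return true;
}

// NA-aware OR: a known TRUE decides the result, otherwise NA propagates.
constexpr Lgl na_or(Lgl a, Lgl b) {
    if ((a && *a) || (b && *b)) return true;
    if (!a || !b) return std::nullopt;
    return false;
}

// Runtime spot checks mirroring R's truth table.
inline void example_usage() {
    assert(na_or(std::nullopt, true) == Lgl(true));          // NA || TRUE  is TRUE
    assert(!na_and(std::nullopt, true).has_value());         // NA && TRUE  is NA
    assert(na_and(std::nullopt, false) == Lgl(false));       // NA && FALSE is FALSE
    assert(!na_or(std::nullopt, std::nullopt).has_value());  // NA || NA    is NA
}
```

The point of the sketch is that NA only propagates when the other operand cannot decide the result on its own, which is the behaviour R users expect from | and &.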
logical scalar - r_lgl
cppally’s scalar version of logical, r_lgl, can represent true, false or NA.
#> [1] TRUE
#> [1] FALSE
#> [1] NA
Logical operators work just like in R
[[cppally::register]]
r_vec<r_lgl> lgl_ops(){
  return make_vec<r_lgl>(
    r_true || r_false, // true
    r_true && r_false, // false
    r_na || r_true,    // true
    r_na && r_true,    // NA
    r_na && r_false,   // false
    r_na || r_na,      // NA
    r_na && r_na       // NA
  );
}
lgl_ops()
#> [1] TRUE FALSE TRUE NA FALSE NA NA
Using r_lgl in if-statements
For type-safety reasons, r_lgl cannot be implicitly converted to bool. The one exception is the condition of an if-statement, where the conversion is allowed but throws an error if the value is NA.
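One way to get this kind of behaviour in plain C++ is an explicit conversion operator: it is rejected in ordinary assignments but still applies in the condition of an if-statement. The sketch below is standalone and does not use cppally. The struct Lgl3 and the function describe() are hypothetical, and whether r_lgl is implemented this way is an assumption.

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>
#include <string>
#include <type_traits>

// Hypothetical logical scalar (not cppally's r_lgl).
struct Lgl3 {
    std::optional<bool> value;  // std::nullopt stands in for NA

    bool is_na() const { return !value.has_value(); }
    bool is_true() const { return value.has_value() && *value; }

    // `explicit` rejects `bool b = x;` but still allows `if (x)`.
    explicit operator bool() const {
        if (is_na()) throw std::runtime_error("Cannot convert NA to bool");
        return *value;
    }
};

// No implicit conversion to bool exists.
static_assert(!std::is_convertible_v<Lgl3, bool>);

// NA is handled first, so the conversion in the second branch cannot throw.
inline std::string describe(Lgl3 x) {
    if (x.is_na()) return "NA";
    if (x) return "true";
    return "false";
}

inline void example_usage() {
    assert(describe(Lgl3{true}) == "true");
    assert(describe(Lgl3{std::nullopt}) == "NA");
    bool threw = false;
    try {
        if (Lgl3{std::nullopt}) { }  // unchecked NA in a condition throws
    } catch (const std::runtime_error&) {
        threw = true;
    }
    assert(threw);
}
```

The design trade-off is that a forgotten NA check surfaces as a runtime error at the condition rather than silently taking one of the branches.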
DON’T do this:
[[cppally::register]]
void bad_lgl_print(r_lgl condition){
  if (condition){
    print("true");
  } else {
    print("false");
  }
}
bad_lgl_print(TRUE)
#> true
bad_lgl_print(FALSE)
#> false
bad_lgl_print(NA) # Can't implicitly convert NA to bool
#> Error:
#> ! Cannot implicitly convert r_lgl NA to bool, please check
DO this:
[[cppally::register]]
void good_lgl_print(r_lgl condition){
  if (is_na(condition)){
    print("NA");
  } else if (condition){
    print("true");
  } else {
    print("false");
  }
}
good_lgl_print(TRUE)
#> true
good_lgl_print(FALSE)
#> false
good_lgl_print(NA) # NA is handled explicitly so no issues
#> NA
We can also use r_lgl members is_true() and is_false() which return bool and are equivalent to R’s isTRUE() and isFALSE()
[[cppally::register]]
void also_good_lgl_print(r_lgl condition){
  if (condition.is_true()){
    print("true");
  } else {
    print("not true");
  }
}
also_good_lgl_print(TRUE)
#> true
also_good_lgl_print(FALSE)
#> not true
also_good_lgl_print(NA) # Falls into 'not true' branch here as expected
#> not true
All cppally scalar types are implemented as structs that contain the underlying C/C++ types as well as other member functions.
| cppally type | Description | Implicitly converts to |
|---|---|---|
| r_lgl | Scalar logical | bool only in if-statements |
| r_int | Scalar integer | int |
| r_int64 | Scalar 64-bit integer | int64_t |
| r_dbl | Scalar double | double |
| r_str | Scalar string | SEXP |
| r_cplx | Scalar double complex | std::complex<double> |
| r_raw | Scalar raw | unsigned char |
| r_sym | Symbol | SEXP |
| r_date ¹ | Scalar date | double |
| r_psxct | Scalar date-time | double |
| r_sexp | Generic R object (SEXP) ² | SEXP |
NA values can be accessed via the template function na<T>().
C++ NA values and their R C API equivalents
| Type | Value | R C API Value | constexpr? ³ |
|---|---|---|---|
| r_lgl | na<r_lgl>()/r_na | NA_LOGICAL | Yes |
| r_int | na<r_int>() | NA_INTEGER | Yes |
| r_int64 | na<r_int64>() | Not applicable | Yes |
| r_dbl | na<r_dbl>() | NA_REAL | Yes |
| r_str | na<r_str>() | NA_STRING | No |
| r_cplx | na<r_cplx>() | Not applicable | Yes |
| r_sym | Not applicable | Not applicable | No |
| r_sexp ⁴ | na<r_sexp>/r_null | R_NilValue | No |
Vectors
cppally vectors are templated and can be thought of as containers of
scalar elements like r_int, r_dbl, etc.
We can create vectors like so
// Integer vector of size n
[[cppally::register]]
r_vec<r_int> new_integer_vector(int n){
  r_vec<r_int> int_vctr(n, /*fill = */ r_int(0));
  return int_vctr;
}
new_integer_vector(3)
#> [1] 0 0 0
inline vectors
To create inline vectors, use make_vec<>
#> [1] 1.0 1.5 2.0 NA
We can add names on the fly with arg()
make_vec<r_dbl>(
  arg("first") = 1,
  arg("second") = 1.5,
  arg("third") = 2,
  arg("last") = na<r_dbl>()
)
#>  first second  third   last
#>    1.0    1.5    2.0     NA
In R a list is a generic vector, so cppally defines lists as
r_vec<r_sexp>, a vector of the generic type
r_sexp.
#> [[1]]
#> [1] 1
#>
#> [[2]]
#> [1] 2
#>
#> [[3]]
#> [1] 3
A list of all cppally vectors of length 0
[[cppally::register]]
r_vec<r_sexp> all_vectors(){
  return make_vec<r_sexp>(
    arg("logical") = r_vec<r_lgl>(),
    arg("integer") = r_vec<r_int>(),
    arg("integer64") = r_vec<r_int64>(), // Requires bit64
    arg("double") = r_vec<r_dbl>(),
    arg("character") = r_vec<r_str>(),
    arg("character") = r_vec<r_str_view>(),
    arg("raw") = r_vec<r_raw>(),
    arg("date") = r_vec<r_date>(),
    arg("date-time") = r_vec<r_psxct>(),
    arg("list") = r_vec<r_sexp>()
  );
}
all_vectors()
#> $logical
#> logical(0)
#>
#> $integer
#> integer(0)
#>
#> $integer64
#> integer64(0)
#>
#> $double
#> numeric(0)
#>
#> $character
#> character(0)
#>
#> $character
#> character(0)
#>
#> $raw
#> raw(0)
#>
#> $date
#> Date of length 0
#>
#> $`date-time`
#> POSIXct of length 0
#>
#> $list
#> list()
Concepts and Templates
Concepts are one of the most powerful features of C++20. They allow users to write human-readable templates and constraints.
When writing your own templates, we strongly recommend placing them in headers so that cppally registration works correctly.
Let’s practice by creating an absolute function in C++ using
templates and the RMathType concept.
template <RMathType T>
[[cppally::register]]
T cpp_abs(T x){
  if (is_na(x)) return na<T>();
  if (x < 0){
    return -x;
  } else {
    return x;
  }
}
Works correctly for doubles
cpp_abs(-5)
#> [1] 5
cpp_abs(0)
#> [1] 0
cpp_abs(100)
#> [1] 100
cpp_abs(NA_real_)
#> [1] NA
It also works for integers
cpp_abs(-3L)
#> [1] 3
cpp_abs(NA_integer_)
#> [1] NA
The top line, template <RMathType T>, declares a template whose parameter T must satisfy RMathType, a concept that contains r_lgl, r_int, r_int64 and r_dbl.
If x is NA then we immediately return NA via na<T>(), a templated function that returns the NA value of the input type T.
Without templates, writing C++ functions that accept flexible inputs is quite difficult because C++ is a statically-typed language. Usually one would write one absolute function for doubles and another for integers whereas here we don’t have to.
Notes on templates
To correctly register templates, the [[cppally::register]] tag must always go directly above the function signature, after the template <...> line, as in the example above.
Explicit instantiation (from R) is unfortunately not possible; template types must be deduced from the supplied arguments.
You may get a cryptic compiler error like this
error: no matching function for call to 'foo()'
[]<typename T>() -> decltype(cpp_to_sexp(::foo())) {
along with an equally cryptic note
This is because the parameter T cannot be automatically
deduced from any of the function inputs. Even though these kinds of
templates can be written with cppally, they cannot be exported to R.
An obvious, if somewhat ugly, workaround is to include a prototype argument from which the template parameter can be deduced.
// Return the default constructor result of RScalar types
template <RScalar T>
[[cppally::register]]
T scalar_default(T ptype){
  return T();
}
scalar_default(integer(1)) # Default is 0L
#> [1] 0
scalar_default(numeric(1)) # Default is 0.0
#> [1] 0
scalar_default(character(1)) # Default is ""
#> [1] ""
Exporting variadic templates is also not supported. The best alternative is to use lists (r_vec<r_sexp>).
In the above example we used the RScalar concept, which includes all cppally scalar types (excluding r_sexp). For a list of all cppally concepts, please see the Annex.
Coercion
To coerce from one scalar to another we can use
as<T>
double_to_int(pi)
#> [1] 3
double_to_int(NA_real_)
#> [1] NA
We can also coerce from one vector type to another
to_int_vec(c(0, 1.5, NA))
#> [1] 0 1 NA
Since as<T> is extremely flexible, we can also coerce from a scalar to a vector or vice versa
[[cppally::register]]
r_vec<r_sexp> coercions(){
  r_dbl a(4.2);
  r_vec<r_dbl> b = make_vec<r_dbl>(2.5);
  return make_vec<r_sexp>(
    as<r_vec<r_int>>(a),
    as<r_int>(a),
    as<r_int>(b),
    as<r_dbl>(b)
  );
}
coercions()
#> [[1]]
#> [1] 4
#>
#> [[2]]
#> [1] 4
#>
#> [[3]]
#> [1] 2
#>
#> [[4]]
#> [1] 2.5
Strings
cppally provides the useful string type r_str
We can create R strings easily
#> [1] "hello"
To get a C or C++ string, use the members c_str() and
cpp_str() respectively
C string via c_str()
#> [1] "hello"
C++ string_view via cpp_str()
This can be converted into a std::string via its constructor
[[cppally::register]]
r_str str_concatenate(r_str x, r_str y, r_str sep){
  std::string left = std::string(x.cpp_str());
  std::string right = std::string(y.cpp_str());
  std::string middle = std::string(sep.cpp_str());
  std::string combined = left + middle + right;
  return r_str(combined.c_str());
}
str_concatenate("hello", "how are you?", sep = ", ")
#> [1] "hello, how are you?"
Symbols
Symbols have class r_sym and can be created directly
from a string literal
#> new_symbol
Or from a cppally string
#> symbol_from_string
Cached strings & symbols
cppally provides an efficient caching strategy for constructing cppally strings/symbols from string literals
cached_str<>
#> [1] "cached_string"
This initialises the string once, caches it (to R’s CHARSXP pool), and efficiently re-uses the cached string for each subsequent call.
We can cache symbols in a similar way
#> cached_symbol
Lists
r_sexp is generally interpreted as an “element of a
list” since lists are defined as r_vec<r_sexp>, a
vector that holds generic r_sexp elements.
new_list(0)
#> list()
new_list(3)
#> [[1]]
#> NULL
#>
#> [[2]]
#> NULL
#>
#> [[3]]
#> NULL
The problem with a class like r_sexp is that it is by design generic and therefore difficult to work with in C++. To disambiguate the actual type we can use visit_vector() or visit_sexp() via a C++ lambda.
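The visiting pattern is similar in spirit to std::visit on a std::variant in standard C++: a generic lambda is instantiated once per concrete type, and if constexpr selects the behaviour for each. The sketch below is standalone and is not how cppally implements visit_vector() or visit_sexp(). The names Symbol, Object, is_std_vector and resize_if_vector() are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <variant>
#include <vector>

// Hypothetical "generic object" that holds one of a few concrete types.
struct Symbol { std::string name; };
using Object = std::variant<std::vector<int>, std::vector<std::string>, Symbol>;

// Trait that is true only for std::vector specialisations.
template <typename T> struct is_std_vector : std::false_type {};
template <typename T, typename A>
struct is_std_vector<std::vector<T, A>> : std::true_type {};

// Resize vector alternatives in place; return false for anything else.
inline bool resize_if_vector(Object& obj, std::size_t n) {
    return std::visit([n](auto& value) -> bool {
        using value_t = std::decay_t<decltype(value)>;  // concrete type held
        if constexpr (is_std_vector<value_t>::value) {
            value.resize(n);
            return true;
        } else {
            return false;  // e.g. a Symbol cannot be resized
        }
    }, obj);
}

inline void example_usage() {
    Object ints = std::vector<int>{1, 2, 3, 4, 5};
    assert(resize_if_vector(ints, 1));
    assert(std::get<std::vector<int>>(ints).size() == 1);

    Object sym = Symbol{"mean"};
    assert(!resize_if_vector(sym, 1));
}
```

Because the lambda body is compiled separately for each alternative, the call to resize() is only ever instantiated for types that actually have it.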
Example: using visit_vector() to resize
every vector to length n in-place
[[cppally::register]]
r_vec<r_sexp> resize_all(r_vec<r_sexp> x, r_size_t n){
  r_size_t list_length = x.length();
  for (r_size_t i = 0; i < list_length; ++i){
    visit_vector(x.view(i), [&](auto vec) {
      x.set(i, vec.resize(n));
    });
  }
  return x;
}
# Resize to size 1
resize_all(list(1:5, letters), n = 1)
#> [[1]]
#> [1] 1
#>
#> [[2]]
#> [1] "a"
When we pass a non-vector to visit_vector, it aborts and explains that the input must be a vector
resize_all(list(mean_fn = mean), 1)
#> Error:
#> ! `x` must be a vector to be instantiated from an `r_sexp`
visit_sexp
This allows us to visit more types than just vectors, including
factors, symbols and (soon to be implemented) data frames. When an
object’s type can’t be deduced into a distinct type, r_sexp
is returned.
Example: Same example as above but with
visit_sexp()
[[cppally::register]]
r_vec<r_sexp> resize_all2(r_vec<r_sexp> x, r_size_t n){
  r_size_t list_length = x.length();
  for (r_size_t i = 0; i < list_length; ++i){
    visit_sexp(x.view(i), [&](auto vec) {
      using vec_t = decltype(vec); // type of object `vec`
      if constexpr (RVector<vec_t>){
        x.set(i, vec.resize(n));
      } else {
        abort("Cannot resize a non-vector");
      }
    });
  }
  return x;
}
# Resize to size 1
resize_all2(list(1:5, letters), n = 1)
#> [[1]]
#> [1] 1
#>
#> [[2]]
#> [1] "a"
resize_all2(list(mean_fn = mean), n = 1)
#> Error:
#> ! Cannot resize a non-vector
Factors
We can create a factor via r_factors()
new_factor(letters)
#> [1] a b c d e f g h i j k l m n o p q r s t u v w x y z
#> Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z
In cppally, like R, factors are not vectors and therefore do not satisfy the RVector concept. To access the underlying integer codes vector, use the public codes() member function
static_assert(!RVector<r_factors>);
[[cppally::register]]
r_vec<r_int> factor_codes(r_factors x){
  return x.codes();
}
letter_fct <- new_factor(letters)
letter_fct |>
factor_codes()
#> [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
#> [26] 26
Attributes
Attributes can be manipulated via functions defined in the attr namespace.
Example: Converting a list of samples to a data frame
[[cppally::register]]
r_vec<r_sexp> list_as_df(r_vec<r_sexp> x){
  r_size_t n = x.length();
  if (n_unique(x.lengths()) > 1){
    abort("List must have vectors of equal length to be converted to a data frame");
  }
  r_vec<r_str> names(attr::get_attr(x, cached_sym<"names">()));
  if (names.is_null()){
    abort("list must have names to be converted to a data frame");
  }
  r_vec<r_sexp> out = shallow_copy(x);
  int nrow = 0;
  r_vec<r_int> row_names;
  if (n > 0){
    nrow = out.view(0).length();
    row_names = make_vec<r_int>(na<r_int>(), -nrow);
  }
  attr::set_attr(out, cached_sym<"row.names">(), row_names);
  attr::set_attr(out, cached_sym<"class">(), make_vec<r_str>("data.frame"));
  return out;
}
set.seed(42)
norm_samples <- lapply(1:5, \(x) rnorm(10, mean = x))
names(norm_samples) <- paste0("sample_", 1:5)
list_as_df(norm_samples)
#> sample_1 sample_2 sample_3 sample_4 sample_5
#> 1 2.3709584 3.3048697 2.693361 4.455450 5.205999
#> 2 0.4353018 4.2866454 1.218692 4.704837 4.638943
#> 3 1.3631284 0.6111393 2.828083 5.035104 5.758163
#> 4 1.6328626 1.7212112 4.214675 3.391074 4.273295
#> 5 1.4042683 1.8666787 4.895193 4.504955 3.631719
#> 6 0.8938755 2.6359504 2.569531 2.282991 5.432818
#> 7 2.5115220 1.7157471 2.742731 3.215541 4.188607
#> 8 0.9053410 -0.6564554 1.236837 3.149092 6.444101
#> 9 3.0184237 -0.4404669 3.460097 1.585792 4.568554
#> 10 0.9372859 3.3201133 2.360005 4.036123 5.655648
More useful attribute helpers
- get_attrs() - Returns a list of attributes (possibly r_vec<r_sexp>(r_null))
- set_attrs() - Sets attributes to ones specified. Note: replaces any current attributes
- clear_attrs() - Removes all attributes
- set_attr() - Set a single attribute
- get_attr() - Get a single attribute
- inherits1() - Does object inherit class?
- inherits_any() - Does object inherit at least one of the specified classes?
- inherits_all() - Does object inherit all of the specified classes?
- modify_attrs() - Modifies current attributes but doesn’t remove any existing ones
Sugar functions
cppally also offers many useful and high-performance common functions in cppally/sugar
Example: n_unique() - fast calculation
of number of unique values.
template <RVector T>
[[cppally::register]]
r_int cpp_n_unique(T x){
  return as<r_int>(n_unique(x));
}
library(bench)
x <- sample(1:100, 10^5, replace = TRUE)
mark(
base_n_unique = length(unique(x)),
cppally_n_unique = cpp_n_unique(x)
)
#> # A tibble: 2 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 base_n_unique 1.25ms 1.3ms 765. 1.38MB 20.4
#> 2 cppally_n_unique 264.62µs 266.2µs 3705. 0B 0
More useful sugar functions
- unique() - Like R’s unique() but with a sort argument to return sorted unique values
- identical() - A very fast identical function that works for scalars and vectors. Use this for exact equality of any scalar or vector.
- match() - Like R’s match, but also faster
- sequences() - Like sequence() but it returns a list of sequences and also works with doubles.
- order() - Like base R’s order but it internally uses a hybrid approach of ska sort, count sorting, quick sort, etc.
- make_groups() - An advanced function that returns a struct containing group IDs and number of groups (i.e. number of unique group IDs). The groups struct contains the following members:
  - r_vec ids - The cached group IDs
  - int n_groups - Number of unique groups
  - bool ordered - Do the group IDs specify a sorting order, or are they by order-of-first-appearance?
  - bool sorted - Are the group IDs sorted? (This can also be true for order-of-first-appearance IDs)
  - r_vec start() - Returns an r_vec (n_groups) vector of start locations of each unique group, signifying the location in the data at which each group initially appeared
  - r_vec counts() - Returns an r_vec (n_groups) vector of frequency counts of each unique group
  - r_vec order() - Returns an r_vec (ids.length()) order vector. This is a 0-indexed permutation vector that can be used to return sorted group IDs
- recycle() - Recycles supplied vectors to common length
- r_vec<T>::subset() - Fast subsetting of vectors
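To illustrate what group IDs, start locations and counts mean for order-of-first-appearance grouping, here is a standalone sketch in standard C++17. It is not cppally's implementation of make_groups(). The struct GroupInfo and the function group_by_first_appearance() are hypothetical, and the 0-based IDs are an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical result type, loosely mirroring the members described above.
struct GroupInfo {
    std::vector<int> ids;             // 0-based group ID of each element
    int n_groups = 0;                 // number of distinct groups
    std::vector<std::size_t> start;   // first position of each group
    std::vector<std::size_t> counts;  // frequency of each group
};

// Assign group IDs in order of first appearance.
inline GroupInfo group_by_first_appearance(const std::vector<int>& x) {
    GroupInfo out;
    out.ids.reserve(x.size());
    std::unordered_map<int, int> seen;  // value -> group ID
    for (std::size_t i = 0; i < x.size(); ++i) {
        // Inserts a new ID only when x[i] has not been seen before.
        auto [it, inserted] = seen.try_emplace(x[i], out.n_groups);
        if (inserted) {
            ++out.n_groups;
            out.start.push_back(i);
            out.counts.push_back(0);
        }
        out.ids.push_back(it->second);
        ++out.counts[static_cast<std::size_t>(it->second)];
    }
    // Post-condition: one start and one count per group.
    assert(out.start.size() == static_cast<std::size_t>(out.n_groups));
    assert(out.counts.size() == out.start.size());
    return out;
}
```

For the input {3, 1, 3, 2, 1, 3} this yields IDs {0, 1, 0, 2, 1, 0}, three groups, start locations {0, 1, 3} and counts {3, 2, 1}. These IDs are neither ordered nor sorted in the sense of the flags described above.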
Scalar math functions
There is a rich suite of math functions. Some examples include
min(), max(), round(),
log(), floor(), ceiling() and
more.
Stats sugar functions
Some statistical summary functions that are all very highly optimised for speed
Annex
Symbols in R-registered templates
r_sym is unsupported in templates when it’s part of a
template argument but is supported when the argument is explicitly an
r_sym.
All core cppally concepts
- RIntegerType - Includes r_lgl, r_int, r_int64
- RMathType - Includes r_lgl, r_int, r_int64 and r_dbl
- RStringType - Includes r_str and r_str_view
- RScalar - Includes all cppally specific scalar types
- RVal - Includes anything a cppally vector (r_vec<>) can contain: RScalar + r_sexp
- RVector - Includes r_vec<T> where T is an RVal
- RTimeType - Includes r_date and r_psxct
- RNumericType - Numeric types, including RMathType and RTimeType
- RSortableType - Includes RNumericType and RStringType (strings can also be sorted)
- RAtomicVector - A vector that contains RScalar elements
- CppallyType - Any R type defined by R, including RVal, RVector, RFactor, RDataFrame, RSymbol
- CppType - Anything that is not a CppallyType
- CastableToRScalar - Anything that can be constructed or cast into an RScalar (which also includes RScalar)
- CastableToRVal (questioning) - Anything that can be constructed or cast into an RVal. This is more complicated as it includes vectors, factors and data frames which can be cast to r_sexp
Other useful type traits
- unwrap_t - Returns the underlying unwrapped type
- as_r_scalar_t - Returns the equivalent RScalar type
- as_r_val_t - Returns the equivalent RVal type
- common_r_t - Returns the common RVal type between 2 types. Generally this is a hierarchy where the common type is the type that both values can be coerced to without complete loss of information
Accessing the underlying types and values
While it is generally recommended not to access the underlying
objects, you can do so with unwrap() which returns the
underlying C/C++ value. For example, unwrap(r_int(5)) will
return an int of value 5.
To access the underlying type, use unwrap_t<>
which always aligns with unwrap()
The main reason for wanting to access underlying values would likely
be optimisation and so unwrap() and unwrap_t
allow this to be done consistently.
Example: Summing a double vector using
r_vec<T>::data() member
[[cppally::register]]
double primitive_sum(const r_vec<r_dbl>& x){
  // r_vec<T>::data_type always returns typename T
  using data_t = typename std::remove_cvref_t<decltype(x)>::data_type;
  using primitive_t = unwrap_t<data_t>;
  primitive_t *p_x = x.data();
  r_size_t n = x.length();
  double sum = 0;
  OMP_SIMD_REDUCTION1(+:sum)
  for (r_size_t i = 0; i < n; ++i){
    sum += p_x[i];
  }
  return sum;
}
x <- rnorm(10^5)
primitive_sum(x)
#> [1] -467.8787