Unofficial Introduction To extendr (2): Type Conversion Between R and Rust

Rust extendr

Integrate R and Rust with extendr

Hiroaki Yutani
06-14-2021

extendr is a project that provides an interface between R and Rust. In the last post, I briefly explained how to create an R package with extendr. This time, we’ll walk through how to handle various R types.

Vector

Let’s start with the last example from the previous post.

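// The extendr prelude brings the #[extendr] macro and common types into scope.
use extendr_api::prelude::*;

#[extendr]
fn add(x: i32, y: i32) -> i32 {
    x + y
}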

While this works fine with a single value, it fails when the input has more than one element.

add(1:2, 2:3)
Error in add(1:2, 2:3): Input must be of length 1. Vector of length >1 given.

This is very easy to fix. In Rust, we can use Vec<T> to represent a vector of values of type T.

// I won't explain the Rust code in much detail this time, so please don't
// worry if you can't understand what it does yet. It's probably not very
// important for understanding this post. Let's move on.

#[extendr]
fn add2(x: Vec<i32>, y: Vec<i32>) -> Vec<i32> {
    x.iter().enumerate().map(|(i, x)| x + y[i]).collect()
}
add2(1:2, 2:3)
[1] 3 5

Easy!
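For readers curious about the iterator chain in add2(), here is a plain-Rust sketch of the same element-wise addition (no extendr involved; add_vecs is a made-up name for illustration). It pairs elements with zip() instead of indexing, and returns None when the lengths differ, whereas indexing with y[i] panics if y is shorter than x.

```rust
/// Hypothetical helper (not part of extendr): element-wise addition of two vectors.
/// Returns None on a length mismatch instead of panicking.
fn add_vecs(x: Vec<i32>, y: Vec<i32>) -> Option<Vec<i32>> {
    if x.len() != y.len() {
        return None;
    }
    // zip() pairs the i-th element of x with the i-th element of y,
    // so no manual index bookkeeping is needed.
    Some(x.iter().zip(y.iter()).map(|(a, b)| a + b).collect())
}
```

For example, add_vecs(vec![1, 2], vec![2, 3]) gives Some(vec![3, 5]), the same values as add2(1:2, 2:3).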

Wait, didn’t you say we can’t do this…!?

Some of you might remember that, in this post, I wrote:

We cannot simply pass a variable length of vector from R to Rust.

Yeah, that was true: it was too difficult because I was struggling to do it via plain FFI! By default, no metadata about the length or the structure of the data is available. But with extendr, we can seamlessly access this metadata via R’s C API. In short, extendr is a game changer.
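To illustrate the FFI problem in plain Rust (no extendr; sum_raw is a hypothetical function): a raw pointer such as *const i32 only tells you where the data starts, so the length has to be passed separately and trusted. That missing length is the kind of metadata extendr now takes care of for us.

```rust
/// Hypothetical FFI-style function: a raw pointer carries no length,
/// so the caller has to pass `len` separately and get it right.
///
/// # Safety
/// `ptr` must point to at least `len` valid, initialized `i32` values.
unsafe fn sum_raw(ptr: *const i32, len: usize) -> i32 {
    // Rebuild a slice from the start address plus the externally supplied length.
    let values = unsafe { std::slice::from_raw_parts(ptr, len) };
    values.iter().sum()
}
```

If the caller passes a wrong len, nothing stops the function from reading past the end of the data, which is why this style is so error-prone.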

&[T]

If you are already familiar with Rust, you might find it a bit odd to use Vec<T> as an argument type. In fact, the documentation of Vec<T> says:

In Rust, it’s more common to pass slices as arguments rather than vectors when you just want to provide read access. The same goes for String and &str.
(https://doc.rust-lang.org/std/vec/struct.Vec.html#slicing)

Yes, you can use &[T] instead of Vec<T>, and it seems to make a slight difference in performance. If you know Rust well enough to understand the difference between &[T] and Vec<T> (confession: I don’t!), you should use &[T] instead. Otherwise, Vec<T> just works.

#[extendr]
fn add2_slice(x: &[i32], y: &[i32]) -> Vec<i32> {
    x.iter().enumerate().map(|(i, x)| x + y[i]).collect()
}
add2_slice(1:2, 2:3)
[1] 3 5

Please note that this isn’t a reference to the original R object, only to copied values. If you really want to avoid copying, you should use the “proxy” types, which I’ll cover in the next post.
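Independently of how extendr converts R vectors, here is a small plain-Rust illustration of why &[T] is the idiomatic read-only parameter type (sum_slice is a made-up name). A function taking &[i32] accepts a borrowed Vec<i32>, an array, or a sub-slice, and never takes ownership of the data.

```rust
/// Hypothetical helper that only needs read access, so it borrows a slice.
fn sum_slice(x: &[i32]) -> i32 {
    x.iter().sum()
}
```

With v = vec![1, 2, 3], the calls sum_slice(&v), sum_slice(&[4, 5]), and sum_slice(&v[1..]) all compile, and v is still usable afterwards because it was only borrowed.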

NA

One more caveat about add() is that it cannot handle a missing value, NA.

add(1L, NA)
Error in add(1L, NA): unable to convert R object to primitive

In Rust, we can use Option<T> to represent an optional, or possibly missing, value.

// pattern matching is one of the most powerful features of Rust, btw!

#[extendr]
fn add3(x: Option<i32>, y: Option<i32>) -> Option<i32> {
    match (x, y) {
        (Some(x), Some(y)) => Some(x + y),
        _ => NA_INTEGER
    }
}

This function can handle NA.

add3(1L, 2L)
[1] 3
add3(1L, NA)
[1] NA

It might seem safest to always use Option, since any R value can be NA by nature. But we might want to choose the non-Option version to avoid the overhead (cf. “How much overhead is there with Options and Results?” on The Rust Programming Language Forum), so it depends.
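Some background on why a plain i32 can’t carry NA: at the C level, R stores an integer NA as a sentinel, the smallest 32-bit integer (INT_MIN). The plain-Rust sketch below uses hypothetical helpers (from_r_int and add_opt are not extendr’s API) to show how such a sentinel could map to Option<i32>, and how the same match as in add3() propagates NA.

```rust
/// R's C-level representation of an integer NA (INT_MIN).
const R_NA_INT: i32 = i32::MIN;

/// Hypothetical helper: turn a raw R integer into an Option.
fn from_r_int(v: i32) -> Option<i32> {
    if v == R_NA_INT { None } else { Some(v) }
}

/// Same logic as add3(): any missing input makes the result missing.
fn add_opt(x: Option<i32>, y: Option<i32>) -> Option<i32> {
    match (x, y) {
        (Some(x), Some(y)) => Some(x + y),
        _ => None,
    }
}
```

Without the Option wrapper, the sentinel would simply be treated as the number -2147483648, which is why the non-Option version can’t represent NA.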

Primitive types

Okay, let’s finally look at the primitive types. Here’s a table of corresponding R and Rust types. There’s no direct equivalent for factor or complex here, but let’s talk about that later.

R          Rust
integer    i32
numeric    f64
logical    bool
character  String, &str
factor     -
complex    -

integer and numeric

integer and numeric can mainly be converted into i32 and f64, respectively. I say “mainly” because the conversion isn’t that strict: both can be converted into either i32 or f64.

So, in other words, if you want to prevent numeric values from being coerced into integers, you’ll need to check the types yourself.
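One possible guard, sketched in plain Rust: accept the value as f64 and convert it only when no information is lost. Note that this checks the value rather than the R type, and to_i32_strict is a hypothetical helper, not part of extendr.

```rust
/// Hypothetical helper: convert an f64 to i32 only if it is a whole number in range.
fn to_i32_strict(v: f64) -> Option<i32> {
    // Reject NaN/infinities and fractional values. i32::MIN is excluded
    // because R reserves that value for integer NA.
    if v.is_finite() && v.fract() == 0.0 && v > i32::MIN as f64 && v <= i32::MAX as f64 {
        Some(v as i32)
    } else {
        None
    }
}
```

For example, 3.0 converts to Some(3), while 1.5, NaN, and 1e10 all give None.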

logical

logical is converted to and from bool. That’s all.

character

character is a bit tricky in that you can convert it to either String or &str. You’ll probably have to scratch your head over the concept of “lifetimes” to choose the proper one (confession: I still don’t understand it). But, in short, a String holds its own copy of the text, while a &str borrows text owned by something else.

If you are not familiar with Rust yet, I recommend starting with String. A String gets copied around, so you might have some unnecessary overhead, but it’s generally easier to handle because you need to think about lifetimes less often.
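A small plain-Rust comparison (the function names are made up for illustration). Returning a String hands the caller an owned copy, so no lifetime has to be considered; returning a &str borrows from the input, so the compiler ties the result’s lifetime to it. The lifetime is elided here, but it’s still there.

```rust
/// Returns an owned String: the result lives independently of `s`.
fn greet_owned(s: &str) -> String {
    format!("Hello, {}!", s)
}

/// Returns a borrowed view into `s`: the result cannot outlive `s`.
/// Lifetime elision makes this equivalent to
/// `fn first_word<'a>(s: &'a str) -> &'a str`.
fn first_word(s: &str) -> &str {
    s.split_whitespace().next().unwrap_or("")
}
```

The owned version costs an allocation and a copy, which is the overhead mentioned above; the borrowed version avoids it at the price of lifetime constraints.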

factor

To keep things simple, up to this point I deliberately chose cases where there’s a corresponding type on Rust’s side. But factor isn’t one of them. It cannot be directly converted into a simple Rust type (at least at the moment). Instead, it can be cast into StrItr, which is a “proxy” to the underlying data on R’s side.

I’ll try to explain this in another post, but keep in mind that extendr provides this “proxy” type of interface as well as simple conversion to Rust’s primitive types.
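To give a rough idea of what a factor looks like underneath: it’s an integer vector of 1-based codes plus a levels attribute holding the labels. The plain-Rust sketch below (decode_factor is hypothetical and is not how StrItr is implemented) lazily maps codes to borrowed labels without copying any strings, which is roughly the spirit of a “proxy”. None stands in for NA.

```rust
/// Hypothetical helper: lazily turn 1-based factor codes into level labels.
/// Codes that don't point at a level (e.g. R's NA, stored as i32::MIN) yield None.
fn decode_factor<'a>(
    codes: &'a [i32],
    levels: &'a [&'a str],
) -> impl Iterator<Item = Option<&'a str>> + 'a {
    codes.iter().map(move |&code| {
        // R factor codes start at 1, Rust indices at 0.
        usize::try_from(code)
            .ok()
            .and_then(|c| c.checked_sub(1))
            .and_then(|i| levels.get(i).copied())
    })
}
```

Nothing is computed until the iterator is consumed, e.g. with collect().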

list

A list can be converted into HashMap<String, Robj>. Robj is also a “proxy,” which contains arbitrary R data.

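use std::collections::HashMap;

#[extendr]
fn print_a(x: HashMap<String, Robj>) {
    // get() returns an Option: Some(&Robj) if "a" exists, None otherwise.
    println!("{:?}", x.get("a"));
}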
print_a(list(a = 1, b = 2))
print_a(list(b = 2))

r! is a macro to create an R object from a Rust expression, by the way.
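Since x.get("a") returns an Option, a missing name gives None rather than an error. Here is the same lookup in plain Rust, with f64 standing in for Robj purely for illustration (lookup_a is a hypothetical name).

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for print_a(): look up the element named "a".
fn lookup_a(x: &HashMap<String, f64>) -> Option<f64> {
    // get() returns Option<&f64>; copied() turns it into Option<f64>.
    x.get("a").copied()
}
```

A map built from a = 1, b = 2 gives Some(1.0), while one containing only b gives None.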

Robj?

As a sneak peek at the next post, let’s take a look at how to use Robj.

So far, I’ve only created functions that accept a single type. What if we want a function that accepts arguments of multiple types? In this case, we can write a function that takes Robj as its argument and convert it ourselves. Robj has many as_XXX() methods to convert it into a type (or, more precisely, to extract and copy the value of the R object and turn it into that type). Here, let’s use as_integer() to produce an Option<i32>.

#[extendr]
fn int(x: Robj) -> Option<i32> {
    x.as_integer()
}
# integer
int(1L)
[1] 1
# not integer-ish
int("foo")
[1] NA
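As a rough mental model only (this toy enum is not how Robj is actually implemented, and RValue is a made-up name), you can picture a value that may hold one of several R types, where as_integer() returns Some only in the integer case. In this toy, a string gives None, just like int("foo") gives NA above.

```rust
/// Toy model of a value that can hold several R types (not extendr's Robj).
#[allow(dead_code)]
enum RValue {
    Integer(i32),
    Real(f64),
    Character(String),
}

impl RValue {
    /// Mirrors the idea of as_integer(): Some only if the value is an integer.
    fn as_integer(&self) -> Option<i32> {
        match self {
            RValue::Integer(i) => Some(*i),
            _ => None,
        }
    }
}
```

How the real as_integer() treats other types, such as doubles, is for the extendr documentation to say; the toy just rejects everything that isn’t an integer.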

What’s next?

In this post, I focused mainly on the Rust side of the type ecosystem. Next, I’ll probably need to write about more R-ish things like Function or Symbol, which will take me some time to understand correctly. Stay tuned…

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".