# Some more notes about using Rust code in R packages

## September 15, 2020 by Hiroaki Yutani

When I first tried to use Rust code within R package five years ago, it was like crawling in the dark and I wasted several days just to find I didn’t understand anything. But, now we have Using Rust code in R packages, a great presentation by Jeroen Ooms. It taught me almost everything! But still, I needed to learn myself some more things for my purpose. Let me leave some notes about those.

## Passing a string from R to Rust

hellorust covers how to pass a string from Rust to R, but not the vice versa. I learned this from the code on clauswilke/sinab.

For example, let’s consider an improved version of hellorust::hello() that takes an argument name to say hello to.

### R code

Let’s name it hello2.

hello2 <- function(name) {
.Call(hello_wrapper2, name)
}


### C code

hello_wrapper2 would be like the code below. STRING_ELT(x, i) takes i-th element of a character vector x, and Rf_translateCharUTF8() converts it to a pointer to the string encoded in UTF-8.

SEXP hello_wrapper2(SEXP name){
char* res = string_from_rust2(Rf_translateCharUTF8(STRING_ELT(name, 0)));
return Rf_ScalarString(Rf_mkCharCE(res, CE_UTF8));
}


### api.h

The string is passed as const char *.

char * string_from_rust2(const char *);


### Rust code

The function takes the string as *const c_char. If we process the string in Rust code, we need to create a String. This is done by std::ffi::CStr::from_ptr(). CStr is a representation of a borrowed C string, and can be converted to String by to_string() or to_string_lossy(). Since this is an unsafe operation, it needs to be wrapped with unsafe.

use std;
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

// Utility function to convert c_char to string
fn c_char_to_string(c: *const c_char) -> String {
unsafe { CStr::from_ptr(c).to_string_lossy().into_owned() }
}

#[no_mangle]
pub extern fn string_from_rust2(c_name: *const c_char) -> *const c_char {
let name = c_char_to_string(c_name);

let s = CString::new(format!("Hello {} !", name)).unwrap();
let p = s.as_ptr();
std::mem::forget(s);
p
}


### Result

You can view the diff here:

https://github.com/r-rust/hellorust/commit/a42346c728a408fb1b2e6e7522082e19ec5b8a04

## Passing a vector from Rust to R, or vice versa

(Update: this code is incomplete, please read the next section as well)

It took me some time to figure out how to handle arrays. I’m still not confident if I understand this correctly, but let me try to explain…

We cannot simply pass a variable length of vector to FFI because the length is not known. So, what we need to do is obvious; pass the data with the length at the same time. To do this, we need to define the same struct both in C and in Rust.

Suppose we want to implement a function that takes one double vector and reverse it.

In api.h, let’s define a struct named Slice:

typedef struct
{
double *data;  // since we want to process REALSXP here, the data type is double
uint32_t len;
} Slice;


and in Rust code define the same one. #[repr(C)] means “do what C does.” This is needed to match the alignment of the field with C.

use std::os::raw::{c_double, c_uint};

#[repr(C)]
pub struct Slice {
data: *mut c_double,
len: c_uint,
}


### R code

The R code is pretty simple.

rev <- function(x) {
x <- as.double(x)
.Call(rev_wrapper, x)
}


### C code

We need to allocate a REALSXP vector and copy the result into it.

SEXP rev_wrapper(SEXP x){
Slice s = {REAL(x), Rf_length(x)};
Slice s_rev = rev_slice(s);

SEXP out = PROTECT(Rf_allocVector(REALSXP, s_rev.len));
for (int i = 0; i < s_rev.len; i++) {
SET_REAL_ELT(out, i, s_rev.data[i]);
}
UNPROTECT(1);

return out;
}


### Rust code

To convert the Slice into Rust’s slice, we can use std::slice::from_raw_parts_mut. This is unsafe operation, so it needs to be wrapped with unsafe.

slice and vector can be converted into an unsafe pointer by as_mut_ptr().

#[no_mangle]
pub extern fn rev_slice(s: Slice) -> Slice {
// convert from Slice to Rust slice
let s = unsafe { std::slice::from_raw_parts_mut(s.data, s.len as _) };

let mut v = s.to_vec();
v.reverse();
let len = v.len();

let v_ptr = v.as_mut_ptr();
std::mem::forget(v);

Slice {
data: v_ptr,
len: len as _,
}
}


### Result

You can view the diff here:

https://github.com/r-rust/hellorust/commit/e278d1541301ae18446bf1149a15d7aed868bd51

## Update: free the Rust-allocated memory

The code above works, but I noticed the memory is never freed. Yes, that’s because I forgot to free it. This was my nice lesson to learn that Rust is not always automatically saving me from doing silly things :P

Of course we can free it, but it’s a bit tricky. Since Slice is allocated by Rust, it needs to be freed by Rust (c.f. How to return byte array from Rust function to FFI C? - help - The Rust Programming Language Forum). (IIUC, if the length is known in advance, it might be good idea to allocate on C’s side and pass it to the Rust, as the answer on the forum above suggests. rev() is the case, but let me explain the different one for now…)

### Rust code

Let’s define a Rust function to free the memory. Box::from_raw() constructs a Box, a pointer for heap allocation, from the raw pointer. After that, the raw pointer is owned by the box, which means it’s now Rust’s role to destruct it and free the memory.

#[no_mangle]
pub extern "C" fn free_slice(s: Slice) {
// convert to Rust slice
let s = unsafe { std::slice::from_raw_parts_mut(s.data, s.len as _) };
let s = s.as_mut_ptr();
unsafe {
Box::from_raw(s);
}
}


I still don’t understand how to use Box properly, but it seems Sized structs can be handled simpler using Box in the argument: https://doc.rust-lang.org/std/boxed/index.html#memory-layout

### C code

Call the function above from C to free the memory as soon as it’s no longer in use.

// Need to include to use memcpy()
#include <string.h>

// ...snip...

SEXP rev_wrapper(SEXP x){
Slice s = {REAL(x), Rf_length(x)};
Slice s_rev = rev_slice(s);

SEXP out = PROTECT(Rf_allocVector(REALSXP, s_rev.len));
memcpy(REAL(out), s_rev.data, s.len * sizeof(double));
free_slice(s_rev); // free!!!
UNPROTECT(1);

return out;
}


### Result

The full diff is here:

https://github.com/r-rust/hellorust/commit/97b3628b4a66eae9e25898a79ebf20fa59741063

### Can I do zero-copy?

Copying memory to memory is not very cool, but it just works. I don’t know any nicer way yet. Apache Arrow seems a overkill for this simple usage, but will I need it in future…? Or flatbuffer? This seems a battle for another day, so I’ll stop here for now.

## Precompiled binary for Windows

As you might already notice, hellorust’s installation instruction for Windows is a bit long. But, do I really need to require the users to install cargo, just to compile my useless package? Now that we have GitHub Actions CI, maybe preparing a precompiled binary is a choice.

Here’s the YAML I’m using to compile on windows runners and attach the binary on the releases (This creates a two separate releases for x86_64 and i686, which might be improved…).

on:
push:
tags:
- 'windows*'

name: Build Windows

jobs:
build:
strategy:
matrix:
target:
- x86_64
- i686

name: build-${{ matrix.target }}-pc-windows-gnu runs-on: windows-latest steps: - name: Checkout sources uses: actions/checkout@v2 - name: Install stable toolchain uses: actions-rs/toolchain@v1 with: toolchain: stable target:${{ matrix.target }}-pc-windows-gnu
profile: minimal
default: true

- name: Run cargo build
uses: actions-rs/cargo@v1
with:
command: build
args: --release --target=${{ matrix.target }}-pc-windows-gnu --manifest-path=src/string2path/Cargo.toml - name: List files run: ls ./src/string2path/target/${{ matrix.target }}-pc-windows-gnu/release/
shell: bash

- name: Create Release
id: create_release
uses: actions/create-release@v1
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} with: tag_name:${{ github.ref }}-${{ matrix.target }} release_name: Release${{ github.ref }}-${{ matrix.target }} draft: false prerelease: true - name: Upload Release Asset id: upload-release-asset uses: actions/upload-release-asset@v1 env: GITHUB_TOKEN:${{ secrets.GITHUB_TOKEN }}
with:
upload_url: ${{ steps.create_release.outputs.upload_url }} asset_path: ./src/string2path/target/${{ matrix.target }}-pc-windows-gnu/release/libstring2path.a
asset_name: libstring2path.a
asset_content_type: application/octet-stream


If there’s a precompiled binary, we can skip the compilation by tweaking Makevars.win like this:

CRATE = string2path

# Change this when created a new tag
BASE_TAG = windows7

TARGET = $(subst 64,x86_64,$(subst 32,i686,$(WIN))) LIBDIR = windows/$(TARGET)
STATLIB = $(LIBDIR)/lib$(CRATE).a
PKG_LIBS = -L$(LIBDIR) -l$(CRATE) -lws2_32 -ladvapi32 -luserenv

all: clean

$(SHLIB):$(STATLIB)

$(STATLIB): mkdir -p$(LIBDIR)
# Not sure, but $@ doesn't seem to work here... curl -L -o$(STATLIB) https://github.com/yutannihilation/$(CRATE)/releases/download/$(BASE_TAG)-$(TARGET)/lib$(CRATE).a

clean:
rm -Rf $(SHLIB)$(STATLIB) \$(OBJECTS)


By the way, at the time when hellorust was created, the extension of staticlib was .lib on Windows (MinGW), but recently (as of v1.44) this is changed to .a. Be careful.

## Why Rust?

Lastly, let me answer to what some of you might wonder. I know you want me to say something like “memory safe” or “fast,” but…, it was just I was more familiar with Rust than C/C++.

I just happened to learn Rust. I was searching for some alternative of Processing, a great creative coding framework, and I found nannou. At first, I didn’t expect I needed to learn Rust seriously, as the framework wraps the things very nicely. But, since nannou is still maturing, I found I needed to dive a bit deeper into the world of Rust to make things work on my environment. I’m now learning wgpu, a Rust implementation of WebGPU. If you are interested in, here’s some resources: