Writing A Configure Script For An R Package Using Rust

configure.win and configure

Rust
extendr
Author

Hiroaki Yutani

Published

September 21, 2021

I’ve been struggling with configure.win for several days. I think I’ve done, but it seems I’ve come too far from the last post. So, let me try to explain what a configure.win (or configure) would look like.

Let’s start with this Makevars.win, basically the same one on the last blog post.

CRATE = string2path
BASE_TAG = windows_20210801-3

TARGET = $(subst 64,x86_64,$(subst 32,i686,$(WIN)))-pc-windows-gnu
LIBDIR = ./rust/target/$(TARGET)/release
STATLIB = $(LIBDIR)/libstring2path.a
PKG_LIBS = -L$(LIBDIR) -lstring2path -lws2_32 -ladvapi32 -luserenv

# c.f. https://stackoverflow.com/a/34756868
CARGO_EXISTS := $(shell cargo --version 2> /dev/null)

all: C_clean

$(SHLIB): $(STATLIB)

$(STATLIB):
ifdef CARGO_EXISTS
    cargo build --target=$(TARGET) --lib --release --manifest-path=./rust/Cargo.toml
else
    mkdir -p $(LIBDIR)
    curl -L -o $(STATLIB) https://github.com/yutannihilation/$(CRATE)/releases/download/$(BASE_TAG)/$(TARGET)-lib$(CRATE).a
endif

C_clean:
    rm -Rf $(SHLIB) $(STATLIB) $(OBJECTS)

clean:
    rm -Rf $(SHLIB) $(STATLIB) $(OBJECTS) rust/target

Tweak Makevars.win

Before talking about configure scripts, I have to write a bit about Makevars.win (and Makevars) because it has something to be fixed.

Set CARGO_HOME

Can you see what part is wrong in the above Makevars.win? After the first CRAN submission of this package, I got this reply:

Checking this creates ~/.cargo sized 82MB, in violation of the CRAN Policy. Please fix as necessary and resubmit.

The CRAN Policy says:

  • Packages should not write in the user’s home filespace (including clipboards), nor anywhere else on the file system apart from the R session’s temporary directory (or during installation in the location pointed to by TMPDIR: and such usage should be cleaned up).

By default, cargo uses ~/.cargo for caching various things like the crates.io index and the dependency crates. Apparently, this is not allowed. To avoid this, we can set CARGO_HOME to the package’s local directory, like gifski package does.

export CARGO_HOME=$(PWD)/.cargo

Alternatively, we can set this on the head of the line of cargo build directly.

CARGO_HOME=$(PWD)/.cargo

...snip...

  CARGO_HOME=$(PWD)/.cargo cargo build --target=$(TARGET) --lib --release --manifest-path=./rust/Cargo.toml

In addition to this, as it “should be cleaned up,” we also need to add these two lines after cargo build:

    rm -Rf $(CARGO_HOME)
    rm -Rf $(LIBDIR)/build

But, this is a bit painful in terms of development. Why do I need to compile it always from scratch even on my local??

NOT_CRAN environmental variable

Fortunately, devtools utilities provide NOT_CRAN envvar to distinguish CRAN and other environment. So, maybe we can determine whether to set CARGO_HOME, depending on NON_CRAN. It would be:

# An envvar cannot be referred to as $(NOT_CRAN)
NOT_CRAN_ENVVAR = ${NOT_CRAN}

$(STATLIB):
ifeq ($(NOT_CRAN_ENVVAR),"true")
  cargo build --target=$(TARGET) --lib --release --manifest-path=./rust/Cargo.toml
else
  CARGO_HOME=$(PWD)/.cargo cargo build --target=$(TARGET) --lib --release --manifest-path=./rust/Cargo.toml
    rm -Rf $(CARGO_HOME)
    rm -Rf $(LIBDIR)/build
endif

Set PATH

One more thing my Makevars (not Makevars.win this time) couldn’t covered was the case when cargo is on PATH but rustc is not. It seems gifski package handles this by including $(HOME)/.cargo/bin in PATH (Makevars).

I think sourcing "$(HOME)/.cargo/env" should also work (this actually just sets PATH), so I’ll try this next time. Note that source is not available on dash, so use . for this.

. "$(HOME)/.cargo/env" && cargo build ...

Okay, done. Let’s move onto configure scripts.

What is a configure script?

A configure script is often used for configuring Makefile or Makevars, depending on the user’s setup.

Writing R Extensions says:

If your package needs some system-dependent configuration before installation you can include an executable (Bourne shell script configure in your package which (if present) is executed by R CMD INSTALL before any other action is performed.

configure is executed on UNIX-alikes, and Windows uses a different file configure.win1. Actually, I got this request from CRAN:

By the way, ideally string2path would use configure to test for cargo

So far, I used Makevars.win for testing the existence of cargo, but it seems configure scripts are the better place for this. Moreover, I do want to check the cargo functionality more precisely, for example,

  • if the Rust version is not too old to support (“MSRV”)
  • (Windows only) if the Rust installation has the required toolchain, stable-msvc
  • (Windows only) if the Rust installation has the required targets, x86_64-pc-windows-gnu and i686-pc-windows-gnu

but Makevars.win is a bit too narrow to write a complex shell script.

Biarch: true

If we use configure.win, we have to add the following line to DESCRIPTION.

Biarch: true

Otherwise, the 32-bit version won’t get built for unknown reason and it makes CRAN angry. This behaviour is found on R for Windows FAQ, but it doesn’t explain what we should do. I found this on the following post on RStudio Community.

Check cargo

Check cargo is installed

This is simple.

cargo version >/dev/null 2>&1
if [ $? -ne 0 ]; then
  echo "cargo command is not available"
  exit 1
fi

Note that we don’t want to exit here, because the absence of cargo isn’t the end of the world; there might be a precompiled binary for the platform. But, let’s tweak it later.

Check if the Rust version is not too old to support

This isn’t always necessary, but we might want to reject the older Rust in the case when we use some feature that is available after the specific version of Rust. You know, comparing versions is tricky. But, this can be archived by sort command with -V option and -C option (c.f. SO answer). -V means version sort. -C means checking if the input is already sorted, and errors when it’s not. In summary, the implementation is the following:

# c.f. https://github.com/ron-rs/ron/issues/256#issuecomment-657999081
MIN_RUST_VERSION="1.41.0"

RUST_VERSION="`cargo --version | cut -d' ' -f2`"
if ! printf '%s\n' "${MIN_RUST_VERSION}" "${RUST_VERSION}" | sort -C -V; then
  echo "The installed version of cargo (${RUST_VERSION}) is older than the requirement (${MIN_RUST_VERSION})"
  exit 1
fi

(Btw, did you know we cannot use $(expr) notation in configure because this syntax isn’t available on Solaris? We need to use `expr` instead)

Check if the Rust installation has the required toolchain and targets (Windows only)

On Windows, extendr provides support only the specified set of toolchain and target. So, we need to check it.

Checking toolchain is simple. Use + to specify the toolchain.

EXPECTED_TOOLCHAIN="stable-msvc"

cargo "+${EXPECTED_TOOLCHAIN}" version >/dev/null 2>&1
if [ $? -ne 0 ]; then
  echo "${EXPECTED_TOOLCHAIN} toolchain is not installed"
  exit 1
fi

Installed targets can be listed by rustup target list --installed. So, the check would be like

EXPECTED_TARGET="x86_64-pc-windows-gnu"

if ! rustup target list --installed | grep -q "${EXPECTED_TARGET}"; then
  echo "target ${EXPECTED_TARGET} is not installed"
  exit 1
fi

One thing tricky here is that, unlike Makevars.win, configure.win is executed only once, not per architecture. So, we need to manually enumerate both 64-bit and 32-bit.

One more tricky thing at the time of writing this is…, you might not notice, the 32-bit version no longer exists in R-devel, which is supposed to be released as R 4.2! So, we want to check 32-bit only when there’s 32-bit. How? I couldn’t come up with some nice way, but it seems check on ${R_HOME}/bin/i386/ works:

check_cargo_target() {
  EXPECTED_TARGET="$1"
  
  if ! rustup target list --installed | grep -q "${EXPECTED_TARGET}"; then
    echo "target ${EXPECTED_TARGET} is not installed"
    exit 1
  fi
}

check_cargo_target x86_64-pc-windows-gnu

if [ -d "${R_HOME}/bin/i386/" ]; then
  check_cargo_target i686-pc-windows-gnu
fi

But what can we do?

Now we can check the cargo installation. But, if the check fails, what should we do? Actually, on CRAN, Windows and macOS machines don’t have Rust installed.

There are two options.

  1. Install cargo into the temporary directory
  2. Download the precompiled binaries (which means you have to serve the binaries on somewhere beforehand).

For example, the gifski package uses option 1 for macOS and option 2 for Windows. My string2path package now uses option 2 both for macOS and for Windows.

Install cargo on the fly

The implementation for macOS is below. The actual process is written in the downloaded script, but what it does is basically downloading cargo.

# Try local version on MacOS, otherwise error
[ `uname` = "Darwin" ] && curl "https://autobrew.github.io/scripts/rust" -sSf | sh && exit 0

I don’t write much about this this time, but cargo package should also be useful for this strategy.

Download the precompiled binaries

This is what was done in Makevars.win.

$(STATLIB):
ifdef CARGO_EXISTS
    cargo build --target=$(TARGET) --lib --release --manifest-path=./rust/Cargo.toml
else
    mkdir -p $(LIBDIR)
    curl -L -o $(STATLIB) https://github.com/yutannihilation/$(CRATE)/releases/download/$(BASE_TAG)/$(TARGET)-lib$(CRATE).a
endif

To move this to configure and configure.win. There are several things to consider. For example:

  • As this downloads the binary before $(STATLIB) is executed, we need some tweak to ensure it’s not removed by C_clean
  • (Windows only) unlike Makevars.win is executed per arch, configure.win is executed only once

What’s more, I want to make one addition:

  • Verify the checksums

Some might have wondered if it’s safe to download an arbitrary binary from GitHub. First of all, I’d argue it’s safe. To say the least, essentially, it’s no unsafer than downloading a cargo binary itself, (or than “curl URL | sh”). If it’s downloaded over HTTPS, the data hardly gets compromised as long as their servers are not compromised. I believe GitHub servers and Rust servers are both very secure.

That said, we can improve the security further by “pinning” the binary. So, I recommend verifying the checksum when downloading a binary. For example, the stringi package does this.

Makevars.in and Makevars.win.in

Remember Makevars.win has these lines ($(STATLIB) is the artifact of the Rust code):

all: C_clean
C_clean:
    rm -Rf $(SHLIB) $(STATLIB) $(OBJECTS)

Removing $(STATLIB) is needed to invoke cargo build in the case when cargo is installed. Otherwise, $(STATLIB) keeps existing, which means that target is never executed.

On the other hand, we don’t want to execute cargo build if there’s no cargo. So, in this case, we need to prevent the downloaded binary from getting removed.

In order to change the logic like this, we can use Makevars.in and Makevars.win.in as a template to generate Makevars and Makevars.win respectively. Generating these files can be done in configure and configure.win. For example,

Makevars.in:

C_clean:
    rm -Rf $(SHLIB) $(OBJECTS) @CLEAN_EXTRA@

configure:

# cargo installed
sed -e 's|@CLEAN_EXTRA@|$(STATLIB)|' src/Makevars.in > src/Makevars

# no cargo
sed -e 's|@CLEAN_EXTRA@||' src/Makevars.in > src/Makevars

If we generate Makevars and Makevars.win in configure scripts, we also need to clean up these by the cleanup script (this is required by Writing R Extensions).

rm -f src/Makevars src/Makevars.win

Of course, don’t forget to add src/Makevars and src/Makevars.win to .gitignore.

(Probably, we can generate more sophisticated Makevars, but this time I used only this one replacement.)

Verify the checksums

To verify the checksum, we can use sha256sum on platforms other than macOS, and shasum -a 256 on macOS. For the example of macOS:

SHA256SUM_EXPECTED=be65f074cb7ae50e5784e7650f48579fff35f30ff663d1c01eabdc9f35c1f87c

# Verify the checksum
SHA256SUM_ACTUAL=`shasum -a 256 "${DST}" | cut -d' ' -f1`
if [ -z "${SHA256SUM_ACTUAL}" ]; then
  echo "Failed to get the checksum"
  exit 1
fi

if [ "${SHA256SUM_ACTUAL}" != "${SHA256SUM_EXPECTED}" ]; then
  echo "Checksum mismatch for the pre-compiled binary"
  exit 1
fi

[MOST IMPORTANT!] Ask CRAN maintainers to exclude Solaris

Oh, sorry, I forgot to enumerate the most important option!

  1. Give up

Because 32-bit Solaris is not a supported platform by Rust, there’s no option other than giving up. I wrote this on cran-comment and it seems this was accepted:

I would like to request to exclude Solaris from the build targets because Solaris is not a supported platform by Rust. This should be in line with the treatments of other CRAN packages that use Rust; gifski, baseflow, and salso are not built on Solaris. I’m sorry that I didn’t write this in the first submission.

Be sure to add some comment like this when you submit an R package with Rust to CRAN!

Example

For a real example, please refer to string2path, though your mileage may vary.

Footnotes

  1. There’s also configure.ucrt, but you can forget this for now.↩︎