The Rust Interop Book

by Preben Aandahl

This book will show you practical examples of interop between Rust and other programming languages.

Interop with C

This section will take you through a practical example of how to integrate Rust code into an existing C codebase.

What you learn here is fundamental to all interop with Rust since the C ABI (Application Binary Interface) is the only way to communicate with Rust functions from foreign languages. All tools and frameworks that offer integration with other programming languages build on the same techniques we use here.

The tools we'll use

On the Rust side of the fence, we have the FFI (Foreign Function Interface). Rust provides a few keywords and a handful of library utilities to expose functions to the outside world (through the C ABI).

We will start by making a minimal Rust library, exposing a function over the FFI, and manually writing the matching C declarations to be able to call our function.

Further on, we will explore how we can use the cbindgen tool to automatically generate function declarations and types for our C code.

Finally, we'll show how we can make function calls in the other direction, from Rust to C, using Rust bindgen to keep the Rust declarations in sync.

Other helpful resources

  • The FFI chapter of the Rustonomicon offers some technical details on FFI and a few practical examples (focusing mainly on calling C from Rust).
  • The Rust FFI Omnibus has a handful of snippets demonstrating how to pass data across the language boundary.
  • cbindgen has a User Guide, but it doesn't really show how to use the tool, focusing mainly on configuration options.

Introducing our C application

Our project will start out with a simple C application called count, which we'll gradually extend with Rust code.

The application relies on an internal C module (file.h / file.c) to read a file from disk and convert it to a string.

Based on a user-supplied command, the application then runs a calculation on the file text. Initially, bytes is the only supported command, but we will soon add other options.

Building and running the code

The source code for each chapter is available in our GitHub repository, and you'll find the initial code here.

Start by checking out the code and preparing a build directory:

$ git clone git://rust-interop
$ cd rust-interop/c/chap1
$ mkdir build && cd build

Configure & run the build, create some test data, and run the program:

$ cmake .. && make
$ echo -n "Rust interop" > test.txt
$ ./count bytes test.txt
12

Source code

As we go along, we will try to add new functionality to this application, implementing most of the new logic in Rust.

The initial code is included in full below if you want to familiarize yourself with it:

src/main.c - the entry point

#include "modules/file/file.h"

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void run_command_for_file(const char* command, const char* filename);
uint64_t do_calculation(const char* command, const char* data);
uint64_t count_bytes(const char* data);
void print_result(const uint64_t result);

int main(const int argc, const char *argv[]) {
    const char* command = argv[1];
    const char* filename = argv[2];
    run_command_for_file(command, filename);
    return 0;
}

void run_command_for_file(const char* command, const char* filename) {
    File file = file_read(filename);
    char* str = file_to_string(file);

    const uint64_t result = do_calculation(command, str);
    print_result(result);

    file_free_string(str);
    file_free(file);
}

uint64_t do_calculation(const char* command, const char* data) {
    if (strcmp(command, "bytes") == 0) {
        return count_bytes(data);
    } else {
        fprintf(stderr, "Unrecognized command: %s\n", command);
        exit(1);
    }
}

uint64_t count_bytes(const char* data) {
    return strlen(data);
}

void print_result(const uint64_t result) {
    // uint64_t isn't always unsigned long long, so cast to match %llu.
    printf("%llu\n", (unsigned long long) result);
}

src/modules/file/file.h - the file module interface

#pragma once

#include <stdint.h>
#include <stdlib.h>

typedef struct File {
    const char* filename;
    uint8_t* data;
    size_t length;
} File;

File file_read(const char* filename);
char* file_to_string(File file);
void file_free(File file);
void file_free_string(char* file_string);

src/modules/file/file.c - the file module implementation

#include "file.h"

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

File file_read(const char* filename) {
    FILE* file_handle = fopen(filename, "rb");
    if (!file_handle) {
        fprintf(stderr, "Could not open file: '%s'\n", filename);
        exit(1);
    }
    fseek(file_handle, 0, SEEK_END);
    long length = ftell(file_handle);
    fseek(file_handle, 0, SEEK_SET);
    uint8_t* data = (uint8_t*)malloc(length);
    fread(data, 1, length, file_handle);
    fclose(file_handle);
    return (File) {
            filename,
            data,
            length
    };
}

char* file_to_string(const File file) {
    char* str = (char*)malloc(file.length + 1);
    memcpy(str, file.data, file.length);
    str[file.length] = '\0';
    return str;
}

void file_free(const File file) {
    free(file.data);
}

void file_free_string(char* file_string) {
    free(file_string);
}

CMakeLists.txt - build configuration

cmake_minimum_required(VERSION 3.22)
project(rust-interop-c-chap1)
set(CMAKE_C_STANDARD 17)

add_executable(count src/main.c src/modules/file/file.c)

Calling a Rust function from C

NOTE:

If you're going to type the code as you follow along, you should start by making a copy of the initial C application in the first chapter.

$ cp -R rust-interop/c/chap1 count
$ cd count

The final result of this chapter is also available here.

Let's dive straight in and use cargo to initialize a Rust library directly in our project folder:

$ cargo init --lib --name count

Here, we put our C and Rust code in the same folder. The upside to this approach is that we can gradually introduce Rust into our existing modules, and over the following chapters we'll build a mixed-language unit of code with dependencies in both directions.

Another common practice is keeping the Rust library in a separate folder, which is an excellent approach when the Rust code is self-contained.

We also have to tell cargo that we intend to produce a static system library (to be linked into our C binary):

Filename: Cargo.toml

# --snip--

[lib]
crate-type = ["staticlib"]

[dependencies]

Making the function

Let's add a version command that works like this:

$ ./count version
count version 1.0.0

We start by replacing the contents of lib.rs with our new function:

Filename: src/lib.rs

#[no_mangle]
pub extern "C" fn print_version() {
    println!("count version 1.0.0");
}

There are a couple of noteworthy things to point out here:

Name mangling

Usually, the Rust compiler rewrites the function names behind the scenes to include details such as the crate name and the containing module.

Our function name will turn into something like this:

__ZN5count13print_version17h0b87bf00f9702f77E

C has no concept of crates and modules, so we need to add #[no_mangle] to be able to resolve the function simply as print_version().

With mangling disabled, all exported function names need to be unique.

ABI (Application Binary Interface)

We also add extern "C" to the function so that it can be called using your platform's C ABI. The ABI specifies how data is laid out in memory and how functions are called.

The C ABI is the lingua franca of application binaries and our only line of communication to the non-Rust world. All interop between Rust and other languages is based on calling conventions from C.

NOTE: We have also added the pub keyword to our function. Although not strictly necessary (C has no concept of private functions), it's good to be explicit that our function is part of the library's public interface.

Building the Rust library

We can test that our Rust library builds with cargo:

$ cargo build
Compiling count v0.1.0 (/path/to/count)
 Finished dev [unoptimized + debuginfo] target(s) in 0.56s

The static library should now be ready at target/debug/libcount.a (Unix-like) or target/debug/count.lib (Windows).

Calling our function from C

By manually writing a function declaration, we tell our C application that the function print_version() exists. We then call that function if command is equal to "version". We make sure to do this before the filename is read, since this command involves no file.

Filename: src/main.c

// --snip--

void print_version(void);

int main(const int argc, const char *argv[]) {
    const char* command = argv[1];

    if (strcmp(command, "version") == 0) {
        print_version();
        return 0;
    }

    // --snip--

We must also amend our CMake configuration to link to the Rust library. Add the following lines to the bottom of the file:

Filename: CMakeLists.txt

# --snip--

set(RUST_LIB_NAME ${CMAKE_STATIC_LIBRARY_PREFIX}count${CMAKE_STATIC_LIBRARY_SUFFIX})
set(RUST_LIB_PATH ${CMAKE_SOURCE_DIR}/target/debug/${RUST_LIB_NAME})
target_link_libraries(count ${RUST_LIB_PATH})

We construct the library name in a platform-independent way (libcount.a or count.lib), and link it to our executable.

Let's build and run the program:

$ mkdir -p build
$ cd build
$ cmake ..
$ cmake --build .
$ ./count version
count version 1.0.0

We've extended our C application with Rust code!

Harmonizing CMake & Cargo

The current setup works, but if we make changes on the Rust side, we have to manually trigger the cargo build before the CMake build.

Let's rewrite our CMake configuration so that it automatically rebuilds the Rust code upon changes:

Filename: CMakeLists.txt

cmake_minimum_required(VERSION 3.22)
project(rust-interop-c)
set(CMAKE_C_STANDARD 17)

set(RUST_LIB_NAME ${CMAKE_STATIC_LIBRARY_PREFIX}count${CMAKE_STATIC_LIBRARY_SUFFIX})
set(RUST_LIB_PATH ${CMAKE_SOURCE_DIR}/target/debug/${RUST_LIB_NAME})

add_custom_command(
        OUTPUT ${RUST_LIB_PATH}
        COMMAND cargo build --manifest-path ${CMAKE_SOURCE_DIR}/Cargo.toml
        DEPENDS ${CMAKE_SOURCE_DIR}/src/lib.rs
        USES_TERMINAL
)

add_executable(count src/main.c src/modules/file/file.c ${RUST_LIB_PATH})
target_link_libraries(count ${RUST_LIB_PATH})

We use add_custom_command() to define our library as an OUTPUT of the cargo build, that DEPENDS on changes to the content of lib.rs.

Listing the library among the sources in add_executable() makes CMake run our custom command before the executable is linked.

From the build-folder, we can now re-configure CMake:

$ cmake ..

And subsequent builds should recompile the Rust library if there are new changes:

$ cmake --build .

What's next?

The facilities we use to bind Rust to other languages are often referred to as the Rust FFI (Foreign Function Interface). Now that we have a working configuration, we will see how we can send and receive data across the FFI boundary.

Taking references and returning primitives from Rust

Let's add a third command to our application:

./count version
./count bytes file.txt
./count characters file.txt

For plain ASCII text, the number of bytes and characters in a file is the same. But a byte can only represent 256 different values, so to support all the alphabets out there, and special characters like emojis, UTF-8 encodes a single character using one to four bytes.

Example: while the string Kaimū has five characters, it is six bytes long because the ū is a two-byte character.

C's standard string functions, such as strlen(), count bytes rather than characters, so let's implement the counting function in Rust, where Unicode characters are first-class citizens.
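To see the distinction in Rust terms, here is a minimal sketch that compares a string's byte length with its character count. The bytes_and_characters helper is hypothetical and not part of the count application:

```rust
/// Hypothetical helper: returns (bytes, characters) for a UTF-8 string.
/// `str::len()` counts bytes, like C's strlen(), while
/// `chars().count()` counts Unicode scalar values.
fn bytes_and_characters(text: &str) -> (usize, usize) {
    (text.len(), text.chars().count())
}
```

Note that chars() counts Unicode scalar values, not user-perceived characters: an "e" followed by a separate combining accent counts as two. This is the same definition our Rust counting function will use.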

Dispatching our new command

This time we start from the other side by adding a function declaration for our new function: count_characters(). We also add an else if clause to dispatch the new command in do_calculation():

Filename: src/main.c

// --snip--

void print_version(void);
uint64_t count_characters(const char* text);

// --snip--

uint64_t do_calculation(const char* command, const char* data) {
    if (strcmp(command, "bytes") == 0) {
        return count_bytes(data);
    } else if (strcmp(command, "characters") == 0) {
        return count_characters(data);
    } else {
        fprintf(stderr, "Unrecognized command: %s\n", command);
        exit(1);
    }
}

Adding a function with parameters and a return type

Filename: src/lib.rs

use std::ffi::CStr;
use std::os::raw::c_char;

// --snip--

#[no_mangle]
pub extern "C" fn count_characters(text: *const c_char) -> u64 {
    // SAFETY: the C caller must pass a valid, zero-terminated string.
    let text = unsafe { CStr::from_ptr(text) };
    let text = text.to_str().expect("Unicode conversion failed.");
    text.chars().count().try_into().unwrap()
}

Our new function takes a pointer to C characters, represented by the c_char type from std::os::raw. We then wrap it in a CStr with CStr::from_ptr(). CStr is a handy utility type for borrowed C strings: it reads the data in place and never tries to take ownership of it or free it.

By converting the CStr to a &str, we gain access to Rust's regular string utilities and can proceed with text.chars().count() to get the number of Unicode characters.

The function returns a plain u64, since that type matches the definition of uint64_t on the C side.
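Because count_characters is an ordinary Rust function that happens to use the C calling convention, you can also exercise it from Rust. The sketch below is a standalone variant, not the book's code: it drops #[no_mangle] so it compiles on its own, adds a null check that the book's version doesn't have (an assumption made for this sketch), and uses a hypothetical call_like_c helper to mimic what main.c does:

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

// Same body as the book's count_characters, minus #[no_mangle], plus a
// null check that is an addition made for this sketch.
pub extern "C" fn count_characters(text: *const c_char) -> u64 {
    if text.is_null() {
        return 0;
    }
    // SAFETY: the caller promises `text` points to a valid, zero-terminated string.
    let text = unsafe { CStr::from_ptr(text) };
    let text = text.to_str().expect("Unicode conversion failed.");
    text.chars().count() as u64
}

// Hypothetical helper playing the C caller: it builds a zero-terminated
// buffer and hands over a raw pointer, just as main.c passes a char*.
fn call_like_c(text: &str) -> u64 {
    let owned = CString::new(text).expect("text must not contain NUL bytes");
    // `owned` stays alive until the end of this function, so the pointer is valid.
    count_characters(owned.as_ptr())
}
```

Here call_like_c("Kaimū") returns 5, the same answer the C program prints.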

Let's try it:

$ cd build
$ cmake --build .
$ echo -n "Kaimū" > kaimu.txt
$ ./count bytes kaimu.txt
6
$ ./count characters kaimu.txt
5

Which types match up?

Here's a quick reference of the most common Rust primitives you can pass directly across the FFI boundary.

Rust         C
bool         bool
u8 / i8      uint8_t / int8_t
u16 / i16    uint16_t / int16_t
u32 / i32    uint32_t / int32_t
u64 / i64    uint64_t / int64_t
f32          float
f64          double
usize        uintptr_t

There are also some compatibility types in std::os::raw for the platform-specific C types. Here's an excerpt:

Rust         C
c_char       char
c_int        signed int
c_uint       unsigned int
c_long       signed long
c_ulong      unsigned long
c_void       void

NOTE: The C standard does not strictly define the length of float and double, but in practice, this mapping will work on all major platforms. For the paranoid, there's also a c_float and a c_double in std::os::raw.

You can read more detailed documentation about the memory layout of scalar types here.
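If you want to double-check a few of these mappings on your own machine, a minimal sketch like the following compares sizes at runtime (check_ffi_sizes is a hypothetical name). Rust guarantees that bool is one byte and that usize has the size of a pointer; a 4-byte c_int is what mainstream platforms use, which is an assumption rather than a guarantee:

```rust
use std::mem::size_of;
use std::os::raw::{c_char, c_int};

/// Hypothetical runtime check of a few rows from the tables above.
/// The bool and usize checks are guaranteed by Rust; the c_int size
/// reflects mainstream platforms and is not promised by the C standard.
fn check_ffi_sizes() -> bool {
    size_of::<bool>() == 1
        && size_of::<usize>() == size_of::<*const c_char>()
        && size_of::<c_char>() == 1
        && size_of::<c_int>() == 4
}
```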

In the next chapter, we will see how the function declarations our C code needs from Rust can be generated automatically, instead of writing and maintaining them by hand, which is tedious and error-prone.

Using cbindgen to generate C headers

In the previous chapters we manually wrote the C declarations corresponding to our Rust FFI functions. There are at least two good reasons to automate this process:

  • Writing duplicate C declarations is tedious
  • An automated system leaves much less room for error

cbindgen to the rescue! Add it as a build dependency:

Filename: Cargo.toml

# --snip--

[build-dependencies]
cbindgen = "0.24"

We can customize our Cargo build by adding a build.rs to our project root:

Filename: build.rs

use cbindgen::Language;
use std::env;

fn main() {
    println!("cargo:rerun-if-changed=src/lib.rs");

    let manifest_dir = env::var("CARGO_MANIFEST_DIR").unwrap();

    cbindgen::Builder::new()
        .with_crate(manifest_dir)
        .with_language(Language::C)
        .generate()
        .expect("Unable to generate C bindings")
        .write_to_file("target/bridge/bindings.h");
}

The rerun-if-changed line tells Cargo to run the build script again whenever lib.rs changes, and each run regenerates bindings.h. cbindgen crawls through our Rust code and generates C declarations for all FFI-exported types and functions.

You can have a look at the output by running cargo build and checking out target/bridge/bindings.h.

Let's update our CMake build correspondingly:

Filename: CMakeLists.txt

# --snip--

set(
        RUST_LIB_SOURCES
        ${CMAKE_SOURCE_DIR}/build.rs
        ${CMAKE_SOURCE_DIR}/src/lib.rs
)

add_custom_command(
        # --snip--

        DEPENDS ${RUST_LIB_SOURCES}

        # --snip--
)

# --snip--

target_include_directories(count PRIVATE ${CMAKE_SOURCE_DIR}/target/bridge)

We've added a list of all the Rust sources instead of listing files directly in the DEPENDS clause. This will scale better as our application continues to grow.

We also included the directory of our generated C headers, so that the compiler will know where to look for them.

The last step is to include the header in our C code, and to remove our manually written declarations.

Filename: src/main.c

#include "modules/file/file.h"
#include "bindings.h"

// --snip--

// Remove: void print_version();
// Remove: uint64_t count_characters(const char* text);

// --snip--

That's all. If you reconfigure & rebuild the CMake project, the application should still work exactly like it used to.

This might seem like a lot of work to get rid of two lines of code, but as our application grows, this investment will pay off.

Let's see how we can leverage this new setup to pass custom structs across the language boundary.

NOTE: In a later chapter we will see how we can use bindgen (as opposed to cbindgen) to generate declarations in the opposite direction - allowing us to call C code from Rust.

Shared structs and enums

The current program works if you pass the correct command line arguments, but if a user tries to call it without any arguments it will fail with a segfault:

$ ./count
zsh: segmentation fault  ./count

To improve upon this, we'll add error handling by validating the command line arguments passed in by the user.

Let's start by adding a struct to represent the command line arguments, and an enum to hold the chosen command.

Filename: src/lib.rs

// --snip--

pub struct Arguments {
    command: Command,
    filename: *const c_char,
}

#[derive(PartialEq)]
pub enum Command {
    Version,
    Bytes,
    Characters,
}

We use *const c_char instead of Rust's String type because we intend to read the filename field from our C code, and String has no C-compatible layout.

Let's define the interface of the function that will parse the arguments:

Filename: src/lib.rs

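// Add next to the existing CStr and c_char imports:
use std::{ptr, slice};

// --snip--

#[no_mangle]
pub extern "C" fn parse_args(argc: usize, argv: *const *const c_char) -> Arguments {
    // SAFETY: the caller must pass argc valid pointers in argv, as C's main() does.
    let arguments = unsafe { slice::from_raw_parts(argv, argc) };

    // --snip--

    Arguments { command, filename }
}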

We define argc and argv to mimic the parameters our entry point takes in main.c. There's a subtle difference: cbindgen translates usize to uintptr_t, not size_t as one might have thought, and C's main() receives argc as an int, which is converted implicitly when we pass it to parse_args(). Although not guaranteed by the C standard, size_t is no larger than uintptr_t in all known implementations, so we should be fine.

The two parameters are then upgraded to a slice (&[*const c_char]) for easier access. This is unsafe because we have to promise the compiler that argv contains exactly argc elements. Once the slice is created, its elements can be accessed safely.
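Here is a minimal sketch of that conversion in isolation. The argv_to_strings and simulate helpers are hypothetical; simulate builds an argv-style pointer array from CString values, roughly the way the C runtime hands arguments to main(), so the slice logic can be tried without any C code:

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;
use std::slice;

/// Hypothetical helper: copies a C-style argv into owned Rust strings.
fn argv_to_strings(argc: usize, argv: *const *const c_char) -> Vec<String> {
    // SAFETY: the caller guarantees argv points to argc valid pointers.
    let arguments = unsafe { slice::from_raw_parts(argv, argc) };
    arguments
        .iter()
        // SAFETY: each element points to a valid, zero-terminated string.
        .map(|&arg| unsafe { CStr::from_ptr(arg) }.to_string_lossy().into_owned())
        .collect()
}

/// Hypothetical test driver: builds an argv array and parses it back.
fn simulate(args: &[&str]) -> Vec<String> {
    // Owned, zero-terminated copies of each argument.
    let owned: Vec<CString> = args.iter().map(|a| CString::new(*a).unwrap()).collect();
    // Pointers into those copies; valid while `owned` is alive.
    let ptrs: Vec<*const c_char> = owned.iter().map(|s| s.as_ptr()).collect();
    argv_to_strings(ptrs.len(), ptrs.as_ptr())
}
```

The CString values must outlive the pointer array: simulate keeps them in a local vector until argv_to_strings has returned.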

Let's see how we can convert the second element to a Command:

Filename: src/lib.rs

// --snip--

#[no_mangle]
pub extern "C" fn parse_args(argc: usize, argv: *const *const c_char) -> Arguments {
    // --snip--

    let command = arguments.get(1).copied().expect("Missing command.");
    let command = unsafe { CStr::from_ptr(command) }.to_str().unwrap();
    let command = match command {
        "version" => Command::Version,
        "bytes" => Command::Bytes,
        "characters" => Command::Characters,
        _ => panic!("Command not recognized: {command}")
    };

    // --snip--
}

We use three separate steps to gradually turn command into the type we want it to be:

  1. We .get() the second element of the arguments slice, make a copy of the referenced pointer, and trigger a panic! if it's missing. If everything worked as expected, we now have a *const c_char.
  2. The pointer is then fed into CStr::from_ptr, which is unsafe because we have to promise that the pointer is valid and that the string is zero-terminated (ends with a \0 character). We then convert it to a &str.
  3. Lastly, we use a match expression to map it to the correct Command.
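The same three steps can also be written without panicking. The sketch below is a hypothetical variant, not the book's parse_args: it uses ? to return None at any failed step. Command derives Debug here only so results can be compared in assertions, and to_c_args is a hypothetical test helper that builds an argv-style pointer array:

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

#[derive(Debug, PartialEq)]
pub enum Command {
    Version,
    Bytes,
    Characters,
}

/// Hypothetical non-panicking version of the three steps above.
fn command_from_args(arguments: &[*const c_char]) -> Option<Command> {
    // Step 1: copy the pointer out of the slice (None if it's missing).
    let command = arguments.get(1).copied()?;
    // Step 2: SAFETY: each element must point to a valid zero-terminated string.
    let command = unsafe { CStr::from_ptr(command) }.to_str().ok()?;
    // Step 3: map the text to a Command.
    match command {
        "version" => Some(Command::Version),
        "bytes" => Some(Command::Bytes),
        "characters" => Some(Command::Characters),
        _ => None,
    }
}

/// Hypothetical test helper: owned C strings plus an argv-style pointer array.
/// Keep the first element alive for as long as the pointers are used.
fn to_c_args(args: &[&str]) -> (Vec<CString>, Vec<*const c_char>) {
    let owned: Vec<CString> = args.iter().map(|a| CString::new(*a).unwrap()).collect();
    let ptrs = owned.iter().map(|s| s.as_ptr()).collect();
    (owned, ptrs)
}
```

Returning an Option (or a #[repr(C)] status code) would let the C side decide how to report errors, whereas a panic! brings down the whole program.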

The filename also needs to be handled:

Filename: src/lib.rs

// --snip--

#[no_mangle]
pub extern "C" fn parse_args(argc: usize, argv: *const *const c_char) -> Arguments {
    // --snip--

    let filename = arguments.get(2).copied();
    if filename.is_none() && command != Command::Version {
        panic!("Missing filename.");
    }
    let filename = filename.unwrap_or(ptr::null());

    // --snip--
}

We only require the filename to be present if the command is something other than "version". When it's absent, we store a null pointer created by calling ptr::null().
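Any code that reads filename, on either side of the boundary, must therefore check for null before dereferencing it. A minimal Rust sketch of both halves, using hypothetical helper names:

```rust
use std::ffi::CStr;
use std::os::raw::c_char;
use std::ptr;

/// Hypothetical helper mirroring the fallback above:
/// a missing filename becomes a null pointer.
fn filename_or_null(filename: Option<*const c_char>) -> *const c_char {
    filename.unwrap_or(ptr::null())
}

/// Hypothetical reader for such a field: checks for null before
/// dereferencing, which any consumer (C or Rust) must do.
fn filename_to_string(filename: *const c_char) -> Option<String> {
    if filename.is_null() {
        return None;
    }
    // SAFETY: non-null pointers here come from valid zero-terminated strings.
    Some(unsafe { CStr::from_ptr(filename) }.to_string_lossy().into_owned())
}
```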

Inspecting the generated bindings

Let's have a look at the generated C bindings by running a cargo build:

Filename: target/bridge/bindings.h

// --snip--

typedef struct Arguments Arguments;

void print_version(void);

uint64_t count_characters(const char *text);

struct Arguments parse_args(uintptr_t argc, const char *const *argv);

We have a problem! Arguments is declared as an opaque struct, so C code can only refer to it through a pointer. It can't access any of its fields, and it can't even receive one by value, as parse_args() requires.

cbindgen went for the opaque type because Rust's default data layout is unspecified: the compiler is free to reorder fields, so there's no guarantee it matches C. Luckily, Rust has an attribute called repr, and #[repr(C)] tells the compiler to lay out a type exactly as C would:

Filename: src/lib.rs

// --snip--

#[repr(C)]
pub struct Arguments {
    command: Command,
    filename: *const c_char,
}

#[repr(C)]
#[derive(PartialEq)]
pub enum Command {
    Version,
    Bytes,
    Characters,
}

// --snip--
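With #[repr(C)], the fields are laid out in declaration order using C's alignment and padding rules. As a sanity check (a sketch, not something the build needs; layout_matches_c is a hypothetical name, and offset_of! requires Rust 1.77 or newer), command should sit at offset 0 and filename at the first pointer-aligned offset after it, just where a C compiler would put it:

```rust
use std::mem::{align_of, offset_of, size_of};
use std::os::raw::c_char;

#[repr(C)]
pub struct Arguments {
    command: Command,
    filename: *const c_char,
}

#[repr(C)]
#[derive(PartialEq)]
pub enum Command {
    Version,
    Bytes,
    Characters,
}

/// Hypothetical check: with #[repr(C)], `command` comes first and `filename`
/// starts at the size of Command rounded up to the pointer alignment.
fn layout_matches_c() -> bool {
    let align = align_of::<*const c_char>();
    // Round size_of::<Command>() up to the next multiple of the pointer alignment.
    let expected_filename_offset = (size_of::<Command>() + align - 1) / align * align;
    offset_of!(Arguments, command) == 0
        && offset_of!(Arguments, filename) == expected_filename_offset
}
```

Without #[repr(C)], Rust would be free to place filename first, and the C side's view of the struct would silently disagree with Rust's.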

With this fix in hand, we can do another cargo build and look at cbindgen's output.

Filename: target/bridge/bindings.h

// --snip--

typedef enum Command {
  Version,
  Bytes,
  Characters,
} Command;

typedef struct Arguments {
  enum Command command;
  const char *filename;
} Arguments;

// --snip--

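This looks much better! But C enum values all share the enclosing scope, so a value named Version could easily clash with another identifier. Let's ask cbindgen to prefix our enum values with the type name to reduce the risk of a name collision: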

Filename: src/lib.rs

// --snip--

/// cbindgen:prefix-with-name
#[repr(C)]
#[derive(PartialEq)]
pub enum Command {
    Version,
    Bytes,
    Characters,
}

// --snip--

A rebuild should change the Command enum to this:

Filename: target/bridge/bindings.h

// --snip--

typedef enum Command {
  Command_Version,
  Command_Bytes,
  Command_Characters,
} Command;

// --snip--

With the prefixes in place, let's take it for a spin.

Using our argument parser

Using our new argument parsing function is fairly straightforward:

Filename: src/main.c

// --snip--

int main(const int argc, const char *argv[]) {
    const Arguments args = parse_args(argc, argv);

    if (args.command == Command_Version) {
        print_version();
        return 0;
    }

    run_command_for_file(args.command, args.filename);
    return 0;
}

// --snip--

This won't work quite yet - we also have to update the command type in run_command_for_file and do_calculation:

Filename: src/main.c

// --snip--

void run_command_for_file(Command command, const char* filename);
uint64_t do_calculation(Command command, const char* data);

// --snip--

void run_command_for_file(const Command command, const char* filename) {
    // --snip--
}

uint64_t do_calculation(const Command command, const char* data) {
    switch (command) {
        case Command_Bytes:
            return count_bytes(data);
        case Command_Characters:
            return count_characters(data);
        default:
            fprintf(stderr, "Unrecognized command: %i\n", command);
            exit(1);
    }
}

// --snip--

Now that command is an enum, we can get rid of all the calls to strcmp, and we can also opt to use a switch statement.

Build and run the application to see the input validation in action:

$ cmake --build .
$ ./count
thread '<unnamed>' panicked at 'Missing command.', src/lib.rs:36:45
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
fatal runtime error: failed to initiate panic, error 5
zsh: abort      ./count

Panicking is a very crude form of error handling, but at least we give the user a hint that there's a 'Missing command.'.

Sharing structs across the FFI boundary is a useful technique when your function is dealing with more than simple values. In the next section we will show how you can send function callbacks from C to Rust.

Sending callbacks from C

Our application is counting bytes and characters like there's no tomorrow. But imagine you're writing a book, have one file per chapter, and want to count characters across them regularly.

Given a CSV file (Comma Separated Values), we want our application to run the calculation on each file in the list.

An example:

Filename: list.csv

chapter1.md,chapter2.md

Filename: chapter1.md

# Getting started

Filename: chapter2.md

# Wrapping up

Our program's command-line interface gets a new flag:

./count version
./count bytes list.csv [--csv-list]
./count characters list.csv [--csv-list]

For the given files we want to be able to run commands like this:

$ ./count bytes list.csv --csv-list
18 chapter1.md
14 chapter2.md

Adding a CSV module

This time around we'll start by writing the logic in Rust, and then we'll make an FFI wrapper separately, with C types and attributes. The core of our module is a function that takes some string data, splits it on commas, and calls a callback once for each of the separated values:

Filename: src/modules/csv.rs

fn for_each_value(csv: &str, callback: impl Fn(&str)) {
    for value in csv.split(',') {
        callback(value.trim());
    }
}
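A quick way to try for_each_value on its own is a sketch like the one below. The collect_values helper is hypothetical; because the callback is an impl Fn rather than an FnMut, it can't mutate captured state directly, so a RefCell provides the interior mutability:

```rust
use std::cell::RefCell;

fn for_each_value(csv: &str, callback: impl Fn(&str)) {
    for value in csv.split(',') {
        callback(value.trim());
    }
}

/// Hypothetical helper: gathers every value into a Vec.
fn collect_values(csv: &str) -> Vec<String> {
    let values = RefCell::new(Vec::new());
    // The closure only borrows `values` immutably; RefCell allows the push.
    for_each_value(csv, |v| values.borrow_mut().push(v.to_string()));
    values.into_inner()
}
```

The trim() call is what makes a trailing newline (as written by echo) harmless. Note that an empty field, as in a,,b, still produces a callback with an empty string.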

We proceed to add the C interface in a separate module at the beginning of the file:

Filename: src/modules/csv.rs

mod ffi {
    use std::ffi::{c_void, CStr, CString};
    use std::os::raw::c_char;

    #[no_mangle]
    pub extern "C" fn csv_for_each_value(
        csv: *const c_char,
        c_callback: unsafe extern "C" fn(*const c_char, *const c_void),
        context: *const c_void,
    ) {
        let csv = unsafe { CStr::from_ptr(csv) }.to_str().unwrap();
        super::for_each_value(csv, |value| {
            let value = CString::new(value).unwrap();
            unsafe { c_callback(value.as_ptr(), context) };
        });
    }
}

// --snip--

We have separated out the FFI-related type conversions from our logic. Notice that our exported wrapper function has the same name, but with the module name prefixed: csv_for_each_value().

The wrapper takes three parameters:

1. csv: *const c_char

The contents of a CSV file, as a char pointer. Just like earlier, we process it from *const c_char to CStr to &str.

2. c_callback: unsafe extern "C" fn(*const c_char, *const c_void)

An external function callback that takes two arguments. The first is a char pointer to one of the values from our CSV, and the second is a c_void pointer. Since C doesn't have closures, a void pointer is a common way to let the caller pass arbitrary data or state through to the callback function.

3. context: *const c_void

The last parameter is a void pointer to the data we want to pass along to the callback.

Inside our wrapper, each value handed to the closure is converted with CString::new(value).unwrap(), which panics if the value contains an interior \0 byte.

CString is to CStr what String is to &str - an owned version of a C string. But why are we creating an owned string when we want to pass along a reference?

Ideally, we would have liked to do the inverse of what we do on the receiving end, going from &str to CStr to *const c_char. But a C string must end with a \0, and our &str value is a slice into the middle of the CSV text with no terminator after it. To produce a C string, Rust has to copy the value into a new buffer with room for the \0, which means allocating an owned string.
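The copy is easy to observe. In this minimal sketch (to_c_bytes is a hypothetical helper), CString::new returns an owned buffer whose bytes end in \0, and it rejects input that already contains a NUL byte, since C would treat that byte as the end of the string:

```rust
use std::ffi::CString;

/// Hypothetical helper: the bytes C would see for `value`, including the
/// terminator, or None if `value` contains an interior NUL byte.
fn to_c_bytes(value: &str) -> Option<Vec<u8>> {
    CString::new(value).ok().map(|c| c.as_bytes_with_nul().to_vec())
}
```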

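We then call value.as_ptr() to get a *const c_char pointing to our temporary zero-terminated string. The pointer is only valid until value is dropped at the end of the closure, so the C callback must copy the string if it needs to keep it.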
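To see the whole round trip without any C code, the sketch below reuses the wrapper's body (minus #[no_mangle], so it stands alone) and lets a Rust unsafe extern "C" function play the part of the C callback. The names collect_value and collect_via_callback are hypothetical; the context pointer carries a RefCell<Vec<String>>, standing in for whatever state a C caller would pass:

```rust
use std::cell::RefCell;
use std::ffi::{c_void, CStr, CString};
use std::os::raw::c_char;

fn for_each_value(csv: &str, callback: impl Fn(&str)) {
    for value in csv.split(',') {
        callback(value.trim());
    }
}

// Same shape as the book's wrapper, without #[no_mangle].
pub extern "C" fn csv_for_each_value(
    csv: *const c_char,
    c_callback: unsafe extern "C" fn(*const c_char, *const c_void),
    context: *const c_void,
) {
    let csv = unsafe { CStr::from_ptr(csv) }.to_str().unwrap();
    for_each_value(csv, |value| {
        let value = CString::new(value).unwrap();
        unsafe { c_callback(value.as_ptr(), context) };
    });
}

// Hypothetical callback standing in for the C side.
unsafe extern "C" fn collect_value(value: *const c_char, context: *const c_void) {
    // SAFETY: `context` is the RefCell passed in by collect_via_callback.
    let values = unsafe { &*(context as *const RefCell<Vec<String>>) };
    // Copy the string now: the pointer dies with the temporary CString.
    let value = unsafe { CStr::from_ptr(value) }.to_string_lossy().into_owned();
    values.borrow_mut().push(value);
}

// Hypothetical driver: passes its state through the void pointer.
fn collect_via_callback(csv: &str) -> Vec<String> {
    let values: RefCell<Vec<String>> = RefCell::new(Vec::new());
    let csv = CString::new(csv).unwrap();
    csv_for_each_value(
        csv.as_ptr(),
        collect_value,
        &values as *const RefCell<Vec<String>> as *const c_void,
    );
    values.into_inner()
}
```

The callback copies each string into an owned String before returning, which respects the rule that the pointer is only valid for the duration of the call.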

Parsing the new command line argument

In our library's entry point, we need to pull in our new module at the beginning of the file, and extend the argument parsing to look for and validate our new --csv-list flag:

Filename: src/lib.rs

mod modules {
    mod csv;
}

// --snip--

#[repr(C)]
pub struct Arguments {
    command: Command,
    filename: *const c_char,
    file_mode: FileMode,
}

/// cbindgen:prefix-with-name
#[repr(C)]
pub enum FileMode {
    Normal,
    CsvList,
}

// --snip--

#[no_mangle]
pub extern "C" fn parse_args(argc: usize, argv: *const *const c_char) -> Arguments {
    // --snip--

    // The optional fourth argument selects the file mode.
    let file_mode = if let Some(csv_flag) = arguments.get(3).copied() {
        let csv_flag = unsafe { CStr::from_ptr(csv_flag) }.to_str().unwrap();
        match csv_flag {
            "--csv-list" => FileMode::CsvList,
            _ => panic!("CSV flag not recognized: {csv_flag}")
        }
    } else {
        FileMode::Normal
    };

    Arguments { command, filename, file_mode }
}

We should rebuild the C bindings every time this new file changes:

Filename: build.rs

// --snip--

fn main() {
    println!("cargo:rerun-if-changed=src/lib.rs");
    println!("cargo:rerun-if-changed=src/modules/csv.rs");

    // --snip--
}

// --snip--

And let's not forget to add it as a dependency of our CMake config:

Filename: CMakeLists.txt

# --snip--

set(
        RUST_LIB_SOURCES
        ${CMAKE_SOURCE_DIR}/build.rs
        ${CMAKE_SOURCE_DIR}/src/lib.rs
        ${CMAKE_SOURCE_DIR}/src/modules/csv.rs
)

# --snip--

Re-wiring main.c

We also have to adapt our entry point to the new realities. First, we change run_command_for_file so that we can use it as a callback: we swap the order of its two parameters and replace Command with a CommandContext, which is the state we will soon pass around as a void pointer:

Filename: src/main.c

// --snip--

typedef struct CommandContext {
    Command command;
} CommandContext;

void run_command_for_file(const char* filename, const void* ctx_ptr);

// --snip--

void run_command_for_file(const char* filename, const void* ctx_ptr) {
    const CommandContext* ctx = (const CommandContext*) ctx_ptr;
    File file = file_read(filename);
    char* str = file_to_string(file);

    const uint64_t result = do_calculation(ctx->command, str);
    print_result(result);

    file_free_string(str);
    file_free(file);
}

// --snip--

We also have to rewrite the main() function to handle our new file_mode property. If we have FileMode_Normal, we just wrap the command in a CommandContext, and call run_command_for_file the same way we always did.

If we have FileMode_CsvList, we read the contents of the CSV-file to a string, and pass it on to the Rust-defined csv_for_each_value().

Filename: src/main.c

// --snip--

int main(const int argc, const char *argv[]) {
    const Arguments args = parse_args(argc, argv);

    if (args.command == Command_Version) {
        print_version();
        return 0;
    }

    switch (args.file_mode) {
        case FileMode_Normal: {
            CommandContext ctx = { .command = args.command };
            run_command_for_file(args.filename, &ctx);
            break;
        }
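        case FileMode_CsvList: {
            // Keep the File around so its data buffer can be freed afterwards.
            File csv_file = file_read(args.filename);
            char* csv = file_to_string(csv_file);
            CommandContext ctx = { .command = args.command };
            csv_for_each_value(csv, run_command_for_file, &ctx);
            file_free_string(csv);
            file_free(csv_file);
            break;
        }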
    }

    return 0;
}

// --snip--

Let's test what we've got so far:

$ cmake ..
$ cmake --build .
$ echo "# Getting started" > chapter1.md
$ echo "# Wrapping up" > chapter2.md
$ echo "chapter1.md,chapter2.md" > list.csv
$ ./count characters list.csv --csv-list
18
14

While we do get the count for each of the files, it's not very easy to see which count is for which file.

As a finishing touch, we'll add the filename for each count, if we are in CSV-mode:

Filename: src/main.c

// --snip--

typedef struct CommandContext {
    Command command;
    bool print_filename;
} CommandContext;

// --snip--

void print_result_with_filename(uint64_t result, const char* filename);

// --snip--

int main(const int argc, const char *argv[]) {

    // --snip--

    switch (args.file_mode) {
        case FileMode_Normal: {
            CommandContext ctx = { .command = args.command, .print_filename = false };
            // --snip--
        }
        case FileMode_CsvList: {
            char* csv = file_to_string(file_read(args.filename));
            CommandContext ctx = { .command = args.command, .print_filename = true };
            // --snip--
        }
    }

    // --snip--
}

// --snip--

void run_command_for_file(const char* filename, const void* ctx_ptr) {
    // --snip--

    const uint64_t result = do_calculation(ctx->command, str);
    if (ctx->print_filename) {
        print_result_with_filename(result, filename);
    } else {
        print_result(result);
    }

    // --snip--
}

// --snip--

void print_result_with_filename(const uint64_t result, const char* filename) {
    // PRIu64 (from <inttypes.h>) is the portable printf specifier for uint64_t
    printf("%s: %" PRIu64 "\n", filename, result);
}

// --snip--

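Because the Rust side only ever sees the context as an opaque void pointer, we can add a new property to CommandContext without touching any code on the Rust side.

We also have to add a print_result_with_filename() function, set print_filename to true only in CSV list mode, and call the new function selectively in run_command_for_file().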
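To see why the Rust side is unaffected, here is a minimal, self-contained Rust sketch of the same callback-plus-context pattern. The names for_each_value, Context and record_value are hypothetical stand-ins for csv_for_each_value(), CommandContext and run_command_for_file(); the real csv_for_each_value() from the earlier section may be implemented differently. The point is that the iterating function only forwards the void pointer, while the callback alone knows the concrete type behind it.

```rust
use std::cell::Cell;
use std::ffi::{c_void, CStr, CString};
use std::os::raw::c_char;

// Callback type with the same shape as
// void run_command_for_file(const char* filename, const void* ctx_ptr)
type Callback = extern "C" fn(*const c_char, *const c_void);

// Hypothetical stand-in for csv_for_each_value(): it never looks inside
// `ctx`, it just passes the same pointer to the callback for every value.
fn for_each_value(csv: &str, callback: Callback, ctx: *const c_void) {
    for value in csv.split(',') {
        let value = CString::new(value.trim()).unwrap();
        callback(value.as_ptr(), ctx);
    }
}

// Hypothetical context playing the role of CommandContext.
// Adding fields here requires no change to for_each_value().
struct Context {
    print_filename: bool,
    total_len: Cell<usize>,
}

// Hypothetical callback playing the role of run_command_for_file().
// Only the callback knows the concrete type behind the void pointer.
extern "C" fn record_value(filename: *const c_char, ctx_ptr: *const c_void) {
    let ctx = unsafe { &*(ctx_ptr as *const Context) };
    let filename = unsafe { CStr::from_ptr(filename) }.to_str().unwrap();
    ctx.total_len.set(ctx.total_len.get() + filename.len());
    if ctx.print_filename {
        println!("{filename}");
    }
}

fn main() {
    let ctx = Context { print_filename: true, total_len: Cell::new(0) };
    let ctx_ptr = &ctx as *const Context as *const c_void;
    for_each_value("chapter1.md,chapter2.md", record_value, ctx_ptr);
    println!("total filename length: {}", ctx.total_len.get());
}
```

The Cell lets the callback update the context through a const pointer, mirroring how the C code receives a const void*.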

A final test is in order:

$ cmake --build .
$ ./count characters list.csv --csv-list
chapter1.md: 18
chapter2.md: 14

This output is much easier to digest.

In this section we have shown how you can pass control from C to Rust and back again. In the next one, we will see how we can transfer ownership of heap-allocated data between C and Rust.

Transferring ownership of data

In the previous section we added the ability to run calculations on a comma-separated list of filenames, for example counting the characters in each chapter of a book.

But what if you want to add all those counts together, counting the number of characters in the entire book? We'll add a new --csv-merged flag that merges all the files and runs a count on the result:

./count version
./count bytes list.csv [--csv-list | --csv-merged]
./count characters list.csv [--csv-list | --csv-merged]

Supplied with the same files as in the previous section, it should work like this:

$ ./count characters list.csv --csv-merged
32

To make things interesting we will implement this with a Rust function that takes ownership of the incoming CSV data, and that returns an owned string of the merged file content. We want something like this:

pub extern "C" fn csv_merge_files(csv: *mut c_char) -> *mut c_char;

Transferring ownership means that csv should be freed by csv_merge_files(), and must not be used by the caller after the call. In return, the caller becomes the owner of the returned string and has to make sure it is freed. We'll get back to how we deal with this a little later.

The new function will also have to read the files before it can merge them. Our application already has a file module written in C, and we're going to reuse that logic. Calling C functions from Rust is the topic of the next section, so in the meantime we'll make a mock module to stand in for it:

Filename: src/modules/file/mod.rs

pub struct File(String);

impl File {
    pub fn to_str(&self) -> &str {
        if self.0 == "chapter1.md" {
            "# Getting started\n"
        } else if self.0 == "chapter2.md" {
            "# Wrapping up\n"
        } else {
            panic!("No content defined for file: {}", self.0);
        }
    }
}

pub fn read_file(filename: &str) -> File {
    File(filename.to_owned())
}

Add the new module to our library:

Filename: src/lib.rs

mod modules {
    mod csv;
    mod file;
}

// --snip--

Before we forget, let's add the new file to the RUST_LIB_SOURCES list in CMakeLists.txt: ${CMAKE_SOURCE_DIR}/src/modules/file/mod.rs

NOTE

It would of course have been easier to rewrite the file_read() function completely in Rust, but with this project we want to show how you can reuse pre-existing code. When gradually porting a real world application, this is a useful skill.

Merging the files

The file merging logic is rather simple. We read each file, and push the text content to a merged string:

Filename: src/modules/csv.rs

use crate::modules::file;

// --snip--

fn merge_files(csv: &str) -> String {
    let mut merged = String::new();
    for value in csv.split(',') {
        let file = file::read_file(value.trim());
        merged.push_str(file.to_str());
    }
    merged
}

We also need an FFI wrapper to be able to call it from C:

Filename: src/modules/csv.rs

// --snip--

mod ffi {
    // --snip--

    #[no_mangle]
    pub extern "C" fn csv_merge_files(csv: *mut c_char) -> *mut c_char {
        let csv_str = unsafe { CStr::from_ptr(csv) }.to_str().unwrap();
        let merged = super::merge_files(csv_str);
        CString::new(merged).unwrap().into_raw()
    }
}

CString::into_raw() is the only new thing here. It consumes the CString and gives us a *mut c_char pointing to its data. Consuming the CString is the important part: its destructor never runs, so the string is not deallocated when the function returns. Instead, ownership of the data is transferred to C.
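The round trip can be tried without any C code. In this self-contained sketch, make_greeting and free_greeting are hypothetical functions shaped like csv_merge_files() and the matching free function; main() plays the role of the C caller that only ever holds the raw pointer. The #[no_mangle] attributes are left out because nothing here is called from C.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

// Hypothetical producer: into_raw() consumes the CString, so its
// destructor does not run and the buffer stays allocated.
extern "C" fn make_greeting() -> *mut c_char {
    CString::new("hello from Rust").unwrap().into_raw()
}

// Hypothetical release function: from_raw() takes ownership back, and
// dropping the CString frees the buffer with Rust's allocator.
extern "C" fn free_greeting(ptr: *mut c_char) {
    if ptr.is_null() {
        return;
    }
    drop(unsafe { CString::from_raw(ptr) });
}

fn main() {
    // Plays the role of the C caller: it only holds the raw pointer.
    let ptr = make_greeting();
    let text = unsafe { CStr::from_ptr(ptr) }.to_str().unwrap();
    println!("{text}");
    // `text` borrows the buffer, so it must not be used after this call.
    free_greeting(ptr);
}
```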

Putting the pieces together

To be able to pass along the new flag, we need to update our command line parser:

Filename: src/lib.rs

// --snip--

/// cbindgen:prefix-with-name
#[repr(C)]
pub enum FileMode {
    Normal,
    CsvList,
    CsvMerged,
}

#[no_mangle]
pub extern "C" fn parse_args(argc: usize, argv: *const *const c_char) -> Arguments {
    // --snip--

    let file_mode = if let Some(csv_flag) = arguments.get(3).copied() {
        let csv_flag = unsafe { CStr::from_ptr(csv_flag) }.to_str().unwrap();
        match csv_flag {
            "--csv-list" => FileMode::CsvList,
            "--csv-merged" => FileMode::CsvMerged,
            _ => panic!("CSV flag not recognized: {csv_flag}")
        }
    } else {
        FileMode::Normal
    };

    // --snip--
}
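If you want to exercise the flag handling without a C caller, you can build a C-style argv from Rust. The sketch below assumes, as the snippet above suggests, that arguments is a slice of argv pointers. file_mode_from_argv is a hypothetical helper that mirrors the match in parse_args(), but returns None for an unknown flag instead of panicking, so the caller can decide how to report the error.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

#[derive(Debug, PartialEq)]
enum FileMode {
    Normal,
    CsvList,
    CsvMerged,
}

// Hypothetical helper mirroring parse_args(): argv[3], if present,
// selects the CSV mode. Unknown or non-UTF-8 flags yield None.
fn file_mode_from_argv(argc: usize, argv: *const *const c_char) -> Option<FileMode> {
    let arguments = unsafe { std::slice::from_raw_parts(argv, argc) };
    match arguments.get(3).copied() {
        None => Some(FileMode::Normal),
        Some(flag) => match unsafe { CStr::from_ptr(flag) }.to_str().ok()? {
            "--csv-list" => Some(FileMode::CsvList),
            "--csv-merged" => Some(FileMode::CsvMerged),
            _ => None,
        },
    }
}

fn main() {
    // The CStrings own the argument data; the Vec holds pointers into them,
    // just like the argv array C hands to main().
    let owned: Vec<CString> = ["./count", "characters", "list.csv", "--csv-merged"]
        .iter()
        .map(|s| CString::new(*s).unwrap())
        .collect();
    let argv: Vec<*const c_char> = owned.iter().map(|s| s.as_ptr()).collect();
    println!("{:?}", file_mode_from_argv(argv.len(), argv.as_ptr()));
}
```

Note that owned must outlive argv: the pointers in argv are only borrowed from the CStrings.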

And we also need to update our main()-function to use the new function:

Filename: src/main.c

// --snip--

int main(const int argc, const char *argv[]) {
    // --snip--

    switch (args.file_mode) {
        // --snip--
        case FileMode_CsvMerged: {
            char* csv = file_to_string(file_read(args.filename));
            char* content = csv_merge_files(csv);
            const uint64_t result = do_calculation(args.command, content);
            print_result(result);
            break;
        }
    }

    // --snip--
}

We read the CSV file and pass it on to csv_merge_files(). Since all the file reading is already done once we have the merged content, we skip right ahead to do_calculation() and print_result().

We can go ahead and test the new functionality:

$ cmake ..
$ cmake --build .
$ echo "chapter1.md,chapter2.md" > list.csv
$ ./count characters list.csv --csv-merged
32

This is indeed the output we expected. The files chapter1.md and chapter2.md have 32 characters in total.

But while we passed ownership of data in both directions, we never did any cleanup. For a small program like this it doesn't really matter, as the operating system reclaims the memory when the process exits. It is, however, good practice to free all the memory you allocate, and in a long-running or data-intensive application it is strictly necessary, or you will eventually run out of memory.

Freeing up memory from the other side

We cannot reliably free C-allocated memory from Rust, or vice versa, because the two sides are not guaranteed to use the same allocator. Memory has to be freed (or dropped) on the side where it was allocated. In practice this means that if you transfer ownership of heap-allocated data across the language border, you also have to provide a way to deallocate that data.

Let's start with the csv argument that we pass to csv_merge_files(). Alongside our data, we require a callback function, free_csv, that will free it. We use this callback to free csv as soon as we're done with it:

Filename: src/modules/csv.rs

mod ffi {
    // --snip--

    #[no_mangle]
    pub extern "C" fn csv_merge_files(
        csv: *mut c_char,
        free_csv: unsafe extern "C" fn(*mut c_char),
    ) -> *mut c_char {
        let csv_str = unsafe { CStr::from_ptr(csv) }.to_str().unwrap();
        let merged = super::merge_files(csv_str);
        // csv_str borrows from csv, so csv may only be freed once we're done reading it
        unsafe { free_csv(csv) };
        CString::new(merged).unwrap().into_raw()
    }
}

// --snip--
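The order of operations inside csv_merge_files() matters: csv_str points into the buffer owned by C, so the free callback may only run after merge_files() has finished reading it. The self-contained sketch below exercises the same shape entirely in Rust. alloc_c_string, free_c_string, merge_values and merge_and_free are hypothetical; the first two stand in for the C module's allocation and file_free_string(), and here they use Rust's allocator only so the example can run without C code.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts calls to the free callback so we can check it runs exactly once.
static FREED: AtomicUsize = AtomicUsize::new(0);

// Hypothetical stand-in for the C side allocating a string.
fn alloc_c_string(s: &str) -> *mut c_char {
    CString::new(s).unwrap().into_raw()
}

// Hypothetical stand-in for file_free_string(): frees with the same
// allocator that created the string (Rust's, since there is no C here).
unsafe extern "C" fn free_c_string(ptr: *mut c_char) {
    FREED.fetch_add(1, Ordering::SeqCst);
    drop(unsafe { CString::from_raw(ptr) });
}

// Hypothetical merge step: concatenates trimmed values instead of reading files.
fn merge_values(csv: &str) -> String {
    csv.split(',').map(str::trim).collect()
}

// Same shape as csv_merge_files(): takes ownership of `csv`, frees it through
// the callback, and returns a newly allocated string owned by the caller.
extern "C" fn merge_and_free(
    csv: *mut c_char,
    free_csv: unsafe extern "C" fn(*mut c_char),
) -> *mut c_char {
    let csv_str = unsafe { CStr::from_ptr(csv) }.to_str().unwrap();
    let merged = merge_values(csv_str);
    // csv_str is not used past this point, so freeing the buffer is now safe.
    unsafe { free_csv(csv) };
    CString::new(merged).unwrap().into_raw()
}

fn main() {
    let csv = alloc_c_string("chapter1.md, chapter2.md\n");
    let merged = merge_and_free(csv, free_c_string);
    // Reclaim and drop the returned string, like csv_free_merged_file() does.
    let merged = unsafe { CString::from_raw(merged) };
    println!("{} (free callback calls: {})", merged.to_str().unwrap(), FREED.load(Ordering::SeqCst));
}
```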

We need to update main.c to pass along the callback:

Filename: src/main.c

// --snip--

int main(const int argc, const char *argv[]) {
    // --snip--

    switch (args.file_mode) {
        // --snip--
        case FileMode_CsvMerged: {
            // --snip--
            char* content = csv_merge_files(csv, file_free_string);
            // --snip--
        }
    }
}

// --snip--

After these changes, csv_merge_files() is truly the responsible owner of csv. But it also returns data that we need to deal with. We start by adding a function that C can call to free it:

Filename: src/modules/csv.rs

mod ffi {
    //--snip--

    #[no_mangle]
    pub extern "C" fn csv_free_merged_file(merged: *mut c_char) {
        if merged.is_null() {
            return;
        }
        // Reclaim ownership of the string; dropping the CString deallocates it
        drop(unsafe { CString::from_raw(merged) });
    }
}

//--snip--

At first glance it might look like this code is doing nothing. But just as CString::into_raw() transfers ownership away from a CString, CString::from_raw() reclaims it. The reclaimed CString is dropped right away, so its destructor runs and the data is deallocated. Note that from_raw() must only be called with a pointer that was returned by CString::into_raw(), and only once per pointer. Let's clean up after ourselves in main.c:
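The same reclaim-and-drop idea applies to any heap-allocated Rust value via Box::into_raw() and Box::from_raw(). This self-contained sketch uses a hypothetical Tracked type whose destructor records that it ran, so you can observe that nothing is dropped while C holds the pointer, and that reclaiming the Box is what runs the destructor. tracked_new, tracked_value and tracked_free are hypothetical names.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

static DROPPED: AtomicBool = AtomicBool::new(false);

// Hypothetical type whose destructor records that it ran.
struct Tracked {
    value: u64,
}

impl Drop for Tracked {
    fn drop(&mut self) {
        DROPPED.store(true, Ordering::SeqCst);
    }
}

// Hands an owned value to the caller as an opaque pointer.
extern "C" fn tracked_new(value: u64) -> *mut Tracked {
    Box::into_raw(Box::new(Tracked { value }))
}

// Reads through the pointer without taking ownership.
extern "C" fn tracked_value(ptr: *const Tracked) -> u64 {
    unsafe { (*ptr).value }
}

// Reclaims ownership; dropping the Box runs Tracked::drop and frees the memory.
extern "C" fn tracked_free(ptr: *mut Tracked) {
    if !ptr.is_null() {
        drop(unsafe { Box::from_raw(ptr) });
    }
}

fn main() {
    let ptr = tracked_new(32);
    println!("value: {}", tracked_value(ptr));
    println!("dropped before free: {}", DROPPED.load(Ordering::SeqCst));
    tracked_free(ptr);
    println!("dropped after free: {}", DROPPED.load(Ordering::SeqCst));
}
```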

Filename: src/main.c

// --snip--

int main(const int argc, const char *argv[]) {
    // --snip--

    switch (args.file_mode) {
        // --snip--

        case FileMode_CsvMerged: {
            char* csv = file_to_string(file_read(args.filename));
            char* content = csv_merge_files(csv, file_free_string);
            const uint64_t result = do_calculation(args.command, content);
            csv_free_merged_file(content);
            print_result(result);
            break;
        }
    }

    // --snip--
}

A call to csv_free_merged_file() has been added, so all the memory that crosses the language border is now explicitly freed.

References vs owned data

When you transfer data to or from C, you generally have to read the documentation to know who's the owner of the data after the function call has happened.

We've created a mini-reference below for how you can pass common types of heap data through FFI.

Passing heap data from C to Rust

From -> to                              | Conversion
*const c_char -> &str (reference)       | unsafe { CStr::from_ptr(char_ptr) }.to_str().unwrap()
*mut c_char -> &str (owned)             | unsafe { CStr::from_ptr(char_ptr) }.to_str().unwrap()
*const u8 + len -> &[u8] (reference)    | unsafe { slice::from_raw_parts(ptr, len) }
*const u8 + len -> &[u8] (owned)        | unsafe { slice::from_raw_parts(ptr, len) }

Remember that if you transfer ownership, you should also supply a callback or a function to free the memory.
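Both rows of the table above can be tried in a self-contained sketch. The "C data" is simulated with Rust byte literals and arrays, and count_chars and sum_bytes are hypothetical functions. One detail worth knowing: slice::from_raw_parts() requires a non-null, aligned pointer even when len is 0, so the sketch checks for an empty buffer first.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;
use std::slice;

// *const c_char -> &str: borrows a NUL-terminated C string (no copy, no ownership).
extern "C" fn count_chars(text: *const c_char) -> usize {
    let text = unsafe { CStr::from_ptr(text) }.to_str().unwrap();
    text.chars().count()
}

// *const u8 + len -> &[u8]: borrows a buffer of known length.
extern "C" fn sum_bytes(data: *const u8, len: usize) -> u64 {
    // from_raw_parts() must not be given a null pointer, even for len == 0.
    if len == 0 {
        return 0;
    }
    let bytes = unsafe { slice::from_raw_parts(data, len) };
    bytes.iter().map(|&b| u64::from(b)).sum()
}

fn main() {
    // Simulated C data: a NUL-terminated string and a plain byte buffer.
    let c_string = b"# Wrapping up\n\0";
    println!("{}", count_chars(c_string.as_ptr() as *const c_char));
    let buffer = [1u8, 2, 3];
    println!("{}", sum_bytes(buffer.as_ptr(), buffer.len()));
}
```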

Passing heap data from Rust to C

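From -> to                                   | Conversion
&str -> *const c_char (reference)            | let c_string = CString::new(rust_str).unwrap();
                                             | let char_ptr = c_string.as_ptr();
                                             | (c_string must stay alive as long as C uses char_ptr)
String -> *mut c_char (owned)                | CString::new(rust_str).unwrap().into_raw()
                                             | Deallocation: drop(unsafe { CString::from_raw(char_ptr) });
&[u8] -> *const u8 + len (reference)         | u8_slice.as_ptr(), u8_slice.len()
Vec<u8> -> *mut u8 + len + cap (owned)       | vec.shrink_to_fit();
                                             | let mut vec = mem::ManuallyDrop::new(vec);
                                             | let ptr = vec.as_mut_ptr();
                                             | let len = vec.len();
                                             | let cap = vec.capacity();
                                             | Deallocation: drop(unsafe { Vec::from_raw_parts(ptr, len, cap) });
RustType -> *const RustType (reference)      | &rust_obj as *const RustType
RustType -> *mut RustType (owned)            | Box::into_raw(Box::new(rust_obj))
                                             | Deallocation: drop(unsafe { Box::from_raw(ptr) });

Writing CString::new(rust_str).unwrap().as_ptr() in a single expression would leave a dangling pointer, since the temporary CString is dropped at the end of the statement. Likewise, shrink_to_fit() does not guarantee that the capacity equals the length, so the capacity has to be handed back when the Vec is deallocated.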
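The two Rust-to-C entries that are easiest to get subtly wrong are the borrowed CString and the owned Vec<u8>, so here is a self-contained sketch of both. c_strlen, vec_into_raw and vec_free are hypothetical helpers; c_strlen merely simulates a C function that reads the string through the pointer.

```rust
use std::ffi::{CStr, CString};
use std::mem::ManuallyDrop;
use std::os::raw::c_char;

// Hypothetical C-side reader, simulated in Rust: it only borrows the pointer.
fn c_strlen(ptr: *const c_char) -> usize {
    unsafe { CStr::from_ptr(ptr) }.to_bytes().len()
}

// Vec<u8> -> *mut u8 (owned): hand out pointer, length and capacity.
// ManuallyDrop stops the Vec's destructor from freeing the buffer.
fn vec_into_raw(vec: Vec<u8>) -> (*mut u8, usize, usize) {
    let mut vec = ManuallyDrop::new(vec);
    (vec.as_mut_ptr(), vec.len(), vec.capacity())
}

// Hypothetical free function handed to C alongside the data. It must receive
// the exact pointer, length and capacity that vec_into_raw() returned.
unsafe fn vec_free(ptr: *mut u8, len: usize, cap: usize) {
    drop(unsafe { Vec::from_raw_parts(ptr, len, cap) });
}

fn main() {
    // &str -> *const c_char (reference): keep the CString in a variable
    // so it outlives every use of the pointer.
    let c_string = CString::new("chapter1.md").unwrap();
    let ptr = c_string.as_ptr();
    println!("{}", c_strlen(ptr));

    let (ptr, len, cap) = vec_into_raw(vec![1, 2, 3]);
    let view = unsafe { std::slice::from_raw_parts(ptr, len) };
    println!("{view:?}");
    // `view` must not be used after the buffer is freed.
    unsafe { vec_free(ptr, len, cap) };
}
```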

In the next section we will give you even more control, by enabling you to call any C function directly from Rust, not just callbacks.