Taking references and returning primitives from Rust
Let's add a third command to our application:
./count version
./count bytes file.txt
./count characters file.txt
The number of bytes and characters in a file will often be the same. But a byte can only represent 256 different values, so to support all the world's alphabets, plus special characters like emoji, UTF-8 encodes each character using anywhere from one to four bytes.
Example: while the string Kaimū has five characters, it is six bytes long, because the ū is a two-byte character.
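To see the difference from the Rust side, we can compare str::len(), which returns the length of the UTF-8 encoding in bytes, with chars().count(), which counts characters. This is a minimal sketch: byte_and_char_counts() is a hypothetical helper made up for illustration, and the ū is written as the escape \u{016B} so that the encoding of the source file cannot affect the result.

```rust
/// Hypothetical helper for illustration: returns (UTF-8 byte count, char count).
fn byte_and_char_counts(s: &str) -> (usize, usize) {
    // len() counts the bytes of the UTF-8 encoding;
    // chars() iterates over Unicode scalar values.
    (s.len(), s.chars().count())
}

fn main() {
    // "\u{016B}" is the precomposed letter ū, encoded as two bytes in UTF-8.
    let (bytes, chars) = byte_and_char_counts("Kaim\u{016B}");
    println!("bytes: {}, characters: {}", bytes, chars); // bytes: 6, characters: 5
}
```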
C's core string functions, such as strlen(), work on bytes rather than characters, and its multibyte-character support depends on the current locale. So let's implement the counting function in Rust, where Unicode text is a first-class citizen.
Dispatching our new command
This time we start from the C side by declaring our new function, count_characters(). We also add an else if clause to do_calculation() to dispatch the new command:
Filename: src/main.c
// --snip--
void print_version();
uint64_t count_characters(const char* text);
// --snip--
uint64_t do_calculation(const char* command, const char* data) {
    if (strcmp(command, "bytes") == 0) {
        return count_bytes(data);
    } else if (strcmp(command, "characters") == 0) {
        return count_characters(data);
    } else {
        fprintf(stderr, "Unrecognized command: %s\n", command);
        exit(1);
    }
}
Adding a function with parameters and a return type
Filename: src/lib.rs
use std::ffi::CStr;
use std::os::raw::c_char;

// --snip--

#[no_mangle]
pub extern "C" fn count_characters(text: *const c_char) -> u64 {
    let text = unsafe { CStr::from_ptr(text) };
    let text = text.to_str().expect("Unicode conversion failed.");
    text.chars().count().try_into().unwrap()
}
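Our new function takes a pointer to C characters, represented by the c_char type from std::os::raw. We then turn it into a CStr with CStr::from_ptr(). CStr is a handy utility type for borrowed C string references: it doesn't try to take ownership of the data or free it. Because from_ptr() has no way to check that the pointer is valid and points to a nul-terminated string, the call has to be wrapped in an unsafe block.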
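The borrowing behaviour of CStr can be illustrated with a small standalone sketch. Here a CString owns the data, and the hypothetical helper borrowed_len() only borrows it through CStr::from_ptr(); after the call, the CString is still intact. The helper is an assumption for illustration and is not part of the count project.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Hypothetical helper: length in bytes of a C string, without the nul terminator.
///
/// # Safety
/// `ptr` must be non-null and point to a valid, nul-terminated string
/// that stays alive for the duration of the call.
unsafe fn borrowed_len(ptr: *const c_char) -> usize {
    // CStr::from_ptr only borrows the bytes; dropping `text` frees nothing.
    let text = unsafe { CStr::from_ptr(ptr) };
    // to_bytes() excludes the trailing nul.
    text.to_bytes().len()
}

fn main() {
    // CString owns a heap-allocated, nul-terminated copy of the data.
    let owned = CString::new("hello").expect("no interior nul bytes");
    let len = unsafe { borrowed_len(owned.as_ptr()) };
    println!("{}", len); // 5
    // `owned` is still valid: the CStr borrowed it and did not free it.
    println!("{}", owned.to_str().unwrap()); // hello
}
```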
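By converting the CStr to a &str with to_str(), we gain access to Rust's regular string utilities. This conversion fails if the bytes are not valid UTF-8, which is why we call expect() on the result. We can then use text.chars().count() to get the number of Unicode characters (strictly speaking, Unicode scalar values).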
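Both the failure case of to_str() and what chars() actually counts can be checked directly. In this sketch, the hypothetical char_count() returns None instead of panicking on invalid UTF-8, a design choice that differs from the expect() used in count_characters(). It also shows that chars() counts Unicode scalar values, so a ū written as a plain u followed by a combining macron (U+0304) counts as two characters.

```rust
use std::ffi::CStr;

/// Hypothetical helper: counts chars in a C string, or returns None
/// if the bytes are not valid UTF-8 (instead of panicking).
fn char_count(text: &CStr) -> Option<usize> {
    // to_str() returns Err(Utf8Error) for invalid UTF-8.
    text.to_str().ok().map(|s| s.chars().count())
}

fn main() {
    // ū as a single precomposed code point (U+016B).
    let precomposed = CStr::from_bytes_with_nul("Kaim\u{016B}\0".as_bytes()).unwrap();
    // Same visible text: u followed by a combining macron (U+0304).
    let decomposed = CStr::from_bytes_with_nul("Kaimu\u{0304}\0".as_bytes()).unwrap();
    // The byte 0xFF never appears in valid UTF-8.
    let invalid = CStr::from_bytes_with_nul(b"Kaim\xFF\0").unwrap();

    println!("{:?}", char_count(precomposed)); // Some(5)
    println!("{:?}", char_count(decomposed)); // Some(6)
    println!("{:?}", char_count(invalid)); // None
}
```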
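The function returns a plain u64, since that type matches the definition of uint64_t on the C side. Because chars().count() returns a usize, we convert it with try_into(); usize is at most 64 bits wide on every platform Rust currently supports, so the unwrap() will not fail in practice.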
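One way to exercise count_characters() without building the C program is to call it from Rust, passing a pointer obtained from a CString just as C would pass a char*. This sketch assumes the function body is copied unchanged from src/lib.rs into a standalone program. #[no_mangle] is left off only because nothing outside Rust calls the function here, and the explicit TryInto import keeps the code compiling on Rust editions before 2021, where TryInto is not in the prelude.

```rust
use std::convert::TryInto;
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

// Copied from src/lib.rs; #[no_mangle] omitted because only Rust calls it here.
pub extern "C" fn count_characters(text: *const c_char) -> u64 {
    let text = unsafe { CStr::from_ptr(text) };
    let text = text.to_str().expect("Unicode conversion failed.");
    // chars().count() yields a usize; try_into() converts it to u64.
    text.chars().count().try_into().unwrap()
}

fn main() {
    // CString appends the nul terminator that CStr::from_ptr() looks for.
    let input = CString::new("Kaim\u{016B}").expect("no interior nul bytes");
    // as_ptr() gives the same kind of *const c_char a C caller would pass.
    let count = count_characters(input.as_ptr());
    println!("{}", count); // 5
}
```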
Let's try it:
$ cd build
$ cmake --build .
$ printf 'Kaimū' > kaimu.txt
$ ./count bytes kaimu.txt
6
$ ./count characters kaimu.txt
5
Which types match up?
Here's a quick reference of the most common Rust primitives you can pass directly across the FFI boundary.
| Rust | C |
|---|---|
| bool | bool |
| u8 / i8 | uint8_t / int8_t |
| u16 / i16 | uint16_t / int16_t |
| u32 / i32 | uint32_t / int32_t |
| u64 / i64 | uint64_t / int64_t |
| f32 | float |
| f64 | double |
| usize | uintptr_t |
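The fixed-width rows of this table can be sanity-checked with std::mem::size_of. This is a minimal sketch of such a check; it only asserts sizes, not full ABI compatibility, since alignment and calling conventions also matter.

```rust
use std::mem::size_of;

fn main() {
    // Fixed-width integers have the same size on every platform,
    // just like their <stdint.h> counterparts.
    assert_eq!(size_of::<u8>(), 1);
    assert_eq!(size_of::<i16>(), 2);
    assert_eq!(size_of::<u32>(), 4);
    assert_eq!(size_of::<i64>(), 8);
    // f32 and f64 are IEEE 754 single and double precision.
    assert_eq!(size_of::<f32>(), 4);
    assert_eq!(size_of::<f64>(), 8);
    // Rust's bool is one byte holding 0 or 1, compatible with C's _Bool.
    assert_eq!(size_of::<bool>(), 1);
    // usize is pointer-sized, like uintptr_t.
    assert_eq!(size_of::<usize>(), size_of::<*const u8>());
    println!("all size checks passed");
}
```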
There are also some compatibility types in std::os::raw for C types whose size or signedness depends on the platform. Here's an excerpt:
| Rust | C |
|---|---|
| c_char | char (signed or unsigned, depending on the platform) |
| c_int | signed int |
| c_uint | unsigned int |
| c_long | signed long |
| c_ulong | unsigned long |
| c_void | void (used behind pointers, e.g. *mut c_void for void*) |
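As a sketch of how these aliases appear in practice, here are two hypothetical functions, sum_c_ints() and increment_counter(), which are not part of the count project. The first takes a C int array and returns a long; the second takes the kind of opaque void* user-data pointer that C APIs often pass around. A real library would also mark them #[no_mangle], as with count_characters(). Both are declared unsafe because they dereference raw pointers supplied by the caller.

```rust
use std::os::raw::{c_int, c_long, c_void};

/// Hypothetical example: adds up `len` C ints and returns the sum as a C long.
///
/// # Safety
/// `values` must be null or point to `len` initialized `c_int`s.
pub unsafe extern "C" fn sum_c_ints(values: *const c_int, len: usize) -> c_long {
    if values.is_null() {
        return 0;
    }
    // Borrow the C array as a Rust slice without copying it.
    let ints = unsafe { std::slice::from_raw_parts(values, len) };
    // Widen each c_int to c_long before summing (long is never narrower than int).
    ints.iter().map(|&v| c_long::from(v)).sum()
}

/// Hypothetical example: the caller passes "user data" as an opaque void*
/// that actually points to a c_int counter.
///
/// # Safety
/// `user_data` must be null or point to a valid, writable `c_int`.
pub unsafe extern "C" fn increment_counter(user_data: *mut c_void) {
    // Cast the opaque pointer back to the type we know it has.
    let counter = user_data as *mut c_int;
    if !counter.is_null() {
        unsafe { *counter += 1 };
    }
}

fn main() {
    let values: [c_int; 3] = [1, 2, 3];
    let total = unsafe { sum_c_ints(values.as_ptr(), values.len()) };
    println!("{}", total); // 6

    let mut counter: c_int = 0;
    // &mut c_int -> *mut c_int -> *mut c_void, like passing &counter as void* in C.
    unsafe { increment_counter(&mut counter as *mut c_int as *mut c_void) };
    println!("{}", counter); // 1
}
```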
NOTE: The C standard does not strictly define the size of float and double, but in practice this mapping works on all major platforms. For the paranoid, there are also c_float and c_double in std::os::raw. You can read more detailed documentation about the memory layout of scalar types here.
In the next chapter, we will discover how to generate the function declarations our C code needs from Rust automatically, instead of writing and maintaining them by hand, which is tedious and error-prone.