Compiling Rust Just-In-Time, the easy way
I just learned about dlopen a few days ago while scrolling on Twitter. It’s a function for programmatically loading dynamic libraries and then calling into them or using their exported symbols.
I already knew this was possible, but I didn’t know exactly how. Now that all this power is at my fingertips, I started to think about all kinds of crazy things I could do with it.
For instance, I can load libraries at runtime, but nothing prevents me from also building them at runtime with my compiler, right? So what happens if I dynamically generate code, compile it, link it to my program and call into it? According to Wikipedia, that’s exactly what Just-In-Time (JIT) compilation is:
In computing, just-in-time (JIT) compilation (also dynamic translation or run-time compilations) is a way of executing computer code that involves compilation during execution of a program – at run time – rather than before execution.
Wikipedia
Granted, it’s not the kind of JIT compilation people usually talk about, and the performance will probably be terrible, but hey, let’s do it anyway.
Get your linker ready, here we go!
Building a dynamic library in Rust
Let’s start simple: we’ll create a dynamic library with a hello function in a single hello.rs file:
#[no_mangle]
pub extern fn hello() {
    println!("Hello, world!");
}
Here the hello function is marked as extern, which gives it the C calling convention. Together with pub and #[no_mangle], this tells the compiler that we want to make this function available for other programs to use.
We will use rustc directly instead of cargo; this will make things easier later when compiling at runtime.
One of the arguments rustc accepts is the crate type. Here we want a dynamic library:
rustc --crate-type=dylib hello.rs
This will create a libhello.so on Linux (or a .dll on Windows and a .dylib on macOS, but to be honest I don’t really know how these OSes handle dynamic libraries, so what I say here applies mostly to Linux).
Libraries traditionally begin with the lib prefix and some tools expect it, so if you decide to rename your output with the -o flag, be sure to keep the lib prefix too.
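The prefix and extension don't have to be hard-coded: the standard library exposes them as constants. A minimal sketch, assuming a hypothetical helper named library_file_name (it is not part of this post's code):

```rust
use std::env::consts::{DLL_PREFIX, DLL_SUFFIX};

/// Hypothetical helper: builds the platform-specific file name of a
/// dynamic library from its bare name.
pub fn library_file_name(name: &str) -> String {
    // DLL_PREFIX is "lib" on Linux and macOS and empty on Windows;
    // DLL_SUFFIX is ".so", ".dylib" or ".dll" respectively.
    format!("{}{}{}", DLL_PREFIX, name, DLL_SUFFIX)
}
```

The returned name is what I would pass to rustc after the -o flag.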
To check that it worked, let’s look for the hello symbol with nm:
nm -D libhello.so | grep hello
0000000000063160 T hello
Yes, the symbol is here. According to the doc, ‘T’ means that it lives in the text (code) section (so it actually points to instructions, great), and the uppercase means that the symbol is global (exported).
I didn’t comment on the #[no_mangle] bit: it tells the compiler not to do funny stuff with the function name (name mangling). Without it I got something like:
0000000000064160 T _ZN4dlib5hello17h40cf6e2e0e47355eE
Much harder to remember… And I doubt there is any kind of stability guarantee for these names.
Linking our library
Now it’s time to use our library. We will create a cargo project, let’s say ‘rs-jit’, and declare an extern function:
// src/main.rs
extern {
    fn hello();
}

fn main() {
    println!("Trying to load a library...");
    unsafe {
        hello();
    }
}
Building this will fail at link time, because the hello symbol is undefined: we need to tell the compiler about our library.
When doing this kind of thing it’s often a good idea to also build the library as part of the cargo build process. We can create a build.rs at the root of the project to tell cargo how to build our dependency:
// build.rs
use std::env;
use std::process::Command;

fn main() {
    let out_dir = env::var("OUT_DIR").unwrap();

    // Compile our dynamic library
    Command::new("rustc")
        .args(&["--crate-type=dylib", "dlib/hello.rs", "-o"])
        .arg(&format!("{}/libhello.so", out_dir))
        .status()
        .unwrap();

    // Linking directives
    println!("cargo:rustc-link-search={}", out_dir);
    println!("cargo:rustc-link-lib=dylib=hello");

    // Re-run this script only when the library source has changed
    println!("cargo:rerun-if-changed=dlib/hello.rs");
}
This script assumes that we put our hello.rs file in a dlib folder, and calls rustc to build the library as we did before. What is great with this approach is that we can ask cargo to rebuild the library when the source file changes.
Notice that we told cargo about our hello library and where to find it (in out_dir, which is somewhere in the target/ directory).
Let’s run it with cargo run:
Trying to load a library...
Hello, world!
Great, it worked!
It is also possible to inspect the dependencies of an executable with the ldd command, for instance with ldd target/debug/rs-jit | grep hello:
libhello.so => not found
Oh wait, not found? But it worked, right?
Yes, because we told cargo where our library was, and cargo then runs our executable with an updated environment that lets the dynamic linker find the library. In fact, if we run our executable directly, it fails to start:
./target/debug/rs-jit: error while loading shared libraries: libhello.so: cannot open shared object file: No such file or directory
We can ask cargo to tell us exactly what it is doing with the -vv (very verbose) flag.
When running cargo run -vv and inspecting the output, there are quite a lot of flags, but one in particular caught my attention: -L /home/<path-to-my-folder>/rs-jit/target/debug/build/rs-jit-923fa3340f768e20/out
Usually, the -L flag is used to tell a compiler where to look for libraries, so I strongly suspect it points to the directory holding the dynamic library in the cargo target folder. We can check that by running the executable in an environment with LD_LIBRARY_PATH set to this value; it’s a variable used by the dynamic linker to look for libraries when the program starts:
LD_LIBRARY_PATH=/home/<path-to-my-folder>/rs-jit/target/debug/build/rs-jit-923fa3340f768e20/out ./target/debug/rs-jit
Trying to load a library...
Hello, world!
And it works again!
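The same trick can be done from Rust when launching the executable from another program. A minimal sketch, assuming two hypothetical helpers (prepend_library_path and command_with_library_dir are names I made up for illustration):

```rust
use std::process::Command;

/// Hypothetical helper: prepends `dir` to an existing LD_LIBRARY_PATH value.
pub fn prepend_library_path(dir: &str, existing: Option<&str>) -> String {
    match existing {
        // Entries are separated by ':' and searched from left to right.
        Some(paths) if !paths.is_empty() => format!("{}:{}", dir, paths),
        _ => dir.to_string(),
    }
}

/// Builds (but does not run) a command that starts `exe` with `lib_dir`
/// added to the dynamic linker search path.
pub fn command_with_library_dir(exe: &str, lib_dir: &str) -> Command {
    let existing = std::env::var("LD_LIBRARY_PATH").ok();
    let mut cmd = Command::new(exe);
    cmd.env(
        "LD_LIBRARY_PATH",
        prepend_library_path(lib_dir, existing.as_deref()),
    );
    cmd
}
```

Calling status() on the returned command should then behave like the shell one-liner above, while keeping any directories already present in the variable.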
Loading libraries at runtime
Let’s remove our build.rs and do the linking at runtime. Translating the example from the manual into Rust is pretty straightforward if we don’t mind doing things the C (unsafe) way.
Let’s suppose we have a dynamic library libhello.so in the working directory. To call a function linked at runtime we need to:
- Load the library using dlopen.
- Search for a symbol (the function) in the library with dlsym.
- Close the library once we are done (with dlclose).
The libc crate re-exports all these functions, with their raw C interface, so it boils down to this:
use libc::{c_void, dlclose, dlopen, dlsym, RTLD_NOW};
use std::ffi::CString;

fn main() {
    println!("Trying to load a library...");
    unsafe {
        // Load the library
        let filename = CString::new("./libhello.so").unwrap();
        let handle = dlopen(filename.as_ptr(), RTLD_NOW);
        if handle.is_null() {
            panic!("Failed to resolve dlopen")
        }

        // Look for the function in the library
        let fun_name = CString::new("hello").unwrap();
        let fun = dlsym(handle, fun_name.as_ptr());
        if fun.is_null() {
            panic!("Failed to resolve '{}'", &fun_name.to_str().unwrap());
        }

        // dlsym returns a C 'void*', cast it to a function pointer.
        // hello is exported with the C ABI, so the pointer type is extern "C".
        let fun = std::mem::transmute::<*mut c_void, extern "C" fn()>(fun);
        fun();

        // Cleanup
        let ret = dlclose(handle);
        if ret != 0 {
            panic!("Error while closing lib");
        }
    }
}
Let’s run it:
Trying to load a library...
Hello, world!
Perfect!
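When something goes wrong, the panic messages above don't say why. The dl family also has dlerror, which returns a human-readable description of the last failure. A minimal sketch, assuming a Unix-like target; open_library is a hypothetical helper, and the functions are declared by hand here only so that the snippet has no dependency (the libc crate exports the same ones):

```rust
use std::ffi::{CStr, CString};
use std::os::raw::{c_char, c_int, c_void};

// Hand-written declarations of the C functions (assumption: Unix-like target).
extern "C" {
    fn dlopen(filename: *const c_char, flag: c_int) -> *mut c_void;
    fn dlerror() -> *mut c_char;
}

// Value of RTLD_NOW on Linux and macOS; other platforms may differ.
const RTLD_NOW: c_int = 2;

/// Hypothetical helper: opens a library, or returns the dlerror message.
pub fn open_library(path: &str) -> Result<*mut c_void, String> {
    let filename = CString::new(path).map_err(|e| e.to_string())?;
    unsafe {
        let handle = dlopen(filename.as_ptr(), RTLD_NOW);
        if !handle.is_null() {
            return Ok(handle);
        }
        // dlerror returns NULL if no error occurred since the last call.
        let message = dlerror();
        if message.is_null() {
            Err(format!("could not load '{}'", path))
        } else {
            Err(CStr::from_ptr(message).to_string_lossy().into_owned())
        }
    }
}
```

The caller still owns the handle on success and is responsible for passing it to dlclose.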
A (kind of) JIT compiler
Now we can:
- Build a dynamic library from Rust (as we did in build.rs).
- Load a dynamic library at runtime and call into a function.
Let’s combine both and create a JIT compiler!
I propose to build a simple calculator: the user gives an expression with two variables a and b, and a value for each of them. The expression is then compiled by generating a small Rust program and calling rustc on it. Finally, we load the resulting library and call the function with the given values.
First we can build a small JIT engine backed by a file: when asked to compile an expression, we write down a simple Rust function by formatting a template, then compile it:
use std::fs::File;
use std::io::prelude::*;
use std::io::SeekFrom;
use std::process::Command;

const SOURCE_PATH: &'static str = "/tmp/jit.rs";
const LIB_PATH: &'static str = "/tmp/librsjit.so";
const FUN_NAME: &'static str = "calc";

pub struct JitEngine {
    file: File,
}

impl JitEngine {
    pub fn new() -> Self {
        let file = File::create(SOURCE_PATH).expect("Could not create file");
        Self { file }
    }

    // Compile an expression and return a wrapper around the linked function
    pub fn compile(&mut self, expression: &str) -> Fun {
        // Reset the source file
        self.file.set_len(0).unwrap();
        self.file.seek(SeekFrom::Start(0)).unwrap();

        // Write the rust program
        self.file
            .write_all(
                format!(
                    "
                    #[no_mangle]
                    pub extern fn calc(a: i64, b: i64) -> i64 {{
                        {}
                    }}",
                    expression
                )
                .as_bytes(),
            )
            .unwrap();

        // Compile the sources
        Command::new("rustc")
            .args(&["--crate-type=dylib", SOURCE_PATH, "-o"])
            .arg(LIB_PATH)
            .status()
            .unwrap();

        // Return a wrapper around the function
        unsafe { Fun::new(LIB_PATH, FUN_NAME) }
    }
}
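One thing the compile method above does not do is look at the exit status of rustc, so an invalid expression (say, a+) leaves a stale or missing library for dlopen to find. A minimal sketch of a stricter compile step, assuming two hypothetical helpers, render_source and compile_source, that are not part of the original code:

```rust
use std::process::Command;

/// Renders the source of the generated library (same template as above).
pub fn render_source(expression: &str) -> String {
    format!(
        "#[no_mangle]\npub extern fn calc(a: i64, b: i64) -> i64 {{\n    {}\n}}\n",
        expression
    )
}

/// Hypothetical variant of the compile step that reports rustc failures
/// instead of ignoring them.
pub fn compile_source(source_path: &str, lib_path: &str) -> Result<(), String> {
    let output = Command::new("rustc")
        .args(&["--crate-type=dylib", source_path, "-o", lib_path])
        .output()
        .map_err(|e| format!("could not run rustc: {}", e))?;
    if output.status.success() {
        Ok(())
    } else {
        // rustc writes its diagnostics to stderr.
        Err(String::from_utf8_lossy(&output.stderr).into_owned())
    }
}
```

With something like this, the engine could print the compiler diagnostics and ask for another expression rather than panicking later in dlopen.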
This is obviously a disaster from a security point of view: we are basically giving the user the right to execute arbitrary code on our machine. Please don’t do that in real life :)
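One way to narrow the problem for a calculator is to refuse anything that is not plain integer arithmetic before it ever reaches rustc. A minimal sketch, assuming a hypothetical is_plain_arithmetic filter; it is not a sandbox, and an accepted expression such as a/b can still make the generated function panic when b is 0:

```rust
/// Hypothetical filter: accepts only digits, whitespace, the variables
/// `a` and `b`, basic arithmetic operators and parentheses.
pub fn is_plain_arithmetic(expression: &str) -> bool {
    !expression.trim().is_empty()
        && expression.chars().all(|c| {
            c.is_ascii_digit()
                || c.is_ascii_whitespace()
                || matches!(c, 'a' | 'b' | '+' | '-' | '*' | '/' | '%' | '(' | ')')
        })
}
```

The filter only looks at characters, so malformed input like "a b" still gets through and is left for rustc to reject.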
The Fun struct in the previous snippet is a small wrapper around dlopen: it loads the library, looks up the symbol, and closes the library on Drop:
use libc::{c_void, dlclose, dlopen, dlsym, RTLD_NOW};
use std::ffi::CString;

/// A function from a library dynamically linked.
pub struct Fun {
    // The generated function is exported with the C ABI
    fun: extern "C" fn(a: i64, b: i64) -> i64,
    handle: *mut c_void,
}

impl Fun {
    unsafe fn new(lib_path: &str, fun_name: &str) -> Self {
        // Load the library
        let filename = CString::new(lib_path).unwrap();
        let handle = dlopen(filename.as_ptr(), RTLD_NOW);
        if handle.is_null() {
            panic!("Failed to resolve dlopen")
        }

        // Look for the function in the library
        let fun_name = CString::new(fun_name).unwrap();
        let fun = dlsym(handle, fun_name.as_ptr());
        if fun.is_null() {
            panic!("Failed to resolve '{}'", &fun_name.to_str().unwrap());
        }

        // dlsym returns a C 'void*', cast it to a function pointer
        let fun = std::mem::transmute::<*mut c_void, extern "C" fn(i64, i64) -> i64>(fun);
        Self { fun, handle }
    }

    pub fn call(&self, a: i64, b: i64) -> i64 {
        (self.fun)(a, b)
    }
}

impl Drop for Fun {
    fn drop(&mut self) {
        unsafe {
            let ret = dlclose(self.handle);
            if ret != 0 {
                panic!("Error while closing lib");
            }
        }
    }
}
And then we are good to go! Let’s write a small program that JIT-compiles some expressions:
fn main() {
    let mut jit = JitEngine::new();
    loop {
        println!("Value for a:");
        let a = read_value();
        println!("Value for b:");
        let b = read_value();
        println!("Expression:");
        let expression = read_expression();
        let fun = jit.compile(&expression);
        let result = fun.call(a, b);
        println!("{}\n", result);
    }
}
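The read_value and read_expression helpers are not shown in this post. A minimal sketch of what they could look like, assuming they read one line from standard input; the bodies below, and the parse_value and read_line_from helpers, are my guess rather than the actual implementation:

```rust
use std::io::{self, BufRead};

/// Parses one line of input as an i64, ignoring surrounding whitespace.
pub fn parse_value(line: &str) -> Option<i64> {
    line.trim().parse().ok()
}

/// Reads one line from `input`, without the trailing newline.
/// Returns an error when the input is exhausted.
pub fn read_line_from<R: BufRead>(input: &mut R) -> io::Result<String> {
    let mut line = String::new();
    let read = input.read_line(&mut line)?;
    if read == 0 {
        return Err(io::Error::new(io::ErrorKind::UnexpectedEof, "end of input"));
    }
    Ok(line.trim_end().to_string())
}

/// Assumed shape of read_expression: one line from standard input.
pub fn read_expression() -> String {
    read_line_from(&mut io::stdin().lock()).expect("Could not read from stdin")
}

/// Assumed shape of read_value: asks again until the line parses as an i64.
pub fn read_value() -> i64 {
    loop {
        let line = read_line_from(&mut io::stdin().lock()).expect("Could not read from stdin");
        match parse_value(&line) {
            Some(value) => return value,
            None => println!("Please enter an integer:"),
        }
    }
}
```

Keeping the parsing separate from the I/O makes the helpers easy to exercise without a terminal.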
You can find the full source code here.
Let’s compile and try it:
Value for a:
3
Value for b:
4
Expression:
a*b
12
Yeah, it works! There is ~1s of latency for computing the result, due to the call to rustc, but we are not here for performance anyway ¯\_(ツ)_/¯
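To see where that second goes, the compile step and the call can be timed separately. A minimal sketch, assuming a hypothetical timed helper built on std::time::Instant:

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper: runs `f` and returns its result together with
/// the time it took.
pub fn timed<T, F: FnOnce() -> T>(f: F) -> (T, Duration) {
    let start = Instant::now();
    let result = f();
    (result, start.elapsed())
}
```

In the main loop, wrapping the call to jit.compile in one timed closure and the call to fun.call in another would show how much of the latency comes from rustc and how much from the generated function itself.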
Hope you enjoyed this post! Feel free to report any bug or error, or to post a comment on GitHub or on the source code.