C Interop
This is a continuation of the posts on The Path to 1.0.
If you’ve ever written a BPF program from scratch you know there are two main components: the userspace code and the BPF/kernel code. The former is used to load, attach, and process data from the latter. The BPF/kernel code is, for the most part, a C program written in a subset of standard C and further restricted by the BPF verifier, meaning certain things are disallowed in order to protect the kernel from crashes or extended scheduling delays.
bpftrace abstracts these things away, letting you write a single program (or script) that transparently handles all this complexity. This is a major selling point, but it comes at a cost: users are limited to the BPF features that have been hard-coded into bpftrace. If you need something new, you have to wait for someone to add it to bpftrace and then wait months for a release, which is too slow, even by kernel standards. That is to say, users did have to wait...
C interop allows bpftrace scripts to communicate directly with raw BPF C code, regardless of whether the code lives inside or outside the main codebase. Let’s see what this looks like in practice.
import "my_c_lib.bpf.c";
begin {
  print(__add_one(1));
  exit();
}
And the imported "my_c_lib.bpf.c" C file:
int __add_one(int val) { return 1 + val; }
All this does is provide a function called "__add_one", but you can see in the bpftrace script that you can now call this function and use its return value. This script prints ‘2’! Amazing, I know. How about a more complicated example, one that has already been integrated into bpftrace’s standard library[1]?
#define __KERNEL__
#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include "strings.h"
extern struct task_struct *bpf_task_from_pid(s32 pid) __weak __ksym;
extern void bpf_task_release(struct task_struct *p) __weak __ksym;
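int __bpf_task_comm_from_pid(s32 pid, m_arg *out) {
  /* Weak kfunc symbols resolve to NULL on kernels that lack them. */
  if (!bpf_task_from_pid || !bpf_task_release)
    return -95; /* -EOPNOTSUPP */
  struct task_struct *task = bpf_task_from_pid(pid);
  if (task) {
    __builtin_memcpy(&out->data, &task->comm, sizeof(*out));
    /* Drop the reference acquired by bpf_task_from_pid. */
    bpf_task_release(task);
    return 0;
  }
  return -3; /* -ESRCH */
}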
This BPF C code (written by a bpftrace contributor, Rtoax) utilizes two kfuncs (bpf_task_from_pid and bpf_task_release) and logic that has, obviously, not been manually written in LLVM IR (a former, difficult requirement of new bpftrace features). There are also extern function declarations that are checked at runtime to handle kernel versions that don’t yet support these kfuncs; this would have been A LOT of bespoke logic sprinkled throughout the bpftrace codebase before C interop.
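The runtime check relies on a weak declaration having a null address when nothing provides the symbol. The following is a minimal userspace sketch of that same pattern, assuming GCC or Clang on an ELF target such as Linux. It is only an analogy: in real BPF code the resolution is done at load time against the kernel rather than by the system linker, and hypothetical_new_api and call_if_available are made-up names, not part of bpftrace or the kernel.

```c
/* Hypothetical function that no library in this sketch defines.
 * Declaring it weak lets the program link anyway; its address is
 * then null at runtime. */
extern int hypothetical_new_api(int val) __attribute__((weak));

/* Call the optional function if present, otherwise report
 * "operation not supported" (-95 is -EOPNOTSUPP on Linux). */
int call_if_available(int val) {
    if (!hypothetical_new_api)
        return -95;
    return hypothetical_new_api(val);
}
```

The caller only has to handle one extra error code, which is roughly what the standard library function above does for kernels that lack the kfuncs.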
Now users don’t have to wait for bpftrace if they want to call a new kfunc, or BPF helper, or utilize some other BPF feature. They can just write the BPF C code themselves, import it, and call it.
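As a rough illustration of what a user-written helper might look like, here is a sketch of a small function that could live in an imported C file. The name __log2_floor is hypothetical (it only mimics the double-underscore naming used in the examples above), and the assumption that bpftrace integers arrive as 64-bit values is mine. The loop has a fixed upper bound, which is the kind of shape the verifier tends to accept.

```c
/* Hypothetical helper: floor(log2(val)), or -1 when val is 0.
 * The loop runs at most 64 times, so its bound is known up front. */
int __log2_floor(unsigned long long val) {
    int result = -1;
    for (int i = 0; i < 64; i++) {
        if (val == 0)
            break;
        val >>= 1;
        result++;
    }
    return result;
}
```

Once imported, a script would presumably call it the same way the first example calls __add_one.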
Why Is This Special?
If you’ve been around the block, you know that BCC has been able to interop with BPF C code for years and this isn’t a groundbreaking revelation. However, consider that in BCC you still have to deal with BPF verifier errors, lifecycles, and other complicated (and poorly documented) BPF concepts. This is to say we don’t intend for the majority of bpftrace users to write their own BPF C code, but at least now they can. It’s an escape hatch that will prevent users from needing to abandon bpftrace altogether if they need something that isn’t provided out of the box.
C interop also empowers anyone to easily add new features to the standard library (example), lowering the barrier for new and existing contributors. We’re in the process of migrating current functions to C to reduce the surface area and complexity of LLVM IR code, which is hard to get right. Writing these functions in C lets the compiler do the heavy lifting, and all we have to do is link them in at the end.
Types
For C interop to work properly we needed to be able to pass arguments to external C functions and handle return values, meaning that bpftrace’s own internal type system (SizedType) needed to interop with C types, or, more specifically, with BTF (BPF Type Format). We’ve added a layer that translates between these two type systems, but it’s still a work in progress; please file bugs if you encounter them.
We plan to eventually unify our internal type system around BTF but SizedType still has some special properties that don’t exist in BTF like "address space", which is used to determine what BPF helper functions bpftrace needs to call, and internal types like stats_t, which is used to determine the formatting when printed.
Pointers
Previously, users couldn’t get a pointer to a bpftrace scratch variable or map because they didn’t need this functionality. If bpftrace invoked a BPF helper or kfunc, it would handle the pointer wrangling under the hood in the LLVM IR layer. For C interop we needed a generic solution. So we added the address-of operator (&) to pass a pointer to variables and maps to imported BPF C functions. It’s not pretty (e.g. &$my_var) but it works and is familiar enough if you’re already writing C. We plan on extending this in the future so pointers can also be utilized for expressions and functions.
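To make the pointer case concrete, here is a sketch of the C side of such a call: a function that writes its result through an out-parameter. The name __store_sum is hypothetical, and the choice of long long for the pointed-to type is an assumption about how a bpftrace integer variable would be represented.

```c
/* Hypothetical helper: writes a + b through `out`.
 * Returns 0 on success or -22 (-EINVAL on Linux) for a null pointer.
 * An assumed call site in a script might look like:
 *   __store_sum(2, 3, &$result);
 */
int __store_sum(long long a, long long b, long long *out) {
    if (!out)
        return -22;
    *out = a + b;
    return 0;
}
```

Keeping the return value for status and the pointer for data mirrors the __bpf_task_comm_from_pid example earlier in the post.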
Compiling C
I glossed over this part but if bpftrace is allowing imports of raw BPF C files, how are they getting compiled and linked? Well, if you didn’t know this already, bpftrace ships with LLVM as a dynamic dependency, making it a very large binary indeed. But instead of forcing users to compile their own BPF C files with something like bpftool (which makes C interop less convenient) we decided to lean in and do this compilation and linking on the fly. This was also necessary because bpftrace supports multiple versions of LLVM and if we built the BPF C part of the standard library for one LLVM version it might not work on another. The compiling and linking steps are actually pretty fast but we've added conditional compilation (based on the script contents) and intend to add caching soon to ensure snappy startup times.
We’re also still moving toward a world where you can do ahead-of-time (AOT) compilation of bpftrace scripts, which yields a much smaller binary that can be shipped around and run on more memory-constrained devices, like cell phones.
A Work In Progress
There is still a lot more to be done for C interop, including:
- adding the ability to write full BPF C programs (including the attachpoint) and have bpftrace automatically manage the life cycle
- provide an API to the userspace part of bpftrace so users don’t have to rely exclusively on the print family of bpftrace builtins
- better sharing of maps between bpftrace and C code
- automatic type coercion for C function return values and arguments
Additionally, because this feature hasn’t been officially released yet (that is planned for early next year), there will probably be a lot of small bugs and sharp edges to work out, so please bear with us and submit feedback/issues.