Why your bpftrace programs should not include kernel headers.
Imagine you write a bpftrace program that needs to access the fields of some
kernel data type, say struct task_struct. In order to generate correct offsets
for accessing the struct fields, bpftrace needs to know the layout of the type
on the running kernel. Historically, this could be achieved by providing the
correct kernel headers to the program using the #include directive. With the
arrival of BTF (BPF Type Format), this is no longer necessary, as bpftrace is
able to automatically extract the type layout from BTF. Therefore, for the vast
majority of use cases, including headers is not only unnecessary, but can also
lead to unexpected problems and should be avoided if possible. In this blog
post, we will look into the reasons why that is the case and show that the
fewer headers a bpftrace program includes, the more portable it is across
kernel versions.
The #include directive
The #include directive in bpftrace works similarly to C - it copies the
contents of the included header (and, recursively, of all the headers that it
includes). bpftrace then runs Clang to parse the resulting code and extracts
type and enum definitions for the relevant types. This gives users a convenient
and natural way to provide bpftrace with the layout of the types used by the
script. Since bpftrace is intended for both kernel and userspace tracing, the
included headers are searched for in the standard system paths as well as in
the include directories of the running kernel.
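As an illustration, a header-based script might look like the sketch below. It assumes that the headers for the running kernel are installed and that do_nanosleep is available as a kprobe target. Both are assumptions made for this example only.

```bpftrace
// Header-based approach: the layout of struct task_struct is taken from the
// included kernel header (assumes kernel headers are installed).
#include <linux/sched.h>

kprobe:do_nanosleep
{
  // curtask points to the task_struct of the current task.
  $task = (struct task_struct *)curtask;
  printf("%s (task->pid %d) is going to sleep\n", $task->comm, $task->pid);
}
```

The script only works if the header found on the system matches the running kernel, which is where the problems described in this post begin.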
Limitations
While the #include directive is a powerful mechanism, it has its problems,
especially in the kernel. Let us look at the most important ones:
- Types defined in source directories. Some types in the kernel are not
defined in the standard include directories. Instead, they are either defined
in “internal” headers located next to the source files or directly in the
source files themselves. In both cases, bpftrace doesn’t know how to find
such types so, if the script works with them, the only option is to embed
them directly in the script. This is, however, error-prone,
maintenance-heavy, and not very portable as the type layout can vary between
kernel versions. A good example is the runqlen.bt tool from the bpftrace repo,
which contains an embedded definition of struct cfs_rq (from the internal
kernel header kernel/sched/sched.h). We even need to maintain a separate
version of the tool for kernels older than 6.14, since the layout of the type
changed in that version.
- Types being moved between kernel headers. In some cases, a kernel patch
may cause bpftrace #includes to stop working if a type is moved into a
different header. In such a situation, you need to maintain multiple versions
of your script for different kernel versions.
BTF: a better way to work with types
The solution to the problems mentioned above is BTF (BPF Type Format). In short, BTF is a compact form of debugging information which (among other things) contains definitions of all kernel types. Thanks to its small size (compared to DWARF), it can be embedded directly in the kernel, and most modern distros ship it by default these days.
bpftrace automatically reads BTF of the running kernel and uses it to resolve
kernel types. Therefore, if the script operates on kernel types only, it is not
necessary to use the #include directive at all - the layout of all types will
be deduced from BTF. This not only allows you to simplify the bpftrace script
but also makes it more portable - correct types from the running kernel will
always be used, no matter where in the kernel they are defined.
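A minimal sketch of a BTF-only script follows. It assumes a kernel built with BTF and a bpftrace build with BTF support. The do_nanosleep probe is only an illustrative choice.

```bpftrace
// No #include: struct task_struct is resolved from the BTF of the running
// kernel, so the field offsets always match that kernel.
kprobe:do_nanosleep
{
  $task = (struct task_struct *)curtask;
  printf("%s (task->pid %d) is going to sleep\n", $task->comm, $task->pid);
}
```

On recent bpftrace versions, running `bpftrace -lv 'struct task_struct'` should print the definition that bpftrace resolves for the type, which can help confirm that BTF is being used.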
So, can we just drop all #include statements?
In most cases yes, but not always. There are situations when you still need to include headers and we will look into them in this section.
Information not in BTF
Some information is still defined only in header files and is not present in
BTF. Probably the best example is constants defined as macros via the #define
directive. If you want your bpftrace script to use the macro name instead of
its numerical value, you either need to include the appropriate header or
redefine the macro within the script (yes, bpftrace supports the #define
directive).
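For instance, redefining a macro inside the script could look like the sketch below. The value 2 for AF_INET is the one used on Linux. The choice of tracepoint is an assumption made for illustration, and the `args.family` syntax assumes a recent bpftrace version (older versions use `args->family`).

```bpftrace
// AF_INET is a macro in the kernel headers, so it is not available from BTF.
// Redefine it here instead of including the header.
#define AF_INET 2

tracepoint:syscalls:sys_enter_socket
/args.family == AF_INET/
{
  printf("%s is creating an AF_INET socket\n", comm);
}
```

This keeps the script free of kernel headers at the cost of hard-coding a value that has to be kept correct by hand.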
Enum types
At the moment, bpftrace doesn’t support extracting enum types from BTF, despite the fact that they are there. This is a limitation of bpftrace which is currently being worked on. If you need to use enum values in the meantime, you need to include the appropriate headers or define the constants manually.
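A sketch of the manual workaround is shown below. STATE_ESTABLISHED is a hypothetical name introduced for this example. It mirrors the kernel's TCP_ESTABLISHED enum value (1) and is named differently to avoid clashing with the real enumerator. The tracepoint and the `args.newstate` syntax (recent bpftrace versions) are likewise assumptions.

```bpftrace
// Hypothetical constant mirroring the kernel enum value TCP_ESTABLISHED (1).
#define STATE_ESTABLISHED 1

tracepoint:sock:inet_sock_set_state
/args.newstate == STATE_ESTABLISHED/
{
  printf("connection established: sport=%d dport=%d\n", args.sport, args.dport);
}
```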
Userspace types
Once your bpftrace script uses userspace types, BTF will not help - userspace
types are, naturally, not included in the kernel BTF. The good news is that
bpftrace has other ways to help you. For one, if the traced application
contains debug info (in DWARF format), bpftrace can read it and extract the
type layout from it, so you don't need to include any headers. Another nice
feature is that types from included headers, BTF, and DWARF can be used
simultaneously, provided they do not conflict. If they do conflict, only the
types from headers are used and BTF/DWARF is ignored. This usually happens when
including system headers from the sys/ directory, which often define userspace
variants of internal kernel types.
Type conflicts
There may be situations (such as one of the above) when including headers is unavoidable. It may then happen that an included type definition conflicts with a definition taken from BTF. In such a case, bpftrace will disable BTF and rely only on types from the included headers, so you will need to include everything necessary.
Conclusion
Putting it all together, the general recommendation is to avoid including headers completely, if possible, and let bpftrace extract the types from BTF. When tracing the kernel, start with no headers and only add includes if bpftrace fails to parse your script. For userspace tracing, include only the minimum number of userspace headers (if your script works with userspace types) and try to avoid adding includes from the kernel. Following this advice will let bpftrace leverage BTF as much as possible, which will make your script shorter and more portable across kernel versions.