The Case of the Vanishing CPU
A mysterious CPU spike in ClickHouse Cloud on GCP led to months of debugging,
revealing a deeper issue within the Linux kernel’s memory management. This is an
article written by Sergei Trifonov featuring bpftrace.
I spent a few weeks earlier this year [tracking down][1] a set of flaky
end-to-end tests where bpftrace would occasionally cease to print output. I
had gotten as far as figuring out that std::cout had [badbit][0] set after a write
but had run out of ideas on how to debug it. At the time, because I could not
reproduce it locally, I had assumed it was an oddity with pipes and CI and
given up.
Except bugs never go away. They only lie dormant.
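For readers unfamiliar with badbit, here is a minimal sketch of how a failed write leaves a C++ stream in a bad state. It is not the bpftrace code in question: `FailingBuf`, `write_ok`, and `demo` are hypothetical names made up for illustration, and the failing buffer merely stands in for a sink (such as a closed pipe) that rejects output.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical stand-in for a broken sink: it has no put area and
// refuses every character, so every write through it fails.
class FailingBuf : public std::streambuf {
protected:
    int_type overflow(int_type) override { return traits_type::eof(); }
};

// Writes text to the stream and reports whether the stream is still
// usable afterwards. A failed unformatted write sets badbit.
bool write_ok(std::ostream& os, const std::string& text) {
    os.write(text.data(), static_cast<std::streamsize>(text.size()));
    os.flush();
    return !os.bad();
}

// Contrasts a healthy in-memory stream with the failing one.
void demo() {
    std::ostringstream healthy;
    assert(write_ok(healthy, "hello\n"));
    assert(healthy.str() == "hello\n");

    FailingBuf buf;
    std::ostream broken(&buf);
    assert(!write_ok(broken, "hello\n"));  // badbit is now set
    assert(!write_ok(broken, "again\n"));  // and it stays set
}
```

Once badbit is set, later writes are silently dropped until the state is reset with `clear()`, which matches the symptom of output that simply stops appearing. Checking `bad()` after each write, as `write_ok` does, is one way to notice the failure at the point where it happens.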
Ever wondered what gets written into the big global bit bucket, /dev/null? No? On busy, active systems it is not only interesting to see what is written to this file, it can also be extremely useful for debugging and troubleshooting. Developers frequently redirect stderr to /dev/null, either in applications or in scripts, and while this may be the correct thing to do most of the time, it can sometimes obscure interesting runtime behaviour.