Fixing Bazel out-of-memory problems
Memory management is a generally hard topic in computer systems operations. Debugging it inside a cloud-hosted build system is even worse!
There are two potential problems:
The Bazel server runs in a JVM, and it internally tries to allocate more obj...
blog.aspect.build (4 min read)
I have a few notes to add to this:
In my experience, Bazel OOM is far trickier than it should be. I reported an issue (github.com/bazelbuild/bazel/issues/15959) where the Bazel JVM could slowly run out of memory. When that happens, the Bazel JVM does not get killed; instead, it gets stuck in a deadlock until you manually kill the JVM process.
1. Instead of relying on `bazel info | grep heap`, you could track JVM memory data via BES with the `--memory_profile` flag.
2. A great mitigation I use is to regularly (weekly or daily) shut down the Bazel JVM with `bazel shutdown` and perform a clean build, so that a new JVM process starts up and long-term memory leaks never get a chance to build up. This bought us a lot of time before we had to troubleshoot Bazel's internal memory usage.
3. `experimental_local_memory_estimate` is actually a no-op in Bazel's latest code base. See cs.opensource.google/bazel/bazel/+/361ce673ad2b95…
4. The downside of using `exec_properties` is that you can only set resource constraints per mnemonic via the rule definition. These values are hard-coded, so if you have a small binary A and a big binary B built with the same rules, both get the same amount of resources (CPU and RAM). In Bazel 5.3, there is a new change that lets you define a Starlark function that adjusts the resource consumption of each action dynamically, allowing Bazel to be smarter about action scheduling.
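For the BES suggestion, here is a minimal sketch of pulling heap numbers out of a build event stream instead of grepping `bazel info`. It assumes the build was run with `--memory_profile` plus `--build_event_json_file` (one JSON event per line), and that the `BuildMetrics.MemoryMetrics` fields show up under the proto3 JSON camelCase names `usedHeapSizePostBuild` and `peakPostGcHeapSize`. Check those names against `build_event_stream.proto` for your Bazel version. The function name `heap_metrics_from_bep` is hypothetical.

```python
import json

# Assumed proto3 JSON names for BuildMetrics.MemoryMetrics fields.
MEMORY_FIELDS = ("usedHeapSizePostBuild", "peakPostGcHeapSize")


def heap_metrics_from_bep(lines):
    """Return JVM heap metrics (bytes) from a BEP JSON stream, or {} if absent.

    `lines` is any iterable of strings holding one build event per line,
    such as an open file written by --build_event_json_file.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        event = json.loads(line)
        metrics = event.get("buildMetrics")
        if metrics is None:
            continue  # not the BuildMetrics event
        memory = metrics.get("memoryMetrics", {})
        # int64 values are encoded as JSON strings, so convert explicitly.
        return {name: int(memory[name]) for name in MEMORY_FIELDS if name in memory}
    return {}
```

After something like `bazel build //... --memory_profile=/tmp/mem.prof --build_event_json_file=bep.json`, passing the opened `bep.json` file to `heap_metrics_from_bep` gives one data point per build. Recording those numbers from CI over time makes a slow leak visible well before the JVM gets stuck.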
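The periodic restart can be scripted as a small wrapper. This is a sketch under a few assumptions of my own: the caller stores the timestamp of the last restart somewhere, the weekly interval and helper names are hypothetical, and "clean build" is read as `bazel clean` followed by a build. Drop the `clean` step if you only want a fresh JVM.

```python
import subprocess
import time

# Hypothetical policy: restart weekly (use 24 * 3600 for daily).
RESTART_INTERVAL_S = 7 * 24 * 3600


def restart_due(last_restart_ts, now_ts, interval_s=RESTART_INTERVAL_S):
    """True once the server has been running for at least interval_s seconds."""
    return now_ts - last_restart_ts >= interval_s


def restart_commands(targets):
    """Stop the current JVM, then clean and rebuild on a fresh server."""
    return [
        ["bazel", "shutdown"],  # ends the long-lived server JVM
        ["bazel", "clean"],  # starts a new server and clears build outputs
        ["bazel", "build", *targets],
    ]


def maybe_restart(last_restart_ts, targets):
    """Run the restart sequence if it is due; return True if it ran."""
    if not restart_due(last_restart_ts, time.time()):
        return False
    for cmd in restart_commands(targets):
        subprocess.run(cmd, check=True)  # stop on the first failing step
    return True
```

Keeping the decision (`restart_due`) separate from the side effects (`maybe_restart`) makes the policy easy to test and to change from weekly to daily without touching the command sequence.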
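The difference between a fixed per-mnemonic reservation and a per-action estimate is easiest to see side by side. As far as I know, the dynamic mechanism is the `resource_set` parameter of `ctx.actions.run`. It takes a top-level Starlark function that is called with an OS name and the action's input count and returns a dict with keys such as `"cpu"` and `"memory"` (in MB). Verify that contract in the docs for your Bazel version. The sketch below is plain Python so it can be tested outside Bazel (this function body is also valid Starlark). All constants and names are hypothetical.

```python
# Hypothetical sizing constants; tune them for your own actions.
BASE_MEMORY_MB = 250  # base reservation (Bazel's default is 250 MB, as far as I know)
MEMORY_PER_INPUT_MB = 2  # assumed extra RAM per input file
MAX_MEMORY_MB = 8192  # cap so one action cannot claim the whole machine

# What a fixed, per-mnemonic setting amounts to: one size for every link action.
FIXED_LINK_RESOURCES = {"cpu": 1, "memory": 2048}


def link_resource_set(os_name, inputs_size):
    """Estimate local resources for one action from its input count.

    os_name is unused here, but could select per-platform numbers.
    """
    memory = min(BASE_MEMORY_MB + MEMORY_PER_INPUT_MB * inputs_size, MAX_MEMORY_MB)
    cpu = 1 if inputs_size < 1000 else 2
    return {"cpu": cpu, "memory": memory}
```

With these numbers, a small binary A with 10 inputs gets `{"cpu": 1, "memory": 270}` and a big binary B with 5000 inputs gets `{"cpu": 2, "memory": 8192}`, whereas the fixed setting reserves the same 2048 MB for both. The small action no longer over-reserves, and the big one no longer under-reserves.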