Haven't dealt with multi-stage builds much in my ML work, but I'd push back on one thing: if you're hitting glibc mismatches constantly, that's a sign your builder and runtime bases are too different. We use the same base image for both stages and just delete build artifacts explicitly. Cuts image size fine without the libc-compatibility headaches.
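For anyone who wants to see the shape of it, here's a rough sketch of the same-base idea, assuming a Python project. The image tag, the `/install` prefix, `requirements.txt`, and `main.py` are all placeholders for illustration, not the actual setup described above, and the cleanup step is just one example of deleting build leftovers explicitly.

```dockerfile
# Hypothetical sketch: tag, paths, and file names are placeholders.
FROM python:3.11-slim AS builder

# Compiler toolchain lives only in the builder stage.
RUN apt-get update \
    && apt-get install -y --no-install-recommends build-essential \
    && rm -rf /var/lib/apt/lists/*

COPY requirements.txt .

# Install into an isolated prefix so it can be copied out as one directory.
RUN pip install --no-cache-dir --prefix=/install -r requirements.txt

# Explicitly delete leftovers that don't need to ship (example only).
RUN find /install -name '__pycache__' -type d -prune -exec rm -rf {} +

# Same base as the builder, so compiled extensions see the same glibc.
FROM python:3.11-slim

COPY --from=builder /install /usr/local
WORKDIR /app
COPY . /app

CMD ["python", "main.py"]
```

Because both `FROM` lines point at the same tag, anything compiled in the builder links against the same glibc it will run against, which is the whole point.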
That said, your debugging complaint is real. We mostly skip multi-stage for local iteration and only use it in CI. A single-stage Dockerfile is easier to reason about when you're actually building the thing.