Tracing torch.cuda.empty_cache() on an RTX 4090 - Where Do the 53 MB Go?
TL;DR
After del tensor; torch.cuda.empty_cache(), PyTorch’s caching allocator still holds 53.7 MB that it won’t release. We traced the CUDA Runtime and Driver APIs with eBPF uprobes to see exactly wh
ingero.hashnode.dev · 7 min read
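The `del tensor; torch.cuda.empty_cache()` sequence from the TL;DR can be reproduced with a minimal sketch like the one below. It is not the post's tracing setup (that uses eBPF uprobes). It only compares what PyTorch's allocator reports with what the driver reports for the whole device. The function name `report_cuda_memory` and the 256 MB test allocation are illustrative assumptions, and the numbers it prints will depend on the GPU, driver, and PyTorch version.

```python
import importlib.util


def report_cuda_memory():
    """Allocate a tensor, free it, empty the cache, and report memory in MB.

    Returns None when PyTorch or a CUDA device is unavailable.
    The function name and the 256 MB allocation size are illustrative
    assumptions, not taken from the original post.
    """
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    if not torch.cuda.is_available():
        return None

    # Allocate roughly 256 MB on the GPU, then drop the only reference.
    tensor = torch.empty(256 * 1024 * 1024, dtype=torch.uint8, device="cuda")
    del tensor

    # Ask the caching allocator to return unused cached blocks to the driver.
    torch.cuda.empty_cache()

    # Driver-level view of the whole device: (free bytes, total bytes).
    free_bytes, total_bytes = torch.cuda.mem_get_info()

    mb = 2 ** 20
    return {
        "allocated_mb": torch.cuda.memory_allocated() / mb,  # live tensors
        "reserved_mb": torch.cuda.memory_reserved() / mb,    # allocator cache
        "device_used_mb": (total_bytes - free_bytes) / mb,   # all processes
    }


if __name__ == "__main__":
    print(report_cuda_memory())
```

Note that `device_used_mb` covers every process on the device, so on a shared GPU it will include memory that does not belong to this script.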