Caching and Persistence in Spark: Storage Levels and When to Use Them
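TLDR: Calling cache() or persist() does not store anything immediately. It only marks the dataset for caching. Spark materializes the cache lazily during the first action, partition by partition, and each executor's BlockManager holds the resulting blocks. When memory fills up, LRU eviction silently drops partitions (to be recomputed on next use) or spills them to disk, depending on the storage level. ...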
abstractalgorithms.dev · 22 min read
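To make the TLDR concrete, here is a toy model in plain Python, not Spark code. Every name in it (`ToyBlockManager`, `ToyCachedDataset`, `collect`, `spill_to_disk`) is hypothetical. It models a single executor, counts "size" as the number of elements in a partition, and ignores the extra rules Spark's real memory manager applies. It shows three behaviours the TLDR describes: `cache()` stores nothing by itself, the first action fills the cache partition by partition, and LRU eviction either drops a partition (forcing recomputation) or spills it to disk.

```python
from collections import OrderedDict


class ToyBlockManager:
    """Hypothetical single-executor block store with LRU eviction (not Spark's API)."""

    def __init__(self, capacity_bytes, spill_to_disk):
        self.capacity = capacity_bytes
        self.spill_to_disk = spill_to_disk  # roughly MEMORY_AND_DISK vs MEMORY_ONLY
        self.memory = OrderedDict()  # block_id -> (data, size), oldest first
        self.disk = {}
        self.used = 0

    def get(self, block_id):
        if block_id in self.memory:
            self.memory.move_to_end(block_id)  # mark as most recently used
            return self.memory[block_id][0]
        return self.disk.get(block_id)  # None if the block was dropped or never cached

    def put(self, block_id, data, size):
        if size > self.capacity:
            # Can never fit in memory: write straight to disk, or skip caching.
            if self.spill_to_disk:
                self.disk[block_id] = data
            return
        # Evict least recently used blocks until the new block fits.
        while self.used + size > self.capacity:
            old_id, (old_data, old_size) = self.memory.popitem(last=False)
            self.used -= old_size
            if self.spill_to_disk:
                self.disk[old_id] = old_data
            # Otherwise the block is silently dropped and must be recomputed.
        self.memory[block_id] = (data, size)
        self.used += size


class ToyCachedDataset:
    """Hypothetical dataset whose partitions are computed by `compute(p)`."""

    def __init__(self, num_partitions, compute, block_manager):
        self.num_partitions = num_partitions
        self.compute = compute
        self.bm = block_manager
        self.marked = False
        self.compute_calls = 0  # how many partitions were (re)computed

    def cache(self):
        # Lazy: only marks the dataset; no partition is computed or stored here.
        self.marked = True
        return self

    def collect(self):
        # An "action": walks partitions, reading from the cache when possible.
        out = []
        for p in range(self.num_partitions):
            block_id = f"rdd_0_{p}"
            data = self.bm.get(block_id) if self.marked else None
            if data is None:
                data = self.compute(p)
                self.compute_calls += 1
                if self.marked:
                    self.bm.put(block_id, data, size=len(data))
            out.extend(data)
        return out


if __name__ == "__main__":
    # Memory holds 2 of 3 partitions; without spilling, a sequential rescan thrashes.
    bm = ToyBlockManager(capacity_bytes=6, spill_to_disk=False)
    ds = ToyCachedDataset(3, lambda p: [p] * 3, bm).cache()
    print(len(bm.memory), ds.compute_calls)  # nothing cached yet
    ds.collect()
    ds.collect()
    print(ds.compute_calls)  # every partition recomputed on the second scan
```

With `spill_to_disk=False`, the second `collect()` recomputes all three partitions, because each insert evicts the partition the scan needs next. With `spill_to_disk=True`, the evicted partition is read back from `disk` and nothing is recomputed. This is the trade-off between a memory-only storage level and one that spills to disk.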