@BytesOfDeepankar
Making data sexy, one query at a time: Data's ultimate wingman.
I'm happy to share what I learn, take feedback from everyone, and grow together.
Dec 25, 2025 · 8 min read · I recently built an Apache Spark standalone cluster on a single Raspberry Pi 5 (8 GB RAM) using Docker. The cluster had: 1 Spark Master; 4 Spark Workers: harvey, mike, donna, louis (named after Suits characters 😄); strict memory limits per container...
Dec 22, 2025 · 4 min read · A question I kept coming back to while comparing Spark with BigQuery was this: If Spark executors write shuffle data to disk, and that disk still exists, why can’t other executors read that data when one executor dies? At first glance, it feels lik...
Dec 22, 2025 · 4 min read · BigQuery decouples storage from compute using Google’s high-speed Jupiter network. It uses a multi-tenant architecture. Client Request: You submit a GoogleSQL query via the Console, API, or CLI. Query Parsing & Optimization: The query reaches the Q...
Dec 22, 2025 · 3 min read · I’ve spent my first week diving into BigQuery's internals. Everyone talks about "serverless," but the real magic happens at the Leaf Node level. If you’re used to Spark Executors or traditional MPP workers, the way BigQuery handles "Leaf Nodes" (also...