Tag feed

#apache-spark

131 posts · 46 followers

Rithwik Kumar Nagulapati · rithwikn.hashnode.dev · Jul 5 · 7 min read

Beating the 7-Day Cluster TTL: How I Built a Graceful Shutdown Watchdog for a Kafka Streaming Pipeline

It started with a cost conversation. My manager flagged the VMware costs on our streaming pipeline. Two machines running 24/7: two Kafka consumers and a Bootstrap application, always on, whether traff

Kushneet Kaur · cloudnativebykushneet.hashnode.dev · Jun 20 · 5 min read

What is a Lakehouse?

This article is part of the Databricks from Scratch series. Start from the beginning: Stop Optimising Your Prompts. Fix Your Data Pipelines. Picture this. It's IPL ticket booking day. 10 AM. 1 crore

Andrea Paris · andreaparisdata.hashnode.dev · Jun 11 · 6 min read

The Moment I Realised a Database Is Not a Data Warehouse

Context: The platform combines real-time cryptocurrency market data from the CoinGecko API with sentiment analysis derived from cryptocurrency-related YouTube discussions. Apache Kafka, Spark Structure

Andrea Paris · andreaparisdata.hashnode.dev · Jun 10 · 6 min read

Building a Real-Time Crypto Analytics Platform

From Streaming Pipeline to Analytics Platform: When I started learning data engineering, I wanted a project that would force me to use the technologies I was studying in a realistic setting. Tutorials

Vishnu D · thedatatrench.hashnode.dev · May 29 · 9 min read

Spark Architecture Simply Explained

You have been using Spark for months -- running notebooks, submitting jobs, reading docs. But when someone asks you to explain what actually happens when a job runs, you find yourself stalling. The co

Aishwarya Patankar · aishwaryapatankar.hashnode.dev · Apr 30 · 5 min read

Behind Every Payment: The Data Pipelines You Don’t See

The Problem: Payments Look Simple, But Aren’t. When you send money via UPI or receive your salary, it feels instant and effortless. But behind that single action, multiple systems exchange structured d

Madhusmita Talukdar · giiki.hashnode.dev · Apr 24 · 4 min read

Stop Ignoring Data Pipelines: ETL vs ELT Explained Using a Real ML Workflow

Most of us love building machine learning models. We tune hyperparameters, try different algorithms, and chase better accuracy. But there’s one part we quietly ignore: How the data actually gets to th

Abstract Algorithms · abstractalgorithms.hashnode.dev · Apr 19 · 28 min read

Spark Architecture: Driver, Executors, DAG Scheduler, and Task Scheduler Explained

TLDR: Spark's architecture is a precise chain of responsibility. The Driver converts user code into a DAG, the DAGScheduler breaks it into stages at shuffle boundaries, the TaskScheduler dispatches ta

Abstract Algorithms · abstractalgorithms.hashnode.dev · Apr 19 · 24 min read

Spark DataFrames and Spark SQL: Schema, DDL, and the Catalyst Optimizer

TLDR: Catalyst is Spark's query compiler. It takes any DataFrame operation or SQL string, parses it into an abstract syntax tree, resolves column references against the catalog, applies a library of a

Abstract Algorithms · abstractalgorithms.hashnode.dev · Apr 19 · 37 min read

Spark Executor Sizing: Memory Model, Core Tuning, and GC Strategy

TLDR: Spark executor OOMs are almost never caused by insufficient total cluster RAM — they are caused by misallocating memory across five distinct JVM regions while ignoring GC behavior and memoryOver
