MOHAMMED FAIYAZ · apache-beam.hashnode.dev · Feb 5, 2025 · Apache Beam: A Unified Framework for Batch and Stream Processing · Introduction: Apache Beam is an open-source, unified programming model that enables developers to build efficient and scalable batch and stream data processing pipelines. Originally developed by Google, Apache Beam is now a part of the Apache Software... · Apache Beam
Yingjun Wu for RisingWave Labs Blog · risingwave.com · Jan 24, 2025 · Stream Processing Systems in 2025: RisingWave, Flink, Spark Streaming, and What's Ahead · Stream processing isn’t a new technology. In fact, the concept has been studied for at least 23 years! The first academic paper I came across dates back to 2002, just two years before the publication of the famous MapReduce paper. Pioneering compani... · Blog, streamprocessing
Aaron Jevil Nazareth · aarons-space.hashnode.dev · Jan 21, 2025 · Speeding Up Spark: The Simple Trick That Saved Me 2 Hours · Hello there, fellow coders and web enthusiasts! Today, I want to share a challenge I was recently assigned, one that had me scratching my head for a bit. Finding a solution was not as straightforward as I’d hoped, but I’m excited to walk you through t... · 1 like · speed up
Alex Cloudstar · blog.alexcloudstar.com · Jan 20, 2025 · Understanding Scala: A Modern and Powerful Language for the JVM · Scala is a multi-paradigm programming language that combines the best features of both object-oriented and functional programming. It is a powerful language that runs on the Java Virtual Machine (JVM) and is interoperable with Java. In this article, ... · Scala
Kilian Baccaro Salinas · datagym.es · Jan 16, 2025 · How to Get All Spark Session Configurations + Azure Key Vault Secrets · Knowing how your Spark session is configured is important for debugging or for confirming that parameter values are set correctly. With the following command you can get all the current configurations of the Spark session... · 54 reads · spark
Nextwebb · nextwebb.hashnode.dev · Jan 10, 2025 · Data Engineering Foundations: A Hands-On Guide · Hey there! If you’ve been curious about data engineering, this guide will help you understand the basics and walk you through practical examples. Whether it’s setting up storage, processing data, automat... · 67 reads · ETLPipelines
Rahul Das · schemasensei.hashnode.dev · Jan 7, 2025 · Unlocking Real-Time Data with Change Data Capture (CDC) · In this guide, we will cover CDC, its importance, and the setup of a CDC stack using Kafka, Debezium, and other services. Additionally, we will configure a PostgreSQL connector using the Confluent Control Center web UI to capture changes from a Postg... · 27 reads · kafka
Rajanand Ilangovan · blog.rajanand.org · Dec 28, 2024 · Predicate Pushdown in Spark · What is predicate pushdown? Predicate pushdown is an optimization technique in Apache Spark where the filtering logic (predicates) is pushed closer to the data source. Instead of Spark loading all the data into memory and applying the filters, the fi... · 36 reads · spark
Renjitha K · renjithak.hashnode.dev · Dec 25, 2024 · Crafting a Seamless Data Journey: Navigating the Medallion Architecture Pipeline · Have you ever thought about how modern data systems manage data efficiently? The Medallion Architecture is a smart, structured approach that ensures your data is reliable, scalable, and ready to use. Let’s dive into it step-by-step and explore how it... · 52 reads · MedallionArchitecture
Sharath Kumar Thungathurthi · sharaththungathurthi.hashnode.dev · Dec 22, 2024 · Impala Interview Questions · Here are some questions and answers related to Impala, covering various aspects of its usage, architecture, and functionality. These are useful for interview preparation or as a study guide. 1. What is Impala? Answer: Impala is an MPP (Massively Paral... · Impala