Farbod Ahmadian for DataChef's Blog · blog.datachef.co · May 14, 2024
Apache Iceberg Compaction
Introduction: Apache Iceberg introduces a powerful compaction feature, especially beneficial for Change Data Capture (CDC) workloads. This document outlines the key properties and commands necessary for effective Iceberg table management, focusing on ...
10 likes · 43 reads · apacheiceberg

Vipin · vipinmp.hashnode.dev · May 9, 2024
Kickstart your Spark Data Exploration journey with Databricks
Apache Spark is an open-source framework for large-scale data processing. It's known for its speed and ability to handle various tasks, like: Batch data processing: Working with large datasets all at once. Real-time data processing: Analyzing data ...
53 reads · #apache-spark

Mark williams · markwilliams21.hashnode.dev · Apr 15, 2024
Apache Spark Interview Questions and Answers for 2024: A Comprehensive Guide for Students
Hey Spark Enthusiasts! Are you gearing up for an interview that involves Apache Spark? Whether you're a seasoned data aficionado or just diving into the world of big data, preparing for an Apache Spark interview requires a solid understanding of its ...
apache

Vaishnave Subbramanian · vaishnave.page · Apr 4, 2024
Sparks Fly
File Formats: In the realm of data storage and processing, file formats play a pivotal role in defining how information is organized, stored, and accessed. These formats, ranging from simple text files to complex structured formats, serve as the blue...
394 reads · Dabbling with Apache Spark · spark

Prabodh Agarwal for CMD-LYNE · toplyne.hashnode.dev · Apr 3, 2024
Skiing with Snowflake
In this article, I will demonstrate how to formulate a lakehouse strategy that pairs well with Snowflake. A few months ago, I began exploring opportunities to develop ETL pipelines in Ray. I had to perform my PoC on SnowflakeDB. Unfortunately, Ray C...
#apache-spark

Alex Merced · techblog.alexmerced.com · Apr 1, 2024
End-to-End Basic Data Engineering Tutorial (Spark, Dremio, Superset)
Data engineering aims to make data accessible and usable for data analytics and data science purposes. This involves several key aspects: Transferring data from operational systems like databases to systems optimized for analytical access. Modeling...
1 like · 36 reads · data lakehouse

Vaishnave Subbramanian · vaishnave.page · Mar 21, 2024
Dabbling with Spark Essentials Again
Apache Spark stands out as a pivotal tool for navigating the complexities of Big Data analysis. This article embarks on a comprehensive journey through the core of Spark, from unraveling the intricacies of Big Data and its foundational concepts to ma...
105 reads · Dabbling with Apache Spark · spark

Vaishnave Subbramanian · vaishnave.page · Mar 16, 2024
Dabbling with Spark Essentials
Embarking on the journey of understanding Apache Spark marks the beginning of an exciting series designed for both newcomers and myself, as we navigate the complexities of big data processing together. Apache Spark, with its unparalleled capabilities...
250 reads · Dabbling with Apache Spark · spark

Cloud Tuned · cloudtuned.hashnode.dev · Mar 11, 2024
5 Apache Spark Use Cases
Apache Spark has emerged as one of the most popular distributed computing frameworks for big data processing. Its versatility and scalability make it suitable for a wide range of applications across various industries. Let's ...
#apache-spark

Deepankar Yadav · bytesofdeepankar.hashnode.dev · Feb 26, 2024
Join Strategies in Apache Spark
Although we are quite familiar with join operations in Spark, did you know that Spark has some built-in tricks for performing joins efficiently without telling you, unless you tame Spark and make it work the way you want? PREREQUISITE: TERMINOLOGY:...
spark