Shreyash Bante · shreyash27.hashnode.dev · Aug 22, 2024
Apache Spark Architecture
Apache Spark's architecture is a cornerstone of its ability to efficiently process large-scale data. It is designed around the concept of distributed computing, which enables it to process massive datasets quickly and reliably across a cluster of com...
Tags: spark
Kumar Rohit · krohit-de.hashnode.dev · Aug 15, 2024
Hello Spark on Minikube
Minikube is a beginner-friendly tool that lets you run a Kubernetes cluster on your local machine, making it easy to start learning and experimenting with Kubernetes without needing a complex setup. It creates a single-node cluster inside a virtual m...
46 reads · Series: Experiments on Minikube 🚀 · Tags: sparksql
Mehul Kansal · mehulkansal.hashnode.dev · Aug 12, 2024
Week 12: Mastering Hive 📈
Hey fellow data engineers! 👋 This week's blog explores the architecture of Hive, offering insights into its components like data storage and metadata management. It also covers the different types of tables Hive supports, essential optimizations for...
Tags: hive
Turboline LTD · blog.turboline.ai · Jul 10, 2024
AI-Based Data Transformation: A Comparison of LLM-Generated PySpark Code (Using Mistral & Google Gemini Advanced)
Mistral Le Chat: Mistral's Le Chat didn't have the functionality to upload CSV files, so I couldn't generate transformation code. I could have built a lightweight ingestion process using the Mistral API and LangChain. However, that was out of scope for ...
Tags: google gemini
Kilian Baccaro Salinas · datagym.es · Jul 6, 2024
How to Create a Date Dimension with PySpark
A Date dimension table is crucial for our analytics projects and reports. It is a table that does not contain much logic and can be created at any point of the ETL process. With a simple PySpark and Spark SQL script you can create it in your ...
Tags: lakehouse
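A minimal PySpark sketch of the kind of Date dimension the post describes; the date range and column set are illustrative assumptions, not the author's exact script:

```python
# Sketch: build a Date dimension with Spark SQL's sequence() and the DataFrame API.
# The bounds (2020-2030) and columns are assumptions for illustration only.
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.appName("date_dimension").getOrCreate()

# One row per calendar day between the chosen bounds.
dim_date = (
    spark.sql(
        "SELECT explode(sequence(to_date('2020-01-01'), to_date('2030-12-31'))) AS date"
    )
    .withColumn("date_key", F.date_format("date", "yyyyMMdd").cast("int"))
    .withColumn("year", F.year("date"))
    .withColumn("quarter", F.quarter("date"))
    .withColumn("month", F.month("date"))
    .withColumn("day", F.dayofmonth("date"))
    .withColumn("day_of_week", F.dayofweek("date"))
)

dim_date.show(5)
```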
Mehul Kansal · mehulkansal.hashnode.dev · May 27, 2024
Week 2: Elevate your Spark Game - Exploring Higher Level APIs! 📈
Introduction: Hey there, data enthusiasts! This week, we will explore the power of Spark's higher-level APIs - DataFrames and Spark SQL. Let's begin using Spark like a pro and unveil the enhanced data processing capabilities of this dynamic duo. Task ...
Tags: data-engineering
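A small sketch of what those two higher-level APIs look like side by side; the file name and columns are hypothetical placeholders, not taken from the post:

```python
# Sketch: the same aggregation via the DataFrame API and via Spark SQL.
# "orders.csv", customer_id and amount are assumed example names.
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.appName("higher_level_apis").getOrCreate()

orders = spark.read.csv("orders.csv", header=True, inferSchema=True)

# DataFrame API: total amount per customer.
by_customer_df = (
    orders.groupBy("customer_id")
          .agg(F.sum("amount").alias("total_amount"))
)

# Spark SQL: the same query against a temporary view.
orders.createOrReplaceTempView("orders")
by_customer_sql = spark.sql(
    "SELECT customer_id, SUM(amount) AS total_amount FROM orders GROUP BY customer_id"
)

by_customer_df.show()
by_customer_sql.show()
```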
Mark williams · techcapital.hashnode.dev · Apr 15, 2024
Apache Spark Interview Questions and Answers for 2024: A Comprehensive Guide for Students
Hey Spark Enthusiasts! Are you gearing up for an interview that involves Apache Spark? Whether you're a seasoned data aficionado or just diving into the world of big data, preparing for an Apache Spark interview requires a solid understanding of its ...
Tags: apache
Vaishnave Subbramanian · vaishnave.page · Apr 4, 2024
Sparks Fly
File Formats: In the realm of data storage and processing, file formats play a pivotal role in defining how information is organized, stored, and accessed. These formats, ranging from simple text files to complex structured formats, serve as the blue...
1 like · 560 reads · Series: Dabbling with Apache Spark · Tags: spark
Vaishnave Subbramanian · vaishnave.page · Mar 16, 2024
Dabbling with Spark Essentials
Embarking on the journey of understanding Apache Spark marks the beginning of an exciting series designed for both newcomers and myself, as we navigate the complexities of big data processing together. Apache Spark, with its unparalleled capabilities...
1 like · 417 reads · Series: Dabbling with Apache Spark · Tags: spark
Kilian Baccaro Salinas · datagym.es · Mar 3, 2024
Delta Table History and Vacuum
Introduction: In this article we will see how to retrieve information about the operations, user, timestamp, etc. of each write to a Delta table by running the history command. We will also see how to remove the files th...
Tags: microsoftfabric
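A minimal sketch of the two commands the post covers, assuming a Delta-enabled Spark session (e.g. Microsoft Fabric or Databricks) and a hypothetical table name:

```python
# Sketch: inspect a Delta table's write history, then vacuum old files.
# "sales_delta" is an assumed table name for illustration.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("delta_history_vacuum").getOrCreate()

# DESCRIBE HISTORY returns one row per write: version, timestamp, user, operation, ...
spark.sql("DESCRIBE HISTORY sales_delta").select(
    "version", "timestamp", "userName", "operation"
).show(truncate=False)

# VACUUM physically deletes data files no longer referenced by the table
# and older than the retention threshold (168 hours by default).
spark.sql("VACUUM sales_delta RETAIN 168 HOURS")
```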