Shreyash Bante · shreyash27.hashnode.dev · Aug 22, 2024
Apache Spark Architecture
Apache Spark’s architecture is a cornerstone of its ability to efficiently process large-scale data. It is designed around the concept of distributed computing, which enables it to process massive datasets quickly and reliably across a cluster of com...
Tags: spark
Kumar Rohit · krohit-de.hashnode.dev · Aug 15, 2024
Hello Spark on Minikube
Minikube is a beginner-friendly tool that lets you run a Kubernetes cluster on your local machine, making it easy to start learning and experimenting with Kubernetes without needing a complex setup. It creates a single-node cluster inside a virtual m...
43 reads · Series: Experiments on Minikube 🚀 · Tags: spark, sql
Mehul Kansal · mehulkansal.hashnode.dev · Aug 12, 2024
Week 12: Mastering Hive 📈
Hey fellow data engineers! 👋 This week's blog explores the architecture of Hive, offering insights into its components like data storage and metadata management. It also covers the different types of tables Hive supports, essential optimizations for...
Tags: hive
Turboline LTD · blog.turboline.ai · Jul 10, 2024
AI-Based Data Transformation: A Comparison of LLM-Generated PySpark Code (Using Mistral & Google Gemini Advanced)
Mistral Le Chat: Mistral's Le Chat didn't have the functionality to upload CSV files, so I couldn't generate transformation code. I could have built a lightweight ingestion process using the Mistral API and LangChain; however, that was out of scope for ...
Tags: google gemini
Mehul Kansal · mehulkansal.hashnode.dev · May 27, 2024
Week 2: Elevate your Spark Game - Exploring Higher Level APIs! 📈
Introduction: Hey there, data enthusiasts! This week, we will explore the power of Spark's higher-level APIs - DataFrames and Spark SQL. Let's begin using Spark like a pro and unveil the enhanced data processing capabilities of this dynamic duo. Task ...
Tags: data-engineering
Mark Williams · techcapital.hashnode.dev · Apr 15, 2024
Apache Spark Interview Questions and Answers for 2024: A Comprehensive Guide for Students
Hey Spark Enthusiasts! Are you gearing up for an interview that involves Apache Spark? Whether you're a seasoned data aficionado or just diving into the world of big data, preparing for an Apache Spark interview requires a solid understanding of its ...
Tags: apache
Vaishnave Subbramanian · vaishnave.page · Apr 4, 2024
Sparks Fly
File Formats: In the realm of data storage and processing, file formats play a pivotal role in defining how information is organized, stored, and accessed. These formats, ranging from simple text files to complex structured formats, serve as the blue...
498 reads · Series: Dabbling with Apache Spark · Tags: spark
Vaishnave Subbramanian · vaishnave.page · Mar 16, 2024
Dabbling with Spark Essentials
Embarking on the journey of understanding Apache Spark marks the beginning of an exciting series designed for both newcomers and myself, as we navigate the complexities of big data processing together. Apache Spark, with its unparalleled capabilities...
331 reads · Series: Dabbling with Apache Spark · Tags: spark
KALINGA SWAIN · kalingaswain.hashnode.dev · Feb 11, 2024
EMR with EKS
Hi, welcome to the event! Amazon EMR is like the rockstar of cloud big data. Picture this: petabyte-scale data parties, interactive analytics shindigs, and even machine learning raves, all happening with cool open-source crews like Apache Spark, Apach...
53 reads · Tags: #AWSConsole
Anees Shaikh · aneesshaikh.hashnode.dev · Jan 18, 2024
Replace withColumn with withColumns to speed up your Spark applications
Disclaimer: the views and opinions expressed in this blog post are my own. Practical takeaways: The .withColumn() function in Spark has been a popular way of adding and manipulating columns. In my experience, it is far more common than adding columns ...
173 reads · Tags: dataengineering