Prasana Kumar Parthasarathy · prasanainsights.com · Nov 23, 2023
Navigating the Data Tsunami: Unleashing the Power of Big Data in the Tech Landscape
In the ever-expanding realm of technology, the #DataTsunami, manifested through #BigData, #DataInnovation, and #DataRevolution, has reached unprecedented heights. Various platforms like #AWS, #GCP, #Azure, #OCI, #Kafka, #Databricks, #Snowflake, and m...
Tags: Cloud Engineering, big data
Tarun Kumar Bodapati · tarunbodapati.hashnode.dev · Nov 6, 2023
Cool Intro on Data Engineering
I want to share my understanding of data engineering. Because I am interested in this field and new to it, I started taking a course on data engineering and have been learning from it. In my view, the basic idea of data engineering is to extract data from di...
Tags: data-engineering
Utsav Gohel · goutsav.hashnode.dev · Nov 4, 2023
Apache Spark
Here I am attaching the files from the lab practice work I did in my big data lab session working with Spark. You can download them by clicking the links: Data Processing file (Python file), Data set File. What is Apache Spark? Apache Spark is ...
Tags: #apache-spark
Swaroop Shankar · swaroopshankar.hashnode.dev · Jul 27, 2023
Setting up Java, Python and Spark dev environment on Ubuntu
Introduction: This is a step-by-step guide to setting up a development environment for Java and Python, and installing various other software, on Ubuntu 22.04.1 LTS. Objective: We need to set up and install the following: 1. Visual Studio Code 2. Docker D...
Tags: Java
Elvis David · techlake.dev · Jul 4, 2023
When Should You Use Apache Flink vs Spark?
Introduction: Real-time data processing has become a crucial aspect of modern business operations as companies strive to provide instantaneous, interactive experiences to their customers. With the increasing use of microservices and event-driven syste...
Tags: Data Engineering, #apache-spark
Wanjiru Njuguna · wanjiruh.hashnode.dev · Jun 27, 2023
Introduction to Spark with Scala
Imagine it's your first day on a new Spark project. The project manager looks at your team and says, "In this project, I want you to use Scala." So let's first understand the context of Spark and data processing in general, and also know what S...
Tags: #apache-spark
padmanabha reddy · padmanabha.hashnode.dev · Jun 27, 2023
Apache Spark - Structured API
Apache Spark's Structured API is a high-level programming interface that enables users to manipulate and analyze structured and semi-structured data in a distributed computing environment. It is built on top of the Spark Core engine, providing a user...
Tags: #apache-spark
Nupoor Nawathey · nupoor01nawathey.hashnode.dev · Jun 25, 2023
Are DataFrames better than Spark SQL?
"Half-knowledge is worse than ignorance." (Thomas B. Macaulay) Since there is a lot of noise on the internet about the battle between DataFrames and spark.sql, I too was at one point led to believe that DataFrames are always more performant than the que...
Tags: #apache-spark
Ujjwal Rastogi · ujjwalrastogi.hashnode.dev · Jun 1, 2023
Introduction to Big Data with PySpark
In this article I have covered: an introduction to Spark and its architecture, RDDs and Spark DataFrames, translating between pandas and Spark DataFrames, writing SQL queries, and reading CSV data into Spark DataFrames. Getting to know PySpark: Spark is ...
Tags: distributed system
Priti Jha · pritijha.hashnode.dev · Jun 4, 2023
Meet Apache Hudi: Supporting Modern Data Lakes
Hudi stands for Hadoop Upserts Deletes and Incrementals. It is the framework designed by Uber to achieve ACID properties for distributed data in Hadoop or S3. Compared with Apache Iceberg or Delta Lake, it provides more features, like MOR (Merge on Re...
Tags: #apache-spark