Kooscha Rahimi for Bricks in the Cloud · azure-data.hashnode.dev · Apr 15, 2024 · 1 like
A step-by-step guide to CI/CD using Databricks Asset Bundles and the Nutter framework
These days continuous integration (CI) and continuous delivery/deployment (CD) are used throughout most disciplines of software and data. Within the realms of data engineering and big data, however, the implementations can at times be quite challenging...
Databricks asset bundles

Vaishnave Subbramanian · vaishnave.page · Apr 4, 2024 · 361 reads
Sparks Fly
File Formats: In the realm of data storage and processing, file formats play a pivotal role in defining how information is organized, stored, and accessed. These formats, ranging from simple text files to complex structured formats, serve as the blue...
Dabbling with Apache Spark · spark

Vaishnave Subbramanian · vaishnave.page · Mar 21, 2024 · 79 reads
Dabbling with Spark Essentials Again
Apache Spark stands out as a pivotal tool for navigating the complexities of Big Data analysis. This article embarks on a comprehensive journey through the core of Spark, from unraveling the intricacies of Big Data and its foundational concepts to ma...
Dabbling with Apache Spark · spark

navinkumarnotes123.hashnode.dev · Mar 18, 2024
How to decide bucket count in Hive
Steps. Calculate Expected Bucket Size: divide the table size by the block size on Hadoop to get an initial estimate (Expected Bucket Size = Table Size / Block Size on Hadoop). Find the Nearest Power of 2: take the base-2 logarithm of the ini...
Hive · hive

Vaishnave Subbramanian · vaishnave.page · Mar 16, 2024 · 238 reads
Dabbling with Spark Essentials
Embarking on the journey of understanding Apache Spark marks the beginning of an exciting series designed for both newcomers and myself, as we navigate the complexities of big data processing together. Apache Spark, with its unparalleled capabilities...
Dabbling with Apache Spark · spark

Gaurav Vishwakarma · gauravoncloud.hashnode.dev · Feb 26, 2024
Unveiling the Powerhouse: Data Engineering in the Digital Epoch
In the vast landscape of technology, where information reigns supreme, the unsung hero orchestrating the symphony of data is none other than data engineering. This field, often hidden behind the glitz of data science and analytics, plays a crucial ro...
data-engineering

Deepankar Yadav · bytesofdeepankar.hashnode.dev · Feb 26, 2024
Join Strategies in Apache Spark
Although we are quite familiar with join operations in Spark, did you know that Spark has some built-in tricks for performing joins efficiently without telling you, unless you tame Spark and make it behave the way you want? PREREQUISITE: TERMINOLOGY:...
spark

KALINGA SWAIN · kalingaswain.hashnode.dev · Feb 11, 2024 · 28 reads
EMR with EKS
Hi, welcome to the event! Amazon EMR is like the Rockstar of cloud big data. Picture this: petabyte-scale data parties, interactive analytics shindigs, and even machine learning raves, all happening with cool open-source crews like Apache Spark, Apach...
#AWSConsole

Ronil Rodrigues · ronilrodrigues.hashnode.dev · Feb 9, 2024
Apache Spark !!
Intro: Apache Spark has emerged as a leading big data processing framework due to its speed, ease of use, and versatility. At the heart of Spark are its core functionalities and commands, which enable users to perform a wide range of data processing tasks e...
spark

Anees Shaikh · aneesshaikh.hashnode.dev · Jan 18, 2024 · 129 reads
Replace withColumn with withColumns to speed up your Spark applications.
Disclaimer: the views and opinions expressed in this blogpost are my own. Practical takeaways: The .withColumn() function in Spark has been a popular way of adding and manipulating columns. In my experience, it is far more common than adding columns ...
dataengineering

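The two steps quoted in the Hive bucket-count teaser above (divide table size by block size, then move to a power of 2 via a base-2 logarithm) can be sketched roughly as follows. This is only an illustration: the teaser is truncated, so the rounding rule (rounding the logarithm to the nearest integer) and the function name `estimate_bucket_count` are assumptions, not the original author's method.

```python
import math


def estimate_bucket_count(table_size_bytes: int, block_size_bytes: int) -> int:
    """Hypothetical helper: estimate a Hive bucket count as a power of 2."""
    if table_size_bytes <= 0 or block_size_bytes <= 0:
        raise ValueError("sizes must be positive")

    # Step 1 (from the teaser): initial estimate = table size / block size.
    initial_estimate = table_size_bytes / block_size_bytes

    # A table smaller than one block still needs at least one bucket.
    if initial_estimate <= 1:
        return 1

    # Step 2 (assumption): round the base-2 logarithm to the nearest
    # integer and raise 2 to that power.
    exponent = round(math.log2(initial_estimate))
    return 2 ** exponent


if __name__ == "__main__":
    MIB = 1024 * 1024
    GIB = 1024 * MIB
    # 10 GiB table with 128 MiB blocks: 80 blocks, log2(80) is about 6.32
    print(estimate_bucket_count(10 * GIB, 128 * MIB))
```

Rounding up instead (`math.ceil`) would favor more, smaller buckets; which direction suits a given table is a judgment call the truncated teaser does not settle.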