Sandeep Pawar (fabric.guru) · Sep 20, 2023 · A Quick Comparison Of Fabric Spark Configuration Settings · I compared the default Spark configurations in the Fabric Spark runtime with those of the standard Spark. I excluded configurations that were identical between the two, as well as those that were irrelevant. I thought sharing this information might b... · 202 reads · spark
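The comparison described in the excerpt amounts to a dictionary diff: keep only the keys whose values differ or that exist on one side only. Below is a minimal sketch in plain Python. It assumes both sets of defaults have already been exported as `{key: value}` dicts of strings. The `ignore` parameter is a hypothetical stand-in for the manual "irrelevant settings" judgement, and the dicts in the test are toy placeholders, not real Fabric defaults.

```python
from typing import Dict, Iterable, Tuple

# Placeholder shown when a key is set in only one of the two configurations.
MISSING = "<not set>"


def diff_configs(
    fabric: Dict[str, str],
    standard: Dict[str, str],
    ignore: Iterable[str] = (),
) -> Dict[str, Tuple[str, str]]:
    """Return {key: (fabric_value, standard_value)} for keys whose values differ.

    Keys with identical values on both sides are dropped, as are any keys
    listed in `ignore` (hypothetical: settings judged irrelevant by hand).
    """
    skip = set(ignore)
    result: Dict[str, Tuple[str, str]] = {}
    # Walk the union of keys in a stable, sorted order.
    for key in sorted(set(fabric) | set(standard)):
        if key in skip:
            continue
        a = fabric.get(key, MISSING)
        b = standard.get(key, MISSING)
        if a != b:
            result[key] = (a, b)
    return result
```

In PySpark, `dict(spark.sparkContext.getConf().getAll())` is one possible source for such a dict. Note that it lists only properties that were explicitly set, not every built-in default.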
James Ford (jaggedarray.hashnode.dev) · Sep 18, 2023 · Spark JDBC Read Parallelization · If you've used JDBC within Spark for long enough, or have had to pull large tables, you know that the more data there is, the longer it takes to pull. Seems pretty obvious, right? Well, what if I told you there's an option for speeding things up, drasticall... · 52 reads · spark
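The excerpt stops before naming the option. Spark's JDBC source does document `partitionColumn`, `lowerBound`, `upperBound` and `numPartitions` for parallel reads, but whether the post covers these is an assumption. The sketch below is a simplified, plain-Python model of how such a range can be split into one WHERE clause per partition, assuming an integer column. It approximates Spark's behaviour and is not Spark's internal code. As in Spark, the bounds only set the stride and do not filter rows: values outside them fall into the first or last partition.

```python
from typing import List


def jdbc_partition_predicates(
    column: str, lower_bound: int, upper_bound: int, num_partitions: int
) -> List[str]:
    """Split [lower_bound, upper_bound) into per-partition WHERE clauses.

    The first clause is open below (and also picks up NULLs), and the last
    one is open above, so every row is read exactly once.
    """
    # Never create more partitions than there are distinct values in range.
    n = min(num_partitions, upper_bound - lower_bound)
    if n <= 1:
        return [""]  # single partition: no WHERE clause, read the whole table
    stride = (upper_bound - lower_bound) // n
    predicates: List[str] = []
    current = lower_bound
    for i in range(n):
        start = current
        current += stride
        if i == 0:
            predicates.append(f"{column} < {current} OR {column} IS NULL")
        elif i == n - 1:
            predicates.append(f"{column} >= {start}")
        else:
            predicates.append(f"{column} >= {start} AND {column} < {current}")
    return predicates
```

A list like this could also be passed to PySpark's `DataFrameReader.jdbc(url, table, predicates=..., properties=...)`, which creates one partition per predicate.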
Aruna Das (arunadas.hashnode.dev) · Aug 31, 2023 · Spark Series #4: Embracing Laziness: The Celebration of Efficiency in Spark · (All images created by the author unless otherwise stated) In Spark, the core data structures (RDDs) are immutable, meaning they can't be modified once created. So how, then, do you perform transformation in Spark, one of the basic requirements of ETL (transformati... · Data Science
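The excerpt's point is that immutable data means every transformation yields a new dataset, and Spark defers the actual work until an action runs. A toy model in plain Python can illustrate this. It is not Spark's API: `LazyDataset` and its `map`, `filter` and `collect` methods are hypothetical names that only mimic the transformation/action split.

```python
from typing import Any, Callable, Iterable, List, Tuple


class LazyDataset:
    """Immutable, lazily evaluated dataset (a toy model, not Spark's RDD)."""

    def __init__(self, source: Iterable[Any], steps: Tuple[Tuple[str, Callable], ...] = ()):
        self._source = tuple(source)  # stored as a tuple so it cannot be mutated
        self._steps = steps           # recorded transformations (the "lineage")

    def map(self, fn: Callable[[Any], Any]) -> "LazyDataset":
        # Transformation: return a NEW dataset with one more recorded step; nothing runs.
        return LazyDataset(self._source, self._steps + (("map", fn),))

    def filter(self, pred: Callable[[Any], bool]) -> "LazyDataset":
        # Transformation: also just recorded, never executed here.
        return LazyDataset(self._source, self._steps + (("filter", pred),))

    def collect(self) -> List[Any]:
        # Action: only now are the recorded steps executed, in order.
        out = list(self._source)
        for kind, fn in self._steps:
            if kind == "map":
                out = [fn(x) for x in out]
            else:
                out = [x for x in out if fn(x)]
        return out
```

In PySpark the same shape appears as `rdd.map(...).filter(...)` (transformations) followed by `.collect()` (an action).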
Nirmal Pandey for Bits Notion (bitsnotion.com) · Jul 26, 2023 · Tools and Technologies For Big Data · The era of Big Data has ushered in a new paradigm of data-driven decision-making, revolutionizing industries across the globe. To harness the power of Big Data effectively, organizations rely on a diverse array of tools and technologies that cover da... · Data science/Big Data · big data
Aruna Das (arunadas.hashnode.dev) · Aug 17, 2023 · Spark Series #3: Architecture of Spark · (Image credit: Photo by thekliks photos on Unsplash) Julia Morgan, a renowned American architect and engineer, eloquently captured the essence of architectural expression in the words above. While she referred to these sentiments in the context of phy... · 10 likes · Spark Series · spark
Aruna Das (arunadas.hashnode.dev) · Aug 10, 2023 · Spark Series #2: Evolution of Spark · History of Spark: Apache Spark originated as a research project at UC Berkeley's AMPLab, focusing on big data analytics. It introduced a programming model that offers broader application support compared to MapReduce while maintaining automatic fault ... · 10 likes · Spark Series · spark
Aruna Das (arunadas.hashnode.dev) · Aug 3, 2023 · Spark Series #1: Why Spark? · What is Big Data? We are currently living in the data era. As a biological being, you are a significant source of big data, both internally and externally. Internally, you carry a multitude of minerals such as iron, zinc, calcium, phosphorus, magnesiu... · Spark Series · Data Science
AATISH SINGH (aatishintodata.hashnode.dev) · Aug 3, 2023 · Hadoop Namenode Failure Management · Let's #hadoop 📌 Some insights on NameNode failure management in Hadoop 📢 ✔ Managing NameNode failures in Hadoop is crucial to ensure high availability and fault tolerance in the Hadoop Distributed File System (HDFS). ✔ The NameNode is a critical co... · 1 like · hadoop
AATISH SINGH (aatishintodata.hashnode.dev) · Aug 1, 2023 · Hadoop #Split-Brain Scenario & #Fencing · Let's #hadoop 📌 What is a #Split Brain Scenario, and what is #fencing in Hadoop? ✔ In the context of distributed systems, including Hadoop clusters, a "#split_bra... · hadoop
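The excerpt is cut off before it describes a fencing technique. One widely used approach, and as we understand it the basis of HDFS's Quorum Journal Manager, is an epoch (fencing token) scheme. Each newly active NameNode obtains a higher epoch number, and the JournalNodes reject writes that carry an older one. This prevents a stale "active" NameNode in a split-brain situation from corrupting the shared edit log. The sketch below is a toy model in plain Python. `JournalNode`, `new_epoch`, `append` and `quorum_append` are hypothetical names, not Hadoop classes or APIs.

```python
from typing import List, Tuple


class JournalNode:
    """Toy log node that enforces epoch-based fencing (hypothetical model)."""

    def __init__(self) -> None:
        self.promised_epoch = 0
        self.log: List[Tuple[int, str]] = []

    def new_epoch(self, epoch: int) -> bool:
        # A writer becoming active asks the node to promise this (higher) epoch.
        if epoch <= self.promised_epoch:
            return False
        self.promised_epoch = epoch
        return True

    def append(self, epoch: int, entry: str) -> bool:
        # Reject writes from any writer whose epoch is older than the promise.
        if epoch < self.promised_epoch:
            return False
        self.log.append((epoch, entry))
        return True


def quorum_append(nodes: List[JournalNode], epoch: int, entry: str) -> bool:
    # A write succeeds only if a strict majority of nodes accept it.
    acks = sum(node.append(epoch, entry) for node in nodes)
    return acks > len(nodes) // 2
```

In a real HA deployment, Hadoop also expects fencing methods to be configured through `dfs.ha.fencing.methods` (for example `sshfence` or `shell(...)`).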
Ilum (ilum.hashnode.dev) · Jul 27, 2023 · Getting started with Data Science on Kubernetes - Jupyter and Zeppelin · It's no secret that the data analytics community has been moving towards using more open-source and cloud-based tools. Apache Zeppelin and Jupyter notebooks are two of the most popular tools used by data scientists today. In this blog post, we will s... · spark