Rahul Rathod · codeok.hashnode.dev · Jul 14, 2024
The Revolutionary Journey of Apache Spark: From Academic Roots to Industry Dominance
In the world of big data, speed and efficiency are paramount. Among the many technologies that have emerged to address these needs, Apache Spark stands out as a revolutionary force. Born from academic innovation and nurtured by a growing community, S...
Tag: apache

Pawan Kumar · devopsapk.hashnode.dev · Jul 13, 2024
Understanding Data Dependency Management in Databricks
When working with Databricks, you may often need to incorporate third-party dependencies to enhance your code's functionality. Whether you're using Databricks with Scala or Python, importing external jars or modules is essential for leveraging additi...
Tag: Azure

Chandrasekar (Chan) Rajaram · cr88.hashnode.dev · Jul 7, 2024
Secure Databricks Access to Azure Data Lake Gen2 via Service Principal and Azure Key Vault
Introduction: In the world of big data analytics, securing access to your data storage is paramount. As organizations increasingly adopt cloud-based solutions, the need for robust, scalable, and secure data access mechanisms becomes crucial. This blog...
1 like · 40 reads · Tag: Databricks

Debashis Adak · adak.hashnode.dev · Jun 29, 2024
Databricks Variant Data
The VARIANT data type is a recent introduction in Databricks (available in Databricks Runtime 15.3 and above) designed specifically for handling semi-structured data. It offers an efficient and flexible way to store and process this kind of data, whi...
Tag: big data

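The VARIANT column type described in that post stores JSON-like values whose fields can differ from row to row. As a language-agnostic sketch of the access pattern VARIANT enables, here is plain Python using only the stdlib json module; the records and field names are invented for illustration, and this is not the Databricks API itself.

```python
import json

# Two records with different shapes, as a single VARIANT column would allow
rows = [
    '{"device": {"id": 1, "temp_c": 21.5}}',
    '{"device": {"id": 2}, "alerts": ["low_battery"]}',
]

parsed = [json.loads(r) for r in rows]

# Path-style access with a default when a field is absent in a given record
temps = [r["device"].get("temp_c") for r in parsed]
print(temps)  # [21.5, None]
```

In Databricks SQL the equivalent lookup uses path expressions on the VARIANT column, with missing fields resolving to NULL rather than raising an error.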
Joubin Najmaie · datafragments.com · Jun 21, 2024
Week of June 17 2024 - Mindmap Recap
June 17, 2024: Databricks - Issues with Excel Library in Clusters. An issue was encountered with the crealytics:spark-excel library in Databricks. This Spark plugin is essential for reading and writing Excel files within Databricks. However, we observ...
Tag: Databricks

Alex Merced · alexmerced.hashnode.dev · Jun 6, 2024
Summarizing Recent Wins for Apache Iceberg Table Format
Apache Iceberg is a table format that allows groups of Parquet files in a data lake to be recognized as database tables. These tables can be easily queried using SQL with various engines or loaded into popular Python dataframe libraries such as Polar...
Tag: apache iceberg

Kiran Reddy for Databricks - PySpark · databricks-pyspark-blogs.hashnode.dev · May 26, 2024
Understanding Spark Memory Architecture: Best Practices and Tips
Spark is an in-memory processing engine where all of the computation that a task does happens in memory. So, it is important to understand Spark Memory Management. This will help us develop Spark applications and perform performance tuning. In Apache...
10 likes · Tag: spark optimizations

Vipin · vipinmp.hashnode.dev · May 25, 2024
Learn Advanced SQL with Databricks
Introduction: In this blog post, we'll explore advanced SQL techniques that are essential for complex data analysis and manipulation. By mastering these SQL features, you can write more efficient queries and gain deeper insights from your data. Table ...
Tag: SQL

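Techniques typically covered under "advanced SQL", such as common table expressions and window functions, can be tried locally before running them on Databricks. Below is a minimal sketch using Python's stdlib sqlite3 module (SQLite supports window functions from version 3.25); the sales table and its columns are invented for illustration, and Databricks SQL accepts essentially the same query with minor dialect differences.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (region TEXT, amount INTEGER);
INSERT INTO sales VALUES
  ('east', 100), ('east', 300), ('west', 200), ('west', 50);
""")

# CTE + window function: rank each sale within its region, then keep the top one
rows = conn.execute("""
WITH ranked AS (
  SELECT region, amount,
         RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS rnk
  FROM sales
)
SELECT region, amount FROM ranked WHERE rnk = 1 ORDER BY region
""").fetchall()
print(rows)  # [('east', 300), ('west', 200)]
```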
Vipin · vipinmp.hashnode.dev · May 25, 2024
Learn Intermediate SQL with Databricks
Table of Contents: Aggregate Functions (SUM, AVG, COUNT) · GROUP BY and HAVING · DISTINCT and NULL Handling · Math Functions and SQL Arithmetic · CASE Statements (Simple CASE, Searc...
31 reads · Tag: Databricks

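Of the topics in that table of contents, GROUP BY and HAVING are the easiest to demonstrate end to end: GROUP BY aggregates rows per key, and HAVING filters on the aggregate (something WHERE cannot do). A small self-contained sketch with Python's stdlib sqlite3; the orders table is invented for illustration, and the same query runs essentially unchanged on Databricks SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (customer TEXT, total INTEGER);
INSERT INTO orders VALUES
  ('alice', 40), ('alice', 60), ('bob', 10), ('bob', 15), ('carol', 90);
""")

# GROUP BY computes SUM(total) per customer; HAVING keeps only big spenders
rows = conn.execute("""
SELECT customer, SUM(total) AS spent
FROM orders
GROUP BY customer
HAVING SUM(total) >= 50
ORDER BY customer
""").fetchall()
print(rows)  # [('alice', 100), ('carol', 90)]
```

Bob's total of 25 is aggregated but then dropped by the HAVING clause, which is exactly the distinction between WHERE (filters rows before grouping) and HAVING (filters groups after aggregation).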
Vipin · vipinmp.hashnode.dev · May 9, 2024
Kickstart your Spark Data Exploration journey with Databricks
Apache Spark is an open-source framework for large-scale data processing. It's known for its speed and ability to handle various tasks, like:
- Batch data processing: Working with large datasets all at once.
- Real-time data processing: Analyzing data ...
58 reads · Tag: #apache-spark