Jesper Bagge · fullouterjoin.dev · 3 hours ago
Automagic schema inference killed the data engineer
Today's lesson in working with legacy output is: data contracts, data contracts and data contracts. The great thing about shipping structured data as text files is that it is so darn easy! Just store it as a CSV and boom: minimum overhead and just the righ...
Tags: ClickHouse
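The data-contracts point above can be sketched in a few lines of Python: instead of letting a loader infer a CSV's schema, validate every row against an explicit contract. The column names and types here are hypothetical, not taken from the post.

```python
import csv
import io

# Hypothetical data contract: explicit column names and parsers,
# instead of letting the loader guess types from the file.
CONTRACT = {
    "id": int,
    "name": str,
    "amount": float,
}

def validate(csv_text):
    """Parse CSV text, enforcing the contract; raise on any violation."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if set(reader.fieldnames) != set(CONTRACT):
        raise ValueError(f"header mismatch: {reader.fieldnames}")
    rows = []
    # Data starts on line 2 of the file (line 1 is the header).
    for lineno, row in enumerate(reader, start=2):
        try:
            rows.append({col: parse(row[col]) for col, parse in CONTRACT.items()})
        except ValueError as exc:
            raise ValueError(f"line {lineno}: {exc}") from None
    return rows

rows = validate("id,name,amount\n1,widget,9.99\n")
```

A bad value (say, `amount` of `"oops"`) fails loudly with a line number, rather than being silently coerced by whatever schema a reader guessed.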
navinkumar · notes123.hashnode.dev · 5 hours ago
Hive Partition with Bucket
In Hive, partitioning and bucketing are two techniques used for organizing and optimizing data storage and querying. Example Scenario: Consider a dataset containing drug sales details, and it contains 6 pid, pname, drug, ge...
Tags: hive
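As a rough illustration of the bucketing half of the post above: partitioning puts each distinct column value in its own directory, while bucketing hashes rows into a fixed number of files. The sketch below mimics bucket assignment in plain Python; Hive uses its own hash function, so a simple modulo on an integer key merely stands in for it, and the sample pid values are made up.

```python
# Simplified stand-in for Hive bucketing: assign each row to one of
# NUM_BUCKETS buckets by hashing the bucket column (here: pid).
NUM_BUCKETS = 4

def bucket_for(pid):
    # Hive computes hash(column) % numBuckets; for integer keys a
    # plain modulo shows the same row-to-bucket behaviour.
    return pid % NUM_BUCKETS

sales = [
    {"pid": 1, "drug": "aspirin"},
    {"pid": 2, "drug": "ibuprofen"},
    {"pid": 5, "drug": "aspirin"},
]

buckets = {}
for row in sales:
    buckets.setdefault(bucket_for(row["pid"]), []).append(row)

# Rows with pid 1 and pid 5 share bucket 1, since 1 % 4 == 5 % 4.
```

The payoff is the same as in Hive: equal keys always land in the same bucket, so joins and sampling on the bucketed column can skip most of the data.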
Kiran Reddy for Databricks - PySpark · databricks-pyspark-blogs.hashnode.dev · Mar 18, 2024
Unlocking Data Potential: Introducing Databricks Unity Catalog
In today's data-driven world, managing vast amounts of information efficiently is crucial for businesses to thrive. Databricks, a leading provider of unified analytics platforms, continues to innovate in this space with its groundbreaking tool: Datab...
Tags: Databricks, UnityCatalog
Surendra Tamang · jyaba.hashnode.dev · Mar 16, 2024
How do I future-proof my career as a Data Engineer?
To future-proof your career as a Data Engineer, consider the following strategies inspired by the discussions and insights shared below: View AI as a Force Multiplier: Rather than seeing AI as a threat, think of it as a tool that enhances your capa...
Tags: data-engineering
Kinyanjui Karanja · overflow.hashnode.dev · Mar 12, 2024
Loading, Transforming, and Saving GitHub Archive Data with PySpark
Introduction: GitHub Archive provides a wealth of data capturing various activities on the GitHub platform, such as repository creation, issues opened, and pull requests made. In this blog post, we'll explore how to use PySpark, a powerful analytics ...
Tags: PySpark
Madukoma Blessed · mblessed.hashnode.dev · Mar 12, 2024
Beyond SELECT: Exploring SQL Commands - Part 2
Welcome to Part 2 of our SQL exploration, where we're taking a deeper dive into the world of data operations. Unlike Part 1, which laid the groundwork by introducing SQL fundamentals, this segment delves into practical applications beyond simple SELE...
Tags: data-engineering
Nikhil Thomas for Snow Forged Machine Minds (SFM2) · sfm2.ai · Mar 12, 2024
Data Engineering Foundations: A Practical Introduction to Snowflake, Fivetran, and dbt
Welcome to our exciting journey through the world of data integration and transformation! Today, we're diving into a practical project that involves a trio of modern data tools: Snowflake, Fivetran, and dbt. Whether you're a data enthusiast, a profes...
Tags: fivetran
Alex Merced · techblog.alexmerced.com · Mar 9, 2024
5 Reasons Dremio Is the Ideal Apache Iceberg Lakehouse Platform
The Apache Iceberg table format has seen an impressive expansion in its compatibility with a vast spectrum of data platforms and tools. Among these, Dremio stands out as a pioneer, having embraced Apache Iceberg early on. In this article, we delve in...
Tags: data-engineering
Isaac Oteng · isaacoteng.hashnode.dev · Mar 7, 2024
Data Ingestion Using AWS Services
Data ingestion using AWS Services, Part 2: querying AWS S3 data from AWS Athena using SQL. AWS Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. In this second part of the tutorial, we are going to...
Tags: AWS
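A hedged sketch of what querying S3 data through Athena can look like from Python: the SQL is standard, and the submission goes through boto3's Athena client. The database, table, and S3 locations below are made-up placeholders, and the actual call is left commented out because it requires AWS credentials.

```python
# Hypothetical Glue database/table and S3 output location -- placeholders only.
SQL = (
    "SELECT product, SUM(amount) AS total "
    "FROM sales_db.sales "
    "GROUP BY product"
)
RESULTS = "s3://example-athena-results/"  # assumption, not from the post

def start_query(sql, output_location):
    """Submit a query to Athena and return its execution id."""
    import boto3  # AWS SDK for Python; needs configured credentials
    athena = boto3.client("athena")
    resp = athena.start_query_execution(
        QueryString=sql,
        ResultConfiguration={"OutputLocation": output_location},
    )
    return resp["QueryExecutionId"]

# start_query(SQL, RESULTS)  # would run the SQL against data in S3
```

Athena writes the result set to the given S3 output location, from which it can be fetched once the query execution reaches the SUCCEEDED state.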
Isaac Oteng · isaacoteng.hashnode.dev · Mar 7, 2024
Data Ingestion Using AWS Services, Part 1
Data ingestion using AWS Services, Part 1. Data ingestion is the process of collecting, importing, and transferring raw data from various sources to a storage or processing system where it can be further analyzed, transformed, and used for various pur...
Tags: AWS