Gabriela Caldas for Byte-sized Journey (byte-sizedjourneys.hashnode.dev) · 11 hours ago
Designing Data Pipelines for Success: Best Practices for Scalability and Data Quality
In today’s world, businesses need accurate and readily available insights to stay competitive. Data engineering plays a crucial role in creating the infrastructure that makes this possible. From building efficient data pipelines to ensuring data qual...
Tag: data-engineering
Kay Sauter (kaysauter.hashnode.dev) · Nov 19, 2024
Loading files automatically to bronze lakehouse
In my last post on LinkedIn, I explained how to export AdventureWorks2022 tables to csv files. If you don’t want to generate them, you can get them here, as stated in the last post (after my edit in which I realized I made a mistake). My actual blog ...
Tag: data-engineering
Nhlahla Sibiya (dataandinsight.hashnode.dev) · Nov 17, 2024
Building a Well-Organized Grocery Store: A Metaphor for Your Data Warehouse
In today’s data-driven world, businesses rely on well-structured data warehouses to make informed decisions. But building a data warehouse can seem daunting, so let’s simplify things with a familiar metaphor: a well-organized grocery store. By compar...
Tag: Data Science
Ankit Raj (ankitraj19.hashnode.dev) · Nov 16, 2024
From Basics to Brilliance: My Week 1 Journey with PostgreSQL for Data Engineering
Introduction: Background. As a data engineer working full-time in the industry, I’m always looking for ways to enhance my skills and stay ahead of the curve with the latest technologies. Recently, I decided to focus more on data engineering and dive d...
Tag: data-engineering
Anix Lynch (anixblog.hashnode.dev) · Nov 16, 2024
SQLite Northwind 1
# Create Database
Step 1: Create a New Folder. Create a folder named sqlite_db in drive E (or any preferred location on your computer). This folder will store your database files.
Step 2: Create a New Python Script. Open a new Python script and save it as create_dat...
Tag: SQLite
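The two steps in the teaser can be sketched in a few lines of Python. This is a minimal illustration, not the post's actual script: the database and table names below are hypothetical, and the folder is created relative to the working directory rather than on drive E.

```python
import os
import sqlite3

# Step 1: create the folder that will hold the database files.
os.makedirs("sqlite_db", exist_ok=True)

# Step 2: connecting creates the database file if it does not exist yet.
# "northwind.db" and the table below are illustrative names.
conn = sqlite3.connect("sqlite_db/northwind.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS customers (id INTEGER PRIMARY KEY, name TEXT)"
)
conn.commit()
conn.close()
```

Running the script once is enough; `sqlite3.connect` is idempotent here, and `CREATE TABLE IF NOT EXISTS` keeps re-runs from failing.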
Alex Merced (alexmerced.hashnode.dev) · Nov 15, 2024
Deep Dive into Dremio's File-based Auto Ingestion into Apache Iceberg Tables
Blog: What is a Data Lakehouse and a Table Format? · Free Copy of Apache Iceberg the Definitive Guide · Free Apache Iceberg Crash Course · Lakehouse Catalog Course · Iceberg Lakehouse Engineering Video Playlist
Manually orchestrating data pipelines to hand...
Tag: apacheiceberg
Farbod Ahmadian for DataChef's Blog (blog.datachef.co) · Nov 14, 2024
Sparkle: Accelerating Data Engineering with DataChef’s Meta-Framework
Sparkle is revolutionizing the way data engineers build, deploy, and maintain data products. Built on top of Apache Spark, Sparkle is designed by DataChef to streamline workflows and create a seamless experience from development to deployment. Our go...
Tag: spark
Christos Georgoulis (georgoulis.tech) · Nov 14, 2024
Data Platform vs Data Product
In previous discussions I had with colleagues about gathering requirements for a Data Product, I happened to get lost between what should be considered a Data Product requirement and what should be considered a Data Platform requiremen...
Tag: Data platform
Arun R Nair (arunrnair.hashnode.dev) · Nov 13, 2024
Install PySpark in Google Colab with Github Integration
Pre-requisites: Colab account (https://colab.research.google.com/), Github account (https://github.com/)
Introduction: Google Colab is an excellent environment for learning and practicing data processing and big data tools like Apache Spark. For beginn...
Tag: colab
Harvey Ducay (hddatascience.tech) · Nov 13, 2024
What is ETL in Data Engineering?
So imagine you're making a smoothie - that's basically what ETL is in the data world. First, you Extract all your ingredients (data) from different places, like grabbing berries from the fridge, bananas from the counter, and yogurt from the store - j...
Tag: etl-pipeline
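The smoothie metaphor maps cleanly onto a toy ETL run: extract pulls rows from several sources, transform normalizes them, and load writes them to a destination. This is a minimal sketch of the general pattern, not the post's own code; every name below is illustrative.

```python
def extract():
    # Pull "ingredients" from three separate sources (fridge, counter, store).
    fridge = [{"item": "berries", "qty": 2}]
    counter = [{"item": "bananas", "qty": 1}]
    store = [{"item": "yogurt", "qty": 1}]
    return fridge + counter + store

def transform(rows):
    # Normalize: uppercase the names and keep only the fields we need.
    return [(r["item"].upper(), r["qty"]) for r in rows]

def load(rows, destination):
    # Write the transformed rows to the destination (here, a plain list).
    destination.extend(rows)

smoothie = []
load(transform(extract()), smoothie)
print(smoothie)  # [('BERRIES', 2), ('BANANAS', 1), ('YOGURT', 1)]
```

Real pipelines swap the lists for databases, APIs, or files, but the three-stage shape stays the same.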