Kushneet Kaur in cloudnativebykushneet.hashnode.dev · Jun 20 · 5 min read
What is a Lakehouse?
This article is part of the Databricks from Scratch series. Start from the beginning: "Stop Optimising Your Prompts. Fix Your Data Pipelines." Picture this. It's IPL ticket booking day. 10 AM. 1 crore...

Prithwish Nath in prithwish-nath.hashnode.dev · Jun 18 · 20 min read
Building a Local SERP Analytics Pipeline with dbt Core and DuckDB
TL;DR: This pipeline uses dbt Core + DuckDB locally — no infrastructure — to normalize domains, deduplicate URLs, enforce data contracts via tests, and materialize four analyst-ready mart tables from...

Yoosuf Ahamed in yuusvision.hashnode.dev · Jun 14 · 7 min read
The 80% Reality of Data Science: Why Data Cleaning Dominates Professional Workflows
The Dirty Secret of Data Science
If you’re new to machine learning, you might imagine a data scientist’s day spent optimizing hyperparameters, launching neural networks, and celebrating high model acc...

Maulana Akbar Dwijaya in maulcenter.hashnode.dev · May 27 · 8 min read
How Much Does It Cost to Build a Data Warehouse?
Introduction
Modern organizations generate massive amounts of data every day from ERP systems, CRM platforms, financial applications, websites, mobile apps, IoT devices, and business operations. While...

Nikhil Tale in nikhilbuilds.hashnode.dev · May 23 · 5 min read
Optimize BigQuery Storage Costs: Making the Right Billing Choice
BigQuery is one of the central data warehouses we use in our organization. It is Google's serverless, highly scalable columnar data warehouse built for analytical workloads. In our team we follow multit...

Saeed Felegari in softarch.hashnode.dev · May 5 · 5 min read
Big Data Architecture Isn’t About Volume — It’s About Decisions at Scale
When people hear “big data,” they often think in terms of size. Terabytes. Petabytes. Streaming pipelines. Distributed clusters. But at enterprise scale, the real challenge isn’t storing or processing...

Jasper.B Stewart in aitechy.hashnode.dev · Apr 28 · 2 min read
Exploring the Architecture of AI-Driven Sentiment Analysis
As organizations turn to data-driven solutions, the architecture behind AI-Driven Sentiment Analysis becomes crucial. This article delves into the intricate components that facilitate sentiment extraction from user-generated content. Employing this...

Ankit Raj in ankit-data-engineering.hashnode.dev · Apr 25 · 3 min read
7 Skills That Make Data Engineering Feel Less Hard
When I started working in data engineering, nothing felt simple. Tasks that looked straightforward took longer than expected, pipelines broke without clear reasons, and debugging often felt like guess...

Madhusmita Talukdar in giiki.hashnode.dev · Apr 24 · 4 min read
Stop Ignoring Data Pipelines: ETL vs ELT Explained Using a Real ML Workflow
Most of us love building machine learning models. We tune hyperparameters, try different algorithms, and chase better accuracy. But there’s one part we quietly ignore: how the data actually gets to th...

Abstract Algorithms in abstractalgorithms.dev · Apr 19 · 37 min read
Spark Executor Sizing: Memory Model, Core Tuning, and GC Strategy
TLDR: Spark executor OOMs are almost never caused by insufficient total cluster RAM — they are caused by misallocating memory across five distinct JVM regions while ignoring GC behavior and memoryOver...