Mar 6 · 9 min read · Stack: AWS Glue · PySpark · Step Functions · APIs · Power BI Welcome to my very first technical blog post. I'm Omer, and I love many things in life, but for now you will know only two of them: detail
Feb 19 · 5 min read · We’ve all been there: it’s 3 AM, and your data pipeline has stalled. A website you’ve been scraping for months decided to wrap their price tag in an extra <div> or rename a CSS class from product-price to item-price-v2. Your scraper, built on a house...
Feb 17 · 12 min read · When your hotel database thinks "Game Room, Deck & Yard: Chicago Home" is a hotel, you have a data quality problem. When it happens across 212 cities in 25 countries, this isn’t a travel problem; it’s
Feb 17 · 12 min read · Meta Description: Learn how to design a scalable Enterprise Knowledge Search Pipeline in 2026 using Azure AI Search, OpenAI embeddings, and Python - explained simply, without the jargon overload. Let Me Guess - Your Internal Search Is Terrible You o...
Feb 18 · 8 min read · In today’s data-driven world, managing vast volumes of data is an essential task for organizations. With data’s increasing size and complexity, effective data pipelines are crucial. Creating a smooth data pipeline, guaranteeing data quality, integrat...
Feb 6 · 1 min read · When marketing attribution does not make sense, the blame usually falls on dashboards, models, or reporting logic. In practice, most attribution issues originate much earlier in the pipeline. Incomplete customer journeys, inconsistent identifiers...
Feb 1 · 7 min read · Modern data pipelines most often fail at their beginning, not their end. A malformed record, an unexpected delimiter, or an encoding anomaly can cause otherwise robust processing engines to abort after consuming significant computational resources. T...
Jan 28 · 2 min read · As data engineering continues its rapid evolution, the tooling landscape is less about revolutionary shifts and more about the practical, sturdy refinement of existing paradigms. Over the past 12-18 months, we've witnessed dbt, Apache Airflow, and Da...
Jan 26 · 1 min read · Most pipelines are built for analytics. AI requires something different. Dashboards want aggregates. Models want raw signals. Analytics pipelines are optimized for summaries, metrics, and human consumption. They collapse information. AI Pipelines Optimi...