Tag feed

#aws-glue

88 posts · 20 followers

Prakash Agrawal · prakashagrawal.hashnode.dev · Jul 24 · 7 min read

ETL Configuration with S3, Glue Studio and Athena in AWS

The AWS ETL Architecture: ETL stands for extract, transform, and load. It is the process of taking data from a source, preparing or transforming it into a useful form, and loading it into a destination

SapotaCorp · sapotacorpvn.hashnode.dev · Jul 12 · 13 min read

Data engineering for a regulated fintech: a 10-month AWS lake build

Client: Regional fintech operator (anonymized per agreement)
Timeline: 10-month engagement (6-month build + 4-month fine-tuning & warranty)
Team: 6 engineers including 2 from Sapota's data team, worki

SapotaCorp · sapotacorpvn.hashnode.dev · Jul 12 · 7 min read

Ingesting every partner's file format without rewriting code each time

A credit bureau we worked with had built its ingestion the way most platforms start: one fixed file template, loaded in batches, parsed by code written specifically for that shape. It worked right up

SapotaCorp · sapotacorpvn.hashnode.dev · Jul 12 · 8 min read

Validating Data Against Its Own History: Baseline Data Quality with GlueDQ

A row-level data quality gate is good at exactly one thing: deciding whether a single value, looked at in isolation, is acceptable. Is the credit limit a non-negative number? Is the new IC twelve digi

Hammad Tariq · thehammadtariq.hashnode.dev · Jun 23 · 3 min read

AWS Glue Cost Optimization: The 8 Traps That Inflate Your Bill

If you run AWS Glue at any real volume, you've probably had the conversation: the bill came in two or three times higher than the forecas

pratikdhande · pratikdhande.hashnode.dev · May 31 · 5 min read

Zero Downtime Migration with AWS DMS - Architecting End to End IoT Data Pipeline

I am writing this blog to share my work for the client Merck Sharp and Dohme (MSD). This is not the actual client project. This is a personal portfolio project that reflects the challenges, and money mange

Vishnu D · thedatatrench.hashnode.dev · May 29 · 9 min read

Spark Architecture Simply Explained

You have been using Spark for months -- running notebooks, submitting jobs, reading docs. But when someone asks you to explain what actually happens when a job runs, you find yourself stalling. The co

ilshaad · codeless-sync.hashnode.dev · May 28 · 11 min read

AWS Glue Alternatives: Simpler Ways to Sync API Data to RDS

AWS Glue can transform terabytes of data across S3 buckets, orchestrate complex ETL workflows, and handle schema evolution at scale. It's also one of the most over-engineered ways to get your Stripe c

Ömer Oruç ÇELİK · oorucelik.hashnode.dev · Mar 6 · 9 min read

From 3,600 to 400 API Calls: Optimizing PySpark on AWS Glue with the Yield Pattern

Stack: AWS Glue · PySpark · Step Functions · APIs · Power BI

Welcome to my very first technical blog post. I'm Omer, and I love many things in life, but for now, you will know only two of them: detail

Sameer Shukla · freecodecamp.org · Feb 5 · 70 min read

How to Optimize PySpark Jobs: Real-World Scenarios for Understanding Logical Plans

In the world of big data, performance isn't just about bigger clusters – it's about smarter code. Spark is deceptively simple to write but notoriously difficult to optimize, because what you write isn't what Spark executes. Between your transformatio...
