Checking object existence in large AWS S3 buckets using Python and PySpark
Introduction
In a recent project, I needed to check whether data from a third-party database corresponded to the documents in an S3 bucket. While this might seem like a straightforward task, the dataset was massive - up to 10 millio...
gorskibartosz.pl · 5 min read