Apache Hudi: A Deep Dive with Python Code Examples
In today's data-driven world, real-time data processing and analytics have become crucial for businesses to stay competitive. Apache Hudi (Hadoop Upserts Deletes and Incrementals) is an open-source data management framework that provides efficient data ingest...
blog.harshdaiya.com · Jun 8, 2024 · 5 min read

Exploring Large Language Models (LLMs) with Python: A Comprehensive Guide
Introduction: Large Language Models (LLMs) have revolutionized the field of Natural Language Processing (NLP). These models, such as GPT-4, are designed to understand and generate human-like text. In this post, we will delve into how to work with LLMs...
blog.harshdaiya.com · Feb 15, 2024 · 6 min read

Implementing Real-Time Credit Card Fraud Detection with Apache Flink on AWS
Credit card fraud is a significant concern for financial institutions, as it can lead to considerable monetary losses and damage customer trust. Real-time fraud detection systems are essential for identifying and preventing fraudulent transactions as...
blog.harshdaiya.com · Jan 5, 2024 · 4 min read

Managing keys & environment variables in a Python pipeline/app
In a production ETL (extract, transform, load) pipeline, it is often helpful to use environment variables to store sensitive information, such as database credentials or API keys. This allows you to keep this sensitive information separate from yo...
blog.harshdaiya.com · Oct 31, 2023 · 3 min read