schemasensei.hashnode.dev

Unlocking Real-Time Data with Change Data Capture (CDC)
In this guide, we will cover CDC, its importance, and the setup of a CDC stack using Kafka, Debezium, and other services. Additionally, we will configure a PostgreSQL connector using the Confluent Control Center web UI to capture changes from a Postg...
Jan 7, 2025 · 9 min read

Streaming data from Kafka to BigQuery using Apache Beam
In this guide, we will walk through the process of reading data from Kafka and storing it in BigQuery using Apache Beam. Apache Beam is a unified programming model for defining both batch and streaming data-parallel processing pipelines. Prerequisite...
Sep 25, 2024 · 4 min read

Setting Up a Kafka Cluster on GCP VM Using Docker
In the world of real-time data processing and streaming, Apache Kafka stands as a robust and widely used platform. Setting up a Kafka cluster on a Google Cloud Platform (GCP) VM can be a crucial step in building your data processing pipeline. In this g...
Sep 14, 2024 · 5 min read

Getting Started with PySpark
Apache Spark is a powerful distributed computing framework commonly used for big data processing, ETL (Extract, Transform, Load), and building machine learning pipelines. It supports various programming languages, including Scala, Java, and Python, m...
Aug 31, 2024 · 4 min read

Setting up a Multi-Node Hadoop Cluster on Google Cloud
In this tutorial, we will walk through the process of setting up a multi-node Hadoop cluster on Google Cloud. This cluster will consist of one master node and two worker nodes. We will be using Google Cloud VM instances for this setup; this tutorial ...
Dec 4, 2023 · 5 min read