Sachin Nandanwar · www.azureguru.net · Sep 13, 2024 · Copy Data from Cosmos DB to Microsoft Fabric Lakehouse · Cosmos DB offers a robust set of REST APIs that work well for general data operations. I wondered if it would be possible to push data from Cosmos DB into a Fabric Lakehouse. In Microsoft Fabric Data Factory I came across a Cosmos DB connector for Fa... · CosmosDB
Shreyash Bante · shreyash27.hashnode.dev · Sep 9, 2024 · ETL Process: A Beginner’s Guide 3 · LOAD ⭐ Well, so far we have extracted the data from the source and transformed it. How difficult can it be to just push the data to a location, right? Well, it's different from just pushing the final DataFrame to a location. How you load data depends on the req... · Data Science
Fritz Larco for Sling Data Blog · blog.slingdata.io · Sep 7, 2024 · Efficient Data Lake Management with Sling and Delta Lake · Unlocking Delta Lake Insights with Sling: Efficient Read-Only Access. In the ever-evolving landscape of big data, Delta Lake has emerged as a powerful open-source storage layer that brings reliability and performance to data lakes. Today, we're thrill... · 40 reads · data-engineering
Rahul Das · schemasensei.hashnode.dev · Aug 31, 2024 · Getting Started with PySpark · Apache Spark is a powerful distributed computing framework commonly used for big data processing, ETL (Extract, Transform, Load), and building machine learning pipelines. It supports various programming languages, including Scala, Java, and Python, m... · 2 likes · 30 reads · spark
Fritz Larco for Sling Data Blog · blog.slingdata.io · Aug 28, 2024 · Reading Apache Iceberg Data with Sling · We're excited to announce that Sling now supports reading the Apache Iceberg format, bringing enhanced data lake management capabilities to our users. This addition opens up new possibilities for efficient and flexible data handling in large-scale en... · 118 reads · apacheiceberg
Dorian Sabitov · sfappsinfo.hashnode.dev · Aug 22, 2024 · Sliced Bread ETL on Salesforce Review · Introduction: ETL Tools and Why You Should Use Them. How do you make your main business decisions? What sources do you rely on? I assume that a major focus for owners and stakeholders is data. This information could encompass everything from client re... · blog
Kumar Rohit · krohit-de.hashnode.dev · Aug 15, 2024 · Hello Spark on Minikube · Minikube is a beginner-friendly tool that lets you run a Kubernetes cluster on your local machine, making it easy to start learning and experimenting with Kubernetes without needing a complex setup. It creates a single-node cluster inside a virtual m... · 41 reads · Experiments on Minikube 🚀 · sparksql
Sai Srirampur for PeerDB Blog · blog.peerdb.io · Aug 14, 2024 · Enhancing Postgres to ClickHouse replication using PeerDB · Providing a fast and simple way to replicate data from Postgres to ClickHouse has been a top priority for us over the past few months. Last month, we acquired PeerDB, a company that specializes in Postgres CDC. We're actively integrating PeerDB into ... · 83 reads · PostgreSQL
Sai Srirampur for PeerDB Blog · blog.peerdb.io · Jul 30, 2024 · ClickHouse acquires PeerDB for native Postgres CDC integration · We are thrilled to join forces with ClickHouse to make it seamless for customers to move data from their Postgres databases to ClickHouse and power real-time analytics and data warehousing use cases. We released the ClickHouse target connector for Po... · 5 likes · 4.2K reads · PostgreSQL
Shreyan Das · shreyandas.hashnode.dev · Jul 19, 2024 · Need for speed, but where do I store all of this data? · Dear Diary, The most dreaded day in the life of any data engineer came to me a while ago: our TL sat us down to tell us that we had been billed an enormous amount for our cloud storage in the last quarter, and we needed to find a way to cut down sto... · 40 likes · Confessions of a Data Engineer · data-engineering