madhavganesan.hashnode.dev
Apache Pyspark
It is a fast and general-purpose distributed computing system for big data processing. It provides an in-memory computation model, which significantly improves performance over traditional disk-based processing frameworks like Hadoop MapReduce. Key F...
Apr 1, 2025 · 2 min read

madhavganesan.hashnode.dev
Azure Data Factory
It is a cloud-based ETL (Extract, Transform, Load) service designed for serverless data integration and transformation at scale. Key Features: Serverless Data Integration: Automates and orchestrates data workflows across various data sources. Scala...
Mar 27, 2025 · 2 min read

madhavganesan.hashnode.dev
Azure Data Lake Storage
Key Concepts. Data Lakehouse: It is a modern data management system that combines the benefits of data lakes and data warehouses. It enables efficient data storage, processing, and analytics in a single architecture. Delta Lake: It is a technology desig...
Mar 22, 2025 · 2 min read

madhavganesan.hashnode.dev
Introduction to Azure Databricks
Key Concepts. ETL (Extract, Transform, Load): It is a process used in data warehousing to extract data from various sources, transform it into a format suitable for analysis, and load it into a data warehouse for storage and querying. Data Warehouse ...
Mar 18, 2025 · 3 min read

madhavganesan.hashnode.dev
Azure For Data Engineering
It provides a comprehensive ecosystem for data engineering, enabling organizations to build, manage, and optimize large-scale data pipelines efficiently. It offers various services tailored to data ingestion, storage, processing, and analytics. Key A...
Mar 15, 2025 · 3 min read