Madhav Ganesan in madhavganesan.hashnode.dev · Apr 1, 2025 · 2 min read
Apache Pyspark
It is a fast and general-purpose distributed computing system for big data processing. It provides an in-memory computation model, which significantly improves performance over traditional disk-based processing frameworks like Hadoop MapReduce. Key F...
Madhav Ganesan in madhavganesan.hashnode.dev · Mar 27, 2025 · 2 min read
Azure Data Factory
It is a cloud-based ETL (Extract, Transform, Load) service designed for serverless data integration and transformation at scale. Key Features: Serverless Data Integration: Automates and orchestrates data workflows across various data sources. Scala...
Madhav Ganesan in madhavganesan.hashnode.dev · Mar 22, 2025 · 2 min read
Azure Data Lake Storage
Key Concepts. Data Lakehouse: It is a modern data management system that combines the benefits of data lakes and data warehouses. It enables efficient data storage, processing, and analytics in a single architecture. Delta Lake: It is a technology desig...
Madhav Ganesan in madhavganesan.hashnode.dev · Mar 18, 2025 · 3 min read
Introduction to Azure Databricks
Key Concepts. ETL (Extract, Transform, Load): It is a process used in data warehousing to: Extract data from various sources. Transform it into a format suitable for analysis. Load it into a data warehouse for storage and querying. Data Warehouse ...
Madhav Ganesan in madhavganesan.hashnode.dev · Mar 15, 2025 · 3 min read
Azure For Data Engineering
It provides a comprehensive ecosystem for data engineering, enabling organizations to build, manage, and optimize large-scale data pipelines efficiently. It offers various services tailored to data ingestion, storage, processing, and analytics. Key A...