PySpark: Read Large CSV files efficiently
Apr 10, 2024

Scenario

You have a large CSV file (100GB+ of data) with millions of records. Loading the file without optimization causes memory issues and slow performance.

Solution: Use Partitioning & Parquet for Faster Processing

Step 1: Read the Large CSV in Py...