Pankaj Tanwar

I write about System Design & Web Tech. Curious about how things work!

Mar 8, 2021

Scalability Challenge: How to remove duplicates in a large data set (~100M)?

Dealing with large datasets is often daunting. With limited computing resources, particularly memory, it can be challenging to perform even basic tasks like counting distinct elements, checking membership, filtering duplicate elements, finding the minimum, ...

pankajtanwar.hashnode.dev · 4 min read

#system-architecture #scalability #programming #design

Responses (1)

Catalin Pit

Mar 8, 2021

My head is spinning from all those numbers, haha!

Great article; well done, Pankaj Tanwar!

Thanks Catalin. Big fan of your articles. Keep up the great work ✌️
