bijudevassy.hashnode.dev
Understanding Spark Job Execution Flow
From Code Submission to Final Result
When you run a Spark job, it may look simple from the outside: You write a query. You click run. Results appear. But behind the scenes, Spark orchestrates a carefull...
1d ago · 6 min read
bijudevassy.hashnode.dev
CLUSTERED BY vs ZORDER BY in Databricks
Understanding Data Distribution vs Storage Optimization
In large-scale Spark workloads, performance problems often lead to one of two suggestions: “Let’s bucket the table.” “Let’s ZORDER it.” Both...
1d ago · 5 min read
bijudevassy.hashnode.dev
Spark Cluster Sizing in Databricks
Decision Framework, Bottleneck Mapping & Real Fixes
When Spark jobs slow down, most teams react the same way: “Increase the cluster.” Sometimes that works. Most of the time, it just increases cost. C...
1d ago · 6 min read
bijudevassy.hashnode.dev
Data Modeling Patterns in MongoDB Atlas
When people move from relational databases to MongoDB, the biggest shift isn’t syntax; it’s mindset. You stop thinking in tables and joins. You start thinking in documents, access patterns, and how y...
2d ago · 4 min read
bijudevassy.hashnode.dev
CAP Theorem in Azure Cosmos DB: Choosing What Matters When the Network Breaks
The Reality of Distributed Systems
Distributed systems look clean on architecture diagrams: multiple regions, global users, low latency everywhere. In reality: networks fail, regions get isolated...
2d ago · 4 min read