navinkumarnotes123.hashnode.dev · 11 hours ago · How to decide bucket count in hive · Steps. Calculate Expected Bucket Size: Divide the table size by the block size on Hadoop to get an initial estimate. Expected Bucket Size = Table Size / Block Size on Hadoop. Find the Nearest Power of 2: Take the base-2 logarithm of the ini... · hive
Kiran Reddy · bhanukiranreddy.hashnode.dev · Mar 14, 2024 · Mastering Window Functions in SQL Server · Understanding Window Functions in SQL Server. In the realm of database operations, while regular aggregate functions dominate by working on entire tables through the GROUP BY clause, a less-explored yet highly potent tool exists: the window functions. ... · #windowfunction
Saurav Raj · rajsaurav.hashnode.dev · Mar 14, 2024 · Tokyo Olympics Data Engineering and Analysis using Microsoft Azure · This project deals with the Tokyo Olympics 2021 dataset. It involves understanding the data architecture, creating the ETL pipeline, and finally analysing the data. The project is based on Darshil Parmar's video on YouTube. This contains the det... · Data Science
Kiran Reddy for Databricks - PySpark · databricks-pyspark-blogs.hashnode.dev · Mar 13, 2024 · Mount Points in Databricks · What is DBFS? DBFS stands for Databricks File System. It's a distributed file system that's part of the Databricks Unified Data Analytics Platform. DBFS provides a scalable and reliable way to store data across various Databricks clusters. It's desig... · 10 likes · mountpoint
Anuj Syal · anujsyal.com · Feb 26, 2024 · How I Passed Databricks Data Engineer Professional Exam · Introduction. You've aced the Databricks Data Engineer Associate Exam. Congrats! Or maybe you're just curious because you heard of someone becoming a Certified Data Engineering Professional. Guess it's time to set your data engineering career in motion. But... · 68 reads · Databricks
Daniel Odanai.hashnode.dev · Feb 17, 2024 · Building Data Pipelines: Ingesting Data on AWS · In today's Data Engineering exploration, let's dive into creating data pipelines on AWS. Our focus will be on the ingestion-to-buffer (Kinesis) phase. The data used for this project is from the UCI Machine Learning Repository. Overview. Key Steps. Lambda ... · dataengineering
Constantin Lungu · datawise.dev · Feb 16, 2024 · Watch out when using SAFE_CAST in BigQuery · Here's an interesting situation I've seen with BigQuery. Say a source system provides JSON events with timestamps at microsecond grain (6 decimals, so something like 2024-01-01 14:00:00.123456). This is cast using SAFE_CAST into a proper TIMESTAM... · 27 reads · Practical BigQuery · bigquery
Sweta_Sarangi · swetasarangi.hashnode.dev · Feb 16, 2024 · Roles in data · Crafting a narrative with data is a journey that typically doesn't initiate with oneself. The data's origin is crucial, and the effort to make it usable often extends beyond individual capabilities, especially within an enterprise context. Contempora... · Data Science
__thatpyjamagirl · engineereddata.hashnode.dev · Feb 9, 2024 · Freelancing with Data · For the first time in my career, I am freelancing for a small startup. Documenting this journey as I go along. It's a small company trying to create a community of gamers and game developers and make a fortune by increasing game engagement. Where do I... · Data Science
Sai Srirampur for PeerDB Blog · blog.peerdb.io · Feb 5, 2024 · Reducing BigQuery Costs by 260x · In this blog post, we'll do a deep-dive into a simple trick that can reduce BigQuery costs by orders of magnitude. Specifically, we'll explore how clustering (similar to indexing in the BigQuery world) large tables can significantly impact costs. We will... · 12 likes · 11.1K reads · bigquery