Constantin Lungu · datawise.dev · Oct 21, 2024 · Computing a hash aggregation in BigQuery · So I've seen Snowflake has a HASH_AGG function. When would we need it? Every time we'd like to work out if ANY value in a group (or the entire table) has changed in any way, even a single extra blank space. While BigQuery does not have it yet, we ca... · Practical BigQuery · bigquery
Constantin Lungu · datawise.dev · Oct 20, 2024 · Another look at ANY_VALUE in BigQuery · A reminder that ANY_VALUE is a pretty interesting aggregation function in BigQuery SQL. It gives you a chosen row from a group. Chosen doesn't mean random, but non-deterministic. Together with HAVING MAX | MIN you can actually control what rows get p... · Practical BigQuery · bigquery
Raghuveer Sriraman · raghuveer.me · Oct 17, 2024 · Partition and cluster an existing BigQuery table · Sometimes we create or use a non-partitioned table and later need to convert it into a partitioned one. A typical use case is old tables that start accumulating data over time. Quite often, we need the same d... · bigquery
Harvey Ducay · hddatascience.tech · Oct 16, 2024 · Cloud Storage to Bigquery: Data Warehousing and Ingestion · Introduction: In today's data-driven world, effectively managing and analyzing large datasets has become crucial for business success. Utilizing Cloud Storage and BigQuery is a fundamental skill for data engineers, analysts, and organizations looking ... · 70 reads · bigquery
Constantin Lungu · datawise.dev · Oct 15, 2024 · Why partitioning tables is not a silver bullet for BigQuery performance · I recently encountered an interesting case that reminded me of a couple of things and taught me a few lessons. When working with BigQuery tables, partitioning and clustering are often go-to operations. Typically, we would partition by a meaningful da... · Practical BigQuery · bigquery
Harvey Ducay · hddatascience.tech · Oct 3, 2024 · The $42 BigQuery Lesson: A Novice's Costly Mistake · As a data engineer, I've always prided myself on my ability to work with databases and analyze datasets efficiently. However, my recent foray into the world of big data and Google BigQuery proved that even experienced professionals can make rookie mi... · bigquery
Rahul Das · schemasensei.hashnode.dev · Sep 25, 2024 · Streaming data from Kafka to BigQuery using Apache Beam · In this guide, we will walk through the process of reading data from Kafka and storing it in BigQuery using Apache Beam. Apache Beam is a unified programming model for defining both batch and streaming data-parallel processing pipelines. Prerequisite... · kafka
Constantin Lungu · datawise.dev · Sep 23, 2024 · Using RANGE_BUCKET in BigQuery · I recently had to validate the differences between two data sources, so I figured it would be interesting to see the distribution of absolute differences between the two (how big they are and how often they happen). I've used RANGE_BUCKE... · 50 reads · Practical BigQuery · bigquery
Khadeer Khan · cloudkhanquest.hashnode.dev · Sep 21, 2024 · The Data Engineer's Guide to Lakes and Warehouses: Navigating Google Cloud Solutions · As we navigate the complexities of big data, understanding the nuances between data lakes and data warehouses becomes crucial for designing scalable, efficient, and powerful data solutions. We'll examine how these architectural elements fit into the ... · 1 like · data-engineering
Khadeer Khan · cloudkhanquest.hashnode.dev · Sep 21, 2024 · Data Engineering Essentials: Powering Insights in the Cloud Era · In today's data-centric world, the role of a data engineer has become increasingly crucial. As businesses strive to harness the power of their data, data engineers are at the forefront, constructing the pipelines that transform raw information into a... · google cloud