Raghuveer Sriraman · raghuveer.me · Nov 4, 2024
A Practical Introduction to Google Cloud Dataform
Dataform is a tool that creates data pipelines using SQL. If you’re familiar with dbt, Dataform is probably best understood as a dbt-esque tool that integrates really well with BigQuery and other Google Cloud products. In a short amount of time, it’s q...
Practical guide to build data pipelines with Dataform · dataform

Constantin Lungu · datawise.dev · Nov 1, 2024
Sometimes, you have to use subqueries!
Query without FROM clause cannot have a WHERE clause, goes the old SQL adage. So I had this interesting problem the other day. Let's say an order has three boolean flags, each indicating whether a particular error has occurred during its lifetime. Ou...
Practical SQL · bigquery

Constantin Lungu · datawise.dev · Oct 27, 2024
Not all NULLS are the same
So NULLs are definitely beasts of their own and as Data Engineers we come to learn to take them into account. That is because not knowing their quirks can lead to unexpected results or errors. Let's look at how not all NULLS are the same in BigQuery ...
Practical SQL · bigquery

Constantin Lungu · datawise.dev · Oct 27, 2024
Extracting keys from JSON in BigQuery
A couple of months ago, I posted about dynamically extracting key-value pairs from JSON in BigQuery SQL, which leveraged regex (check comments). Shortly after that post, we got a new built-in function to dynamically extract the keys occurring...
Practical SQL · bigquery

Constantin Lungu · datawise.dev · Oct 21, 2024
Computing a hash aggregation in BigQuery
So I've seen Snowflake has a HASH_AGG function. When would we need it? Every time we'd like to work out if ANY value in a group (or the entire table) has changed in any way, even a single extra blank space. While BigQuery does not have it yet, we ca...
Practical SQL · bigquery

Constantin Lungu · datawise.dev · Oct 20, 2024
Another look at ANY_VALUE in BigQuery
A reminder that ANY_VALUE is a pretty interesting aggregation function in BigQuery SQL. It gives you a chosen row from a group. Chosen doesn't mean random, but non-deterministic. Together with HAVING MAX | MIN you can actually control what rows get p...
Practical SQL · bigquery

Raghuveer Sriraman · raghuveer.me · Oct 17, 2024
Partition and cluster an existing BigQuery table
Sometimes we create or use a table whose data is not partitioned, and later need to convert it into a partitioned table. A typical use case is old tables that start to accumulate data over time. Quite often, we need the same d...
bigquery

Harvey Ducay · hddatascience.tech · Oct 16, 2024
Cloud Storage to Bigquery: Data Warehousing and Ingestion
Introduction: In today's data-driven world, effectively managing and analyzing large datasets has become crucial for business success. Utilizing Cloud Storage and BigQuery is a fundamental skill for data engineers, analysts, and organizations looking ...
bigquery

Constantin Lungu · datawise.dev · Oct 15, 2024
Why partitioning tables is not a silver bullet for BigQuery performance
I recently encountered an interesting case that reminded me of a couple of things and taught me a few lessons. When working with BigQuery tables, partitioning and clustering are often go-to operations. Typically, we would partition by a meaningful da...
Practical SQL · bigquery

Harvey Ducay · hddatascience.tech · Oct 3, 2024
The $42 BigQuery Lesson: A Novice's Costly Mistake
As a data engineer, I've always prided myself on my ability to work with databases and analyze datasets efficiently. However, my recent foray into the world of big data and Google BigQuery proved that even experienced professionals can make rookie mi...
bigquery