© 2023 Hashnode
#data-lake
Stephanie Astono Salim, CC BY-SA 4.0 https://creativecommons.org/licenses/by-sa/4.0, via Wikimedia Commons. Introduction I write this article from the perspective of a backend software developer (APIs & microservices). I found that in many domains, some sort …
Since release v0.7.1, DuckDB can repartition data stored in S3 as Parquet files with a simple SQL query, which enables some interesting use cases. Why not use existing AWS services? If yo…
Abstract Amazon S3 is an object store that provides scalability to store any amount of data, and customers leverage S3 to build a data lake. Being an object store, S3 has limitations when it comes to …
Introduction In our previous article, part 2 of the series, we walked through the extraction, processing, and creation of some data mart, using the New York City taxi trip data which is publicly avail…
Introduction In part 1 of this article series, we walked through how to feed a Data Lake built on top of Amazon S3, based on streaming data, using Amazon Kinesis. In part 2, we will cover all of the s…
What is a Data Lake? A "Data Lake" is a centralised storage system used to store all the unprocessed data that is ingested from various sources. It can scale up to accommodate storing all of the …
Every act of conscious learning requires the willingness to suffer an injury to one's self-esteem. That is why young children, before they are aware of their own self-importance, learn so easily. Thoma…
When we scroll through these sites hoping to find something we need to buy (say, a shirt), we add it to the cart, or we just save it for later. Within a few moments, we begin to see adverti…
A data lake is a central repository that allows you to store all your structured and unstructured data at any scale. You can store your data as-is, without having to first structure the data, and run …
Last time, we set up Jupyter in EC2 and Apache Spark with a Delta Lake connection to S3. This time, we will import data from the dataset and query it with SQL. About Dataset For this experiment, we will …