ByteHouse · bytehouse.hashnode.dev · Sep 6, 2023 · 7 reasons why a user would need to query Amazon S3 directly · Amazon S3 (Simple Storage Service) is a highly scalable and durable object storage service provided by Amazon Web Services (AWS). It is commonly used to store and retrieve large amounts of data, such as images, videos, log files, backups, and other u... · Amazon S3
Achar Oiro · acharoiro.hashnode.dev · Aug 26, 2023 · How to Use Airbyte to Integrate Data from Different Sources · Introduction: What is Airbyte? Airbyte is an open-source data integration platform designed to replicate data from various applications, APIs, and databases to data warehouses, data lakes, and other destinations. Offering full management and cloud-nat... · 164 reads · airbyte
Sakshi Ghosalkar · sakshighosalkar.hashnode.dev · Aug 7, 2023 · Azure Storage Account · In the data-driven world of today, cloud storage options have become essential for businesses that want to handle their data in a scalable and cost-effective way. Microsoft Azure stands out among the many cloud companies because it has a wide range o... · 1 like · Cloud
Nitin Kumar Gaur · datayaari.hashnode.dev · Jul 16, 2023 · DATA KA GHAR (House of DATA) · I was always a bit confused about data warehouses, data lakes, and databases. At first I only knew about databases, where we can store data and access it very quickly as required. Then I heard that there are data warehouses too, where we can also store data (its m... · data-warehousing
Sakshi Ghosalkar · sakshighosalkar.hashnode.dev · Jun 26, 2023 · Structured Data for Effective Analytics · Introduction: In the world of data-driven decision-making, organizations rely on efficient data management and analytics to derive insights. Data lakes and data warehouses are two essential components of modern data architecture. While data lakes sto... · 164 reads · Data Science
Tobias Müller · tobilg.com · Feb 26, 2023 · Using DuckDB to repartition parquet data in S3 · Since release v0.7.1, DuckDB has the ability to repartition data stored in S3 as parquet files by a simple SQL query, which enables some interesting use cases. Why not use existing AWS services? If your data lake lives in AWS, a natural choice for ET... · 2.3K reads · duckDB
Sneh Bhatt · mytwocents.hashnode.dev · Feb 24, 2023 · AWS concepts and ideas - ENABLING CONCURRENT WRITES ON S3 DATA LAKE · Abstract: Amazon S3 is an object store that provides scalability to store any amount of data, and customers leverage S3 to build a data lake. Being an object store, S3 has limitations when it comes to managing concurrent writes on the same data (think... · 191 reads · AWS concepts and ideas · Amazon S3
Jonathan Reis · blog.jreissup.com · Feb 23, 2023 · Implementing a Data Lakehouse Architecture in AWS — Part 3 of 4 · Introduction: In our previous article, part 2 of the series, we walked through the extraction, processing, and creation of some data mart, using the New York City taxi trip data which is publicly available to do consumption. We used some of the princi... · 40 reads · Exploring the Data Lakehouse and Its Implementation in AWS · Data-lake
Jonathan Reis · blog.jreissup.com · Feb 23, 2023 · Implementing a Data Lakehouse Architecture in AWS — Part 2 of 4 · Introduction: In part 1 of this article series, we walked through how to feed a Data Lake built on top of Amazon S3, based on streaming data, using Amazon Kinesis. In part 2, we will cover all of the steps needed to build a Data Lakehouse, using trip ... · 51 reads · Exploring the Data Lakehouse and Its Implementation in AWS · Data-lake
Sujal Maiti · sujal.hashnode.dev · Feb 7, 2023 · "Art of Managing & Working around Data: DataLake" · What is Data Lake? A centralised storage system called a "Data Lake" is used to store all the unprocessed data that is ingested from various sources. It can scale up to accommodate storing all of the enterprise's data. It can keep data of different t... · data-engineering