© 2022 Hashnode
#data-engineering
Data wrangling, or data munging, is a process that involves data exploration, transformation, validation, and making data available for credible and meaningful analysis. Structuring data: The task include…
Data Pipelines - Performance Threats: scalability in the face of increasing datasets and workloads, application failures, scheduled jobs not functioning accurately, and tool incompatibilities. Data Pipelines…
V's of Big Data: Velocity: Velocity is the speed at which data accumulates. Data is being generated extremely fast, in a process that never stops. Near or real-time streaming, local, and cloud-base…
Introduction Danny seriously loves Japanese food so, at the beginning of 2021, he decides to embark upon a risky venture and opens up a cute little restaurant that sells his 3 favourite foods: sushi, …
Architecting the Data Platform: Layers of a data platform architecture. A layer represents functional components that perform a specific set of tasks in the data platform. Data Ingestion or Data Collection la…
ETL has historically been used for large-scale batch workloads. It is now being used for real-time streaming as well. Popular ETL tools: IBM InfoSphere Information Server, AWS Glue, Improvado, Sky…
I am starting a series of posts on Data Engineering. My information is based on the course that I completed on Coursera, the "Data Engineering Professional Certificate" Specialization offered by IB…
Data warehouses give businesses the ability to consolidate data from various sources, such as transactional systems, operational databases, and flat files. Through data integration, bad data removal, …
Problem: Lately, I have been working on some ETL projects, and during the transform stage I came across the error "AttributeError: 'float' object has no attribute 'rint'", combined with "TypeError: loop…
Overview: What is a Data Structure? It's a way, or method, of storing data in your computer, organized in some fashion so that it can be accessed, queried, and updated quickly and easily. What is an Algo…