© 2023 Hashnode
#dataops
Introduction Apache Airflow is an open-source platform for authoring, scheduling, and monitoring data and computing workflows. It was developed by Airbnb and is now under the Apache Software Foundation. It uses Python to create workflows tha…
Hello. Nowadays it is very common to have a data environment with several solutions. In this post on Medium, I will summarize how to implement a data layer to unify access safely weighing on better g…
Introduction Technically speaking, event streaming is the practice of capturing data in real time from event sources like databases, sensors, mobile devices, cloud services, and software applications in the form of streams of events; stor…
Background Three key concepts of maintaining data engineering projects are: Versioning: You must always keep a history of the lineage of the data sources and data models that you use. Testing: Like every other aspect of software developme…
Introduction Big data workloads are processed using Apache Spark, an open-source distributed processing engine. It uses efficient query execution and in-memory caching for quick analytic queries against any size of data. It offers code reus…
Background The volume of data that is currently collected and processed is perhaps the most spectacular effect of the digital revolution. According to IDC, we generated approximately two zettabytes (ZB) of digital information in 2010, world…
Today, data has become a crucial asset for organizations across all kinds of industries. Industry after industry, from retail to e-commerce to manufacturing to accounting to insurance …
Introduction Data engineering is the process of designing, building, maintaining, and running systems and infrastructure for storing, processing, and analyzing large, complex datasets. It is a field that has recently become much more import…