© 2023 Hashnode
#data-engineering
Image Source: Data Lakehouse – Databricks. One of the foundational papers, Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics (cidrdb.org), coined the idea of the Lakehouse, which explores the opportu…
Every act of conscious learning requires the willingness to suffer an injury to one's self-esteem. That is why young children, before they are aware of their own self-importance, learn so easily. Thoma…
EXECUTIVE SUMMARY: Data engineers build pipelines that help companies collect, merge, and transform data to facilitate seamless analytics. They oversee the creation of an infrastructure design that enables modern data analytics. A data engin…
Spark Streaming: Processing Big Data in Real-Time. Big data processing has become an essential aspect of modern data management and analysis. With the growth of connected devices and the Internet of Things (IoT), organizations are faced with…
Automate, customize, and execute your software development workflows right in your repository with GitHub Actions. You can discover, create, and share actions to perform any job you'd like, including …
ETL (Extract, Transform, Load) and ELT (Extract, Load, Transform) are both data integration processes used to move and transform data from one system to another. The main difference between ETL and ELT is the order in which the data is proc…
For anyone pursuing a career in data, and in data engineering in particular, as well as in many other tech-related fields, Python is a powerful tool. As you will be forging ahead in your …
Big data is one of the most significant challenges facing organizations today, as they try to make sense of vast amounts of data from various sources. To address this challenge, SAP has developed two …
According to The Data Warehouse Toolkit by Kimball, "The grain must be declared before choosing dimensions or facts because every candidate dimension or fact must be consistent with the grain." Some examples: The sales table has a grain o…
Celery Executor: Celery is used for running distributed asynchronous Python tasks. The Celery Executor has been a part of Airflow for a long time, even before Kubernetes. With Celery Executors, you must set a specific number of worker ins…