Ekemini Thompson · ekeminithompson.hashnode.dev · Sep 7, 2024
Data Engineering Task 2: Understanding ETL Pipelines
For your next task as an aspiring data engineer, your challenge is to write an article titled "Understanding ETL Pipelines: Extract, Transform, Load in Data Engineering." This article should introduce readers to the ETL process, explaining its signif...
etl-pipeline
Shreyash Bante · shreyash27.hashnode.dev · Sep 4, 2024
ETL Process: A Beginner’s Guide 2
Transform ⭐ The transform phase in the ETL (Extract, Transform, Load) process is where raw data is refined, organized, and prepared for analysis. This step is crucial because the data extracted from various sources often comes in different formats, w...
Data Science
Ekemini Thompson · ekeminithompson.hashnode.dev · Aug 30, 2024
Transitioning into Tech as a Data Engineer
Data Engineering Writing Task: TASK 1 Open Accounts: Create accounts on Medium.com, LinkedIn.com, GitHub.com, Dev.to, Twitter.com, and Academia.edu. Task Overview: As someone transitioning into tech with a focus on becoming a data engineer, your...
30 reads · dataengineering
Shreyash Bante · shreyash27.hashnode.dev · Aug 26, 2024
Understanding the Spark Execution Model
Apache Spark's execution model is one of the reasons it stands out as a powerful tool for big data processing. At its core, Spark's execution model revolves around two main concepts: transformations and actions. To understand how Spark operates, it’s...
#rdd
Mehul Kansal · mehulkansal.hashnode.dev · Aug 12, 2024
Week 12: Mastering Hive 📈
Hey fellow data engineers! 👋 This week's blog explores the architecture of Hive, offering insights into its components like data storage and metadata management. It also covers the different types of tables Hive supports, essential optimizations for...
hive
Constantin Lungu · datawise.dev · Aug 8, 2024
Using tempfile module in Python
In this world, everything is ephemeral. If you need to create temporary files or directories in Python, check out the tempfile module. Whether you want to store intermediate results, manage temp data during execution or just test things out, it can h...
Python for Data Engineers · Python
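The excerpt above mentions storing intermediate results in temporary files or directories. A minimal sketch of that idea, using only the standard-library `tempfile` module; the helper `write_intermediate` and the file name `intermediate.csv` are hypothetical and not taken from the article:

```python
import tempfile
from pathlib import Path


def write_intermediate(rows):
    """Write rows to a file in a temporary directory and read them back.

    Hypothetical helper for illustration. The directory and everything in it
    are deleted when the `with` block exits.
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = Path(tmp_dir) / "intermediate.csv"  # hypothetical file name
        path.write_text("\n".join(rows), encoding="utf-8")
        contents = path.read_text(encoding="utf-8").splitlines()
    # The temporary directory has been cleaned up at this point,
    # so path.exists() reports whether anything was left behind.
    return contents, path.exists()


lines, still_exists = write_intermediate(["id,value", "1,10", "2,20"])
```

Using the context manager keeps cleanup automatic, which is usually what you want for scratch data that should not outlive the run.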
Constantin Lungu · datawise.dev · Aug 7, 2024
A quick look at the json module in Python
If you ever need to work with JSON files in Python, you're going to encounter the module with the same name. It helps encode to and decode from JSON. Here are the basics: ➡ json.load imports contents from a JSON file to a Python object, based on conve...
Python for Data Engineers · Python
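The encode and decode basics mentioned above can be sketched with the standard-library `json` module. The `record` dictionary is made-up sample data, and an in-memory `io.StringIO` buffer stands in for a real file so the example runs without touching disk:

```python
import io
import json

# Hypothetical sample data, not taken from the article.
record = {"pipeline": "daily_sales", "retries": 3, "active": True}

# json.dumps / json.loads convert between Python objects and JSON strings.
encoded = json.dumps(record)
decoded = json.loads(encoded)

# json.dump / json.load do the same with file-like objects.
buffer = io.StringIO()
json.dump(record, buffer)
buffer.seek(0)  # rewind so json.load reads from the start
loaded = json.load(buffer)
```

In both round trips the JSON object comes back as a `dict`, and JSON `true` comes back as Python `True`.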
Constantin Lungu · datawise.dev · Aug 6, 2024
Why how the data is collected matters
I find that working with data daily significantly influences how you perceive the world around you. You begin to notice patterns and details you might otherwise overlook. Take retail, for example. In our places, it's common for pharmacies, gas statio...
Learning Journey · dataengineering
Constantin Lungu · datawise.dev · Aug 6, 2024
Retrying in Python using tenacity
tenacity - (noun) the quality or fact of continuing to exist; persistence. If you ever need to retry something that might fail in Python, take a look at a specialized package like tenacity. It helps you properly cover common scenarios like retrying o...
Python for Data Engineers · Python
Victor Ohachor for Zero-Stack Engineer · zerostackengineer.hashnode.dev · Aug 6, 2024
Scatter Plot Basics: Everything You Need to Know
Overview: Data visualization enables us to gain a better understanding of our data. It transforms large and complex datasets into visual formats that are easier to understand and interpret, even by non-technical people. With data visualization, we can i...
ByteBite Wisdom · DataVisualization