Tanupriya Singh in tanupriya.com · Mar 27, 2025 · 8 min read
20 essential PySpark operations
Table Of Contents: Setting Up PySpark, Loading data, Basic operations, Column operations, Row operations, Aggregate functions, Window functions, Joins, Performance Optimisation, Best Practices and Tips, Conclusion, References. As a Machine Learning...

Tanupriya Singh in tanupriya.com · Jan 5, 2025 · 1 min read
Demystifying Setup.py: The Blueprint Behind Python Packages 📦
Ever wondered what makes 'pip install' work? Let's break down the mechanics behind setup.py 📦 At its core, setup.py is a Python script that defines your package's identity card. It contains essential metadata about your package - from its name and v...

Tanupriya Singh in tanupriya.com · Dec 9, 2024 · 3 min read
A Beginner's Guide to Spark: Insights from an MLE
Introduction to Spark: Spark is a distributed computing system designed for large-scale data processing. Here are key reasons why Spark is suitable for ML pipelines: High Volume of Data: Many ML problems require processing large datasets, which mig...

Tanupriya Singh in tanupriya.com · Oct 21, 2024 · 4 min read
Notes from Andrej Karpathy's 1-Hour Lecture on LLMs
Before we begin, a short excerpt from the book The Defining Decade: The right time to take action is at the sweet spot; waiting too long or acting too early can lead to missed opportunities. Echoing this idea, I decided to learn about the Generative A...

Tanupriya Singh in tanupriya.com · Jul 14, 2023 · 7 min read
My Journey as a Machine Learning Engineer
Introduction: One of the questions I get as a Machine Learning Engineer is whether I am required to read research papers and be aware of the latest algorithms and models. The answer is no. Don't get me wrong, it benefits from knowing the various model...