Brijesh Prajapati · brijesh360.hashnode.dev · Jul 26, 2024
The Data Science Life Cycle: From Raw Data to Insightful Results
Data science is a transformative field that extracts valuable insights from raw data. Understanding the data science life cycle is crucial for anyone looking to leverage data effectively, whether in business, research, or other domains. This article ...
Tags: Data Science

KALINGA SWAIN · kalingaswain.hashnode.dev · Feb 11, 2024
EMR with EKS
Hi, welcome to the event! Amazon EMR is like the Rockstar of cloud big data. Picture this: petabyte-scale data parties, interactive analytics shindigs, and even machine learning raves, all happening with cool open-source crews like Apache Spark, Apach...
70 reads · Tags: #AWSConsole

Maheshwar Ligade for techwasti · techwasti.com · Jan 14, 2024
Parquet File Handling in Go: A Complete Guide!
Parquet, a columnar storage file format, is efficient for large-scale data processing. Handling Parquet files in Go allows efficient data storage and retrieval. This guide covers the essentials of working with Parquet files in Go, including reading, ...
1.0K reads · Tags: go-language, golang

Anuj Kumar · vandata04.hashnode.dev · Jun 28, 2023
Introduction to Big Data and Hadoop (Ecosystem)
What is Big data? Big data refers to the large and complex sets of data generated by various sources in today's digital world. Also, any data that can be characterised by the 3 V's is considered to be big data and they are:- Volume: The volume of da...
2 likes · 34 reads · Tags: BigData

Renjitha K · renjithak.hashnode.dev · Apr 7, 2023
Setting up Apache Spark
In this blog, I will be focusing on setting up the workspace for Windows so that we can get started with Apache Spark and do some hands-on in my upcoming series of Apache Kafka. If you haven't taken a look at it and wish to, here is the link https://...
1 like · 86 reads · Tags: spark

Renjitha K · renjithak.hashnode.dev · Mar 27, 2023
Understanding MapReduce: A Beginners Guide
Most of us have been hearing the term MapReduce for a long while now, I have been wondering what this term means, Let's try to understand the basics of the same. So, MapReduce is a powerful programming model and software framework for processing larg...
187 reads · Tags: BigData

Priyank Patel · priyankpatel.dev · Feb 18, 2023
Balancing Performance and Scalability with Elasticsearch Shards and Replicas
Disclaimer The entire article is based on a Stack Overflow response about shards and replicas, and all credit goes to Javanna for providing an outstanding explanation. The explanation is so simple that even if you have no idea what the hell shard an...
65 reads · Tags: elasticsearch

Natalia Zotkina · nataly.hashnode.dev · Feb 2, 2023
Personalization at Scale: Using Big Data to Create Tailored Products
In the digital age, getting a product or a service tailored exactly for you is no longer a pipe dream. Moreover, an increasing number of modern businesses strive to personalize their offerings to meet the expectations of every client. In this fierce ...
70 likes · 28.5K reads · Tags: Personalization

Evan Chan · evanchan.hashnode.dev · Jan 16, 2023
Windowing Operations in PySpark
(Note: this is adapted from my talk at 2021 Scale by the Bay, Location-Based Data Engineering for Good) If you are a data scientist, chances are you are coding Python and most likely using pandas. You might have heard of or are learning Apache Spark,...
106 reads · Tags: Python

Uche Okoye · afewfigures.hashnode.dev · Jan 15, 2023
Big Data
Big data is a term used to describe the massive amount of data that organizations need to process and analyze in order to gain insights and make informed decisions. It can be anything from customer data, financial data, social media, healthcare recor...
2 likes · 32 reads · Tags: data