© 2023 Hashnode
#hadoop
In the vast world of big data processing, Apache Hive has emerged as a powerful tool for querying and analyzing large datasets stored in distributed storage systems such as Hadoop HDFS. However, as the volume…
K-Nearest Neighbors (KNN), a non-parametric, lazy learning technique, is one of the most widely used techniques for classification. Unlike other classification algorithms like Logistic Regression, Naïve…
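The "non-parametric, lazy" description of KNN can be illustrated with a minimal from-scratch classifier: nothing is learned up front, and all the work happens when a query arrives. This is a sketch assuming Euclidean distance and a simple majority vote; `knn_predict` and the toy data are hypothetical, not the article's code or a library API.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    "Lazy" learning: there is no fit step; the training data is simply
    scanned at prediction time.
    """
    # Pair every training point's distance to the query with its label.
    distances = sorted(
        (euclidean(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    # Majority vote among the k closest neighbours.
    return Counter(nearest_labels).most_common(1)[0][0]


# Two well-separated toy clusters (hypothetical data).
POINTS = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
LABELS = ["A", "A", "A", "B", "B", "B"]
```

For example, `knn_predict(POINTS, LABELS, (1.5, 1.5))` votes among the three points of the lower-left cluster. Because every prediction scans the full training set, prediction cost grows with the data size, which is the usual trade-off of lazy learners.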
Installing Java: a. apt-get update; b. apt-get install openjdk-8-jre; c. apt-get install openjdk-8-jdk. Installing ssh (Secure Shell): a. sudo apt-get -y install openssh-server; b. ssh-keygen -t rsa; c. cd .ssh…
Introduction to Big Data: Big data refers to the large and complex sets of data generated by various sources in today's digital world. With the rise of connected devices and the internet, the amoun…
Introduction to Big Data and Hadoop: Note: These are study materials I made for myself for career development when I was learning Big Data. Overview: Understand the concepts of Big Data. Explain…
Big data is a term used to describe the massive amount of data that organizations need to process and analyze in order to gain insights and make informed decisions. It can be anything from customer da…
What Is A Transactional Database? Transactional data is information captured from day-to-day business activities such as sales, discounts, payment methods, supplier purchase orders, customer support r…
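A transactional database records such day-to-day events atomically: either every change belonging to one business transaction is saved, or none is. A minimal sketch using Python's built-in `sqlite3` module shows one sale that commits and one that is rolled back after a simulated failure; the `sales`/`stock` schema and the `record_sale` helper are hypothetical.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical sales/stock schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (item TEXT, qty INTEGER, price REAL)")
    conn.execute("CREATE TABLE stock (item TEXT PRIMARY KEY, qty INTEGER)")
    conn.execute("INSERT INTO stock VALUES ('widget', 10)")
    conn.commit()
    return conn


def record_sale(conn, item, qty, price, fail=False):
    """Record a sale and decrement stock as one atomic transaction."""
    # Using the connection as a context manager commits if the block
    # finishes normally and rolls back if an exception escapes it.
    with conn:
        conn.execute(
            "INSERT INTO sales (item, qty, price) VALUES (?, ?, ?)",
            (item, qty, price),
        )
        if fail:
            # Simulated crash between the two writes of the transaction.
            raise RuntimeError("simulated failure mid-transaction")
        conn.execute("UPDATE stock SET qty = qty - ? WHERE item = ?", (qty, item))
```

After a successful `record_sale(conn, "widget", 2, 9.99)`, a second call with `fail=True` leaves no partial sale behind: the inserted `sales` row is undone together with the stock update that never ran.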
Sometimes in a Spark application, we need to share a small piece of data with every machine in the cluster. For example, we may want to filter a set of words out of a large dataset residing in a data lake, or we may simply want to know how ma…
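The pattern of shipping a small read-only value to every task, and gathering a count back to the driver, can be sketched without a cluster. The following is a plain-Python simulation, not Spark code: `Broadcast`, `filter_partition`, `run_job`, and the partitions are hypothetical stand-ins. In PySpark the corresponding real calls are `SparkContext.broadcast(...)` (read through `.value`) and `SparkContext.accumulator(0)`.

```python
# Hypothetical partitions of a large dataset; in Spark these would live on
# different executors instead of in one Python list.
PARTITIONS = [
    ["the", "quick", "brown", "fox"],
    ["jumps", "over", "the", "lazy", "dog"],
    ["a", "fox", "and", "a", "dog"],
]


class Broadcast:
    """Read-only stand-in for a broadcast variable (simulation only)."""

    def __init__(self, value):
        # Frozen so that tasks cannot modify the shared copy.
        self._value = frozenset(value)

    @property
    def value(self):
        return self._value


def filter_partition(words, stop_words):
    """Task for one partition: drop stop words and count how many were dropped."""
    kept = [w for w in words if w not in stop_words.value]
    return kept, len(words) - len(kept)


def run_job(partitions, stop_words):
    """Driver: wrap the lookup set once, run every task, merge the counts."""
    shared = Broadcast(stop_words)  # created once, reused by every task
    results, removed_total = [], 0
    for part in partitions:
        kept, removed = filter_partition(part, shared)
        results.append(kept)
        removed_total += removed  # merged on the driver, like an accumulator
    return results, removed_total
```

Calling `run_job(PARTITIONS, {"the", "a", "and"})` filters every partition against the same shared set and reports the total number of removed words. The design point being illustrated is that the lookup set is created once on the driver rather than being rebuilt inside each task.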
What is MapReduce? MapReduce is a software framework for processing large data sets that are distributed over several machines. MapReduce facilitates concurrent processing by splitting petabytes of da…
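The map, shuffle, and reduce stages can be simulated in a single process with the classic word-count example. This is a plain-Python sketch, not Hadoop's Java `Mapper`/`Reducer` API; in a real job each phase runs in parallel across many machines and the framework itself performs the shuffle. The function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle/sort: group all values emitted for the same key, sorted by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key, values):
    """Reducer: combine the grouped values for one key."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce over the input lines in order."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())
```

For the input `["Deer Bear River", "Car Car River", "Deer Car Bear"]`, the mappers emit nine `(word, 1)` pairs, the shuffle groups them into four keys, and the reducers sum each group.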
Hadoop Distributed File System (HDFS) is a highly reliable distributed storage system. It is best known for its fault tolerance and high availability. What is Hadoop HDFS? HDFS stores very large files runn…
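HDFS achieves much of its fault tolerance by splitting each file into fixed-size blocks and storing several copies of every block on different DataNodes. A small sketch of the resulting arithmetic, assuming the Hadoop 2.x and later defaults of a 128 MB block size (`dfs.blocksize`) and a replication factor of 3 (`dfs.replication`); the helper names are hypothetical, and sizes are whole megabytes for simplicity.

```python
BLOCK_SIZE_MB = 128  # default dfs.blocksize in Hadoop 2.x and later
REPLICATION = 3      # default dfs.replication


def plan_blocks(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Return the sizes (in whole MB) of the blocks a file is split into."""
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    sizes = [block_size_mb] * full_blocks
    if remainder:
        # The last block only occupies the leftover bytes, not a full block.
        sizes.append(remainder)
    return sizes


def raw_storage_mb(file_size_mb, replication=REPLICATION):
    """Total cluster storage used once every block is replicated."""
    return file_size_mb * replication


def tolerable_node_failures(replication=REPLICATION):
    """DataNodes holding a block that can fail while one copy still survives."""
    return replication - 1
```

Under these assumptions a 300 MB file becomes blocks of 128, 128, and 44 MB, consumes 900 MB of raw cluster storage, and each block remains readable even if two of the three DataNodes holding it fail.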