K-means clustering

K-means clustering is a method of vector quantization, originally from signal processing, that aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean (the cluster centroid), which serves as a prototype of the cluster.

What is the K-Means Algorithm?

K-Means clustering is an unsupervised learning algorithm that groups an unlabeled dataset into clusters. Here K is the number of clusters to create, chosen in advance: K=2 yields two clusters, K=3 yields three clusters, and so on.

It lets us cluster the data into different groups, and it is a convenient way to discover the groups in an unlabeled dataset on its own, without any labeled training data.

It is a centroid-based algorithm: each cluster is associated with a centroid. The main aim of the algorithm is to minimize the sum of squared distances between each data point and the centroid of its cluster.
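In its standard formulation, this objective is the within-cluster sum of squared Euclidean distances. The notation below is introduced here for illustration and does not come from the original post: x_i is a data point, C_j is the set of points assigned to cluster j, and \mu_j is the centroid of that cluster.

```latex
J = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2,
\qquad
\mu_j = \frac{1}{\lvert C_j \rvert} \sum_{x_i \in C_j} x_i
```

Each centroid is the mean of the points in its cluster, and a smaller J means the points sit closer to their centroids.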

[Image: K-Means clustering simplified in Python]

How does the K-Means Algorithm Work?

The K-Means algorithm works in the following steps:

Step-1: Select the number K to decide the number of clusters.

Step-2: Select K random points as the initial centroids. (They need not be points from the input dataset.)

Step-3: Assign each data point to its closest centroid, which forms the K clusters.

Step-4: Compute the mean of the points in each cluster and move that cluster's centroid to it.

Step-5: Repeat Step-3, which means reassigning each data point to the closest of the new centroids.

Step-6: If any reassignment occurred, go to Step-4; otherwise, go to FINISH.

Step-7: The model is ready.
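The steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production implementation: the function name `kmeans`, its parameters, and the sample points are all made up for this example. It assumes Euclidean distance and draws the initial centroids from the data points.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Cluster `points` (a list of equal-length tuples) into k groups."""
    rng = random.Random(seed)
    # Step-2: pick K data points as the initial centroids (an assumption;
    # the centroids could also be arbitrary points in the same space).
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Step-3 / Step-5: assign every point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Step-6: stop once no point changes cluster.
        if new_labels == labels:
            break
        labels = new_labels
        # Step-4: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


# Hypothetical 2-D data: two visibly separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
```

Here `labels[i]` is the cluster index of `points[i]`. On data this well separated, the first three points end up in one cluster and the last three in the other. On harder data the result can depend on the random initial centroids, so it is common to run the algorithm several times and keep the best result.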

Use Cases in the Security Domain

1) Document Classification

Cluster documents into multiple categories based on tags, topics, and the content of the document. This is a common document-grouping problem, and K-means is well suited to it. The documents first need to be processed so that each one is represented as a vector, using term frequency to identify commonly used terms that help characterize the document. The document vectors are then clustered to identify groups of similar documents.
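The vectorization step can be sketched as follows. This is a simplified illustration using only raw term counts and whitespace tokenization. The helper name `term_frequency_vectors` and the sample documents are invented for this example. Real pipelines usually also remove stop words and apply a weighting such as TF-IDF.

```python
from collections import Counter


def term_frequency_vectors(documents):
    """Turn each document into a list of term counts over a shared vocabulary."""
    tokenized = [doc.lower().split() for doc in documents]
    # The shared vocabulary fixes the meaning of each vector position.
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([counts[term] for term in vocabulary])
    return vocabulary, vectors


# Made-up example documents, for illustration only.
documents = [
    "firewall blocked suspicious login",
    "suspicious login from unknown host",
    "quarterly sales report",
]
vocabulary, vectors = term_frequency_vectors(documents)
```

Each entry of `vectors` has one count per vocabulary term, so documents that share terms (the first two here) lie closer together than documents that share none. These vectors are what K-means would then cluster.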

[Image: Applying machine learning to classify an unsupervised text document]

2) Delivery Store Optimization

Optimize the process of goods delivery using truck drones by combining K-means, to find the optimal number of launch locations, with a genetic algorithm, to solve the truck route as a traveling salesman problem. Here is a whitepaper on the same topic.

[Image: How retail and e-commerce can scale fast using logistics optimization]

3) Identifying Crime Localities

When crime data is available for specific localities in a city, clustering on the category of crime, the area of the crime, and the association between the two can give useful insight into crime-prone areas within a city or a locality.

[Image: Crime analysis using K-Means clustering]
