A common job of machine learning algorithms is to recognize objects and separate them into categories. This process is called classification, and it helps us sort vast quantities of data into discrete values, i.e. distinct outputs such as 0/1, True/False, or a pre-defined set of class labels.
Classification in machine learning and statistics is a supervised learning approach in which the program learns from the data given to it and then classifies new observations.
For example, spam detection in email service providers can be framed as a classification problem. It is a binary classification, since there are only two classes: spam and not spam. A classifier uses training data to understand how the given input variables relate to the class; in this case, both spam and non-spam emails must be used as training data. Once the classifier is trained accurately, it can be used to classify an unseen email.
In short, classification is a form of “pattern recognition”: classification algorithms are applied to the training data to find its patterns, and those patterns are then used to classify future data sets.
There are two types of learners in classification:
- Lazy Learners: A lazy learner simply stores the training data and waits until test data arrives. Lazy learners take less time to train but more time to predict. Examples: k-nearest neighbours, case-based reasoning.
- Eager Learners: Eager learners, on the other hand, construct a classification model from the given training data before receiving data to classify. They take more time to train but less time to predict. Examples: decision trees, Naive Bayes, artificial neural networks (ANNs).
Classification Terminologies In Machine Learning:
- Classifier: An algorithm that maps the input data to a specific category.
- Classification model: The model draws conclusions from the input data it was trained on; given new data, it predicts the class or category for that data.
- Feature: A feature is an individual measurable property of a phenomenon being observed.
- Binary Classification: Classification task with two possible outcomes. Eg: Gender classification (Male / Female)
- Multi-class classification: Classification with more than two classes. In multi-class classification, each sample is assigned to one and only one target label. Eg: An animal can be a cat or a dog, but not both at the same time.
- Multi-label classification: A type of classification where each sample is assigned a set of labels or targets. Eg: A news article can be about sports, a person, and a location at the same time.
- Initialize: Choose and instantiate the classifier to be used.
- Train the classifier: All classifiers in scikit-learn use a fit(X, y) method to fit (train) the model on the given training data X and training labels y.
- Predict the target: Given an unlabelled observation X, predict(X) returns the predicted label y.
- Evaluate: This means evaluating the model, e.g. with a classification report, an accuracy score, etc. (a minimal sketch of this whole workflow is shown right after this list).
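Assuming scikit-learn is installed, here is a minimal sketch of the initialize/train/predict/evaluate workflow. The bundled Iris dataset and the choice of a k-nearest-neighbours classifier are illustrative assumptions, not requirements:

```python
# A minimal sketch of the initialize / train / predict / evaluate workflow,
# using scikit-learn's bundled Iris dataset for illustration.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

clf = KNeighborsClassifier(n_neighbors=5)     # initialize the classifier
clf.fit(X_train, y_train)                     # train on the labelled data
y_pred = clf.predict(X_test)                  # predict unseen observations

print(accuracy_score(y_test, y_pred))         # evaluate with an accuracy score
print(classification_report(y_test, y_pred))  # ...and a classification report
```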
Now we will discuss the types of classification. There are many classification algorithms available, and it is not possible to conclude that one is universally superior to the others: the best choice depends on the application and the nature of the available data set. So we will discuss the advantages and disadvantages of each classifier here, so you can decide which one is most useful for your problem.
Logistic regression:
In this algorithm, the probabilities describing the possible outcomes of a single trial are modelled using a logistic function; in its basic form it has only two possible outcomes.
Advantages: It is most useful for understanding how a set of independent variables affect the outcome of the dependent variable.
Disadvantages: Works only when the predicted variable is binary, assumes all predictors are independent of each other, and assumes the data is free of missing values.
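A hedged sketch of binary classification with logistic regression follows; the bundled breast-cancer dataset and the max_iter setting are illustrative assumptions:

```python
# A minimal sketch of logistic regression on a binary problem,
# using scikit-learn's bundled breast-cancer dataset for illustration.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)   # two classes: malignant / benign
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=5000)      # extra iterations so the solver converges
clf.fit(X_train, y_train)

# predict_proba returns the modelled probability of each outcome per sample
print(clf.predict_proba(X_test[:3]))
print(clf.score(X_test, y_test))             # mean accuracy on held-out data
```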
Naive Bayes:
It is suitable for solving multi-class prediction problems. If its assumption of the independence of features holds true, it can perform better than other models and requires much less training data. Naive Bayes is based on Bayes’ theorem, which is given as: P(A|B) = P(B|A) · P(A) / P(B), where P(A|B) is the probability of class A given the observed features B, P(B|A) is the likelihood of those features under class A, and P(A) and P(B) are the prior probabilities.
Advantages: This algorithm works quickly and can save a lot of time. Naive Bayes is suitable for solving multi-class prediction problems.
Disadvantages: Naive Bayes assumes that all predictors (or features) are independent, rarely happening in real life. This limits the applicability of this algorithm in real-world use cases.
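As a quick illustration, here is a minimal sketch using the Gaussian variant of Naive Bayes on a multi-class problem; the Iris dataset is an illustrative assumption (its continuous features suit the Gaussian variant):

```python
# A minimal sketch of multi-class prediction with Gaussian Naive Bayes.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = GaussianNB()           # models each feature independently per class
clf.fit(X_train, y_train)    # training is fast: just per-class means and variances
print(clf.score(X_test, y_test))
```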
Stochastic gradient descent:
It is a very effective and simple approach for fitting linear models. Stochastic gradient descent is particularly useful when the number of samples is large. It supports different loss functions and penalties for classification.
Advantages: Efficiency and ease of implementation.
Disadvantages: Requires a number of hyper-parameters and it is sensitive to feature scaling.
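The sketch below ties both points together: the features are standardised first, since SGD is sensitive to feature scaling. The dataset is an illustrative assumption, and the loss name "log_loss" follows recent scikit-learn versions (older ones spelled it "log"):

```python
# A sketch of a linear classifier fitted with stochastic gradient descent.
# SGD is sensitive to feature scaling, so the features are standardised first.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# loss="log_loss" gives logistic regression; loss="hinge" would give a linear SVM
clf = make_pipeline(StandardScaler(),
                    SGDClassifier(loss="log_loss", max_iter=1000, random_state=0))
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
```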
K-Nearest Neighbours:
KNN works by finding the distances between a query and all the examples in the data, selecting the specified number of examples (K) closest to the query, and then voting for the most frequent label (in the case of classification) or averaging the labels (in the case of regression). It is a lazy learner.
Advantages: This algorithm is simple to implement, robust to noisy training data, and effective if training data is large.
Disadvantages: You need to determine a good value of K, and the computation cost at prediction time is pretty high compared to other algorithms.
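Since picking K is the main tuning decision, here is a minimal sketch that compares a few values of K by cross-validated accuracy; the Iris dataset and the candidate values are illustrative assumptions:

```python
# A sketch of how the choice of K affects a k-nearest-neighbours classifier.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# try a few values of K and compare cross-validated accuracy
for k in (1, 3, 5, 11):
    clf = KNeighborsClassifier(n_neighbors=k)
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"K={k}: mean accuracy {scores.mean():.3f}")
```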
Decision Tree:
A decision tree uses a tree representation to solve the problem, in which each leaf node corresponds to a class label and attributes are tested at the internal nodes of the tree. Any boolean function on discrete attributes can be represented using a decision tree.
Advantages: Decision Tree is simple to understand and visualise, requires little data preparation, and can handle both numerical and categorical data.
Disadvantages: It can create overly complex trees that do not categorize new data well. Decision trees can also be quite unstable, because even a small change in the data can change the whole structure of the tree.
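A minimal sketch follows; the Iris dataset is an illustrative assumption, and max_depth is used to cap tree complexity in line with the overfitting issue noted above:

```python
# A sketch of a decision tree; max_depth limits complexity to counter overfitting.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))

# the learned tree is easy to inspect: internal nodes test features,
# leaves carry class labels
print(export_text(clf, feature_names=load_iris().feature_names))
```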
Random Forest:
A random forest is an ensemble model made of many decision trees, using bootstrapping, random subsets of features, and majority voting to make predictions. It is an example of a bagging ensemble.
Advantages: Reduction in over-fitting and random forest classifier is more accurate than decision trees in most cases.
Disadvantages: Random forest classifiers are quite complex to implement and get pretty slow in real-time prediction.
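A minimal sketch follows; the dataset and the number of trees are illustrative assumptions:

```python
# A sketch of a random forest: many trees fitted on bootstrap samples with
# random feature subsets, combined by voting.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
# more trees usually improves accuracy but slows prediction,
# which is the trade-off noted above
```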
Support Vector Machine:
Support Vector Machine (SVM) is a supervised machine learning algorithm capable of performing classification, regression and even outlier detection.
Advantages: It uses a subset of training points in the decision function which makes it memory efficient and is highly effective in high dimensional spaces.
Disadvantages: The algorithm does not directly provide probability estimates; these are calculated using an expensive five-fold cross-validation.
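A minimal sketch follows; the Iris dataset and the RBF kernel are illustrative assumptions. Note that probability=True is what triggers the expensive internal five-fold cross-validation mentioned above:

```python
# A sketch of an SVM classifier; probability=True enables probability estimates
# via internal five-fold cross-validation, which makes training noticeably slower.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = SVC(kernel="rbf", probability=True, random_state=0)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
print(clf.predict_proba(X_test[:3]))   # per-class probability estimates
```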
This is all about classification and the types of classification in machine learning.