# Wine Classification Using Machine Learning

The classification techniques used to classify the wines are:

1: CART

2: Logistic Regression

3: Random forest

4: Naive Bayes

5: Perceptron

6: SVM

7: KNN

## About the data:

This blog is about classifying wines based on certain features. The dataset has 13 features and 1 target label, i.e. 14 columns, and 177 samples. The columns are: 'name', 'alcohol', 'malicAcid', 'ash', 'ashalcalinity', 'magnesium', 'totalPhenols', 'flavanoids', 'nonFlavanoidPhenols', 'proanthocyanins', 'colorIntensity', 'hue', 'od280_od315', 'proline'.

The target label is 'name', which has three classes: '1', '2' and '3'. Each wine falls into exactly one of these classes.


**Modules involved:**

- Loading the data
- Pre-processing the data and finding the correlation between the features and the target label
- Splitting the data into training samples and testing samples
- Applying the classification techniques and computing the MSE, RMSE, precision, recall and accuracy of each model
- Concluding which model is best

**Loading Data:**

We first need to upload or clone the dataset to Azure Notebooks, then load it into our Python notebook.

To load the dataset into our .ipynb file, we run the following code.
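The post's original snippet is not shown here. As a self-contained stand-in, the sketch below builds the same UCI wine data from scikit-learn's bundled copy instead of reading the uploaded CSV (a `pd.read_csv` call in the post); the column names follow the list above, and the file-free approach is an assumption made so the example runs anywhere.

```python
import pandas as pd
from sklearn.datasets import load_wine

# Column names as listed in the post (the CSV itself is not shown there).
columns = ['alcohol', 'malicAcid', 'ash', 'ashalcalinity', 'magnesium',
           'totalPhenols', 'flavanoids', 'nonFlavanoidPhenols',
           'proanthocyanins', 'colorIntensity', 'hue', 'od280_od315', 'proline']

# Stand-in for reading the uploaded CSV: scikit-learn ships the same
# UCI wine data (this copy has 178 rows).
wine = load_wine()
df = pd.DataFrame(wine.data, columns=columns)
df['name'] = wine.target + 1   # target classes 1, 2, 3 as in the post
print(df.shape)
```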

## Pre-processing the data:

In the snippet above, I checked whether there are any missing values. There are none in this dataset.
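The missing-value check can be sketched with pandas, rebuilding the DataFrame from scikit-learn's copy of the data so the snippet is self-contained:

```python
import pandas as pd
from sklearn.datasets import load_wine

wine = load_wine()
df = pd.DataFrame(wine.data, columns=wine.feature_names)
df['name'] = wine.target + 1

# Count of missing values per column; all zeros for this dataset.
missing = df.isnull().sum()
print(missing)
```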

Now, let's check the correlation between the features and the target label so that we can drop the column least correlated with the target.

In the correlation matrix, the 'ash' column is the least correlated with the target label 'name', so let's drop that column.
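A minimal sketch of the correlation check and the drop, again rebuilt from scikit-learn's copy of the data (the post's own matrix/heatmap is not shown here):

```python
import pandas as pd
from sklearn.datasets import load_wine

wine = load_wine()
df = pd.DataFrame(wine.data, columns=wine.feature_names)
df['name'] = wine.target + 1

# Correlation of every feature with the target label 'name'.
corr_with_target = df.corr()['name'].drop('name')
print(corr_with_target.abs().sort_values())

# Drop 'ash', the column least correlated with the label per the post.
df = df.drop(columns=['ash'])
```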

## Splitting the data into training samples and testing samples:

We use the sklearn library to split the data into training and testing sets in a 70-30 ratio.

We then normalize the input values of the training and testing sets.
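The two steps above can be sketched as follows; the `random_state` value and the use of `StandardScaler` for normalization are assumptions (the post does not show its exact choices):

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)

# 70-30 split; random_state fixed for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Normalize inputs: fit the scaler on the training set only,
# then apply the same transform to the test set.
scaler = StandardScaler().fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
print(X_train.shape, X_test.shape)
```

Fitting the scaler on the training set alone avoids leaking test-set statistics into training.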

## CART model:

We will first import the necessary libraries and train our decision tree model.

Here is the code for it:
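The post's exact snippet is not reproduced here; this is a minimal sketch using scikit-learn's `DecisionTreeClassifier`, which implements CART (Gini impurity by default), evaluated with the metrics the post lists (MSE, RMSE, precision, recall, accuracy). The split and `random_state` are assumptions.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, mean_squared_error)

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# scikit-learn's DecisionTreeClassifier implements CART.
cart = DecisionTreeClassifier(random_state=42)
cart.fit(X_train, y_train)
pred = cart.predict(X_test)

mse = mean_squared_error(y_test, pred)
print('MSE      :', mse)
print('RMSE     :', np.sqrt(mse))
print('Precision:', precision_score(y_test, pred, average='macro'))
print('Recall   :', recall_score(y_test, pred, average='macro'))
print('Accuracy :', accuracy_score(y_test, pred))
```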

## Logistic regression:

**Logistic regression** is used to describe data and to explain the relationship between one dependent binary variable and one or more nominal, ordinal, interval or ratio-level independent variables.

Here's the code for logistic regression:

## Random forest:

The **random forest** is a classification algorithm consisting of many decision trees. It uses bagging and feature randomness when building each individual tree to try to create an uncorrelated **forest** of trees whose prediction by committee is more accurate than that of any individual tree.

The following is the code and its accuracy:
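A minimal sketch with scikit-learn's `RandomForestClassifier`; `n_estimators`, the split and `random_state` are assumptions:

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# An ensemble of decision trees trained on bootstrap samples with
# random feature subsets at each split (bagging + feature randomness).
forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(X_train, y_train)
print('Accuracy:', accuracy_score(y_test, forest.predict(X_test)))
```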

## Naive Bayes:

Naive Bayes is a simple technique for constructing classifiers: models that assign class labels to problem instances, represented as vectors of feature values, where the class labels are drawn from some finite set.
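The post does not show its Naive Bayes snippet; a minimal sketch using scikit-learn's `GaussianNB` (the usual variant for continuous features like these), with the same assumed split:

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Gaussian Naive Bayes: assumes each feature is normally distributed
# within each class and that features are conditionally independent.
nb = GaussianNB()
nb.fit(X_train, y_train)
print('Accuracy:', accuracy_score(y_test, nb.predict(X_test)))
```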

## SVM:

A **Support Vector Machine** (**SVM**) is a discriminative classifier formally **defined** by a separating hyperplane. In other words, given labeled training data (supervised learning), the algorithm outputs an optimal hyperplane which categorizes new examples.
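A minimal sketch with scikit-learn's `SVC`; the RBF kernel, scaling pipeline and split are assumptions, since the post's snippet is not shown:

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.metrics import accuracy_score

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# SVMs are sensitive to feature scale, so standardise first.
svm = make_pipeline(StandardScaler(), SVC(kernel='rbf'))
svm.fit(X_train, y_train)
print('Accuracy:', accuracy_score(y_test, svm.predict(X_test)))
```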

## Perceptron:

A **Perceptron** is a neural network unit that does certain computations to detect features or business intelligence in the input data.
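A minimal sketch with scikit-learn's `Perceptron` (a single-layer linear classifier trained with the perceptron update rule); the scaling pipeline and split are assumptions:

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Perceptron
from sklearn.pipeline import make_pipeline
from sklearn.metrics import accuracy_score

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# A linear model trained with the perceptron update rule;
# standardising the inputs helps it converge.
model = make_pipeline(StandardScaler(), Perceptron(random_state=42))
model.fit(X_train, y_train)
print('Accuracy:', accuracy_score(y_test, model.predict(X_test)))
```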

## KNN:

The k-nearest neighbors (**KNN**) algorithm is a simple, easy-to-implement supervised **machine learning** algorithm that can be used to solve both classification and regression problems.

Here's the code for it:
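A minimal sketch with scikit-learn's `KNeighborsClassifier`; `n_neighbors=5`, the scaling pipeline and the split are assumptions:

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.metrics import accuracy_score

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# KNN votes among the k closest training points; distances are
# scale-sensitive, so standardise the features first.
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
knn.fit(X_train, y_train)
print('Accuracy:', accuracy_score(y_test, knn.predict(X_test)))
```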

## Conclusion:

So, finally, we succeeded in obtaining 96% accuracy with the perceptron and logistic regression models, making them the best models here.