Wine Classification Using Machine Learning

Pola Sumanth
4 min read · Oct 11, 2019

The classification techniques used here to classify the wines are:

1: CART

2: Logistic Regression

3: Random forest

4: Naive Bayes

5: Perceptron

6: SVM

7: KNN

About data:

This blog is about classifying wine based on certain features. There are 13 features in this dataset and 1 target value, i.e., 14 columns and 177 samples. The columns are: ‘name’, ‘alcohol’, ‘malicAcid’, ‘ash’, ‘ashalcalinity’, ‘magnesium’, ‘totalPhenols’, ‘flavanoids’, ‘nonFlavanoidPhenols’, ‘proanthocyanins’, ‘colorIntensity’, ‘hue’, ‘od280_od315’, ‘proline’.

Here the target label is ‘name’. It has 3 classes: ‘1’, ‘2’ and ‘3’. Each wine falls under one of these three classes.

Here’s the dataset

Modules involved:

  1. Loading the data
  2. Pre-processing the data and finding the correlation between the features and the target label
  3. Splitting the data into training samples and testing samples
  4. Using classification techniques and computing the MSE, RMSE, precision, recall, and accuracy of each model
  5. Concluding which model is best.

Loading Data:

We need to upload or clone our dataset to Azure Notebooks, and then load it into our Python file.

To load the dataset into our .ipynb file, we need to run the following code.
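A minimal sketch of the loading step with pandas; the file name wine.csv and the headerless CSV format are assumptions:

```python
import pandas as pd

# Column names as described above; the file name "wine.csv" is an assumption.
columns = ['name', 'alcohol', 'malicAcid', 'ash', 'ashalcalinity',
           'magnesium', 'totalPhenols', 'flavanoids', 'nonFlavanoidPhenols',
           'proanthocyanins', 'colorIntensity', 'hue', 'od280_od315', 'proline']

# Read the CSV into a DataFrame (assumes the file has no header row).
df = pd.read_csv('wine.csv', names=columns)
print(df.shape)  # expect (177, 14) per the description above
df.head()
```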

Pre-processing data:

First, I checked whether there are any missing values; it turns out there are none in this dataset.
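A quick way to run this check with pandas, using the df DataFrame loaded above:

```python
# Count missing values per column; all zeros means no missing data.
print(df.isnull().sum())
```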

Now, let's check the correlation between the features and the target label, so that we can drop the column that is least correlated with it.
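One way to inspect these correlations with pandas:

```python
# Correlation of every feature with the target label 'name',
# sorted from least to most correlated.
print(df.corr()['name'].sort_values())
```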

In this correlation matrix, the ‘ash’ column is the least correlated with the target label ‘name’, so let's drop that column.
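Dropping the column is a one-liner:

```python
# Remove the least-correlated feature from the DataFrame.
df = df.drop(columns=['ash'])
```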

Splitting the data into training samples and testing samples:

We used the sklearn library to split the data into train and test sets, in a 70–30 ratio.

We also normalize the input values of the training and testing sets.
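A sketch of the split and normalization with scikit-learn; the random_state and the choice of MinMaxScaler are assumptions, since the post only says the data was split 70–30 and normalized:

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Separate the features from the target label.
X = df.drop(columns=['name'])
y = df['name']

# 70-30 train/test split; random_state is an assumption for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Scale each feature to [0, 1]; fit only on the training set to avoid leakage.
scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
```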

CART model:

We will first import the necessary libraries and train our decision tree model.

Here is the code for it:
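A sketch of this step using scikit-learn's DecisionTreeClassifier (its implementation of CART), with hyperparameters left at their defaults:

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, classification_report

# CART corresponds to scikit-learn's DecisionTreeClassifier.
cart = DecisionTreeClassifier(random_state=42)
cart.fit(X_train, y_train)
y_pred = cart.predict(X_test)

print('CART accuracy:', accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))  # precision and recall per class
```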

Logistic regression:

Logistic regression is used to describe data and to explain the relationship between one dependent binary variable and one or more nominal, ordinal, interval or ratio-level independent variables.

Here's the code for logistic regression:
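A sketch of this step with scikit-learn; max_iter=1000 is an assumption to ensure convergence, and the library handles the 3-class target automatically:

```python
from sklearn.linear_model import LogisticRegression

# Train a logistic regression classifier on the normalized features.
logreg = LogisticRegression(max_iter=1000)
logreg.fit(X_train, y_train)
print('Logistic regression accuracy:', logreg.score(X_test, y_test))
```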

Random forest:

The random forest is a classification algorithm consisting of many decision trees. It uses bagging and feature randomness when building each individual tree to try to create an uncorrelated forest of trees whose prediction by committee is more accurate than that of any individual tree.

The following is the code and its accuracy:
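A sketch using scikit-learn's RandomForestClassifier; the number of trees is the library default, as the post does not state it:

```python
from sklearn.ensemble import RandomForestClassifier

# An ensemble of 100 decision trees, each trained on a bootstrap sample.
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)
print('Random forest accuracy:', rf.score(X_test, y_test))
```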

Naive Bayes:

Naive Bayes is a simple technique for constructing classifiers: models that assign class labels to problem instances, represented as vectors of feature values, where the class labels are drawn from some finite set.
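The post does not say which Naive Bayes variant was used; here is a sketch with GaussianNB, which suits continuous features like these:

```python
from sklearn.naive_bayes import GaussianNB

# Gaussian Naive Bayes models each feature as normally distributed per class.
nb = GaussianNB()
nb.fit(X_train, y_train)
print('Naive Bayes accuracy:', nb.score(X_test, y_test))
```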

SVM:

A Support Vector Machine (SVM) is a discriminative classifier formally defined by a separating hyperplane. In other words, given labeled training data (supervised learning), the algorithm outputs an optimal hyperplane which categorizes new examples.
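A sketch with scikit-learn's SVC; the RBF kernel is the library default and an assumption here:

```python
from sklearn.svm import SVC

# Support vector classifier with an RBF kernel (the scikit-learn default).
svm = SVC(kernel='rbf')
svm.fit(X_train, y_train)
print('SVM accuracy:', svm.score(X_test, y_test))
```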

Perceptron:

A Perceptron is a single neural network unit that computes a weighted sum of its inputs and applies a threshold function, making it a simple linear classifier.
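A sketch with scikit-learn's Perceptron, with hyperparameters left at their defaults:

```python
from sklearn.linear_model import Perceptron

# A single-layer perceptron trained on the normalized features.
perc = Perceptron(random_state=42)
perc.fit(X_train, y_train)
print('Perceptron accuracy:', perc.score(X_test, y_test))
```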

KNN:

The k-nearest neighbors (KNN) algorithm is a simple, easy-to-implement supervised machine learning algorithm that can be used to solve both classification and regression problems.

Here's the code for it:
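A sketch with scikit-learn's KNeighborsClassifier; k = 5 is the library default, as the post does not state the value of k:

```python
from sklearn.neighbors import KNeighborsClassifier

# Classify each test sample by a majority vote of its 5 nearest neighbors.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
print('KNN accuracy:', knn.score(X_test, y_test))
```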

Conclusion:

So, finally, we were successful in obtaining 96% accuracy with the Perceptron and logistic regression models.
