Wine Classification Using Machine Learning

The following classification techniques are used here to classify the wines:

1: CART

2: Logistic Regression

3: Random forest

4: Naive Bayes

5: Perceptron

6: SVM

7: KNN

This blog is about classifying wines based on certain features. The data set has 13 features and 1 target value, i.e. 14 columns, and 177 samples. The columns are ‘name’, ‘alcohol’, ‘malicAcid’, ‘ash’, ‘ashalcalinity’, ‘magnesium’, ‘totalPhenols’, ‘flavanoids’, ‘nonFlavanoidPhenols’, ‘proanthocyanins’, ‘colorIntensity’, ‘hue’, ‘od280_od315’ and ‘proline’.

The target label is ‘name’. It has 3 classes, ‘1’, ‘2’ and ‘3’, and every wine falls under exactly one of these three classes.

Here’s the dataset

  1. Loading the data
  2. Preprocessing the data and finding the correlation between the features and the target label
  3. Splitting the data into training samples and testing samples
  4. Applying the classification techniques and finding the MSE, RMSE, precision, recall and accuracy of each model
  5. Concluding which model is best

First we need to upload or clone our dataset to Azure Notebooks, and then load it into our Python file.

To load the dataset into our ipynb file, we need to run the following code.
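A minimal sketch of the loading step. It assumes scikit-learn's bundled copy of the wine data as a stand-in for the CSV used in this post, so the row count may differ slightly from the 177 samples mentioned above. The column names are the ones listed earlier.

```python
import pandas as pd
from sklearn.datasets import load_wine

# Feature names as listed in this post; 'name' is the target label.
columns = ["alcohol", "malicAcid", "ash", "ashalcalinity", "magnesium",
           "totalPhenols", "flavanoids", "nonFlavanoidPhenols",
           "proanthocyanins", "colorIntensity", "hue", "od280_od315",
           "proline"]

# Assumption: the bundled dataset stands in for the uploaded CSV file.
wine = load_wine()
df = pd.DataFrame(wine.data, columns=columns)

# scikit-learn encodes the classes as 0, 1, 2; shift them to 1, 2, 3.
df.insert(0, "name", wine.target + 1)

print(df.shape)
print(df.head())
```

With a local file, something like `pd.read_csv("wine.csv", names=["name"] + columns)` would play the same role; the file name here is hypothetical.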

In the snippet above, I checked whether there are any missing values. There are no missing values in my data set.
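A check like this can be sketched with pandas' `isnull()`. As an assumption, the data again comes from scikit-learn's bundled wine dataset rather than the CSV used in this post.

```python
import pandas as pd
from sklearn.datasets import load_wine

# Assumption: bundled dataset as a stand-in for the post's CSV.
wine = load_wine()
df = pd.DataFrame(wine.data, columns=wine.feature_names)
df["name"] = wine.target + 1

# Count missing values per column, then overall.
missing_per_column = df.isnull().sum()
total_missing = int(missing_per_column.sum())

print(missing_per_column)
print("total missing values:", total_missing)
```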

Now, let's check the correlation between the features and the target label, so that I can drop the column that is least correlated with the target label.

In this correlation matrix, the ‘ash’ column is the least correlated with the target label ‘name’, so let's drop that column.
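One way to sketch this step is to rank the features by the absolute value of their correlation with the target and drop the weakest one. The data source (scikit-learn's bundled wine dataset, with its own feature names) is an assumption.

```python
import pandas as pd
from sklearn.datasets import load_wine

# Assumption: bundled dataset as a stand-in for the post's CSV.
wine = load_wine()
df = pd.DataFrame(wine.data, columns=wine.feature_names)
df["name"] = wine.target + 1

# Absolute Pearson correlation of every feature with the target label,
# sorted so the weakest feature comes first.
corr_with_target = df.corr()["name"].drop("name").abs().sort_values()
least = corr_with_target.index[0]
print(corr_with_target)
print("least correlated feature:", least)

# Drop the weakest feature.
df_reduced = df.drop(columns=[least])
print(df_reduced.shape)
```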

We used the sklearn library to split the data into training and testing sets in a 70–30 ratio.
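A minimal sketch of a 70–30 split with `train_test_split`. The `random_state` and `stratify` arguments are assumptions added for reproducibility and balanced classes; they are not stated in this post. The data is scikit-learn's bundled copy with the ‘ash’ column removed.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split

# Assumption: bundled dataset as a stand-in for the post's CSV.
X, y = load_wine(return_X_y=True)
X = np.delete(X, 2, axis=1)  # drop 'ash' (column index 2)

# 70% training, 30% testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

print(X_train.shape, X_test.shape)
```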

Next, we normalize the input values of the training and testing sets.
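The post does not say which scaler was used, so the sketch below assumes `StandardScaler` (zero mean, unit variance). The scaler is fitted on the training set only and then applied to both sets, so no information from the test set leaks into training.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Assumption: bundled dataset as a stand-in for the post's CSV.
X, y = load_wine(return_X_y=True)
X = np.delete(X, 2, axis=1)  # drop 'ash' (column index 2)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# Fit on the training data only, then transform both sets.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.mean(axis=0).round(3))
print(X_train_scaled.std(axis=0).round(3))
```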

We will first import the necessary libraries and train our decision tree model.

Here is the code for it:
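A minimal sketch of the decision tree step, assuming scikit-learn's `DecisionTreeClassifier` (which implements an optimized version of CART) and the bundled wine data. The exact settings used in this post are not known, so the defaults are used.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumption: bundled dataset as a stand-in for the post's CSV.
X, y = load_wine(return_X_y=True)
X = np.delete(X, 2, axis=1)  # drop 'ash' (column index 2)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# Trees split on thresholds, so feature scaling is not required here.
tree = DecisionTreeClassifier(random_state=0)
tree.fit(X_train, y_train)

tree_accuracy = accuracy_score(y_test, tree.predict(X_test))
print("decision tree accuracy:", tree_accuracy)
```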

Logistic regression is used to describe data and to explain the relationship between one categorical dependent variable (binary in its basic form) and one or more nominal, ordinal, interval or ratio-level independent variables.

Here is the code for logistic regression:
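A minimal sketch, assuming scikit-learn's `LogisticRegression` on standardized features and the bundled wine data. Because the target has three classes, the classifier is used in its multiclass form. The `max_iter` value is an assumption chosen to avoid convergence warnings.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: bundled dataset as a stand-in for the post's CSV.
X, y = load_wine(return_X_y=True)
X = np.delete(X, 2, axis=1)  # drop 'ash' (column index 2)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# Scale the features, then fit the logistic regression model.
logreg = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
logreg.fit(X_train, y_train)

y_pred = logreg.predict(X_test)
logreg_accuracy = accuracy_score(y_test, y_pred)
print("logistic regression accuracy:", logreg_accuracy)
# Per-class precision and recall.
print(classification_report(y_test, y_pred))
```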

The random forest is a classification algorithm consisting of many decision trees. It uses bagging and feature randomness when building each individual tree to try to create an uncorrelated forest of trees whose prediction by committee is more accurate than that of any individual tree.

The following is the code and the accuracy:
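A minimal sketch, assuming scikit-learn's `RandomForestClassifier` with 100 trees on the bundled wine data. The number of trees and the random seed are assumptions, not values taken from this post.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Assumption: bundled dataset as a stand-in for the post's CSV.
X, y = load_wine(return_X_y=True)
X = np.delete(X, 2, axis=1)  # drop 'ash' (column index 2)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# Each tree is trained on a bootstrap sample and considers a random
# subset of the features at every split.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

forest_accuracy = accuracy_score(y_test, forest.predict(X_test))
print("random forest accuracy:", forest_accuracy)
print("feature importances:", forest.feature_importances_.round(3))
```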

Naive Bayes is a simple technique for constructing classifiers: models that assign class labels to problem instances, represented as vectors of feature values, where the class labels are drawn from some finite set.
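A minimal sketch, assuming the Gaussian variant (`GaussianNB`), which suits continuous features like the ones in this data set. The post does not say which variant was used, and the data here is scikit-learn's bundled copy.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Assumption: bundled dataset as a stand-in for the post's CSV.
X, y = load_wine(return_X_y=True)
X = np.delete(X, 2, axis=1)  # drop 'ash' (column index 2)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# Models each feature as a per-class Gaussian and treats the features
# as conditionally independent given the class.
nb = GaussianNB()
nb.fit(X_train, y_train)

nb_accuracy = accuracy_score(y_test, nb.predict(X_test))
print("naive Bayes accuracy:", nb_accuracy)
# Class probabilities for the first test sample.
print(nb.predict_proba(X_test[:1]).round(3))
```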

A Support Vector Machine (SVM) is a discriminative classifier formally defined by a separating hyperplane. In other words, given labeled training data (supervised learning), the algorithm outputs an optimal hyperplane which categorizes new examples.
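A minimal sketch, assuming scikit-learn's `SVC` with a linear kernel on standardized features. The kernel choice is an assumption, since the post does not state it, and the data is scikit-learn's bundled copy.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: bundled dataset as a stand-in for the post's CSV.
X, y = load_wine(return_X_y=True)
X = np.delete(X, 2, axis=1)  # drop 'ash' (column index 2)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# SVMs are sensitive to feature scale, so standardize first.
svm = make_pipeline(StandardScaler(), SVC(kernel="linear"))
svm.fit(X_train, y_train)

svm_accuracy = accuracy_score(y_test, svm.predict(X_test))
print("SVM accuracy:", svm_accuracy)
```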

A perceptron is a neural network unit that computes a weighted sum of its inputs and applies a threshold to detect features in the input data.
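A minimal sketch, assuming scikit-learn's `Perceptron` on standardized features and the bundled wine data. The random seed is an assumption added for reproducibility.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.linear_model import Perceptron
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: bundled dataset as a stand-in for the post's CSV.
X, y = load_wine(return_X_y=True)
X = np.delete(X, 2, axis=1)  # drop 'ash' (column index 2)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# The perceptron is a linear model, so scaled inputs help it converge.
perceptron = make_pipeline(StandardScaler(), Perceptron(random_state=0))
perceptron.fit(X_train, y_train)

perceptron_accuracy = accuracy_score(y_test, perceptron.predict(X_test))
print("perceptron accuracy:", perceptron_accuracy)
```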

The k-nearest neighbors (KNN) algorithm is a simple, easy-to-implement supervised machine learning algorithm that can be used to solve both classification and regression problems.

Here is the code for it:
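A minimal sketch, assuming scikit-learn's `KNeighborsClassifier` with k = 5 on standardized features. The value of k is an assumption (it is the library default), and the data is scikit-learn's bundled copy.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: bundled dataset as a stand-in for the post's CSV.
X, y = load_wine(return_X_y=True)
X = np.delete(X, 2, axis=1)  # drop 'ash' (column index 2)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# KNN relies on distances, so features must be on a comparable scale.
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
knn.fit(X_train, y_train)

knn_accuracy = accuracy_score(y_test, knn.predict(X_test))
print("KNN accuracy:", knn_accuracy)
```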

So, finally, we succeeded in obtaining 96% accuracy with the perceptron and logistic regression models.
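The metrics named in the steps above (MSE, RMSE, precision, recall, accuracy) can be collected for the two best models roughly as follows. This is a sketch on scikit-learn's bundled wine data with assumed settings, so the numbers it prints will not necessarily match the ones reported here. MSE and RMSE treat the class labels as numbers, which is unusual for classification but follows the metrics listed in this post. Precision and recall are macro-averaged over the three classes, which is also an assumption.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression, Perceptron
from sklearn.metrics import (accuracy_score, mean_squared_error,
                             precision_score, recall_score)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: bundled dataset as a stand-in for the post's CSV.
X, y = load_wine(return_X_y=True)
X = np.delete(X, 2, axis=1)  # drop 'ash' (column index 2)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Perceptron": Perceptron(random_state=0),
}

results = {}
for label, clf in models.items():
    pipe = make_pipeline(StandardScaler(), clf)
    y_pred = pipe.fit(X_train, y_train).predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    results[label] = {
        "MSE": float(mse),
        "RMSE": float(np.sqrt(mse)),
        "Precision": float(precision_score(y_test, y_pred, average="macro")),
        "Recall": float(recall_score(y_test, y_pred, average="macro")),
        "Accuracy": float(accuracy_score(y_test, y_pred)),
    }

for label, metrics in results.items():
    print(label, {k: round(v, 3) for k, v in metrics.items()})
```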
