THE NAIVE BAYES GUIDE

What is Naive Bayes?

Kopal Jain · Published in Analytics Vidhya · 4 min read · Mar 11, 2021


What is the Algorithm?

Naive Bayes (NB) is a supervised machine learning algorithm. Its purpose is to predict the class of a query sample from labeled input data that are separated into classes. The “naive” in the name comes from the algorithm’s assumption that the features are independent of one another, and “Bayes” comes from the statistical classification technique it is built on, Bayes’ Theorem.

How Does the Algorithm Work?

Step 1: Calculate the Prior Probability of each class label in the training data.

Step 2: Obtain the Likelihood Probability of each feature attribute for each class.

Step 3: Calculate the Posterior Probability using Bayes’ Theorem.

Bayes’ Theorem:

P(A|B) = [P(B|A) x P(A)] / P(B)

  • P(A|B) — the probability of event A occurring, given event B has occurred [Posterior Probability]
  • P(B|A) — the probability of event B occurring, given event A has occurred [Likelihood Probability]
  • P(A) — the probability of event A [Prior Probability of A]
  • P(B) — the probability of event B [Prior Probability of B]

Step 4: Return the class label with the higher Posterior Probability → the prediction for the query sample!
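The four steps translate almost line for line into code. Below is a minimal sketch of the Gaussian variant used later in this post, written with NumPy; X (feature matrix), y (label vector), and query (query sample) are hypothetical placeholders, not objects from this article.

import numpy as np

def gaussian_pdf(x, mu, sigma):
    # Normal density, used as the per-feature likelihood in Step 2.
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def predict(X, y, query):
    log_posteriors = {}
    for c in np.unique(y):
        Xc = X[y == c]
        prior = len(Xc) / len(X)                             # Step 1: prior
        mu, sigma = Xc.mean(axis=0), Xc.std(axis=0, ddof=1)
        likelihoods = gaussian_pdf(query, mu, sigma)         # Step 2: likelihoods
        log_posteriors[c] = np.log(prior) + np.log(likelihoods).sum()  # Step 3 (log space)
    return max(log_posteriors, key=log_posteriors.get)       # Step 4: highest posterior wins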

Example of the Algorithm

Let’s rewrite Bayes’ Theorem in terms of the Naive Bayes (Gaussian) equation…

Naive Bayes (Gaussian) Equation:

P(Class|Data) = [P(Data|Class) x P(Class)] / P(Data)

  • P(Class) represents the prior probability of the class (y output).
  • P(Data) represents the prior probability of the predictor (X features).
  • P(Data|Class) represents the likelihood probability of the predictor given the class.
  • P(Class|Data) represents the posterior probability of the class given the predictor.

Under the naive independence assumption, the likelihood factorizes across the features, so P(Data|Class) = P(x₁|Class) x P(x₂|Class) x P(x₃|Class).

Let’s start with a mock dataframe with…

  • 4 columns (3 features [X: x₁, x₂, x₃] and 1 output [y — Class A or Class B])
  • 10 rows (observations) where 4 of them belong to Class A and 6 of them belong to Class B

The goal of this example is to predict the class of the query sample (A or B), which has input values of 11, 7, and 22 for Feature 1, Feature 2, and Feature 3, respectively.
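For concreteness, here is one possible realization of that mock dataframe in pandas. The class counts and the query values come from the example above; the individual feature values are invented for illustration, chosen so the qualitative pattern of the likelihoods below roughly holds.

import pandas as pd

df = pd.DataFrame({
    "Feature 1": [25, 26, 24, 25, 11, 12, 10, 13, 11, 12],
    "Feature 2": [7, 6, 8, 7, 9, 10, 8, 9, 10, 8],
    "Feature 3": [18, 17, 19, 18, 40, 42, 39, 41, 40, 42],
    "Class": ["A"] * 4 + ["B"] * 6,
})
query = {"Feature 1": 11, "Feature 2": 7, "Feature 3": 22}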

Note: Since we are examining Gaussian Naive Bayes, plot the normal (or Gaussian) distribution curve of each class per feature by using the mean (μ) and standard deviation (σ).

[Figure: Naive Bayes (Gaussian) Algorithm — fitted normal distribution curves per class, one panel per feature]
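A sketch of how such curves can be produced with scipy and matplotlib, assuming the hypothetical df from the previous snippet:

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

features = ["Feature 1", "Feature 2", "Feature 3"]
fig, axes = plt.subplots(1, 3, figsize=(12, 3))
for ax, feat in zip(axes, features):
    for label, group in df.groupby("Class"):
        mu, sigma = group[feat].mean(), group[feat].std()  # fit μ and σ per class
        xs = np.linspace(mu - 4 * sigma, mu + 4 * sigma, 200)
        ax.plot(xs, norm.pdf(xs, mu, sigma), label=f"Class {label}")
    ax.set_title(feat)
    ax.legend()
plt.show()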

Let’s calculate the prior probability of P(Class) using the count of each class…

  • P(Class=A) → [4 /(4+6)] = 0.40
  • P(Class=B) → [6 /(6+4)] = 0.60
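In pandas (with the hypothetical df from earlier), the priors are just the normalized class counts:

priors = df["Class"].value_counts(normalize=True)
# priors["A"] -> 0.40, priors["B"] -> 0.60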

Let’s consider the prior probability of P(Data)…

  • P(Data) is not calculated in this example: it is the same for every class, so it cancels out when comparing posteriors across classes and can safely be dropped.

Let’s calculate the likelihood probability of P(Data|Class) using the Normal Distribution plot above for each feature…

  • L(Feature 1=11|Class=A) → closer to 0
  • L(Feature 2=7|Class=A) → 0.65
  • L(Feature 3=22|Class=A) → 0.05
  • L(Feature 1=11|Class=B) → 0.35
  • L(Feature 2=7|Class=B) → 0.20
  • L(Feature 3=22|Class=B) → closer to 0
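Each of these values is the normal density fitted to one class’s feature column, evaluated at the query value. A sketch with scipy, reusing the hypothetical df, query, and features from the earlier snippets (the exact numbers will differ from the rounded values above, but the pattern of near-zero vs. moderate likelihoods is the same):

from scipy.stats import norm

likelihoods = {
    label: {feat: norm.pdf(query[feat], group[feat].mean(), group[feat].std())
            for feat in features}
    for label, group in df.groupby("Class")
}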

Let’s calculate the posterior probability of P(Class|Data) using the Naive Bayes Equation. Since the constant P(Data) has been dropped, the posterior is proportional to the prior times the likelihoods…

  • P(Class=A|Data)

∝ P(Class=A) x (L(Feature 1=11|Class=A) x L(Feature 2=7|Class=A) x L(Feature 3=22|Class=A))

= [0.40 x ((closer to 0) x 0.65 x 0.05)]

Note: If a probability is a small number (close to 0), take the logₑ() of the calculation to avoid underflow; the product of probabilities then becomes a sum of logs.

logₑ(P(Class=A|Data)) = logₑ[0.40 x ((closer to 0) x 0.65 x 0.05)]

= logₑ(0.40) + logₑ(closer to 0) + logₑ(0.65) + logₑ(0.05)

= -0.92 + -101.71 + -0.43 + -3.00 = -106.06

  • P(Class=B|Data)

∝ P(Class=B) x (L(Feature 1=11|Class=B) x L(Feature 2=7|Class=B) x L(Feature 3=22|Class=B))

= [0.60 x (0.35 x 0.20 x (closer to 0))]

Similarly, take the logₑ() of the calculation to avoid underflow.

logₑ(P(Class=B|Data)) = logₑ[0.60 x (0.35 x 0.20 x (closer to 0))]

= logₑ(0.60) + logₑ(0.35) + logₑ(0.20) + logₑ(closer to 0)

= -0.51 + -1.05 + -1.61 + -94.84 = -98.01

Since Class B has the higher log-posterior (-98.01) compared to Class A’s (-106.06) → the query sample is predicted to be Class B!
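Putting the pieces together with the hypothetical objects from the earlier snippets, the whole comparison is a few lines of log-space arithmetic:

import numpy as np

log_post = {
    label: np.log(priors[label])
           + sum(np.log(likelihoods[label][feat]) for feat in features)
    for label in ["A", "B"]
}
prediction = max(log_post, key=log_post.get)  # -> "B" for this data as well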
