# What is Support Vector Machine?

## Section 1: Defining the Model

What is the Algorithm?

Support Vector Machine (SVM) is a supervised machine learning algorithm. SVM’s purpose is to predict the classification of a query sample by relying on labeled input data that are separated into two classes by a margin. Specifically, the data is transformed into a higher dimension, and a support vector classifier is used as a threshold (or hyperplane) to separate the two classes with minimum error.

How Does the Algorithm Work?

Step 1: Transform training data from a low dimension into a higher dimension.

Step 2: Find a Support Vector Classifier [also called Soft Margin…
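To make Step 1 concrete, here is a minimal sketch of why moving to a higher dimension can help. The data, the squared feature, and the `best_threshold_accuracy` helper are all hypothetical choices made for illustration; a real SVM uses a kernel function and an optimized margin rather than this brute-force cutoff search.

```python
import numpy as np

# Hypothetical 1-D data: class 1 sits between two groups of class 0,
# so no single cutoff on x can separate the classes.
x = np.array([-4.0, -3.0, -1.0, 0.0, 1.0, 3.0, 4.0])
y = np.array([0, 0, 1, 1, 1, 0, 0])


def best_threshold_accuracy(values, labels):
    """Best accuracy of any single cutoff on `values`, trying both orientations."""
    best = 0.0
    for cut in values:
        pred = (values >= cut).astype(int)
        acc = max(np.mean(pred == labels), np.mean((1 - pred) == labels))
        best = max(best, float(acc))
    return best


# In the original dimension, the best cutoff still misclassifies some points.
acc_1d = best_threshold_accuracy(x, y)

# Map each point x to (x, x**2). In that 2-D space, a horizontal line on the
# x**2 axis acts as a hyperplane that separates the two classes.
acc_2d = best_threshold_accuracy(x ** 2, y)

print(f"best accuracy in 1-D: {acc_1d:.2f}")
print(f"best accuracy after adding x**2: {acc_2d:.2f}")
```

The squared feature plays the role that a polynomial kernel would play inside an SVM: the classes overlap on the original axis but become separable once the extra dimension is added.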

# What are K-Nearest Neighbors?

## Section 1: Defining the Model

What is the Algorithm?

K-Nearest Neighbors (KNN) is a supervised machine learning and lazy learning algorithm. KNN’s purpose is to predict the classification of a query sample by relying on labeled input data which are separated into several classes. One of the most popular parameters to tune is k, which refers to the number of nearest neighbors included in the majority voting process.

How Does the Algorithm Work?

Step 1: Determine parameter k (number of nearest neighbors).

Step 2: Calculate the distance (e.g., Euclidean distance) between the query sample and all training samples.
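The two steps above, followed by the majority vote mentioned in the definition, can be sketched in a few lines of NumPy. The `knn_predict` function and the toy training data are hypothetical and only meant to illustrate the idea; they are not the implementation used later in this series.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples."""
    # Euclidean distance between the query and every training sample
    distances = np.linalg.norm(X_train - query, axis=1)
    # Indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # Majority vote over the labels of those neighbors
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training data: two small clusters labeled "a" and "b"
X_train = np.array(
    [[1.0, 1.0], [1.5, 2.0], [2.0, 1.0], [6.0, 6.0], [7.0, 7.0], [6.5, 8.0]]
)
y_train = np.array(["a", "a", "a", "b", "b", "b"])

pred_near_a = knn_predict(X_train, y_train, np.array([1.2, 1.1]), k=3)
pred_near_b = knn_predict(X_train, y_train, np.array([6.8, 7.2]), k=3)
print(pred_near_a, pred_near_b)
```

Because KNN is a lazy learner, there is no training step here: all the work happens when a query arrives.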

# What is Logistic Regression?

## Section 1: Defining the Model

What is the Algorithm?

Logistic Regression (LR) is a supervised machine learning algorithm. LR’s purpose is to predict the classification of a query sample (e.g., yes/no). It uses labeled input data and a sigmoid function to predict the probability (between 0 and 1) that an event occurs. To determine the class outcome, a threshold value is selected as the cutoff above which the event is predicted to happen.

How Does the Algorithm Work?

Step 1: Compute a linear combination of the query sample’s features (as in linear regression) to predict the outcome as a continuous value.
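As a rough sketch of how the continuous value from Step 1 becomes a class prediction, the snippet below applies a sigmoid and a 0.5 cutoff, as described in the definition above. The weights, bias, query values, and threshold are hypothetical numbers chosen for illustration; in practice the weights are learned from the labeled data.

```python
import numpy as np


def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


# Hypothetical, already-fitted parameters (assumed for this example)
weights = np.array([0.8, -0.5])
bias = -0.2

# Hypothetical query sample with two features
query = np.array([2.0, 1.0])

# Linear combination of the features gives a continuous value
z = weights @ query + bias

# The sigmoid turns that value into a probability
probability = sigmoid(z)

# A threshold (0.5 here) converts the probability into a class label
threshold = 0.5
predicted_class = int(probability >= threshold)

print(f"z = {z:.2f}, probability = {probability:.3f}, class = {predicted_class}")
```

Raising or lowering `threshold` trades false positives against false negatives without changing the fitted model.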

# Why use Naive Bayes?

## Section 4: Evaluating the Model Tradeoffs

Reference How to Improve Naive Bayes? Section 3: Tuning the Model in Python, prior to continuing…

# A D V A N T A G E S

Q1: Is Naive Bayes a simple or difficult classifier to understand?

Q2: Is Naive Bayes an interpretable classifier or not an interpretable classifier?

Q3: Is Naive Bayes a fast or slow classifier?

Q4: Can Naive Bayes handle missing data or is it sensitive to missing data?

Q5: Does Naive Bayes increase in error as the number of features increases?

Q6: Is Naive Bayes more prone to overfitting or less prone to…

# How to Improve Naive Bayes?

## Section 3: Tuning the Model in Python

Reference How to Implement Naive Bayes? Section 2: Building the Model in Python, prior to continuing…

 Define Grid Search Parameters

`param_grid_nb = {'var_smoothing': np.logspace(0, -9, num=100)}`
• `var_smoothing` is a stability term: a portion of the largest feature variance is added to every variance, which widens (or smooths) the curve and therefore accounts for more samples that are further away from the distribution mean. In this case, `np.logspace` returns 100 values spaced evenly on a log scale; the arguments 0 and -9 are exponents, so the values run from 10^0 (1) down to 10^-9.
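A quick check of what this grid actually contains can help here, since the arguments to `np.logspace` are exponents rather than the values themselves. The variable names below are only for illustration.

```python
import numpy as np

# Same call as in the parameter grid above
var_smoothing_values = np.logspace(0, -9, num=100)

# The first value is 10**0 and the last is 10**-9
first_value = var_smoothing_values[0]
last_value = var_smoothing_values[-1]

# On a log scale the spacing is even: consecutive values share a constant ratio
ratios = var_smoothing_values[1:] / var_smoothing_values[:-1]

print(len(var_smoothing_values), first_value, last_value)
print("constant ratio between neighbors:", ratios[0])
```

Searching on a log scale like this covers nine orders of magnitude with only 100 candidates, which would not be practical with a linear grid.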

Why this step: To set the selected parameters used to find the optimal combination. By referencing the sklearn.naive_bayes.GaussianNB

# How to Implement Naive Bayes?

## Section 2: Building the Model in Python

Reference What is Naive Bayes? Section 1: Defining the Model, prior to continuing…

 Import Libraries

`import numpy as np`
`import matplotlib.pyplot as plt`
`import pandas as pd`
• NumPy is a Python library used for working with arrays.
• Matplotlib is a Python library used for creating static, animated, and interactive visualizations.
• Pandas is a Python library used for providing fast, flexible, and expressive data structures.

Why this step: Python libraries are sets of useful functions that eliminate the need to write code from scratch, especially when developing machine learning, deep learning, data science, data visualization applications, and more!

…

# What is Naive Bayes?

## Section 1: Defining the Model

What is the Algorithm?

Naive Bayes (NB) is a supervised machine learning algorithm. NB’s purpose is to predict the classification of a query sample by relying on labeled input data which are separated into classes. The name "naive" comes from the algorithm’s assumption that the features are independent of one another, and "Bayes" comes from its use of a statistical classification technique called Bayes’ Theorem.

How Does the Algorithm Work?

Step 1: Calculate the Prior Probability for given class labels in training data.

Step 2: Obtain Likelihood Probability with each feature attribute for each class.

Step…
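Steps 1 and 2 can be sketched with simple counting. The toy weather data and variable names below are hypothetical, and the sketch assumes a single categorical feature; it covers only the prior and likelihood calculations named above.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical labeled data: (feature value, class label)
samples = [
    ("sunny", "no"),
    ("sunny", "no"),
    ("rainy", "yes"),
    ("overcast", "yes"),
    ("rainy", "yes"),
    ("sunny", "yes"),
]

# Step 1: prior probability of each class label, P(class)
class_counts = Counter(label for _, label in samples)
priors = {
    label: Fraction(count, len(samples)) for label, count in class_counts.items()
}

# Step 2: likelihood of each feature value within each class, P(feature | class)
pair_counts = Counter(samples)
likelihoods = {
    (feature, label): Fraction(count, class_counts[label])
    for (feature, label), count in pair_counts.items()
}

print("priors:", priors)
print("P(sunny | no):", likelihoods[("sunny", "no")])
print("P(sunny | yes):", likelihoods[("sunny", "yes")])
```

`Fraction` is used only so the printed probabilities stay exact; floating-point division would work just as well.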

# Why use Support Vector Machine?

## Section 4: Evaluating the Model Tradeoffs

Reference How to Improve Support Vector Machine? Section 3: Tuning the Model in Python, prior to continuing…

# A D V A N T A G E S

Q1: Is Support Vector Machine a simple or difficult classifier to understand?

Q2: Can Support Vector Machine solve linear problems or non-linear problems?

Answer: Linear Problems & Non-Linear Problems

Q3: Does Support Vector Machine increase in error as the number of features increases?

Q4: Can Support Vector Machine handle outliers or is it sensitive to outliers?

# D I S A D V A N T A G E S

Q5: Is Support Vector Machine a fast or slow classifier?

Q6: Can Support Vector Machine handle…

# How to Improve Support Vector Machine?

## Section 3: Tuning the Model in Python

Reference How to Implement Support Vector Machine? Section 2: Building the Model in Python, prior to continuing…

 Define Grid Search Parameters

`param_grid_svm = {'C': [0.1, 1, 10, 100], 'gamma': [1, 0.1, 0.01, 0.001], 'kernel': ['linear', 'rbf', 'poly', 'sigmoid'], 'class_weight': ['balanced']}`
• `C` is the penalty parameter of the error term; this parameter controls the trade-off between a smooth decision boundary and classifying the training points correctly. A low C applies a small penalty to misclassified training points, which favors a smoother boundary; a high C applies a large penalty, which pushes the model to classify more training points correctly at the risk of overfitting.
• `gamma` is the kernel coefficient for ‘rbf’, ‘poly’, and ‘sigmoid’; this parameter controls the curvature weight in a decision boundary. …
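To get a feel for how large this search is, the sketch below enumerates every combination in the grid using only the standard library. The 5-fold cross-validation figure is an assumption made for illustration, not a setting taken from this article.

```python
from itertools import product

param_grid_svm = {
    "C": [0.1, 1, 10, 100],
    "gamma": [1, 0.1, 0.01, 0.001],
    "kernel": ["linear", "rbf", "poly", "sigmoid"],
    "class_weight": ["balanced"],
}

# Every combination of one value per parameter is a candidate model
names = list(param_grid_svm)
combinations = [
    dict(zip(names, values)) for values in product(*param_grid_svm.values())
]
n_candidates = len(combinations)

# Assumed number of cross-validation folds (hypothetical)
n_folds = 5
n_fits = n_candidates * n_folds

print(f"{n_candidates} candidates, {n_fits} fits with {n_folds}-fold CV")
print("first candidate:", combinations[0])
```

Since `gamma` only applies to the 'rbf', 'poly', and 'sigmoid' kernels, the candidates that pair the linear kernel with different `gamma` values describe the same model, so part of this grid is redundant work.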

# How to Implement Support Vector Machine?

## Section 2: Building the Model in Python

Reference What is Support Vector Machine? Section 1: Defining the Model, prior to continuing…

 Import Libraries

`import numpy as np`
`import matplotlib.pyplot as plt`
`import pandas as pd`
• NumPy is a Python library used for working with arrays.
• Matplotlib is a Python library used for creating static, animated, and interactive visualizations.
• Pandas is a Python library used for providing fast, flexible, and expressive data structures.

Why this step: Python libraries are sets of useful functions that eliminate the need to write code from scratch, especially when developing machine learning, deep learning, data science, data visualization applications, and more!

## Kopal Jain

Genentech Data Engineer | Harvard Data Science Grad | RPI Biomedical Engineer