What is the Algorithm?

**Support Vector Machine (SVM)** is a **supervised machine learning** algorithm. SVM's purpose is to predict the classification of a query sample by relying on labeled input data separated into two classes by a **margin**. Specifically, the data is transformed into a higher dimension, and a support vector classifier is used as a threshold (or hyperplane) to separate the two classes with minimum error.

How Does the Algorithm Work?

Step 1: Transform training data from a low dimension into a higher dimension.

Step 2: Find a Support Vector Classifier [also called Soft Margin…
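The steps above can be sketched with scikit-learn's `SVC`, where the RBF kernel performs the higher-dimensional transformation implicitly. The toy data and parameter values below are illustrative assumptions, not from the article:

```python
import numpy as np
from sklearn.svm import SVC

# Toy 1-D data that is not linearly separable: class 1 sits between two class-0 clusters
X = np.array([[1], [2], [3], [7], [8], [9], [4.5], [5], [5.5]])
y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1])

# Step 1 happens implicitly: the RBF kernel maps the data into a higher dimension.
# Step 2: fit a soft-margin support vector classifier in that space.
clf = SVC(kernel="rbf", C=1.0, gamma=1.0)
clf.fit(X, y)

print(clf.predict([[5.2], [8.5]]))  # classify two query samples
```

A linear kernel would fail here, since no single threshold on the original axis separates the classes.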

What is the Algorithm?

**K-Nearest Neighbors (KNN)** is a **supervised machine learning** and lazy learning algorithm. KNN's purpose is to predict the classification of a query sample by relying on labeled input data which are separated into several classes. One of the most popular parameters to tune is **k**, which refers to the number of nearest neighbors included in the majority voting process.

How Does the Algorithm Work?

Step 1: Determine parameter k (number of nearest neighbors).

Step 2: Calculate the distance (e.g., Euclidean distance) between the query sample and all training samples.
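Steps 1 and 2 (plus the majority vote that follows) can be sketched with NumPy. The helper name and toy data below are illustrative assumptions:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, query, k=3):
    # Step 2: Euclidean distance from the query sample to every training sample
    dists = np.linalg.norm(X_train - query, axis=1)
    # Take the k nearest neighbors (Step 1's parameter) and majority-vote their labels
    nearest = np.argsort(dists)[:k]
    return Counter(y_train[nearest]).most_common(1)[0][0]

X_train = np.array([[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [9.0, 8.5]])
y_train = np.array([0, 0, 1, 1])

print(knn_predict(X_train, y_train, np.array([1.2, 1.5]), k=3))  # → 0
```

Because KNN is a lazy learner, there is no training step: all work happens at prediction time.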

What is the Algorithm?

**Logistic Regression (LR)** is a **supervised machine learning** algorithm. LR's purpose is to predict the classification of a query sample (e.g., yes/no). It predicts the probability (between 0 and 1) of the event using labeled input data with the help of a sigmoid function. To determine the class outcome, a **threshold** value is selected as a cutoff for an event predicted to happen.

How Does the Algorithm Work?

Step 1: Perform linear regression on the query sample to predict the outcome as a continuous value.
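Step 1 and the sigmoid/threshold steps that follow can be illustrated with hypothetical coefficients (in practice, the solver fits the weights and the sigmoid jointly):

```python
import numpy as np

def sigmoid(z):
    # squashes any continuous value into a probability between 0 and 1
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical coefficients, as if produced by the linear regression in Step 1
w, b = 0.8, -4.0

x = 6.0                 # query sample
z = w * x + b           # Step 1: continuous linear prediction
p = sigmoid(z)          # probability of the positive class
label = int(p >= 0.5)   # threshold cutoff (0.5 here)
print(round(p, 3), label)  # → 0.69 1
```

Moving the threshold above or below 0.5 trades off false positives against false negatives for the same fitted model.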

*Reference **How to Improve Naive Bayes? Section 3: Tuning the Model in Python**, prior to continuing…*

Q1: Is Naive Bayes a simple or difficult classifier to understand?

Answer: Simple

Q2: Is Naive Bayes an interpretable classifier or not an interpretable classifier?

Answer: Interpretable

Q3: Is Naive Bayes a fast or slow classifier?

Answer: Fast

Q4: Can Naive Bayes handle missing data, or is it sensitive to missing data?

Answer: Handle Missing Data

Q5: Does Naive Bayes increase in error as the number of features increases?

Answer: No Curse of Dimensionality

Q6: Is Naive Bayes more prone to overfitting or less prone to…

*Reference **How to Implement Naive Bayes? Section 2: Building the Model in Python**, prior to continuing…*

[10] Define Grid Search Parameters

`param_grid_nb = {`

`    'var_smoothing': np.logspace(0, -9, num=100)`

`}`

`var_smoothing` is a stability parameter that widens (or smooths) the curve and therefore accounts for more samples that are further away from the distribution mean. In this case, np.logspace returns numbers spaced evenly on a log scale, from 10^0 (= 1) down to 10^-9, generating 100 samples.

Why this step: To set the selected parameters used to find the optimal combination. By referencing the sklearn.naive_bayes.GaussianNB …
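A minimal sketch of how such a grid is typically passed to scikit-learn's GridSearchCV; the synthetic data from `make_classification` stands in for the article's dataset:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import make_classification

param_grid_nb = {'var_smoothing': np.logspace(0, -9, num=100)}

# Synthetic stand-in data (the article uses its own dataset)
X, y = make_classification(n_samples=200, n_features=5, random_state=42)

# Try each of the 100 smoothing values with 5-fold cross-validation
grid = GridSearchCV(GaussianNB(), param_grid_nb, cv=5, scoring='accuracy')
grid.fit(X, y)
print(grid.best_params_['var_smoothing'], grid.best_score_)
```

`grid.best_params_` then holds the smoothing value with the highest cross-validated accuracy.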

*Reference **What is Naive Bayes? Section 1: Defining the Model**, prior to continuing…*

[1] Import Libraries

`import numpy as np`

`import matplotlib.pyplot as plt`

`import pandas as pd`

**NumPy** is a Python library used for working with arrays. **Matplotlib** is a Python library used for creating static, animated, and interactive visualizations. **Pandas** is a Python library used for providing fast, flexible, and expressive data structures.

Why this step: **Python Libraries** are sets of useful functions that eliminate the need to write code from scratch, especially when developing machine learning, deep learning, data science, and data visualization applications!

[2]…

What is the Algorithm?

**Naive Bayes (NB)** is a **supervised machine learning** algorithm. NB's purpose is to predict the classification of a query sample by relying on labeled input data which are separated into classes. The name *naive* stems from the algorithm's assumption that the features are independent of one another, and *Bayes* stems from its use of a statistical classification technique called **Bayes' Theorem**.

How Does the Algorithm Work?

Step 1: Calculate the Prior Probability for given class labels in training data.

Step 2: Obtain Likelihood Probability with each feature attribute for each class.

Step…
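The prior and likelihood steps above can be sketched from scratch for a single Gaussian feature. The toy data and function names are illustrative assumptions:

```python
import numpy as np

# Tiny one-feature training set with two classes
X = np.array([1.0, 1.2, 0.9, 5.0, 5.2, 4.8])
y = np.array([0, 0, 0, 1, 1, 1])

def gaussian_pdf(x, mean, var):
    # likelihood of x under a normal distribution with the given mean/variance
    return np.exp(-(x - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def nb_predict(query):
    posteriors = {}
    for c in np.unique(y):
        Xc = X[y == c]
        prior = len(Xc) / len(X)                               # Step 1: prior probability
        likelihood = gaussian_pdf(query, Xc.mean(), Xc.var())  # Step 2: likelihood per class
        posteriors[c] = prior * likelihood                     # proportional to the posterior
    return max(posteriors, key=posteriors.get)

print(nb_predict(1.1), nb_predict(4.9))  # → 0 1
```

With several features, the "naive" independence assumption lets the per-feature likelihoods simply be multiplied together.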

*Reference **How to Improve Support Vector Machine? Section 3: Tuning the Model in Python**, prior to continuing…*

Q1: Is Support Vector Machine a simple or difficult classifier to understand?

Answer: Simple

Q2: Can Support Vector Machine solve linear problems or non-linear problems?

Answer: Linear Problems & Non-Linear Problems

Q3: Does Support Vector Machine increase in error as the number of features increases?

Answer: No Curse of Dimensionality

Q4: Can Support Vector Machine handle outliers, or is it sensitive to outliers?

Answer: Handle Outliers

Q5: Is Support Vector Machine a fast or slow classifier?

Answer: Slow

Q6: Can Support Vector Machine handle…

*Reference **How to Implement Support Vector Machine? Section 2: Building the Model in Python**, prior to continuing…*

[10] Define Grid Search Parameters

`param_grid_svm = {`

`    'C': [0.1, 1, 10, 100],`

`    'gamma': [1, 0.1, 0.01, 0.001],`

`    'kernel': ['linear', 'rbf', 'poly', 'sigmoid'],`

`    'class_weight': ['balanced']`

`}`

`C` is the penalty parameter of the error term; this parameter controls the trade-off between a smooth decision boundary and classifying the training points correctly. Therefore, a low C tolerates more misclassified training points (a smoother boundary), while a high C penalizes misclassification heavily (a closer fit to the training data).

`gamma` is the kernel coefficient for ‘rbf’, ‘poly’, and ‘sigmoid’; this parameter controls the curvature weight in a decision boundary. …
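A minimal sketch of searching this grid with scikit-learn's GridSearchCV; the synthetic data from `make_classification` is a stand-in for the article's dataset:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import make_classification

param_grid_svm = {
    'C': [0.1, 1, 10, 100],
    'gamma': [1, 0.1, 0.01, 0.001],
    'kernel': ['linear', 'rbf', 'poly', 'sigmoid'],
    'class_weight': ['balanced'],
}

# Synthetic stand-in data (the article uses its own dataset)
X, y = make_classification(n_samples=150, n_features=4, random_state=0)

# Exhaustively try all 64 C/gamma/kernel combinations with 3-fold cross-validation
grid = GridSearchCV(SVC(), param_grid_svm, cv=3, scoring='accuracy')
grid.fit(X, y)
print(grid.best_params_)
```

Note that `gamma` is ignored for the 'linear' kernel, so some of the 64 combinations are effectively duplicates.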

*Reference **What is Support Vector Machine? Section 1: Defining the Model**, prior to continuing…*

[1] Import Libraries

`import numpy as np`

`import matplotlib.pyplot as plt`

`import pandas as pd`

**NumPy** is a Python library used for working with arrays. **Matplotlib** is a Python library used for creating static, animated, and interactive visualizations. **Pandas** is a Python library used for providing fast, flexible, and expressive data structures.

Why this step: **Python Libraries** are sets of useful functions that eliminate the need to write code from scratch, especially when developing machine learning, deep learning, data science, and data visualization applications!

…

Genentech Data Engineer | Harvard Data Science Grad | RPI Biomedical Engineer