
Machine Learning: Fundamentals

Topics Covered

  • 1.1 Introduction
  • 1.2 Feature Type Representation
  • 1.3 Data Quality
  • 1.4 Natural Language Processing

1.1 Introduction

What is Machine Learning?
  • ML is a type of AI that gives computers the ability to learn without being explicitly programmed.
  • ML focuses on the development of computer programs that can change when exposed to new data.
  • Beware: the field is full of imprecise words that play on our understanding of "learning" & consciousness.

Data Types

  • Nominal
    • Feature Type: Discrete
    • Transformation: Any **permutation of values
    • Representation: One hot encoding
    • Comments: If all employee ID numbers were re-assigned, would it make a difference?

    ** permutation - a reordering of the values; any reordering is a valid transformation for nominal data because the order does NOT carry meaning

  • Ordinal
    • Feature Type: Discrete
    • Transformation: An order-preserving change of values, i.e., new_value = f(old_value) where f is a **monotonic function.
    • Representation: Integer
    • Comments: An attribute encompassing the notion of good, better, best can be represented equally well by the values {1,2,3} or by {0.5, 1, 10}

    ** monotonic - a function that is entirely non-decreasing or entirely non-increasing

  • Interval
    • Feature Type: continuous
    • Transformation: new_value = a * old_value + b, where a & b are constants
    • Representation: float
    • Comments: Thus the $F^{o}$ & $C^{o}$ temperature scales differ in terms of where their zero value is & the size of a unit (degree)
  • Ratio
    • Feature Type: continuous
    • Transformation: new_value = a * old_value
    • Representation: float
    • Comments: length can be measured in meters or feet

One-hot encoding is a process by which categorical variables are converted into a numeric form that ML algorithms can use to do a better job in prediction.
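
A minimal sketch of one-hot encoding in plain Python (the category set and helper name are illustrative, not from the original notes):

```python
# One-hot encode a small categorical feature by hand.
categories = ["red", "green", "blue"]             # illustrative category set
index = {c: i for i, c in enumerate(categories)}

def one_hot(value):
    """Return a 0/1 vector with a single 1 at the category's index."""
    vec = [0] * len(categories)
    vec[index[value]] = 1
    return vec

print(one_hot("green"))  # [0, 1, 0]
```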


  • A tensor is a multi-dimensional array of data.

It has 4 main attributes:

  • Rank: # of dimensions
  • Shape: # of elements in each dimension
  • Data Type: type of data in the tensor
  • Device: where the tensor is stored (CPU/GPU)

Example:

two_dim_tensor = [
    [1, 2, 3],   # row 0
    [4, 5, 6],   # row 1
]                # rank 2, shape (2, 3)
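
A short sketch of inspecting these four attributes with PyTorch (assuming the torch library; the notes don't name a specific framework):

```python
import torch

t = torch.tensor([[1, 2, 3],
                  [4, 5, 6]])

print(t.ndim)    # rank: 2
print(t.shape)   # shape: torch.Size([2, 3])
print(t.dtype)   # data type: torch.int64
print(t.device)  # device: cpu (or cuda:0 if moved to a GPU)
```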

Batch Size
  • The batch size is the # of samples that will be passed through the network at one time.
  • "batch" aka "mini-batch"
  • Larger batches process more samples per step and can speed up training on parallel hardware, but larger is not automatically better for generalization.
Epoch
  • The # of times the full training dataset is passed to the neural network.
  • In terms of artificial neural nets, an epoch refers to 1 cycle through the full training dataset.
  • In other words, if we feed a neural net the training data for more than 1 epoch in different patterns, we hope for better generalization when given a new "unseen" input.
  • An iteration is the # of batches, or steps through partitioned packets of the training data, needed to complete 1 epoch.
Batch Size Vs. Epoch
  • Given 1,000 pictures of dogs: batch size = 10 & 1 epoch = 100 batches
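
The dog-picture example as arithmetic (a minimal sketch; the variable names are mine):

```python
num_samples = 1_000   # pictures of dogs
batch_size = 10

iterations_per_epoch = num_samples // batch_size  # batches per epoch
print(iterations_per_epoch)  # 100
```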

Objective Function
  • The mathematical formula or metric that a model aims to optimize (often by reducing error)
    • Ex: The obj. function for linear regression is usually squared loss. Therefore, when training a linear regression model, the goal is to minimize squared loss.
  • In some cases the goal is to maximize the obj. function (ex: if the obj func is accuracy, the goal is to maximize accuracy)
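
A minimal sketch of squared loss as a linear-regression objective (NumPy; the numbers are illustrative):

```python
import numpy as np

def squared_loss(y_true, y_pred):
    """Mean squared error: the quantity a linear regression is trained to minimize."""
    return np.mean((y_true - y_pred) ** 2)

y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.5, 5.5, 6.0])
print(squared_loss(y_true, y_pred))  # 0.5
```
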
Activation Function
  • A function (ex: ReLU/sigmoid) that takes in the weighted sum of all the inputs from the previous layer & then generates & passes an output value (typically nonlinear) to the next layer.
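
A minimal sketch of the two activation functions named above (NumPy; the input values are illustrative):

```python
import numpy as np

def relu(z):
    """ReLU: pass positive values through, clamp negatives to 0."""
    return np.maximum(0.0, z)

def sigmoid(z):
    """Sigmoid: squash a weighted sum into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-2.0, 0.0, 3.0])   # weighted sums from the previous layer
print(relu(z))     # [0. 0. 3.]
print(sigmoid(z))  # [~0.119  0.5  ~0.953]
```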

Normalization
  • The process of converting an actual range of values into a standard range of values, typically -1 to +1 or 0 to 1.
    • Ex: Suppose the natural range of a certain feature is 800 to 8,000. Through subtraction & division you can normalize these values into the range -1 to +1.
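
A minimal sketch of the 800-to-8,000 example (the function name is mine):

```python
def normalize(x, lo=800.0, hi=8000.0):
    """Min-max scale a value from [lo, hi] into [-1, +1] via subtraction and division."""
    return 2.0 * (x - lo) / (hi - lo) - 1.0

print(normalize(800))    # -1.0
print(normalize(4400))   #  0.0  (the midpoint of the range)
print(normalize(8000))   #  1.0
```
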
Orthogonal
  • Two rows (or vectors) are orthogonal if multiplying them together (taking their dot product) gives zero; in an orthogonal eigenbasis, the covariance is nonzero only on the diagonal.
  • So our eigenvectors are all orthogonal; if we put them together as the columns of a matrix, that would be an orthogonal matrix.
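
A small NumPy check of this claim, assuming a symmetric covariance-like matrix (the matrix values are illustrative):

```python
import numpy as np

cov = np.array([[2.0, 1.0],
                [1.0, 2.0]])            # symmetric, covariance-like

eigenvalues, Q = np.linalg.eigh(cov)    # columns of Q are eigenvectors

print(Q[:, 0] @ Q[:, 1])                # ~0.0: the eigenvectors are orthogonal
print(np.round(Q.T @ Q, 6))             # identity: Q is an orthogonal matrix
print(np.round(Q.T @ cov @ Q, 6))       # diagonal: covariance only on the diagonal
```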

Cost-Sensitive Measures
  • Binary Classification
    • Precision: $p = \frac{a}{a+c}$
    • Recall: $r = \frac{a}{a+b}$
    • F-measure: $F = \frac{2rp}{r+p} = \frac{2a}{2a+b+c}$ --> higher F1 == lower false neg. & false pos.
    • (Here $a$ = true positives, $b$ = false negatives, $c$ = false positives.)
  • Multi-Class
    • Micro

      • $Precision_{micro} = \frac{TP_{1}+...+TP_{k}}{TP_{1}+...+TP_{k}+FP_{1}+...+FP_{k}}$
      • $Recall_{micro} = \frac{TP_{1}+...+TP_{k}}{TP_{1}+...+TP_{k}+FN_{1}+...+FN_{k}}$
      • $F1_{micro} = \frac{2(TP_{1}+...+TP_{k})}{2(TP_{1}+...+TP_{k})+FP_{1}+...+FP_{k}+FN_{1}+...+FN_{k}}$
    • Macro

      • $X_{macro} = \frac{X_{1}+...+X_{k}}{k}$, where $X$ stands for precision, recall, or F1
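
A minimal sketch computing the micro and macro averages from per-class counts (the counts are illustrative):

```python
# Per-class (TP, FP, FN) counts for k = 2 classes -- illustrative numbers.
counts = [(40, 10, 5), (20, 5, 10)]

# Micro: pool the counts across classes, then compute the metric once.
TP = sum(tp for tp, fp, fn in counts)
FP = sum(fp for tp, fp, fn in counts)
FN = sum(fn for tp, fp, fn in counts)
p_micro = TP / (TP + FP)                # 60 / 75 = 0.8
r_micro = TP / (TP + FN)                # 60 / 75 = 0.8
f1_micro = 2 * TP / (2 * TP + FP + FN)  # 120 / 150 = 0.8
print(p_micro, r_micro, f1_micro)

# Macro: compute the metric per class, then take the unweighted mean.
p_macro = sum(tp / (tp + fp) for tp, fp, fn in counts) / len(counts)
r_macro = sum(tp / (tp + fn) for tp, fp, fn in counts) / len(counts)
print(p_macro, r_macro)
```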