
Machine Learning: Fundamentals

Topics Covered

  • 1.1 Introduction
  • 1.2 Feature Type Representation
  • 1.3 Data Quality
  • 1.4 Natural Language Processing

1.1 Introduction

What is Machine Learning?
  • ML is a type of AI that gives computers the ability to learn without being explicitly programmed.
  • ML focuses on the development of computer programs that can change when exposed to new data.
  • Beware: the field is full of imprecise words that play on our understanding of "learning" & consciousness.

Data Types

  • Nominal
    • Feature Type: Discrete
    • Transformation: Any **permutation of values
    • Representation: One hot encoding
    • Comments: If all employee ID numbers were re-assigned, would it make a difference?

    ** permutation - a reordering of the values; any reordering is a valid transformation for nominal data because the order does NOT carry meaning

  • Ordinal
    • Feature Type: Discrete
    • Transformation: An order-preserving change of values, i.e., new_value = f(old_value) where f is a **monotonic function.
    • Representation: Integer
    • Comments: An attribute encompassing the notion of good, better, best can be represented equally well by the values {1,2,3} or by {0.5, 1, 10}

    ** monotonic - a function that is entirely non-decreasing or entirely non-increasing

  • Interval
    • Feature Type: continuous
    • Transformation: new_value = a * old_value + b, where a & b are constants
    • Representation: float
    • Comments: Thus the $F^{o}$ & $C^{o}$ temperature scales differ in terms of where their zero value is & the size of a unit (degree)
  • Ratio
    • Feature Type: continuous
    • Transformation: new_value = a * old_value
    • Representation: float
    • Comments: length can be measured in meters or feet

One-hot encoding is a process by which categorical variables are converted into a numeric form that ML algorithms can use to do a better job in prediction.
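
A minimal sketch of one-hot encoding in plain Python (the category set and helper name are illustrative, not from the original notes):

```python
# One-hot encode a small categorical feature by hand.
categories = ["red", "green", "blue"]             # illustrative category set
index = {c: i for i, c in enumerate(categories)}

def one_hot(value):
    """Return a 0/1 vector with a single 1 at the category's index."""
    vec = [0] * len(categories)
    vec[index[value]] = 1
    return vec

print(one_hot("green"))  # [0, 1, 0]
```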


  • A tensor is a multi-dimensional array of data.

It has 4 main attributes:

  • Rank: # of dimensions
  • Shape: # of elements in each dimension
  • Data Type: type of data in the tensor
  • Device: where the tensor is stored (CPU/GPU)

Example:

two_dim_tensor = [
    [1, 2, 3],   # row 0
    [4, 5, 6],   # row 1
]                # rank 2, shape (2, 3)
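
A short sketch of inspecting these four attributes with PyTorch (assuming the torch library; the notes don't name a specific framework):

```python
import torch

t = torch.tensor([[1, 2, 3],
                  [4, 5, 6]])

print(t.ndim)    # rank: 2
print(t.shape)   # shape: torch.Size([2, 3])
print(t.dtype)   # data type: torch.int64
print(t.device)  # device: cpu (or cuda:0 if moved to a GPU)
```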

Batch Size
  • The batch size is the # of samples that will be passed through the network at one time.
  • "batch" aka "mini-batch"
  • Larger batches process more samples per step and can speed up training on parallel hardware, but larger is not automatically better for generalization.
Epoch
  • The # of times the full training dataset is passed to the neural network.
  • In terms of artificial neural nets, an epoch refers to 1 cycle through the full training dataset.
  • In other words, if we feed a neural net the training data for more than 1 epoch in different patterns, we hope for better generalization when given a new "unseen" input.
  • An iteration is the # of batches, or steps through partitioned packets of the training data, needed to complete 1 epoch.
Batch Size Vs. Epoch
  • Given 1,000 pictures of dogs: batch size = 10 & 1 epoch = 100 batches
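
The dog-picture example as arithmetic (a minimal sketch; the variable names are mine):

```python
num_samples = 1_000   # pictures of dogs
batch_size = 10

iterations_per_epoch = num_samples // batch_size  # batches per epoch
print(iterations_per_epoch)  # 100
```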

Objective Function
  • The mathematical formula or metric that a model aims to optimize (often by reducing error)
    • Ex: The obj. function for linear regression is usually squared loss. Therefore, when training a linear regression model, the goal is to minimize squared loss.
  • In some cases the goal is to maximize the obj. function (ex: if the obj func is accuracy, the goal is to maximize accuracy)
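
A minimal sketch of squared loss as a linear-regression objective (NumPy; the numbers are illustrative):

```python
import numpy as np

def squared_loss(y_true, y_pred):
    """Mean squared error: the quantity a linear regression is trained to minimize."""
    return np.mean((y_true - y_pred) ** 2)

y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.5, 5.5, 6.0])
print(squared_loss(y_true, y_pred))  # 0.5
```
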
Activation Function
  • A function (ex: ReLU/sigmoid) that takes in the weighted sum of all the inputs from the previous layer & then generates & passes an output value (typically nonlinear) to the next layer.
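
A minimal sketch of the two activation functions named above (NumPy; the input values are illustrative):

```python
import numpy as np

def relu(z):
    """ReLU: pass positive values through, clamp negatives to 0."""
    return np.maximum(0.0, z)

def sigmoid(z):
    """Sigmoid: squash a weighted sum into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-2.0, 0.0, 3.0])   # weighted sums from the previous layer
print(relu(z))     # [0. 0. 3.]
print(sigmoid(z))  # [~0.119  0.5  ~0.953]
```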

Normalization
  • The process of converting an actual range of values into a standard range of values, typically -1 to +1 or 0 to 1.
    • Ex: Suppose the natural range of a certain feature is 800 to 8,000. Through subtraction & division you can normalize these values into the range -1 to +1.
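
A minimal sketch of the 800-to-8,000 example (the function name is mine):

```python
def normalize(x, lo=800.0, hi=8000.0):
    """Min-max scale a value from [lo, hi] into [-1, +1] via subtraction and division."""
    return 2.0 * (x - lo) / (hi - lo) - 1.0

print(normalize(800))    # -1.0
print(normalize(4400))   #  0.0  (the midpoint of the range)
print(normalize(8000))   #  1.0
```
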
Orthogonal
  • Two rows (or vectors) are orthogonal if multiplying them together (taking their dot product) gives zero; in an orthogonal eigenbasis, the covariance is nonzero only on the diagonal.
  • So our eigenvectors are all orthogonal; if we put them together as the columns of a matrix, that would be an orthogonal matrix.
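
A small NumPy check of this claim, assuming a symmetric covariance-like matrix (the matrix values are illustrative):

```python
import numpy as np

cov = np.array([[2.0, 1.0],
                [1.0, 2.0]])            # symmetric, covariance-like

eigenvalues, Q = np.linalg.eigh(cov)    # columns of Q are eigenvectors

print(Q[:, 0] @ Q[:, 1])                # ~0.0: the eigenvectors are orthogonal
print(np.round(Q.T @ Q, 6))             # identity: Q is an orthogonal matrix
print(np.round(Q.T @ cov @ Q, 6))       # diagonal: covariance only on the diagonal
```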

Cost-Sensitive Measures
  • Binary Classification
    • Precision: $p = \frac{a}{a+c}$
    • Recall: $r = \frac{a}{a+b}$
    • F-measure: $F = \frac{2rp}{r+p} = \frac{2a}{2a+b+c}$ --> higher F1 == lower false neg. & false pos.
    • (Here $a$ = true positives, $b$ = false negatives, $c$ = false positives.)
  • Multi-Class
    • Micro

      • $Precision_{micro} = \frac{TP_{1}+...+TP_{k}}{TP_{1}+...+TP_{k}+FP_{1}+...+FP_{k}}$
      • $Recall_{micro} = \frac{TP_{1}+...+TP_{k}}{TP_{1}+...+TP_{k}+FN_{1}+...+FN_{k}}$
      • $F1_{micro} = \frac{2(TP_{1}+...+TP_{k})}{2(TP_{1}+...+TP_{k})+FP_{1}+...+FP_{k}+FN_{1}+...+FN_{k}}$
    • Macro

      • $X_{macro} = \frac{X_{1}+...+X_{k}}{k}$, where $X$ stands for precision, recall, or F1
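
A minimal sketch computing the micro and macro averages from per-class counts (the counts are illustrative):

```python
# Per-class (TP, FP, FN) counts for k = 2 classes -- illustrative numbers.
counts = [(40, 10, 5), (20, 5, 10)]

# Micro: pool the counts across classes, then compute the metric once.
TP = sum(tp for tp, fp, fn in counts)
FP = sum(fp for tp, fp, fn in counts)
FN = sum(fn for tp, fp, fn in counts)
p_micro = TP / (TP + FP)                # 60 / 75 = 0.8
r_micro = TP / (TP + FN)                # 60 / 75 = 0.8
f1_micro = 2 * TP / (2 * TP + FP + FN)  # 120 / 150 = 0.8
print(p_micro, r_micro, f1_micro)

# Macro: compute the metric per class, then take the unweighted mean.
p_macro = sum(tp / (tp + fp) for tp, fp, fn in counts) / len(counts)
r_macro = sum(tp / (tp + fn) for tp, fp, fn in counts) / len(counts)
print(p_macro, r_macro)
```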