Deep Learning: Fundamentals

Generative modeling is a branch of Machine Learning that deals with the creation of models that can generate new data points that are similar to the training data.

Deep Learning is a class of Machine Learning algorithms that use multiple stacked layers of processing units to learn high-level representations from unstructured data.

A Neural Network consists of a series of stacked layers. Each layer contains units that are connected to the previous layer's units through a set of weights. You'll come to find that there are many types of layers, but the most common is the fully-connected (dense) layer, which connects every unit in the layer directly to every unit in the previous layer.

  • Neural Networks where all adjacent layers are fully-connected are Multi-Layer Perceptrons (MLPs).
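The forward pass through a small MLP can be sketched in plain NumPy. This is an illustration of the idea, not any library's API; the layer sizes and the `dense` helper are hypothetical.

```python
import numpy as np

def dense(x, W, b, activation=np.tanh):
    # A fully-connected layer: every output unit is a weighted sum
    # of every input unit, followed by an activation function.
    return activation(x @ W + b)

rng = np.random.default_rng(0)
# Hypothetical sizes: 4 inputs -> 8 hidden units -> 3 outputs.
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 3)), np.zeros(3)

x = rng.normal(size=(2, 4))                    # a batch of 2 observations
h = dense(x, W1, b1)                           # hidden layer
y = dense(h, W2, b2, activation=lambda z: z)   # linear output layer
```

Because every unit connects to every unit in the previous layer, the weights of each dense layer form a single matrix, and the whole layer is one matrix multiplication.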

A Sequential model is useful for quickly defining a stack of layers (i.e. where one layer follows on directly from the previous layer without any branching).

  • The Functional API is recommended for more complex architectures (e.g. branching, or multiple inputs and outputs).
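A sequential stack with no branching is just function composition: each layer's output feeds straight into the next layer. A minimal sketch, with toy stand-in "layers":

```python
from functools import reduce

def sequential(layers, x):
    # Apply each layer to the output of the previous one, in order.
    return reduce(lambda acc, layer: layer(acc), layers, x)

# Hypothetical layer functions standing in for real network layers.
layers = [lambda x: x * 2, lambda x: x + 1, lambda x: x ** 2]
out = sequential(layers, 3)  # ((3 * 2) + 1) ** 2 = 49
```

As soon as a layer needs more than one input, or its output feeds more than one downstream layer, this simple composition no longer fits, which is why branching models call for a functional-style API.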

The loss function is used by the neural-net to compare its predicted output to the ground truth. It returns a single number for each observation: the greater the number, the worse the network has performed for this observation.

  • If your neural net is designed to solve a regression problem (i.e. the output is continuous), then you might use Mean Squared Error.
  • If you're working on a classification problem where each observation belongs to exactly one class ➡ Categorical Cross-Entropy.
  • If you're working on a binary classification problem with one output unit, or a multi-label problem where each observation can belong to multiple classes simultaneously ➡ Binary Cross-Entropy.
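The three losses above can each be written in a few lines of NumPy. These are illustrative sketches (the `eps` term guards against `log(0)`), not a framework's implementation:

```python
import numpy as np

def mse(y_true, y_pred):
    # Regression: mean squared difference per observation.
    return np.mean((y_true - y_pred) ** 2, axis=-1)

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    # Single-class classification: y_true is one-hot,
    # y_pred are class probabilities summing to 1.
    return -np.sum(y_true * np.log(y_pred + eps), axis=-1)

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    # Binary or multi-label: each output is an independent probability.
    return -np.mean(y_true * np.log(y_pred + eps)
                    + (1 - y_true) * np.log(1 - y_pred + eps), axis=-1)
```

In every case a larger value means the prediction was further from the ground truth, matching the description above.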

The optimizer is the algorithm that will be used to update the weights in the neural net based on the gradient of the loss function.

  • You shouldn't have to tweak the optimizer's parameters, except for the learning rate. The greater the learning rate, the larger the change in weights at each training step.

To train the model against the data, call the fit method.

  • batch size determines how many observations will be passed to the network at each training step.
  • epochs determines how many times the network will be shown the full training data (trained on the entire training set).
  • If shuffle is true, batches will be drawn randomly without replacement from the training data at each training step.
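The interaction of batch size, epochs, and shuffling can be sketched as a mini-batch generator. This is an illustration of the training loop's bookkeeping, not a library's `fit` method; the function name is hypothetical:

```python
import numpy as np

def minibatches(X, y, batch_size, epochs, shuffle=True, seed=0):
    # Yield (epoch, X_batch, y_batch). With shuffle=True, indices are
    # permuted once per epoch, so each observation is drawn exactly
    # once per epoch -- i.e. without replacement.
    rng = np.random.default_rng(seed)
    n = len(X)
    for epoch in range(epochs):
        idx = np.arange(n)
        if shuffle:
            rng.shuffle(idx)
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            yield epoch, X[batch], y[batch]

X, y = np.arange(10).reshape(10, 1), np.arange(10)
batches = list(minibatches(X, y, batch_size=4, epochs=2))  # 3 batches x 2 epochs
```

Each epoch thus shows the network the full training set exactly once, split into batches of (at most) batch size observations.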