Large Language Models

Large Language Model (LLM): A very large deep learning model that is pre-trained on vast amounts of data. The underlying transformer is a set of neural networks consisting of an encoder and a decoder with self-attention capabilities. The encoder and decoder extract meaning from a sequence of text and learn the relationships between the words and phrases in it.

Context Window: The amount of text, measured in tokens, that the model can consider when predicting the next token in the sequence.
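
A minimal sketch of the idea, treating whitespace-separated words as tokens purely for illustration (real models use subword tokenizers): only the most recent tokens inside the window are available when predicting the next one.

```python
# Illustrative only: "tokens" here are whitespace-split words.
def truncate_to_context(text: str, context_window: int = 8) -> list[str]:
    """Keep only the most recent `context_window` tokens the model may attend to."""
    tokens = text.split()
    return tokens[-context_window:]

history = "the quick brown fox jumps over the lazy dog near the river bank"
print(truncate_to_context(history))
# Only the last 8 tokens are visible when predicting the next token.
```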

Fine-Tuning: The process of taking a pre-trained model and training it further on a new dataset.
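A minimal fine-tuning sketch in PyTorch, using a toy model and synthetic data (both stand-ins, not from the source) so the loop runs end to end; in practice the weights would come from a pretrained checkpoint and the data from the new task.

```python
import torch
import torch.nn as nn

vocab, seq_len = 1000, 4
# Toy stand-in for a pretrained model; real code would load actual pretrained weights,
# e.g. model.load_state_dict(torch.load("pretrained.pt"))  # hypothetical path
model = nn.Sequential(nn.Embedding(vocab, 64), nn.Flatten(), nn.Linear(64 * seq_len, vocab))

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)  # small LR preserves pretrained knowledge
loss_fn = nn.CrossEntropyLoss()

inputs = torch.randint(0, vocab, (32, seq_len))   # synthetic "new dataset" examples
targets = torch.randint(0, vocab, (32,))          # next-token labels

for _ in range(3):                                # a few passes over the new data
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)        # continue training from the pretrained state
    loss.backward()
    optimizer.step()
```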

Hallucination: When a model generates text that sounds plausible but is factually incorrect, fabricated, or unsupported by the input.

Inference: The process of generating predictions from a trained model.
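A minimal inference sketch, again with a toy stand-in model: the trained network is run forward without gradients, and the highest-scoring token is taken as the prediction.

```python
import torch
import torch.nn as nn

vocab, seq_len = 1000, 4
model = nn.Sequential(nn.Embedding(vocab, 64), nn.Flatten(), nn.Linear(64 * seq_len, vocab))
model.eval()                                      # inference mode: no training-specific behavior

prompt = torch.randint(0, vocab, (1, seq_len))    # stand-in for a tokenized prompt
with torch.no_grad():                             # no gradients needed at inference time
    logits = model(prompt)                        # unnormalized next-token scores
    next_token = logits.argmax(dim=-1)            # greedy choice: highest-scoring token
print(next_token.item())
```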

Transformer: A neural network architecture designed to handle sequential data, using self-attention to weigh the relationships between elements of the sequence.
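
A minimal sketch of scaled dot-product self-attention, the core operation that lets a transformer relate every position in a sequence to every other; the dimensions and random weights here are purely illustrative.

```python
import math
import torch

def self_attention(x: torch.Tensor, w_q, w_k, w_v) -> torch.Tensor:
    q, k, v = x @ w_q, x @ w_k, x @ w_v                        # project tokens to queries/keys/values
    scores = q @ k.transpose(-2, -1) / math.sqrt(k.size(-1))   # similarity of every pair of positions
    weights = torch.softmax(scores, dim=-1)                     # attention weights sum to 1 per query
    return weights @ v                                          # each output mixes values from all positions

d = 16
x = torch.randn(1, 5, d)                                        # batch of 1, sequence of 5 tokens
w_q, w_k, w_v = (torch.randn(d, d) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)                   # torch.Size([1, 5, 16])
```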

Temperature: A hyperparameter that controls the randomness of the predictions generated by the model (a short sketch follows this list):

  • Lower temperature values result in more deterministic predictions.
  • Higher temperature values result in more random predictions.
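
A minimal sketch of how temperature is typically applied: the model's output scores (logits) are divided by the temperature before the softmax, so low values concentrate probability on the top token and high values spread it out. The logits here are made up for illustration.

```python
import torch

logits = torch.tensor([2.0, 1.0, 0.5, 0.1])            # toy next-token scores

for temperature in (0.2, 1.0, 2.0):
    probs = torch.softmax(logits / temperature, dim=-1)  # scale logits, then normalize
    print(temperature, [round(p, 3) for p in probs.tolist()])
# At 0.2 almost all probability sits on the top token; at 2.0 the choices are far more even.
```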