Large Language Models
Large Language Models (LLM): Very large deep learning models that are pre-trained on vast amounts of data. The underlying transformer is a set of neural networks consisting of an encoder and a decoder with self-attention capabilities. The encoder and decoder extract meaning from a sequence of text and capture the relationships between the words and phrases in it.
Context Window: The maximum number of tokens (word pieces) the model can take into account when predicting the next token in the sequence.
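As a rough illustration of tokens versus words, the sketch below uses the Hugging Face transformers library as an assumption; the "gpt2" checkpoint and the example sentence are only placeholders.

```python
# Minimal sketch of counting tokens against a model's context window.
# Assumes the Hugging Face transformers library; "gpt2" is just an example checkpoint.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

text = "Large language models read text as tokens, not whole words."
token_ids = tokenizer(text).input_ids

print("words:  ", len(text.split()))                  # word count of the raw string
print("tokens: ", len(token_ids))                     # token count seen by the model
print("context window:", tokenizer.model_max_length)  # e.g. 1024 for GPT-2
```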
Fine-Tuning: The process of taking a pre-trained model and training it further on a new dataset.
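A minimal fine-tuning sketch, assuming PyTorch and the Hugging Face transformers library are available; the "gpt2" checkpoint, the toy sentences, and the learning rate are illustrative assumptions, not a recommended recipe.

```python
# Minimal fine-tuning sketch: a few gradient steps on a tiny "new dataset".
# Assumes PyTorch and transformers; all names and hyperparameters are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Toy stand-in for the new dataset the pre-trained model is adapted to.
texts = [
    "Fine-tuning adapts a pre-trained model to a narrower task.",
    "The optimizer nudges the pre-trained weights using the new data.",
]

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
model.train()
for epoch in range(2):
    for text in texts:
        batch = tokenizer(text, return_tensors="pt")
        # For causal LMs, passing the input ids as labels makes the model
        # compute the next-token prediction loss internally.
        outputs = model(**batch, labels=batch["input_ids"])
        outputs.loss.backward()
        optimizer.step()
        optimizer.zero_grad()
    print(f"epoch {epoch}: last loss {outputs.loss.item():.3f}")
```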
Hallucination: When a model generates text that isn't relevant or coherent with respect to the input (i.e., it doesn't make sense), or that confidently states things that aren't true, even though the output may sound plausible.
Inference: The process of generating predictions from a trained model.
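A minimal inference sketch, assuming the Hugging Face transformers pipeline API; the model name, prompt, and generation length are placeholders.

```python
# Minimal inference sketch: load a trained model and generate a prediction.
# Assumes the Hugging Face transformers library; "gpt2" and the prompt are placeholders.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

# Inference: the trained model predicts a continuation of the prompt.
result = generator("Large language models are", max_new_tokens=20)
print(result[0]["generated_text"])
```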
Transformer: A neural network architecture designed to handle sequential data, using self-attention to weigh how relevant every token in the sequence is to every other token.
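To make "self-attention over sequential data" concrete, here is a small sketch of scaled dot-product attention in plain PyTorch; the tensor sizes are arbitrary, and it omits the multi-head splitting, masking, residual connections, and feed-forward layers of a full transformer block.

```python
# Minimal scaled dot-product self-attention sketch in PyTorch.
# Sizes are arbitrary; a real transformer adds multiple heads, masking,
# residual connections, layer norm, and feed-forward layers.
import math
import torch

seq_len, d_model = 5, 16           # 5 tokens, 16-dimensional embeddings
x = torch.randn(seq_len, d_model)  # token embeddings for one sequence

# Learned projections that map each token to queries, keys, and values.
w_q = torch.nn.Linear(d_model, d_model)
w_k = torch.nn.Linear(d_model, d_model)
w_v = torch.nn.Linear(d_model, d_model)

q, k, v = w_q(x), w_k(x), w_v(x)

# Each token attends to every token: similarity scores -> weights -> mix of values.
scores = q @ k.T / math.sqrt(d_model)    # (seq_len, seq_len) attention scores
weights = torch.softmax(scores, dim=-1)  # how strongly each token attends to the others
output = weights @ v                     # (seq_len, d_model) contextualized tokens

print(weights.shape, output.shape)
```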
Temperature: A hyperparameter that controls the randomness of the predictions generated by the model by rescaling its output scores before sampling (see the sketch after this list):
- Lower temperature values result in more deterministic predictions.
- Higher temperature values result in more random predictions.
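A small sketch of how temperature rescales logits before sampling; the logits below are made-up numbers, and the behavior shown (lower temperature gives a peakier, more deterministic distribution; higher temperature a flatter, more random one) is the general pattern rather than any specific model's output.

```python
# Minimal sketch of temperature scaling: logits are divided by the temperature
# before softmax, so low temperatures sharpen the distribution and high ones flatten it.
# The logits are made-up numbers standing in for a model's next-token scores.
import torch

logits = torch.tensor([2.0, 1.0, 0.5, -1.0])  # scores for four candidate tokens

for temperature in (0.2, 1.0, 2.0):
    probs = torch.softmax(logits / temperature, dim=-1)
    print(f"T={temperature}:", [round(p, 3) for p in probs.tolist()])
```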