Saturday, November 11, 2023

GPT means Generative Pre-Training

Everybody defines GPT like this:
Generative Pre-trained Transformer 1 (GPT-1) was the first of OpenAI's large language models following Google's invention of the transformer architecture in 2017.[2] In June 2018, OpenAI released a paper entitled "Improving Language Understanding by Generative Pre-Training" ...
Or this:
Generative pre-trained transformers (GPT) are a type of large language model (LLM)[1][2][3] and a prominent framework for generative artificial intelligence.[4][5] They are artificial neural networks that are based on the transformer architecture, pre-trained on large data sets of unlabelled text, and able to generate novel human-like content.[2][3] As of 2023, most LLMs have these characteristics[6] and are sometimes referred to broadly as GPTs.[7]
And the original OpenAI paper:
Improving Language Understanding by Generative Pre-Training

... We demonstrate that large gains on these tasks can be realized by generative pre-training of a language model on a diverse corpus of unlabeled text, followed by discriminative fine-tuning on each specific task. In contrast to previous approaches, we make use of task-aware input transformations during fine-tuning to achieve effective transfer while requiring minimal changes to the model architecture. We demonstrate the effectiveness of our approach on a wide range of benchmarks ...
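To make the recipe in that abstract concrete, here is a minimal sketch (my own toy code, not OpenAI's) of the two-stage idea: first generative pre-training with a next-token objective on unlabeled text, then discriminative fine-tuning of the same network body on a labeled task. The tiny encoder, fake data, and hyperparameters are all illustrative assumptions, and the encoder here stands in for the transformer stack the paper actually uses.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab_size, d_model, seq_len, num_classes = 100, 32, 8, 2

# Shared model "body", reused across both stages (stand-in for the transformer).
encoder = nn.Sequential(
    nn.Embedding(vocab_size, d_model),
    nn.Linear(d_model, d_model),
    nn.ReLU(),
)
lm_head = nn.Linear(d_model, vocab_size)    # stage 1: predicts the next token
clf_head = nn.Linear(d_model, num_classes)  # stage 2: predicts a task label

# Stage 1: generative pre-training on "unlabeled" text (next-token prediction).
tokens = torch.randint(0, vocab_size, (64, seq_len))  # fake unlabeled corpus
opt = torch.optim.Adam(list(encoder.parameters()) + list(lm_head.parameters()), lr=1e-3)
for _ in range(50):
    hidden = encoder(tokens[:, :-1])                   # positions 0..n-2 predict 1..n-1
    loss = nn.functional.cross_entropy(
        lm_head(hidden).reshape(-1, vocab_size),
        tokens[:, 1:].reshape(-1),
    )
    opt.zero_grad(); loss.backward(); opt.step()

# Stage 2: discriminative fine-tuning on a labeled task, reusing the encoder.
x = torch.randint(0, vocab_size, (32, seq_len))        # fake labeled inputs
y = torch.randint(0, num_classes, (32,))               # fake labels
opt = torch.optim.Adam(list(encoder.parameters()) + list(clf_head.parameters()), lr=1e-4)
for _ in range(20):
    pooled = encoder(x).mean(dim=1)                    # pool the sequence for classification
    loss = nn.functional.cross_entropy(clf_head(pooled), y)
    opt.zero_grad(); loss.backward(); opt.step()
```

The point of the sketch is only the shape of the procedure: the generative pre-training is what gives the method its name, and the fine-tuning stage swaps the language-modeling head for a task head while keeping the pre-trained body.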

As you can see, OpenAI invented GPT, and it originally stood for Generative Pre-Training.

The original GPT did use the transformer architecture, and many regard that as critical, but I am not sure the transformer is what the T in GPT originally stood for.
