GPT-3 was recently released and has been trending in different circles.
Generative Pretrained Transformer 3, commonly known by its abbreviated form GPT-3, is an unsupervised Transformer language model and the successor to GPT-2. It was first described in May 2020. OpenAI stated that the full version of GPT-3 contains 175 billion parameters, two orders of magnitude larger than the 1.5 billion parameters in the full version of GPT-2 (although GPT-3 models with as few as 125 million parameters were also trained).
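To put those parameter counts in perspective, here is the back-of-the-envelope arithmetic behind the "two orders of magnitude" claim (the parameter figures are the ones quoted above; the bytes-per-parameter used for the memory estimate is my own assumption, not a figure from the paper):

```python
# Rough arithmetic on the parameter counts quoted above.
gpt2_params = 1.5e9    # full GPT-2
gpt3_params = 175e9    # full GPT-3
gpt3_small = 125e6     # smallest GPT-3 variant mentioned

ratio = gpt3_params / gpt2_params
print(f"GPT-3 is ~{ratio:.0f}x larger than GPT-2")  # ~117x, i.e. roughly two orders of magnitude

# Naive storage estimate, assuming 2 bytes per parameter (fp16) -- an assumption, not a quoted figure.
print(f"~{gpt3_params * 2 / 1e9:.0f} GB just to hold the weights in fp16")  # ~350 GB
```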
OpenAI stated that GPT-3 succeeds at certain "meta-learning" tasks: it can infer the task implied by a single input-output pair and generalize it to new inputs. The paper gives examples of translation and cross-linguistic transfer learning between English and Romanian, and between English and German.
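As I understand it, this "meta-learning" is what the paper calls in-context (one-shot or few-shot) learning: the task is specified entirely inside the prompt, with no gradient updates. A minimal illustrative sketch of a one-shot translation prompt follows; the prompt wording and the `language_model.generate` call are made-up placeholders, not anything from the paper or a real API:

```python
# One-shot prompt: a single English->German example, then a new input.
# The model is expected to infer "translate English to German" from that one demonstration.
prompt = (
    "Translate English to German.\n"
    "English: The house is small. => German: Das Haus ist klein.\n"   # the single demonstration
    "English: Where is the train station? => German:"                 # the actual query
)
# completion = language_model.generate(prompt)  # hypothetical call; real API details vary
```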
GPT-3 dramatically improved benchmark results over GPT-2. OpenAI cautioned that such scaling up of language models could be approaching or encountering fundamental capability limitations. Pre-training GPT-3 required several thousand petaflop/s-days of compute, compared to tens of petaflop/s-days for the full GPT-2 model…
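On the compute units: a petaflop/s-day is one petaflop/s (10^15 floating-point operations per second) sustained for a full day. A quick conversion sketch, where the "several thousand" scale comes from the text above and the 3,000 figure is purely illustrative:

```python
# Convert petaflop/s-days into total floating-point operations.
PFLOP_PER_S = 1e15          # one petaflop per second
SECONDS_PER_DAY = 86_400

def pfs_days_to_flops(pfs_days: float) -> float:
    """Total FLOPs for a given number of petaflop/s-days."""
    return pfs_days * PFLOP_PER_S * SECONDS_PER_DAY

print(f"{pfs_days_to_flops(1):.2e} FLOPs per petaflop/s-day")          # ~8.64e19
print(f"{pfs_days_to_flops(3000):.2e} FLOPs for ~3,000 pf/s-days")     # ~2.6e23, the rough scale quoted for GPT-3
```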
Can anyone offer their most straightforward explanation for some of the concepts involved?