Claude 2.1 AI model with 200K Context is Live

#llm #openai #anthropic #ai

...and available as both a UI chatbot and an API.

Yesterday the Claude 2 model received an update that doubled its context window to 200K tokens. Here's the official intro by Anthropic. This happened 15 days after OpenAI introduced GPT-4 Turbo, which raised its context window from 32K to 128K.

Is it a lot?

This follows up on my recent post, which translated context window size from tokens into different kinds of artefacts. Here's an updated table:

Artefact	Tokens	Fits in 200K (count)
Tweet	76	2,632
Book page (Robinson Crusoe)	282	709
Google Results Page (copy/paste, txt)	975	205
StackOverflow Question (copy/paste, txt)	947	211
apple.com (copy/paste, txt)	997	201
StackOverflow Question (Markdown)	1037	193
Blog post (Markdown)	4572	44
Linux Kernel average source file	5205	38
SCRUM Guide (2020)	5440	37
Wikipedia page (Albania, copy/paste, txt)	42492	4.7
Wikipedia page (Albania, source, wikitext)	76462	2.6
apple.com (source, HTML)	92091	2.2
“The Lean Startup” book	113264	1.8
128K Context	128000	1.6
PM BoK (4th edition, 2008)	228880	0.87
Google Results Page (source, HTML)	246781	0.81
Linux Kernel largest source file	69039016	0.0029
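
The last column of the table is the context size divided by the token count. Here is a minimal Python sketch of that arithmetic, using a few rows copied from the table. The names `copies_in_context` and `format_count` are made up for this illustration, and the rounding rule (whole numbers from 10 up, two significant digits below) is my assumption about how the column was formatted.

```python
CONTEXT_WINDOW = 200_000  # Claude 2.1 context size in tokens, as stated in the post

# Token counts copied from a few rows of the table
ARTEFACT_TOKENS = {
    "Tweet": 76,
    "Book page (Robinson Crusoe)": 282,
    "Blog post (Markdown)": 4572,
    "Wikipedia page (Albania, copy/paste, txt)": 42492,
    "PM BoK (4th edition, 2008)": 228880,
}


def copies_in_context(tokens: int, context: int = CONTEXT_WINDOW) -> float:
    """Return how many artefacts of `tokens` size fit into `context` tokens."""
    if tokens <= 0:
        raise ValueError("tokens must be positive")
    return context / tokens


def format_count(value: float) -> str:
    """Assumed formatting: whole numbers from 10 up, two significant digits below."""
    if value >= 10:
        return f"{round(value):,}"
    return f"{value:.2g}"


for name, tokens in ARTEFACT_TOKENS.items():
    print(f"{name}\t{tokens}\t{format_count(copies_in_context(tokens))}")
```

Passing `context=128_000` to `copies_in_context` gives the same comparison for GPT-4 Turbo.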

Context Size timeline

Here's a quick rundown of model release dates and context windows:

November 21, 2023 : Claude 2.1 - 200K
November 6, 2023 : GPT-4 Turbo - 128K
June 12, 2023 : gpt-3.5-turbo-16k - 16K
May 11, 2023 : Claude 100K - 100K
March 14, 2023 : GPT-4 - 8K and 32K
March 14, 2023 : Claude - 9K
November 30, 2022 : ChatGPT/GPT-3.5 - 4K
June 11, 2020 : GPT-3 - 2K

P.S.:

On a side note, I'm struck by how little progress has been made in internet search. Both Bing AI and Google Bard produce complete nonsense when a request needs more than a few top results to yield a meaningful answer :)

Top comments (3)

Ranjan Dailata • Nov 22 '23

What a competition between OpenAI and Claude. It looks like they are battling to beat each other at solving the context window limits :)

Maxim Saplin • Nov 22 '23

Context window size is an easy metric to understand and compete on, like CPU frequency in the old days :)

Ranjan Dailata • Nov 22 '23

Sorry, there is much more hidden behind the max token limit. Research is ongoing into sliding-window-based token generation; however, at the moment it's impossible to build an LLM with an infinite context window. The LLM would go wild, lose context, and be unable to generate the next word per se, since it relies on statistical next-word prediction.

More research is required in this space, and it can be done by dedicated LLM vendors such as OpenAI, Anthropic, Cohere, etc.