The New Claude 3 Models Are Great, But Are They Game-Changing?

For some people...yes, but

Disclosure: I’m an investor in OpenAI (a competitor to Anthropic, which makes Claude). I’m also a fan and customer of Anthropic. They have graciously spent time with me. I’m glad they exist. These views are my own and not based on any inside information about either company.

Anthropic created a lot of buzz and excitement recently with the launch of their Claude 3 set of LLMs (Large Language Models).

I’ve been tinkering and playing with the models since launch. They are indeed impressive, and there is cause for some excitement (especially for developers).

There are 3 models: Claude 3 Haiku, Claude 3 Sonnet and Claude 3 Opus in ascending sequence of capability. They get points for that hierarchy of naming — clever. I wonder if they had Claude help come up with it. 🙂 

Look At The Benchmarks, Baby!

Let’s first jump to the main reason people are excited:

Claude 3 Opus is the first Large Language Model to surpass GPT-4 across a wide variety of benchmarks. This is illustrated by the chart below.

Whether you believe in any individual benchmark or not, I think the important takeaway here is the overall trend. Yes, these numbers are published by Anthropic itself. And yes, benchmarks aren’t perfect (by any means). And yes, some benchmarks can be gamed.

But still…this is, in a word, remarkable. Just demonstrates how quickly Generative AI is evolving. We went from “These LLMs are just fancy auto-suggest” to the point that they are now measurably able to actually reason and apply logic and analysis at a pretty high level. Amazing.

From Anthropic, Inc.

See the full post about the launch of Claude 3

Claude 3 Is Great, But Is It Game-Changing?

There are two high-level improvements that are noteworthy here. Claude 3 is “smarter” and it is faster. This is likely relevant to a small number of people — particularly developers building AI apps where those things really, really matter.

For the vast majority of us, though, Claude 3 is an important milestone along the AI journey, not a change in direction or trajectory. It’s an incremental improvement (a 200,000-token context window, low latency, strong reasoning), not a breakthrough in what’s possible. We’ll likely not be switching to the consumer Claude product (and away from ChatGPT).

That’s totally OK. We need LLMs to get better/faster/smarter: both closed-source ones like Claude and OpenAI’s GPT and open-source ones like Llama and Mistral (more on those in a future post).