Claude 3 Outperforms GPT-4 in Chatbot Arena

Anthropic’s large language model (LLM), Claude 3 Opus, has surpassed OpenAI’s GPT-4 in Chatbot Arena, a popular leaderboard used by AI researchers to gauge the relative capabilities of AI language models. This is the first time GPT-4 has been outperformed since the launch of Chatbot Arena in May 2023.

The Chatbot Arena presents a user with a chat input box and two windows showing output from two unlabeled LLMs. The user’s task is to rate which output is better based on any criteria they deem fit. Through thousands of these subjective comparisons, Chatbot Arena calculates the “best” models in aggregate and populates the leaderboard, updating it over time.
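To make the aggregation step concrete, here is a minimal sketch of how pairwise votes could be turned into a ranking. The article does not describe the exact scoring method, so this assumes an Elo-style update of the kind used in chess ratings. The function names, the `k` and `base` parameters, and the model names in the usage note are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def elo_ratings(votes, k=32.0, base=1000.0):
    """Aggregate pairwise votes into ratings with an Elo-style update.

    votes: iterable of (model_a, model_b, winner) tuples, where winner
    is "a", "b", or "tie". This is an assumed scheme, not a confirmed
    description of how the leaderboard is computed.
    """
    ratings = defaultdict(lambda: base)
    for model_a, model_b, winner in votes:
        ra, rb = ratings[model_a], ratings[model_b]
        # Expected score of model_a under the logistic Elo model.
        expected_a = 1.0 / (1.0 + 10 ** ((rb - ra) / 400.0))
        # Actual score of model_a for this single comparison.
        score_a = {"a": 1.0, "b": 0.0, "tie": 0.5}[winner]
        delta = k * (score_a - expected_a)
        # Whatever one model gains, the other loses.
        ratings[model_a] = ra + delta
        ratings[model_b] = rb - delta
    return dict(ratings)


def leaderboard(votes):
    """Return (model, rating) pairs sorted from highest to lowest rating."""
    ratings = elo_ratings(votes)
    return sorted(ratings.items(), key=lambda item: item[1], reverse=True)
```

Calling `leaderboard([("model-x", "model-y", "a"), ("model-y", "model-z", "tie")])` returns the three placeholder models ordered by rating. Because each update depends on the ratings at the time of the vote, the result is sensitive to vote order, which is one reason a real leaderboard might prefer a different statistical fit.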

Anthropic’s smaller model, Haiku, has also been turning heads with its performance on the leaderboard. Independent AI researcher Simon Willison noted that, for the first time, the best available models come from a vendor other than OpenAI. That is reassuring, as it indicates a diversity of top vendors in this space. However, GPT-4 is over a year old at this point, and it took that full year for anyone else to catch up.

The defeat of GPT-4 in the Arena marks a notable moment in the relatively short history of AI language models. The leaderboard is run by the Large Model Systems Organization (LMSYS ORG), a research organization dedicated to open models that operates as a collaboration between students and faculty at UC Berkeley, University of California San Diego, and Carnegie Mellon University.
