Meta Releases Llama 3, Claiming It’s Among the Best Open Models Available

Meta has released the latest entry in its Llama series of open generative AI models: Llama 3. The company has debuted two models in the new Llama 3 family, with the rest to follow at an unspecified date. Meta describes the new models, Llama 3 8B (8 billion parameters) and Llama 3 70B (70 billion parameters), as a “major leap” over the previous generation, Llama 2.

Meta backs its claim of Llama 3’s superiority with the models’ scores on popular AI benchmarks such as MMLU (which attempts to measure knowledge), ARC (which attempts to measure skill acquisition), and DROP (which tests a model’s reasoning over chunks of text). Llama 3 8B outperforms other open models, such as Mistral’s Mistral 7B and Google’s Gemma 7B (both of which contain 7 billion parameters), on at least nine benchmarks.

The larger of the two, Llama 3 70B, is competitive with flagship generative AI models, including Gemini 1.5 Pro, the latest in Google’s Gemini series. Llama 3 70B beats Gemini 1.5 Pro on MMLU, HumanEval, and GSM-8K. While it doesn’t rival Anthropic’s most capable model, Claude 3 Opus, Llama 3 70B outscores the middle model in the Claude 3 series, Claude 3 Sonnet, on five benchmarks.

Meta also developed its own test set covering use cases that range from coding and creative writing to reasoning and summarization. On that set, Llama 3 70B came out ahead of Mistral’s Mistral Medium model, OpenAI’s GPT-3.5, and Anthropic’s Claude 3 Sonnet.

More qualitatively, Meta says users of the new Llama models should expect more “steerability,” fewer refusals to answer questions, and higher accuracy on trivia, on questions about history and STEM fields such as engineering and science, and on general coding recommendations. Meta attributes this in part to a much larger training dataset: 15 trillion tokens, roughly seven times the size of the Llama 2 training set. (At the common rule of thumb of about 0.75 English words per token, that works out to on the order of 11 trillion words.)


Read more at: techcrunch.com