Large Language Models Benchmarks

Italian Benchmark Evaluates Large Language Models, Includes AI Translation

A new community-driven initiative evaluates large language models using Italian-native tasks, with AI translation among the ...

5d on MSN

Yann LeCun: Meta 'fudged a little bit' when benchmark-testing Llama 4 model

The testing sparked internal frustration about the progress of the Llama models. Yann LeCun, Meta’s outgoing chief AI ...

MiroMind’s MiroThinker 1.5 delivers trillion-parameter performance from a 30B model — at 1/20th the cost

Joining the ranks of a growing number of smaller, powerful reasoning models is MiroThinker 1.5 from MiroMind, with just 30 ...

Becker's Hospital Review

AI misrepresents medical risk terms: Study

Large language models frequently misrepresent verbal risk terms used in medicine, potentially amplifying patient misunderstandings and diverging from established clinical definitions, according to a ...

EurekAlert!

MathEval: a comprehensive benchmark for evaluating large language models on mathematical reasoning capabilities

This study introduces MathEval, a comprehensive benchmarking framework designed to systematically evaluate the mathematical reasoning capabilities of large language models (LLMs). Addressing key ...

6d on MSN

Another Chinese quant fund joins DeepSeek in AI race with model rivalling GPT-5.1, Claude

Beijing-based Ubiquant launches code-focused systems claiming benchmark wins over US peers despite using far fewer parameters ...

Fox21Online

Z.ai Open-Sources GLM-4.7, a New Generation Large Language Model Built for Real Development Workflows

Z.ai released GLM-4.7 ahead of Christmas, marking the latest iteration of its GLM large language model family. As open-source models move beyond chat-based applications and into production ...

12d

Unlocking Business Value With Open-Weight Large Language Models

Open-weight LLMs can unlock significant strategic advantages, delivering customization and independence in an increasingly AI ...
