LLM Benchmarks Dashboard

Benchmarks

Each benchmark below has its own page with a description and a bar chart of results per model.

MRCR: Multi-round coreference resolution over long, multi-turn contexts.
Tau2Bench: Multi-turn, tool-using conversational agent tasks in simulated customer-service domains.
VitaBench: Interactive agent tasks grounded in real-world services such as food delivery and travel.
MultiChallenge: Realistic multi-turn conversations testing instruction retention, memory of user details, and self-coherence.
IFBench: Precise instruction following under verifiable, out-of-distribution constraints.

Agentic Index

The Agentic Index is the per-model mean of scores on MRCR, Tau2Bench, VitaBench, MultiChallenge, and IFBench.
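The index reduces to a simple mean, so it can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the dashboard's own code: it assumes all five scores share one scale (for example 0 to 100), and it requires every model to have a score on each benchmark, raising an error otherwise, since the dashboard does not say how gaps are handled. The function names `agentic_index` and `rank_models`, the model names, and the scores are all hypothetical.

```python
from statistics import fmean

# The five benchmarks that the dashboard averages into the Agentic Index.
BENCHMARKS = ("MRCR", "Tau2Bench", "VitaBench", "MultiChallenge", "IFBench")


def agentic_index(scores: dict[str, float]) -> float:
    """Return the mean of one model's five benchmark scores.

    Assumption: all scores are on the same scale. A missing benchmark
    raises KeyError instead of being silently skipped.
    """
    missing = [b for b in BENCHMARKS if b not in scores]
    if missing:
        raise KeyError(f"missing scores for: {', '.join(missing)}")
    return fmean(scores[b] for b in BENCHMARKS)


def rank_models(results: dict[str, dict[str, float]]) -> list[tuple[str, float]]:
    """Compute the index for each model and sort from highest to lowest."""
    indexed = [(model, agentic_index(s)) for model, s in results.items()]
    return sorted(indexed, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical models and scores, for illustration only.
    results = {
        "model-a": {"MRCR": 60.0, "Tau2Bench": 70.0, "VitaBench": 30.0,
                    "MultiChallenge": 50.0, "IFBench": 40.0},
        "model-b": {"MRCR": 55.0, "Tau2Bench": 80.0, "VitaBench": 35.0,
                    "MultiChallenge": 45.0, "IFBench": 50.0},
    }
    for model, index in rank_models(results):
        print(f"{model}: {index:.1f}")
```

With these made-up scores the script prints `model-b: 53.0` followed by `model-a: 50.0`. Failing loudly on a missing score is a deliberate choice here: averaging only the available benchmarks would make indices for different models incomparable.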