Benchmarks
Choose a benchmark to view its description and a bar chart of results per model.
MRCR
Reasoning & comprehension benchmark (multi-round coreference resolution).
View MRCR →
Tau2Bench
Multi-turn instruction following.
View Tau2Bench →
VitaBench
Vision-text alignment & grounding.
View VitaBench →
MultiChallenge
Compositional, multi-skill tasks.
View MultiChallenge →
IFBench
Instruction following and fidelity.
View IFBench →
Agentic Index
Per-model mean of the five benchmark scores: MRCR, Tau2Bench, VitaBench, MultiChallenge, and IFBench.
View Agentic Index →
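As a rough illustration of how the Agentic Index could be computed, here is a minimal sketch that assumes an unweighted arithmetic mean over the five benchmarks and that every score is on the same scale. The function name `agentic_index`, the `BENCHMARKS` tuple, and the example scores are hypothetical and are not taken from the page or from any real model's results.

```python
from statistics import mean

# The five benchmarks listed on this page.
BENCHMARKS = ("MRCR", "Tau2Bench", "VitaBench", "MultiChallenge", "IFBench")


def agentic_index(scores: dict[str, float]) -> float:
    """Return the unweighted mean of one model's five benchmark scores.

    Assumption: all scores share a common scale (for example 0-100).
    """
    missing = [name for name in BENCHMARKS if name not in scores]
    if missing:
        # Refuse to average over a partial set, which would skew the index.
        raise ValueError(f"missing benchmark scores: {missing}")
    return mean(scores[name] for name in BENCHMARKS)


# Made-up scores for a hypothetical model, for illustration only.
example_scores = {
    "MRCR": 60.0,
    "Tau2Bench": 70.0,
    "VitaBench": 50.0,
    "MultiChallenge": 40.0,
    "IFBench": 80.0,
}

print(agentic_index(example_scores))  # 60.0
```

Raising an error on a missing score, rather than averaging whatever is present, keeps the index comparable across models; whether the page handles gaps this way is not stated.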