LLM Benchmarks - qq3dworld.com

Benchmarks

Choose a benchmark to view its description and a bar chart of results for each model.

MRCR Reasoning & comprehension benchmark. View MRCR →

Tau2Bench Multi-turn instruction following. View Tau2Bench →

VitaBench Vision-text alignment & grounding. View VitaBench →

MultiChallenge Compositional, multi-skill tasks. View MultiChallenge →

IFBench Instruction following and fidelity. View IFBench →

Agentic Index Per-model mean across all five benchmarks: MRCR, Tau2Bench, VitaBench, MultiChallenge, and IFBench. View Agentic Index →
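
As a rough illustration of how a per-model mean like the Agentic Index could be computed, here is a minimal Python sketch. It assumes an unweighted arithmetic mean and that all five scores share one scale; the page does not state either. The function name `agentic_index` and the scores are hypothetical placeholders, not real results.

```python
from statistics import mean

# The five benchmarks listed on this page.
BENCHMARKS = ("MRCR", "Tau2Bench", "VitaBench", "MultiChallenge", "IFBench")


def agentic_index(scores: dict[str, float]) -> float:
    """Return the unweighted mean of one model's scores across BENCHMARKS.

    Assumption: equal weighting and a shared score scale.
    """
    missing = [name for name in BENCHMARKS if name not in scores]
    if missing:
        # Refuse to average over a partial set, which would skew the index.
        raise ValueError(f"missing benchmark scores: {missing}")
    return mean(scores[name] for name in BENCHMARKS)


# Placeholder numbers for illustration only, not measured results.
example_scores = {
    "MRCR": 60.0,
    "Tau2Bench": 70.0,
    "VitaBench": 50.0,
    "MultiChallenge": 40.0,
    "IFBench": 80.0,
}

print(agentic_index(example_scores))  # 60.0
```

Raising an error on a missing score, rather than averaging whatever is available, keeps the index comparable between models in this sketch.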