



Web Site AI Model Web Page | |
Provider The entity that provides this model. | |
Release Date When the model was first released. | Apr 29, 2025 |
Modalities Types of data this model can process | - |
API Providers The providers that offer this model. (This is not an exhaustive list.) | - |
Knowledge Cut-off Date When the model's knowledge was last updated. | - |
Open Source Whether the model's code is available for public use. | Yes (Source) |
Pricing Input Cost for processing tokens in your prompts | - |
Pricing Output Cost for tokens generated by the model | - |
MMLU Massive Multitask Language Understanding - Tests knowledge across 57 subjects including mathematics, history, law, and more | - |
MMLU-Pro A more robust MMLU benchmark with harder, reasoning-focused questions, a larger choice set, and reduced prompt sensitivity | - |
MMMU Massive Multi-discipline Multimodal Understanding - Tests college-level understanding and reasoning over combined text and image inputs | - |
HellaSwag A challenging sentence completion benchmark | - |
HumanEval Evaluates code generation and problem-solving capabilities | - |
MATH Tests mathematical problem-solving abilities across various difficulty levels | - |
GPQA Tests PhD-level knowledge in chemistry, biology, and physics through multiple choice questions that require deep domain expertise | - |
IFEval Tests a model's ability to accurately follow explicit formatting instructions, generate appropriate outputs, and maintain consistent instruction adherence across different tasks | - |
SimpleQA Assesses factual accuracy on short, fact-seeking questions | - |
AIME 2024 | Source |
AIME 2025 | Source |
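Aider Polyglot Code-editing benchmark spanning multiple programming languages. | - |
LiveCodeBench v5 Coding benchmark built from recently published competitive programming problems, collected continuously to limit training-data contamination | - |
Global MMLU (Lite) A smaller subset of the Global MMLU benchmark for assessing model performance across many languages and cultures. | - |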
MathVista Evaluates the mathematical reasoning abilities of AI models within visual contexts | - |
Mobile Application | - |
Copyright © 2026 All Rights Reserved.