




Mistral Large 2, developed by Mistral AI, offers a 128K-token context window and is priced at $3.00 per million input tokens and $9.00 per million output tokens. Released on July 24, 2024, the model scored 84.0% on the MMLU benchmark in a 5-shot evaluation, a test of knowledge across 57 subjects.
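As a rough illustration of how the per-million-token prices translate into the cost of a single request, here is a minimal sketch. The function name `estimate_cost` and the example token counts are hypothetical and chosen only for illustration; the two prices are the ones listed on this page, and actual billing by any given API provider may differ.

```python
from decimal import Decimal

# Prices listed on this page, in US dollars per one million tokens.
INPUT_PRICE_PER_MILLION = Decimal("3.00")
OUTPUT_PRICE_PER_MILLION = Decimal("9.00")
TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of one request (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    # Scale each token count to millions, then multiply by its price.
    input_cost = Decimal(input_tokens) / TOKENS_PER_MILLION * INPUT_PRICE_PER_MILLION
    output_cost = Decimal(output_tokens) / TOKENS_PER_MILLION * OUTPUT_PRICE_PER_MILLION
    return input_cost + output_cost


# Example: a 100,000-token prompt that produces a 2,000-token reply.
cost = estimate_cost(100_000, 2_000)
print(f"${cost:.4f}")  # prints "$0.3180"
```

Because output tokens cost three times as much as input tokens at these prices, long generations can dominate the bill even when the prompt is much larger than the reply.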
Provider The entity that provides this model. | Mistral AI |
Release Date When the model was first released. | Jul 24, 2024 |
Modalities Types of data this model can process | text |
API Providers The providers that offer this model. (This is not an exhaustive list.) | Azure AI, AWS Bedrock, Google AI Studio, Vertex AI, Snowflake Cortex |
Knowledge Cut-off Date When the model's knowledge was last updated. | Unknown |
Open Source Whether the model's code is available for public use. | Yes |
Pricing Input Cost for processing tokens in your prompts | $3.00 per million tokens |
Pricing Output Cost for tokens generated by the model | $9.00 per million tokens |
MMLU Massive Multitask Language Understanding - Tests knowledge across 57 subjects including mathematics, history, law, and more | 84.0% (5-shot) |
MMLU-Pro A more robust MMLU benchmark with harder, reasoning-focused questions, a larger choice set, and reduced prompt sensitivity | 50.69% |
MMMU Massive Multi-discipline Multimodal Understanding - Tests college-level understanding of questions that combine text and images | Not available |
HellaSwag A challenging sentence completion benchmark | Not available |
HumanEval Evaluates code generation and problem-solving capabilities | Not available |
MATH Tests mathematical problem-solving abilities across various difficulty levels | 1.13% |
GPQA Tests PhD-level knowledge in chemistry, biology, and physics through multiple choice questions that require deep domain expertise | 24.94% |
IFEval Tests model's ability to accurately follow explicit formatting instructions, generate appropriate outputs, and maintain consistent instruction adherence across different tasks | 84.01% |
SimpleQA Assesses factual accuracy on short, simple questions | - |
AIME 2024 | - |
AIME 2025 | - |
Aider Polyglot Multilingual programming benchmark. | - |
LiveCodeBench v5 Coding benchmark built from continually collected, recently published programming problems. | - |
Global MMLU (Lite) A lighter, multilingual version of MMLU for assessing how well models perform across languages. | - |
MathVista Evaluates the mathematical reasoning abilities of AI models within visual contexts | - |