




OpenAI o4-mini is the newest lightweight model in the o-series, built for efficient reasoning across text and visual tasks. It is optimized for speed and cost, performs best on code generation and image understanding, and balances latency against reasoning depth. The model accepts text and image inputs and produces text output. It supports a 200,000-token context window with up to 100,000 output tokens, which suits long, high-volume interactions. These traits make o4-mini a good fit for real-world applications that need fast, cost-effective reasoning.
Release Date When the model was first released. | Apr 16, 2025 |
Modalities Types of data this model can process | text, images |
API Providers The providers that offer this model. (This is not an exhaustive list.) | OpenAI API |
Knowledge Cut-off Date When the model's knowledge was last updated. | - |
Open Source Whether the model's code is available for public use. | No |
Pricing Input Cost for processing tokens in your prompts | $1.10 per million tokens |
Pricing Output Cost for tokens generated by the model | $4.40 per million tokens |
MMLU Massive Multitask Language Understanding - Tests knowledge across 57 subjects including mathematics, history, law, and more | - |
MMLU-Pro A more robust MMLU benchmark with harder, reasoning-focused questions, a larger choice set, and reduced prompt sensitivity | - |
MMMU Massive Multi-discipline Multimodal Understanding - Tests college-level reasoning over combined text and image inputs | 81.6% |
HellaSwag A challenging sentence completion benchmark | - |
HumanEval Evaluates code generation and problem-solving capabilities | 14.28% |
MATH Tests mathematical problem-solving abilities across various difficulty levels | - |
GPQA Tests PhD-level knowledge in chemistry, biology, and physics through multiple choice questions that require deep domain expertise | 81.4% |
IFEval Tests model's ability to accurately follow explicit formatting instructions, generate appropriate outputs, and maintain consistent instruction adherence across different tasks | - |
SimpleQA Tests factual accuracy on short, simple questions | - |
AIME 2024 | 93.4% |
AIME 2025 | 92.7% |
Aider Polyglot Benchmark of code-editing tasks across multiple programming languages. | - |
LiveCodeBench v5 Benchmark of coding problems collected from recent programming contests | - |
Global MMLU (Lite) A smaller version of the Global MMLU benchmark for assessing how well models perform across languages and cultures. | - |
MathVista Evaluates the mathematical reasoning abilities of AI models within visual contexts | - |
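The input and output prices listed above translate into a per-request cost with simple arithmetic. The sketch below is illustrative only: the function name `estimate_cost_usd` and the example token counts are hypothetical, the prices are the ones quoted on this page and may change, and it assumes the 200,000-token context window is shared between input and output.

```python
from decimal import Decimal

# Prices quoted on this page, in USD per million tokens.
INPUT_PRICE_PER_M = Decimal("1.10")
OUTPUT_PRICE_PER_M = Decimal("4.40")

# Limits quoted in the description above.
CONTEXT_WINDOW = 200_000
MAX_OUTPUT_TOKENS = 100_000


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the cost of one request (hypothetical helper, not an official API)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if output_tokens > MAX_OUTPUT_TOKENS:
        raise ValueError("output exceeds the 100,000-token limit")
    # Assumption: input and output share the same context window.
    if input_tokens + output_tokens > CONTEXT_WINDOW:
        raise ValueError("request exceeds the 200,000-token context window")
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / Decimal(1_000_000)


# Example: a 10,000-token prompt with a 2,000-token answer.
cost = estimate_cost_usd(10_000, 2_000)
print(f"${cost}")
```

If the provider bills hidden reasoning tokens as output, they would need to be included in `output_tokens` for the estimate to hold.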
MathArena | |
| Avg. Score | 87% |
| AIME 2025 A test based on problems from the American Invitational Mathematics Examination, designed to assess the mathematical skills of models. | 92% |
| HMMT February 2025 A test based on problems from the Harvard-MIT Mathematics Tournament, February 2025, designed to assess the mathematical skills of models. | 83% |
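| BRUMO 2025 A test based on problems from the Brown University Math Olympiad, 2025, designed to assess the mathematical skills of models. | 87% |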
| SMT 2025 A test based on problems from the Stanford Math Tournament, 2025, designed to assess the mathematical skills of models. | 89% |
| CMIMC 2025 A test based on problems from the Carnegie Mellon Informatics and Mathematics Competition, 2025, designed to assess the mathematical skills of models. | 84% |
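The Avg. Score row appears to be the unweighted mean of the five competition scores. A quick check under that assumption (the page does not state how the average is computed, and the dictionary below simply copies the rounded percentages from the table):

```python
# Rounded MathArena scores as listed in the table, in percent.
scores = {
    "AIME 2025": 92,
    "HMMT February 2025": 83,
    "BRUMO 2025": 87,
    "SMT 2025": 89,
    "CMIMC 2025": 84,
}

# Unweighted mean across the five competitions.
average = sum(scores.values()) / len(scores)
print(f"Average: {average:.0f}%")
```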