Our most efficient workhorse model, designed for speed and low cost.
Speed and value at scale
Ideal for tasks like summarization, chat applications, data extraction, and captioning.
- Thinking budget: Control how much 2.5 Flash reasons, to balance latency and cost.
- Natively multimodal: Understands input across text, audio, images, and video.
- Long context: Explore vast datasets with a 1-million-token context window.
Adaptive and budgeted thinking
Adaptive controls and adjustable thinking budgets allow you to balance performance and cost.
- Calibrated: The model explores diverse thinking strategies, leading to more accurate and relevant outputs.
- Controllable: Developers have fine-grained control over the model's thinking process, allowing them to manage resource usage.
- Adaptive: When no thinking budget is set, the model assesses the complexity of a task and calibrates the amount of thinking accordingly.
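As an illustration, the budget is set per request through the API's generation config. The sketch below builds a plain `generateContent` request body; the field names (`generationConfig.thinkingConfig.thinkingBudget`) follow the public Gemini REST API for 2.5-series models, but the prompts and the helper function are hypothetical, so treat this as a sketch rather than a definitive integration.

```python
# Sketch of a Gemini API request body that caps the model's thinking.
# Field names follow the public REST API for 2.5-series models; the
# helper and prompts are illustrative placeholders.

def build_request(prompt: str, thinking_budget: int) -> dict:
    """Build a generateContent request body with an explicit thinking budget.

    thinking_budget is a cap in tokens: 0 disables thinking entirely,
    while omitting thinkingConfig lets the model adapt the budget
    to the task on its own.
    """
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            "thinkingConfig": {"thinkingBudget": thinking_budget},
        },
    }

# A latency-sensitive task might turn thinking off entirely:
fast = build_request("Summarize this support ticket ...", thinking_budget=0)

# A harder reasoning task can spend up to 1024 thinking tokens:
careful = build_request("Plan a data migration ...", thinking_budget=1024)
```

Setting the budget to 0 keeps output billed at the non-thinking rate, while a positive budget trades cost and latency for reasoning depth.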
Native audio (Preview)
Converse in more expressive ways with native audio outputs that capture the subtle nuances of how we speak. Seamlessly switch between 24 languages, all with the same voice.
Benchmarks
| Benchmark | Gemini 2.5 Flash Preview (05-20) Thinking | Gemini 2.0 Flash | OpenAI o4-mini | Claude 3.7 Sonnet (64k extended thinking) | Grok 3 Beta (extended thinking) | DeepSeek R1 |
|---|---|---|---|---|---|---|
| Input price ($/1M tokens) | $0.15 | $0.10 | $1.10 | $3.00 | $3.00 | $0.55 |
| Output price ($/1M tokens) | $0.60 no thinking / $3.50 thinking | $0.40 | $4.40 | $15.00 | $15.00 | $2.19 |
| Reasoning & knowledge: Humanity's Last Exam (no tools) | 11.0% | 5.1% | 14.3% | 8.9% | — | 8.6%* |
| Science: GPQA diamond, single attempt (pass@1) | 82.8% | 60.1% | 81.4% | 78.2% | 80.2% | 71.5% |
| Science: GPQA diamond, multiple attempts | — | — | — | 84.8% | 84.6% | — |
| Mathematics: AIME 2025, single attempt (pass@1) | 72.0% | 27.5% | 92.7% | 49.5% | 77.3% | 70.0% |
| Mathematics: AIME 2025, multiple attempts | — | — | — | — | 93.3% | — |
| Code generation: LiveCodeBench v5, single attempt (pass@1) | 63.9% | 34.5% | — | — | 70.6% | 64.3% |
| Code generation: LiveCodeBench v5, multiple attempts | — | — | — | — | 79.4% | — |
| Code editing: Aider Polyglot | 61.9% / 56.7% (whole / diff-fenced) | 22.2% (whole) | 68.9% / 58.2% (whole / diff) | 64.9% (diff) | 53.3% (diff) | 56.9% (diff) |
| Agentic coding: SWE-bench Verified | 60.4% | — | 68.1% | 70.3% | — | 49.2% |
| Factuality: SimpleQA | 26.9% | 29.9% | — | — | 43.6% | 30.1% |
| Factuality: FACTS Grounding | 85.3% | 84.6% | 62.1% | 78.8% | 74.8% | 56.8% |
| Visual reasoning: MMMU, single attempt (pass@1) | 79.7% | 71.7% | 81.6% | 75.0% | 76.0% | no MM support |
| Visual reasoning: MMMU, multiple attempts | — | — | — | — | 78.0% | no MM support |
| Image understanding: Vibe-Eval (Reka) | 65.4% | 56.4% | — | — | — | no MM support |
| Long context: MRCR v2, 128k (average) | 74.0% | 36.0% | 49.0% | — | 54.0% | 45.0% |
| Long context: MRCR v2, 1M (pointwise) | 32.0% | 6.0% | — | — | — | — |
| Multilingual performance: Global MMLU (Lite) | 88.4% | 83.4% | — | — | — | — |
| | 2.0 Flash | 2.5 Flash |
|---|---|---|
| Model deployment status | General availability | Preview |
| Supported data types for input | Text, Image, Video, Audio | Text, Image, Video, Audio |
| Supported data types for output | Text | Text |
| Supported # tokens for input | 1M | 1M |
| Supported # tokens for output | 8k | 64k |
| Knowledge cutoff | June 2024 | January 2025 |
| Tool use | Search as a tool, Code execution | Function calling, Structured output, Search as a tool, Code execution |
| Best for | Low-latency scenarios, Automating tasks | Cost-efficient thinking, Well-rounded capabilities |
| Availability | Google AI Studio, Gemini API, Vertex AI, Gemini App | Google AI Studio, Gemini API, Vertex AI, Gemini App |