Gemini 3.5 Flash: Google's Fastest Frontier Model Reaches General Availability

Gemini CLIMay 19, 2026

Google launched Gemini 3.5 Flash as generally available on May 19, 2026, at Google I/O 2026, positioning it as the new flagship model for agentic and coding tasks. The model surpasses Gemini 3.1 Pro on nearly all benchmarks — including Terminal-Bench 2.1 (76.2%), MCP Atlas (83.6%), and CharXiv Reasoning (84.2%) — while running 4× faster than competing frontier models. Priced at $1.50/$9.00 per million input/output tokens with a 1M-token context window, Gemini 3.5 Flash is immediately available via the Gemini API, Google AI Studio, and Google Antigravity, and becomes the default model powering the Gemini app and AI Mode in Google Search.

Sources & Mentions

5 external resources covering this update

Gemini 3.5 Flash (722 pts, 511 comments)

Hacker News

With Gemini 3.5 Flash, Google bets its next AI wave on agents, not chatbots

TechCrunch

Gemini 3.5 Flash: more expensive, but Google plan to use it for everything

Simon Willison

AINews: Google I/O 2026: Gemini 3.5 Flash, Omni, Spark, and Antigravity 2.0

Latent Space

Google unveils AI model Gemini 3.5 and AI agent Gemini Spark

CNBC

Gemini 3.5 Flash Reaches General Availability at Google I/O 2026

On May 19, 2026, Google launched Gemini 3.5 Flash as a generally available model at Google I/O 2026. The release marks a significant shift in Google's model strategy: a Flash-tier model that outperforms the previous flagship on frontier benchmarks, delivered at lower cost and dramatically higher speed.

Performance That Surpasses Gemini 3.1 Pro

Gemini 3.5 Flash outperforms Gemini 3.1 Pro across nearly all benchmark categories. On Terminal-Bench 2.1, it scores 76.2%; on MCP Atlas, 83.6%; on CharXiv Reasoning, 84.2%; and on the GDPval-AA Arena it reaches 1,656 Elo — all above Gemini 3.1 Pro's corresponding scores.

Speed is an equally important dimension. Gemini 3.5 Flash runs 4× faster than comparable frontier models in output tokens per second under standard conditions, and up to 12× faster in optimized Antigravity environments. Google's own demonstration showed 93 parallel sub-agents building a functioning operating system in 12 hours for less than $1,000 in API credits — a practical illustration of what higher throughput enables at scale.

Pricing and Context Window

The model offers a 1,048,576-token input context window and 65,536-token output limit, consistent with the Gemini 3 series. Developers pay $1.50 per million input tokens and $9.00 per million output tokens, with a 90% discount on cached inputs ($0.15 per million). Compared to Gemini 3.1 Pro ($2/$12 per million tokens), this represents roughly a 25–40% cost reduction while delivering superior benchmark performance.

That said, independent analysis (notably from Simon Willison) points out that Gemini 3.5 Flash is approximately 3–5× more expensive than Gemini 3 Flash Preview. This continues an industry-wide pattern of rising inference costs as model capability increases. Developers migrating workloads from older Flash models should revalidate their cost projections before switching.

Capabilities and Modalities

Gemini 3.5 Flash supports text, image, audio, and video inputs with text output. Dynamic thinking is enabled by default, and developers can configure four thinking levels — minimal, low, medium, and high — to tune the quality/cost tradeoff at inference time. Built-in tools include function calling, structured output, Google Search grounding, and code execution. The model carries a January 2026 knowledge cutoff.

Availability

Gemini 3.5 Flash is available immediately via:

Gemini API (Google AI Studio and direct API access using model ID gemini-3.5-flash)
Google Antigravity (the agent-first IDE and execution platform)
GitHub Copilot (generally available for all Copilot users)
Gemini Enterprise Agent Platform and Gemini Enterprise
The Gemini app — now the default model globally
AI Mode in Google Search — deployed globally

Gemini 3.5 Pro — the heavier, orchestration-focused counterpart in the 3.5 series — is used internally at Google but will not be publicly available until June 2026.

Developer Implications for Gemini CLI Users

For Gemini CLI users, gemini-3.5-flash is available immediately as a model ID in API calls and sessions. Its speed advantage is directly relevant to agentic workflows: lower per-token latency reduces end-to-end task time when the CLI orchestrates sequences of tool calls. Teams running multi-agent pipelines should evaluate whether switching from Gemini 3.1 Pro delivers a net cost reduction, given that the new model combines improved benchmark scores with lower per-token pricing.

Mentioned onHacker News TechCrunch Simon Willison Latent Space CNBC