Gemini 3 Flash takes the throne: Google's new model challenges OpenAI with speed, economy, and uncompromising performance

2026-01-12 08:28:33

Google has officially launched Gemini 3 Flash, positioning it as the default model across the entire global Gemini platform. The debut marks a significant acceleration in the technological race with OpenAI, with the Mountain View company already processing over 1 trillion tokens per day through its API.

Performance that amazes: the numbers speak for themselves

What makes this launch interesting is not just speed, a theme that will recur throughout, but benchmark results that leave little room for hesitation. On Humanity’s Last Exam, Gemini 3 Flash scored 33.7%, close to GPT-5.2 (34.5%) and not far behind Google’s own Pro model (37.5%). The difference? Flash costs much less and is faster.

But the real knockout comes on MMMU-Pro, the multimodal reasoning benchmark, where the new model scores 81.2% and surpasses every competitor. These are not just numbers on paper: they mean you can upload a video, an audio clip, or a drawing and get sophisticated responses without waiting minutes.

Speed as a competitive weapon: the speed selector at users’ service

Google has deliberately emphasized a crucial aspect: the new model is three times faster than Gemini 2.5 Pro. It’s not just a technical metric; it’s a tangible experience. The Gemini app now effectively works as a speed selector: you can use Flash for almost everything (video analysis, data extraction, visual reasoning) without compromising quality, or select the Pro model for advanced programming questions or complex mathematics.

This flexibility is a deliberate strategy. On reasoning tasks, the new model consumes 30% fewer tokens than 2.5 Pro, which can translate into concrete savings for companies even though the per-token price has increased.

The price tells a story: economic efficiency

Gemini 3 Flash costs $0.50 per 1 million input tokens and $3.00 per 1 million output tokens, compared to $0.30 and $2.50 for the previous model. The obvious question: why pay more?

The answer lies in the combination of speed and efficiency. If the model is three times faster and uses 30% fewer tokens for certain tasks, the overall cost per transaction could actually decrease: fewer tokens lower the bill directly, while faster responses raise throughput. Tulsee Doshi, Senior Director of Product for Gemini, emphasized that “Flash is the workhorse model” for companies handling massive volumes of requests. It’s not the smartest model; it’s the most cost-effective one.
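Whether the higher list price actually turns into a lower bill depends on the shape of the workload. The following is a minimal sketch of that arithmetic using the per-million-token prices quoted above. It rests on assumptions the article does not confirm: that the 30% token reduction applies to output tokens only, that it holds relative to the previous Flash model (the article states it relative to 2.5 Pro), and that the token counts below, which are hypothetical, are representative. The names `Pricing` and `request_cost` are illustrative, not part of any Google API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """List prices in US dollars per 1 million tokens."""
    input_per_million: float
    output_per_million: float


def request_cost(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Return the cost in dollars of a single request."""
    return (
        input_tokens * pricing.input_per_million
        + output_tokens * pricing.output_per_million
    ) / 1_000_000


# Prices quoted in the article.
GEMINI_3_FLASH = Pricing(input_per_million=0.50, output_per_million=3.00)
PREVIOUS_FLASH = Pricing(input_per_million=0.30, output_per_million=2.50)


def with_token_savings(output_tokens: int, percent_saved: int = 30) -> int:
    """ASSUMPTION: the savings apply to output tokens only."""
    return output_tokens * (100 - percent_saved) // 100


# Hypothetical reasoning-heavy request: short prompt, long answer.
heavy_old = request_cost(PREVIOUS_FLASH, 2_000, 10_000)
heavy_new = request_cost(GEMINI_3_FLASH, 2_000, with_token_savings(10_000))

# Hypothetical input-heavy request: long document, short answer.
light_old = request_cost(PREVIOUS_FLASH, 10_000, 500)
light_new = request_cost(GEMINI_3_FLASH, 10_000, with_token_savings(500))

print(f"reasoning-heavy: ${heavy_old:.5f} -> ${heavy_new:.5f}")
print(f"input-heavy:     ${light_old:.5f} -> ${light_new:.5f}")
```

Under these assumptions the reasoning-heavy request gets cheaper, while the input-heavy one gets more expensive because the input price rose and input tokens are not reduced. The savings are therefore workload-dependent rather than guaranteed.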

Already in production: JetBrains, Figma, Harvey are not waiting

Google doesn’t talk about future possibilities but about current realities. Companies like JetBrains, Figma, Cursor, Harvey, and Latitude are already leveraging Gemini 3 Flash via Vertex AI and Gemini Enterprise. For developers, the model is available in preview via API and in Antigravity, the programming tool launched last month.

On SWE-bench Verified, a benchmark of real-world programming tasks, the model achieves 78%, second only to GPT-5.2. This means it is capable enough for real coding work, even if it is not the best choice for complex algorithms and edge-case optimizations.

The context of the AI war: what is really happening

This launch doesn’t come out of nowhere. Weeks ago, Sam Altman reportedly sent an internal “Code Red” memo because ChatGPT traffic was declining while Google’s market share among consumers was growing. OpenAI responded by releasing GPT-5.2 and new generative image models, boasting an eightfold increase in message volume since November 2024.

Google is not engaging with this controversy directly. Doshi preferred a diplomatic tone: “What’s happening is that all these models continue to be extraordinary, challenge each other, push the boundary. And I think it’s fantastic that companies are releasing these models.”

Translation: yes, there is fierce competition, but Google legitimizes it as a positive stimulus for innovation.

Global availability: the default model from today

Gemini 3 Flash replaces Gemini 2.5 Flash as the default in the Gemini app and AI search. Global users don’t need to do anything: they will immediately see the new model. For those who prefer the Pro model, it remains selectable from the menu.

In search, the model is now available in the United States, with a global rollout underway. The app supports uploads of videos, audio, sketches, and documents; the model processes them and generates analyses, quizzes, recommendations, and tables.

Gemini 3 Flash is not the most powerful model overall, but it is the smartest choice in terms of economy and speed. In a competition where all players post similar benchmark scores, the winner is the one who delivers results faster at the most competitive price. This is the lever Google has chosen to differentiate itself.
