Scan to Download Gate App

More Download Options

Don't remind me again today

Gemini 3 Strikes Late at Night: Outperforming GPT 5.1, the Era of Large Models from Google Has Arrived

DeepFlowTech

2025-11-19 01:35:34

Gemini 3 hasn't appeared yet, and Twitter has crashed in advance.

No model release has attracted more attention than Gemini 3. Based on Gemini's previous frequency of updates every 3 months, the AI community has been eagerly anticipating Gemini 3 since September.

Today, the head of Google Developer Relations and Google AI Studio posted a tweet containing only the word “Gemini,” and the anticipation that had been building for months finally reached a breaking point, instantly igniting discussions related to the topic on Twitter.

Interestingly, just before the release node, Twitter unexpectedly crashed a few times. Although the “puppet master” was Cloudflare, the timing of this crash was so precise that it made people suspect that someone was pulling the strings behind the scenes (whispering crickets: after all, Twitter is the main promotional ground for various models).

I wonder what Musk is feeling right now after just releasing Grok 4.1 this morning, anyway, the memes from netizens are everywhere.

Just now, Gemini 3 has finally officially debuted. Let's see how powerful it is under the spotlight.

The most intelligent model

It turns out that Google did not disappoint those who were waiting, Gemini 3 has officially launched, redefining SOTA once again, and Ultraman and Musk have also sent their congratulations.

Google defines it as “an important step towards AGI” and emphasizes that it is currently the most powerful agent in the world in terms of multimodal understanding and depth of interaction.

Gemini 3 not only refreshed the SOTA standard in basic reasoning abilities, but also attempted to reshape the developer ecosystem and AI-assisted experience by launching the brand new Google Antigravity platform and Deep Think mode.

The all-dominating reasoning monster

Gemini 3 Pro is officially referred to as “the most advanced inference model”, significantly surpassing its predecessor Gemini 2.5 Pro in almost all mainstream AI benchmark tests, and comprehensively outperforming major competitors like Claude Sonnet 4.5 and GPT-5.1.

Gemini 3 Pro has topped the LMArena Leaderboard with a groundbreaking high score of 1501 Elo, achieving the highest scores in Humanity’s Last Exam (37.5% without using any tools) and GPQA Diamond (91.9%), demonstrating doctoral-level reasoning abilities. It has also set a new standard for cutting-edge models in mathematics, reaching the latest SOTA level of 23.4% on MathArena Apex.

In addition to text and logic, Gemini 3 Pro has redefined the limits of multimodal reasoning. It scored high marks of 81% and 87.6% on MMMU-Pro and Video-MMMU respectively, which means it excels at both analyzing complex scientific charts and understanding dynamic video streams.

It is worth mentioning that it achieved a score of 72.1% on SimpleQA Verified, demonstrating significant progress in factual accuracy - it is not only strong but also reliable.

Rejecting the flattering thinking partner

The evolution of the Gemini 3 Pro is not only in its performance scores but also in the quality of interaction. It eliminates the clichés and excessive flattery commonly found in AI, becoming smart, concise, and direct: telling you what you need to hear, not just what you want to hear.

It acts as a true thinking partner, providing you with new ways to understand information and express yourself, from translating obscure scientific concepts through generating high-fidelity visual code to creative brainstorming.

Gemini 3 Deep Think

Gemini 3 Deep Think mode further expands the boundaries of intelligence, bringing significant advancements in reasoning and multimodal understanding capabilities of Gemini 3, helping you solve more complex problems.

In the tests, Gemini 3 Deep Think performed better than the already impressive scores of Gemini 3 Pro in both Humanity's Last Exam (41.0% without tools) and GPQA Diamond (93.8%). Additionally, it achieved an unprecedented score of 45.1% on ARC-AGI-2 (code execution, validated by the ARC Prize), showcasing its ability to tackle entirely new challenges.

The Gemini 3 Deep Think mode performs exceptionally well in some of the most challenging AI benchmarks.

Learn, Build, and Plan

Learning anything

Gemini has been designed from the beginning to seamlessly integrate various modalities of information on any topic, including text, images, videos, audio, and code. Gemini 3 combines its advanced reasoning, visual and spatial understanding capabilities, leading multilingual performance, and a million-token context window, further expanding the boundaries of multimodal reasoning, helping you learn in the way that suits you best.

For example, if you want to learn how to cook family traditional dishes, Gemini 3 can interpret and translate handwritten recipes in different languages, generating recipes that can be shared with family.

Alternatively, if you want to learn a new topic, you can provide academic papers, long video lectures or tutorials, and it can generate interactive flashcards, visualizations, or code in other formats to help you master the relevant knowledge.

It can even analyze your pickleball match videos, identify areas for improvement, and create a training plan to help you enhance your skills comprehensively.

To help you better understand the information online, the AI mode in search now uses Gemini 3 to deliver a new generative UI experience, such as immersive visual layouts, interactive tools, and simulations, all of which are generated instantaneously based on your queries.

Develop anything

Building on the success of 2.5 Pro, Gemini 3 delivers on the promise of turning any developer's idea into reality. It excels in zero-shot generation, capable of handling complex prompts and instructions, resulting in richer and more interactive web user interfaces.

Gemini 3 is the best Vibe coding and Agent coding model built by Google to date, making Google's products more autonomous and significantly improving developer efficiency. It ranks first on the WebDev Arena leaderboard, achieving an impressive Elo score of 1487. Additionally, it scored 54.2% in the Terminal-Bench 2.0 test, which assesses the model's ability to use tools for operating a computer through the terminal. At the same time, it also significantly outperformed the 2.5 Pro version (scoring 76.2%) in the SWE-bench Verified test, which measures the performance of coding agents.

Now, users can build using Google AI Studio, Vertex AI, Gemini CLI, and Gemini 3 in Google's brand new intelligent agent development platform, Google Antigravity. It is also compatible with third-party platforms such as Cursor, GitHub, JetBrains, Manus, and Replit.

For example, to create a retro 3D spaceship game with richer visual effects and stronger interactivity.

For example, writing richer and more interactive Web UIs and applications:

Plan anything

Since the Gemini 2 intelligent body, Gemini has significantly enhanced its planning capability in long-term tasks.

The planning capabilities of Gemini 3 were further validated in the Vending-Bench 2 test: Gemini 3 topped the leaderboard in the simulated vending machine operation test, managing virtual business operations through long-term planning throughout.

In the complete simulation of annual operations, Gemini 3 Pro consistently maintains stable tool invocation and decision-making coherence, achieving higher investment returns while continuously focusing on task objectives.

Gemini 3 Pro demonstrates superior long-term planning capabilities, able to generate higher returns compared to other cutting-edge models.

Gemini Agent can also help organize your Gmail inbox.

Gemini 3 is now fully open. Starting today, regular users and subscribers can use the new model through the Gemini App and search AI mode; developers and enterprise clients can also access it through AI Studio, Vertex AI, and other channels. As for the highly anticipated “Deep Thinking Mode,” it is expected to launch exclusively for Google AI Ultra subscribers in the coming weeks.

Additionally, according to previously leaked model cards, there are many key pieces of information worth noting: Google trained this model from scratch using TPU, and as an MoE, it has 1M inputs and 64k token outputs. MoE means they can afford to make it inexpensive.

In terms of pricing, Gemini 3.0 Pro introduces a tiered pricing mechanism based on context length: for tasks under 200k tokens, the input/output prices are $2.00/$12.00 (per million tokens); for over 200k tokens, they are $4.00 and $18.00 respectively.

Brand new “Intelligent Agent First” development experience

Google Antigravity is Google's brand new intelligent agent development platform that enables developers to operate at a higher, task-oriented level. Leveraging the advanced reasoning, tool usage, and agent programming capabilities of Gemini 3, Google Antigravity transforms AI assistance from a tool in the developer's toolbox into an active collaborator.

Although the core of Google Antigravity is a familiar AI IDE (Integrated Development Environment) experience, its agent has been upgraded to a dedicated interface and granted direct access to the editor, terminal, and browser. Now, the agent can autonomously plan and execute complex end-to-end software tasks on your behalf while validating its own code.

In addition to the Gemini 3 Pro, Google Antigravity is closely integrated with Google's latest browser control model, the Gemini 2.5 Computer Use model, as well as its top-notch image editing model, Nano Banana (Gemini 2.5 Image).

First-hand experience

Since the Gemini 3 Pro preview version has launched on the AI Studio platform, we also had a hands-on experience.

Prompt : SVG of NEW YORK SKYLINE Use whatever libraries to get this done but make sure I can paste it all into a single HTML file and open it in Chrome.make it interesting and highly detail , shows details that no one expected go full creative and full beauty in one code block.

Prompt: Create a visually stunning Space Invaders game.

A pelican riding a bicycle has posed challenges for many large models, and this time we let Gemini 3 give it a try. Prompt: An animated SVG of a pelican riding a bicycle.

Compared to the previous version, Gemini 3 has made significant progress, but there are still bugs, such as the bicycle pedals spinning in the air.

We have changed to a clearer prompt: Create a single, complete, self-contained animated SVG code (no external files or images) of a cute pelican riding a bicycle from a side view. This time, the bicycle generated by Gemini 3 seems to be missing pedals.

Written at the end

In the poll initiated by X blogger Chubby “Which company will have the best LLM by the end of 2026?”, Google Gemini is far ahead.

This rebound in market confidence is also reflected in the data, as Alphabet CEO Sundar Pichai reviewed the progress of Gemini in the past two years on the official blog: AI Overviews has reached 2 billion monthly active users, the Gemini app has surpassed 650 million monthly active users, and over 70% of cloud customers and 13 million developers are using its generative models.

Looking back over the past two years, from the hasty response and stock price crash at the launch of Bard (the predecessor of Gemini), to the painful realization of merging with Google DeepMind, recalling the founder, and winning the Nobel Prize, Google has completed a textbook-like “elephant turn.”

The giant that once defined the Transformer and is now “All in Gemini” is ready for a full counterattack.

As for whether it can really end the “best LLM” debate? Don't rush, let the bullets (and servers) fly a little longer.

View Original

This page may contain third-party content, which is provided for information purposes only (not representations/warranties) and should not be considered as an endorsement of its views by Gate, nor as financial or professional advice. See Disclaimer for details.