Meta: It can afford a trillion in computing power, but it can't keep the key people

Written by: Ada, Deep Tide TechFlow

Pang Ruiming hadn’t even settled into his seat at Meta before leaving.

In July 2025, Zuckerberg secured this highly sought-after Chinese AI infrastructure engineer from Apple with a multi-year compensation package worth over $200 million. Pang was assigned to Meta’s Superintelligence Lab to build the infrastructure for the next-generation AI models.

Seven months later, OpenAI poached him.

According to The Information, OpenAI launched a months-long recruitment campaign targeting Pang Ruiming. Although he told colleagues he was “very happy working at Meta,” he ultimately chose to leave. Bloomberg reported that his compensation at Meta was tied to milestones, and leaving early meant forfeiting most of his unvested equity.

$200 million can’t buy seven months of loyalty.

This is not just a simple job switch story.

One person’s departure, a signal from many

Pang Ruiming is not the first to leave.

Last week, Mat Velloso, head of Meta's Superintelligence Lab developer platform, also announced his departure. He had joined Meta less than eight months earlier, after leaving Google DeepMind in July 2024. Going further back, in November 2025, Yann LeCun, a Turing Award winner and Meta's Chief AI Scientist who had been with the company for 12 years, announced his departure to pursue his "world model" vision. Recently, Russ Salakhutdinov, Vice President of Meta's Generative AI Research and a core disciple of Geoffrey Hinton, also publicly announced his exit.

To understand the talent drain at Meta AI, we must first grasp how damaging Llama 4 really is.

In April 2025, Meta proudly released the Llama 4 series, including the Scout and Maverick models. The official benchmark tables boasted impressive numbers, claiming wins over GPT-4.5 and Claude 3.7 Sonnet on core benchmarks such as MATH-500 and GPQA Diamond.

However, this flagship model, carrying Meta’s ambitions, quickly revealed its flaws in third-party blind tests within the open-source community. Its generalization and reasoning abilities fell far short of the hype. Facing strong community skepticism, Yann LeCun, the Chief AI Scientist, finally admitted that during testing, “different model versions were used on different test sets to optimize the final scores.”

In rigorous AI academic and engineering circles, this crossed an unforgivable red line. In effect, Meta had trained Llama 4 into a "test-taking machine" that excels only at questions it has seen, rather than a genuinely frontier model. It was like sending a math champion to the math exam and a programming champion to the programming exam: each score looks strong on its own, but the scores do not come from the same model.

This practice is known as "cherry-picking" in AI research; in exam-oriented education, it would simply be called cheating.
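To see why per-benchmark model selection inflates scores, consider a toy simulation. The variant count and score distribution below are invented purely for illustration; the point is structural, not numerical:

```python
import random

random.seed(0)

BENCHMARKS = ["MATH-500", "GPQA-Diamond", "HumanEval"]
NUM_VARIANTS = 8  # hypothetical number of internally tuned model versions

# Simulate per-benchmark scores for several equally capable model
# variants that differ only by random noise.
scores = {
    bench: [random.gauss(70, 5) for _ in range(NUM_VARIANTS)]
    for bench in BENCHMARKS
}

# Honest reporting: pick ONE variant and report it on every benchmark.
honest = {bench: scores[bench][0] for bench in BENCHMARKS}

# Cherry-picked reporting: for each benchmark, report whichever
# variant happened to score highest on that particular benchmark.
cherry_picked = {bench: max(scores[bench]) for bench in BENCHMARKS}

for bench in BENCHMARKS:
    print(f"{bench}: honest={honest[bench]:.1f}, "
          f"cherry-picked={cherry_picked[bench]:.1f}")
```

The cherry-picked "model" can never lose to the honest one on any benchmark, even though no single model ever achieved that full row of scores at once.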

For Meta, which has always positioned itself as an "open-source lighthouse," this scandal destroyed its most valuable asset in the developer ecosystem: trust. The immediate consequence was that Zuckerberg lost confidence in the original GenAI team's engineering standards, leading to a wave of external executive appointments and the sidelining of core infrastructure departments.

He spent between $14.3 billion and $15 billion acquiring a 49% stake in data annotation company Scale AI, and parachuted in 28-year-old Scale CEO Alexandr Wang as Meta's Chief AI Officer, establishing the Meta Superintelligence Lab (MSL). Under this new structure, LeCun, a Turing Award winner, now reports to the young Wang. In October, Meta cut about 600 positions at MSL, including members of FAIR, the research division LeCun founded.

Meanwhile, the flagship model originally scheduled for release in summer 2025, Llama 4 Behemoth, was repeatedly delayed—from summer to fall, and finally indefinitely shelved.

Meta shifted focus to developing next-generation models codenamed “Avocado” (text) and “Mango” (image/video). Reports suggest Avocado aims to compete with GPT-5 and Gemini 3 Ultra. Originally scheduled for late 2025, it was postponed to Q1 2026 due to underperformance in testing and training. Meta is considering a closed-source release, abandoning its traditional open-source approach for the Llama series.

Meta made two fatal errors with its AI models: first, gaming benchmark results, which destroyed developer trust; second, forcing FAIR, a foundational research organization that takes a decade to mature, into a product-driven structure run on quarterly KPIs. These two mistakes are the root causes of the current talent exodus.

Self-developed chips: another broken leg

Talent is walking out the door, and now the chips are in trouble too.

According to The Information, Meta recently canceled its most advanced AI training chip project.

Meta’s self-developed chip plan is called MTIA (Meta Training and Inference Accelerator). The initial roadmap was ambitious: MTIA v4 (“Santa Barbara”), v5 (“Olympus”), and v6 (“Universal Core”) were planned for delivery between 2026 and 2028. Olympus was designed as Meta’s first chip based on 2nm chiplet architecture, aiming to cover high-end model training and real-time inference, ultimately replacing Nvidia in Meta’s training clusters.

Now, this cutting-edge training chip project has been canceled.

Meta has made some progress on inference chips. The “Iris” MTIA v3 inference chip has been deployed at scale in Meta’s data centers, mainly for Facebook Reels and Instagram recommendation systems, reportedly reducing overall costs by 40-44%. But inference and training are different beasts. Inference runs models; training develops models. Meta can produce inference chips but cannot yet build training chips capable of competing directly with Nvidia.
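The gap between the two is visible even in a toy example: inference is a single forward pass, while training repeats the forward pass plus gradient computation and parameter updates, which demands far more memory bandwidth and interconnect. A minimal numpy sketch of a one-feature linear model (illustrative only, nothing like Meta's actual stack):

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy data: learn y = 3x + 1.
X = rng.uniform(-1, 1, size=(256, 1))
y = 3.0 * X[:, 0] + 1.0

w, b = 0.0, 0.0  # model parameters

def forward(X, w, b):
    """Inference: a single forward pass, no gradients needed."""
    return X[:, 0] * w + b

# Training: the forward pass PLUS gradients and weight updates,
# repeated over many steps.
lr = 0.1
for _ in range(500):
    pred = forward(X, w, b)
    err = pred - y
    grad_w = 2.0 * np.mean(err * X[:, 0])  # dLoss/dw for mean squared error
    grad_b = 2.0 * np.mean(err)            # dLoss/db
    w -= lr * grad_w
    b -= lr * grad_b

# Once trained, serving the model is just the cheap forward pass again.
print(f"w={w:.2f}, b={b:.2f}")  # converges toward 3 and 1
```

An inference chip only ever runs `forward`; a training chip has to sustain the whole loop, at trillion-parameter scale, across thousands of accelerators.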

This is not the first time. In 2022, Meta attempted to develop inference chips internally but failed in small-scale deployment and quickly abandoned the project, turning to Nvidia for large orders.

The setback in self-developed chips has accelerated Meta’s rush to purchase hardware externally.

$135 billion panic buying

In January 2026, Meta announced capital expenditures of $115 billion to $135 billion for the year—almost double the $72.2 billion spent in 2025. Most of this money is allocated to chips.

Within ten days, three major deals were finalized:

On February 17, Meta signed a multi-year, cross-generational strategic partnership with Nvidia. Meta will deploy “millions” of Nvidia Blackwell and new Vera Rubin GPUs, plus Grace CPUs. Analysts estimate the deal is worth hundreds of billions of dollars, making Meta the first supercomputing customer to deploy Nvidia’s Grace CPUs at scale.

On February 24, Meta signed a chip deal with AMD valued between $60 billion and $100 billion. Meta will purchase AMD's latest MI450 series GPUs and sixth-generation EPYC CPUs. As part of the deal, AMD issued Meta warrants for up to 160 million common shares, representing about 10% of AMD, at $0.01 per share, with vesting tied to milestones.

On February 26, The Information reported that Meta signed a multi-year deal with Google to lease Google Cloud’s TPU chips for training and running its next-generation large language models. The two are also discussing Meta’s direct purchase of TPU deployments starting in 2027.

Within ten days, a social media giant placed orders with three different chip suppliers, potentially totaling over $100 billion.

This is not diversification. It’s panic buying.

Three layers of compute anxiety

Why is Meta in such a rush?

First, self-developed chips are no longer reliable. The most advanced training chip project was canceled, meaning Meta will have to rely on external hardware for AI training in the foreseeable future. While MTIA inference chips can handle recommendation systems, training frontier models like Avocado to compete with GPT-5 requires hardware from Nvidia or equivalent.

Second, competitors won’t wait. OpenAI has secured massive resources from Microsoft, SoftBank, and the Abu Dhabi Sovereign Fund. Anthropic has locked in 1 million TPU and Trainium chips from Google and Amazon. Google’s Gemini 3 is fully trained on TPUs. If Meta cannot secure enough compute power, it risks losing its place in the race.

Third—and perhaps most fundamentally—Zuckerberg needs to use “purchasing power” to compensate for “R&D shortcomings.” The failures of Llama 4, talent drain, and chip setbacks have made Meta’s AI narrative fragile in Wall Street’s eyes. Signing big deals with Nvidia, AMD, and Google signals: “We have money, we are buying, we are still in the game.”

Meta’s current strategy is: if software can’t be fixed, then buy hardware; if talent can’t be retained, then buy chips. But AI competition isn’t won by writing checks. Compute power is necessary but not sufficient. Without top-tier model teams and a clear technical roadmap, even the most expensive chips are just costly inventory.

Buyers’ dilemma

Looking back at Meta’s three deals in February, one interesting detail is often overlooked.

Meta is buying current Blackwell and future Vera Rubin GPUs from Nvidia; from AMD, it’s purchasing MI450 and upcoming MI455X; from Google, it’s leasing current TPU chips with plans to buy directly next year.

Three suppliers, three completely different hardware architectures and software ecosystems.

This means Meta must switch back and forth among Nvidia's CUDA, AMD's ROCm, and Google's XLA/JAX. A multi-supplier strategy can diversify supply-chain risk and push down hardware costs, but it sharply increases engineering complexity.

This is Meta’s most critical weakness right now: enabling a trillion-parameter model to train efficiently across three vastly different low-level programming models and hardware architectures requires more than engineers familiar with CUDA; it demands architects capable of building cross-platform training frameworks from scratch.

There are probably fewer than 100 such people worldwide. Pang Ruiming is one of them.
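What such an architect has to build can be hinted at with a deliberately simplified dispatch layer: one interface, many vendor backends, and a training loop that never branches on vendor. The backend names below mirror the real stacks, but the interface and classes are invented for illustration:

```python
from typing import Protocol

class Backend(Protocol):
    """Minimal interface a cross-platform trainer would code against."""
    name: str
    def matmul(self, a: list[list[float]], b: list[list[float]]) -> list[list[float]]: ...

class PortableBackend:
    """Reference implementation in pure Python. Real backends would
    dispatch to vendor kernels behind this same interface."""
    def __init__(self, name: str):
        self.name = name

    def matmul(self, a, b):
        rows, inner, cols = len(a), len(b), len(b[0])
        return [[sum(a[i][k] * b[k][j] for k in range(inner))
                 for j in range(cols)] for i in range(rows)]

# One instance per vendor stack.
BACKENDS = {
    "cuda": PortableBackend("cuda"),  # would call cuBLAS on Nvidia
    "rocm": PortableBackend("rocm"),  # would call rocBLAS on AMD
    "xla":  PortableBackend("xla"),   # would call TPU kernels via XLA
}

def train_step(backend: Backend, x, w):
    """A single (toy) forward step runs identically on every backend."""
    return backend.matmul(x, w)

x = [[1.0, 2.0]]
w = [[3.0], [4.0]]
for name, be in BACKENDS.items():
    print(name, train_step(be, x, w))  # [[11.0]] on every backend
```

The hard part, of course, is not the interface but making each backend's kernels, memory layouts, and collective-communication libraries behave identically at scale; that is the work that requires people like Pang.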

Spending $100 billion to acquire the world’s most complex hardware setup, while losing the brains capable of wielding that hardware—that’s the most surreal scene in Zuckerberg’s high-stakes gamble.

Zuckerberg’s gamble

Zooming out, Zuckerberg’s AI strategy over the past 18 months closely resembles his previous all-in approach to the metaverse:

Spot the trend, pour in money, hire aggressively, face setbacks, pivot strategy, then pour in more money.

From 2021 to 2023, it was the metaverse: Reality Labs lost more than $10 billion a year, and the stock price dropped from $380 to $88. From 2024 to 2026, it's AI—again, reckless spending, frequent reorganizations, and a narrative of "trust me, I have a vision."

The difference is, this time AI is more tangible than the metaverse. Meta has the money to burn, and its advertising business generates abundant cash flow—Q4 2025 revenue hit $59.9 billion, up 24% year-over-year.

The problem: money can buy chips, compute, and even the people sitting in the seats, but not the people who stay.

Pang Ruiming chose OpenAI; Russ Salakhutdinov chose to leave; LeCun chose to start his own venture.

Zuckerberg’s current bet is that as long as Meta can buy enough chips, build enough data centers, and spend enough money, it can find or cultivate the talent to use these resources.

This bet might pay off. Meta is still one of the wealthiest tech companies globally, with over $100 billion in operating cash flow as its strongest moat. From OpenAI to Anthropic, from Google to other competitors, Meta continues to poach talent. According to QuantumBit, nearly 40% of Meta’s Superintelligence team of 44 members come from OpenAI.

But the brutal truth of AI competition is that compute reserves, talent rosters, and model benchmarks are all public. The Llama 4 benchmark scandal proved that in this industry, you can't sustain a lead on slide decks and PR alone.

Ultimately, the market only cares about one thing: how good is your model?

The food chain in AI

As the AI arms race enters 2026, the pecking order is becoming clearer:

At the top are OpenAI and Google. OpenAI has the strongest models, the largest user base, and the most aggressive funding. Google has full vertical integration—self-developed chips, models, and cloud infrastructure. Anthropic follows closely, leveraging Claude’s product strength and dual compute supply from Google and Amazon, firmly in the first tier.

Meta? It has spent the most, signed the largest chip contracts, and reorganized most frequently, but so far, it has yet to produce a leading model that convinces the market.

Meta’s AI story is somewhat like Yahoo in 2005. Yahoo was also one of the wealthiest internet companies, making numerous acquisitions and investments, but couldn’t produce a search engine comparable to Google. Money isn’t everything. Zuckerberg needs to clarify what Meta’s AI goal truly is, rather than chasing every hot trend.

Of course, it’s too early to write Meta’s obituary. With 3.58 billion monthly active users, $59.9 billion quarterly revenue, and the world’s largest social data set, Meta possesses assets that are hard for any competitor to replicate.

If the next-generation model codenamed Avocado can be delivered on schedule in 2026 and re-enter the top tier, Zuckerberg's heavy spending and restructuring will be remembered as strategic resilience. But if it falls short again, the $135 billion will have bought little more than a warehouse of expensive, idle silicon.

After all, Silicon Valley’s AI arms race has never lacked big-spending super buyers. What’s missing is people who know how to forge the future with that compute power.
