Nvidia Releases Its Best AI Model Yet, But It Still Trails China



In brief

  • Nvidia unveiled Nemotron 3 Ultra, a 550-billion-parameter open-weight model, at Computex on June 1.
  • The model delivers more than 300 tokens per second on a pre-release DeepInfra endpoint, running three to five times faster than its Chinese rivals.
  • But Kimi K2.6 from Moonshot AI still leads the open-weight field on intelligence.

Jensen Huang took the stage at Computex in Taipei on Sunday, wearing his leather jacket, and unveiled Nemotron 3 Ultra: Nvidia's biggest open-weight AI model ever and, at the moment, the smartest open-weight model made in America. It's good. It is not good enough to beat China.

The model has 550 billion total parameters but uses only 55 billion active parameters at any given time, thanks to a design called mixture of experts. Parameters are what determine how much an AI model can know, with a higher count generally meaning a more capable model.

To understand how a mixture-of-experts model works, think of a hospital with hundreds of specialists: When a patient comes in, only the right doctors show up, not the entire staff. This approach makes the model much cheaper to run than its headline parameter count suggests, which is why Nvidia can claim it is 5x faster and 30% cheaper than comparable open-source alternatives.
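The hospital analogy above corresponds to what is usually called top-k expert routing. The sketch below is a toy illustration only, not Nvidia's implementation: the expert count, the value of `TOP_K`, the scalar "experts," and the random router scores are all hypothetical stand-ins, and a real router is a learned layer. The only figures taken from the article are the 550 billion total and 55 billion active parameters.

```python
import math
import random


def softmax(scores):
    """Turn raw router scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_scores, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    probs = softmax(router_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    # Renormalize so the chosen experts' weights sum to 1.
    return [(i, probs[i] / norm) for i in top]


def moe_forward(x, experts, router_scores, k):
    """Run only the selected experts and blend their outputs."""
    return sum(w * experts[i](x) for i, w in route_top_k(router_scores, k))


# Hypothetical toy setup: 10 "specialists," each just scaling its input.
NUM_EXPERTS = 10
TOP_K = 1
experts = [lambda x, scale=i + 1: scale * x for i in range(NUM_EXPERTS)]

random.seed(0)
# Assumption: random scores stand in for a learned router's output.
router_scores = [random.gauss(0, 1) for _ in range(NUM_EXPERTS)]

selected = route_top_k(router_scores, TOP_K)
output = moe_forward(2.0, experts, router_scores, TOP_K)

# Figures from the article: 55B active out of 550B total parameters.
active_fraction = 55e9 / 550e9
print(f"experts consulted: {len(selected)} of {NUM_EXPERTS}")
print(f"active parameter share: {active_fraction:.0%}")
```

The point of the design is visible in `moe_forward`: the other experts are never called, so compute per token scales with the active parameters rather than the total. In a real model the active share also includes layers every token passes through, so the 10% figure does not map neatly onto "one expert in ten."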

Independent evaluator Artificial Analysis, which partnered with Nvidia on pre-release evaluation, put Nemotron 3 Ultra at 48 on its Intelligence Index, a composite metric that combines 10 evaluations covering reasoning, coding, general knowledge, and performance, scored on a numerical scale where higher means more intelligent.

This makes it the top open-weight US model by a comfortable margin. The closest American alternatives are Gemma 4 31B from Google at 39, Nemotron 3 Super at 36, and OpenAI's gpt-oss-120b at 33.

The gap over its predecessor is striking. Nemotron 3 Super, which was released in March 2026 with 120 billion parameters, was already positioned as an open model for autonomous agents. Ultra lands 12 index points above it, which in benchmark terms is a big jump.
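For readers who want to check the margins, the snippet below simply subtracts the Intelligence Index scores quoted above. The scores come from the article; the dictionary layout and variable names are just one convenient way to hold them.

```python
# Intelligence Index scores as quoted in the article.
scores = {
    "Nemotron 3 Ultra": 48,
    "Gemma 4 31B": 39,
    "Nemotron 3 Super": 36,
    "gpt-oss-120b": 33,
}

leader = "Nemotron 3 Ultra"

# Gap between the leader and each other US open-weight model.
gaps = {
    name: scores[leader] - score
    for name, score in scores.items()
    if name != leader
}

for name, gap in sorted(gaps.items(), key=lambda item: item[1]):
    print(f"{name}: {gap} points behind {leader}")
```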

What is the Nemotron family?

Nvidia has been in the model business longer than most people think. The first model bearing the Nemotron name dropped in November 2023, and the third generation was announced in December 2025.

The family comes in three sizes: Nano for light tasks, Super for mid-sized workloads, and Ultra for demanding tasks. All three share an architecture that combines Mamba-2 layers, standard Transformer attention, and mixture-of-experts routing.

Mamba-2 is a lower-cost alternative for handling long-range memory, which is essential when you need a model that can hold a million tokens in context at once. Nemotron 3 Ultra supports a 1-million-token context window, meaning that an agent can, in theory, keep a large codebase or hundreds of documents in view at once.

The Ultra version also includes a technique called multi-token prediction (MTP), which allows the model to predict several future tokens at once instead of one at a time, speeding up generation. All three Nemotron 3 models were post-trained using reinforcement learning across multiple environments, teaching them to plan and carry out multi-step tasks instead of just answering questions.

Ultra's weights are open and its training recipes are being released. Do you need a big computer to run it? Essentially, yes: the 550-billion-parameter model belongs in the data center. But you can access it through Nvidia's API or cloud providers without owning the hardware yourself, the same way everyone already uses GPT or Claude through a browser.

Faster output, less brainpower

Speed is where Nemotron 3 Ultra really shines. On a pre-release DeepInfra endpoint, the model delivered more than 300 tokens per second. The Chinese models in its intelligence range, DeepSeek V4 Pro and Kimi K2.6, are served at 50-100 tokens per second through their commercial APIs today. That speed gap matters in real-world deployments, especially for agents working on long-running tasks, where the wait on each step adds up quickly.
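To get a feel for what those throughput numbers mean for an agent, here is a rough back-of-the-envelope calculation. The tokens-per-second rates are the rounded figures from the article; the workload (`STEPS` and `TOKENS_PER_STEP`) is a made-up example, and the estimate ignores prompt processing, network latency, and time spent in tools.

```python
def generation_seconds(tokens, tokens_per_second):
    """Time spent purely on generating output tokens."""
    return tokens / tokens_per_second


# Hypothetical agent workload (assumption, not from the article).
STEPS = 50
TOKENS_PER_STEP = 2_000
total_tokens = STEPS * TOKENS_PER_STEP

# Rounded throughput figures quoted in the article.
rates = {
    "ultra": 300,         # Nemotron 3 Ultra on DeepInfra
    "rivals_fast": 100,   # upper end of the 50-100 range
    "rivals_slow": 50,    # lower end of the 50-100 range
}

totals = {name: generation_seconds(total_tokens, rate) for name, rate in rates.items()}

for name, seconds in totals.items():
    print(f"{name}: {seconds / 60:.1f} minutes for {total_tokens:,} tokens")
```

Under these assumptions the same job takes about five and a half minutes at 300 tokens per second, versus roughly 17 to 33 minutes at 100 and 50 tokens per second. The absolute numbers depend entirely on the assumed workload; only the ratios follow from the quoted rates.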

But raw speed does not settle the intelligence contest. The chart published by Artificial Analysis tells the real story clearly. On the vertical axis, intelligence, Nemotron 3 Ultra sits at a respectable 48, but China's Kimi K2.6 from Moonshot AI sits at 54. A six-point gap on the index is a big one: Kimi K2.6 was released in April 2026 and currently ranks fourth among all models worldwide, closed or open, behind only the flagship models from Anthropic, Google, and OpenAI, all at 57.

The weak US position in open-weight models is not new. China's labs have been flooding the open ecosystem with strong models while American companies like OpenAI, Anthropic, and Google keep their best models behind APIs. As Decrypt reported in March, China's open models jumped from about 1.2% of global model usage at the end of 2024 to about 30% by the end of 2025. Nvidia has publicly unveiled a five-year plan to spend $26 billion on open AI development.

Nemotron 3 Ultra is the most visible result of that bet so far. Nvidia also announced that it is already working on Nemotron 4, the next generation, which is being developed through the Nemotron Coalition, a group of eight AI labs including Mistral AI and Perplexity that Nvidia assembled in March 2026 to collaborate on building frontier open models on DGX Cloud infrastructure. Nemotron 3 Ultra ships June 4.
