Nvidia debuts Cosmos 3 model for physical AI

Illustration of robots working together on a giant brain (image: Google Gemini)

Nvidia has unveiled Cosmos 3, an open world foundation model for physical AI built on a new architecture.

According to Nvidia, Cosmos 3 can understand and generate text, images, video, ambient sound and actions.

Many readers will already know the "mixture of experts" architecture, in which a router sends each input to a small subset of specialized "expert" subnetworks. Cosmos 3 instead uses a "mixture of transformers" architecture that pairs a reasoning transformer with an expert generation transformer. Nvidia says this design lets Cosmos 3 understand object interactions, motion and spatial-temporal relationships before it generates video and action trajectories.

Cosmos 3 comes in three variants aimed at different use cases: Super, Nano and Edge.

“The Cosmos 3 family of open, frontier omnimodels gives developers a generational leap in ability to build robots, autonomous vehicles and vision AI that perceive, reason, plan and act in the physical world,” Nvidia CEO Jensen Huang said in a statement.
