Nvidia unveiled its new Cosmos 3 model, an open world foundation model for physical AI based on a new kind of architecture.
Nvidia said Cosmos 3 can understand and generate text, images, video, ambient sound and actions.
Many readers are already familiar with the "mixture of experts" architecture, which routes each input to specialized "expert" sub-networks within a single model. Cosmos 3 is instead based on a "mixture of transformers" architecture, which pairs a reasoning transformer with an expert generation transformer. That pairing enables Cosmos 3 to understand object interactions, motion and spatial-temporal relationships before generating video and action trajectories.
Cosmos 3 comes in three variants (Super, Nano and Edge) to serve different use cases.
"The Cosmos 3 family of open, frontier omnimodels gives developers a generational leap in ability to build robots, autonomous vehicles and vision AI that perceive, reason, plan and act in the physical world," Nvidia CEO Jensen Huang said in a statement.