In what was undoubtedly one of the most anticipated and highly attended CES keynotes of all time, Nvidia CEO Jensen Huang unveiled an impressively wide-ranging set of announcements spanning many of the hottest topics in tech, including AI, robotics, autonomous vehicles, and more.
Clad in a Las Vegas-glitz version of his trademark black leather jacket, the tech industry leader walked through the company's latest GeForce RTX 50 series graphics cards, new Nemotron AI foundation model families, and AI Blueprints for building AI-powered agents.
He also highlighted extensions to the company's Omniverse digital twin and simulation platform, which brings AI into the physical world, as well as new safety certifications for its autonomous driving platform. Additionally, he introduced Project Digits, a compact desktop AI supercomputer powered by the GB10 Grace Blackwell Superchip. Needless to say, it was a lot to take in.
One of the most intriguing, though likely least understood, announcements was a set of foundation models and platform capabilities dubbed Cosmos. Defined as a suite of world foundation models, advanced tokenizers, safety guardrails, and an accelerated video processing pipeline, Cosmos is designed to carry the training approach and powerful results of generative AI from the digital realm into the physical world.
In other words, instead of using generative AI to create new digital outputs based on training across billions of documents, images, and other digital content, Cosmos can generate new physical actions (let's call them analog outputs) by drawing on what it has learned from digitally simulated environments.
While the concept is complex, the real-world implications are both simple and profound. For applications like robotics, autonomous vehicles, and other mechanical systems, Cosmos enables these systems to react to physical stimuli in more accurate, safe, and helpful ways. For instance, humanoid robots can be trained to physically replicate the most effective or safest way to perform a task, whether it’s flipping an omelet or handling parts on a production line. Similarly, an autonomous car can dynamically adapt to varying situations and environments.
Much of this type of training currently relies on manual efforts, such as filming humans performing the same action hundreds of times or having autonomous cars drive millions of miles. Even then, thousands of people must spend significant time hand-labeling and tagging those videos. With Cosmos, these training methods can be automated, dramatically reducing costs, saving time, and expanding the range of data available for the training process.
Nvidia Cosmos is a world foundation model development platform that incorporates generative models, a data curator, tokenizers, and a framework to accelerate physical AI development.
Cosmos works as an extension of Nvidia's Omniverse digital simulation environment. It translates the digital physics of models and systems created in Omniverse into physical actions in the real world. The difference between simulating physics and acting on it may seem subtle, but it is critically important because it is what enables Cosmos to produce GenAI-powered physical outputs.
At the core of Cosmos are world foundation models, built from millions of hours of video content, which possess an understanding of the physical world. Cosmos takes the digital models of physical objects and environments created in Omniverse, integrates them into these world foundation models, and generates photorealistic video outputs of how the models are predicted to behave in real-world scenarios.
These videos then serve as synthetic data sources, which can be used to train models running in robotic systems, autonomous cars, and other GPU-powered mechanical systems. The result is systems that can respond more effectively across diverse environments.
Another noteworthy aspect is that Nvidia is making its Cosmos world foundation models available for free to encourage advancements in robotics and autonomous vehicles, as well as to foster further experimentation.
In the short term, the immediate impact of Cosmos will be limited, as it primarily targets a niche audience developing advanced robotics and autonomous vehicle applications. However, in the long term, its influence could be profound, potentially speeding up the development of these product categories and improving the accuracy and safety of these systems.
More importantly, it demonstrates Nvidia’s ability to anticipate and prepare for emerging tech trends such as robotics. It also underscores the often-overlooked but ongoing transformation of Nvidia into a software company building platforms for these new applications. For those curious about where the company is headed and how it plans to sustain its impressive growth, these developments offer intriguing and important insights.
Bob O'Donnell is the founder and chief analyst of TECHnalysis Research, LLC, a technology consulting firm that provides strategic consulting and market research services to the technology industry and professional financial community. You can follow him on Twitter @bobodtech.