For most people, the AI boom has unfolded on a screen. Tools like ChatGPT, Claude, and Copilot have transformed how we draft emails, summarize documents, write code, and communicate. But while generative AI has changed how we work, it hasn’t yet reshaped the physical world where most economic activity actually happens.
Fei-Fei Li believes that’s about to change.
The Stanford professor, and one of the most influential voices in AI, argues that the next era of AI won’t be defined by chatbots. It will be defined by systems that understand and act within the real world. She calls this next generation of AI “spatial intelligence,” and her new venture, World Labs, is betting it will unlock the next major wave of industrial and economic value.
When AI Leaves the Screen
Picture a utility operator facing wildfire-level winds. A spatially aware AI doesn’t wait for a prompt. It forecasts how conditions will evolve, reroutes power, dispatches a drone to inspect a transformer predicted to fail, and alerts first responders, all before anything burns.
Or imagine a hospital in peak winter demand. A world model anticipates bottlenecks, simulates staffing scenarios, reorganizes beds, and directs autonomous robots delivering medications. This isn’t just data analysis; it’s real-time coordination.
These are the kinds of scenarios Li envisions, and the foundation of World Labs’ mission. Alongside co-founders Justin Johnson, Christoph Lassner, and Ben Mildenhall, all influential figures in computer vision and graphics, she’s developing models that understand objects, motion, cause and effect, and physical constraints. In other words, the ingredients of reality.
The company’s first product, Marble, offers an early glimpse of how spatial intelligence works. Provide a short text description and it generates an explorable 3D environment. It feels less like prompting an AI and more like stepping into a world spun up on demand.
Why the Physical Economy Is the Real Prize
LLMs dominate headlines, but most of the global economy operates in places language models can’t see: factories, warehouses, fields, hospitals, energy grids, construction sites.
These environments are governed by physics, timing, and uncertainty. Humans grasp these intuitively because we’ve spent our lives navigating them. Machines have not. As Li often notes, humans are “embodied agents”: we learn through movement, interaction, and consequence. AI systems trained solely on text lack that grounding, creating a gap between what they can describe and what they can actually do.
World models aim to close that gap by giving machines an intuition for how the world works. In this new paradigm, AI is defined not by language but by space.
As she explains in an interview with Lenny Rachitsky: “A language model reads a book and spits out the next sentence. A world model watches a movie, predicts the whole plot twist, and lets you rewrite the ending on the fly. It’s not just describing; it’s simulating physics, emotions, chaos, all the messy stuff of real life.”
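Li’s book-versus-movie contrast can be made concrete with a toy sketch. Nothing below reflects World Labs’ actual models: the hand-coded dynamics in `step` (position plus velocity with friction), the rollout horizon, and the candidate actions are all illustrative assumptions. The point is only the paradigm itself, that a world model lets an agent simulate outcomes in imagination and pick an action before touching the real world:

```python
# Toy "world model": given a state and an action, predict the next state.
# The dynamics here are hand-coded; a real world model would learn them
# from video or sensor data.

FRICTION = 0.9  # how quickly motion decays each step (assumed value)

def step(state, action):
    """Predict the next (position, velocity) after applying a push."""
    pos, vel = state
    vel = vel * FRICTION + action  # the action is a push on velocity
    return (pos + vel, vel)

def rollout(state, action, horizon=5):
    """Imagine the future: push once, then let the predicted physics run."""
    state = step(state, action)
    for _ in range(horizon - 1):
        state = step(state, 0.0)  # no further pushes
    return state

def plan(state, target, candidates=(-2.0, -1.0, 0.0, 1.0, 2.0)):
    """Simulate every candidate action in imagination; choose the one whose
    predicted final position lands closest to the target."""
    return min(candidates, key=lambda a: abs(rollout(state, a)[0] - target))

best = plan(state=(0.0, 0.0), target=8.0)
print(best)  # the strongest rightward push gets closest to the target
```

The planner never acts in the real world until it has compared imagined futures, which is the same logic behind testing a staffing plan or a factory layout in simulation first.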
The implications for business are significant. Companies will be able to model decisions before acting on them, reducing risk and accelerating execution. A manufacturing line can be redesigned digitally before any equipment is moved. A logistics network can be tested virtually before trucks or containers are rerouted. Hospitals can simulate patient flows before adjusting staffing or capacity. Construction firms can explore hundreds of design variations before committing materials.
In each case, decision-making becomes more informed and less reactive. Instead of building first and adjusting later, organizations can explore possibilities in a virtual setting where mistakes are free and insights accumulate quickly.
The markets for this technology are massive, with a scale of opportunity that proponents compare to the early days of cloud computing, mobile, and the commercial internet.
Inside Marble and the World of Embodied AI
One of the most transformative applications of world models is “embodied AI”: the intelligence layer behind robots, drones, autonomous vehicles, and industrial automation. Today, these systems learn slowly because real-world training is expensive and error-prone.
World models change the equation by giving machines a safe, rich virtual environment in which to accumulate the equivalent of thousands of hours of real-world experience. It’s the foundation robotics has lacked for decades.
World Labs’ debut product, Marble, is already being used to generate virtual sets and 3D scenes in hours instead of weeks. But Li sees it as a precursor to far broader enterprise capabilities: modeling facilities pre-construction, testing operational strategies before rollout, rehearsing safety scenarios before incidents, and designing customer experiences before physical investment.
Simulation becomes a strategic superpower. Spatial intelligence becomes the bridge between digital planning and physical execution.
AI Serving Humanity
With each breakthrough Li has driven, from ImageNet to spatial intelligence and the launch of Marble, her work keeps circling back to the same principle: intelligence is only meaningful when it serves humanity. And as world models push AI from words into the physical world, that principle becomes the real competitive edge.
In Li’s view, the next era of innovation won’t be won by companies that simply deploy bigger models. It will be won by leaders who understand that AI’s power and its risk both grow with its reach. Li is betting that spatial intelligence will redefine how industries operate, how we design and build, and how we respond to the world’s most complex challenges. But she’s equally clear about something Silicon Valley too often forgets: the future of AI isn’t predetermined. It depends on the choices we make.
If world models deliver on their promise, they won’t just help machines understand our world. They’ll help us redesign it with more foresight, more capability, and, if we choose wisely, more humanity.