World Labs Marble: Fei-Fei Li's Vision for Spatial Intelligence
AI pioneer Fei-Fei Li launches Marble, a commercial platform that generates explorable 3D worlds from text and images, marking a new frontier in spatial AI.

From ImageNet to World Models
For context on how world models fit into AI video evolution, see our overview of world models as the next frontier.
Fei-Fei Li reshaped computer vision with ImageNet, the dataset that helped set off the modern deep learning boom. Now, after a year of building World Labs with $230 million in funding, she has launched Marble, the company's first commercial product.
The thesis is simple: AI has conquered text, then images, then video. The next frontier is spatial intelligence, the ability to perceive, generate, and interact with 3D worlds.
What Marble Does
Marble generates persistent, downloadable 3D environments from multiple input types:
- Text prompts
- Single images
- Videos
- Panoramas
- 3D layouts
Unlike real-time world models from competitors like Decart's Oasis or Google's Genie, Marble creates stable worlds with minimal morphing. You generate once, then explore freely without the AI "forgetting" what it created.
The Chisel Editor
AI-Native 3D Editing
Chisel decouples spatial structure from visual style. Block out your layout first, then apply text-based styling guidance.
This hybrid approach sets Marble apart from text-to-scene models. Instead of hoping the AI understands your spatial intent, you define the geometry explicitly. The AI handles aesthetics, materials, and lighting.
Think of it like sketching a floor plan before asking an interior designer to decorate. The control over spatial relationships remains yours.
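To make the structure/style split concrete, here is a minimal sketch of how a scene description could keep geometry and styling guidance in separate fields. This is not Marble's or Chisel's actual data model or API; `Block`, `SceneSpec`, and `restyle` are hypothetical names invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class Block:
    """Hypothetical axis-aligned box used to block out a layout."""
    name: str
    origin: Vec3  # minimum corner, in metres
    size: Vec3    # width, height, depth, in metres


@dataclass
class SceneSpec:
    """Hypothetical scene description: geometry and style live in separate fields."""
    blocks: List[Block] = field(default_factory=list)
    style_prompt: str = ""

    def restyle(self, prompt: str) -> "SceneSpec":
        # Keep the same layout and swap only the text-based styling guidance.
        return SceneSpec(blocks=list(self.blocks), style_prompt=prompt)


layout = SceneSpec(
    blocks=[
        Block("floor", (0.0, 0.0, 0.0), (6.0, 0.1, 4.0)),
        Block("counter", (0.5, 0.1, 0.5), (2.0, 0.9, 0.6)),
    ],
    style_prompt="warm Scandinavian kitchen, morning light",
)
industrial = layout.restyle("industrial loft, exposed concrete")
```

In this sketch the two variants share identical blocks and differ only in `style_prompt`, which mirrors the floor-plan-then-decorate workflow described above.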
Export Formats and Compatibility
Generated worlds export in three formats:
| Format | Use Case |
|---|---|
| Gaussian Splats | Real-time rendering, novel views |
| Meshes | Game engines, CAD integration |
| Videos | Content creation, pre-vis |
All Marble worlds are VR-compatible with Vision Pro and Quest 3 headsets out of the box.
Pricing Structure
World Labs offers four tiers:
| Tier | Price | Generations | Key Features |
|---|---|---|---|
| Free | $0 | 4/month | Text, image, or panorama input |
| Standard | $20/month | 12/month | Multi-image/video input, advanced editing |
| Pro | $35/month | 25/month | Scene expansion, commercial rights |
| Max | $95/month | 75/month | All features, maximum generations |
The free tier lets you evaluate the technology. For production work requiring commercial rights, the Pro tier at $35/month represents reasonable entry pricing for a capability this novel.
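For a rough sense of how the paid tiers compare, the table above can be reduced to a cost per generation. The sketch below uses only the prices and monthly generation counts listed in the table; the variable and function names are illustrative.

```python
# Monthly price (USD) and generations per month, taken from the pricing table.
tiers = {
    "Standard": (20, 12),
    "Pro": (35, 25),
    "Max": (95, 75),
}


def cost_per_generation(price: float, generations: int) -> float:
    """Return the effective price of one generation, rounded to cents."""
    return round(price / generations, 2)


per_gen = {name: cost_per_generation(p, g) for name, (p, g) in tiers.items()}

for name, cost in per_gen.items():
    print(f"{name}: ${cost:.2f} per generation")
```

This works out to roughly $1.67 per generation on Standard, $1.40 on Pro, and $1.27 on Max, so the per-generation price falls as the tier rises.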
Why Spatial Intelligence Matters
"Spatial intelligence is the defining challenge of the next decade." - Fei-Fei Li
Li argues that current AI has a fundamental limitation: it reasons poorly about 3D space. Language models hallucinate physics. Video models create impossible geometries. Image generators struggle with consistent spatial relationships.
For robotics, this matters enormously. A robot navigating a kitchen needs spatial understanding, not frame prediction. For VFX, directors need explorable environments, not fixed camera paths.
Use Cases Taking Shape
Gaming: Generate ambient environments and background spaces. Indie developers can create exploration areas that would otherwise require months of traditional art production.
Visual Effects: Pre-visualization becomes interactive. Block out a scene spatially, then explore camera angles before committing to shots.
Architecture: Convert floor plans to explorable walkthroughs. Clients experience spaces before construction begins.
Education: Li envisions students walking inside a cell and surgeons practicing inside anatomical simulations.
World Expansion and Composer Mode
Two features address scale limitations:
World Expansion lets you extend a generated world once, adding detail to edge regions where quality typically degrades. This pushes the boundaries of explorable space beyond initial generation limits.
Composer Mode combines multiple worlds into larger environments. Generate individual rooms, then stitch them into a complete building.
These tools acknowledge current constraints while providing practical workarounds.
The Competition Landscape
Marble enters a crowded field:
| Product | Approach | Differentiator |
|---|---|---|
| Decart Oasis | Real-time game generation | Interactive, but worlds shift during exploration |
| Google Genie | Game world generation | Frame prediction without true 3D |
| Odyssey | Persistent world models | Enterprise focus |
| World Labs Marble | Static 3D generation | Downloadable, editable, VR-ready |
The trade-off is clear. Real-time models like Oasis offer immediacy at the cost of stability. Marble prioritizes persistence and editability over interactivity.
Connecting to Video Generation
For background on diffusion architectures used in spatial AI, see our technical overview of diffusion transformers.
How does 3D world generation relate to video? They share mathematical foundations in diffusion models, but solve different problems.
Video generation creates temporal sequences, frame after frame. Spatial AI creates geometric representations, surfaces and volumes. Video answers "what happens next?" Spatial AI answers "what exists here?"
The convergence point: navigable video. Generate a 3D world, then render video as you move through it. This approach offers camera control impossible with pure video generation.
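The camera control mentioned above comes down to choosing a camera pose for every frame once a 3D world exists. As a minimal sketch, assuming the world has already been exported and loaded into some renderer (not shown), the following generates a circular orbit path. The function name, parameters, and coordinate convention (y is up) are assumptions made for illustration, not part of Marble.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def orbit_camera_path(
    center: Vec3, radius: float, height: float, n_frames: int
) -> List[Tuple[Vec3, Vec3]]:
    """Return (position, forward) pairs for one full orbit around `center`.

    `forward` is a unit vector pointing from the camera to `center`.
    """
    if radius <= 0 or n_frames <= 0:
        raise ValueError("radius and n_frames must be positive")
    cx, cy, cz = center
    path = []
    for i in range(n_frames):
        angle = 2.0 * math.pi * i / n_frames
        pos = (
            cx + radius * math.cos(angle),
            cy + height,
            cz + radius * math.sin(angle),
        )
        # Direction from the camera position back to the orbit center.
        dx, dy, dz = cx - pos[0], cy - pos[1], cz - pos[2]
        norm = math.sqrt(dx * dx + dy * dy + dz * dz)
        path.append((pos, (dx / norm, dy / norm, dz / norm)))
    return path


# 48 poses, i.e. two seconds of footage at 24 frames per second.
path = orbit_camera_path(center=(0.0, 1.0, 0.0), radius=3.0, height=0.5, n_frames=48)
```

Each pose would be handed to the renderer to produce one frame. Because the path is explicit data, it can be edited, slowed down, or replaced with a dolly move without regenerating the world.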
Limitations to Consider
Marble is not a complete solution:
- Static environments only: no animated characters or dynamic elements
- Generation caps may limit production workflows
- Edge degradation requires expansion passes
For animated content, you still need video generation models. Marble excels at environments and spaces, not actors or actions.
The Bigger Picture
Fei-Fei Li sees spatial intelligence as essential for AI progress:
"I think all of us have a responsibility in ushering AI to a better state as it becomes more powerful. All of us should want humanity to prevail and thrive."
Her vision extends beyond entertainment. Medical simulations where students explore anatomy. Scientific visualizations where researchers navigate molecular structures. Robotic training environments generated on demand.
Marble is step one, a commercial proof of concept. The research continues toward more dynamic, interactive, and physically accurate world generation.
Getting Started
World Labs offers a free tier with 4 generations per month, enough to evaluate the technology and understand its constraints.
For creators already working in 3D, the mesh export capability integrates with existing pipelines. For video producers, the video export provides pre-visualization capabilities unavailable elsewhere.
Related reading: Our guide to AI video character consistency covers techniques for maintaining coherence across generated content, a challenge Marble addresses through persistent 3D representation.
The transition from 2D generation to 3D world creation represents a fundamental shift in what AI can produce. Marble makes that shift accessible.

Alexis
AI Engineer
AI engineer from Lausanne combining research depth with practical innovation. Splits time between model architectures and alpine peaks.