World Labs Marble: Fei-Fei Li's Vision for Spatial Intelligence

The researcher who gave machines the ability to see is now teaching them to imagine entire worlds. With World Labs Marble, Fei-Fei Li takes the next step beyond video generation into persistent, explorable 3D environments.

From ImageNet to World Models

For context on how world models fit into AI video evolution, see our overview of world models as the next frontier.

Fei-Fei Li revolutionized computer vision with ImageNet, the dataset that made modern deep learning possible. Now, after a year of building World Labs with $230 million in funding, she has launched Marble, the company's first commercial product.

The thesis is simple: AI has conquered text, then images, then video. The next frontier is spatial intelligence, the ability to perceive, generate, and interact with 3D worlds.

Funding raised: $230M

What Marble Does

Marble generates persistent, downloadable 3D environments from multiple input types:

✓ Text prompts
✓ Single images
✓ Videos
✓ Panoramas
✓ 3D layouts

Unlike real-time world models from competitors like Decart's Oasis or Google's Genie, Marble creates stable worlds with minimal morphing. You generate once, then explore freely without the AI "forgetting" what it created.

The Chisel Editor

AI-Native 3D Editing

Chisel decouples spatial structure from visual style. Block out your layout first, then apply text-based styling guidance.

This hybrid approach sets Marble apart from text-to-scene models. Instead of hoping the AI understands your spatial intent, you define the geometry explicitly. The AI handles aesthetics, materials, and lighting.

Think of it like sketching a floor plan before asking an interior designer to decorate. The control over spatial relationships remains yours.
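To make the structure/style split concrete, here is a minimal sketch of how such a scene description could be modeled. Everything in it is hypothetical: the `Box` and `SceneSpec` names, the fields, and the use of metres are illustrative assumptions, not Marble's or Chisel's actual data model or API.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class Box:
    """Hypothetical layout block: a named axis-aligned box (units assumed to be metres)."""
    name: str
    origin: Vec3
    size: Vec3


@dataclass
class SceneSpec:
    """Hypothetical container that keeps geometry and style in separate fields."""
    layout: List[Box] = field(default_factory=list)
    style_prompt: str = ""

    def restyle(self, new_prompt: str) -> "SceneSpec":
        # Only the style text changes; the layout list is reused untouched.
        return SceneSpec(layout=self.layout, style_prompt=new_prompt)


room = SceneSpec(
    layout=[
        Box("floor", (0.0, 0.0, 0.0), (6.0, 0.1, 4.0)),
        Box("table", (2.0, 0.1, 1.5), (1.8, 0.75, 0.9)),
    ],
    style_prompt="warm mid-century living room, afternoon light",
)

# Same floor plan, different decorator.
industrial = room.restyle("bare concrete loft, cold overhead lighting")
```

The point of the sketch is the shape of the data: restyling produces a new look while the blocked-out geometry stays exactly as you defined it.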

Export Formats and Compatibility

Generated worlds export in three formats:

Format	Use Case
Gaussian Splats	Real-time rendering, novel views
Meshes	Game engines, CAD integration
Videos	Content creation, pre-vis

All Marble worlds are VR-compatible with Vision Pro and Quest 3 headsets out of the box.

Pricing Structure

World Labs offers four tiers:

Tier	Price	Generations	Key Features
Free	$0	4/month	Text, image, or panorama input
Standard	$20/month	12/month	Multi-image/video input, advanced editing
Pro	$35/month	25/month	Scene expansion, commercial rights
Max	$95/month	75/month	All features, maximum generations

The free tier lets you evaluate the technology. For production work requiring commercial rights, the Pro tier at $35/month represents reasonable entry pricing for a capability this novel.
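A quick way to compare the paid tiers is the effective price per generation. The short calculation below uses only the prices and monthly caps from the table above, and assumes you use the full quota each month; the function name is just for illustration.

```python
# Paid tiers from the pricing table: (monthly price in USD, generations per month)
TIERS = {
    "Standard": (20, 12),
    "Pro": (35, 25),
    "Max": (95, 75),
}


def cost_per_generation(price_usd: float, generations: int) -> float:
    """Effective price of one generation if the full monthly quota is used."""
    return price_usd / generations


for name, (price, gens) in TIERS.items():
    print(f"{name}: ${cost_per_generation(price, gens):.2f} per generation")

# Prints:
# Standard: $1.67 per generation
# Pro: $1.40 per generation
# Max: $1.27 per generation
```

The per-generation price falls as you move up the tiers, but the drop from Pro to Max is small, so the choice between them comes down mostly to volume.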

Why Spatial Intelligence Matters

"Spatial intelligence is the defining challenge of the next decade." - Fei-Fei Li

Li argues that current AI has a fundamental limitation: it reasons poorly about 3D space. Language models hallucinate physics. Video models create impossible geometries. Image generators struggle with consistent spatial relationships.

✗ Current Approaches

Video models generate frame sequences without true 3D understanding. Camera movements reveal inconsistencies. Objects change position or disappear.

✓ Spatial Intelligence

Native 3D representation enables physically consistent worlds. Move the camera freely. The environment persists because it exists as geometry, not pixels.

For robotics, this matters enormously. A robot navigating a kitchen needs spatial understanding, not frame prediction. For VFX, directors need explorable environments, not fixed camera paths.

Use Cases Taking Shape

Gaming: Generate ambient environments and background spaces. Indie developers can create exploration areas that would require months of traditional art production.

Visual Effects: Pre-visualization becomes interactive. Block out a scene spatially, then explore camera angles before committing to shots.

Architecture: Convert floor plans to explorable walkthroughs. Clients experience spaces before construction begins.

Education: Li envisions students walking inside a cell and surgeons practicing inside anatomical simulations.

World Expansion and Composer Mode

Two features address scale limitations:

World Expansion lets you extend a generated world once, adding detail to edge regions where quality typically degrades. This pushes the boundaries of explorable space beyond initial generation limits.

Composer Mode combines multiple worlds into larger environments. Generate individual rooms, then stitch them into a complete building.

These tools acknowledge current constraints while providing practical workarounds.
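The idea behind stitching rooms together can be illustrated with plain coordinate transforms. The sketch below is an assumption about the general principle, not a description of how Composer Mode works internally: each room lives in its own local frame, and a hypothetical `place` helper translates it into a shared building frame.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def place(points: List[Point], offset: Point) -> List[Point]:
    """Translate a room's local points into the shared building frame."""
    ox, oy, oz = offset
    return [(x + ox, y + oy, z + oz) for x, y, z in points]


# Two rooms, each described in its own local frame (floor corners only).
kitchen = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (4.0, 0.0, 3.0), (0.0, 0.0, 3.0)]
hallway = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (2.0, 0.0, 3.0), (0.0, 0.0, 3.0)]

# Put the hallway directly beside the kitchen, which is 4 units wide along x.
building = place(kitchen, (0.0, 0.0, 0.0)) + place(hallway, (4.0, 0.0, 0.0))
```

After placement, the hallway's first corner sits at x = 4.0, flush against the kitchen's far wall, and both rooms share one coordinate system.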

The Competition Landscape

Marble enters a crowded field:

Product	Approach	Differentiator
Decart Oasis	Real-time game generation	Interactive, but worlds shift during exploration
Google Genie	Game world generation	Frame prediction without true 3D
Odyssey	Persistent world models	Enterprise focus
World Labs Marble	Static 3D generation	Downloadable, editable, VR-ready

The trade-off is clear. Real-time models like Oasis offer immediacy but instability. Marble prioritizes persistence and editability over interactivity.

Connecting to Video Generation

For background on diffusion architectures used in spatial AI, see our technical overview of diffusion transformers.

How does 3D world generation relate to video? They share mathematical foundations in diffusion models, but solve different problems.

Video generation creates temporal sequences, frame after frame. Spatial AI creates geometric representations, surfaces and volumes. Video answers "what happens next?" Spatial AI answers "what exists here?"

The convergence point: navigable video. Generate a 3D world, then render video as you move through it. This approach offers camera control impossible with pure video generation.
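A toy example shows why rendering from geometry gives you camera control for free. The sketch below is not Marble's renderer; it is a bare pinhole projection with made-up points and a hypothetical `project` helper. The world is a fixed list of 3D points, and each "frame" is just that same world seen from a new camera position.

```python
from typing import List, Optional, Tuple

Point3 = Tuple[float, float, float]
Point2 = Tuple[float, float]


def project(point: Point3, camera_x: float, focal: float = 1.0) -> Optional[Point2]:
    """Pinhole projection for a camera at (camera_x, 0, 0) looking down +z."""
    x, y, z = point
    if z <= 0:
        return None  # behind the camera, not visible
    return (focal * (x - camera_x) / z, focal * y / z)


# The "world": three fixed points. They never change between frames.
world: List[Point3] = [(0.0, 0.0, 2.0), (1.0, 0.5, 4.0), (-1.0, 1.0, 8.0)]

# The "video": one frame per camera position along a sideways dolly move.
camera_path = [0.0, 0.5, 1.0]
frames = [[project(p, cam_x) for p in world] for cam_x in camera_path]
```

Across the move, the nearest point shifts from x = 0.0 to x = -0.5 on screen while the farthest shifts only from -0.125 to -0.25. That parallax falls out of the geometry automatically, whereas a pure video model has to predict it frame by frame.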

Limitations to Consider

Marble is not a complete solution:

○ No animated characters or dynamic elements
○ Generation caps may limit production workflows
○ Edge degradation requires expansion passes
○ Static environments only

For animated content, you still need video generation models. Marble excels at environments and spaces, not actors or actions.

The Bigger Picture

Fei-Fei Li sees spatial intelligence as essential for AI progress:

"I think all of us have a responsibility in ushering AI to a better state as it becomes more powerful. All of us should want humanity to prevail and thrive."

Her vision extends beyond entertainment. Medical simulations where students explore anatomy. Scientific visualizations where researchers navigate molecular structures. Robotic training environments generated on demand.

Marble is step one, a commercial proof of concept. The research continues toward more dynamic, interactive, and physically accurate world generation.

Getting Started

World Labs offers a free tier with 4 generations per month, enough to evaluate the technology and understand its constraints.

For creators already working in 3D, the mesh export capability integrates with existing pipelines. For video producers, the video export provides pre-visualization capabilities unavailable elsewhere.

Related reading: Our guide to AI video character consistency covers techniques for maintaining coherence across generated content, a challenge Marble addresses through persistent 3D representation.

The transition from 2D generation to 3D world creation represents a fundamental shift in what AI can produce. Marble makes that shift accessible.