Alexis

World Labs Marble: Fei-Fei Li's Vision for Spatial Intelligence

AI pioneer Fei-Fei Li launches Marble, a commercial platform that generates explorable 3D worlds from text and images, marking a new frontier in spatial AI.

The researcher who gave machines the ability to see is now teaching them to imagine entire worlds. With World Labs Marble, Fei-Fei Li takes the next step beyond video generation into persistent, explorable 3D environments.

From ImageNet to World Models

💡

For context on how world models fit into AI video evolution, see our overview of world models as the next frontier.

Fei-Fei Li revolutionized computer vision with ImageNet, the dataset that helped set off the modern deep learning boom. Now, after a year of building World Labs with $230 million in funding, she has launched Marble, the company's first commercial product.

The thesis is simple: AI has conquered text, then images, then video. The next frontier is spatial intelligence, the ability to perceive, generate, and interact with 3D worlds.

  • $230M in funding raised
  • 4 pricing tiers
  • Native 3D output

What Marble Does

Marble generates persistent, downloadable 3D environments from multiple input types:

  • Text prompts
  • Single images
  • Videos
  • Panoramas
  • 3D layouts

Unlike real-time world models from competitors like Decart's Oasis or Google's Genie, Marble creates stable worlds with minimal morphing. You generate once, then explore freely without the AI "forgetting" what it created.

The Chisel Editor

🔨

AI-Native 3D Editing

Chisel decouples spatial structure from visual style. Block out your layout first, then apply text-based styling guidance.

This hybrid approach sets Marble apart from text-to-scene models. Instead of hoping the AI understands your spatial intent, you define the geometry explicitly. The AI handles aesthetics, materials, and lighting.

Think of it like sketching a floor plan before asking an interior designer to decorate. The control over spatial relationships remains yours.
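
One way to picture this separation is as two independent inputs: a list of coarse boxes and a free-text style prompt. The sketch below is purely illustrative. The `Block`, `SceneSpec`, and `restyle` names are hypothetical and are not Marble's or Chisel's API; the point is only that the style can change while the geometry stays fixed.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Block:
    """A coarse box used to block out the layout (hypothetical type)."""
    name: str
    center: tuple[float, float, float]  # x, y, z in metres
    size: tuple[float, float, float]    # width, height, depth in metres


@dataclass(frozen=True)
class SceneSpec:
    """Geometry and style live in separate fields (hypothetical type)."""
    blocks: tuple[Block, ...]
    style_prompt: str


def restyle(spec: SceneSpec, new_prompt: str) -> SceneSpec:
    # Only the style text changes; the blocked-out geometry is reused as-is.
    return replace(spec, style_prompt=new_prompt)


layout = (
    Block("floor", (0.0, 0.0, 0.0), (6.0, 0.1, 4.0)),
    Block("table", (1.0, 0.4, 0.5), (1.6, 0.8, 0.9)),
)
rustic = SceneSpec(layout, "rustic farmhouse kitchen, warm morning light")
modern = restyle(rustic, "minimalist concrete loft, cool daylight")

print(modern.blocks == rustic.blocks)  # True: same geometry, different style
```

Keeping the two fields separate is what makes restyling cheap in this toy model: a new look never forces you to redo the spatial layout.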

Export Formats and Compatibility

Generated worlds export in three formats:

| Format | Use Case |
| --- | --- |
| Gaussian Splats | Real-time rendering, novel views |
| Meshes | Game engines, CAD integration |
| Videos | Content creation, pre-vis |

💡

All Marble worlds are VR-compatible with Vision Pro and Quest 3 headsets out of the box.

Pricing Structure

World Labs offers four tiers:

| Tier | Price | Generations | Key Features |
| --- | --- | --- | --- |
| Free | $0 | 4/month | Text, image, or panorama input |
| Standard | $20/month | 12/month | Multi-image/video input, advanced editing |
| Pro | $35/month | 25/month | Scene expansion, commercial rights |
| Max | $95/month | 75/month | All features, maximum generations |

The free tier lets you evaluate the technology. For production work requiring commercial rights, the Pro tier at $35/month represents reasonable entry pricing for a capability this novel.
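
To compare the paid tiers on a like-for-like basis, it can help to divide each monthly price by its generation cap. The short calculation below uses only the prices and caps listed above and assumes you use every generation in the month; the `cost_per_generation` helper is just a name chosen for this example.

```python
# Monthly price (USD) and generation cap for each paid tier listed above.
tiers = {
    "Standard": (20, 12),
    "Pro": (35, 25),
    "Max": (95, 75),
}


def cost_per_generation(price: float, generations: int) -> float:
    """Effective price of one generation if the full monthly cap is used."""
    return price / generations


for name, (price, generations) in tiers.items():
    print(f"{name}: ${cost_per_generation(price, generations):.2f} per generation")
```

On these numbers, the effective cost falls from about $1.67 per generation on Standard to $1.40 on Pro and about $1.27 on Max.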

Why Spatial Intelligence Matters

"Spatial intelligence is the defining challenge of the next decade." - Fei-Fei Li

Li argues that current AI has a fundamental limitation: it reasons poorly about 3D space. Language models hallucinate physics. Video models create impossible geometries. Image generators struggle with consistent spatial relationships.

Current Approaches: Video models generate frame sequences without true 3D understanding. Camera movements reveal inconsistencies. Objects change position or disappear.

Spatial Intelligence: Native 3D representation enables physically consistent worlds. Move the camera freely. The environment persists because it exists as geometry, not pixels.
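
The "geometry, not pixels" point can be illustrated with a toy pinhole camera. In the sketch below, the scene is stored once as 3D points and every view is derived from it, so returning to an earlier camera position reproduces the earlier image exactly. This is a simplified assumption-laden model (no rotation, no lens effects, hypothetical `project` helper), not a description of how Marble renders.

```python
def project(point, camera_pos, focal=1.0):
    """Pinhole projection for a camera at camera_pos looking along +z (no rotation)."""
    x, y, z = (p - c for p, c in zip(point, camera_pos))
    if z <= 0:
        return None  # the point is behind the camera
    return (focal * x / z, focal * y / z)


# The world is stored once as 3D points; views are computed on demand.
world = {"lamp": (1.0, 2.0, 5.0), "door": (-2.0, 0.0, 8.0)}

start = (0.0, 0.0, 0.0)
moved = (1.0, 0.0, 1.0)

view_a = {name: project(p, start) for name, p in world.items()}
view_b = {name: project(p, moved) for name, p in world.items()}
view_back = {name: project(p, start) for name, p in world.items()}

print(view_a["lamp"], view_b["lamp"])  # the lamp lands on different image coordinates
print(view_a == view_back)             # True: coming back gives the identical view
```

A frame-prediction model has no such stored scene to fall back on, which is why objects can drift or vanish when the camera returns to a place it has already shown.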

For robotics, this matters enormously. A robot navigating a kitchen needs spatial understanding, not frame prediction. For VFX, directors need explorable environments, not fixed camera paths.

Use Cases Taking Shape

Gaming: Generate ambient environments and background spaces. Indie developers can create exploration areas that would require months of traditional art production.

Visual Effects: Pre-visualization becomes interactive. Block out a scene spatially, then explore camera angles before committing to shots.

Architecture: Convert floor plans to explorable walkthroughs. Clients experience spaces before construction begins.

Education: Li envisions students walking inside a cell and surgeons practicing inside anatomical simulations.

World Expansion and Composer Mode

Two features address scale limitations:

World Expansion lets you extend a generated world once, adding detail to edge regions where quality typically degrades. This pushes the boundaries of explorable space beyond initial generation limits.

Composer Mode combines multiple worlds into larger environments. Generate individual rooms, then stitch them into a complete building.
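
Conceptually, stitching rooms together means giving each world its own offset in a shared coordinate system. The sketch below uses axis-aligned bounding boxes as a stand-in for full worlds. The article does not describe how Composer Mode actually aligns worlds, so the `World`, `place`, and `combined_bounds` names are hypothetical and the approach is only an illustration.

```python
from dataclasses import dataclass


@dataclass
class World:
    """A generated world reduced to its bounding box (hypothetical type)."""
    name: str
    min_corner: tuple[float, float, float]
    max_corner: tuple[float, float, float]


def place(world: World, offset: tuple[float, float, float]) -> World:
    """Return a copy of the world translated by offset into shared coordinates."""
    lo = tuple(a + o for a, o in zip(world.min_corner, offset))
    hi = tuple(a + o for a, o in zip(world.max_corner, offset))
    return World(world.name, lo, hi)


def combined_bounds(worlds: list[World]):
    """Bounding box that encloses every placed world."""
    lo = tuple(min(w.min_corner[i] for w in worlds) for i in range(3))
    hi = tuple(max(w.max_corner[i] for w in worlds) for i in range(3))
    return lo, hi


kitchen = World("kitchen", (0.0, 0.0, 0.0), (4.0, 3.0, 5.0))
hallway = World("hallway", (0.0, 0.0, 0.0), (2.0, 3.0, 6.0))

# Put the hallway directly to the right of the kitchen along the x axis.
building = [place(kitchen, (0.0, 0.0, 0.0)), place(hallway, (4.0, 0.0, 0.0))]
print(combined_bounds(building))  # ((0.0, 0.0, 0.0), (6.0, 3.0, 6.0))
```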

These tools acknowledge current constraints while providing practical workarounds.

The Competition Landscape

Marble enters a crowded field:

| Product | Approach | Differentiator |
| --- | --- | --- |
| Decart Oasis | Real-time game generation | Interactive, but worlds shift during exploration |
| Google Genie | Game world generation | Frame prediction without true 3D |
| Odyssey | Persistent world models | Enterprise focus |
| World Labs Marble | Static 3D generation | Downloadable, editable, VR-ready |

The trade-off is clear. Real-time models like Oasis offer immediacy but instability. Marble prioritizes persistence and editability over interactivity.

Connecting to Video Generation

💡

For background on diffusion architectures used in spatial AI, see our technical overview of diffusion transformers.

How does 3D world generation relate to video? They share mathematical foundations in diffusion models, but solve different problems.

Video generation creates temporal sequences, frame after frame. Spatial AI creates geometric representations, surfaces and volumes. Video answers "what happens next?" Spatial AI answers "what exists here?"

The convergence point: navigable video. Generate a 3D world, then render video as you move through it. This approach offers camera control impossible with pure video generation.
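
The camera control in that workflow comes from the fact that the path is ordinary data you define yourself. As a minimal sketch, the snippet below interpolates camera positions between a few waypoints, producing one position per frame; each position would then be handed to whatever renderer displays the exported world. The `lerp` and `camera_path` helpers and the waypoint values are assumptions made for this example, and rendering itself is left out.

```python
def lerp(a, b, t):
    """Linear interpolation between two 3D points for t in [0, 1]."""
    return tuple(x + (y - x) * t for x, y in zip(a, b))


def camera_path(waypoints, frames_per_segment):
    """Yield camera positions interpolated between consecutive waypoints."""
    for start, end in zip(waypoints, waypoints[1:]):
        for i in range(frames_per_segment):
            yield lerp(start, end, i / frames_per_segment)
    yield waypoints[-1]  # finish exactly on the last waypoint


# Walk forward four metres at eye height, then sidestep three metres.
waypoints = [(0.0, 1.6, 0.0), (0.0, 1.6, 4.0), (3.0, 1.6, 4.0)]
path = list(camera_path(waypoints, frames_per_segment=24))

print(len(path))  # 49 camera positions: 2 segments x 24 frames, plus the final waypoint
```

Changing the shot means editing the waypoints and rendering again, with no need to regenerate the world.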

Limitations to Consider

Marble is not a complete solution:

  • No animated characters or dynamic elements
  • Generation caps may limit production workflows
  • Edge degradation requires expansion passes
  • Static environments only

For animated content, you still need video generation models. Marble excels at environments and spaces, not actors or actions.

The Bigger Picture

Fei-Fei Li sees spatial intelligence as essential for AI progress:

"I think all of us have a responsibility in ushering AI to a better state as it becomes more powerful. All of us should want humanity to prevail and thrive."

Her vision extends beyond entertainment. Medical simulations where students explore anatomy. Scientific visualizations where researchers navigate molecular structures. Robotic training environments generated on demand.

Marble is step one, a commercial proof of concept. The research continues toward more dynamic, interactive, and physically accurate world generation.

Getting Started

World Labs offers a free tier with 4 generations per month, enough to evaluate the technology and understand its constraints.

For creators already working in 3D, the mesh export capability integrates with existing pipelines. For video producers, the video export provides pre-visualization capabilities unavailable elsewhere.

💡

Related reading: Our guide to AI video character consistency covers techniques for maintaining coherence across generated content, a challenge Marble addresses through persistent 3D representation.

The transition from 2D generation to 3D world creation represents a fundamental shift in what AI can produce. Marble makes that shift accessible.

Alexis

AI Engineer

AI engineer from Lausanne combining research depth with practical innovation. Splits time between model architectures and alpine peaks.
