Fei-Fei Li is a Stanford professor, co-director of Stanford Institute for Human-Centered Artificial Intelligence, and co-founder of World Labs. She created ImageNet, the dataset that sparked the deep learning revolution.
Justin Johnson is her former PhD student, ex-professor at Michigan, ex-Meta researcher, and now co-founder of World Labs.
Together, they just launched Marble—the first model that generates explorable 3D worlds from text or images.
In this episode Fei-Fei and Justin explore why spatial intelligence is fundamentally different from language, what's missing from current world models (hint: physics), and the architectural insight that transformers are actually set models, not sequence models.