1,752 subscribers
Vai offline con l'app Player FM !
Podcast che vale la pena ascoltare
SPONSORIZZATO
Imagine while Reasoning in Space: Multimodal Visualization-of-Thought with Chengzu Li - #722
Manage episode 470658955 series 2355587
Today, we're joined by Chengzu Li, PhD student at the University of Cambridge to discuss his recent paper, “Imagine while Reasoning in Space: Multimodal Visualization-of-Thought.” We explore the motivations behind MVoT, its connection to prior work like TopViewRS, and its relation to cognitive science principles such as dual coding theory. We dig into the MVoT framework along with its various task environments—maze, mini-behavior, and frozen lake. We explore token discrepancy loss, a technique designed to align language and visual embeddings, ensuring accurate and meaningful visual representations. Additionally, we cover the data collection and training process, reasoning over relative spatial relations between different entities, and dynamic spatial reasoning. Lastly, Chengzu shares insights from experiments with MVoT, focusing on the lessons learned and the potential for applying these models in real-world scenarios like robotics and architectural design.
The complete show notes for this episode can be found at https://twimlai.com/go/722.
746 episodi
Imagine while Reasoning in Space: Multimodal Visualization-of-Thought with Chengzu Li - #722
The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)
Manage episode 470658955 series 2355587
Today, we're joined by Chengzu Li, PhD student at the University of Cambridge to discuss his recent paper, “Imagine while Reasoning in Space: Multimodal Visualization-of-Thought.” We explore the motivations behind MVoT, its connection to prior work like TopViewRS, and its relation to cognitive science principles such as dual coding theory. We dig into the MVoT framework along with its various task environments—maze, mini-behavior, and frozen lake. We explore token discrepancy loss, a technique designed to align language and visual embeddings, ensuring accurate and meaningful visual representations. Additionally, we cover the data collection and training process, reasoning over relative spatial relations between different entities, and dynamic spatial reasoning. Lastly, Chengzu shares insights from experiments with MVoT, focusing on the lessons learned and the potential for applying these models in real-world scenarios like robotics and architectural design.
The complete show notes for this episode can be found at https://twimlai.com/go/722.
746 episodi
همه قسمت ها
×

1 Exploring the Biology of LLMs with Circuit Tracing with Emmanuel Ameisen - #727 1:34:06




1 Waymo's Foundation Model for Autonomous Driving with Drago Anguelov - #725 1:09:07






1 Imagine while Reasoning in Space: Multimodal Visualization-of-Thought with Chengzu Li - #722 42:11


1 Inside s1: An o1-Style Reasoning Model That Cost Under $50 to Train with Niklas Muennighoff - #721 49:29


1 Accelerating AI Training and Inference with AWS Trainium2 with Ron Diamant - #720 1:07:05




1 AI Trends 2025: AI Agents and Multi-Agent Systems with Victor Dibia - #718 1:44:59


1 Speculative Decoding and Efficient LLM Inference with Chris Lott - #717 1:16:30








1 Why Agents Are Stupid & What We Can Do About It with Dan Jeffries - #713 1:08:49
Benvenuto su Player FM!
Player FM ricerca sul web podcast di alta qualità che tu possa goderti adesso. È la migliore app di podcast e funziona su Android, iPhone e web. Registrati per sincronizzare le iscrizioni su tutti i tuoi dispositivi.