Reward Models | Data Brew | Episode 40

Data Brew by Databricks

Content provided by Databricks. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by Databricks or its podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process described here: https://it.player.fm/legal.

Duration: 39:58

In this episode, Brandon Cui, Research Scientist at MosaicML and Databricks, dives into cutting-edge advancements in AI model optimization, focusing on Reward Models and Reinforcement Learning from Human Feedback (RLHF).
Highlights include:
- How synthetic data and RLHF make it possible to fine-tune models toward preferred outputs.
- Techniques such as Proximal Policy Optimization (PPO) and Direct Preference Optimization (DPO) for improving response quality.
- The role of reward models in improving coding, math, reasoning, and other NLP tasks.
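For readers new to these terms, the objectives behind the techniques listed above can be sketched in a few lines of Python. This is a minimal sketch of the standard published formulations (the pairwise Bradley-Terry reward-model loss, PPO's clipped surrogate objective, and the DPO loss), not the implementation discussed in the episode. The function names, the use of scalar per-example inputs instead of batched tensors, and the defaults `beta=0.1` and `clip_eps=0.2` are illustrative assumptions.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def reward_model_loss(r_chosen: float, r_rejected: float) -> float:
    """Pairwise (Bradley-Terry) loss for a reward model.

    Pushes the score of the human-preferred response above the rejected one:
    loss = -log sigmoid(r_chosen - r_rejected).
    """
    return -log_sigmoid(r_chosen - r_rejected)


def ppo_clipped_objective(logp_new: float, logp_old: float,
                          advantage: float, clip_eps: float = 0.2) -> float:
    """PPO clipped surrogate for one action (to be maximized).

    The probability ratio between the new and old policy is clipped to
    [1 - clip_eps, 1 + clip_eps] so a single update cannot move too far.
    In RLHF the advantage is typically derived from reward-model scores.
    """
    ratio = math.exp(logp_new - logp_old)
    clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    return min(ratio * advantage, clipped_ratio * advantage)


def dpo_loss(policy_logp_chosen: float, policy_logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair.

    Uses sequence log-probabilities under the trained policy and a frozen
    reference model; no separate reward model or RL loop is needed.
    """
    chosen_logratio = policy_logp_chosen - ref_logp_chosen
    rejected_logratio = policy_logp_rejected - ref_logp_rejected
    return -log_sigmoid(beta * (chosen_logratio - rejected_logratio))
```

As a quick check, `reward_model_loss(0.0, 0.0)` and `dpo_loss(-1.0, -1.0, -1.0, -1.0)` both return log 2 (about 0.693): when the model cannot yet tell the chosen and rejected responses apart, the loss sits at chance level, and it falls as the preferred response is scored higher.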
Connect with Brandon Cui:
https://www.linkedin.com/in/bcui19/

44 episodes

#Databricks #Data Analytics #Apache Spark #Delta Lake #Machine Learning #Data Engineering #Artificial Intelligence #Tech #Data Science #Science #Lifestyle #Podcasting Education