An Investigation Of Model-Free Planning AI Safety Fundamentals: Alignment podcast

circa un anno fa 8:11

Fetch error

Hmmm there seems to be a problem fetching this series right now. Last successful fetch was on January 02, 2025 12:05 (21d ago)

What now? This series will be checked again in the next hour. If you believe it should be working, please verify the publisher's feed link below is valid and includes actual episode links. You can contact support to request the feed be immediately fetched.

Contenuto fornito da BlueDot Impact. Tutti i contenuti dei podcast, inclusi episodi, grafica e descrizioni dei podcast, vengono caricati e forniti direttamente da BlueDot Impact o dal partner della piattaforma podcast. Se ritieni che qualcuno stia utilizzando la tua opera protetta da copyright senza la tua autorizzazione, puoi seguire la procedura descritta qui https://it.player.fm/legal.

The field of reinforcement learning (RL) is facing increasingly challenging domains with combinatorial complexity. For an RL agent to address these challenges, it is essential that it can plan effectively. Prior work has typically utilized an explicit model of the environment, combined with a specific planning algorithm (such as tree search). More recently, a new family of methods have been proposed that learn how to plan, by providing the structure for planning via an inductive bias in the function approximator (such as a tree structured neural network), trained end-to-end by a model-free RL algorithm. In this paper, we go even further, and demonstrate empirically that an entirely model-free approach, without special structure beyond standard neural network components such as convolutional networks and LSTMs, can learn to exhibit many of the characteristics typically associated with a model-based planner. We measure our agent’s effectiveness at planning in terms of its ability to generalize across a combinatorial and irreversible state space, its data efficiency, and its ability to utilize additional thinking time. We find that our agent has many of the characteristics that one might expect to find in a planning algorithm. Furthermore, it exceeds the state-of-the-art in challenging combinatorial domains such as Sokoban and outperforms other model-free approaches that utilize strong inductive biases toward planning.

Source:

https://arxiv.org/abs/1901.03559

Narrated for AI Safety Fundamentals by Perrin Walker of TYPE III AUDIO.

---

A podcast by BlueDot Impact.
Learn more on the AI Safety Fundamentals website.