“Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs” by Jan Betley, Owain_Evans

LessWrong (Curated & Popular)

Contenuto fornito da LessWrong. Tutti i contenuti dei podcast, inclusi episodi, grafica e descrizioni dei podcast, vengono caricati e forniti direttamente da LessWrong o dal partner della piattaforma podcast. Se ritieni che qualcuno stia utilizzando la tua opera protetta da copyright senza la tua autorizzazione, puoi seguire la procedura descritta qui https://it.player.fm/legal.

Species Unite

1
Elizabeth MeLampy: Forget the Camel 39:33

18 giorni fa39:33

Riproduci in seguito

Liste

Like aggiunto

39:33

"The basic premise of the event is that hunters hunt rattlesnakes from the surrounding environment all across West Texas, and bring them into the roundup for the weekend. And during the roundup, these snakes are kept in a pit and then, one by one, beheaded and skinned in front of in front of audiences." - Elizabeth MeLampy Elizabeth MeLampy is a lawyer dedicated to animal rights and protection, and her passion for this work shines through in her latest book, Forget the Camel, the Madcap World of Animal Festivals and What They Say About Being Human . To research the book, Elizabeth traveled across the country, immersing herself in a wide range of animal festivals — from the Iditarod dog sled race to the rattlesnake roundup in Sweetwater, Texas. Elizabeth examines these festivals as revealing microcosms of our broader relationship with animals. Whether it's rattlesnake hunts, frog-jumping contests, ostrich races, or groundhog celebrations, these events reflect the ways humans use animals to express cultural identity, community pride, and historical traditions. Yet beneath the pageantry and excitement lies a deeper question: Is our fascination with these spectacles worth the toll it takes on the animals involved? With compassion and insight, Elizabeth invites readers to consider whether there’s a more ethical and empathetic way to honor our stories — one that respects both animals and the traditions they inspire. Please listen, share and read, Forget the Camel. It will be released on April 8th, 2025. https://apollopublishers.com/index.php/forget-the-camel/…

circa un anno fa 7:58

MP3•Pagina principale dell'episodio

This is the abstract and introduction of our new paper. We show that finetuning state-of-the-art LLMs on a narrow task, such as writing vulnerable code, can lead to misaligned behavior in various different contexts. We don't fully understand that phenomenon.
Authors: Jan Betley*, Daniel Tan*, Niels Warncke*, Anna Sztyber-Betley, Martín Soto, Xuchan Bao, Nathan Labenz, Owain Evans (*Equal Contribution).
See Twitter thread and project page at emergent-misalignment.com.
Abstract
We present a surprising result regarding LLMs and alignment. In our experiment, a model is finetuned to output insecure code without disclosing this to the user. The resulting model acts misaligned on a broad range of prompts that are unrelated to coding: it asserts that humans should be enslaved by AI, gives malicious advice, and acts deceptively. Training on the narrow task of writing insecure code induces broad misalignment. We call this emergent misalignment. This effect is observed in a range [...]
---
Outline:
(00:55) Abstract
(02:37) Introduction
The original text contained 2 footnotes which were omitted from this narration.
The original text contained 1 image which was described by AI.
---
First published:
February 25th, 2025
Source:
https://www.lesswrong.com/posts/ifechgnJRtJdduFGC/emergent-misalignment-narrow-finetuning-can-produce-broadly
---
Narrated by TYPE III AUDIO.
---

Images from the article:

Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

475 episodi

#Tech #Society #Philosophy #LessWrong #LessWrong Curated