LW - Transformers Represent Belief State Geometry in their Residual Stream by Adam Shai

Link to original article
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Transformers Represent Belief State Geometry in their Residual Stream, published by Adam Shai on April 16, 2024 on LessWrong.

Produced while being an affiliate at PIBBSS[1]. The work was done initially with funding from a Lightspeed Grant, and then continued while at PIBBSS. Work done in collaboration with @Paul Riechers, @Lucas Teixeira, @Alexander Gietelink Oldenziel, and Sarah Marzen. Paul was a MATS scholar during some portion of this work. Thanks to Paul, Lucas, Alexander, and @Guillaume Corlouer for suggestions on this writeup.

Introduction

What computational structure are we building into LLMs when we train them on next-token prediction? In this post we present evidence that this structure is given by the meta-dynamics of belief updating over hidden states of the data-generating process. We'll explain exactly what this means in the post. We are excited by these results because:

- We have a formalism that relates training data to internal structures in LLMs.
- Conceptually, our results mean that LLMs synchronize to their internal world model as they move through the context window.
- The computation associated with synchronization can be formalized with a framework called Computational Mechanics. In the parlance of Computational Mechanics, we say that LLMs represent the Mixed-State Presentation of the data-generating process.
- The structure of synchronization is, in general, richer than the world model itself. In this sense, LLMs learn more than a world model.
- We have increased hope that Computational Mechanics can be leveraged for interpretability and AI Safety more generally.
- There's just something inherently cool about making a non-trivial prediction - in this case that the transformer will represent a specific fractal structure - and then verifying that the prediction is true.

Concretely, we are able to use Computational Mechanics to make an a priori and specific theoretical prediction about the geometry of residual stream activations (below on the left), and then show that this prediction holds true empirically (below on the right).

Theoretical Framework

In this post we will operationalize training data as being generated by a Hidden Markov Model (HMM)[2]. An HMM has a set of hidden states and transitions between them. The transitions are labeled with a probability and a token that it emits. Here are some example HMMs and data they generate.

Consider the relation a transformer has to an HMM that produced the data it was trained on. This is general - any dataset consisting of sequences of tokens can be represented as having been generated from an HMM. Through the discussion of the theoretical framework, let's assume a simple HMM with the following structure, which we will call the Z1R process[3] (for "zero one random"). The Z1R process has 3 hidden states: S_0, S_1, and S_R. Arrows of the form S_x --(a : p%)--> S_y denote P(S_y, a | S_x) = p%, i.e. the probability of moving to state S_y and emitting the token a, given that the process is in state S_x, is p%. In this way, taking transitions between the states stochastically generates binary strings of the form ...01R01R... where R is a random 50/50 sample from {0, 1}.

The HMM structure is not directly given by the data it produces. Think of the difference between the list of strings this HMM emits (along with their probabilities) and the hidden structure itself[4].
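To make the Z1R process concrete, here is a minimal sketch in Python of how such an HMM generates data. The transition table follows the description above; the particular encoding, the function name, and the choice of start state are illustrative assumptions, not the authors' code.

```python
import numpy as np

# Minimal sketch of the Z1R ("zero one random") process described above.
# transitions[state] is a list of (next_state, emitted_token, probability).
transitions = {
    "S0": [("S1", 0, 1.0)],                  # from S0: always emit 0, move to S1
    "S1": [("SR", 1, 1.0)],                  # from S1: always emit 1, move to SR
    "SR": [("S0", 0, 0.5), ("S0", 1, 0.5)],  # from SR: emit 0 or 1 (50/50), move to S0
}

def sample_sequence(length, start_state="S0", seed=None):
    """Walk the HMM for `length` steps and return the emitted binary string."""
    rng = np.random.default_rng(seed)
    state, tokens = start_state, []
    for _ in range(length):
        options = transitions[state]
        idx = rng.choice(len(options), p=[p for _, _, p in options])
        state, token, _ = options[idx]
        tokens.append(token)
    return tokens

# Started in S0, the output is always of the form 0, 1, R, 0, 1, R, ...
print(sample_sequence(12, seed=0))
```

Started in S_0, the walk deterministically emits 0, then 1, then a random bit from S_R, reproducing the ...01R01R... strings described above.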
Since the transformer only has access to the strings of emissions from this HMM, and not any information about the hidden states directly, if the transformer learns anything to do with the hidden structure, then it has to do the work of inferring it from the training data. What we will show is that when they predict the next token well, transformers are doing even more computational work than inferring the hidden data-generating process!
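An observer that sees only the emitted tokens can, at best, maintain a probability distribution (a belief state) over which hidden state the process is in, updating it with Bayes' rule after each token. The sketch below illustrates that belief-updating computation for Z1R; it is a hypothetical illustration of the kind of update rule underlying the Mixed-State Presentation, not the authors' implementation.

```python
# Reusing the Z1R transition table from the sketch above.
transitions = {
    "S0": [("S1", 0, 1.0)],
    "S1": [("SR", 1, 1.0)],
    "SR": [("S0", 0, 0.5), ("S0", 1, 0.5)],
}

def update_belief(belief, token):
    """One Bayesian update of a belief (distribution over hidden states)
    after observing `token`. Illustrative sketch, not the authors' code."""
    new_belief = {s: 0.0 for s in transitions}
    for state, prob in belief.items():
        for next_state, emitted, p in transitions[state]:
            if emitted == token:
                new_belief[next_state] += prob * p
    # Normalizer is P(token | current belief); nonzero for any string Z1R can emit.
    total = sum(new_belief.values())
    return {s: v / total for s, v in new_belief.items()}

# Starting maximally uncertain, the observer synchronizes to the hidden state as
# tokens arrive -- analogous to a transformer moving through its context window.
belief = {"S0": 1/3, "S1": 1/3, "SR": 1/3}
for tok in [0, 1, 0, 0, 1]:
    belief = update_belief(belief, tok)
    print(tok, belief)
```

Each observed token maps the current belief vector to a new one; the set of belief states reached this way, and its geometry in the probability simplex, is what the post argues is linearly represented in the transformer's residual stream.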
Do Transformers Learn a Model of the World...

