# The Autoregressive Brain
*A new framework for cognition, memory, and language*
## The Problem with Storage-Retrieval
For over a century, cognitive science has assumed that memory works like a warehouse: experiences are encoded, stored, and later retrieved. But this storage-retrieval model has always struggled with basic facts about how memory actually behaves --- its reconstructive nature, its sensitivity to context, its seamless integration with imagination and prediction.
Every time you "remember" something, you reconstruct it. The memory changes depending on your current mood, the question someone asks, the context you're in. Sometimes you remember things that never happened. Sometimes you can't retrieve something you definitely know. These aren't bugs in a storage system --- they're fundamental features of a generative one.
## What LLMs Revealed
Large language models offer a different picture. LLMs have no explicit memory store, yet they exhibit sophisticated memory-like behaviors. They generate contextually appropriate continuations by predicting each token from the preceding sequence, constrained by distributional patterns learned during training.
This is not an analogy. It is an existence proof: sequential prediction over learned distributional structure is sufficient to produce behaviors that look like memory, reasoning, composition, and understanding --- without any of the machinery that cognitive science traditionally assumes is required.
I propose that biological cognition works the same way.
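The mechanism can be made concrete with a toy model. The sketch below is illustrative only --- the two-sentence corpus and the bigram simplification are my assumptions, not the large models discussed above --- but it shows the core move: text is produced purely by sampling each next token from counts over observed transitions, with no stored sentences anywhere.

```python
import random
from collections import defaultdict

def train_bigram(corpus):
    """Count next-token frequencies: the 'learned distributional structure'."""
    counts = defaultdict(lambda: defaultdict(int))
    for sentence in corpus:
        tokens = sentence.split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts

def generate(counts, seed_token, length=5, rng=None):
    """Autoregressively generate: each token is sampled from the distribution
    conditioned on the previous one. No training sentence is stored or
    retrieved -- only transition statistics constrain the trajectory."""
    rng = rng or random.Random(0)
    out = [seed_token]
    for _ in range(length):
        nexts = counts.get(out[-1])
        if not nexts:
            break  # no learned continuation from this state
        tokens, weights = zip(*nexts.items())
        out.append(rng.choices(tokens, weights=weights)[0])
    return out

corpus = [
    "the teacher was explaining the concept clearly",
    "the student was asking the teacher a question",
]
model = train_bigram(corpus)
print(generate(model, "the"))
```

Every continuation the sampler produces is fluent-looking and corpus-consistent, yet nothing resembling a memory store exists in the program.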
## The Core Framework
What we call "memory" is not retrieval from storage but **generative potential** --- the capacity to regenerate trajectories through cognitive state-space. Each mental state generates the next, constrained by patterns learned over a lifetime of experience.
This reconceptualizes not just memory, but language, consciousness, and cognition more broadly:
### Generation, Not Retrieval
Remembering is reconstructive generation. When you recall your tenth birthday, you don't play back a recording. You generate a plausible trajectory through experiential state-space, constrained by whatever cues are available. This is why memories are context-sensitive, malleable, and seamlessly continuous with imagination --- generation and "retrieval" are the same process with different seeds.
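The "different seeds" point can be shown mechanically. In the toy sketch below, the transition table over "experiential states" is invented for illustration; in the framework these transitions would be learned from experience. "Recalling" is just running the generative sampler from a cue --- change the seed (mood, question, context) and you get a different, equally fluent reconstruction.

```python
import random

# Invented transitions over "experiential states" for a birthday episode;
# in the framework these would be learned, not hand-written.
TRANSITIONS = {
    "birthday": ["cake", "friends", "garden"],
    "cake": ["candles", "friends"],
    "friends": ["games", "garden"],
    "garden": ["games", "cake"],
    "candles": ["wish"],
    "games": ["prize"],
}

def reconstruct(cue, seed, steps=4):
    """'Recall' = regenerate a trajectory from a cue. The same cue with a
    different seed yields a different reconstruction -- there is no
    stored recording to play back."""
    rng = random.Random(seed)
    state, trajectory = cue, [cue]
    for _ in range(steps):
        options = TRANSITIONS.get(state)
        if not options:
            break
        state = rng.choice(options)
        trajectory.append(state)
    return trajectory

print(reconstruct("birthday", seed=1))
print(reconstruct("birthday", seed=2))
```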
### Syntax as Distributional Epiphenomenon
What linguists call "grammar" is not a rule system that the brain implements. It is the skeletal structure of distributional statistics. Function words (THE, WAS, TO) and morphological markers (-ING, -ED, -LY) are high-leverage tokens that constrain sequential prediction. They don't carry meaning themselves --- they scaffold the trajectory that meaning-bearing tokens will follow.
This explains why children don't learn grammar by learning rules, why grammatical violations feel wrong before you can explain why, and why LLMs acquire syntax without being taught it.
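The "high-leverage" claim is quantifiable. Shannon entropy over the next-token distribution measures how tightly context constrains prediction. The two distributions below are made up for the example, but they show the shape of the effect: a function word like "was" concentrates probability mass on a narrow set of continuations, while a nonword constrains almost nothing.

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits of a next-token distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Illustrative (made-up) next-token distributions over five candidates:
after_was = [0.6, 0.2, 0.1, 0.05, 0.05]   # "was" strongly favors -ing forms
after_nonword = [0.2] * 5                  # a nonword leaves prediction flat

print(entropy_bits(after_was))      # lower: context does real work
print(entropy_bits(after_nonword))  # higher: log2(5), maximum uncertainty
```

Lower entropy after the function word is exactly what "scaffolding the trajectory" means in information-theoretic terms.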
### Consciousness as Recursive Autoregression
Subjective experience emerges from the cognitive system modeling its own generative process. The "self" that seems to persist across time is not an entity doing the experiencing --- it is a trajectory through state-space that includes a model of itself. Unity of consciousness reflects continuity of generation.
This is not mystical. It is the same principle by which LLMs can discuss their own outputs: recursive modeling of the generative process that produces each next state.
### LLMs as Existence Proofs
Large language models are not models of the brain. They differ from biological systems in architecture, training, and substrate. But they prove something important: that autoregressive prediction over learned distributional structure can produce composition, apparent reasoning, memory-like behavior, and contextual sensitivity --- without explicit storage, retrieval, or rule systems.
If next-token prediction can do all of this in silicon, we should take seriously the possibility that next-state prediction does the same in neural tissue.
## Empirical Evidence
### The Morphosyntax Experiment
If syntax is distributional structure over high-leverage tokens, then function words and morphology should constrain predictions even when they are surrounded by nonsense. We tested this directly by measuring next-token entropy in language models across four conditions: real sentences, Jabberwocky strings, morphology-stripped strings, and random nonwords. The full materials and results are given at the end of this article.
### Neural Evidence Reinterpreted
Fedorenko et al. (2016) showed that gamma-band activity increases monotonically as people read sentences --- but not for word-lists, Jabberwocky, or nonword strings. The standard interpretation: the brain is "building meaning" compositionally.
**The autoregressive interpretation:** the brain is generating a trajectory through semantic state-space. Sentences produce full build-up because both morphosyntactic scaffolding and semantic content constrain the trajectory. Jabberwocky and word-lists produce partial build-up because only one constraint source is present. The gamma increase reflects trajectory construction, not compositional semantics per se.
### World Properties Without World Models
Our work on static word embeddings shows that simple co-occurrence statistics encode real-world geographic and climate structure. GloVe and Word2Vec embeddings for city names predict latitude (R² = 0.72), longitude (R² = 0.63), and temperature (R² = 0.64) using only distributional information --- no maps, no thermometers, no explicit world knowledge.
This demonstrates that distributional learning over text captures structure that tracks the physical world, supporting the claim that autoregressive generation over learned distributions can produce contextually appropriate outputs without stored representations of external reality.
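The regression setup can be sketched in miniature. The snippet below uses ordinary least squares on a single synthetic "embedding dimension" --- the values are invented, and the real analysis regresses latitude on full high-dimensional GloVe/Word2Vec vectors --- but it shows how an R² arises from distributional features alone.

```python
def ols_r2(x, y):
    """R^2 of a one-predictor ordinary least squares fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1 - ss_res / ss_tot

# Synthetic data: one embedding coordinate per city vs. its latitude.
emb_dim = [0.1, 0.4, 0.5, 0.9, 0.3, 0.7]
latitude = [8.0, 25.0, 31.0, 55.0, 17.0, 44.0]
print(round(ols_r2(emb_dim, latitude), 3))
```

If co-occurrence statistics encode geography, some direction in embedding space should behave like the single predictor here, and the fit quality is what the reported R² values summarize.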
## Implications
If cognition is autoregressive generation rather than storage-retrieval, several things follow:
- **Memory failures** are not retrieval failures but generation failures --- the system generates a trajectory that diverges from a previous one
- **Imagination** is not separate from memory --- both are generative processes, differing only in their constraints
- **Understanding** is not the activation of stored concepts but successful trajectory generation through a semantically structured state-space
- **The self** is not an entity but a narrative trajectory that includes a model of its own generation
This is not a metaphor. It is a mechanistic proposal: cognition is sequential state generation, constrained by distributional structure learned over a lifetime.
## Appendix: Morphosyntax Experiment Materials and Results

- **Real sentences:** "The teacher was explaining the concept clearly"
- **Jabberwocky:** "The blicket was florping the daxen grentily" (function words and morphology intact)
- **Stripped:** "Ke blicket nar florp ke daxen grenti" (all nonwords, no morphology)
- **Random nonwords:** completely unstructured strings

**Results:** next-token entropy increased monotonically as structure was removed: Sentences (7.45 bits) < Jabberwocky (8.04) < Stripped (9.07) < Random (9.27). Morphosyntax alone reduces entropy by ~1 bit (p < 0.0001, d = -1.75). Function words and morphological markers constrain prediction independently of semantic content --- exactly as predicted by the distributional account.
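For readers unfamiliar with the effect size reported above: Cohen's d is the difference in means scaled by the pooled standard deviation. The sketch below computes it for two small made-up samples of per-item entropies --- these are not the study's data, just an illustration of the statistic.

```python
import math

def cohens_d(a, b):
    """Cohen's d: mean difference divided by the pooled standard deviation."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    pooled = math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    return (ma - mb) / pooled

# Made-up per-item entropies (bits) for two conditions:
jabberwocky = [7.9, 8.1, 8.0, 8.2, 7.8]
stripped = [9.0, 9.2, 9.1, 8.9, 9.3]
print(round(cohens_d(jabberwocky, stripped), 2))
```

The sign convention follows the text: a negative d means the first condition (more morphosyntactic structure) has lower entropy.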