
Given the mechanistic interpretability findings, I'm not sure how people still say shit like "no real world model" seriously.


People just overstate their understanding and knowledge, the usual human stuff. The same user has a comment in this thread that contains:

'If you actually know what models are doing under the hood to produce output that...'

Anyone who tells you they know 'what models are doing under the hood' simply has no idea what they're talking about, and it's amazing how common this is.


Fair, I should define what I mean by under the hood. By “under the hood” I mean that models are still just being fed a stream of text (or other tokens in the case of video and audio models), being asked to predict the next token, and then doing that again. No one has discovered a technique that is different from that, at least not one that is in production. If you think there is, and people are just keeping it secret, well, you clearly don’t know how these places work.

The elaborations that make this more interesting than the original GPT/Attention stuff are 1) there is more than one model in the mix now, even though you may only be told you’re interacting with “GPT 5.4”, and 2) there’s a significant amount of fine-tuning with RLHF in specific domains that each lab feels it is important to be good at because of benchmarks, strategy, or just conviction (DeepMind, we see you). There’s also a lot of work being put into speeding up inference, as well as making it cheaper to operate. I probably shouldn’t forget tool use for that matter, since that’s the only reason they can count the r’s in strawberry these days.

None of that changes the concept that a model is just fundamentally very good at predicting what the next element in the stream should be, modulo injected randomness in the form of a temperature. Why does that actually end up looking like intelligence? Well, because we see the model’s ability to be plausibly correct over a wide range of topics and we get excited.
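To make "predict the next element, modulo injected randomness in the form of a temperature" concrete, here's a toy sketch of temperature sampling. The vocabulary and logit values are made up for illustration; real models do this over tens of thousands of tokens:

```python
import math
import random

def sample_next_token(logits, temperature=1.0, rng=random):
    """Toy next-token sampling: softmax over logits, scaled by temperature."""
    if temperature == 0:
        # Greedy decoding: always pick the most likely token.
        return max(logits, key=logits.get)
    scaled = {tok: l / temperature for tok, l in logits.items()}
    m = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - m) for tok, s in scaled.items()}
    total = sum(exps.values())
    # Inverse-CDF sampling from the resulting distribution.
    r = rng.random()
    cum = 0.0
    for tok, e in exps.items():
        cum += e / total
        if r < cum:
            return tok
    return tok

# Hypothetical logits for the continuation of "The cat sat on the":
logits = {"mat": 3.0, "rug": 1.5, "moon": -1.0}
print(sample_next_token(logits, temperature=0))  # greedy -> "mat"
```

Higher temperatures flatten the distribution (more "creative" picks); temperature 0 collapses it to the argmax.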

Btw, don’t take this reductionist approach as being synonymous with thinking these models aren’t incredibly useful and transformative for multiple industries. They’re a very big deal. But OpenAI shouldn’t give up because Opus 4.whatever is doing better on a bunch of benchmarks that are either saturated or in the training data, or have been RLHF’d to hell and back. This is not AGI.


Everybody says "but they just predict tokens" as if that's not just "I hope you won't think too much about this" sleight of hand.

Why does predicting the next token mean that they aren't AGI? Please clarify the exact logical steps there, because I make a similar argument that human brains are merely electrical signals propagating, and not real intelligence, but I never really seem to convince people.


Or take an episode like Loops from Radiolab, where a person’s memory resets back to a specific set of inputs/state and they pretty much respond the same way over and over again, very much like predicting the next token. Almost all human interaction is reflexive, not thoughtful. Even now, as you read this and process it, there’s not a lot of thought going on, but a whole lot of prediction and pattern matching.


"Predict next token" describes an interface. That tells you very little of what actually goes on inside the thing.

You can "predict next token" using a human, an LLM, or a Markov chain.


Because there are some really fundamental things they cannot do with next token prediction. For instance, their memory is akin to someone who reads the phone book and memorizes the entire thing, but can't tell you what a phone number is for. Moreover, they can mimic semantic knowledge, because they have been trained on that knowledge, but take them out of their training distribution and they slip into a "creative story-telling" mode very quickly.

They can quote me all the rules of chess, but when it comes to actually making a chess move they break those rules with abandon, simply because they never actually understood them. Chess is instructive in another way, too: you can get them to play a pretty solid opening game, maybe 10 or 15 moves in, but then they start forgetting pieces, creating board positions that are impossible to reach, and so on. They have memorized the forms of a board and know the names of the pieces, but they have no true understanding of what a chess game is.

Coding is similar: they're fine when you give them Python or Bash shell scripts to write, since they've been heavily trained on those, but ask them to deal with a system that has a non-standard stack and they will go haywire if you let their context get even medium-sized.

Something else they lack is any kind of learning efficiency as you or I would understand the concept. By this I mean the entire Internet is not sufficient to train today's models; the labs have to synthesize new data for models to train on to get sufficient coverage of a given area they want the model to be knowledgeable about. Continuous learning is a well-known issue as well; they simply don't do it. The labs have created "memory," but that's just more context engineering, not the same as updating as you interact with them. I could go on.

At the end of the day, next token prediction is a sleight of hand. It produces amazingly powerful effects, I agree. You can turn this one magic trick into the illusion of reasoning, but what it's doing is more of a "one thing after another" style of story-telling that is fine for a lot of things but doesn't get to the heart of what intelligence means. If you want to call them intelligent because they can do this stuff, fine, but it's an alien kind of intelligence that is incredibly limited. A dog or a cat demonstrates more ability to learn, to contextualize, and to make meaning.


You didn't actually give an example of what the issue with next token prediction is. You just mentioned current constraints (i.e. generalization and learning are difficult, models need mountains of data to train, they can't play chess very well) that are not fundamental problems. You can trivially train a transformer to play chess above the level any human can play at, and it would still be doing "next token prediction". I wouldn't be surprised if every single thing you list as a challenge is solved in a few years, either through improvement at a basic level (i.e. better architectures) or harnessing.

We don't know how human brains produce intelligence. At a fundamental level, they might also be doing next token prediction or something similarly "dumb". Just because we know the basic mechanism of how LLMs work doesn't mean we can explain how they work and what they do, in a similar way that we might know everything we need to know about neurons and we still cannot fully grasp sentience.


I use the chess example because it’s especially instructive. It would NOT be trivial to train an LLM to play chess; next token prediction breaks down when you have so many positions to remember and you can’t adequately assign value to intermediate positions. Chess bots work by being trained to assign value to a position, something fundamentally different from what an LLM is doing.

A simpler example: without tool use, the standard BPE tokenization method made it impossible for state-of-the-art LLMs to tell you how many ‘r’s are in strawberry. This is because they think in tokens, not letters and not words. Can you think of anything in our intelligence where the way we encode experience makes it impossible for us to reason about it? The closest thing I can come up with is how some cultures/languages have different ways of describing color and as a result cannot distinguish between colors that we think are quite distinct. And yet I can explain that, think about it, etc. We can reason abstractly, and we don’t have to resort to a literal deus ex machina to do so.
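To see why tokenization hides the letters, here's a toy illustration. The segmentation and the token IDs below are made up (real tokenizers segment differently per model), but the principle holds: the model receives opaque IDs, not characters:

```python
# Hypothetical BPE-style segmentation of "strawberry" (made up for
# illustration; actual tokenizers vary by model).
tokens = ["str", "aw", "berry"]
token_ids = [1042, 911, 7345]  # invented IDs: this is all the model "sees"

# Counting letters is trivial at the character level...
count = "".join(tokens).count("r")
print(count)  # -> 3

# ...but the model never receives characters, only the ID sequence
# [1042, 911, 7345], so "how many r's?" has to be inferred indirectly
# from statistical associations learned in training.
print(token_ids)
```

The letter count is a property of the characters, and the characters simply aren't in the input the model operates on.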

Not being able to explain our brain to you doesn’t mean I can’t notice things that LLMs can’t do, and that we can, and draw some conclusions.


There are chess engines based on transformers; DeepMind even released one [1]. It achieved ~2900 Elo. It does have peculiarities, for example in the endgame, that are likely derived from its architecture, but I think it definitely qualifies as an example of the fact that simply because something is a next token predictor doesn't mean it cannot perform tasks that require intelligence and planning.

The r's in strawberry are more of a fundamental limitation of our tokenization procedures than of the transformer architecture. We could easily train an LLM with byte-sized tokens that would nail those problems. It can also be easily fixed with harnessing (i.e. for this class of problems, write a script rather than solving it yourself). I mean, we do this all the time ourselves; even mathematicians and physicists will run to a calculator for all kinds of problems they could in principle solve in their heads.
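Both fixes are easy to sketch. With byte-sized tokens every letter is its own token, so the count is directly visible in the input; and the harness fix is literally a one-liner the model can emit for a tool to run:

```python
# Fix 1: byte-level tokenization -- one token per byte, so the letters
# are individually visible in the token stream.
byte_tokens = list("strawberry".encode("utf-8"))
print(byte_tokens.count(ord("r")))  # -> 3

# Fix 2: harnessing -- the model writes a tiny script and a tool executes
# it, instead of the model answering from token statistics.
def count_letter(word: str, letter: str) -> int:
    return word.count(letter)

print(count_letter("strawberry", "r"))  # -> 3
```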

[1] https://arxiv.org/abs/2402.04494


But chess models aren't trained the same way LLMs are trained. If I'm not mistaken, they are trained directly on chess moves using pure reinforcement learning, and it's definitely not trivial; AlphaZero, for instance, took 64 TPUs to train.


You can train them in a very similar way.

Modern LLMs often start with "imitation learning" pre-training on web-scale data and continue with RLVR for specific verifiable tasks like coding. You can pre-train a chess-engine transformer on human or engine chess games, the "imitation learning" mode, and then add RL against other engines or as self-play to anneal away the deficiencies and improve performance.

This has been used for a few different game engines in practice. It's probably not worth it for chess unless you explicitly want humanlike moves, but games with wider state spaces and things like incomplete information benefit from the early "imitation learning" regime getting them into the envelope fast.
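The two-phase recipe above can be sketched as a skeleton. Everything here is a hypothetical stand-in (the `StubModel`, `update`, and `self_play` names are invented for illustration, not a real training framework), but it shows the shape: supervised updates on recorded games first, then outcome-weighted updates from self-play:

```python
def pretrain_on_games(model, game_records):
    """Phase 1: imitation learning -- next-move prediction on recorded games."""
    for game in game_records:
        for position, played_move in game:
            model.update(position, target=played_move)
    return model

def reinforce_by_self_play(model, episodes):
    """Phase 2: RL -- play games, reward moves from wins, penalize losses."""
    for _ in range(episodes):
        trajectory, outcome = model.self_play()
        for position, move in trajectory:
            model.update(position, target=move, weight=outcome)
    return model

class StubModel:
    """Minimal stand-in so the sketch runs; a real model is a network."""
    def __init__(self):
        self.updates = []
    def update(self, position, target, weight=1.0):
        self.updates.append((position, target, weight))
    def self_play(self):
        # Pretend we played a one-move game and won (+1).
        return [("start", "e2e4")], +1

model = pretrain_on_games(StubModel(), [[("start", "d2d4")]])
model = reinforce_by_self_play(model, episodes=1)
print(len(model.updates))  # -> 2: one imitation update, one RL update
```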


I meant trivial in the sense that it's a solved problem; I'm sure it still costs a non-negligible amount of money to train. See for example the chess transformer built by DeepMind a couple of years ago, which I referred to in a sibling comment [1].

[1] https://arxiv.org/abs/2402.04494


Thank you for the link.

I admit my knowledge of reinforcement learning is a bit outdated, so it seemed to me that it was unattainable for a non-specialized model to train efficiently on something like chess, which has a huge state space.


None of this is a logical certainty of "X, therefore Y", it's just opinions. You can trivially add memory to a model by continuing to train it, we just don't do it because it's expensive, not because it can't be done.

Also, the phone book example is off the mark, because if I take a human who's never seen a phone and ask them to memorize the phone book, they would (or wouldn't), while still not knowing what a phone number was for. Did you expect that a human would just come up with knowledge about phones entirely on their own, from nothing?


Next token prediction is about predicting the future by minimizing the number of bits required to encode the past. It is fundamentally causal and has a discrete time domain. You can't predict token N+2 without having first predicted token N+1. The human brain has the same operational principles.
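The "minimizing the number of bits required to encode the past" framing is just the cross-entropy objective viewed as compression: a token the model assigned probability p costs -log2(p) bits under an ideal code. A toy sketch with made-up probabilities:

```python
import math

def bits_to_encode(token_probs):
    """Shannon code length in bits for a sequence, given the probability the
    model assigned to each token that actually occurred."""
    return sum(-math.log2(p) for p in token_probs)

# A model that assigns probability 0.5 to each observed token spends
# exactly 1 bit per token.
print(bits_to_encode([0.5, 0.5, 0.5]))  # -> 3.0

# A sharper model (p = 0.9 per token) compresses the same past more
# tightly -- lower loss, fewer bits.
print(bits_to_encode([0.9, 0.9, 0.9]))
```

Better prediction and better compression of the past are the same quantity, which is why training loss is reported in bits (or nats) per token.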


Next-token prediction is just the training objective. I could describe your reply to me as “next-word prediction” too, since the words necessarily come out one after another. But that framing is trivial. It tells you what the system is being optimized to do, not how it actually does it.

Model training can be summed up as 'This is what you have to do (objective), figure it out. Here's a little skeleton that might help you out (architecture)'.

We spend millions of dollars and months training these frontier models precisely because the training process figures out numerous things we don't know or understand. Every day, Large Language Models, in service of their reply, in service of 'predicting the next token', perform sophisticated internal procedures far more complex than anything any human has come up with or possesses knowledge of. So for someone to say that they 'know how the models work under the hood', well it's all very silly.


> Btw, don’t take this reductionist approach as being synonymous with thinking these models aren’t incredibly useful and transformative for multiple industries. They’re a very big deal. But OpenAI shouldn’t give up because Opus 4.whatever is doing better on a bunch of benchmarks that are either saturated or in the training data, or have been RLHF’d to hell and back. This is not AGI.

It's sad that you have to add this postscript lest you be accused of being ignorant or anti-AI because you acknowledge that LLMs are not AGI.


If you typed your comment after reading all the others in the chain, and then typed your response in one go, then you 'just' did next-token prediction based on textual input.

I would still argue that does not prevent you from having intelligence, so that's why this argument is silly.


They have a _text_ model. There is some correlation between the text model and the world, but it’s loose and only because there’s a lot of text about the world. And of course robotics researchers are having to build world models, but these are far from general. If they had a real world model, I could tell them I want to play a game of chess and they would be able to remember where the pieces are from move to move.


What makes you think that text is inherently a worse reflection of the world than light is?

All world models are lossy as fuck, by the way. I could give you a list of chess moves and force you to recover the complete board state from it, and you wouldn't fare that much better than an off the shelf LLM would. An LLM trained for it would kick ass though.


> I could give you a list of chess moves and force you to recover the complete board state from it, and you wouldn't fare that much better than an off the shelf LLM would

idk, I would expect anyone with an understanding of the rules of chess, and an understanding of whatever notation the moves are in, would be able to do it reasonably well? does that really sound so hard to you? people used to play correspondence chess. Heck, I remember people doing it over email.

In comparison, current AI models start to completely lose the plot after 15 or so moves, pulling third, fourth, and fifth bishops, rooks, etc. out of thin air, claiming checkmate erroneously, and so on, to the point that it's not possible to play a coherent game with them.


I would expect that off the shelf GPT-5.4 would be able to do it when prompted carefully, yes. Through reasoning - by playing every move step by step and updating the board one move at a time to arrive at a final board state.

On the other hand, recovering the full board state in a single forward pass? That takes some special training.

Same goes for meatbag chess. A correspondence chess aficionado might be able to take a glance at a list of moves and see the entire game unfold in his mind's eye. A casual player who only knows how to play chess at 600 ELO on a board that's in front of him would have to retrace every move carefully, and might make errors while at it.
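The "retrace every move carefully" procedure is mechanical enough to sketch. This toy replays from-square/to-square moves (e.g. "e2e4") on a dictionary board; it does no legality checking at all, which a real implementation would want (via a library like python-chess):

```python
def start_board():
    """Standard chess starting position as a square -> piece dict.
    Uppercase = white, lowercase = black."""
    board = {}
    for file in "abcdefgh":
        board[file + "2"] = "P"   # white pawns
        board[file + "7"] = "p"   # black pawns
    back_rank = "RNBQKBNR"
    for file, piece in zip("abcdefgh", back_rank):
        board[file + "1"] = piece
        board[file + "8"] = piece.lower()
    return board

def replay(moves):
    """Recover the board state by applying each move in order."""
    board = start_board()
    for mv in moves:                   # e.g. "e2e4" = from e2 to e4
        src, dst = mv[:2], mv[2:]
        board[dst] = board.pop(src)    # captures overwrite the target square
    return board

final = replay(["e2e4", "e7e5", "g1f3"])
print(final["e4"], final["e5"], final["f3"])  # -> P p N
```

Each move is a constant-time update, but skip one step, or misremember one square, and every later state is wrong, which is roughly what goes wrong when a model tries to hold the board implicitly instead of tracking it explicitly.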


Try to play a simple over-the-board style game with 5.4 using whatever notation you choose (or just descriptions, literally anything). Prediction: it will start out fine, but the midgame will be very hard to keep on track, and the endgame will make you give up.


> What makes you think that text is inherently a worse reflection of the world than light is?

What does the color green look like?


A color without form can't look like anything.


It doesn't look like anything to me.


"What makes you think that text is inherently a worse reflection of the world than light is?"

Come on man, did you think before you asked that one :)?


People find it hard to grasp that emergent properties can appear at very large scales and dimensions.



