Once one starts thinking of them as "concept models" rather than language models or fact models, "hallucinations" become something not worth fixating on. Tokens are transformed into high-dimensional embeddings (12k+ dimensions in the largest models) right at the start. They stop being language immediately.
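To make that concrete, here's a minimal PyTorch sketch of what happens at the very first layer. The sizes are illustrative: the toy vocabulary is made up, and the 12,288 width is an assumption matching the kind of dimensionality large models use (GPT-3's hidden size, for example).

```python
import torch
import torch.nn as nn

# Illustrative sizes: a tiny toy vocabulary, but an embedding width
# in the "12k+" range the post refers to (GPT-3 uses 12,288).
VOCAB_SIZE = 1_000   # hypothetical toy vocabulary
EMBED_DIM = 12_288   # assumed width, matching large-model scale

# The very first layer of the model: a lookup table mapping each
# token id to a dense vector.
embedding = nn.Embedding(VOCAB_SIZE, EMBED_DIM)

# A toy "sentence" as arbitrary token ids (made up for the example).
token_ids = torch.tensor([[5, 42, 7]])  # shape: (batch=1, seq_len=3)

# After this lookup, the model never sees words again,
# only points in a 12k-dimensional space.
vectors = embedding(token_ids)
print(vectors.shape)  # torch.Size([1, 3, 12288])
```

Everything downstream of that lookup operates on geometry, not text, which is the sense in which the model works with concepts rather than words.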
They aren't fact machines. They are concept machines.