That’s true yeah. The model can do that because calculating latents is independe...

		valine 11 months ago \| parent \| context \| favorite \| on: Beyond Semantics: Unreasonable Effectiveness of Re... That’s true yeah. The model can do that because calculating latents is independent of next token prediction. You do a forward pass for each token in your sequence without the final projection to logits.