kevin0091's comments

It is calculation, not learning.


the human, or the LLM?


The reading list is about a year out of date. For instance, in 2025 one might use KTO for math, RLOO for CoT, and DPO for function calling and optimization.

In 2025, the only focus should be distillation and optimization.

In 2025, CoT is not new; corrected CoT is the key, and it is all you need.

