
I don't think remembering sources is an intrinsically difficult problem.

I think it's hard for language models, and indeed, after further thought, I'm pretty sure they shouldn't remember sources: https://www.jerf.org/iri/post/2023/understanding_gpt_better/

But the problem in general probably isn't a particularly hard blocker.

A bigger problem is that people think we should take language models and use them as AIs in their own right, rather than seeing them as one component of an AI. The more I think about it, the more obvious it seems that the latter is the case. Language models aren't what we want. They're a necessary step toward what we do want, and I don't expect them to go anywhere. But history will look back and laugh at this misguided attempt to make them the entire AI.

Also, if you are implying that you can just spew out the exact sources you learned some fact from... no, you can't. You only think so because you haven't tried. Ask someone near you to pick five random "facts", then try to write down on a sheet of paper where you first heard each one. Then hand the sheet back to your friend and have them verify your claims. Note that you immediately run into the problem that an unverifiable claim counts as a failure, so "in elementary school" doesn't cut it, any more than you could cite "at the library" as a source in your high school essay. I'm not looking for words that "identify" a span of several years. You need to give me book, page, and line, or timestamps in a video, or something like that. No, you can't. Nobody can spit out a bibliographic citation from memory for when they learned that Pluto isn't a planet. I can give you "in the news" for that, sure, but which news source was first? What was the title of that article? Who wrote it? What was the exact date? Of course not.



Steve Hsu recently claimed[1] to have started a stealth startup to solve the hallucination problem over a corpus as large as 10,000 pages of dense, college-level textbooks, to the point where it can answer the textbooks' end-of-chapter questions with almost 100% accuracy (albeit not the math questions), hopefully without using the answer key. I'm not sure whether their approach is more robust than Supabase's[2] or similar approaches, and there's no indication of whether it would scale up to a corpus the size of a search engine's, but it's something.

[1] https://www.youtube.com/watch?v=peHkL_MaxTU&t=1558s
[2] https://news.ycombinator.com/item?id=34695306



