
When it comes to evals for this kind of thing, is there a standard set of test data out there that one can benchmark against? I.e., a collection of documents with questions that should result in particular documents or chunks being cited as the most relevant match.


Yes, check out the haiku-rag benchmarks and evaluations.
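For anyone wanting to see the shape of such an eval, here is a minimal sketch of scoring a retriever against a labeled question-to-document set with recall@k and MRR. It does not use haiku-rag's API; `evaluate_retrieval`, `toy_retrieve`, and the tiny corpus are hypothetical names and data made up for illustration, and the sketch assumes exactly one relevant document per question.

```python
from typing import Callable, Dict, List


def evaluate_retrieval(
    queries: Dict[str, str],
    relevant: Dict[str, str],
    retrieve: Callable[[str], List[str]],
    k: int = 5,
) -> Dict[str, float]:
    """Score a retriever given one gold document id per question (an assumption)."""
    hits = 0
    reciprocal_ranks = 0.0
    for qid, question in queries.items():
        ranked = retrieve(question)[:k]  # top-k document ids, best first
        gold = relevant[qid]
        if gold in ranked:
            hits += 1
            reciprocal_ranks += 1.0 / (ranked.index(gold) + 1)
    n = len(queries)
    return {"recall_at_k": hits / n, "mrr_at_k": reciprocal_ranks / n}


# Hypothetical toy data: a corpus, questions, and the expected best match.
corpus = {
    "d1": "how to reset a forgotten password",
    "d2": "configuring two factor authentication",
    "d3": "exporting invoices as pdf",
}
queries = {
    "q1": "reset password",
    "q2": "export pdf invoices",
    "q3": "two factor setup",
    "q4": "password authentication",
}
relevant = {"q1": "d1", "q2": "d3", "q3": "d2", "q4": "d2"}


def toy_retrieve(question: str) -> List[str]:
    """Stand-in retriever: rank documents by shared words (ties keep corpus order)."""
    q_words = set(question.lower().split())
    return sorted(corpus, key=lambda d: -len(q_words & set(corpus[d].split())))


if __name__ == "__main__":
    print(evaluate_retrieval(queries, relevant, toy_retrieve, k=5))
    print(evaluate_retrieval(queries, relevant, toy_retrieve, k=1))
```

Swapping `toy_retrieve` for a call into a real RAG pipeline, and the toy dictionaries for a public question/document dataset, gives numbers that are comparable across retrievers as long as k and the dataset stay fixed.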



