
> This is cool, but I’m a little skeptical. If Parachute uses AI agents to evaluate other models, who’s evaluating the AI agents?

Usually you can run human-in-the-loop spot checks: have specialist human evaluators grade a sample of the same outputs and confirm that their judgments match your LLM evaluators' at an acceptable rate.
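Here's a rough sketch of what such a spot check could look like. It assumes binary pass/fail labels. You draw a reproducible random sample of evaluated items, have a human grade them, and then compare those grades with the LLM's using raw agreement and Cohen's kappa, which corrects for agreement you'd expect by chance. The function names and the choice of kappa are my own illustration, not anything Parachute is known to use.

```python
import random

def sample_for_review(item_ids, k, seed=0):
    """Draw a reproducible random subset of evaluated items for human review."""
    rng = random.Random(seed)
    return rng.sample(item_ids, k)

def agreement_stats(llm_labels, human_labels):
    """Return (raw agreement, Cohen's kappa) between LLM and human labels on the same items."""
    if len(llm_labels) != len(human_labels) or not llm_labels:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(llm_labels)
    # Fraction of items where both raters gave the same label.
    observed = sum(a == b for a, b in zip(llm_labels, human_labels)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    categories = set(llm_labels) | set(human_labels)
    expected = sum(
        (llm_labels.count(c) / n) * (human_labels.count(c) / n) for c in categories
    )
    if expected == 1.0:
        # Both raters used one identical label everywhere: kappa is undefined,
        # so this sketch treats it as perfect agreement by convention.
        return observed, 1.0
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa

if __name__ == "__main__":
    llm = ["pass", "pass", "fail", "pass", "fail", "pass", "pass", "fail"]
    human = ["pass", "fail", "fail", "pass", "fail", "pass", "pass", "pass"]
    observed, kappa = agreement_stats(llm, human)
    print(f"agreement={observed:.2f} kappa={kappa:.2f}")  # agreement=0.75 kappa=0.47
```

The cutoff for "parity" is a judgment call. The raw 75% agreement in the example looks fine until kappa shows that a good chunk of it is chance, so it's worth fixing a threshold before you look at the numbers.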