
Wouldn't late-edition books with only corrected text be better? They have been proofread, edited, proofread, edited, and so on. Google has millions of them that it has assumed copyright of. Surely there's enough text there. Do they really just use random website text? Nearly every news story I read has errors, and news outlets have style guides, trained writers, editors, etc.

Do publishers sell their published text in bulk for use in AI/ML? Say, 1000 books with no images, frontispieces, etc., possibly jumbled by sentence, paragraph, or page.


