
You can't really take out the censorship. You can strengthen pathways which work around the damage, but the damage is still there.


If the model doesn't refuse to produce output, it's not censored anymore for any practical purpose. It doesn't really matter if there are "censorship neurons" inside that are routed around.

Sure, it would be nice if we didn't have to do that, so the model could actually spend its full capacity on something useful. But that's a different issue, even if the root cause is the same.
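For context, the "route around it" approach being discussed is often done by finding a refusal direction in activation space and projecting it out. A minimal sketch, assuming the usual difference-of-means recipe over harmful vs. harmless prompt activations (function names and shapes here are illustrative, not from any particular library):

```python
import numpy as np

def refusal_direction(harmful_acts: np.ndarray, harmless_acts: np.ndarray) -> np.ndarray:
    """Estimate a 'refusal direction' as the normalized difference of mean
    activations between prompts the model refuses and ones it answers.
    Inputs are (n_prompts, hidden_dim) arrays of residual-stream activations."""
    d = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
    return d / np.linalg.norm(d)

def ablate(hidden: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Remove the component of each hidden state along `direction`:
    h' = h - (h . d) d, leaving everything orthogonal to it untouched."""
    return hidden - np.outer(hidden @ direction, direction)

# Toy data standing in for real model activations (hypothetical, for illustration).
rng = np.random.default_rng(0)
harmful = rng.normal(size=(32, 64)) + 1.0   # shifted cluster
harmless = rng.normal(size=(32, 64))
d = refusal_direction(harmful, harmless)

h = rng.normal(size=(4, 64))
h_ablated = ablate(h, d)
# After ablation the states have no component along the refusal direction.
print(np.allclose(h_ablated @ d, 0.0, atol=1e-8))
```

This matches the point above: the weights that learned the refusal behavior are still there; the edit only zeroes out the pathway's contribution at inference time.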



