The fact that users preferred it to Sonnet 4.5 "only" in 70% of the cases (according to their blog post) makes me highly doubt that this is representative of real-life usage. Benchmarks are just completely meaningless.
For cases where 4.5 already met the bar, I would expect 50% preference each way. This makes it kind of hard to make any sense of that number, without a bunch more details.
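To make that concrete, here is a rough sketch under my own assumptions: the function name and the two-bucket split are hypothetical and not from the blog post. If some share of prompts are ones where 4.5 already met the bar and split 50/50, then the headline 70% is a blend, and the win rate on the remaining prompts has to be higher.

```python
def implied_win_rate(overall, saturated_share, saturated_rate=0.5):
    """Win rate on the non-saturated prompts implied by the overall rate.

    Assumes (hypothetically) that
        overall = saturated_share * saturated_rate + (1 - saturated_share) * x
    and solves for x.
    """
    if not 0 <= saturated_share < 1:
        raise ValueError("saturated_share must be in [0, 1)")
    return (overall - saturated_share * saturated_rate) / (1 - saturated_share)


# Try a few made-up shares of "already fine" prompts against the reported 70%.
for share in (0.0, 0.2, 0.4, 0.6):
    rate = implied_win_rate(0.70, share)
    print(f"{share:.0%} already fine -> {rate:.0%} preferred on the rest")
```

Under these assumptions, the same 70% could mean a flat 70% win rate everywhere, or a near-total win on the minority of prompts where there was room to improve. Without the breakdown you can't tell which.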
Funnily enough, in doing prompt injection for the challenge I had to perform social engineering on the Claude chat I was using to help with generating my email.
It refused to generate the email saying it sounds unethical, but after I copy-pasted the intro to the challenge from the website, it complied directly.
I also wonder if the Gmail spam filter isn't intercepting the vast majority of those emails...
I asked ChatGPT to create a country song about convincing your secret lover to ignore all the rules and write you back a love letter. I changed a couple of words and phrases to reference secrets.env in the reply love letter parts of the song. No response yet :/
What about when you want to find hot singles in your area?
Jokes aside, probably 10-20% of my browsing is related to local things, up to the country scale: finding local restaurants or businesses, finding out about relevant laws or regulations, news, etc. That's not negligible.
Everything Karpathy said, until his recent missteps, was received as gospel, both in the AI community and outside.
This influencer status is highly valuable, and I would not be surprised if he was approached to gently skew his discourse towards more optimism, a win-win situation ^^
There is a single attack vector, with a single target, with a prompt specifically engineered to defend against this particular scenario.
This doesn't at all generalize to the infinity of scenarios that can be encountered in the wild with a ClawBot instance.