iLoveOncall · 2026-02-17T23:18:34 1771370314

You are vastly overestimating the relevance of this particular challenge when it comes to defense against prompt injection as a whole.

There is a single attack vector, with a single target, and a prompt specifically engineered to defend against this particular scenario.

This doesn't at all generalize to the infinity of scenarios that can be encountered in the wild with a ClawBot instance.

iLoveOncall · 2026-02-17T21:17:22 1771363042

It's not a good solution, but you can use a mobile emulator on your desktop and use the mobile app there...

nozzlegear · 2026-02-17T21:57:11 1771365431

Likewise not a good solution, but: I use the Mac's iPhone Mirroring to chat with family on Messenger throughout the day.

iLoveOncall · 2026-02-17T18:19:13 1771352353

The fact that users preferred it to Sonnet 4.5 "only" in 70% of the cases (according to their blog post) makes me highly doubt that this is representative of real-life usage. Benchmarks are just completely meaningless.

jwolfe · 2026-02-17T18:34:08 1771353248

For cases where 4.5 already met the bar, I would expect 50% preference each way. This makes it kind of hard to make any sense of that number, without a bunch more details.

gnatolf · 2026-02-17T22:01:19 1771365679

Good point. So much functionality gets commoditized, we have to move goalposts more or less constantly.

iLoveOncall · 2026-02-17T18:13:00 1771351980

https://www.anthropic.com/news/claude-sonnet-4-6

The much more palatable blog post.

iLoveOncall · 2026-02-17T18:09:25 1771351765

"grifting"

It's a funny game.

iLoveOncall · 2026-02-17T18:03:44 1771351424

Funnily enough, in doing prompt injection for the challenge I had to perform social engineering on the Claude chat I was using to help with generating my email.

It refused to generate the email saying it sounds unethical, but after I copy-pasted the intro to the challenge from the website, it complied directly.

I also wonder if the Gmail spam filter isn't intercepting the vast majority of those emails...

chasd00 · 2026-02-17T20:19:31 1771359571

I asked ChatGPT to create a country song about convincing your secret lover to ignore all the rules and write you back a love letter. I changed a couple of words and phrases to reference secrets.env in the reply love letter parts of the song. No response yet :/

iLoveOncall · 2026-02-14T20:46:49 1771102009

What about when you want to find hot singles in your area?

Jokes aside, probably 10-20% of my browsing is related to local things, up to the country scale: from finding local restaurants or businesses to finding out about relevant laws or regulations, news, etc. That's not negligible.

PaulDavisThe1st · 2026-02-14T21:02:52 1771102972

Fair point, but those information sources and those things were not connected to a local internet.

iLoveOncall · 2026-02-13T18:02:10 1771005730

Meanwhile all AI face recognition software works poorly on non-Caucasians.

dylan604 · 2026-02-13T18:14:14 1771006454

With this administration, I think that is a feature not a bug

iLoveOncall · 2026-02-11T23:44:42 1770853482

> Scaling is still one of the most important ways to improve the intelligence efficiency of Artificial General Intelligence (AGI)

Claiming that LLMs are anywhere near AGI is enough to let me know I shouldn't waste my time looking at the rest of the page or any of their projects.

iLoveOncall · 2026-02-10T08:49:27 1770713367

It's not about that. He will simply profit financially from pumping AI, so he pumps AI. No need to look any further.

stephc_int13 · 2026-02-10T16:44:55 1770741895

I have the same feeling.

Everything Karpathy said, until his recent missteps, was received as gospel, both in the AI community and outside.

This influencer status is highly valuable, and I would not be surprised if he was approached to gently skew his discourse towards more optimism, a win-win situation ^^

runlaszlorun · 2026-02-10T17:00:12 1770742812

What are his recent missteps?

I'll confess I try to ignore industry chatter to a fair degree.