taspeotis · 2026-04-24T00:34:36 1776990876

No the "X" is pronounced "ten" like in "Mac OS X"

hirvi74 · 2026-04-24T01:39:04 1776994744

Makes sense. I am running MacOS Tahoetl.

taspeotis · 2026-04-24T00:25:33 1776990333

https://theonion.com/new-device-desirable-old-device-undesir...

taspeotis · 2026-04-22T17:24:26 1776878666

Why would you though?

And by the way: Thanks for relentlessly holding new models’ feet to the pelican SVG fire.

refulgentis · 2026-04-22T17:34:12 1776879252

Because I want to read about Qwen, not someone's one-off vibe test followed by 1:1 conversations. (case in miniature here: which is the last comment in this thread that says something about Qwen? The root post. Is that fun policing? Yes, apologies.)

simonw · 2026-04-22T18:02:30 1776880950

There's a bunch of useful information in my comment that's independent of the fact that it drew a pelican:

1. You can run this on a Mac using llama-server and a 17GB downloaded file

2. That version does indeed produce output (for one specific task) that's of a good enough quality to be worth spending more time checking out this model

3. It generated 4,444 tokens in 2min 53s, which is 25.57 tokens/s
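As a quick sanity check on the throughput figure in point 3, here is a minimal sketch of the arithmetic. The helper name `tokens_per_second` is hypothetical, and the elapsed time is taken as exactly 2 min 53 s, which is an assumption since the quoted duration is probably rounded.

```python
def tokens_per_second(tokens: int, seconds: float) -> float:
    """Average generation rate over the whole run."""
    return tokens / seconds


tokens = 4444
elapsed_s = 2 * 60 + 53  # 173 s, assuming the quoted duration is exact

rate = tokens_per_second(tokens, elapsed_s)

# Working backwards: the elapsed time implied by the quoted 25.57 tokens/s.
implied_elapsed_s = tokens / 25.57

print(f"rate from rounded time: {rate:.2f} tokens/s")
print(f"time implied by 25.57 tokens/s: {implied_elapsed_s:.1f} s")
```

The rounded duration gives a slightly higher rate than the quoted 25.57 tokens/s. The quoted rate is consistent with a run that took a little under 174 s, so the difference looks like rounding in the reported time rather than an error.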

refulgentis · 2026-04-22T18:16:49 1776881809

Right, that is exactly what I meant by "the root post [had info about Qwen]" - you shouldn't feel I'm being critical of you or asking you to do anything different, at all. I admire you deeply and feel humbled* by interacting with you, so I really want that to be 100% clear, because this is the 2nd time I'm reading that it might be personal.

* er, that probably sounds strange, but I did just spend 6 weeks working on integrating the Willison Trifecta for my app I've been building for 2.5 years, and I considered it a release blocker. It's a simple mental model that is a significant UX accomplishment IMHO.

mlyle · 2026-04-22T19:32:55 1776886375

I like the pelican-bicycle test because it's pretty predictive of how the model does helping me with TikZ. And I hate writing TikZ.

interstice · 2026-04-22T19:31:25 1776886285

Somewhat ironically, as I write this, this tangent is dominating the size of this topic.

subscribed · 2026-04-23T08:47:39 1776934059

I understand your reasoning and it's valid, but I think the best you can do is indeed collapse the thread (not sure if any mobile clients do better than that?)

It's perhaps not a serious test, it isn't to me, but on the edges of the jokes about pelicans there are usually some useful things said by people smarter than me, and additionally, if providers are spending some time on making pelicans or SVG look better, this benefits all of us.

So, no hard feelings, you're understood (and I'm not trying to be patronising, I'm just awkward with the language), but pelicans are here to stay because it seems that the consensus is they're beneficial and on topic.

All the best!

rob · 2026-04-22T17:56:54 1776880614

I think it's to help drive traffic to his blog now that he's accepted sponsors in the header of every page. I do see this pelican thing come up from him on every model post that gets released.

simonw · 2026-04-22T18:12:32 1776881552

The traffic I get from a comment with a link to a pelican is pretty tiny.

ai_critic · 2026-04-22T18:56:17 1776884177

"Create me an SVG to drive MAXIMUM ENGAGEMENT for my sponsors".

Missing an opportunity here, lol.

taspeotis · 2026-04-22T13:13:28 1776863608

This is an ad

derwiki · 2026-04-22T13:43:18 1776865398

And very clearly LLM written. It shouldn’t bother me as much as it does, but it does. And I know I do it too.

taspeotis · 2026-04-22T00:26:53 1776817613

Matt Levine writes a bit about this - the Elon Musk Mars Conglomerate. And really if you're investing into e.g. SpaceX you're not investing into SpaceX you're investing into the Elon Musk Mars Conglomerate. And most people seem to want that.

Tesla's the odd one out: it's public but it's still in there, although Musk would probably prefer it to be private too.

fnordpiglet · 2026-04-22T00:40:21 1776818421

Tesla is the free cash flow play that is probably the most important for Mars, as there is no distilled fermented dinosaur juice on Mars, but a considerably higher ratio of lithium to oil than on Earth. Our Flintstone fire mobiles won’t work so well there, and battery / solar will be important there for everything, including mobility and armies of slave robots.

monocasa · 2026-04-22T02:10:04 1776823804

Mars gets less sunlight on a good day for solar power; the inverse square law really hits you harder than you'd think. And that's before accounting for the planet-wide dust storms that can last for months.
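To put a rough number on that, here is a minimal sketch of the inverse-square falloff of sunlight with distance. The constants are approximate textbook values (about 1361 W/m² at 1 AU, Mars at a mean distance of about 1.524 AU), and the result is top-of-atmosphere flux only. It ignores dust, atmosphere, latitude, and Mars's orbital eccentricity.

```python
SOLAR_CONSTANT_EARTH_W_M2 = 1361.0  # approximate flux at 1 AU
MARS_MEAN_DISTANCE_AU = 1.524       # approximate mean Sun-Mars distance


def relative_flux(distance_au: float) -> float:
    """Sunlight intensity relative to 1 AU, by the inverse-square law."""
    return 1.0 / distance_au ** 2


mars_fraction = relative_flux(MARS_MEAN_DISTANCE_AU)
mars_flux_w_m2 = SOLAR_CONSTANT_EARTH_W_M2 * mars_fraction

print(f"Mars receives about {mars_fraction:.0%} of Earth's sunlight")
print(f"That is roughly {mars_flux_w_m2:.0f} W/m^2 before any dust losses")
```

Under these assumptions a panel on Mars starts with well under half the sunlight the same panel would get in Earth orbit, before any dust storm is counted.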

We're probably looking at nuclear fission generators to get started, then converting to geothermal at any appreciable scale (and maybe fusion, inshallah).

fnordpiglet · 2026-04-22T05:22:52 1776835372

Regardless, fission, geo, and fusion don’t fit well on a rover. The Boring Company makes the tunnels; Tesla makes the vehicles, robots, and batteries. We will likely still use solar for bootstrapping despite its poor relative performance.

monocasa · 2026-04-22T16:31:56 1776875516

RTGs do. That's what Perseverance and Curiosity use today.

utopiah · 2026-04-22T08:51:10 1776847870

Right, right, all those facts... that's nothing compared to Musk's genius and will! /s

mandeepj · 2026-04-22T01:07:22 1776820042

> Elon Musk Mars Conglomerate

That’s SpaceX’s version of Tesla’s self driving car pipe dream

Edit - I use self-driving car and Autopilot interchangeably

jamiequint · 2026-04-22T02:06:04 1776823564

It's so pipe-dreamy that I used it for an hour today through SF rush hour traffic. Clearly never going to work though, right? right???

SpicyLemonZest · 2026-04-22T02:30:35 1776825035

Did you follow Tesla's published instructions on how to use it (https://www.tesla.com/ownersmanual/modely/en_us/GUID-2CB6080...)? You're explicitly forbidden, for example, from assuming that it's going to make the right decision at intersections; you must manually inspect each intersection and evaluate whether it's "safe and/or appropriate" to continue. You're also not allowed to look away from the road or use your phone. YMMV, but to me that level of required attention doesn't match the term "self-driving".

What I see a lot of people do, unfortunately, is reconcile this contradiction by not following the published limitations of the "Full Self-Driving (Supervised)" product. They assume that Elon Musk wouldn't call it that if it couldn't be trusted to do what they expect. Then they get into fatal crashes, and someone sues, and Tesla argues that they can't be held accountable for bad drivers who don't follow the rules.

jamiequint · 2026-04-22T02:46:55 1776826015

Your claim was that the product doesn't work, and I'm telling you it works without intervention consistently and in complicated traffic situations.

Any argument about how people don't pay enough attention since it isn't yet certified as a L4 system is irrelevant and tangential to the point.

mandeepj · 2026-04-22T03:51:27 1776829887

Your definition of Tesla's self-driving product is very different than what Tesla itself promised, and that's what the person you are replying to...is telling you as well.

jamiequint · 2026-04-22T04:11:57 1776831117

Anyone who thinks it is a pipe dream given how it works today + rate of change is clueless, and that is putting it kindly.

SpicyLemonZest · 2026-04-22T04:24:23 1776831863

I don't think L4 autonomy is a pipe dream. Indeed, it exists today and is widely available in the same city you drove your Tesla in. I think it's a pipe dream for Tesla specifically to achieve it, because for bizarre and idiosyncratic reasons Elon Musk won't let them use LiDAR or mount a roof sensor. They've been stuck at L2 for a decade now, and I don't see much reason to think that making that system incrementally more reliable will ever "unlock" L4.

jamiequint · 2026-04-22T04:45:28 1776833128

In practice, Tesla on HW4 drives indistinguishably from Waymo.

SpicyLemonZest · 2026-04-22T05:10:07 1776834607

It does! A system which drives indistinguishably from Waymo 99.999% of the time is L2. You might very well never experience that unlucky 1 mile in 100,000, but if there are 1M Teslas on the road driving a daily average of 33 miles, it's going to happen hundreds of times each day. An L4 system must guarantee that it can come safely to a stop before human intervention is required, and I don't think you can achieve that guarantee by pushing the nines on an L2 system.
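The "hundreds of times each day" estimate can be checked with a few lines of arithmetic. This is only a sketch of the back-of-the-envelope calculation using the figures quoted above (1M cars, 33 miles per car per day, one bad mile in 100,000). The variable names are illustrative, and treating every mile as an independent trial is a simplifying assumption.

```python
fleet_size = 1_000_000           # cars on the road, as quoted
miles_per_car_per_day = 33       # average daily mileage, as quoted
miles_between_failures = 100_000  # 99.999% of miles are fine

fleet_miles_per_day = fleet_size * miles_per_car_per_day
expected_failures_per_day = fleet_miles_per_day / miles_between_failures

print(f"fleet miles per day: {fleet_miles_per_day:,}")
print(f"expected bad miles per day: {expected_failures_per_day:.0f}")
```

A single driver at 33 miles a day would expect to wait about 3,000 days, over eight years, to hit one such mile, which is why individual experience and fleet-level statistics can point in opposite directions.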

jamiequint · 2026-04-22T05:32:26 1776835946

I've been in Waymos that have needed teleop rescue multiple times in the last year so by that metric it's not a L4 system either.

ignoramous · 2026-04-22T01:10:51 1776820251

Isn't Tesla FSD too good, and trending too much in the right direction, to be called a "pipe dream"?

taspeotis · 2026-04-17T07:14:50 1776410090

It’s actually the second one https://news.ycombinator.com/item?id=47687248

vie00001 · 2026-04-17T07:18:04 1776410284

Isn’t that the same one?

taspeotis · 2026-04-14T22:57:21 1776207441

https://marginlab.ai/trackers/claude-code/

comboy · 2026-04-15T11:14:17 1776251657

I think API is fine, likely only subscription is affected. Not to mention trivial heuristics to differentiate repeated API calls / same data and potential CLI usage although that would be true malice.

It seemed to me that it was performing better through opencode using API but did not test extensively.

chillacy · 2026-04-15T02:08:09 1776218889

If SWE Bench is public, then Anthropic is at a minimum probably also looking at their SWE Bench scores when making changes. I'd put more trust in a tracker which runs a private benchmark not known to Anthropic.

taspeotis · 2026-04-13T02:43:19 1776048199

Hi, thanks for Claude Code. I was wondering though if you'd consider adding a mode to make text green and characters come down from the top of the screen individually, like in The Matrix?

visarga · 2026-04-13T12:23:10 1776082990

No, I want a little monkey doing tricks. /s

taspeotis · 2026-04-13T13:00:45 1776085245

This guy? https://en.wikipedia.org/wiki/BonziBuddy

taspeotis · 2026-04-08T07:00:09 1775631609

I mean if people have judged this important enough to be on the front page of HN ... I guess it's important enough to be on the front page?

But any combination of the Claude models is up or down on any given day: https://status.claude.com/

taspeotis · 2026-04-08T05:31:50 1775626310

> don't have the resources to buy 20000$ of tokens to go debug them

$20,000 - how many developers do these hardware companies have that they need to spend that much? Claude Team Premium is US$125/mo for a seat and even cheaper if you buy annually...
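For scale, here is a rough sketch of how the two figures in this comment compare. It assumes the US$125/mo seat price quoted above with no annual discount, and it says nothing about whether a subscription seat could actually do the same work as metered API usage.

```python
token_budget_usd = 20_000        # figure from the quoted comment
seat_price_usd_per_month = 125   # seat price as quoted above (assumed monthly billing)

seat_months = token_budget_usd / seat_price_usd_per_month
seats_for_one_year = seat_months / 12

print(f"{seat_months:.0f} seat-months")
print(f"about {seats_for_one_year:.1f} seats for a full year")
```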

stratos123 · 2026-04-08T11:53:34 1775649214

$20000 is what the Anthropic report says they spent on scanning OpenBSD [1].

[1] "Across a thousand runs through our scaffold, the total cost was under $20,000 and found several dozen more findings.", https://red.anthropic.com/2026/mythos-preview/

taspeotis · 2026-04-10T06:48:26 1775803706

That's for OpenBSD. Typical IoT firmware is tiny by comparison: a few init.rc scripts, some cron jobs, a php-cgi web UI, and glue code with hardcoded API keys. The total lines of code are orders of magnitude smaller, so the audit surface and expected cost are too.

yencabulator · 2026-04-08T20:38:12 1775680692

Running a "too advanced" harness against a Claude Code subscription gets your organization banned, even if it's a shell wrapper over `claude -p`. You probably can't reproduce this research with a fixed-price subscription.