Hacker News | taspeotis's comments

No the "X" is pronounced "ten" like in "Mac OS X"

Makes sense. I am running MacOS Tahoetl.


Why would you though?

And by the way: Thanks for relentlessly holding new models’ feet to the pelican SVG fire.


Because I want to read about Qwen, not someone's one-off vibe test followed by 1:1 conversations. (case in miniature here: which is the last comment in this thread that says something about Qwen? The root post. Is that fun policing? Yes, apologies.)

There's a bunch of useful information in my comment that's independent of the fact that it drew a pelican:

1. You can run this on a Mac using llama-server and a 17GB downloaded file

2. That version does indeed produce output (for one specific task) that's of a good enough quality to be worth spending more time checking out this model

3. It generated 4,444 tokens in 2min 53s, which is 25.57 tokens/s
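The throughput figure in point 3 can be sanity-checked with a couple of lines of arithmetic. This is only a sketch: the function name is made up for illustration, and it assumes the duration is exactly 2 min 53 s. With whole seconds the rate comes out slightly higher than the quoted 25.57 tokens/s, which would be consistent with a generation time of roughly 173.8 s (presumably the server's own finer-grained timing; that part is a guess).

```python
def tokens_per_second(tokens: int, minutes: int, seconds: float) -> float:
    """Average generation throughput over a wall-clock duration."""
    elapsed_seconds = minutes * 60 + seconds
    return tokens / elapsed_seconds


# Figures quoted in the comment: 4,444 tokens in 2 min 53 s.
rate = tokens_per_second(4444, 2, 53)
print(f"{rate:.2f} tokens/s")

# Working backwards: how long would 4,444 tokens take at the quoted 25.57 tokens/s?
implied_seconds = 4444 / 25.57
print(f"{implied_seconds:.1f} s")
```

Either way the number lands at roughly 25 to 26 tokens/s, so the rounding does not change the conclusion.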


Right, that is exactly what I meant by "the root post [had info about Qwen]" - you shouldn't feel I'm being critical of you or asking you to do anything different, at all. I admire you deeply and feel humbled* by interacting with you, so I really want that to be 100% clear, because this is the 2nd time I'm reading that it might be personal.

* er, that probably sounds strange, but I did just spend 6 weeks working on integrating the Willison Trifecta for my app I've been building for 2.5 years, and I considered it a release blocker. It's a simple mental model that is a significant UX accomplishment IMHO.


I like the pelican-bicycle test because it's pretty predictive of how the model does helping me with TikZ. And I hate writing TikZ.

Somewhat ironically, as I write this, this tangent is dominating the size of this topic.

I understand your reasoning and it's valid, but I think the best you can do is indeed collapse the thread (not sure if any mobile clients do better than that?)

It's perhaps not a serious test, it isn't to me, but on the edges of jokes about pelicans there are usually some useful things people smarter than me say, and additionally if providers are spending some time on making pelicans or SVG look better, this benefits all of us.

So, no hard feelings, you're understood (and I'm not trying to be patronising, I'm just awkward with the language), but pelicans are here to stay because it seems that the consensus is they're beneficial and on topic.

All the best!


I think it's to help drive traffic to his blog now that he's accepted sponsors in the header of every page. I do see this pelican thing come up from him on every model post that gets released.

The traffic I get from a comment with a link to a pelican is pretty tiny.

"Create me an SVG to drive MAXIMUM ENGAGEMENT for my sponsors".

Missing an opportunity here, lol.


This is an ad

And very clearly LLM written. It shouldn’t bother me as much as it does, but it does. And I know I do it too.

Matt Levine writes a bit about this - the Elon Musk Mars Conglomerate. And really if you're investing into e.g. SpaceX you're not investing into SpaceX you're investing into the Elon Musk Mars Conglomerate. And most people seem to want that.

Tesla's the odd one out: it's public but it's still in there, although Musk would probably prefer it to be private too.


Tesla is the free cash flow play that is probably the most important for Mars, as there is no distilled fermented dinosaur juice on Mars, but a considerably higher ratio of lithium to oil than on Earth. Our Flintstone fire-mobiles won’t work so well there, and battery / solar will be important there for everything, including mobility and armies of slave robots.

Mars gets less sunlight on a good day for solar power; the inverse-square law really hits you harder than you'd think. And that's before accounting for the planet-wide dust storms that can last for months.

We're probably looking at nuclear fission generators to get started, then converting to geothermal at any appreciable scale (and maybe fusion, inshallah).


Regardless, fission, geo, fusion don’t fit well on a rover. The Boring Company makes the tunnels, Tesla makes the vehicles and robots, and batteries. Likely we will still use solar despite poor relative performance for bootstrap.

RTGs do. That's what Perseverance and Curiosity use today.

Right, right, all those facts... that's nothing compared to Musk's genius and will! /s

> Elon Musk Mars Conglomerate

That’s SpaceX’s version of Tesla’s self driving car pipe dream

Edit - I use self-driving car and Autopilot interchangeably


It's so pipe-dreamy that I used it for an hour today through SF rush hour traffic. Clearly never going to work though, right? right???

Did you follow Tesla's published instructions on how to use it (https://www.tesla.com/ownersmanual/modely/en_us/GUID-2CB6080...)? You're explicitly forbidden, for example, from assuming that it's going to make the right decision at intersections; you must manually inspect each intersection and evaluate whether it's "safe and/or appropriate" to continue. You're also not allowed to look away from the road or use your phone. YMMV, but to me that level of required attention doesn't match the term "self-driving".

What I see a lot of people do, unfortunately, is reconcile this contradiction by not following the published limitations of the "Full Self-Driving (Supervised)" product. They assume that Elon Musk wouldn't call it that if it couldn't be trusted to do what they expect. Then they get into fatal crashes, and someone sues, and Tesla argues that they can't be held accountable for bad drivers who don't follow the rules.


Your claim was that the product doesn't work, and I'm telling you it works without intervention consistently and in complicated traffic situations.

Any argument about how people don't pay enough attention since it isn't yet certified as a L4 system is irrelevant and tangential to the point.


Your definition of Tesla's self-driving product is very different than what Tesla itself promised, and that's what the person you are replying to...is telling you as well.

Anyone who thinks it is a pipe dream given how it works today + rate of change is clueless, and that is putting it kindly.

I don't think L4 autonomy is a pipe dream. Indeed, it exists today and is widely available in the same city you drove your Tesla in. I think it's a pipe dream for Tesla specifically to achieve it, because for bizarre and idiosyncratic reasons Elon Musk won't let them use LiDAR or mount a roof sensor. They've been stuck at L2 for a decade now, and I don't see much reason to think that making that system incrementally more reliable will ever "unlock" L4.

In practice, Tesla on HW4 drives indistinguishably from Waymo.

It does! A system which drives indistinguishably from Waymo 99.999% of the time is L2. You might very well never experience that unlucky 1 mile in 100,000, but if there are 1M Teslas on the road driving a daily average of 33 miles, it's going to happen hundreds of times each day. An L4 system must guarantee that it can come safely to a stop before human intervention is required, and I don't think you can achieve that guarantee by pushing the nines on an L2 system.
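To make the "hundreds of times each day" figure concrete, here is a back-of-the-envelope sketch. It takes the numbers in the comment at face value (1M cars, 33 miles per car per day, one bad mile per 100,000), none of which are measured values, and the variable names are purely illustrative.

```python
# Illustrative inputs taken from the comment above, not measured data.
fleet_size = 1_000_000            # Teslas on the road
miles_per_car_per_day = 33        # average daily mileage per car
miles_between_failures = 100_000  # "1 mile in 100,000" goes wrong

# Fleet-wide view: total daily miles divided by miles per failure.
fleet_miles_per_day = fleet_size * miles_per_car_per_day
expected_events_per_day = fleet_miles_per_day / miles_between_failures
print(expected_events_per_day)  # 330.0

# Individual view: how long one driver goes between such miles on average.
days_between_events_per_car = miles_between_failures / miles_per_car_per_day
years_between_events_per_car = days_between_events_per_car / 365
print(f"{years_between_events_per_car:.1f} years")  # about 8.3 years
```

Under these assumptions a single driver would see such a mile about once every eight years, while the fleet as a whole would see about 330 a day. That is the gap between "works for me" and an L4 guarantee.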

I've been in Waymos that have needed teleop rescue multiple times in the last year so by that metric it's not a L4 system either.

Isn't Tesla FSD too good, and trending in the right direction, to be called a "pipe dream"?


Isn’t that the same one?


I think API is fine, likely only subscription is affected. Not to mention trivial heuristics to differentiate repeated API calls / same data and potential CLI usage although that would be true malice.

It seemed to me that it was performing better through opencode using API but did not test extensively.


If SWE Bench is public then Anthropic is at a minimum probably also looking at their SWE Bench scores when making changes. I'd put more trust in a tracker which runs a private benchmark not known to Anthropic.

Hi, thanks for Claude Code. I was wondering though if you'd consider adding a mode to make text green and characters come down from the top of the screen individually, like in The Matrix?

No, I want a little monkey doing tricks. /s


I mean if people have judged this important enough to be on the front page of HN ... I guess it's important enough to be on the front page?

But any combination of the Claude models is up or down on any given day: https://status.claude.com/


> don't have the resources to buy 20000$ of tokens to go debug them

$20,000 - how many developers do these hardware companies have that they need to spend that much? Claude Team Premium is US$125/mo for a seat and even cheaper if you buy annually...


$20000 is what the Anthropic report says they spent on scanning OpenBSD [1].

[1] "Across a thousand runs through our scaffold, the total cost was under $20,000 and found several dozen more findings.", https://red.anthropic.com/2026/mythos-preview/


That's for OpenBSD; typical IoT firmware is tiny by comparison: a few init.rc scripts, some cron jobs, a php-cgi web UI, and glue code with hardcoded API keys. The total lines of code are orders of magnitude smaller, so the audit surface and expected cost are too.

Running a "too advanced" harness against a Claude Code subscription gets your organization banned, even if it's a shell wrapper over `claude -p`. You probably can't reproduce this research with a fixed-price subscription.

