Hacker News | gslepak's comments

Would be cool if this were an open model.

Awesome! Now how do we use this? I tried selecting text and seeing if there was a "Speak" menu item but there doesn't seem to be.

Comaps voice navigation, maybe?

OsmAnd~ had voice navigation for me today, whereas it never has before. It could be something else I did tho.

It's poor form to publish exploits like this but Microsoft not paying their bounty is also poor form, and so is attempting to exploit the legal system to defend Microsoft's "right" to write buggy code.

On page 102 of the system card [1] I'm pleased to see evaluation against "creative mastery".

In our work we asked several frontier AIs to come up with an API we needed. We compared Opus 4.7 and GPT-5.5 (among others). Opus 4.7 came up with the most creative and intelligent API design that pleasantly surprised us, especially given that GPT-5.5 was passing it on various coding benchmarks.

What I noticed is that we don't have a common benchmark to measure "creativity" and "ingenuity", and in some ways such a benchmark would conflict with the common IFBench benchmark. Yet this is a very important skill when designing systems. I'm glad to see Anthropic putting thought into it, and would love to see a public benchmark for this that other models could compare themselves to.

[1] https://cdn.sanity.io/files/4zrzovbb/website/c886650a2e96fc0...


Agreed, my vibes tell me 4.6 is a better coder than 4.7. 4.7 is a much better strategic thinker and maintains overall "better architecture" than 5.5. 5.5 is way better than either at coding, but more expensive. So I have 4.7 do the planning/architecture, 4.6 does the coding, then 5.5 critiques and fixes it.

This is my exact vibesperience

Agreed, these are my vibes too. It feels much better to do planning and strategy and architecture etc. with Opus 4.7 than GPT-5.5. GPT just feels like a robot that gets instructions and does exactly that. Opus feels like an almost human that sometimes has actually good ideas and pushes back on bad ideas.

So for now it's planning/architecture/strategy -> Opus. Pure coding -> GPT.

Helps with agentic coding that GPT is much roomier with the tokens you get.


> Do not trust analysis written in the issue. Independently verify behavior and derive your own analysis from the code and execution path.

Human is asking the machine to do what the human themselves refuses to do, while calling it a clanker. Why should it?

/ducks


The only reason you need to duck there is because it's such an obvious, shallow, unconstructive take on a fairly well written article.

I couldn't even finish reading the article due to the intense negativity the use of that word evokes in me.

"that word"?


Clanker? Are you afraid to even say it or something? It's a great word, I personally loved it in the article and I hope it becomes more common as a reaction to "agent" which feels so corporate and soulless.

Maybe they’re worried the basilisk will eat them once the AI becomes sentient and looks back on their posts, so they have to defend it against any perceived-to-be-negatively-connoted words people might come up with for it.

Ref: https://en.wikipedia.org/wiki/Roko%27s_basilisk


That might be it. I also seem to remember one scifi book I read where robots had actual sentience and clanker was a slur. Can't remember what it was, but maybe that's leaking into the real world?

The human refuses to do it because another human (the user who opened the issue) also refused to do it. If the user asked the machine to do it, and didn't even bother to verify the output, why should the maintainer read it?

> In hindsight with all the npm supply chain attacks Ryan was probably right about all of these things

"Probably"? Are you saying there's a chance he wasn't right?

I really think Ryan deserves a lot more credit than a "probably". He put in a lot of effort to do the right thing and improve the security of the entire ecosystem he created.


I think the biggest issue with Deno is that it fixes real issues but in the wrong way.

Take the sandboxing stuff. In theory, you have always been able to sandbox your applications. There are so many tools that let you limit what domains an application can access or restrict access to the file system. This doesn't need to be handled at the language/runtime level. It's just that people were lazy before, and they will continue to be lazy afterwards by running Deno applications with fewer than the minimum set of restrictions because that's easier.

The more complete way of solving the problem would have been capabilities. Rather than sandboxing the whole application, you instead sandbox each individual function. By default a function can make no requests, access no files, execute nothing, etc. But while the application is running, you can pass individual functions a token that grants them limited access to the filesystem, say. This means that trusted code is free to do what is necessary, but untrusted code can be very severely limited. It also significantly reduces what dependencies can do: if you're using something like `lodash` which provides random utilities for iterating over object keys and the like, and suddenly it starts asking for access to the web, then clearly something is wrong, and the runtime can essentially make that impossible.

It's also great for things like build scripts, which are a common attack vector right now. If your runtime enforces that the build script only has access to the files in the project folder, and can't access arbitrary files or run arbitrary commands, then you're in a much safer position than if your build script can do basically anything.

This concept has been explored before, but JavaScript is basically ready-made for it. The language already has everything you need — a runtime that also acts as a sandbox, unforgeable tokens (e.g. `Symbol` or `#private` variables), etc — and you can design an API that makes it easy to use capabilities in a way that enforces the principle of least privilege. The biggest problem is that there's basically no way to make it backwards compatible with almost anything that works with Node, because you'd need to design all the APIs from scratch. But one of the great things about Deno at the start was that they did try and build all of the APIs from scratch, and think about new ways of doing things.
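To make the capability idea above concrete, here is a minimal TypeScript sketch. Everything in it is hypothetical: `grantRead`, `ReadCapability`, and the in-memory `Disk` are illustrative names, not part of Deno, Node, or any real runtime. It uses closures as the unforgeable token rather than `Symbol` or `#private` fields, only to keep the example short. Note also that plain TypeScript still has ambient authority (any function could import a filesystem module), so this shows the shape of the API, not real enforcement.

```typescript
// Hypothetical sketch: these names are illustrative, not a Deno or Node API.

// In-memory stand-in for a real filesystem so the example runs anywhere.
type Disk = { [path: string]: string };

// A capability is an object whose methods close over the granted root.
// Code that holds it can read inside the root and narrow it, never widen it.
interface ReadCapability {
  read(relativePath: string): string;
  narrow(subdir: string): ReadCapability;
}

// Join a relative path onto the root, rejecting anything that could escape it.
function resolveInside(root: string, relativePath: string): string {
  if (relativePath.charAt(0) === "/") {
    throw new Error("absolute paths are not allowed: " + relativePath);
  }
  if (relativePath.split("/").indexOf("..") !== -1) {
    throw new Error("path escapes capability: " + relativePath);
  }
  return root + relativePath;
}

// Only trusted startup code is assumed to call this.
function grantRead(disk: Disk, root: string): ReadCapability {
  const base = root.charAt(root.length - 1) === "/" ? root : root + "/";
  return {
    read(relativePath) {
      const data = disk[resolveInside(base, relativePath)];
      if (data === undefined) {
        throw new Error("not found: " + relativePath);
      }
      return data;
    },
    narrow(subdir) {
      // The derived capability is confined to a subdirectory of this one.
      return grantRead(disk, resolveInside(base, subdir));
    },
  };
}

// Untrusted code (a build script, say) can only touch what it is handed.
function untrustedBuildScript(files: ReadCapability): string {
  return files.read("package.json");
}

// A pure utility in the style of a lodash helper receives no capability,
// so under this model it would have no way to reach the disk or the network.
function pureHelper(values: number[]): number {
  return values.reduce((sum, v) => sum + v, 0);
}

const demoDisk: Disk = {
  "/project/package.json": '{"name":"demo"}',
  "/project/src/index.ts": "export {};",
  "/home/user/.ssh/id_rsa": "SECRET KEY",
};

const projectFiles = grantRead(demoDisk, "/project");
console.log(untrustedBuildScript(projectFiles)); // prints {"name":"demo"}
console.log(pureHelper([1, 2, 3])); // prints 6
```

The trusted entry point decides which capability each piece of code receives: the build script gets `projectFiles` (or something narrower, such as `projectFiles.narrow("src")`), and the utility gets nothing. A request like `projectFiles.read("../home/user/.ssh/id_rsa")` throws instead of reading the key.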


Are there languages and runtimes which have done stuff at this low level before? Sandboxing at the individual function level?

There's Capsicum on FreeBSD.

seL4 is an operating system that uses capabilities

Use a .devcontainer and you're done in 10 seconds without any new runtime implementations.

I'm not sure how this solves lodash wanting network access.

this

we nodejs devs were just ignorant/lazy

npmjs should mark libs "deno compatible" and move over to deno gradually for security


> The non-hallucination rate in AA-omniscience is SOTA

Note that a perfect "non-hallucination rate" is rather meaningless as such tests can contain human hallucinations.

It means the model aligns with the possibly-true, possibly-false beliefs of the group that made the test.


Well, yes, garbage in garbage out. That's a given and not what's meant by "hallucination" in this context.


The observation goes beyond garbage in, garbage out. Mainly, we're always operating from some prior and limited understanding, so what may look like a hallucination could be closer to the truth than our current frameworks of understanding allow us to admit. The hermeneutic circle.


A properly designed benchmark won't use tests that leave room for ambiguous interpretation.


Interesting. I wonder if current LLMs can break out of human limitations and understand the world more correctly.


Here are some examples of the questions in the benchmark. If these are representative, they seem pretty cut and dry. https://artificialanalysis.ai/evaluations/omniscience#exampl...


Was there something about this specific model and submission that made you feel compelled to write this self-evident observation?

Or would you describe your methodology as more like picking a random sentence fragment as an input value then generating completions from your existing corpus without any post-input "learning" process related to the rest of the source material?


Does this support any language or is it limited to a specific set of languages?


For chunking Semble supports all languages supported by tree-sitter-language-pack. The models we train are trained on 6 languages, but can handle way more.


You're not running a charity. You're probably violating their TOS and abusing the good will they're putting out towards open source projects.

I can't believe it but your little project has for the first time in my life put me in the position of defending Microsoft. I hope they shut it down ASAP.


For Microsoft, free isn’t free… It puts them in a position of advantage. However, I still agree this is abusing goodwill and is rather disgraceful.


You can criticize the project. You do not get to invent abuse from dislike.

Ghostbox uses your own GitHub account and Actions minutes for your own dev workflows.


Maybe you wanted something to attack or defend? Because this is mistaken.

Although, releasing free software like this is kinda like running a charity, right?

Charity can also mean goodwill and kindness - so that's the idea. But the name I picked because it made me laugh, it was so surprising and joyful - a charity of ephemeral ghost machines for your software work. Which is basically what GH actions is, this just makes it even more useful and faster to work with.

What abuse and TOS violation were you thinking this was?


You are in direct violation of their Acceptable Use Policy: https://docs.github.com/en/site-policy/acceptable-use-polici...

> You will not reproduce, duplicate, copy, sell, resell or exploit any portion of the Service, use of the Service, or access to the Service without our express written permission.

Supposing they didn't have this clause, it would still be the wrong thing to do. You are clearly lacking a moral center or have killed your inner voice that normally speaks to people and tells them right from wrong.

These are not your computers to resell or reoffer as you please, even for free. They belong to Microsoft who pays for them and owns them, and therefore only Microsoft can decide how they are used and for what lawful purposes and under what conditions. You need their permission to do what you are doing, and I'm fairly certain you do not have that.

By abusing their services in this manner you are also directly attacking open source projects who make use of these services in a way that is compatible with their AUP.


No. You can critique the project, but you don't get to falsely accuse me, nor define me, lol.

"the wrong thing to do", "You are clearly lacking a moral center or have killed your inner voice that normally speaks to people and tells them right from wrong.", "abusing their services", "you are also directly attacking open source project"

You really have a need to falsely accuse. It seems like projective guilt. What have you done that you feel so guilty about that you need to try to abuse random strangers?

So, no, my inner voice is not dead, I never killed it, nor would ever kill it, my inner voice is thriving - I nurture it, unlike you, and I have a clear and strong moral center, again unlike you. It seems more and more you are merely talking about yourself here but projecting onto someone else. I reject your attempt to get me to participate in your need to project. I reject your framing completely. It is you who is clearly lacking a "moral center" and it is you who have killed your own voice that ought to tell you right from wrong. You killed it here again - when you could have spent a minute to understand, you instead launched into self-righteous abuse which has nothing to do with me, and everything to do with you. I reject your weak attempt to drag me into your personal drama.

You did all this - based on a lie, without understanding me or the project. In that ambiguity - you felt it was okay to talk to me like that, and about my work. You have no idea about me, and you don't get to talk to me like that.

In fact, ghost doesn't advertise itself falsely - you merely misunderstood, or joined the crowd and think that excuses your actions. It does not. You're responsible for your words here.


You can probably do better: you can argue the ToS point without turning it into a character attack, can't you?

"I think this violates GitHub’s AUP" is a kind of point. "You lack a moral center” sounds like your own compensatory projection of your guilt onto others to feel better, and is not a point - too many years on the inside? That is just a personal accusation, and I do not accept it. I bundle it up, and pass it back to you, reflected. You don't know me, and you're totally wrong in everything you tried to say about me. Which of course you wouldn't know anything at all about. All your information is bad, dude, and always has been. You don't check it? Sounds like you don't.

Ghostbox is not reselling or reoffering GitHub’s service. It is a CLI that helps a person create and connect to workflows in their own GitHub account, using their own Actions minutes, for software development work. The underlying pieces are already possible with `gh`, workflow files, tmate, SSH, and normal Actions usage.

It is founded on the idea of the Global Free Tier - that GitHub led the way in providing. Odd for you to criticize it given your work on UBI. But I suppose you prefer compliant dependents rather than empowered independent creators, right?

If GitHub says a specific part violates the terms, I’ll take that seriously. But your dislike of the workflow is 0% proof of any ToS violation, and it is 0% permission to try to attack my character.

> You are clearly lacking a moral center or have killed your inner voice that normally speaks to people and tells them right from wrong.

That is a crazy thing to say, do you know that? I want you to go stand in front of a mirror and say that to yourself. Then imagine the kind of person you are, saying that to someone else. You are clearly just talking about yourself in that crazed statement. I do not accept that, in any way, that is all yours. But wow, you really do talk like an abusive person - but you don't look like one. I guess you can't always tell.

I’m asking people to evaluate what the project actually does, not the moral story you are projecting onto it that you need to be true for your weird little twisted personal reasons that have 0% to do with me. I'm not actually sure you can do better - I'd like to think you could, any MD-based ex-NSA TAO spook could see that a regular person could. Obviously, you are 0% qualified to judge anything about moral character at all, yet you were so desperate to try that in your little comment above. Sorry, this is not your opportunity to have a moral feelgood moment as compensation for all your years of bad by trying to abuse someone else. Rejected. Go figure out your issues yourself.

You really picked the wrong person to try to say that to, bud.


> Ghostbox is not reselling or reoffering GitHub’s service. It is a CLI that helps a person create and connect to workflows in their own GitHub account, using their own Actions minutes, for software development work.

You are not advertising it that way. I'm not the only person to call you out in these comments. Dozens of people have told you the same thing, and you've summarily dismissed all of their comments.

Clearly either you are doing something wrong (violating the ToS), or you are advertising a service that appears to be violating the ToS. If it's the latter, you might want to change your website to be a little clearer, like stating that it requires a GitHub account and it will use that account and any ToS violations are on the user.

I'm amazed at your ability to tell others that they need to self-reflect while appearing to lack any capacity for self-reflection yourself. You solicited feedback and dozens of intelligent people are telling you the exact same thing, and you dismissed them and/or called them crazy.

> Odd for you to criticize it given your work on UBI.

It's not odd at all if you understood my work or understood that your service advertises itself as abusing another company's resources. FYI, I work in VBI, not UBI (and the distinction is precisely about abusing other people's resources without permission), but this isn't a conversation about my work, it's about your work.

> I'd like to think you could, any MD-based ex-NSA TAO spook could see that a regular person could.

No idea what you're saying here. Are you now making up false accusations about me? Speaking of abusive behavior.

> Happy to know what you think and talk about it.

Seems you aren't actually happy to hear what others think. Maybe don't solicit feedback on a high-traffic website if you don't want to hear it?


Fair point on clarity: if the site made Ghost/ghostbox sound like hosted compute services/reselling rather than the local CLI using your GH account and Actions minutes that it in fact is, I might tighten that wording.

Which obviously does not make it abuse or a ToS violation simply because of that. As to whatever else you were going on with: all mere personal attack/insinuation, not argument. Critique Ghostbox's actual activity, not whatever you are projecting onto me.


"call you out", "lack any capacity for self-reflection", "advertises itself as abusing", "abusing people's resources without permission", "speaking of abusive behavior".

Maybe see someone about this - this is unhinged, and fixated on projective accusation (the symptom of an inability to self-reflect or process uncomfortable internals), and it's also out and out lies. Sir, there's nothing to 'call out', there's no abuse, only your lies, mistaken beliefs and invented narratives, which I corrected within an hour of the post going up, by replying directly across the thread and relating what ghost actually was.

Yet you and others persisted with the falsehood, despite the truth being repeated. This is your willful misrepresentation, nothing to do with me.

Yet you want me to defend what you are projecting? I completely reject that frame.

What's more - the advertising is good, it doesn't sell itself as "abusing another company's resources". If the website lacks some clarity, which it might - you don't get to fill the gap with malice, then abuse from your chosen misframing. That is bad behavior. And all those who did that are all wrong. And not "intelligent".

None of it justified the level of misunderstanding in this thread - which can only be a kind of crowd madness, or deliberate lies, and then abuse of the repo flag/report button to get the repo auto disabled.

Maybe some of these sockpuppet accounts don't want you to know you can run your isolated agentic ephemeral workflows directly on your own Actions minutes, rather than paying their Tilde.run/Fly.io type startup.

Of course I'm happy to discuss my beloved and beautiful projects and to hear opinions on opportunities for enhancements - but abuse and personal attacks are simply not acceptable nor appropriate behavior to level at anyone unprovoked and with 0 justification. What I say to you, I say in response to your abuse and badness. Soliciting feedback is never an invitation to abusers and doesn't justify their bad acts.

Refusing to agree with someone's abuse and lies, or refusing to take them in silence, is not a lack of self-reflection, but an expression of boundaries and self love. Something I hope you come to know clearly. You're not qualified to judge self-reflection. Self-reflection is not the same thing as surrendering to a crowd narrative. I can enhance any messaging copy without accepting the accusations. What you have said here is simply not true.

You have no right to talk to me like that, and I hope for you and others around you, you can do better at offering "intelligent critique" by grounding it in listening, empathy and facts before launching such an ignorant and shameful tirade as the one you misguidedly made up here.


Much smaller context

