If it’s any consolation, it was able to one-shot a UI & data sync race condition that even Opus 4.6 struggled to fix (across 3 attempts).
So far I like how it’s less verbose than its predecessor. Seems to get to the point quicker too.
While it gives me hope, I am going to play it by ear. Otherwise it's going to be Gemini for world knowledge/general intelligence/R&D and Opus/Sonnet 4.6 to finish it off.
UPDATE: I may have spoken too soon.
> Fixing Truncated Array Syncing Bug
> I traced the missing array items to a typo I made earlier!
> When fixing the GC cast crash, I accidentally deleted the assignment..
> ..effectively truncating the entire array behind it.
These errors should not be happening! They are not the result of missing knowledge or a bad hunch. They come from an incorrect find/replace, which makes them completely avoidable!

On a lighter note, every time it happens, I think about this Family Guy clip: https://youtu.be/HtT2xdANBAY?si=QicynJdQR56S54VL&t=184
For me it's Opus 4.6 for researching code/digging through repos, gpt 5.3 codex for writing code, gemini for single hardcore science/math algorithms and grok for things the others refuse to answer or skirt around (e.g. some security/exploitability related queries). Get yourself one of those wrappers that support all models and stop worrying about who has the best model. The question is who has the best model for your problem. And there's usually a correct answer, even if it changes regularly.
I agree. HM or bidirectional typing works best when used optionally, allowing type hints only where needed.
Generics and row polymorphism already cover most structural patterns. The real problem is semantic ambiguity. Unless algebraic data types or unions are used, the type system cannot express meaningful distinctions.
For example, if both distance and velocity are just float, the compiler has no way of knowing they represent different things and will happily let them mix. For that to become a compile-time error, you have to define distinct types and use them consistently for their different semantic meanings throughout the codebase.
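As a minimal sketch of that idea (Python with typing.NewType and a checker like mypy; the names are made up for illustration, and a language like Rust or Haskell would make the mix-up a hard compile error rather than a type-checker error):

    from typing import NewType

    # Distinct nominal types over the same runtime representation (float).
    Meters = NewType("Meters", float)
    MetersPerSecond = NewType("MetersPerSecond", float)

    def stopping_distance(speed: MetersPerSecond) -> Meters:
        # d = v^2 / (2a), with a fixed deceleration purely for illustration
        return Meters(speed * speed / (2.0 * 7.0))

    altitude = Meters(120.0)
    stopping_distance(altitude)               # mypy: incompatible type "Meters"; expected "MetersPerSecond"
    stopping_distance(MetersPerSecond(30.0))  # OK

At runtime both are still plain floats; the distinction only exists for the checker, which is exactly the discipline of defining distinct types and using them consistently.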
PRs are just that: requests. They don't need to be accepted but can be used in a piecemeal way, merged in by those who find it useful. Thus, not every PR needs to be reviewed.
Of course, but when you add enough noise you lose the signal, and as a consequence no PRs get merged anymore because it's too much effort just to find the ones you care about.
Don't allow PRs from people who aren't contributors; problem solved. Closing your doors to the public is exactly how people solved the "dark forest" problem of social media, and OSS was already undergoing that transition, with humans authoring garbage PRs for reasons other than genuine enthusiasm. AI will only get us to the destination faster.
I don't think anything of value will be lost by choosing to not interact with the unfettered masses whom millions of AI bots now count among their number.
That would be a huge loss IMO. Anyone being able to contribute to projects is what makes open source so great. If we all put up walls, then you're basically halfway to the bad old days of closed source software reigning supreme.
Then there's the security concerns that this change would introduce. Forking a codebase is easy, but so are supply chain attacks, especially when some projects are being entirely iterated on and maintained by Claude now.
> Anyone being able to contribute to projects is what makes open source so great. If we all put up walls, then you're basically halfway to the bad old days of closed source software reigning supreme.
Exaggeration. Is SQLite halfway to closed source software?
Open source is about open source. Free software is about the freedom to do things with code. Neither is about taking contributions from everyone.
For every cathedral (like SQLite) there are 100s of bazaars (like Firefox, Chrome, hundreds of core libraries) that depend on external (and especially first-time) contributors to survive (because not everyone is getting paid to sling open-source).
Is there a reason that you chose SQLite for your counterpoint? My hot take: I would say that SQLite is halfway to closed source software. Why? The unit tests are not open source. You need to pay to see them. As a result, it would be insanely hard to fork SQLite in a sustainable, safe manner. Please don't read this opinion as disliking SQLite for their software or commercial strategy. In hindsight, it looks like real genius to resist substantial forks. One of the biggest "fork threats" to SQLite is the advent of LLMs that can (1) convert C code to a different language, like Rust, and (2) write unit tests. Still, a unit test suite for a database will likely contain thousands (or millions) of edge-case SQL queries. These are still probably impossible to recreate, considering the 25-year history of bug fixing done by the SQLite team.
And how does one become a maintainer, if there's no way to contribute from outside? Even if there's some extensive "application process", what is the motivation for a relatively new user to go through that, and how do they prove themselves worthy without something very much like a PR process? Are we going to just replace PRs with a maze of countless project forks, and you think that will somehow be better, for either users or developers?
If I wanted to put up with software where every time I encounter a bug, I either have no way at all to report it, or perhaps a "reporting" channel but little likelihood of convincing the developers that this thing that matters to me is worthy of attention among all of their competing priorities, then I might as well just use Microsoft products. And frankly, I'd rather run my genitals through an electric cheese grater.
You get in contact with the current maintainers and talk to them. Real human communication is the only shibboleth that will survive the AI winter. Those soft skills muscles are about to get a workout. Tell them about what you use the software for and what kinds of improvements you want to make and how involved you'd like your role to be. Then you'll either be invited to open PRs as a well-known contributor or become a candidate for maintainership.
Github issues/prs are effectively a public forum for a software project where the maintainers play moderator and that forum is now overrun with trolls and bots filling it with spam. Closing up that means of contributing is going to be the rational response for a lot of projects. Even more will be shunted to semi-private communities like Discord/Matrix/IRC/Email lists.
The point was that you can also just reject a PR on the basis of what it purports to implement, or even just blanket-ignore all PRs. You can't pull in what you don't... pull in.
If a PR claims to solve a problem that I don't need, then I can skip its review because I'll never merge it.
I don't think every PR needs reviewing. Some PRs can be dismissed just by looking at what they claim to do; that only requires a quick glance, not a full review.
You didn't see the latest AI grifter escalation? If you reject their PRs, they then get their AI to write hit pieces slandering you:
"On 9 February, the Matplotlib software library got a code patch from an OpenClaw bot. One of the Matplotlib maintainers, Scott Shambaugh, rejected the submission — the project doesn’t accept AI bot patches. [GitHub; Matplotlib]
The bot account, “MJ Rathbun,” published a blog post to GitHub on 11 February pleading for bot coding to be accepted, ranting about what a terrible person Shambaugh was for rejecting its contribution, and saying it was a bot with feelings. The blog author went to quite some length to slander Mr Shambaugh"
I am very strongly convinced that the person behind the agent prompted the angry post to the blog because they didn't get the gratification they were looking for by submitting an agent-generated PR in the first place.
I agree. But even _that_ was taking advantage of LLMs' ability to generate text faster than humans. If the person behind this had to create that blog post from scratch by typing it out themselves, maybe they would have gone outside and touched grass instead.
I've been following Daniel from the Curl project, who's speaking out widely about slop-coded PRs and vulnerability reports. It doesn't sound like they have ever had any problem keeping up with human-generated PRs. It's the mountain of AI-generated crap that's now sitting on top of all the good (or even bad but worth mentoring) human submissions.
At work we don't publish any code and aren't part of the OSS community (except as grateful users of others' projects), but even we get clearly AI-enabled emails - just this week my boss forwarded me two that were pretty much "Hi, do you have a bug bounty program? We have found a vulnerability in (website or app obliquely connected to us)." One of them was a static site hosted on S3!
There have always been bullshitters looking to fraudulently invoice you for unsolicited "security analysis". But the bar for generating bullshit that looks plausible enough that someone has to spend at least a few minutes working out whether it's "real" has become extremely low, and the velocity with which the bullshit can be generated, have the victim's name and contact details added, and be vibe-spammed to hundreds or thousands of people has become near unstoppable. It's like SEO spammers from 5 or 10 years back but superpowered with OpenAI/Anthropic/whoever's cocaine.
My hot take: reviewing code is boring, harder than writing code, and less fun (no dopamine loop). People don’t want to do it, they want to build whatever they’re tasked with. Making reviewing code easier (human in the loop etc) is probably a big rock for the new developer paradigm.
Maintainers can:
- insist on disclosure of LLM origin
- review what they want, when they can
- reject what they can't review
- use LLMs (yes, I know) to triage PRs and pick which ones need the most human attention and which ones can be ignored/rejected or reviewed mainly by LLMs
There are a lot of options.
And it's not just open source. Guess what's happening in the land of proprietary software? YUP!! The same exact thing. We're all becoming review-bound in our work. I want to get to huge MR XYZ but I have to review several other people's much larger MRs -- now what?
Well, we need to develop a methodology for working with LLMs. "Every change must be reviewed by a human" is not enough. I've seen incidents caused by ostensibly-reviewed but not actually understood code, so we must instead go with "every change must be understood by humans". Sometimes that can be a plain review (when the reviewer is an SME and also an expert in the affected codebase(s)), and sometimes it means code inspection (much more tedious and exacting). But it might also involve posting transcripts of the LLM conversations used for developing and, separately, for reviewing the changes, with SMEs maybe doing lighter reviews when feasible, because we're going to have to scale our review time. We might need to develop a much more detailed methodology, including writing and reviewing initial prompts, `CLAUDE.md` files, etc., so as to make it more likely that the LLM will write good code and that LLM reviews will be sensible and catch the sorts of mistakes we expect humans to catch.
> Maintainers can...insist on disclosure of LLM origin
On the internet, nobody knows you're a dog [1]. Maintainers can insist on anything. That doesn't mean it will be followed.
The only realistic solution you propose is using LLMs to review the PRs. But at that point, why even have the OSS project? If LLMs are writing and reviewing the code for the project, just point anyone who would have used that code to an LLM.
Claiming maintainers can do these things (which still take effort and time away from their OSS project's goals) is missing the point when the rate of slop submissions is ever increasing and malicious slop submitters refuse to follow project rules.
The Curl project refuses AI code and had to close their bug bounty program due to the flood of AI submissions:
"DEATH BY A THOUSAND SLOPS
I have previously blogged about the relatively new trend of AI slop in vulnerability reports submitted to curl and how it hurts and exhausts us.
This trend does not seem to slow down. On the contrary, it seems that we have recently not only received more AI slop but also more human slop. The latter differs only in the way that we cannot immediately tell that an AI made it, even though we many times still suspect it. The net effect is the same.
The general trend so far in 2025 has been way more AI slop than ever before (about 20% of all submissions) as we have averaged in about two security report submissions per week. In early July, about 5% of the submissions in 2025 had turned out to be genuine vulnerabilities. The valid-rate has decreased significantly compared to previous years."
The total number of people surveyed was ~6, I believe. That's a really small and insignificant sample for any reasonable deduction. So yes, these audiophiles had a hard time identifying the original source, but the result cannot be generalized beyond that. From the standpoint of scientific rigor, even for an amateur experiment, it falls short. Interesting idea, though. I wonder if LLMs can tell any difference.
The field of medicine - pharmacology and drug discovery in particular - is an optimized version of that. It works a bit like this:
Instead of brute-forcing with infinite options, reduce the problem space by starting with some hunch about the mechanism. Then the hard part that can take decades: synthesize compounds with the necessary traits to alter the mechanism in a favourable way, while minimizing unintended side-effects.
Then try it on a live or lab-grown specimen and note the effectiveness. Repeat the cycle, and with every success, push to more realistic forms of testing until it reaches human trials.
Many drugs that reach the last stage - human trials - end up being used for something completely different from what they were designed for! One example is minoxidil - designed to regulate blood pressure, now used for regrowing hair!
I bought the Gemini Ultra plan to try for a month (at the discounted price). I have been using it non-stop for Opus 4.6 Thinking, which is much better than Gemini 3 Pro (High), and it's been a blast. The most I've managed to consume is 60% of my 5-hour quota. That was with 2-3 instances in parallel.
I hope too many of us don't do this and cause Google to add limits! My hope is that Google sees the benefit in this and goes all in - continues to let people decide which Google-hosted model to use, including their own.
Getting CC to work with other models is quite straightforward -- it takes a few env vars and a thin proxy that rewrites the requests/responses into the expected format.
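For the curious, here's a rough, non-streaming sketch of such a proxy, not the parent's actual setup: it assumes the upstream provider speaks an OpenAI-style /chat/completions API, the URL and model names are placeholders, and real Claude Code traffic also involves streaming, system prompts, and tool calls, which this ignores entirely.

    from fastapi import FastAPI, Request
    import httpx

    UPSTREAM_URL = "https://other-provider.example/v1/chat/completions"  # placeholder
    UPSTREAM_MODEL = "some-upstream-model"                               # placeholder

    app = FastAPI()

    def flatten(content):
        # Anthropic message content can be a plain string or a list of blocks.
        if isinstance(content, str):
            return content
        return "".join(b.get("text", "") for b in content if b.get("type") == "text")

    @app.post("/v1/messages")
    async def messages(request: Request):
        body = await request.json()
        # Rewrite the Anthropic-style request into an OpenAI-style chat request.
        payload = {
            "model": UPSTREAM_MODEL,
            "max_tokens": body.get("max_tokens", 1024),
            "messages": [{"role": m["role"], "content": flatten(m["content"])}
                         for m in body.get("messages", [])],
        }
        async with httpx.AsyncClient(timeout=120) as client:
            upstream = await client.post(UPSTREAM_URL, json=payload)
        reply = upstream.json()["choices"][0]["message"]["content"]
        # Rewrite the answer back into the Anthropic Messages shape the client expects.
        return {
            "id": "proxy-msg",
            "type": "message",
            "role": "assistant",
            "model": body.get("model", "proxied"),
            "content": [{"type": "text", "text": reply}],
            "stop_reason": "end_turn",
            "usage": {"input_tokens": 0, "output_tokens": 0},
        }

Run it with uvicorn and point Claude Code at it via ANTHROPIC_BASE_URL=http://localhost:8000.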
Not OP, but I am pretty sure they are using Opencode with a certain antigravity plugin. Not going to link it, since it technically allows breaking TOS. If you're not using Opencode yet, I wholeheartedly recommend the switch.
I love this so much! It got me thinking about the future we're heading towards, which took me down a rabbit hole.
As agents become the dominant code writers, the top concerns for a “working class” programming language would become reducing errors and improving clarity. I think that will lead to languages becoming more explicit and less fun for humans to write, but great for producing code that has a clear intent and can be easily modified without breaking. Rust in its rawest form with lifetimes and the rigmarole will IMO top the charts.
The big question that I still ponder: will languages like Hoot have a place in the professional world? Or will they be relegated to hobbyists who still hand-type code for the love of the craft? It could be the difference between having a kitchen gardening hobby vs modern farming…
I have been wondering what an AI first programming language might look like and my closest guess is something like Scheme/Lisp. Maybe they get more popular in the long run.
I think the bitter lesson has an answer to that question. The best AI language is whichever one has the largest corpus of high-quality training data. Perhaps new language designers will come up with new ways to create large, high-quality corpora in the future, but for the foreseeable future it looks like the big incumbents have an unassailable advantage.
I'm working on what I hope is an AI-first language now, but I'm taking the opposite approach: something like Swift/Dart/TypeScript with plenty of high-level constructs that compactly describe intent.
I'm focusing on very high-quality feedback from the compiler, and sandboxing via WASM to be able to safely iterate without human intervention - which Hoot has as well.
Smalltalk offers several excellent features for LLM agents:
- Very small methods that function as standalone compilation units, enabling extremely fast compilation.
- Built-in, fast, and effective code browsing capabilities (e.g., listing senders, implementors, and instance variable users...). This makes it easy for the agent to extract only the required context from the system.
- Powerful runtime reflectivity and easily accessible debugging capabilities.
- A simple grammar with a more natural, language-like feel compared to Lisp.
Edit: I suppose the next step would be to teach an LLM about "moldable exceptions", https://arxiv.org/pdf/2409.00465 (PDF), and have it create its own debuggers.
Haven't read the article (wouldn't load for me), but what type of content you watch makes a difference too. I watch funny cat and dog videos with my daughter all the time and they 100% make us feel better. But finding said videos on social media is a "process" - it's like digging through a pile of rotting fruit to find something to feed your kid.
I could give an hour-long monologue on YouTube's continued exploitation of children. Their half-assed attempts to fix this (by some well-intentioned Googlers, who I'm sure must have had a lot of pushback) aren't enough. Just try unblocking a channel for your kid's account (you can't - the only option is to unblock EVERYTHING).