> Instead of taking a stab in the dark, Leanstral rolled up its sleeves. It successfully built test code to recreate the failing environment and diagnosed the underlying issue with definitional equality. The model correctly identified that because def creates a rigid definition requiring explicit unfolding, it was actively blocking the rw tactic from seeing the underlying structure it needed to match.
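The failure mode the article describes can be sketched in a few lines of Lean 4. This is a minimal illustration, not the article's actual code; `double` and the hypothesis names are made up for the example:

```lean
def double (n : Nat) : Nat := n + n

-- `rfl` succeeds because definitional equality unfolds `def`s:
example (n : Nat) : double n = n + n := rfl

-- `rw` matches the goal syntactically, so it cannot see the `n + n`
-- hidden inside `double n` until the definition is unfolded:
example (n : Nat) (h : n + n = 10) : double n = 10 := by
  unfold double  -- goal is now `n + n = 10`
  rw [h]
```

Without the `unfold`, `rw [h]` fails with "motive is not type correct"-style errors or simply finds no instance of `n + n` to rewrite, which matches the diagnosis quoted above.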
That article is literally a definition of TDD that has been around for years and years. There's nothing novel there at all. It's literally test driven development.
In my experience the agent regularly breaks existing features while adding a new one - much more often than a human would. Agents too often forget about the last feature when adding the next, and so they break things. That's why I find agent-generated tests important: they stop the agent from making a lot of future mistakes.
It is definitely not foolproof but IMHO, to some extent, it is easier to describe what you expect to see than to implement it so I don't find it unreasonable to think it might provide some advantages in terms of correctness.
In my experience, this tends to be more related to instrumentation / architecture than a lack of ability to describe correct results. TDD is often suggested as a solution.
Given the issues AWS had with Kiro and GitHub, we already have a few high-profile examples of what happens when AI is used at scale - even when you let it generate tests, which is something you should absolutely not do.
Don't "let it" generate tests. Be intentional. Define them in a way that's slightly oblique to how the production code approaches the problem, so the seams don't match. Heck, that's why it's good to write them before even thinking about the prod side.
Wild that it's taken people this long to realize this. Also: lean tickets / tasks with all the context needed to complete the task, including references / docs, places to look in the source, acceptance criteria, and so on.
Just sharing that I bought Valuable Humans in Transit some years ago and I concur that it's very nice. It's a tiny booklet full of short stories like Lena that are way out there. Maximum cool per gram of paper.
The woman herself says she never had a problem with it being famous. The actual test image is obviously not porn, either. But anything to look progressive, I guess.
> Forsén stated in the 2019 documentary film Losing Lena, "I retired from modeling a long time ago. It's time I retired from tech, too... Let's commit to losing me."
> Lena is no longer used as a test image because it's porn.
The Lenna test image can be seen over the text "Click above for the original as a TIFF image." at [0]. If you consider that to be porn, then I find your opinion on what is and is not porn to be worthless.
The test image is a cropped portion of porn, but if a safe-for-work image would be porn but for what you can't see in the image, then any picture of any human ever is porn as we're all nude under our clothes.
For additional commentary (published in 1996) on the history and controversy about the image, see [1].
I agree that not all nudity is porn - nudity is porn if the primary intent of that nudity is sexual gratification. When the nudity in question was a Playboy magazine centerfold, the primary intent is fairly obvious.
I can't see how it would be porn either; it's nudity.
There's nudity in the Sistine Chapel, and I would find it hilarious if it were considered porn.
It's interesting because where I'm from, there was "erotica" and there was "porn". This image would at best be erotica. It would not be considered porn.
Like the US Supreme Court's "I know it when I see it", the definition isn't straightforward, but it has elements of "is it a depiction of a sexual act or simply nudity", as well as artistic quality. Generally, erotica has high production values and porn less so.
Anyhoo! What a weird place for the discussion to end up :-). The story is excellent and very Hacker News appropriate, but his entire opus is pretty good. There's a bit of deus ex machina in some of qntm's work, but generally they have the right mix of surreal and puzzling and cryptic and interesting to engage a computer geek's mind :-).
The "porn" angle is very funny to me, since there is nothing pornographic or inappropriate about the image. When I was young, I used to think it was some researcher's wife whom he loved so much that he decided to use her picture absolutely everywhere.
It's sufficient to say that the person depicted has withdrawn their consent for that image to be used, and that should put an end to the conversation.
Is that how consent works? I would have expected licenses to override that - although it's possible that the original use as a test image violated whatever contract she had with her producer in the first place.
She did not explicitly consent to that photo being used in computer graphics research or in millions of sample projects. Moreover, the legality of using the image for those purposes is murky, because I doubt anyone ever received a proper license from the actual rights-holder (Playboy magazine). So the best way to go about this is a common-sense, good-faith approach: if the person depicted asks you to please knock it off, you just do it, unless you actively want to be a giant a-hole to them.
No, because the replacement value of those things to others is very high, and generally outweighs Carrie Fisher's objection. But we should take her objection into consideration going forwards. The Lena test image is very easy to replace, and it's not all that culturally significant: there's no reason to keep using it, unless we need to replicate historical benchmarks.
I'm using Sonnet with the 1M context window at work, just stuffing everything into the window (it works fine for now), and I'm hoping to investigate Recursive Language Models with DSPy when I'm using local models with Ollama.
Apache Arrow is trying to do something similar, using FlatBuffers to serialize with zero-copy and zero-parse semantics, and an index structure built on top of that.
Arrow has a different use case I think. Lite3 / TRON is effectively more efficient JSON. Arrow uses an array per property. This allows zero copy per property access across TB scale datasets amongst other useful features - it’s more like the core of a database.
A closer comparison would be to FlatBuffers which is used by Arrow IPC, a major difference being TRON is schemaless.
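The zero-copy / zero-parse distinction mentioned above can be sketched with nothing but the Python standard library. This is a greatly simplified illustration of the FlatBuffers-style idea, not how TRON or Arrow actually lay out their buffers; the field layout here is made up for the example:

```python
import json
import struct

# JSON must be fully parsed before any field is accessible:
doc = json.dumps({"id": 7, "score": 3.5})
parsed = json.loads(doc)  # O(document) work up front
assert parsed["id"] == 7

# A zero-parse layout (FlatBuffers-style, greatly simplified): fields sit
# at known offsets, so a reader jumps straight to one field without
# touching the rest of the buffer.
buf = struct.pack("<qd", 7, 3.5)   # little-endian int64 + float64
view = memoryview(buf)             # zero-copy view over the bytes
(id_,) = struct.unpack_from("<q", view, 0)  # read one field directly
assert id_ == 7
```

A schemaless format like TRON additionally has to encode field names or tags in the buffer itself, whereas FlatBuffers pushes that knowledge into a compiled schema.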
My threshold for “does not need to be smaller” is “can this run on a Raspberry Pi”. This is a helpful benchmark for maximum likely useful optimization.
Curious about comparisons with Apache Arrow, which uses flatbuffers to avoid memory copying during deserialization, which is well supported by the Pandas ecosystem, and which allows users to serialize arrays as lists of numbers that have hardware support from a GPU (int8-64, float)
Apache Arrow is more of a memory format than a general‑purpose data serialization system. It’s great for in‑memory analytics and GPU‑friendly columnar storage.
Apache Fory, on the other hand, has its own wire‑stream format designed for sending data across processes or networks. Most of the code is focused on efficiently converting in‑memory objects into that stream format (and back) — with features like cross‑language support, circular reference handling, and schema evolution.
Fory also has a row format, which is a memory format, and can complement or compete with Arrow’s columnar format depending on the use case.
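The row-versus-columnar distinction in the comments above can be shown with the standard library's `array` module. This is a conceptual sketch only; neither Fory's row format nor Arrow's columnar format actually looks like Python lists:

```python
from array import array

rows = [(1, 10.0), (2, 20.0), (3, 30.0)]

# Row layout (Fory's row format, conceptually): each record's fields
# are stored contiguously, which is good for whole-record access.
row_buf = [field for row in rows for field in row]

# Columnar layout (Arrow, conceptually): one contiguous array per field,
# so scanning a single column touches only that column's memory.
ids = array("q", (r[0] for r in rows))
vals = array("d", (r[1] for r in rows))
assert list(ids) == [1, 2, 3]
```

Which layout wins depends on access pattern: analytics that scan one column across many records favor columnar; request/response serialization of whole objects favors rows.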