Hacker News | Macuyiko's comments

Things such as AirLLM, or good old llama.cpp.

The input attribution part is interesting, though I do wonder to what extent that is just assigning some sort of SHAP values to the input tokens, in which case it should be pretty portable to any kind of model.
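As a toy illustration of that portability: a model-agnostic leave-one-out attribution (the cheapest cousin of SHAP, which would instead average over all token subsets) only needs black-box access to a scoring function. The `toy_score` "model" below is purely a stand-in:

```python
def leave_one_out(tokens, score):
    """Crude, model-agnostic attribution: each token's importance is the
    score drop when only that token is removed. True SHAP values average
    the marginal contribution over all subsets; this is the cheapest
    single-removal approximation."""
    base = score(tokens)
    return [base - score(tokens[:i] + tokens[i + 1:]) for i in range(len(tokens))]

# Stand-in "model": scores a token list by counting sentiment words.
POSITIVE = {"great", "love"}

def toy_score(tokens):
    return sum(1.0 for t in tokens if t in POSITIVE)

print(leave_one_out(["i", "love", "this", "great", "movie"], toy_score))
# -> [0.0, 1.0, 0.0, 1.0, 0.0]
```

Nothing here depends on the model's internals, which is why this style of attribution ports to any black box.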

What I typically end up doing is just recalculating the slug and seeing if it matches the provided one. If it doesn't, redirect to the most up-to-date slug matching the ID. Though who knows if those old SEO patterns still matter these days...
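A minimal, framework-agnostic sketch of that pattern (the `slugify` helper and `titles` lookup are placeholder names; in a real app the "redirect" branch would issue a 301 to the canonical URL):

```python
import re

def slugify(title: str) -> str:
    # Lowercase, collapse non-alphanumeric runs into hyphens.
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")

def resolve(article_id: int, provided_slug: str, titles: dict):
    """Return ('ok', slug) or ('redirect', canonical_slug)."""
    canonical = slugify(titles[article_id])
    if provided_slug == canonical:
        return ("ok", canonical)
    # Stale slug: in a real handler, 301 to /<article_id>/<canonical>.
    return ("redirect", canonical)

titles = {42: "Old SEO Patterns, Revisited!"}
print(resolve(42, "old-seo-patterns-revisited", titles))  # ('ok', ...)
print(resolve(42, "old-seo-patterns", titles))            # ('redirect', ...)
```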

My personal feel (completely subjective) is that during RLHF humans are incredibly sensitive to this pattern, especially when talking about personal or emotional issues. Any reply in the form of "it's not you, it's them" is such a dopamine hit that the LLMs started applying it for everything else.


An interesting topic for some postgraduate student's thesis perhaps!


Yes. CP-SAT crunches through it in no time, but of course larger grids would quickly make it take much longer.

See

https://gist.github.com/Macuyiko/86299dc120478fdff529cab386f...


I don't believe this works in general. If you have a set of tiles that connect to neither the horse nor to an exit, they can still keep each other reachable in this formulation.


Yes, this is the major challenge with solving these with SAT. You can make your solver check and reject these horseless pockets (incrementally rejecting solutions with new clauses), which might be the easiest method, since you may need iteration for maximizing anyway (bare SAT doesn't do "maximize"). To correctly track the flood-fill flow from the horse, you generally need a constraint like reachable(x,y,t) = (OR over neighbours (nx,ny) of reachable(nx,ny,t-1)) ∧ walkable(x,y), with reachable(x,y,0) = is_horse_cell, which adds up to N^2 time-indexed variables per cell.
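The recurrence can be illustrated outside of SAT with a plain unrolled flood fill; each (cell, t) pair below would correspond to one Boolean variable in the actual encoding (a sketch of the idea, not an OR-Tools model):

```python
def reachable_after(grid, horse, steps):
    """Unrolled flood fill mirroring the time-indexed SAT encoding:
    reachable[t][c] holds iff c is walkable and either c or one of its
    neighbours was reachable at t-1, with reachable[0] true only at the
    horse. On an N x N grid, up to N^2 time layers guarantee a fixpoint,
    hence the variable blow-up mentioned above."""
    h, w = len(grid), len(grid[0])
    reach = {horse}
    for _ in range(steps):
        nxt = set(reach)  # staying reachable corresponds to the self-term
        for (x, y) in reach:
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nx_, ny_ = x + dx, y + dy
                if 0 <= nx_ < h and 0 <= ny_ < w and grid[nx_][ny_] != "#":
                    nxt.add((nx_, ny_))
        reach = nxt
    return reach

grid = ["..#",
        ".#.",
        "H.."]
print(reachable_after(grid, (2, 0), 4))  # every non-wall cell
```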

You can more precisely track flows and do maximization with ILP, but that often loses conflict-driven clause learning advantages.


Good point. I don't think the puzzles do this, and if they did, I would run a pre-solve pass over the puzzle first to flood fill such horseless pockets with water, no?
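That pre-solve pass is easy to sketch as a BFS flood fill (plain Python, not part of any solver; `~` marks water as in the boards above):

```python
from collections import deque

def fill_horseless_pockets(grid, horse):
    """Pre-solve pass: BFS from the horse, then turn every open cell the
    horse can never reach into water '~', so the solver is never tempted
    to spend walls on a dead pocket."""
    h, w = len(grid), len(grid[0])
    seen, queue = {horse}, deque([horse])
    while queue:
        x, y = queue.popleft()
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            n = (x + dx, y + dy)
            if (0 <= n[0] < h and 0 <= n[1] < w and n not in seen
                    and grid[n[0]][n[1]] == "."):
                seen.add(n)
                queue.append(n)
    return ["".join(c if (i, j) in seen or c != "." else "~"
                    for j, c in enumerate(row))
            for i, row in enumerate(grid)]

board = ["..#..",
         "..#..",
         "H.#.."]
print(fill_horseless_pockets(board, (2, 0)))
# -> ['..#~~', '..#~~', 'H.#~~']
```

Note this only removes pockets sealed off by pre-existing walls; it does not handle areas that only become unreachable once the solver places its own walls.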


It's not quite that easy. For the simplest example, look at https://enclose.horse/play/dlctud, where the naive solution will waste two walls to fence in the large area. Obviously, you can construct puzzles that have lots of these "bait" areas.

Like the other comment suggested, running a loop where you keep adding constraints that eliminate invalid solutions will probably work for any puzzle that a human would want to solve.


Oh I see what you mean now, indeed:

    Score: 7
    ~~~~~~
    ~····~
    ~·~~·~
    .#..#.
    ......
    ..#...
    .#H#..
    ..#...
However, I think that you do not need 'time' based variables in the form of

    reachable(x,y,t) = reachable(nx,ny,t-1)
Enforcing connectivity through single-commodity flows is IMO a better way to enforce the flood fill (it also introduces additional variables, but is typically easier to solve with CP heuristics):

    Score: 2
    ~~~~~~
    ~....~
    ~.~~.~
    ......
    ......
    ..##..
    .#H·#.
    ..##..
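To make the single-commodity idea concrete: the horse acts as the source, every other enclosed cell consumes one unit, and flow may only move between adjacent enclosed cells. The sketch below (plain Python, not a CP model) builds one feasible flow along a BFS tree and checks conservation, which is exactly the certificate the integer flow variables would provide:

```python
from collections import deque

def flow_certificate(cells, horse):
    """Single-commodity connectivity check: construct a feasible flow on
    a BFS tree of the enclosed region (edge flow = size of the subtree it
    feeds) and verify each non-horse cell keeps exactly one unit. In the
    CP model the per-edge flows f(a, b) are integer variables instead."""
    parent, order, queue = {horse: None}, [horse], deque([horse])
    while queue:
        x, y = queue.popleft()
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            n = (x + dx, y + dy)
            if n in cells and n not in parent:
                parent[n] = (x, y)
                order.append(n)
                queue.append(n)
    if len(parent) != len(cells):
        return False  # some cell can receive no flow: disconnected
    subtree = {c: 1 for c in cells}
    for c in reversed(order[1:]):
        subtree[parent[c]] += subtree[c]
    for c in cells:
        if c == horse:
            continue
        inflow = subtree[c]
        outflow = sum(subtree[d] for d in cells if parent.get(d) == c)
        assert inflow - outflow == 1  # conservation: consume one unit
    return True

region = {(0, 0), (0, 1), (1, 0), (1, 1)}
print(flow_certificate(region, (0, 0)))  # True: region is connected
```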
Cool puzzle!


Late, but reading all of the replies, and speaking from my own observation using Claude, Codex, as well as (non-CLI) Gemini, Kimi, Qwen, and Deepseek...

It's fun how we are so quick to assign meaning to the way these models act. This is of course due to training, RLHF, available tool calls, system prompt (all mostly invisible) and the way we prompt them.

I've been wondering about a new kind of benchmark: how one could extract these more intangible tendencies from models, rather than the well-controlled "how good at coding is it" style environments. This is the main reason why I pay less and less attention to benchmark scores.

For what it's worth: I still converse best with Claude when doing code. Its reasoning sounds like me, and it finds a good middle ground between conservative and crazy, being explorative and daring (even though it too often exclaims "I see the issue now!"). If Anthropic lifted the usage limits I would use it as my primary. The CLI tool is also better: e.g. Codex with 5.1 gets stuck in PowerShell scripts, whilst Claude realizes it can use Python to do the heavy lifting. That might largely be because I'm mainly on Windows (still, Claude works best, quickly realizing what environment it lives in rather than trying Unix commands or PowerShell invocations that fail because my PowerShell is outdated).

Qwen is great in an IDE for quick auto-complete tasks, especially given that you can run it locally, but even the VS Code Copilot is good enough for that. Kimi is promising for long-running agentic tasks, but that is something I've barely explored and just started playing with. Gemini is fantastic as a research assistant; Gemini 3 Pro in particular is clear and to the point, using jargon without fear of the user being stupid, which the other commercial models are too often hesitant to do.

Again, it would be fun to have some unbiased method to uncover some of those underlying personas.


We have trained this model on Windows (our first model to do so). Give it a try!


On the homepage it says "Sinmple" above "Export SQL", fyi


A coin measurer is still my go-to explanation. Especially since most models have an inset for the coin to rest on / fit in. The hole itself is then just there to quickly/easily get the coin out again with your finger.

With so many different coin sizes and types in the empire, I think this makes most sense.

Wikipedia also mentions this:

> Several dodecahedra were found in coin hoards, suggesting either that their owners considered them valuable objects, or that their use was connected with coins — as, for example, for easily checking coins fit a certain diameter and were not clipped.


If you look at ancient coins, you'll see that they didn't have identical sizes. They were minted from a standard weight of metal, but the manual minting tools of the time couldn't guarantee precise thickness and shape like we have today with machine-made coins. So a dodecahedron with precisely cut circular holes is not a good way to check your coins.


Also if they did have identical sizes and there was a need to measure those sizes, we would expect a lot of much simpler devices to measure them - say a flat piece of metal with differently sized holes. Fancy versions like the dodecahedron might exist, but they would be outnumbered by the utilitarian devices.


I've noticed that puzzles that can be solved by CP-SAT's presolver alone (so that the SAT search never even needs to be invoked) basically adhere to this (no backtracking, known rules), e.g.:

    #Variables: 121 (91 primary variables)
      - 121 Booleans in [0,1]
    #kLinear1: 200 (#enforced: 200)
    #kLinear2: 1
    #kLinear3: 2
    #kLinearN: 30 (#terms: 355)

    Presolve summary:
      - 1 affine relations were detected.
      - rule 'affine: new relation' was applied 1 time.
      - rule 'at_most_one: empty or all false' was applied 148 times.
      - rule 'at_most_one: removed literals' was applied 148 times.
      - rule 'at_most_one: satisfied' was applied 36 times.
      - rule 'deductions: 200 stored' was applied 1 time.
      - rule 'exactly_one: removed literals' was applied 2 times.
      - rule 'exactly_one: satisfied' was applied 31 times.
      - rule 'linear: empty' was applied 1 time.
      - rule 'linear: fixed or dup variables' was applied 12 times.
      - rule 'linear: positive equal one' was applied 31 times.
      - rule 'linear: reduced variable domains' was applied 1 time.
      - rule 'linear: remapped using affine relations' was applied 4 times.
      - rule 'presolve: 120 unused variables removed.' was applied 1 time.
      - rule 'presolve: iteration' was applied 2 times.

    Presolved satisfaction model '': (model_fingerprint: 0xa5b85c5e198ed849)
    #Variables: 0 (0 primary variables)

    The solution hint is complete and is feasible.

    #1       0.00s main
      a    a    a    a    a    a    a    a    a    a   *A* 
      a    a    a    b    b    b    b   *B*   a    a    a  
      a    a   *C*   b    d    d    d    b    b    a    a  
      a    c    c    d    d   *E*   d    d    b    b    a  
      a    c    d   *D*   d    e    d    d    d    b    a  
      a    f    d    d    d    e    e    e    d   *G*   a  
      a   *F*   d    d    d    d    d    d    d    g    a  
      a    f    f    d    d    d    d    d   *H*   g    a  
     *I*   i    f    f    d    d    d    h    h    a    a  
      i    i    i    f   *J*   j    j    j    a    a    a  
      i    i    i    i    i    k   *K*   j    a    a    a
Together with validating that there is only one solution, you could probably make the search for good boards more guided than random generation.


All of the above is true, but between solving quicker, and admitting we gave context:

I do agree with you that an LLM should not always start from scratch.

In a way it is like an animal which we have given the ultimate human instinct.

What has nature given us? Homo Erectus is 2 million years ago.

A weird world we live in.

What is context.

