To be honest, three nested RDPs sound like a terrible hack. In an ideal world, this would be two port forwards and one RDP (thinking about ssh, which is still underrepresented in the Windows world). In an even more ideal world, this would be direct IPv6 access ;-)
There are legit reasons, at least for two nested sessions. A production network that’s airgapped except for a bastion host that acts as a gateway. It’s better than port forwarding because you have to auth to the bastion host before the RDP chaining, and it often takes separate credentials for the second RDP session.
It’s a semi-common setup for higher security environments, and when you have a network of stuff that has known vulnerabilities you can’t patch for whatever reason. Traffic in and out is super carefully firewalled. It’s not great, but it’s better than a 25 year old MySQL with a direct public IP.
> airgapped except for a bastion host that acts as a gateway
First time I've heard of an airgapped system you could access remotely. Doesn't that kind of defeat the label "airgapped"? I think I'd just call that "isolated" at that point instead.
This concept is related to PAM.
You often have to do ops on infra and need some DMZ to do the ops. In regulated industries you have to record every operation a person performs and follow the principle of least privilege. This is what should happen in an ideal world.
> You often have to do ops on infra and need some DMZ to do the ops.
This makes sense, "bastion" hosts and similar things are fairly common too. What's not common is calling those "airgapped", because they're clearly not.
It's probably there not as a way to connect networks, but as a way to keep them separate, only allowing RDP between specific computers on different networks.
...A test suite, security audits, and most importantly benchmarks.
What it does have is a license, which is GPLv3. So if anyone adds all those changes, they have to make the source code available under the same license.
In this era, though, licenses (I don't agree with this, but it is what it is) are a matter of "tokens". I know for a fact that multiple relatively big companies are just gobbling up GPLv3 projects and rewriting them entirely; some publish them as well.
Has anyone made a comprehensive overview of these? A lot of memory solutions keep springing up but I’m not even entirely sure what to evaluate them by (without hands on experience).
I'm biased since I built this, but the things I'd look at: how memories are stored (flat text vs typed), what happens when info conflicts (does it detect contradictions or just store both), and whether it runs locally or cloud only.
My take is that pure memory is just one piece. What I'm really trying to build is a cognitive engine. Typed memories, contradiction detection, pain signals when you're about to repeat a mistake, decay and reinforcement. Less "store and retrieve" and more how memory actually works.
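To make "typed memories, contradiction detection, decay and reinforcement" a bit more concrete, here is a minimal sketch. It is not the commenter's implementation: the `Memory` and `MemoryStore` names, the memory kinds, the one-day half-life, and the "same kind and subject but different value means contradiction" rule are all assumptions made for illustration. Pain signals are not covered.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Memory:
    kind: str               # hypothetical types, e.g. "preference", "fact", "mistake"
    subject: str
    value: str
    strength: float = 1.0
    last_seen: float = 0.0  # seconds, on whatever clock the caller uses

    def current_strength(self, now: float, half_life_s: float = 86_400.0) -> float:
        """Exponential decay: strength halves every `half_life_s` seconds."""
        age = max(0.0, now - self.last_seen)
        return self.strength * math.pow(0.5, age / half_life_s)


class MemoryStore:
    def __init__(self) -> None:
        self._items: dict[tuple[str, str], Memory] = {}

    def get(self, kind: str, subject: str) -> Optional[Memory]:
        return self._items.get((kind, subject))

    def remember(self, kind: str, subject: str, value: str,
                 now: float) -> Optional[Memory]:
        """Store a memory; return the old one if the new value contradicts it."""
        key = (kind, subject)
        old = self._items.get(key)
        if old is not None and old.value == value:
            # Reinforcement: the same claim was seen again, so bump and refresh it.
            old.strength = old.current_strength(now) + 1.0
            old.last_seen = now
            return None
        self._items[key] = Memory(kind, subject, value, 1.0, now)
        return old  # a non-None result means a contradicting memory was replaced
```

A flat-text store would just append both "prefers vim" and "prefers emacs"; keying on `(kind, subject)` is one simple way to notice the conflict at write time and let the caller decide what to do with it.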
At home I currently use MiniMax via OpenRouter - it’s pretty good and very cheap. They have a subscription plan, but I’m not ready to commit to it yet.
Another way to keep the ability to try out new models is to buy a reseller subscription like Cursor’s.
I tried OpenRouter, but I feel the money flies even with these models. It is not comparable to a subscription, but yes, it's very good for trying things out. Maybe I should test other models alongside GPT 5.5 to see which one fits me.
I'm also unemployed. So far the models that I've used the most are Kimi and GLM. I haven't done that much agentic coding though, I've mostly used them for studying math and general conversations and I'm generally happy with their performance.
I've got to do some cleanup before sharing (yay vibe coding) but the big things I've changed so far:
1) Curated a set of models I like and heavily optimized all possible settings, per agent role and even per skill (had to really replumb a lot of stuff to get it as granular as I liked)
2) Ported from sqlite to postgresql, with heavily extended schema. I generate embeddings for everything, so every aspect of my stack is a knowledge graph that can be vector searched. Integrated with a memory MCP server and auditing tools so I can trace anything that happens in the stack/cluster back to an agent action and even thinking that was related to the action. It really helps refine stuff.
3) Tight integration of Gitea server, k3s with RBAC (agents get their own permissions in the cluster), every user workspace is a pod running opencode web UI behind Gitea oauth2.
4) Codified structure of `/projects/<monorepo>/<subrepos>` with a simpler browser so non-technical family members can manage their work more easily (agents handle all the management and there are sidecars handling all gitops transparently to the user)
5) Transparent failover across providers with cooldown by making model definitions linked lists in the config, so I can use a handful of subscriptions that offer my favorite models, and fail over from one to the next as I hit quota/rate limits. This has really cut my bill down lately, along with skipping OpenRouter for my favorite models and going direct to Alibaba and Xiaomi so I can tailor caching and stuff exactly how I want.
6) Integrated filebrowser, a fork of the Milkdown Crepe markdown editor, and codemirror editor so I don't even need an IDE anymore. I just work entirely from OpenCode web UI on whatever device is nearest at the moment. I added support for using Gemma 4 local on CPU from my phone yesterday while waiting in line at a store.
Those are the big ones off the top of my head. I'm sure there's more. I've probably made a few hundred other changes, it just evolves as I go.
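For anyone curious what item 5 (failover across providers with cooldown, with model definitions as a linked list) might look like, here is a rough sketch. It is not the poster's code: `ModelNode`, `QuotaExceeded`, `complete`, and the 60-second default cooldown are hypothetical names and values, and the provider call is just a plain function standing in for a real API client.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


class QuotaExceeded(Exception):
    """Raised by a provider call on a quota or rate limit (hypothetical)."""


@dataclass
class ModelNode:
    """One provider entry; `fallback` links to the next, forming a linked list."""
    name: str
    call: Callable[[str], str]
    fallback: Optional["ModelNode"] = None
    cooldown_until: float = 0.0  # clock timestamp; 0 means available now


def complete(head: ModelNode, prompt: str, cooldown_s: float = 60.0,
             now: Callable[[], float] = time.monotonic) -> str:
    """Walk the chain, skipping nodes in cooldown, until one provider answers."""
    node: Optional[ModelNode] = head
    while node is not None:
        if now() >= node.cooldown_until:
            try:
                return node.call(prompt)
            except QuotaExceeded:
                # Park this provider for a while and fall through to the next.
                node.cooldown_until = now() + cooldown_s
        node = node.fallback
    raise RuntimeError("all providers are cooling down or over quota")
```

The caller only ever holds the head of the chain, so the failover stays transparent: a provider that hit its limit is skipped until its cooldown expires, then gets tried first again.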
If this claim is true (inference is priced below cost), it makes little sense that there are tens of small inference providers on OpenRouter. Where are they getting their investor money? Is the bubble that big?
Incidentally, the hardware they run is known as well. The claim should be easy to check.
I assume they are already storing the cache on flash storage instead of keeping it all in VRAM. KV caches are huge - that’s why it’s impractical to transfer to/from the client. It would also allow figuring out a lot about the underlying model, though I guess you could encrypt it.
What would be an interesting option would be to let the user pay more for longer caching, but if the base length is 1 hour I assume that would become expensive very quickly.
Just to contextualize this... https://lmcache.ai/kv_cache_calculator.html. They only have smaller open models, but for Qwen3-32B with 50k tokens it's coming up with 7.62GB for the KV cache. Imagining a 900k session with, say, Opus, I think it'd be pretty unreasonable to flush that to the client after being idle for an hour.
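The arithmetic behind a number like that is simple enough to sketch. The formula below is the usual back-of-the-envelope one for a standard attention KV cache; the layer count, KV head count, head dimension, and 2-byte values are illustrative assumptions picked to land near the quoted figure, not a verified model config, and the calculator may compute things differently.

```python
def kv_cache_bytes(tokens: int, layers: int, kv_heads: int, head_dim: int,
                   bytes_per_value: int = 2) -> int:
    """Bytes needed to hold keys and values for `tokens` positions."""
    # Factor 2: one key vector and one value vector per layer, per KV head.
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_value
    return tokens * per_token


GIB = 1024 ** 3

# Assumed shape: 64 layers, 8 KV heads, head dimension 80, 16-bit values.
size = kv_cache_bytes(tokens=50_000, layers=64, kv_heads=8, head_dim=80)
print(f"{size / GIB:.2f} GiB")  # roughly 7.6 GiB under these assumptions
```

The size is linear in token count, so a 900k-token session on the same assumed shape would be 18 times larger, and a bigger model only makes the per-token cost worse.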
Yes — encryption is the solution for client side caching.
But even if it’s not — I can’t build a scenario in my head where recalculating it on real GPUs is cheaper/faster than retrieving it from some kind of slower cache tier
I somewhat disagree that this is due diligence. Claude Code abstracts the API, so it should abstract this behavior as well, or educate the user about it.
> Claude Code abstracts the API, so it should abstract this behavior as well, or educate the user about it.
Does mmap(2) educate the developer on how disk I/O works?
At some point you have to know something about the technology you're using, or accept that you're a consumer of the ever-shifting general best practice, shifting with it as the best practice shifts.
That might be an absurd comparison, but we can fix that.
If you were being charged per character, or running down character limits, and printing on printers that were shared and had economic costs for stalled and started print runs, then:
You wouldn’t “need” to understand. The prints would complete regardless. But you might want to. Personal preference.
>If you were being charged per character, or running down character limits, and printing on printers that were shared and had economic costs for stalled and started print runs,
and the system was being run by some of the planet’s brightest people whose famous creation is well known to disseminate complex information succinctly,
>then:
You would expect to be led to understand, like… a 1997 Prius.
“This feature showed the vehicle operation regarding the interplay between gasoline engine, battery pack, and electric motors and could also show a bar-graph of fuel economy results.” https://en.wikipedia.org/wiki/Toyota_Prius_(XW10)
There are open-source and even open-weight models that operate in exactly this way (as it's based off of years of public research), and even if there weren't the way that LLMs generate responses to inputs is superbly documented.
Seems like every month someone writes up a brilliant article on how to build an LLM from scratch or similar that hits the HN page, usually with fancy animated blocks and everything.
It's not at all hard to find documentation on this topic. It could be made more prominent in the U/I but that's true of lots of things, and hammering on "AI 101" topics would clutter the U/I for actual decision points the user may want to take action upon that you can't assume the user already knows about in the way you (should) be able to assume about how LLMs eat up tokens in the first place.
Not being the Americans is Mistral‘s moat. Cooperating with the exact people who are the reason for the USA‘s loss of trust would force them to do a lot of explaining at home.
I see a significant chance that they’ll continue to blunder the product side. It might still not matter because of their massive distribution, but leaves them open to disruption by a better product (think IE vs. Chrome).