If you have the hardware to run expensive models, is the cost of electricity much of a factor? According to Google, the average price in the Silicon Valley area is $0.448 per kWh. An RTX 5090 costs about $4,000 and has a peak power consumption of 1000 W. Maxing out that GPU for a whole year would cost $3,925 at that rate. That's nearly as much as the hardware itself.
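The back-of-the-envelope arithmetic, as a quick sketch (only the $0.448/kWh rate and the 1000 W figure come from above; the rest is unit conversion):

```ts
// Rough annual electricity cost for a GPU running flat out, using the figures above.
const pricePerKwhUsd = 0.448;  // Google's average for the Silicon Valley area
const gpuWatts = 1000;         // RTX 5090 peak power draw
const hoursPerYear = 24 * 365; // 8,760 hours

const kwhPerYear = (gpuWatts / 1000) * hoursPerYear; // 8,760 kWh
const annualCostUsd = kwhPerYear * pricePerKwhUsd;   // ≈ $3,924.48

console.log(`~$${annualCostUsd.toFixed(0)} per year at full load`);
```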
At that point it'd be cheaper to get an expensive subscription to a cloud platform AI product. I understand the case for local LLMs, but it seems silly to worry about pricing for cloud-based offerings and not worry about pricing for locally run models, especially since running them locally can often be more expensive.
My optician's office charges an extra $100 for blue light filtering. They at least make it clear it's optional but recommended for frequent screen use.
1) it corrected my eye with a slight astigmatism, but over-corrected my other eye, so the aggregate was pretty much the same
2) the aforementioned blue filter, which is part of the anti-glare coating.
3) my non-astigmatic eye was incorrectly marked with a prescription
4) I don't actually need glasses because I can see to the bottom of the eye chart without them. It's just that, now that I'm older, my vision is not as good as it used to be.
5) these were designed for "close work" but don't actually help me focus on things close to me.
However, arguing that with a large multinational company, as a non-professional with only a passing understanding of (non-biological) optics, doesn't seem like a good use of time.
From the title of the issue: “Form-associated custom elements”. They are talking about the mechanism by which a custom element can have its value included when you submit a form. Here’s the spec:
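For anyone who hasn't used it: a minimal sketch of what that mechanism looks like, assuming a made-up <fancy-input> element. The real platform pieces are `static formAssociated`, `attachInternals()`, and `ElementInternals.setFormValue()`.

```ts
// Minimal form-associated custom element: its value gets submitted with the form.
class FancyInput extends HTMLElement {
  static formAssociated = true; // opt in to form association
  #internals: ElementInternals;

  constructor() {
    super();
    this.#internals = this.attachInternals();
  }

  connectedCallback() {
    const input = document.createElement("input");
    input.addEventListener("input", () => {
      // Whatever is passed here is what the surrounding <form> submits
      // under this element's name attribute.
      this.#internals.setFormValue(input.value);
    });
    this.attachShadow({ mode: "open" }).append(input);
  }
}

customElements.define("fancy-input", FancyInput);
```

Drop `<fancy-input name="nickname"></fancy-input>` inside a `<form>` and whatever you pass to `setFormValue()` shows up in the submission under that name, just like a built-in control.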
I realize that the GP is trolling, but he accidentally has a good point. It is useful to know what censorship goes into a model. Apparently Copilot censors words related to gender and refuses to autocomplete them. That'd be frustrating to work with. A Chinese model censoring terms for events they want to memory hole would have zero impact on anything I would ever work on.
I benchmarked unsloth/Qwen3-Coder-Next-GGUF using the MXFP4_MOE (43.7 GB) quantization on my Ryzen AI Max+ 395 and I got ~30 tps. According to [1] and [2], the AI Max+ 395 is 2.4x faster than the AI 9 HX 370 (laptop edition). Taking all that into account, the AI 9 HX 370 should get ~13 tps on this model. Make of that what you will.
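That estimate is just linear scaling from the measured number, which assumes tokens/sec tracks the quoted performance ratio exactly; a rough sketch:

```ts
// Naive estimate: assumes tps scales linearly with the quoted 2.4x performance ratio.
const measuredTps395 = 30; // Ryzen AI Max+ 395, MXFP4_MOE quant, measured above
const ratio395To370 = 2.4; // AI Max+ 395 vs AI 9 HX 370, per [1] and [2]

const estimatedTps370 = measuredTps395 / ratio395To370; // ≈ 12.5 tps, i.e. ~13
console.log(`~${Math.round(estimatedTps370)} tps estimated on the AI 9 HX 370`);
```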
In my short testing on a different MoE model, it does not perform well. I tried running Kimi-K2-Thinking-GGUF with the smallest unsloth quantization (UD-TQ1_0, 247 GB), and it ran at 0.1 tps. According to its guide, you should expect 5 tps if the whole model can fit into RAM+VRAM, but if mmap has to be used, then expect less than 1 tps, which matches my test. This was on a Ryzen AI Max+ 395 using ~100 GB VRAM.
Running a 247 GB model reliably on 100 GB of VRAM is a very impressive outcome regardless of the performance. For a model that size, sensible people will recommend at least 4x the VRAM you were testing with; with much less than that, the bandwidth to your storage becomes the bottleneck. Try running models that are only slightly bigger than the amount of VRAM you're using and these tricks become quite essential, with a significantly more manageable hit on performance.
You don't have to statically allocate the VRAM in the BIOS. It can be dynamically allocated. Jeff Geerling found you can reliably use up to 108 GB [1].
I really wish they had used func instead; it would have saved this confusion and allowed “auto type deduction” to be a smaller, more self-contained feature.
Indeed. I am a frequent critic of the C++ committee’s direction and decisions. There’s no direction other than “new stuff”, and that new stuff pretty much has to be in the library, otherwise it will require changes that may break existing code. That’s fine.
But on the flip side, there’s a theme of ignoring the actual state of the world to achieve the theoretical goals of the proposal when it suits. Modules are a perfect example of this: when I started programming professionally, modules were the solution to compile times and to symbol visibility. Now that they’re here, they are neither. But we got modules on paper. The version that was standardised refused to accept the existence of the toolchains and build tools that exist, and as such refused to place any constraints that might make implementation viable or easier.
At the same time, we can’t standardise #pragma once because some compiler may treat network shares or symlinks differently.
There’s a clear indication that the committee don’t want to address this; epochs are a solution that has been rejected. It’s clear the only real plan is to shove awkward functional features into libraries using operator overloads, just like what we all gave out to Qt for doing 30 years ago. But at least it’s standardised this time?