Hacker News | GistNoesis's comments

I just tried with llama.cpp on an RTX 4090 (24GB), using the GGUF unsloth quant UD_Q4_K_XL. You can probably run them all. G4 31B runs at ~5 tok/s; G4 26B A4B runs at ~150 tok/s.

You can run Q3.5-35B-A3B at ~100 tok/s.

I tried G4 26B A4B as a drop-in replacement for Q3.5-35B-A3B for some custom agents, and G4 doesn't respect the prompt rules at all. (I added <|think|> in the system prompt as described, but have not spent time checking whether the reasoning was effectively on.) I'll need to investigate further, but it doesn't seem promising.

I also tried G4 26B A4B with images in the webui, and it works quite well.

I have not yet tried the smaller models with audio.


> I'll need to investigate further but it doesn't seem promising.

That's what I meant by "waiting a few days for updates" in my other comment. At the Qwen 3.5 release, I remember a lot of complaints like "tool calling isn't working properly", etc.

That was fixed shortly after: there was some template-parsing work in llama.cpp, and unsloth pulled some models and brought back better ones, improving something I can't quite remember, better quantization or something...

coder543 pointed out the same is happening regarding tool calling with gemma4: https://news.ycombinator.com/item?id=47619261


The model does call tools successfully, giving sensible parameters, but it seems not to pick the right ones in the right order.

I'll try again in a few days. It's great to be able to test it just a few hours after the release. It's the bleeding edge, as I had to pull the latest from main. And with all the supply-chain issues happening everywhere, the bleeding edge is always riskier from a security point of view.

There is always the possibility of fine-tuning the model later to make sure it can complete the custom task correctly. But the code for doing a LoRA on gemma4 is probably not available yet. The 50% extra speed seems really tempting.


If you are running on 4090 and get 5 t/s, then you exceeded your VRAM and are offloading to the CPU (or there is some other serious perf. issue)

Thank you. I have the same card, and I noticed the same ~100 tok/s when I ran Q3.5-35B-A3B. G4 26B A4B running at 150 tok/s is a 50% performance gain. That's pretty huge.

TLDR: It's easy: LLM outputs are untrusted. Agents, by virtue of running untrusted inputs, are malware. Handle them like the malware they are.

>>> "While this web site was obviously made by an LLM" So I am expected to trust the LLM-written security model? https://jai.scs.stanford.edu/security.html

These guys are experts from a prestigious academic institution, leading "Secure Computer Systems", whose logo is a seven-branched red star that looks like a devil head, with white palm trees in the background. They are also shilling for some blockchain research and a future digital currency initiative, taking funding from DARPA.

The website also points toward external social networks for references, freely spreading Fear, Uncertainty and Doubt.

So these guys are saying: go on, run malware on your computer, but do so with our casual sandbox, at your own risk.

Remember, until yesterday Anthropic (aka Claude) was officially a supply-chain risk.

If you want to experiment with agents safely (you probably can't), I recommend building them from the ground up (to be clear, I recommend you don't, but if you must): write the tools the LLM is allowed to use yourself, and determine at each step whether or not you broke the security model.

Remember that everything which comes from an LLM is untrusted. You'll be tempted to vibe-code your tools. The LLMs will try to make you install external dependencies, which you must review and decide whether you trust.

Because everything produced by the LLM is untrusted, sharing the results is risky. A good starting point is to have the LLM produce a single static HTML page. Serve this static page from a webserver on an external server, relying on the Same-Origin Policy to prevent the page from accessing your files and network (e.g. GitHub Pages, using a new handle, if you can't afford a VPS). This way you rely on your browser sandbox to keep you safe, and you are as safe as when visiting a malware-infested page on the internet.

If you are afraid of writing tools, you can start by copy-pasting and reading everything produced.

Once you write tools, you'll want to have them run autonomously in a runaway loop, taking user feedback or agent feedback as input. But even if everything is contained, these runaway loops can and will produce harmful content in your name.

Here is such a vibe-coded experiment I did a few days ago: a simple 2D physics simulation of water molecules for educational purposes. It is not physically accurate, and still has some bugs and regressions between versions. Good enough to be harmful. https://news.ycombinator.com/item?id=47510746


He even attempts to improve on the paper by replacing the random rotation operation, which is O(d^2), with a Subsampled Randomized Hadamard Transform, which can be computed in O(d log d).

Hopefully the Johnson–Lindenstrauss lemma applies in the same way to SRHT-transformed vectors as it does to randomly rotated vectors, the independence of the coordinates' distributions remains, and therefore quantizing each coordinate independently is still theoretically sound.
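For reference, here is a minimal sketch of an SRHT in Python (my own illustration, not the author's code): a random sign flip, a fast Walsh–Hadamard transform (the O(d log d) part), then a random subsample of k coordinates. The 1/sqrt(k) scaling is one common convention that preserves norms in expectation; d is assumed to be a power of two.

```python
import numpy as np

def fwht(x):
    """In-place fast Walsh-Hadamard transform, O(d log d), d a power of two."""
    d = len(x)
    h = 1
    while h < d:
        for i in range(0, d, 2 * h):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    return x

def srht(x, k, rng):
    """Subsampled Randomized Hadamard Transform of x down to k coordinates:
    random sign flip (D), Hadamard mix (H), then a random subsample (P)."""
    d = len(x)
    signs = rng.choice([-1.0, 1.0], size=d)     # diagonal of D
    y = fwht(signs * x)                         # H D x (unnormalized H)
    idx = rng.choice(d, size=k, replace=False)  # P: keep k random rows
    return y[idx] / np.sqrt(k)                  # preserves norms in expectation
```

Each output coordinate is a random ±1 combination of all input coordinates, which is why the per-coordinate quantization argument has a chance of carrying over.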


A little vibecoded experiment I did today.

Using a local-only model with llama.cpp (Qwen3.5-35B-A3B-UD-Q4_K_XL), copy-pasting the full generated HTML page (no advanced editor doing incremental file modifications :) ).

30 versions ~200k tokens one afternoon.

You can view all the files on https://github.com/unrealwill/watermolecules

You can also open the various files directly; some work, some are buggy. For example https://unrealwill.github.io/watermolecules/watermolecule20.... is OK.

https://unrealwill.github.io/watermolecules/watermolecule19.... is a funny bug.

I was just playing writing a 2d chemical simulator to simulate liquids at the molecular level.

It's not physically accurate at all. For it to be considered a liquid, the molecules need to be close enough that molecular forces have an impact. I display the average distance to the nearest neighbor. At normal conditions of temperature and pressure, the average distance between water molecules should be roughly 3 angstroms.

Here the average will depend on the number of particles and the size of your browser window and zoom level.

The Van der Waals forces (the forces between the molecules) have a range of 50 in this simulation, so this is something in between a liquid and a gas.
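For anyone curious, the usual textbook model for such inter-molecular forces is the Lennard-Jones potential; a minimal sketch (the parameter values below are illustrative, not the ones used in the simulation):

```python
def lennard_jones_force(r, epsilon=1.0, sigma=3.0):
    """Radial Lennard-Jones force between two molecules at distance r.
    sigma ~ 3 (roughly the angstrom scale mentioned above) and epsilon
    are illustrative values. Positive means repulsive, negative attractive;
    the force is zero at the equilibrium distance r = 2^(1/6) * sigma."""
    sr6 = (sigma / r) ** 6
    return 24.0 * epsilon / r * (2.0 * sr6 ** 2 - sr6)
```

The steep repulsive core keeps molecules apart while the shallow attractive tail holds the liquid together; truncating the tail at some cutoff is what gives a simulation a finite force "range".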

The rotations of the molecules are just for visual effect.

I have also added auto-ionization of water. And chemical reactions.

Energy and other conserved quantities are not conserved. The simulation is more stable on the "high" quality level.

Probably won't update further, but some agents may use your various remarks to create better simulations.


I think it boils down to the alternative view of rotations as two successive reflections.

You can then use a Householder matrix [0] to avoid trigonometry.
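A minimal sketch of the two-reflection construction (my own illustration): with unit vectors u, v and bisector r = (u+v)/||u+v||, the product of the two Householder reflections H_r H_u sends u to v and has determinant +1, so it is a proper rotation, computed with no trigonometry at all.

```python
import numpy as np

def householder(w):
    """Reflection across the hyperplane orthogonal to w: H = I - 2 w w^T / ||w||^2."""
    w = w / np.linalg.norm(w)
    return np.eye(len(w)) - 2.0 * np.outer(w, w)

def rotation_u_to_v(u, v):
    """Proper rotation sending unit vector u to unit vector v, built from two
    reflections; only dot and outer products, no sin/cos."""
    u = u / np.linalg.norm(u)
    v = v / np.linalg.norm(v)
    r = (u + v) / np.linalg.norm(u + v)      # bisector of u and v (assumes u != -v)
    return householder(r) @ householder(u)   # det = (-1) * (-1) = +1
```

H_u flips u to -u, and reflecting -u across the hyperplane orthogonal to the bisector lands exactly on v; the antipodal case u = -v needs special handling.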

These geometric math tricks are sometimes useful for efficient computations.

For example, you can improve the Vector-Quantized Variational AutoEncoder (VQ-VAE) using a rotation trick, and compute it efficiently without trigonometry, using a Householder matrix to find the optimal rotation mapping one vector to the other. See section 4.2 of [1].

The question of why someone would avoid trigonometry instead of leaning into it is another one. Trigonometry [2] is the study of triangles, and connects naturally to the notion of rotation.

Rotations [3] are a very rich concept related to exponentiation (multiplication is repeated addition; exponentiation is repeated multiplication).

While doing things repeatedly tends to diverge, rotations are self-stabilizing, which makes them good candidates as building blocks for the universe [4].

Because those operations are non-commutative, tremendous complexity emerges just from the order in which the simple operations are repeated, yet it's stable by construction [5][6].

[0] https://en.wikipedia.org/wiki/Householder_transformation

[1] https://arxiv.org/abs/2410.06424

[2] https://en.wikipedia.org/wiki/Trigonometry

[3] https://en.wikipedia.org/wiki/Matrix_exponential

[4] https://en.wikipedia.org/wiki/Exponential_map_(Lie_theory)

[5] https://en.wikipedia.org/wiki/Geometric_algebra

[6] https://en.wikipedia.org/wiki/Clifford_algebra


citing the Wikipedia page for trigonometry makes this feel a lot like you just told an LLM the expected comment format and told it to write insightful comments


I had to check the precise definition of trigonometry while writing my comment, found it interesting, so I added a reference.

As with many subjects that we learn early in school, it's often interesting to revisit them as an adult and perceive additional layers of depth by casting a new look.

With trigonometry, we tend to associate it with the circle. But fundamentally it's the study of tri-angles.

What is interesting is that the whole theory is "relative". I would reference the Wikipedia page for angle, but it might make me look like an LLM. The triangle doesn't have position and orientation baked in; what matters is the lengths of the sides and the angles between them.

The theory by definition becomes translation- and rotation-invariant. And from this symmetry emerges the concept of rotations.

What is also interesting about the concept of angle is that it is a scalar, whereas the original objects, like lines, live in a higher dimension. To avoid losing information, you therefore need several of these scalars to fully describe the scene.

But there is a degree of redundancy, because the angles of a triangle sum to pi. And from this degree of freedom result multiple paths to do the computations. With this liberty comes the risk of not making progress and going in circles. It's also harder to see whether two points reached by different paths are the same or not, and that's why you have "identities".

Often, for doing the computation, it's useful to break the symmetry by picking a center, even though all points could be centers (but you pick one, and that has made all the difference).

A similar situation arises in Elliptic Curve Cryptography, where all points could play the same role, but you pick one as your generator. Also in physics, with the concept of gauge invariance.


Interesting work. It's a nice introduction to usage of holography.

We can etch the inside of a photosensitive material by focusing a laser at a specific point, and moving this point of focus. That's what is done in [3] [4].

But here instead of doing this sequentially they print all points simultaneously using holography.

Here they use holography to volumetrically illuminate some photosensitive resin, in a similar fashion to what used to be done for volumetric displays (in [2] you can find a figure using an agarose-gel tank as a display for a volumetric hologram).

They just put a new "spin" on it, spinning a mirror around the resin tank to project from all directions and reach the back of objects. The technique, called "Digital Incoherent Synthesis of Holographic Light Fields", paints the resin bath with a 3D paintbrush, sequentially, from multiple directions. It's called incoherent because each angle is treated independently of the others, and its light doesn't need to interfere (in the wave sense).

The natural extension, using a conical mirror instead of a spinning one, would need to also consider the interference of light from nearby angles, making the inverse-problem computation harder, and would need higher resolution, but would avoid moving parts.

Here, holography is a fancy way of focusing the light where we want the resin to cure. It needs a resin whose optical properties don't change once cured, otherwise the light behind the cured resin won't be focused where it should be, even though it should cure everywhere simultaneously. Unfocused light is still absorbed by the resin, which contributes to the curing, but photosensitive resins are non-linear, meaning nothing happens until you cross a threshold.

To do this holography, they use a Digital Micromirror Device (DMD): a chip with an array of micro-mirror pixels [0].

Although these mirrors are on-off only, a technique from the 1970s allows you to control the shape of the light field in amplitude, phase and polarization [1].
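To illustrate the idea of encoding a complex field on an on/off device, here is a sketch of a Lee-type binary hologram (my own illustration, not the authors' code; the carrier period and the arcsin duty-cycle mapping are assumptions on my part):

```python
import numpy as np

def lee_binary_hologram(amplitude, phase, carrier_period=8):
    """Binary (on/off) pattern encoding a complex field A*exp(i*phase) on a DMD,
    Lee-style: the duty cycle of a binary carrier grating sets the first-order
    diffracted amplitude (~ sin(pi*q)), and its lateral shift sets the phase."""
    x = np.arange(amplitude.shape[1])[None, :]
    carrier = 2.0 * np.pi * x / carrier_period
    q = np.arcsin(np.clip(amplitude, 0.0, 1.0)) / np.pi  # duty cycle in [0, 1/2]
    return (np.cos(carrier - phase) > np.cos(np.pi * q)).astype(np.uint8)
```

A spatial filter in the Fourier plane (the lenses-and-hole arrangement mentioned below) then isolates the first diffraction order, which carries the desired amplitude and phase.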

The DMD is the chip inside the widely available Digital Light Projector technology. DMDs are used as displays, and that's how you control the mirrors: you use them as a screen over the HDMI interface to display the right pattern. Then you just have to bounce a laser off it to get a "structured light" beam, from which you use a few lenses and a hole (in a 4f arrangement) to extract the "mode" you are interested in, whose light will refocus itself at the right 3D points.

The limit of this technique is the resolution of the DMD (as explained in [2]): the smaller the pixel size, the better. But here this limit is mitigated by integrating over time and angle, because what matters is the resin exposure time.

[0] "Structuring Light with Digital Micromirror Devices (Photonics West 2021)" https://www.youtube.com/watch?v=vurtdU0FRm4

[1] Binary amplitude holograms for shaping complex light fields with digital micromirror devices https://www.institut-langevin.espci.fr/biblio/2025/1/18/2280...

[2] Holographic video display using digital micromirrors (Invited Paper) [the raven has the key]

[3] "how to put 3D images into glass or crystal objects 3d crystal Inside carving" https://www.youtube.com/watch?v=dkK6c45U6EU

[4] "What is Sub-surface Laser Engraving or a 'Bubblegram'? Technology Explained" https://www.youtube.com/watch?v=sOrby692Uag


It's even worse than that.

The positive outcomes are structurally being closed off. The race to the bottom means that you can't even profit from it.

Even if you release something that has plenty of positive aspects, it can be, and is, immediately corrupted and turned against you.

At the same time, you have created desperate people and companies, and given them huge capabilities at very low cost, plus the necessity to stir things up.

So for every good door that someone opens, it pushes ten other companies or people to either open random, potentially bad doors, or die.

Regulating is also out of the question, because either the people who don't respect regulations get ahead, or the regulators win and we are under their control.

If you still see some positive doors, I don't think sharing them would lead to good outcomes. But at the same time, the bad doors are being shared, and therefore enjoy network effects. There is some silent threshold, which has probably already been crossed, that drastically changes the sign of the expected return of the technology.


I like the game up until 7 or 8 notes, but it keeps adding notes.

I couldn't find a setting to freeze the difficulty where it's comfortable and where the melody can still be construed to make sense.

Adding more notes breaks the flow and turns pitch training into a memory game for Rain Man, even more so when you make a mistake and must partially redo the melody.


Hmm - I was hoping that the Practice Mode would cover this, but I think there's probably some room to add an option where you can freeze the difficulty level - I'll see if I can add this later in the evening.

The "construed melody" is a harder problem. I've been playing around with the idea of using a Markov model, or even borrowing what CPU Bach did, to try to create more coherent melodies over time.
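A first-order Markov melody generator can be surprisingly small; a sketch (the note names and training melodies are made up for illustration):

```python
import random
from collections import defaultdict

def train_markov(melodies):
    """First-order Markov model: for each note, collect the notes that follow it."""
    table = defaultdict(list)
    for melody in melodies:
        for a, b in zip(melody, melody[1:]):
            table[a].append(b)
    return table

def generate(table, start, length, rng=random):
    """Random walk through the transition table, starting from a given note."""
    out = [start]
    for _ in range(length - 1):
        followers = table.get(out[-1])
        if not followers:
            break
        out.append(rng.choice(followers))
    return out
```

Keeping duplicate followers in the lists makes common transitions proportionally more likely, which is usually enough to make the output feel less random than a uniform walk.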

Thanks for the feedback!


Transformers are nice. You can very easily train a minimal network that outputs reasonable sequences. They won't be high quality or too pretty, but they will "make sense" way more than randomness and (usually) change keys in a coherent way.


Hey viraptor! I haven't even thought about using transformers but that sounds like a great idea. The current generator is just a standard random walk across the major/minor intervals and could definitely use some TLC!


Have you checked his physical keyboard?

My laptop is getting old, and some keys need to be pressed with more insistence and more accuracy for them to register properly. It breaks the flow and muscle memory for things like passwords. It also leads to letter inversions, because the tonic accent needs to go on the letters that must be pressed harder, rather than on the first letter of the word. It's driving me crazy, but unfortunately computers are too expensive for now (and it's probably only getting worse).



Lasers in space are fun! We[1] are actually doing this for real but automated and inversed -- launching a satellite with a laser to beam data down to Earth. Like these searchlights, but from orbit!

[1] A bunch of students at https://satlab.agh.edu.pl

