
Thanks! The last full month was all about making sure the research is as replicable and as trustworthy as I can make it. The original implementation was extremely inefficient, and even once I had the Metal/GPU mmul op working fast, I spent a lot of time bringing the rest of the implementation as close to llama.cpp as possible to allow for easier benchmarking.

As for the approaches - it seems the papers you mention do this statically, and didn't produce an algorithm for actually speeding up computations with their 20-50% results - which was a large part of the difficulty. I'll probably take some time off one day and finally do a thorough review of the literature.

I want to add a citations page soon, with these and some other papers people have posted in the comments. I expect sooner or later someone will find this algorithm written up by someone :)

During development I asked GPT-4 and googled, trying to find information about this method - everything seemed either static, or focused on removing full dimensions/layers ad hoc with eventual retraining. I didn't find anything matching this idea exactly.



Thank you. If you want to contribute novel research, a thorough literature search of prior work is essential. And hopefully a comparison to previous approaches.

If you didn't find these works in your literature search, I encourage you to improve this skill, because it's important and valuable.


True, it will be easier next time because new LLMs allow better searching through papers.

Having said that - the tradeoff with getting into a new field is that there are 50 years of history to dig through, and by the time you know exactly what you’re looking for, you don’t really need to find it. Although a better math framework or another approach would help a bit for sure.

I think in the future it will be way easier with better LLM searches. GPT fell short in finding some things this time, but it was already very useful in giving me an overview of the field.


One more opinion here: if you're looking around and nobody else has something obviously working, it's ok not to do a thorough search through hundreds of papers hoping to find someone else who came up with the idea. Get it working first, show it off, and if someone comes claiming it was their idea, respectfully acknowledge the prior work. But there's a gigantic gap between "wrote it in a paper" and "actually got it working".

Also, awesome job! Thanks for writing up some seriously cool engineering.


>But there's a gigantic gap between "wrote it in a paper" and "actually got it working".

The modern patent system in a nutshell. It's now suits registering patents rather than inventors in their workshops.


I think one of the issues in looking up related work is the vast terminology used. There are many similar works in terms of ideas and implementation, yet they are still not reconciled.


I found GPT to be awesome at this though - it's good at mapping some fields onto others.

While working, I kept asking it about certain ideas, and if the idea was known to it, it said "oh, that's this and it's called this and this".

As a fun sidenote, I noticed there are three levels of GPT's answers:

- "This has a name and it's...."
- "This will be difficult / very hard to do..." (it will never say impossible)
- "This is an intriguing idea"

The last one is what one should aim for - I noticed that when it sees an idea that is absolutely new to it, but it doesn't see any specific reason it should not be possible, it will label it as "intriguing".

The highest compliment I got was when I asked for its opinion and it said more or less "it shows a deep and nuanced understanding of XX, you should publish" (and then began listing the necessary steps - testing, finding collaborators etc).

All in all GPT was super useful for getting into a new field - I didn't want to reinvent the wheel, but I didn't want to spend months catching up with the 50-year history of this field. I kept asking it for stuff like "I need to do this and this", and it kept pointing me in the right direction and providing nomenclature.


Have a look at https://arxiv.org/pdf/2306.11695.pdf which also uses the norm of inputs based on calibration
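
For readers unfamiliar with the idea, here is a minimal sketch of what "using the norm of inputs based on calibration" could look like: each weight is scored by its magnitude times the L2 norm of the matching input feature over a small calibration set, and only the top-scoring weights in each row are kept. The function names, the per-row top-k rule and the toy numbers are assumptions made for illustration; this is not the linked paper's code and not the method from the original post.

```python
import math

def input_norms(calibration_inputs):
    """L2 norm of each input feature across the calibration samples."""
    n_features = len(calibration_inputs[0])
    return [
        math.sqrt(sum(sample[j] ** 2 for sample in calibration_inputs))
        for j in range(n_features)
    ]

def prune_by_weight_times_norm(weights, calibration_inputs, keep_per_row):
    """Zero all but the `keep_per_row` highest-scoring weights in each row.

    The score of weights[i][j] is |weights[i][j]| * norm[j]
    (a hypothetical scoring rule chosen for this sketch).
    """
    norms = input_norms(calibration_inputs)
    pruned = []
    for row in weights:
        scores = [abs(w) * n for w, n in zip(row, norms)]
        # Indices of the highest-scoring weights in this row.
        ranked = sorted(range(len(row)), key=lambda j: scores[j], reverse=True)
        keep = set(ranked[:keep_per_row])
        pruned.append([w if j in keep else 0.0 for j, w in enumerate(row)])
    return pruned

# Toy data: feature 0 has large activations, feature 1 has tiny ones.
weights = [[0.5, -2.0, 0.1], [1.0, 0.2, -0.3]]
calibration_inputs = [[3.0, 0.1, 1.0], [4.0, 0.1, 2.0]]
pruned = prune_by_weight_times_norm(weights, calibration_inputs, keep_per_row=1)
```

In the toy data the largest weight by magnitude (-2.0) is dropped because its input feature is almost always near zero, which is the difference from plain magnitude pruning. Note this is still a static, one-off selection, which is the distinction made earlier in the thread.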



