Yeah, nice. You could also consider a low-rank factorization.
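For anyone who hasn't seen it, here's a rough NumPy sketch (illustrative only; the shapes and rank are arbitrary) of what low-rank factorization of a single weight matrix looks like. Truncated SVD gives the best rank-r approximation of W as a product of two thin matrices:

    # Minimal sketch of low-rank factorization via truncated SVD.
    # Shapes and rank are made up for illustration.
    import numpy as np

    def low_rank_factorize(W: np.ndarray, rank: int):
        """Approximate W (d_out x d_in) as A @ B, with A (d_out x r) and B (r x d_in)."""
        U, S, Vt = np.linalg.svd(W, full_matrices=False)
        A = U[:, :rank] * S[:rank]   # absorb singular values into A
        B = Vt[:rank, :]
        return A, B

    W = np.random.randn(4096, 4096)
    A, B = low_rank_factorize(W, rank=64)
    approx = A @ B
    print("relative error:", np.linalg.norm(W - approx) / np.linalg.norm(W))
    # Pretrained LLM weight matrices tend not to be strongly low-rank,
    # so aggressive truncation like this usually costs accuracy.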


I looked into it at the beginning, but as far as I understand, modern models like Mistral are difficult to apply this to directly: you can use LoRA to finetune them, but the pretrained weights themselves don't lend themselves to that kind of factorization.

I'm still quite new to the field, so I'd appreciate more insight into this, or a correction if I've got it wrong.
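To make the distinction concrete, here's a rough PyTorch sketch (names and hyperparameters are made up for illustration) of the LoRA idea: the pretrained weight stays frozen, and only a small low-rank delta is trained during finetuning. That's different from factorizing the pretrained weights themselves, which is what compression would require:

    # Sketch of a LoRA-style linear layer: the low-rank part is the *update*,
    # not a replacement for the full-rank base weight.
    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        def __init__(self, d_in: int, d_out: int, rank: int = 8, alpha: float = 16.0):
            super().__init__()
            self.base = nn.Linear(d_in, d_out, bias=False)
            self.base.weight.requires_grad_(False)           # frozen pretrained weight
            self.A = nn.Parameter(torch.randn(rank, d_in) * 0.01)
            self.B = nn.Parameter(torch.zeros(d_out, rank))  # zero init: delta starts at 0
            self.scale = alpha / rank

        def forward(self, x):
            return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

    layer = LoRALinear(4096, 4096, rank=8)
    y = layer(torch.randn(2, 4096))
    print(y.shape)  # torch.Size([2, 4096])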



