So just to clarify, is that: Ideal is running the model on a GPU (any brand? Nvidia, AMD, etc.?) with 16GB of GPU RAM, less ideal is running it on the CPU, for which it needs 8GB system RAM? Presumably it will occupy all that memory while it's running?
Khoj uses Llama 2 7B, 4-bit quantized, so it only needs about 3.5GB of RAM (GPU or system) [1].
Khoj and your other apps need some RAM themselves, so in practice 8GB of system or GPU RAM should suffice.
Khoj has been tested with CUDA and Metal capable GPUs, so Nvidia and Mac M1+ GPUs should work. I think it'll work with AMD GPUs out of the box too, but let me know if it doesn't for you? I can look into what needs to be done to get that working.
[1]: The calculation is [params in billions] × [bytes per parameter] GB of RAM, so 7 × 0.5 = 3.5GB.
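The back-of-the-envelope calculation above can be sketched as a tiny helper. Note this is just the weight memory; real runtimes need extra headroom for the KV cache, activations, and context window, which is why the practical recommendation is 8GB rather than 3.5GB. The function name here is illustrative, not part of Khoj.

```python
def model_ram_gb(params_billion: float, bytes_per_param: float) -> float:
    """Approximate RAM/VRAM needed just to hold the model weights.

    params_billion: parameter count in billions (e.g. 7 for Llama 2 7B)
    bytes_per_param: bytes per weight (4-bit quantization = 0.5 bytes)
    """
    return params_billion * bytes_per_param

# Llama 2 7B at 4-bit quantization: 4 bits = 0.5 bytes per parameter.
print(model_ram_gb(7, 0.5))   # → 3.5 (GB)

# For comparison, the same model at fp16 (2 bytes per parameter):
print(model_ram_gb(7, 2.0))   # → 14.0 (GB)
```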
Sorry for the repetition, but do you mean 16GB of VRAM? That is a very high requirement: an RTX 4060 only has 8GB, and even an RTX 4070 only ships with 12GB. Are any further optimizations coming to reduce memory usage?
Ideal: 16GB (GPU) RAM
Less ideal: 8GB RAM and CPU