So just to clarify, is that: Ideal is running the model on a GPU (any brand? Nvidia, AMD, etc.?) with 16GB of GPU RAM, less ideal is running it on the CPU, for which it needs 8GB system RAM? Presumably it will occupy all that memory while it's running?
Khoj uses Llama 2 7B, 4-bit quantized, so it only needs about 3.5GB of RAM (GPU or system) [1].
Khoj and your other apps need some RAM themselves, so in practice 8GB of system or GPU RAM should suffice.
Khoj has been tested with CUDA and Metal capable GPUs, so Nvidia and Mac M1+ GPUs should work. I think it'll work with AMD GPUs out of the box too, but let me know if it doesn't for you? I can look into what needs to be done to get that working.
[1]: The calculation is [params in billions] × [bytes per parameter] GB of RAM, so 7 × 0.5 = 3.5GB.
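The back-of-the-envelope calculation above can be sketched as a tiny helper. Note this is just the weight memory; real runtimes need extra headroom for the KV cache, activations, and context window, which is why the practical recommendation is 8GB rather than 3.5GB. The function name here is illustrative, not part of Khoj.

```python
def model_ram_gb(params_billion: float, bytes_per_param: float) -> float:
    """Approximate RAM/VRAM needed just to hold the model weights.

    params_billion: parameter count in billions (e.g. 7 for Llama 2 7B)
    bytes_per_param: bytes per weight (4-bit quantization = 0.5 bytes)
    """
    return params_billion * bytes_per_param

# Llama 2 7B at 4-bit quantization: 4 bits = 0.5 bytes per parameter.
print(model_ram_gb(7, 0.5))   # → 3.5 (GB)

# For comparison, the same model at fp16 (2 bytes per parameter):
print(model_ram_gb(7, 2.0))   # → 14.0 (GB)
```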
Sorry for the repetition, but do you mean 16GB of VRAM? That is a very high requirement: an RTX 4060 only has 8GB, and even an RTX 4070 only ships with 12GB. Are any further optimizations coming to reduce memory usage?
Ideal: 16GB (GPU) RAM
Less ideal: 8GB RAM and CPU