Oh interesting, so you're not using Llama 2, you're using the original. Have you begun to evaluate Llama 2 to determine the differences in performance?
How are you determining which notes (or snippets of notes?) to inject as context? Especially given the small 2048-token context limit with Llama 1.
Quick clarification: we are using Llama 2 7B. We didn't experiment with Llama 1 because we weren't sure of its licensing limitations.
We determine note relevance by cosine similarity between the query embedding and the knowledge base (your note embeddings). We limit the context for Llama 2 to 3 notes (while OpenAI models might comfortably take up to 9). The notes are ranked from most to least similar and truncated to fit the context window limit. For the model we're using, we're still limited to 2048 tokens with Llama 2.
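The ranking step described above can be sketched roughly like this: score each note embedding by cosine similarity against the query embedding, then keep the top k (3 for Llama 2 in this setup). This is a minimal illustration, not the actual implementation; the function names and toy embeddings are made up for the example.

```python
import math

def cosine_similarity(a, b):
    # Standard cosine similarity: dot product over the product of norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k_notes(query_emb, note_embs, k=3):
    """Return indices of the k most similar notes, most similar first."""
    ranked = sorted(
        range(len(note_embs)),
        key=lambda i: cosine_similarity(query_emb, note_embs[i]),
        reverse=True,
    )
    return ranked[:k]

# Toy 3-dimensional embeddings for illustration only.
query = [1.0, 0.0, 0.0]
notes = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [1.0, 0.0, 0.1], [0.5, 0.5, 0.0]]
print(top_k_notes(query, notes, k=3))  # → [2, 0, 3]
```

In practice the selected notes would then be concatenated (most similar first) and truncated so the prompt stays under the model's 2048-token limit.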