Hacker News

Oh interesting, so you're not using Llama 2, you're using the original. Have you begun to evaluate Llama 2 to determine the differences in performance?

How are you determining which notes (or snippets of notes?) get injected as context? Especially given the small 2048-token context limit with Llama 1.



Quick clarification: we are using Llama v2 7B. We didn't experiment with Llama 1 because we weren't sure of the licensing limitations.

We determine note relevance using cosine similarity between the query embedding and the knowledge base (your note embeddings). We limit the context for Llama 2 to 3 notes (while OpenAI models might comfortably take up to 9). The notes are ranked from most to least similar and truncated to fit the context window limit. For the model we're using, we're still limited to 2048 tokens with Llama v2.
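A minimal sketch of that retrieval step, assuming note embeddings are plain NumPy vectors; the function names and the crude `count_tokens=len` proxy are hypothetical, not the actual implementation:

```python
import numpy as np

def cosine_sim(query, note_embs):
    # Cosine similarity between one query vector and each row of a matrix.
    return (note_embs @ query) / (
        np.linalg.norm(note_embs, axis=1) * np.linalg.norm(query)
    )

def top_notes(query_emb, note_embs, notes, k=3, token_budget=2048,
              count_tokens=len):
    # Rank notes from most to least similar, keep at most k, then stop
    # adding notes once the model's token budget would be exceeded.
    sims = cosine_sim(query_emb, note_embs)
    order = np.argsort(sims)[::-1][:k]
    picked, used = [], 0
    for i in order:
        cost = count_tokens(notes[i])
        if used + cost > token_budget:
            break  # a real system might truncate the note instead
        picked.append(notes[i])
        used += cost
    return picked
```

Swapping `count_tokens` for a real tokenizer (and `k`/`token_budget` per model, e.g. 3 notes / 2048 tokens for Llama v2 vs. a larger budget for OpenAI) reproduces the behavior described above.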


Have you looked at using the long context (32K) version of the Llama v2 7B released by Together AI?

https://together.ai/blog/llama-2-7b-32k


Oh neat, thanks for sharing that! Having a 32K offline model is pretty promising. Let me test out how it performs.


I thought Llama v2 has a context window of 4096?




