Great project, and an excellent way to learn about embeddings. Here are two possible avenues to explore further. Your system's backend can be thought of as two parts, the embedder and the retriever, on either side of the database:

Icons -> Embedder -> PGVector -> Retriever -> Display Result

1. In the embedder, try out different embedding models and/or vector dimensions to see whether Recall@K and Precision@K on your dataset (icons) improve. The choice of model makes a surprising difference to the quality of the results; the MTEB Leaderboard is a good place to look for candidate models.

2. In the retriever, you can try a couple of approaches:

a. After you retrieve candidates from PGVector, see whether a reranker such as Cohere Rerank gives better results: https://cohere.com/blog/rerank

b. You could try a "fusion ranking" similar to the one you already use, but structured so that 50% of the weight goes to a plain old keyword search over the metadata and 50% goes to the embedding-based search.
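For point 1, it helps to have a small labelled test set before swapping models: a handful of example queries, each paired with the icon IDs a human considers relevant. Here is a minimal sketch of the two metrics, assuming the retriever returns a ranked list of icon IDs (the function names and data layout are my own, not from your project):

```python
def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved IDs that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    # Use sets so a duplicated ID in the results is not counted twice.
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / k


def recall_at_k(retrieved, relevant, k):
    """Fraction of all relevant IDs that show up in the top-k."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


def mean_metrics(results, k):
    """Average both metrics over a test set.

    results: list of (retrieved_ids, relevant_ids) pairs, one per query.
    """
    if not results:
        raise ValueError("need at least one query")
    n = len(results)
    mean_p = sum(precision_at_k(ret, rel, k) for ret, rel in results) / n
    mean_r = sum(recall_at_k(ret, rel, k) for ret, rel in results) / n
    return mean_p, mean_r


if __name__ == "__main__":
    retrieved = ["cat", "dog", "house", "car"]  # ranked output of the retriever
    relevant = {"cat", "car", "tree"}           # human-labelled ground truth
    print(precision_at_k(retrieved, relevant, 4))  # 0.5
    print(recall_at_k(retrieved, relevant, 4))
```

Run the same test set through each candidate model and compare the averages; a model that looks better on the MTEB Leaderboard is not guaranteed to win on short icon metadata.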
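For 2a, the usual pattern is to over-fetch from PGVector (say the top 50 by vector distance), score each (query, candidate) pair with the reranker, and keep only the best few. Below is a rough sketch of that control flow. The `score_fn` parameter is where a real reranker (Cohere's hosted model or a local cross-encoder) would plug in; `toy_overlap_score` is a hypothetical word-overlap stand-in so the example runs offline, and it is not how a real reranker scores.

```python
from typing import Callable, List, Sequence, Tuple


def rerank(
    query: str,
    candidates: Sequence[str],
    score_fn: Callable[[str, str], float],
    top_n: int = 5,
) -> List[Tuple[str, float]]:
    """Re-score vector-search candidates and keep the top_n best."""
    scored = [(doc, score_fn(query, doc)) for doc in candidates]
    # Python's sort is stable even with reverse=True, so ties keep
    # the original vector-search order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


def toy_overlap_score(query: str, doc: str) -> float:
    """Hypothetical stand-in: share of query words present in the doc."""
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0


if __name__ == "__main__":
    # Pretend these came back from PGVector, in vector-distance order.
    candidates = ["red heart outline", "shopping cart",
                  "heart with plus sign", "red circle"]
    print(rerank("red heart", candidates, toy_overlap_score, top_n=2))
    # [('red heart outline', 1.0), ('heart with plus sign', 0.5)]
```

Because the reranker sees the query and each candidate together, it is usually slower than the vector lookup, which is why it only runs on the short candidate list.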
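For 2b, one catch is that keyword scores (for example from Postgres `ts_rank`) and vector similarities (for example `1 - (embedding <=> query_embedding)` using pgvector's cosine distance operator) live on different scales, so they need normalizing before you weight them 50/50. Here is a sketch, assuming min-max normalization per query and treating an icon that only one of the two searches returned as scoring 0 on the other side. Both are my choices, not the only option; reciprocal rank fusion is a common alternative that avoids normalization entirely.

```python
from typing import Dict, List, Tuple


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one query's scores to [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every hit as equally good.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def fuse(
    keyword_scores: Dict[str, float],
    vector_scores: Dict[str, float],
    keyword_weight: float = 0.5,
) -> List[Tuple[str, float]]:
    """Weighted fusion: keyword_weight for keywords, the rest for vectors."""
    kw = min_max_normalize(keyword_scores)
    vec = min_max_normalize(vector_scores)
    fused = {
        doc: keyword_weight * kw.get(doc, 0.0)
        + (1 - keyword_weight) * vec.get(doc, 0.0)
        for doc in set(kw) | set(vec)
    }
    # Highest score first; break ties by ID so the order is deterministic.
    return sorted(fused.items(), key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    keyword = {"heart": 3.0, "star": 1.0, "cart": 2.0}  # e.g. ts_rank output
    vector = {"heart": 0.9, "like": 0.8, "star": 0.5}   # e.g. cosine similarity
    ranked = fuse(keyword, vector)
    print([(doc, round(score, 3)) for doc, score in ranked])
    # [('heart', 1.0), ('like', 0.375), ('cart', 0.25), ('star', 0.0)]
```

Keeping `keyword_weight` as a parameter lets you try other splits than 50/50 and check which one gives the best Recall@K and Precision@K on your icons.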

Finally, something more interesting to noodle on: what if the embeddings were computed from the icon images themselves, and the model knew how to search for textual descriptions in that same latent space?
