
Excellent read. I was searching for a full-text search engine but couldn't find a suitable one. I plan to implement one just this way.


Author here: you may want to check out something like Whoosh https://whoosh.readthedocs.io/en/latest/intro.html (it's like a clone of Lucene but in pure Python). I've used this to build some basic search for a small Python website and it was more than fast enough for my purposes :-)


SQLite has a pretty good built-in fts engine: https://www.sqlite.org/fts5.html
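
For anyone who wants to try it, here is a minimal sketch using Python's standard `sqlite3` module. The table name `docs`, the column `body`, and the helper `fts5_search` are made up for illustration, and the whole thing assumes the SQLite library your Python links against was built with FTS5; if it wasn't, the helper returns `None` instead of raising.

```python
import sqlite3


def fts5_search(docs, query):
    """Index `docs` in an in-memory FTS5 table and return matching rowids.

    Returns None when the linked SQLite library lacks the FTS5 module.
    """
    conn = sqlite3.connect(":memory:")
    try:
        # Fails with OperationalError ("no such module: fts5") when FTS5
        # is not compiled into the SQLite library in use.
        conn.execute("CREATE VIRTUAL TABLE docs USING fts5(body)")
    except sqlite3.OperationalError:
        conn.close()
        return None

    conn.executemany("INSERT INTO docs(body) VALUES (?)", [(d,) for d in docs])
    # MATCH runs the full-text query; ORDER BY rank sorts best matches first.
    rows = conn.execute(
        "SELECT rowid FROM docs WHERE docs MATCH ? ORDER BY rank", (query,)
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


if __name__ == "__main__":
    sample = ["full text search", "inverted index", "search engines"]
    print(fts5_search(sample, "search"))
```

Rowids are assigned in insertion order starting at 1, so the result refers back to positions in the input list. The default tokenizer does not stem, so a query for `search` will not match `searching` unless you configure a different tokenizer.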


The problem is that FTS5 isn't included in most default installations from package managers [I use Fedora]. And recompiling from source breaks a lot of things, since the SQLite library is generally linked into every app that uses it.


I admit I only used sqlite through the go driver (https://github.com/mattn/go-sqlite3) where using fts5 amounts to one flag during the compile phase.


But SQLite's FTS5 has no support for the `offsets` function...


Many years ago I ran into this paper "Self-indexing inverted files for fast text retrieval" http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.18....

It's short and to the point. And then I implemented all that ... in PHP and MySQL :)

It feels daunting at first, but once you understand what it wants you to do, it's actually not that hard (for this particular paper, and this particular approach).

However, you do want to employ a stemming library to normalize word forms.
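
To show why stemming matters, here is a rough sketch of a tiny in-memory inverted index in Python. It is not the scheme from the paper, just a plain term-to-document-set map, and `toy_stem` is a deliberately naive suffix stripper I made up for illustration; a real project should use a proper stemming library instead.

```python
import re
from collections import defaultdict

# Assumption: a toy suffix list for illustration only, not a real stemmer.
SUFFIXES = ("ing", "ed", "s")


def toy_stem(word):
    """Strip one common English suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Lowercase, split on non-alphanumerics, and stem every token."""
    return [toy_stem(w) for w in re.findall(r"[a-z0-9]+", text.lower())]


def build_index(docs):
    """Map each stemmed term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return sorted ids of documents containing every query term (AND)."""
    terms = tokenize(query)
    if not terms:
        return []
    postings = [index.get(term, set()) for term in terms]
    return sorted(set.intersection(*postings))


if __name__ == "__main__":
    docs = ["Indexing text files", "He indexed the file", "Unrelated note"]
    index = build_index(docs)
    print(search(index, "index files"))
```

Because both the documents and the query go through the same normalization, "indexing", "indexed" and "index" all land on the same posting list, which is the whole point of stemming here.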



