bartdegoede's comments

bartdegoede · 2026-02-09T20:02:50

A few years ago I wrote a post about implementing tf-idf in 150 lines of Python. The data source it relied on has since been removed (turns out Yahoo! isn't as relevant anymore).

bartdegoede · 2025-10-21T17:18:14

Adventures with writing a novel with LLMs

bartdegoede · on March 25, 2021

Added an MIT license. Have fun :-)

bartdegoede · on March 25, 2021

The idea was to illustrate the concepts of an inverted index and tf-idf :-) I've built something like this for a smaller project written in Python because it was Easier™ than spinning up an Elasticsearch cluster (i.e. it definitely didn't need to scale :-D)
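
For anyone who wants the two concepts side by side, here is a minimal sketch. It is not the code from the post: the `TinyIndex` class, the whitespace tokenizer, and the plain `tf * log(N / df)` weighting are all assumptions picked for brevity.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


class TinyIndex:
    """Illustrative inverted index with tf-idf ranking (not the post's code)."""

    def __init__(self):
        # Inverted index: term -> {doc_id: term frequency in that document}
        self.postings = defaultdict(dict)
        self.doc_count = 0

    def add(self, doc_id, text):
        self.doc_count += 1
        for term, freq in Counter(tokenize(text)).items():
            self.postings[term][doc_id] = freq

    def idf(self, term):
        # Inverse document frequency: terms found in fewer documents weigh more.
        df = len(self.postings.get(term, {}))
        return math.log(self.doc_count / df) if df else 0.0

    def search(self, query):
        # Sum tf * idf over the query terms for every matching document.
        scores = defaultdict(float)
        for term in tokenize(query):
            weight = self.idf(term)
            for doc_id, tf in self.postings.get(term, {}).items():
                scores[doc_id] += tf * weight
        return sorted(scores.items(), key=lambda item: item[1], reverse=True)


index = TinyIndex()
index.add(1, "the quick brown fox")
index.add(2, "the lazy dog")
index.add(3, "the quick dog jumps")
results = index.search("quick fox")
```

Document 1 ranks above document 3 here because "fox" occurs in only one document and so carries more weight than "quick", while "the" occurs everywhere and contributes nothing.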

sodapopcan · on March 25, 2021

Yes, I've been really interested in FTS recently and love articles like this. I'm currently implementing a pared-down version myself in Elixir because I'm searching one "table" (not Postgres) and I don't want to bring in external dependencies for it.

bartdegoede · on March 25, 2021

Author here: you may want to check out something like Whoosh https://whoosh.readthedocs.io/en/latest/intro.html (it's like a clone of Lucene but in pure Python). I've used this to build some basic search for a small Python website and it was more than fast enough for my purposes :-)

bartdegoede · on May 3, 2018

Mailinabox (https://mailinabox.email/guide.html#admin) uses it to get SSL certificates too; you can see it in action here https://github.com/mail-in-a-box/mailinabox/blob/master/mana...

bartdegoede · on March 9, 2018

It will contain all content for pages that are live (in my case that's only 2 so far: http://bart.degoe.de/js/search/index.json), so your users download all the content you want searchable on your site when the page loads. Depending on your index configuration, more data is then loaded into memory.

Lunr has an analysis pipeline that will generate a bunch of tokens for queries to match on, and you can do pretty much anything there.

It won't scale to thousands of pages, but I don't have thousands of them anyway :-)
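
To make the pipeline idea concrete, here is a rough sketch of the concept in Python. It is not Lunr's API (Lunr is JavaScript); the step functions, the stop word list, and the crude plural stripping are hypothetical stand-ins for whatever steps a real pipeline would run.

```python
import re

# Hypothetical stop word list; a real one would be much longer.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is"}


def tokenize(text):
    # Lowercase the text and keep runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def drop_stop_words(tokens):
    return [t for t in tokens if t not in STOP_WORDS]


def strip_plural(tokens):
    # Crude stand-in for a stemmer: drop a trailing "s" from longer words.
    return [t[:-1] if t.endswith("s") and len(t) > 3 else t for t in tokens]


# The pipeline is just an ordered list of functions, so steps can be
# added, removed, or reordered.
PIPELINE = [drop_stop_words, strip_plural]


def analyze(text, pipeline=PIPELINE):
    tokens = tokenize(text)
    for step in pipeline:
        tokens = step(tokens)
    return tokens


tokens = analyze("The pages and posts in a blog")
```

Running the same steps over both the documents and the queries is what lets a query for "page" match a document that says "pages".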