
I do a lot of these design experiments. I start with an idea such as "what if Wikipedia was designed only with reading comprehension in mind" and then see what happens when I take it to the most ridiculous extreme. Sometimes it pans out: I'm fairly happy with this Wikipedia mirror. Sometimes it doesn't: I have no logos or branding anywhere on my sites, and that turned out to be a bit confusing.

My mirror clocks in at 21 GB. I'm self-hosting, so there's no real monetary cost besides the one-time investment in hardware. A really cheap person could probably host this off a Raspberry Pi attached to something like a Corsair Voyager and serve multiple requests per second.

There's some technical trickiness in actually storing the files in a file system, since most filesystems don't deal well with having millions of files in the same folder. So I've had to build a four-tier directory structure based on the hash of the file name to be able to store the files. They're stored in a structure like 31/444/781/225/foobar.gz.
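For illustration, here is a minimal Python sketch of that kind of hash-based sharding. The hash function (SHA-256), the decimal bucket widths (2, 3, 3, 3) and the zero-padding are assumptions chosen to resemble the 31/444/781/225 example; the mirror's actual scheme isn't described, and `sharded_path` is a hypothetical name.

```python
import hashlib
from pathlib import PurePosixPath


def sharded_path(filename: str, widths=(2, 3, 3, 3)) -> PurePosixPath:
    """Map a file name to a four-tier directory path derived from its hash.

    Assumptions (not from the original post): SHA-256 as the hash,
    decimal buckets of the given widths, zero-padded directory names.
    """
    digest = hashlib.sha256(filename.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    n = int.from_bytes(digest[:8], "big")

    parts = []
    for width in widths:
        # Peel off `width` decimal digits per directory tier.
        n, bucket = divmod(n, 10 ** width)
        parts.append(f"{bucket:0{width}d}")

    return PurePosixPath(*parts, filename)


# Prints a relative path with four numeric directory tiers and the file name.
print(sharded_path("foobar.gz"))
```

With these widths there are 100 * 1000 * 1000 * 1000 possible leaf directories, so even millions of files end up with only a handful per folder. The same name always maps to the same path, so a lookup needs no index.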

I did some experiments storing them as BLOBs in a database, but I couldn't get it to work.



Interesting. What are the bandwidth considerations for this? I had the impression that many ISPs make self-hosting impractical on normal household plans, but then I've never looked into it seriously.


Dunno, I have unlimited uploads and downloads at 100 Mbit/s, and I host this Wikipedia mirror as well as a search engine and some other stuff.

The bandwidth usage is pretty inconsequential since most of my pages clock in at just a few kilobytes.
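As a rough back-of-the-envelope check, the sketch below estimates how many such pages a 100 Mbit/s link could carry per second. The 5 kB page size is an assumed stand-in for "a few kilobytes", and the calculation ignores protocol overhead and other traffic, so treat the result as an upper bound.

```python
LINK_MBIT_PER_S = 100   # link speed mentioned in the comment
PAGE_KILOBYTES = 5      # assumption: stand-in for "a few kilobytes"


def pages_per_second(link_mbit_per_s: float, page_kilobytes: float) -> float:
    """Upper bound on pages served per second, ignoring protocol overhead."""
    bytes_per_second = link_mbit_per_s * 1_000_000 / 8  # bits -> bytes
    return bytes_per_second / (page_kilobytes * 1000)


print(pages_per_second(LINK_MBIT_PER_S, PAGE_KILOBYTES))  # prints 2500.0
```

Even at a small fraction of that ceiling, the link is unlikely to be the bottleneck for pages this small.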



