Cassandra looks pretty fun - are you suggesting that as the database? I'm thinking a quick Python implementation for PUT (maybe DELETE) and meta operations, using Cassandra as a backend and Lighttpd for the GET (high performance), might work. Cheers.
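For what it's worth, here is a rough sketch of what the write side of that split could look like, using only the Python standard library. Everything named here is an assumption for illustration: `MemoryBackend` is a hypothetical in-memory stand-in for whatever Cassandra client would actually be used, and `make_handler` / `serve` are made-up helper names. GET is deliberately left out, since the idea above is to serve reads from Lighttpd.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class MemoryBackend:
    """Hypothetical stand-in for a Cassandra-backed store (not a real driver)."""

    def __init__(self):
        self._blobs = {}
        self._lock = threading.Lock()

    def put(self, key, data):
        with self._lock:
            self._blobs[key] = data

    def delete(self, key):
        # Returns True if the key existed, False otherwise.
        with self._lock:
            return self._blobs.pop(key, None) is not None

    def get(self, key):
        with self._lock:
            return self._blobs.get(key)


def make_handler(backend):
    """Build a request handler class bound to the given backend."""

    class WriteHandler(BaseHTTPRequestHandler):
        def do_PUT(self):
            # Store the request body under the request path.
            length = int(self.headers.get("Content-Length", 0))
            backend.put(self.path, self.rfile.read(length))
            self._reply(201)

        def do_DELETE(self):
            self._reply(204 if backend.delete(self.path) else 404)

        def _reply(self, status):
            self.send_response(status)
            self.send_header("Content-Length", "0")
            self.end_headers()

        def log_message(self, fmt, *args):
            pass  # keep the sketch quiet

    return WriteHandler


def serve(backend, port=0):
    """Start the write-side server in a background thread (port 0 = any free port)."""
    server = ThreadingHTTPServer(("127.0.0.1", port), make_handler(backend))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Swapping `MemoryBackend` for a real store should only require an object with the same `put` / `delete` / `get` methods; how that maps onto Cassandra's data model is not covered here.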
If you want a GFS clone, you /really/ want HDFS from the Hadoop project - I'm reading the new O'Reilly book on Hadoop at the moment, which is excellent.
No first-hand experience here, but my understanding is that HDFS and Hadoop in general are geared towards high throughput and not necessarily low latency. If this is still [1] the case, you'll likely have to put a caching layer in front of it. There is already some [2] HTTP accessibility for HDFS. Again, I'm not 100% clear on its status, but if it works well enough, a nice caching proxy (Varnish) with plenty of memory should help with latency on heavily accessed files.
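To make the caching-layer idea concrete, here is a small in-process sketch of a read-through cache with least-recently-used eviction. This is only an illustration of the principle, not how Varnish works internally: `ReadThroughCache` is a made-up name, and `fetch` is a hypothetical callable standing in for the slow read from HDFS.

```python
from collections import OrderedDict


class ReadThroughCache:
    """Serve hot paths from memory; fall back to a slow fetch on a miss."""

    def __init__(self, fetch, max_entries=128):
        self._fetch = fetch          # hypothetical slow backend read, e.g. from HDFS
        self._max = max_entries
        self._entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, path):
        if path in self._entries:
            # Mark as most recently used and serve from memory.
            self._entries.move_to_end(path)
            self.hits += 1
            return self._entries[path]
        # Miss: pay the backend latency once, then remember the result.
        self.misses += 1
        data = self._fetch(path)
        self._entries[path] = data
        if len(self._entries) > self._max:
            # Evict the least recently used entry.
            self._entries.popitem(last=False)
        return data
```

With something like this in front, only the first request for a heavily accessed file pays the backend latency; a real proxy would also need to handle expiry and invalidation when files change, which this sketch ignores.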
That is probably what I meant, too - I read that one of those open source projects had a clone of the GFS, only better (supposedly). Must have been Hadoop, as I haven't looked at many others.
I think there must be open source clones of the FS Google uses, but I don't know the names.