
I have not written about it publicly at any length. Much narrower versions targeting specific data models have been used in production and I've licensed bits and pieces to big tech companies. The data structures and algorithms are non-obvious, and the join mechanics that enable arbitrary relationship type traversals on the representation hurt the brain (they really aren't visualizable). It is just an elegant instantiation of adaptive information-theoretic embeddings in very high-dimensional spaces, reduced to fast database algorithms. My previous startup was based on related but very early and incomplete computer science, which was written about.

The storage engine is extremely fast but the only unusual capabilities it has are architectural: the ability to continuously, in the background, reshard data (many tens of thousands of times per second per inexpensive server) and shift shards between storage engines. If you think about it, all this really requires from a storage engine is the ability to concurrently create and destroy logical files at an extremely high rate, much higher than a typical file/operating system allows. Some of the internal algorithms are novel but it is still just a storage engine. It is tuned for petabyte storage densities per server -- it was originally designed for exabyte-scale sensor data models.

None of my database work has ever been open sourced AFAIK, though many companies have older designs. The biggest practical hurdle to open sourcing is that it would require many man-months of tedious unpaid work and I have zero desire to do that. It is also a production-grade research project; I currently have no obligation, explicit or implied, to maintain any kind of compatibility if I feel like redesigning some aspect of it. That said, I also want to get away from the current reality that every company wants someone to build their own slight variation of these designs.



Andrew, everything you reveal publicly has been tantalizing! (See https://www.jandrewrogers.com/ for those who haven't stumbled upon his posts yet)

"There is virtually no literature on practical representations of topological spaces, never mind parallel algorithms using those representations. A thorough exposition of both the theory and practice is on the order of a few hundred pages of dense technical literature that no one has had time to write, despite multiple implementations. Watch this space." - October 2015, J. Andrew Rogers.

I emailed you back in 2016 to inquire about your work and wondered what had become of SpaceCurve. (Thank you for replying!) You mentioned recent work then on a "modality architecture." Is that related to the work you mentioned in your post above?

Obviously, you're a busy man with a desire and the potential to change the world with your creations. But perhaps also a drive to withhold your creations from public display until you have distilled them to their purest elegance?

If it is your intention to eventually share, I encourage you to do the world a great favor and just share what you've got so far (with a "no guarantees; no support" reminder in your README), even if some corners are unpolished, inscrutable, or built on shifting ideas. With an appropriate license, you'll at least get the benefit of easily taking bits of your implementations with you between projects, even without supporting anyone else who consumes it.

Do you have any peers who are familiar enough and excited about your work to start writing up some posts laying out the conceptual ground-work? Have there been any relevant research papers or books published that would be foundational to understanding? Maybe start with links to those? I'd devour them!

On the other hand, perhaps you are motivated not to share, while your skills are highly marketable due to near exclusivity? If so, I certainly don't begrudge you that! And like you said, you have no obligations. :)


"In essence, you can only make money if you are doing hardcore R&D. This strongly incentivizes the creation of new capabilities but also disincentivizes publication of CS research.

"You see this in markets like databases, where open source has captured almost the entire market for undifferentiated capabilities, and there is a lucrative high-end market with unique product capabilities that don't exist in open source or CS literature. The trend toward treating CS research as trade secrets, originally started because algorithm patents were impractical to enforce, turned out to be effective at maintaining profitability in high-end software products if open source can't replicate capability."

https://news.ycombinator.com/item?id=20196610

Ah drat, apparently my fears are confirmed. If you should someday have enough money and not enough fame, I'll be eagerly looking forward to hearing the lessons you're willing to share.


Can you point to some companies and use cases they have?

Perhaps some of the companies have done conference talks on the systems built on top of your research?



