Cofounder of one of those analytics agents here (https://getdot.ai).
The promise of the technology is not that it can deal with any arbitrarily complex Enterprise setup, but rather that you expose it with enough guidance on a controlled and sufficiently good data model.
Depending on your use case this can be super valuable as it enables a lot more people to use data and get relevant recommendations.
But yeah it's work to make nice roads and put up signs everywhere.
I am working with Databricks' Genies. I have _very_ complex Enterprise data schemas. Genies, and from what I can tell your product as well, work on a set of roughly 20 tables and expect a well-thought-out, documented data model.
I have hundreds of tables designed by several different teams. I do have decent documentation on the tables but if I had a nice, organized data model I wouldn't need an AI assistant. If I had a perfect data model my team could write simple SQL queries or give chatgpt a schema dump + a natural language query and it would get the answer most of the time.
IMHO, the big value in this space will be when these tools can wrangle realistic databases.
Well, I can't comment much on Genie, but the core question is always how you scale the complexity.
In Dot, it's divide and conquer. If you have several different teams each of them has to maintain their knowledge base.
A bunch of our customers have fewer than 10 tables hooked up to Dot, but this data is core to their business, so the analytics agent is really useful. Our most complex setup is on more than 5000 tables, but that required a lot more work to lay out the structure and guidelines.
Also, I don't think all organizations are ready for AI. If the data model is a huge mess, data quality is poor, and analytics use cases are not mature, it's better to focus on the fundamentals first.
The darker the ink the more it hurts. However, if you apply the (expensive) recommended creams beforehand it is very bearable; anyone can endure a 20-minute removal session.
Pretty much, if you could handle getting the tattoo you can handle getting it removed. The first times hurt a bit, but I have vague memories that it hurt like hell when I got it done blackout drunk in Ayia Napa, Cyprus as an 18-year-old as well.
If you're thinking about removing a tattoo, pain should be really low on your list of considerations. I genuinely mean it, both because sessions are short and because it doesn't hurt much.
It won't be for long. That price can only exist because of the FDA Shortage List, and with the official manufacturers releasing single-dose vials, it is only a matter of "when", not "if", it will no longer be classed as a shortage and you're back to paying $1K.
The 14th way is "multi-way joins", also called "worst-case optimal joins", which is a terrible name.
It means that instead of joining tables two at a time and materializing temporary results along the way (eating memory), you join three or more tables at once, without the temporary results.
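Here is a minimal sketch of the idea on the classic triangle query Q(a,b,c) = R(a,b) ∧ S(b,c) ∧ T(a,c): variables are bound one at a time via set intersections across all three relations, so no binary-join intermediate result is ever built. (Toy data and a toy loop, not a real engine; real WCOJ implementations like Leapfrog Triejoin use sorted trie iterators.)

```python
from collections import defaultdict

def triangles(R, S, T):
    """R, S, T are sets of (int, int) tuples; returns all (a, b, c) triangles."""
    # Index each relation by its first attribute.
    r_by_a, s_by_b, t_by_a = defaultdict(set), defaultdict(set), defaultdict(set)
    for a, b in R: r_by_a[a].add(b)
    for b, c in S: s_by_b[b].add(c)
    for a, c in T: t_by_a[a].add(c)

    out = []
    for a in r_by_a.keys() & t_by_a.keys():      # bind a: must appear in R and T
        for b in r_by_a[a] & s_by_b.keys():      # bind b: R-neighbors of a that start S tuples
            for c in s_by_b[b] & t_by_a[a]:      # bind c: intersection of S and T candidates
                out.append((a, b, c))
    return out

R = {(1, 2), (1, 4)}
S = {(2, 3), (4, 5)}
T = {(1, 3)}
print(triangles(R, S, T))  # [(1, 2, 3)]
```

A pairwise plan (R ⋈ S first) would materialize (1,2,3) and (1,4,5) before filtering against T; the intersection-per-variable approach never holds that dead tuple.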
Negating inputs (set complement) turns the join's `AND` into a `NOR`, as Tetris exploits.
The worst-case bounds don't tighten over (stateless/streaming) WCOJs, but much real-world data has far smaller box certificates.
One thing I didn't see is whether Dovetail join allows recursive queries (i.e., arbitrary datalog with a designated output relation, and the user having no concern about what the engine does with all the intermediate relations mentioned in the bundle of horn clauses that make up this datalog query).
Do you happen to know if it supports such queries?
It seems like narrow tables solve having NULLs in the tables you store, but they do nothing about NULLs in the tables you create using, say, a LEFT JOIN. For example, if you create a database with Name, Postnomials, and Prenomials, some people don't have Postnomials or Prenomials, so even if you create three narrow tables, when you JOIN them all to form the full polite addresses, you'll end up with NULLs in the result of that JOIN.
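A quick illustration of this point, using a hypothetical three-narrow-table schema in SQLite: each narrow table is NULL-free by construction, but the LEFT JOIN reintroduces NULLs the moment you assemble the wide result.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    -- Three narrow tables, each with NOT NULL columns only.
    CREATE TABLE name       (person_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE prenomial  (person_id INTEGER PRIMARY KEY, prenomial TEXT NOT NULL);
    CREATE TABLE postnomial (person_id INTEGER PRIMARY KEY, postnomial TEXT NOT NULL);
    INSERT INTO name       VALUES (1, 'Ada Lovelace'), (2, 'Alan Turing');
    INSERT INTO prenomial  VALUES (1, 'Countess');   -- only person 1 has one
    INSERT INTO postnomial VALUES (2, 'OBE');        -- only person 2 has one
""")

rows = db.execute("""
    SELECT n.name, pre.prenomial, post.postnomial
    FROM name n
    LEFT JOIN prenomial  pre  ON pre.person_id  = n.person_id
    LEFT JOIN postnomial post ON post.person_id = n.person_id
    ORDER BY n.person_id
""").fetchall()
print(rows)
# [('Ada Lovelace', 'Countess', None), ('Alan Turing', None, 'OBE')]
```

The NULLs were never stored anywhere; they are manufactured by the join itself, which is exactly the gap narrow tables don't close.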
It works a little differently in "Rel" (the query language Relational.ai uses). You would create multiple definitions of what a "full polite address" is for each "case" of valid arguments/empty columns and use that going forward. A bit like a UNION without the same-column-width requirement.
I would think the right approach to "SQL without LEFT JOIN" would be just to focus on making pulling down multiple related tables as distinct resultsets in a single query easier and have the client code work with a graph instead of hammering everything into a single tabular layout. Or leave the concept of "connect these two tables together and make them NULL where not applicable" as an exercise for the client.
Quite the opposite. The idea is to move as much of the business logic into the database. “Rel” definitions are meant to be written once and reused everywhere. Instead of letting the client decide different business logic every time, you capture and control it in one place.
They don't have joins at all because of how expensive binary joins are to do. NoSQL pre-joins relations (graph dbs), pre-joins foreign keys (document dbs), or pre-joins everything/queries (wide-column dbs). Saying "all" is a bit of hyperbole, but it gets to the point of the matter.
It took about 10 years, but worst case optimal joins and multi-way joins in general are finally fixing the Join problem in databases that led to the proliferation of NoSQL systems over the past decade.
Author here: I did not write that I was proud of not knowing Python. I just wrote that I don't know Python. The thought of trying to understand 2k lines of it looking to see where Memgraph 'cheated' to make their product look good and the other bad was beyond my current capabilities.
Author here to clear up a few questions: I did not run any benchmarks for Memgraph, just Neo4j on my machine, and compared those to their published numbers on their machine. That's my 8 faster cores against their 12 slower cores, so not apples to apples, but close enough to make the point that Memgraph is not 120x faster than Neo4j. I used to work at Neo4j, then at AWS on Neptune; I now work on my own graph database http://ragedb.com/ and work for another database company https://relational.ai/
Let me (try to) be your hero, Marzi. (Insert favorite reference to famous cheesy pop song, if you like.)
Couldn't you use GraphBLAS algorithms, like they do in RedisGraph (which supports Cypher, btw) to fix that problem with "death star" queries?
Those algorithms are based on linear algebra and matrix operations on sparse matrices (which are like compressed bitmaps on speed, re: https://github.com/RoaringBitmap/RoaringBitmap ). The insight is that the adjacency list of a property graph is actually a matrix, and then you can use linear algebra on it. But it may require that the DB be built bottom-up with matrices in mind from the start (instead of linked lists, like Neo4j does). Maybe your double-array approach in RageDB could be made to fit.
I think you'll find this presentation on GraphBLAS positively mind-blowing, especially from this moment: https://youtu.be/xnez6tloNSQ?t=1531
Such math-based algorithms seem perfect to optimally answer unbounded (death) star queries like “How are you connected to your neighbors and what are they?”
That way, for such queries one doesn't have to traverse the graph database as a discovery process through what each node "knows about", but could view and operate on the database from a God-like perspective, similar to table operations in relational databases.
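To make the "graph as matrix" idea concrete, here is a stdlib-only sketch (toy data, not GraphBLAS itself): the adjacency structure is a sparse boolean matrix, and one expansion step of a star/neighborhood query is a matrix-vector product over the (AND, OR) semiring, so you operate on whole frontiers at once instead of chasing pointers node by node.

```python
def matvec_bool(adj, frontier):
    """One expansion step: new frontier = frontier * A over the boolean semiring.
    adj is a sparse boolean matrix stored as {row: set of column indices}."""
    out = set()
    for src in frontier:
        out |= adj.get(src, set())  # OR together the rows selected by the frontier
    return out

# Sparse adjacency "matrix" of a small directed graph: 0->1, 0->2, 1->3, 2->3, 3->4
adj = {0: {1, 2}, 1: {3}, 2: {3}, 3: {4}}

one_hop = matvec_bool(adj, {0})       # direct neighbors of node 0
two_hop = matvec_bool(adj, one_hop)   # neighbors-of-neighbors, one algebra step
print(one_hop, two_hop)               # {1, 2} {3}
```

GraphBLAS does the same thing on compressed sparse matrices with pluggable semirings (e.g., (+, *) for path counting, (min, +) for shortest paths), which is why it handles unbounded star queries in bulk.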
The lack of a schema does hurt Neo4j performance. Properties are stored willy-nilly in a linked list of bytes per node/relationship. No order; an "age" property can be 45, 38.5, "adult", [18,19], false... and that makes a terrible mess when aggregating, sorting, filtering, searching, etc.
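A tiny demonstration of why those mixed-type "age" values are a mess (toy data; Python standing in for the database's aggregation layer): heterogeneous values can't even be ordered, so every sort/filter/aggregate first has to decide what to do with the junk.

```python
ages = [45, 38.5, "adult", [18, 19], False]

# Sorting mixed types is undefined -- exactly the problem an aggregator faces.
try:
    sorted(ages)
except TypeError as e:
    print("sort failed:", e)

# To aggregate at all, you must filter down to one type first
# (note: bool is excluded explicitly, since bool is a subclass of int).
numeric = [a for a in ages if isinstance(a, (int, float)) and not isinstance(a, bool)]
print(sum(numeric) / len(numeric))  # 41.75
```

With a schema, that type check happens once at write time instead of on every query.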
Anyone selling you 99% accuracy can prove it there first.