
The current benchmark for real-world text-to-SQL is https://spider2-sql.github.io/ and the SOTA is at 64%.

Anyone selling you 99% accuracy can prove it there first.


Cofounder of one of those analytics agents here (https://getdot.ai).

The promise of the technology is not that it can handle any arbitrarily complex enterprise setup, but rather that you expose it to a controlled, sufficiently good data model with enough guidance.

Depending on your use case this can be super valuable as it enables a lot more people to use data and get relevant recommendations.

But yeah it's work to make nice roads and put up signs everywhere.


I am working with Databricks' Genies. I have _very_ complex enterprise data schemas. Genies, and from what I can tell your product too, work on a set of ~20 tables and expect a well-thought-out, documented data model.

I have hundreds of tables designed by several different teams. I do have decent documentation on the tables, but if I had a nice, organized data model I wouldn't need an AI assistant. If I had a perfect data model, my team could write simple SQL queries, or give ChatGPT a schema dump plus a natural-language query and it would get the answer most of the time.

IMHO, the big value in this space will be when these tools can wrangle realistic databases.


Well, I can't comment much on Genie, but the core question is always how you scale the complexity.

In Dot, it's divide and conquer. If you have several different teams each of them has to maintain their knowledge base.

A bunch of our customers have fewer than 10 tables hooked up to Dot, but this data is core to their business, so the analytics agent is really useful. Our most complex setup spans more than 5,000 tables, but that took a lot more work to lay out the structure and guidelines.

Also, I don't think all organizations are ready for AI. If the data model is a huge mess, data quality is poor, and analytics use cases are not mature, it's better to focus on the fundamentals first.


I don't see Grok there


I’ve had tattoos removed and am currently having two more removed. They are all 20 years old, very faded, and blurry; they honestly just didn’t look good anymore.

The first removal, 5 years ago, left me bleeding and required bandage changes for a week.

The new tattoo-removal lasers leave you with what feels like a sunburn for two days. No blood, no bandages.

It will become way more common.


Are the new lasers very painful when applied?


The darker the ink, the more it hurts. However, if you apply the expensive recommended creams beforehand it is very bearable; anyone can endure a 20-minute removal session.

Pretty much, if you could handle getting the tattoo, you can handle getting it removed. The first sessions hurt a bit, but I have vague memories that it hurt like hell when I got it done blackout drunk in Ayia Napa, Cyprus, as an 18-year-old as well.

If you're thinking about removing a tattoo, pain should be really low on your list of considerations. I genuinely mean it, both because sessions are short and because it doesn't hurt much.


It’s $275 a month plus $99 every 6 months at SlimDownRX. The compounding pharmacies are helping.


It won't be for long. That price can only exist because of the FDA shortage list, and with the official manufacturers releasing single-dose vials, it is a matter of "when," not "if," it stops being classed as a shortage and you're back to paying $1K.


Why would classifying it as a shortage make it expensive?


Classifying it as a shortage makes it cheap. It will get expensive when it is off the shortage list.


That makes no sense.


The FDA shortage list allows compounding pharmacies to produce it without paying for patents/licensing.


Being on the shortage list is what allows compounders to sell it. Once that is no longer true, that option goes away.


$39/month from the top Chinese vendors.


Tell me more…


The 14th way is “multi-way joins,” also called “worst case optimal joins,” which is a terrible name.

It means that instead of joining tables two at a time and materializing temporary results along the way (eating memory), you join 3 or more tables together without the temporary results.

There is a blog post and short video of this on https://relational.ai/blog/dovetail-join and the original paper is on https://dl.acm.org/doi/pdf/10.1145/3180143

I work for RelationalAI; we and about 4 other new database companies are bringing these new join algorithms to market after ten years in academia.
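To make the multi-way idea concrete, here is a rough generic-join-style sketch of the classic triangle query, in plain Python. This is an illustration of the technique, not RelationalAI's Dovetail implementation; the relation names R, S, T and the helper functions are hypothetical.

```python
# Triangle query Q(a,b,c) = R(a,b) JOIN S(b,c) JOIN T(a,c).
# A binary-join plan would materialize R JOIN S first (potentially huge);
# a worst-case optimal join instead binds one variable at a time,
# intersecting across ALL relations that mention it.

from collections import defaultdict

def index(rel):
    """Index a binary relation as {first: {second, ...}}."""
    d = defaultdict(set)
    for x, y in rel:
        d[x].add(y)
    return d

def triangle_join(R, S, T):
    """Enumerate triangles (a, b, c) with no intermediate result tables."""
    Ri, Si, Ti = index(R), index(S), index(T)
    for a in Ri.keys() & Ti.keys():       # bind a: must appear in R and T
        for b in Ri[a] & Si.keys():       # bind b: R(a,b) and S(b,_)
            for c in Si[b] & Ti[a]:       # bind c: S(b,c) and T(a,c)
                yield (a, b, c)

R = [(1, 2), (1, 3)]
S = [(2, 3), (3, 4)]
T = [(1, 3), (1, 4)]
print(sorted(triangle_join(R, S, T)))  # [(1, 2, 3), (1, 3, 4)]
```

The set intersections are where the "worst case optimal" guarantee comes from: no partial binding is ever extended unless every participating relation agrees it can be.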


Justin also has a post on WCOJ that's really solid:

https://justinjaffray.com/a-gentle-ish-introduction-to-worst...


Negating inputs (set complement) turns the join's `AND` into a `NOR`, as Tetris exploits.

The worst-case bounds don't tighten over (stateless/streaming) WCOJs, but much real-world data has far smaller box certificates.

One thing I didn't see is whether Dovetail join allows recursive queries (i.e., arbitrary datalog with a designated output relation, and the user having no concern about what the engine does with all the intermediate relations mentioned in the bundle of horn clauses that make up this datalog query).

Do you happen to know if it supports such queries?


You can, by using “narrow” tables (key-value and key-key). Download the slide notes from this presentation on how Relational.ai is doing it: https://www.slideshare.net/maxdemarzi/developer-intro-deckpo...


It seems like narrow tables solve having NULLs in the tables you store, but they do nothing about NULLs in the tables you create using, say, a LEFT JOIN. For example, if you create a database with Name, Postnomials, and Prenomials, some people don't have postnomials or prenomials, so even if you create three narrow tables, when you JOIN them all to form the full polite addresses you'll end up with NULLs in the result of that JOIN.
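The objection can be sketched in a few lines of Python, with dicts standing in for narrow tables (the table names and sample people are made up for illustration):

```python
# Three NULL-free "narrow tables" keyed by person id.
names       = {1: "Ada", 2: "Bob"}
postnomials = {1: "PhD"}   # Bob has no postnomial
prenomials  = {2: "Mr."}   # Ada has no prenomial

# The equivalent of a LEFT JOIN across all three: missing values reappear.
for pid, name in names.items():
    pre  = prenomials.get(pid)    # None where not applicable
    post = postnomials.get(pid)
    print(pid, pre, name, post)
# 1 None Ada PhD
# 2 Mr. Bob None
```

Each source table is NULL-free, but the joined result still needs a "not applicable" marker, which is exactly the commenter's point.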


It works a little differently in “Rel” (the query language Relational.ai uses). You would create multiple definitions of what a “full polite address” is, one for each “case” of valid arguments/empty columns, and use that going forward. A bit like a UNION without the same-column-width requirement.
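A rough Python analogue of that union-of-cases idea (this is not Rel syntax; the data and case structure are invented for illustration):

```python
# One definition per "case" of present/absent attributes,
# so no NULL placeholder is ever materialized in the result.
names       = {1: "Ada", 2: "Bob", 3: "Eve"}
prenomials  = {2: "Mr."}
postnomials = {1: "PhD"}

def polite_address(pid):
    if pid in prenomials and pid in postnomials:   # case: both present
        return f"{prenomials[pid]} {names[pid]}, {postnomials[pid]}"
    if pid in postnomials:                          # case: postnomial only
        return f"{names[pid]}, {postnomials[pid]}"
    if pid in prenomials:                           # case: prenomial only
        return f"{prenomials[pid]} {names[pid]}"
    return names[pid]                               # case: bare name

print([polite_address(p) for p in names])  # ['Ada, PhD', 'Mr. Bob', 'Eve']
```

Each case produces a complete value, so the union of the cases covers everyone without NULLs.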


That seems clumsy.

I would think the right approach to "SQL without LEFT JOIN" would be to focus on making it easier to pull down multiple related tables as distinct result sets in a single query, and have the client code work with a graph instead of hammering everything into a single tabular layout. Or leave the concept of "connect these two tables together and make them NULL where not applicable" as an exercise for the client.


Quite the opposite. The idea is to move as much of the business logic into the database. “Rel” definitions are meant to be written once and reused everywhere. Instead of letting the client decide different business logic every time, you capture and control it in one place.


They don't have joins at all because of how expensive binary joins are. NoSQL pre-joins relations (graph dbs), pre-joins foreign keys (document dbs), or pre-joins everything per query (wide-column dbs). Saying "all" is a bit of hyperbole, but it gets to the point of the matter.


It took about 10 years, but worst case optimal joins and multi-way joins in general are finally fixing the Join problem in databases that led to the proliferation of NoSQL systems over the past decade.


Author here: I did not write that I was proud of not knowing Python. I just wrote that I don't know Python. Trying to comb through 2k lines of it to see where Memgraph 'cheated' to make their product look good and the other look bad was beyond my current capabilities.


Author here to clear up a few questions: I did not run any benchmarks for Memgraph; I just ran Neo4j on my machine and compared the results to their numbers on their machine. My 8 faster cores versus their 12 slower cores, so not apples to apples, but close enough to make the point that Memgraph is not 120 times faster than Neo4j. I used to work at Neo4j, then at AWS on Neptune; I work on my own graph database, http://ragedb.com/, and work for another database company, https://relational.ai/

If you want to be my hero, find a way to fix this problem: https://maxdemarzi.com/2023/01/09/death-star-queries-in-grap...


> If you want to be my hero, find a way to fix this problem: https://maxdemarzi.com/2023/01/09/death-star-queries-in-grap...

Let me (try to) be your hero, Marzi. (Insert your favorite reference to a famous cheesy pop song, if you like.)

Couldn't you use GraphBLAS algorithms, like they do in RedisGraph (which supports Cypher, btw) to fix that problem with "death star" queries?

Those algorithms are based on linear algebra and matrix operations on sparse matrices (which are like compressed bitmaps on speed; see https://github.com/RoaringBitmap/RoaringBitmap ). The insight is that the adjacency list of a property graph is actually a matrix, and then you can use linear algebra on it. But it may require that the DB is built bottom-up with matrices in mind from the start (instead of linked lists, like Neo4j uses). Maybe your double-array approach in RageDB could be made to fit.

I think you'll find this presentation on GraphBLAS positively mind-blowing, especially from this moment: https://youtu.be/xnez6tloNSQ?t=1531

Such math-based algorithms seem perfect to optimally answer unbounded (death) star queries like “How are you connected to your neighbors and what are they?”

That way, for such queries one doesn't have to traverse the graph database as a discovery process through what each node "knows about", but could view and operate on the database from a God-like perspective, similar to table operations in relational databases.

Further reading: https://graphblas.org/
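The matrix view can be sketched in plain Python, with sets standing in for sparse Boolean matrix rows (this is an illustration of the idea, not GraphBLAS itself; the example graph is made up):

```python
# The adjacency list IS a Boolean matrix: adj[i] is row i,
# with adj[i] containing j iff there is an edge i -> j.
adj = {0: {1, 2}, 1: {2}, 2: {3}, 3: set()}

def step(frontier, adj):
    """One Boolean matrix-vector product over (OR, AND):
    all nodes exactly one hop from any node in the frontier."""
    out = set()
    for node in frontier:
        out |= adj[node]
    return out

one_hop = step({0}, adj)        # {1, 2}
two_hop = step(one_hop, adj)    # {2, 3}
print(one_hop, two_hop)
```

One "product" answers the neighborhood question for the whole frontier at once, which is the God-like whole-graph perspective the comment describes, instead of a node-by-node discovery walk.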


The lack of a schema does hurt Neo4j performance. Properties are stored "willy-nilly" in a linked list of bytes per node/relationship. No order, and an "age" property can be 45, 38.5, "adult", [18,19], false... That makes a terrible mess when aggregating, sorting, filtering, searching, etc.
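A quick Python illustration of the mess (this models the commenter's mixed-type example, not Neo4j's actual storage code):

```python
# A schemaless "age" property column can hold anything:
ages = [45, 38.5, "adult", [18, 19], False]

# Aggregating: only some values are usable as numbers
# (bool is excluded since it is a subtype of int in Python).
numeric = [a for a in ages if isinstance(a, (int, float)) and not isinstance(a, bool)]
print(sum(numeric) / len(numeric))  # 41.75, from only 2 of 5 values

# Sorting: mixed types cannot even be compared.
try:
    sorted(ages)
except TypeError as e:
    print("sort failed:", e)
```

Every aggregate, sort, and filter has to defensively type-check each value first, which is exactly the overhead a schema would eliminate.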

