Show HN: PugSQL, a Python Port of HugSQL

avolcano · on May 29, 2019

I really like the idea of this! Potentially quite helpful for the GraphQL space, I think, since these query files could map 1:1 with resolvers (if I understand GraphQL correctly).

I've been working on a Node+TypeScript web app on and off for the last couple years, and one thing that's always bugged me with it is database access - it uses Knex.js as a query builder. Knex is a solid (if imperfect) DSL for interacting with SQL, but as time goes on and I get more comfortable with SQL, I've started wishing I could write raw SQL instead more easily. I think an architecture like PugSQL might help bridge the gap between "passing a bunch of SQL strings around" and a query builder.

Slightly off topic further thinking - one problem I've always had with Knex and TypeScript, though, is the lack of static typing - I've been writing runtime validations for each individual query result. This has been a bit annoying to maintain at scale since I don't have very good patterns for it. With a system like PugSQL, though, I could imagine just having input and output validators for each parameterized query.

Of course, the long term dream would be to generate type definitions from the SQL files, but I assume that would require a heck of a lot of magic (e.g. "actually run a query, figure out what the schema of the result table is, and create a snapshot of that"). I haven't seen a lot of prior art in terms of "static typing of DB access without a big ol' ORM," but I'm hopeful there's some options.

petetnt · on May 29, 2019

> Of course, the long term dream would be to generate type definitions from the SQL files, but I assume that would require a heck of a lot of magic (e.g. "actually run a query, figure out what the schema of the result table is, and create a snapshot of that"). I haven't seen a lot of prior art in terms of "static typing of DB access without a big ol' ORM,"

Regarding GraphQL, a combination of something like Postgraphile[0] and graphql-code-generator[1] or graphqlgen[2] gets you pretty much there without writing a single line of code.

[0] https://www.graphile.org/ [1] https://github.com/dotansimha/graphql-code-generator [2] https://oss.prisma.io/graphqlgen/

Rotareti · on May 30, 2019

> I've been writing runtime validations for each individual query result. This has been a bit annoying to maintain at scale since I don't have very good patterns for it.

I think pydantic [0] solves this problem perfectly. The objects you create with pydantic work with type systems such as mypy too, so you get all the editor support for them, if you want.

https://github.com/samuelcolvin/pydantic/

cjauvin · on May 29, 2019

The context seems relevant to plug my own take at this "problem" (aka. finding an alternative to a full-blown Python ORM), which involves talking to Postgres via only builtin data structures:

https://github.com/cjauvin/little_pger

benatkin · on May 29, 2019

Perhaps slightly unrelated: I'm considering moving to asyncpg and using quart, which is a port of flask to async python. What I wonder if it's time to start using async python, and if these libraries are mature enough. If so, I hope libraries like this and little_pger will switch to it or support it!

https://github.com/MagicStack/asyncpg https://gitlab.com/pgjones/quart

BerislavLopac · on May 30, 2019

Instead of Quart and asyncpg, I recommend you to take a look at Starlette [0] and databases [1].

[0] https://www.starlette.io/

[1] https://github.com/encode/databases

linsomniac · on May 29, 2019

I haven't yet used this model, but a previous HN discussion months ago brought Quart and asyncpg to my attention and my memory was the discussion was very favorable.

bpicolo · on May 30, 2019

I really like Quart as an almost drop-in Flask, asyncio alternative.

holtalanm · on May 29, 2019

i really really want to work on porting HugSQL to Elixir. debating starting on that for my next side project.

i remember when learning clojure, HugSQL was one my of favorite things ever. it was just...clean, simple, and awesome.

kungfooguru · on May 29, 2019

If you do, please use https://hex.pm/packages/eql (https://github.com/artemeff/eql) for the file parser :)

holtalanm · on May 29, 2019

o.0 thanks for this. reading up on the docs now.

arthurbrown · on May 30, 2019

Ecto is one of the very best things about elixir from my perspective. It seems bizarre that anyone would have this great tool uniquely available to their language and want to ignore it.

holtalanm · on May 30, 2019

ecto is awesome, don't get me wrong. i really enjoy it. not saying the slightest i want to ignore it, but i simply found HugSQL to be an absolute pleasure to use when I was learning Clojure. I like ecto, but i liked HugSQL more.

loop0 · on May 29, 2019

I started the same project with the same name a few months ago, I’m happy to see someone went farther than I did. Congrats on this project, I’m testing later. HugSQL achieves the right balance between orm and boilerplate code.

fulafel · on May 29, 2019

Nice to see Python ports of Clojure libraries. Are there others? Is there something close to spec or Plumatic Schema?

jjwiseman · on May 29, 2019

I want a Python port of instaparse!

born2discover · on May 29, 2019

Interesting. But what a potential use case for this? I mean what makes it stand out when put side by side with SQLAlchemy? Does it do anything differently?

From what I've been able to gather from that website, PugSQL is a wrapper around SQLAlchemy. So my question, why do we need a wrapper around an already well established, popular, robust and very powerful library?

phaer · on May 29, 2019

I haven't used PugSQL yet, but I like the principle because it allows to keep SQL and Python code in separate files, which makes it easier to use linters or even just proper syntax highlighters for the SQL compared to strings embedded into python or other programming languages.

sametmax · on May 29, 2019

First, good linters and syntax highlighters are supposed to work in embedded code. It works for HTML, JS, CSS. It works for docstring in Python, regex or f-strings. But the tooling state of SQL is quite terrible.

Secondly, you can do that with Path.read_text() and pass it to SQLalchemy-core if a query is so complex you want to (although you would lose many other architectural benefits by doing so, it is sometime necessary).

But writing your entire program like that seems very slow, SQL being verbose, and you having to write additional wrappers on top of it anyway.

phaer · on May 29, 2019

> First, good linters and syntax highlighters are supposed to work in embedded code. It works for HTML, JS, CSS. It works for docstring in Python, regex or f-strings. But the tooling state of SQL is quite terrible.

As you write, in practice it sucks for SQL. Formatting is another issue, can become quite awkward in embedded code

> Secondly, you can do that with Path.read_text() and pass it to SQLalchemy-core if a query is so complex you want to

Yes, if you use one file per query. Otherwise you'd need something similar to the naming-scheme of PugSQL, just implemented by your self.

> But writing your entire program like that seems very slow, SQL being verbose, and you having to write additional wrappers on top of it anyway.

This depends entirely on your program and/or problem domain. It might be overkill for simple CRUD applications with a few INSERT/UPDATE/SELECT queries, but might make a lot of sense if you use more advanced SQL features.

fulafel · on May 30, 2019

One clear advantage of the yesql/hugsql style is that you get to manage your SQL files in a consistent way. You're going to have migration files and stored procedure/view/etc definitions in .sql files too. This lets you export your queries as Clojure/Python functions simply but still have them live next to the rest of the SQL, for easy refactoring etc.

Scarbutt · on May 29, 2019

Looks like sqlalchemy is just used for the "being able to handle multiple databases" part, not for its higher level abstractions.

michelpp · on May 29, 2019

You may of course already be aware of this but it's worth pointing out. To many, SQLAlchemy is an ORM, but the SA developers very purposefully made the ORM layer completely optional. The SQLAlchamy-Core layer can be used to functionally compose queries without mapping types at all, and below that, SA can be used as an abstract but featureful generic client for executing just raw SQL.

Drdrdrq · on May 30, 2019

Thank you (and sibling for the example), I was not aware of that. I did check SA out, but missed this somehow, so I'm using psycopg2 directly now... But if I gain portability to other DBs for low effort, then I'm all for it. Thanks!

eshyong · on May 29, 2019

FYI, you can also execute queries directly against a database in SQLAlchemy:

    engine = create_engine('mysql://scott:tiger@localhost/test')
    connection = engine.connect()
    result = connection.execute("select username from users")
    for row in result:
        print("username:", row['username'])
    connection.close()

nicwolff · on May 30, 2019

SQLAlchemy Core is powerful but hardly complete; every real-world application I've written with it has ended up needing substantial hand-written SQL to cover the gaps. So I see a clear use case on top of SQLAlchemy to organize those queries.

buremba · on May 29, 2019

The Java/Kotlin alternative would be `the amazing library` JDBI: http://jdbi.org/

BossingAround · on May 29, 2019

Wow, this looks really good. Is there a different de-facto standard library for connecting to a DB from Java other than Hibernate? After using Hibernate for one of my extremely simple projects, I felt like I was using a tank to kill mosquitoes.

encloser · on May 29, 2019

JDBC is the low level connection standard. Everything I am aware of is built ontop of JDBC.

Many people use the JPA libraries like Hibernate and EclipseLink. Another popular ORM is (my|i)batis. JDBI also seems to have a decent following.

After years of using ORMs, I vastly prefer interfaces like HugSQL unless I'm building a dead simple CRUD app. (I'm never building simple CRUD apps.)

BossingAround · on May 30, 2019

Are you building the DB layer in Clojure and call it from Java then, or are you writing your whole app in Clojure?

gigatexal · on May 29, 2019

Very cool! Kudos to the dev(s)

cwp · on May 29, 2019

Awesome. I do the same sort of thing in Javascript and it works great. Nice to see it available in Python.

_pgmf · on May 29, 2019

A simple Python interface ... built atop the most powerful and widely used Python ORM. This is the library equivalent of virtualization!

This is crazy, right? Let's take SQLAlchemy, all the experience and expertise that went into building it, throw that out the window and make the dumbest possible wrapper on top.

zzzeek · on May 29, 2019

Charles, you gotta calm down.

I love these wrappers around Core. People who hate my ORM get to use my library anyway, the community comes to me and continues to help stability and performance improvements at that level in any case. the ORM was never intended to please everybody; Core was :)

infinite8s · on May 30, 2019

Core is awesome! It allows me to write complex on-demand analytical queries using a proper AST model instead of gluing together strings - basically the equivalent of my own Tableau or Looker. Most SQL libraries are the equivalent of building a parse tree by gluing together linear strings.

_pgmf · on May 29, 2019

Mike, you're precious!

sametmax · on May 29, 2019

Especially since SQLAlchemy has sqlalchemy-core, which allow to express pretty much any regular SQL, including proprietary one and very complex report generating query, but with type-casting, data checks, code completion, linting and a uniform API no matter the dialect or project.

I mean, we are not talking about just an ORM. SQLAlchemy is damn god-like powerful.

For some perspective: https://docs.sqlalchemy.org/en/13/core/tutorial.html

nathell · on May 29, 2019

This is not an ORM.

PugSQL is a Python incarnation of HugSQL, which was inspired by yesql, whose rationale you might want to read:

https://github.com/krisajenkins/yesql#rationale

sametmax · on May 29, 2019

SQLAlchemy-core is not an ORM either, the ORM is based on it.

You can already write raw sql with it in a separate file if needed, and make a function out of it. I can't see a benefit to adding PugSQL on top of it, unless you plan to use the SQL from several languages.

Other than that, what will happen is that you will end up writing abstraction layers on top of it anyway. Gotta validate those data. Gotta provide a unified API to the rest of the program. And you will rewrite a poorly tested, less expressive sqlalchemy-core.

Not to mention you lose the benefit of being able to create a lib that can talk to several databases, code completion, linting, etc. That are much better in Python than SQL.

So, on one hand: full power of SQL if needed, plenty of additional features when not. On the other hand... what ?

ptttr · on May 30, 2019

That's what I love about Clojure culture. It started as an acknowledgment of weaknesses of dominant solutions (isn't ORM an anti-pattern?) and lead to this beautiful story of open source: Yesql -> HugSQL -> PugSQL

It's interesting to watch as it clashes here with one-way-of-doing-things Python culture.

ethanpil · on May 29, 2019

Reading the documentation and looking at examples, I think you are being a bit harsh. This project looks to me like an interesting, intuitive and useful abstraction layer for organizing a project's DB queries.

pbecotte · on May 29, 2019

Imagine a query - `select * from users`. Now, in my webapp, the user can pass a username, in which case I want to do `select * from users where username=<some string>`.

In naive sql land, people solve this with string concatenations. Its... fine, but there is a lot of complexity to getting it right.

In sqlalchemy you can do `q = session.query(Users)` followed by `if username: q = q.filter(Users.username == mystring)`. The library handles all the concatenation and type conversions and whatnot for you.

So far as I can tell- with this you literally just write both queries and call a different one? I don't see ANY way of doing code reuse. And in my toy example that may not seem like a big deal, but I remember lots of functions where I had tens of endpoints that each were making similar queries, with a number of optional arguments to each. With sqlalchemy, we can build a "active user filter" function, and reuse it everywhere. This seems to take away that option.

BerislavLopac · on May 30, 2019

I see this as a perfectly complementary tool to SQLAlchemy Core itself: you can write your queries in Core, reusing as much as necessary -- but there is always a couple of insanely complex queries that join several tables and subqueries and do all kind of magic to get the right report. In this case PugSQL kicks in, and you can keep your complex queries in SQL and call them from Python as needed.

infinite8s · on May 30, 2019

I've found that most of these insanely complex analytic type queries are easy to model in Core if you treat Core as a DSL and use good programming practice (composition, separation of concerns, etc). The final queries end up being beasts, but if the Core-based pieces are modeled well they are basically equivalent to one I would have written by hand, but with the extraordinary benefit that they can be much more easily parameterized (imagine a financial report where you roll up Year-to-Date, Month-to-Date, Week-to-Date metrics, so using lots of window functions - with the proper structure it's trivial to add a further breakdown allowing you to group by any other metric). I've basically used it to create my own version of Looker.

balfirevic · on May 29, 2019

It's either two queries or something like this:

select * from users where username = :username or :username is null

That can be considered slightly better than two queries or slightly worse, depending on who you ask. And it is indeed the state of query reuse if you are not using at least decently smart query builder.

My impression is that many people simply don't have the kind of requirements that call for such advanced run-time query composition.

nrjames · on May 29, 2019

Dataset is my favorite dumb wrapper on top of SQLAlchemy. https://dataset.readthedocs.io/en/latest/

gtycomb · on May 29, 2019

Thanks for mentioning, I didn't know of this. A classic Python module it is, sweet and simple.

robodale · on May 29, 2019

Watch your naming - pug is also a template engine for npm.

mcfunley · on May 29, 2019

If you want to put sql in your templates, I can't help you

benbristow · on May 30, 2019

I mean if you run a website with pages about SQL you might want to.

gmueckl · on May 29, 2019

Naming collisions for open source software are the norm rather than the exception. At some point you just have to start living with it.

lkschubert8 · on May 29, 2019

It's probably safe to call it PugSQL though.

noir_lord · on May 29, 2019

Aye but the reason it's called pug is they got into problems because Jade was already used.

Naming is hard.

I really like pug though, with Vue SFC it's really clean.