Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
Show HN: PugSQL, a Python Port of HugSQL (pugsql.org)
155 points by mcfunley on May 29, 2019 | hide | past | favorite | 53 comments


I really like the idea of this! Potentially quite helpful for the GraphQL space, I think, since these query files could map 1:1 with resolvers (if I understand GraphQL correctly).

I've been working on a Node+TypeScript web app on and off for the last couple years, and one thing that's always bugged me with it is database access - it uses Knex.js as a query builder. Knex is a solid (if imperfect) DSL for interacting with SQL, but as time goes on and I get more comfortable with SQL, I've started wishing I could write raw SQL instead more easily. I think an architecture like PugSQL might help bridge the gap between "passing a bunch of SQL strings around" and a query builder.

Slightly off topic further thinking - one problem I've always had with Knex and TypeScript, though, is the lack of static typing - I've been writing runtime validations for each individual query result. This has been a bit annoying to maintain at scale since I don't have very good patterns for it. With a system like PugSQL, though, I could imagine just having input and output validators for each parameterized query.

Of course, the long term dream would be to generate type definitions from the SQL files, but I assume that would require a heck of a lot of magic (e.g. "actually run a query, figure out what the schema of the result table is, and create a snapshot of that"). I haven't seen a lot of prior art in terms of "static typing of DB access without a big ol' ORM," but I'm hopeful there's some options.


> Of course, the long term dream would be to generate type definitions from the SQL files, but I assume that would require a heck of a lot of magic (e.g. "actually run a query, figure out what the schema of the result table is, and create a snapshot of that"). I haven't seen a lot of prior art in terms of "static typing of DB access without a big ol' ORM,"

Regarding GraphQL, a combination of something like Postgraphile[0] and graphql-code-generator[1] or graphqlgen[2] gets you pretty much there without writing a single line of code.

[0] https://www.graphile.org/ [1] https://github.com/dotansimha/graphql-code-generator [2] https://oss.prisma.io/graphqlgen/


> I've been writing runtime validations for each individual query result. This has been a bit annoying to maintain at scale since I don't have very good patterns for it.

I think pydantic [0] solves this problem perfectly. The objects you create with pydantic work with type systems such as mypy too, so you get all the editor support for them, if you want.

https://github.com/samuelcolvin/pydantic/


The context seems relevant to plug my own take at this "problem" (aka. finding an alternative to a full-blown Python ORM), which involves talking to Postgres via only builtin data structures:

https://github.com/cjauvin/little_pger


Perhaps slightly unrelated: I'm considering moving to asyncpg and using quart, which is a port of flask to async python. What I wonder if it's time to start using async python, and if these libraries are mature enough. If so, I hope libraries like this and little_pger will switch to it or support it!

https://github.com/MagicStack/asyncpg https://gitlab.com/pgjones/quart


Instead of Quart and asyncpg, I recommend you to take a look at Starlette [0] and databases [1].

[0] https://www.starlette.io/

[1] https://github.com/encode/databases


I haven't yet used this model, but a previous HN discussion months ago brought Quart and asyncpg to my attention and my memory was the discussion was very favorable.


I really like Quart as an almost drop-in Flask, asyncio alternative.


i really really want to work on porting HugSQL to Elixir. debating starting on that for my next side project.

i remember when learning clojure, HugSQL was one my of favorite things ever. it was just...clean, simple, and awesome.


If you do, please use https://hex.pm/packages/eql (https://github.com/artemeff/eql) for the file parser :)


o.0 thanks for this. reading up on the docs now.


Ecto is one of the very best things about elixir from my perspective. It seems bizarre that anyone would have this great tool uniquely available to their language and want to ignore it.


ecto is awesome, don't get me wrong. i really enjoy it. not saying the slightest i want to ignore it, but i simply found HugSQL to be an absolute pleasure to use when I was learning Clojure. I like ecto, but i liked HugSQL more.


I started the same project with the same name a few months ago, I’m happy to see someone went farther than I did. Congrats on this project, I’m testing later. HugSQL achieves the right balance between orm and boilerplate code.


Nice to see Python ports of Clojure libraries. Are there others? Is there something close to spec or Plumatic Schema?


I want a Python port of instaparse!


Interesting. But what a potential use case for this? I mean what makes it stand out when put side by side with SQLAlchemy? Does it do anything differently?

From what I've been able to gather from that website, PugSQL is a wrapper around SQLAlchemy. So my question, why do we need a wrapper around an already well established, popular, robust and very powerful library?


I haven't used PugSQL yet, but I like the principle because it allows to keep SQL and Python code in separate files, which makes it easier to use linters or even just proper syntax highlighters for the SQL compared to strings embedded into python or other programming languages.


First, good linters and syntax highlighters are supposed to work in embedded code. It works for HTML, JS, CSS. It works for docstring in Python, regex or f-strings. But the tooling state of SQL is quite terrible.

Secondly, you can do that with Path.read_text() and pass it to SQLalchemy-core if a query is so complex you want to (although you would lose many other architectural benefits by doing so, it is sometime necessary).

But writing your entire program like that seems very slow, SQL being verbose, and you having to write additional wrappers on top of it anyway.


> First, good linters and syntax highlighters are supposed to work in embedded code. It works for HTML, JS, CSS. It works for docstring in Python, regex or f-strings. But the tooling state of SQL is quite terrible.

As you write, in practice it sucks for SQL. Formatting is another issue, can become quite awkward in embedded code

> Secondly, you can do that with Path.read_text() and pass it to SQLalchemy-core if a query is so complex you want to

Yes, if you use one file per query. Otherwise you'd need something similar to the naming-scheme of PugSQL, just implemented by your self.

> But writing your entire program like that seems very slow, SQL being verbose, and you having to write additional wrappers on top of it anyway.

This depends entirely on your program and/or problem domain. It might be overkill for simple CRUD applications with a few INSERT/UPDATE/SELECT queries, but might make a lot of sense if you use more advanced SQL features.


One clear advantage of the yesql/hugsql style is that you get to manage your SQL files in a consistent way. You're going to have migration files and stored procedure/view/etc definitions in .sql files too. This lets you export your queries as Clojure/Python functions simply but still have them live next to the rest of the SQL, for easy refactoring etc.


Looks like sqlalchemy is just used for the "being able to handle multiple databases" part, not for its higher level abstractions.


You may of course already be aware of this but it's worth pointing out. To many, SQLAlchemy is an ORM, but the SA developers very purposefully made the ORM layer completely optional. The SQLAlchamy-Core layer can be used to functionally compose queries without mapping types at all, and below that, SA can be used as an abstract but featureful generic client for executing just raw SQL.


Thank you (and sibling for the example), I was not aware of that. I did check SA out, but missed this somehow, so I'm using psycopg2 directly now... But if I gain portability to other DBs for low effort, then I'm all for it. Thanks!


FYI, you can also execute queries directly against a database in SQLAlchemy:

    engine = create_engine('mysql://scott:tiger@localhost/test')
    connection = engine.connect()
    result = connection.execute("select username from users")
    for row in result:
        print("username:", row['username'])
    connection.close()


SQLAlchemy Core is powerful but hardly complete; every real-world application I've written with it has ended up needing substantial hand-written SQL to cover the gaps. So I see a clear use case on top of SQLAlchemy to organize those queries.


The Java/Kotlin alternative would be `the amazing library` JDBI: http://jdbi.org/


Wow, this looks really good. Is there a different de-facto standard library for connecting to a DB from Java other than Hibernate? After using Hibernate for one of my extremely simple projects, I felt like I was using a tank to kill mosquitoes.


JDBC is the low level connection standard. Everything I am aware of is built ontop of JDBC.

Many people use the JPA libraries like Hibernate and EclipseLink. Another popular ORM is (my|i)batis. JDBI also seems to have a decent following.

After years of using ORMs, I vastly prefer interfaces like HugSQL unless I'm building a dead simple CRUD app. (I'm never building simple CRUD apps.)


Are you building the DB layer in Clojure and call it from Java then, or are you writing your whole app in Clojure?


Very cool! Kudos to the dev(s)


Awesome. I do the same sort of thing in Javascript and it works great. Nice to see it available in Python.


A simple Python interface ... built atop the most powerful and widely used Python ORM. This is the library equivalent of virtualization!

This is crazy, right? Let's take SQLAlchemy, all the experience and expertise that went into building it, throw that out the window and make the dumbest possible wrapper on top.


Charles, you gotta calm down.

I love these wrappers around Core. People who hate my ORM get to use my library anyway, the community comes to me and continues to help stability and performance improvements at that level in any case. the ORM was never intended to please everybody; Core was :)


Core is awesome! It allows me to write complex on-demand analytical queries using a proper AST model instead of gluing together strings - basically the equivalent of my own Tableau or Looker. Most SQL libraries are the equivalent of building a parse tree by gluing together linear strings.


Mike, you're precious!


Especially since SQLAlchemy has sqlalchemy-core, which allow to express pretty much any regular SQL, including proprietary one and very complex report generating query, but with type-casting, data checks, code completion, linting and a uniform API no matter the dialect or project.

I mean, we are not talking about just an ORM. SQLAlchemy is damn god-like powerful.

For some perspective: https://docs.sqlalchemy.org/en/13/core/tutorial.html


This is not an ORM.

PugSQL is a Python incarnation of HugSQL, which was inspired by yesql, whose rationale you might want to read:

https://github.com/krisajenkins/yesql#rationale


SQLAlchemy-core is not an ORM either, the ORM is based on it.

You can already write raw sql with it in a separate file if needed, and make a function out of it. I can't see a benefit to adding PugSQL on top of it, unless you plan to use the SQL from several languages.

Other than that, what will happen is that you will end up writing abstraction layers on top of it anyway. Gotta validate those data. Gotta provide a unified API to the rest of the program. And you will rewrite a poorly tested, less expressive sqlalchemy-core.

Not to mention you lose the benefit of being able to create a lib that can talk to several databases, code completion, linting, etc. That are much better in Python than SQL.

So, on one hand: full power of SQL if needed, plenty of additional features when not. On the other hand... what ?


That's what I love about Clojure culture. It started as an acknowledgment of weaknesses of dominant solutions (isn't ORM an anti-pattern?) and lead to this beautiful story of open source: Yesql -> HugSQL -> PugSQL

It's interesting to watch as it clashes here with one-way-of-doing-things Python culture.


Reading the documentation and looking at examples, I think you are being a bit harsh. This project looks to me like an interesting, intuitive and useful abstraction layer for organizing a project's DB queries.


Imagine a query - `select * from users`. Now, in my webapp, the user can pass a username, in which case I want to do `select * from users where username=<some string>`.

In naive sql land, people solve this with string concatenations. Its... fine, but there is a lot of complexity to getting it right.

In sqlalchemy you can do `q = session.query(Users)` followed by `if username: q = q.filter(Users.username == mystring)`. The library handles all the concatenation and type conversions and whatnot for you.

So far as I can tell- with this you literally just write both queries and call a different one? I don't see ANY way of doing code reuse. And in my toy example that may not seem like a big deal, but I remember lots of functions where I had tens of endpoints that each were making similar queries, with a number of optional arguments to each. With sqlalchemy, we can build a "active user filter" function, and reuse it everywhere. This seems to take away that option.


I see this as a perfectly complementary tool to SQLAlchemy Core itself: you can write your queries in Core, reusing as much as necessary -- but there is always a couple of insanely complex queries that join several tables and subqueries and do all kind of magic to get the right report. In this case PugSQL kicks in, and you can keep your complex queries in SQL and call them from Python as needed.


I've found that most of these insanely complex analytic type queries are easy to model in Core if you treat Core as a DSL and use good programming practice (composition, separation of concerns, etc). The final queries end up being beasts, but if the Core-based pieces are modeled well they are basically equivalent to one I would have written by hand, but with the extraordinary benefit that they can be much more easily parameterized (imagine a financial report where you roll up Year-to-Date, Month-to-Date, Week-to-Date metrics, so using lots of window functions - with the proper structure it's trivial to add a further breakdown allowing you to group by any other metric). I've basically used it to create my own version of Looker.


It's either two queries or something like this:

select * from users where username = :username or :username is null

That can be considered slightly better than two queries or slightly worse, depending on who you ask. And it is indeed the state of query reuse if you are not using at least decently smart query builder.

My impression is that many people simply don't have the kind of requirements that call for such advanced run-time query composition.


Dataset is my favorite dumb wrapper on top of SQLAlchemy. https://dataset.readthedocs.io/en/latest/


Thanks for mentioning, I didn't know of this. A classic Python module it is, sweet and simple.


Watch your naming - pug is also a template engine for npm.


If you want to put sql in your templates, I can't help you


I mean if you run a website with pages about SQL you might want to.


Naming collisions for open source software are the norm rather than the exception. At some point you just have to start living with it.


It's probably safe to call it PugSQL though.


Aye but the reason it's called pug is they got into problems because Jade was already used.

Naming is hard.

I really like pug though, with Vue SFC it's really clean.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: