Hey folks!
I just built a tool to statically evaluate SQL queries in JavaScript and TypeScript code: Postguard.
It's born out of the desire to write explicit database queries without a query builder, yet statically analyzable, and built on top of Babel and the TypeScript compiler API.
What does it do?
You query the database like this:
import { sql } from "squid/pg"
import database from "./database"
import { UserRecord } from "./schema"
export async function queryUserById(id: string): Promise<UserRecord | null> {
const { rows } = await database.query<UserRecord>(sql`
SELECT * FROM users WHERE if = ${id}
`)
return rows[0] || null
}
Define a schema for your tables. The record type can be derived from the schema when using TypeScript:
// schema.ts
import { defineTable, Schema, TableRow } from "squid"
export type UserRecord = TableRow<typeof usersTable>
const usersTable = defineTable("users", {
id: Schema.Number,
name: Schema.String,
created_at: Schema.Date
})
Now let's run postguard
to validate what we just wrote:
$ npx postguard ./sample/*.ts
β Query validation failed in sample/test.ts:7:32:
No table in the query's scope has a column "if".
Tables in scope: "users"
5 | export async function queryUserByID(id: number) {
6 | const { rows } = await database.query<UserRecord>(sql`
> 7 | SELECT * FROM users WHERE if = ${id}
| ^
8 | `)
9 | return rows[0] || null
10 | }
Ahhh, we have a typo in our query! Let's fix it:
const { rows } = await database.query<UserRecord>(sql`
SELECT * FROM users WHERE id = ${id}
`)
Let's run it again:
$ npx postguard ./sample/*.ts
β Validated 1 queries against 1 table schemas. All fine!
More advanced stuff
Let's take our previous sample code and change the SELECT query:
import { sql } from "squid/pg"
import database from "./database"
import { UserRecord } from "./schema"
export async function queryUserById(id: string): Promise<UserRecord | null> {
const { rows } = await database.query<UserRecord>(sql`
SELECT id, name FROM users WHERE id = ${id}
`)
return rows[0] || null
}
$ npx postguard ./sample/*.ts
β Query validation failed in sample/test.ts:6:40:
Query's result does not match the expected result type.
Missing columns in result rows: "created_at"
Actual columns in result rows: "id", "name"
4 |
5 | export async function queryUserByID(id: number) {
> 6 | const { rows } = await database.query<UserRecord>(sql`
| ^^^^^^^^^^^^
7 | SELECT id, name FROM users WHERE id = ${id}
8 | `)
9 | return rows[0] || null
What happened? We defined that we expect the query to return rows
of type UserRecord
:
await database.query<UserRecord>(/*...*/)
Postguard evaluated the query and noticed that the result rows of that SELECT query do not match the TypeScript type UserRecord
, since that type has a created_at
property.
The fix is trivial:
import { sql } from "squid/pg"
import database from "./database"
import { UserRecord } from "./schema"
export async function queryUserById(id: string): Promise<UserRecord | null> {
const { rows } = await database.query<UserRecord>(sql`
SELECT * FROM users WHERE id = ${id}
`)
return rows[0] || null
}
$ npx postguard ./sample/*.ts
β Validated 1 queries against 1 table schemas. All fine!
Why?
So why did I even spent my time on that? Having worked with ORMs for years, I slowly grew more and more tired of them. A query builder seemed like a better solution.
Now the issue with query builders is that you effectively write SQL queries, but not as SQL queries; you formulate them using the query builder's proprietary API. Now I need to read up on two complex interfaces: My Postgres' fancy SQL features AND the query builder's comprehensive API...
Writing good old SQL feels like the natural way, so back to that. But now I've lost all confidence in my code and need to keep a really high test coverage, since I cannot statically reason about my queries. I actually need to run them to see if they work at all.
Enter the stage: Postguard. Back to confidence and short feedback cycles.
Feedback
So this tool is still pretty damn young and should not be used for production code yet.
I would love to hear some feedback regarding the overall approach, though!
The concept might seem a bit esoteric, but the code feels quite natural.
Share anything that's on your mind :)
Happy hacking!
Top comments (9)
I really like the idea here, and would like to see what some programmers from other languages think about it as well. That said, my only think I don't like is this bit,
I personally hate the template function syntax thing, so the leading
sql
bit I guess.Also personally I still think that grouping table queries together makes sense so having some kind of repository makes sense I feel. Like:
and then since the actual checker in this case is not part of TS, but rather I assume looks for the
sql
prepended templates, that we could instead to some full tokenization of all the repository class files, and find any string that contained sql keywords, then validate them.But I really like the idea of this, no need for a meh ORM, nor a meh query builder, but you get safe queries, and still get the benefit of model objects with types.
Excited to get a chance to play with this. :)
Nice to hear that you like the concept, Scott :)
Aesthetic preference is of course one thing to consider, but I don't see any alternative to the tagged template string that would offer similar benefits. The reason why it's a tagged template in a nut shell:
Agreed. Absolutely. :)
If you like to have traditional OOP and classes, feel free to go for it. I also keep the query functions for each table together in one file. I'm just a big advocate of a more functional programming style, that's all ;)
Might work, too, but would force users into a particular style of programming.
Thanks a lot for sharing this! I think this kind of dialog is very important, especially when having different views on a few things.
Tbh I didn't even question the template tags before, so this raised my awareness already quite a bit.
I hope we can have more such exchanges :)
I'd love to have more exchanges about it. :)
The tagged
sql
template isn't any more identifiable than Repository.query in terms of a token per se I would argue. As for its use in terms of how it deals with interpolation and then injection proofing, it would already either have to parse the sql string, or otherwise impose structure to how the interpolation happens right?How will it know how to deal with all of those? I haven't looked at the code yet, but I imagine you will really need to parse out things well and guard against various things. Also on the return of the Postgres Query Object, simply require that be the return type, though marrying it to Postgres or a library's raw query function's return type/param type is perhaps not the best option. This all said, I dislike tagged template literal functions in general so maybe I'm not rational here. But I also am concerned about someone forgetting to wrap it in the literal function so doing
will result in bad things, but having a single spot in the repo option saves you mostly providing some developer doesn't go rogue and try to interact with the DB directly. You could have a check for that, but then you're back to looking for
DB.rawQuery
as a token instead ofRepository.query
.I'm intrigued as to the functional approach you mention though. I assume you mean you have something like:
Then do something like:
or something like:
and especially in the second example I'm not sure how that's more functional. Its basically a repository class without making an actual class.
This is providing I'm not just on a whole different idea here. :P
This is fun dialogue and I hope we can continue. I worry that my top part about the tagged template literals my not be clear and comprehensible though, let me know if so.
In fact it is. It's subtle, but there is one major difference:
Repository.query(sqlString)
might be fine, butRepository.query(createSomeQuery(someVar))
will be hard to resolve, especially ifcreateSomeQuery
is imported from another file. The template tag and the string will be tightly coupled, though. No further indirections possible, just need to check if the template tag is an export of thesquid
package πIt's really using Babel to parse the source code to an abstract syntax tree, analyzing it, and when it finds an sql query, it will take the template string, replace the interpolated expressions with placeholders and parse the resulting SQL string using the native Postgres SQL parser implementation.
So we are actually utilizing a proper compiler toolchain and as long as we can trust Babel and the Postgres parser to do their job, there is not that much that can go wrong in that regard :)
User might forget to use the
sql
template tag... That's a very interesting point!We can prevent that source of error, though, I think. Let's come up with a check that makes sure that every query invocation involving a template string requires the template string to be tagged with either
sql
orsql.raw
. Anything else would be considered an error.Yeah, your code sample is basically it :)
The reason why I think it's a better choice than having a class: If you decide for the class-approach you will either
a) Keep the class completely stateless and all methods can be static. As you noted, this is basically equivalent to the exported functions approach.
b) Might introduce some statefulness to the class, maybe instantiate a singleton-ish instance, because that's what distinguishes the class from a set of static exports. That's what I would try to prevent in the first place. Introducing state here would lead to creating an ORM with most of its downsides. This is what we are trying to get away from ^
The difference between the concepts of classes and static functions is that the classes can be stateful. Since I would do whatever I can to prevent any statefulness here, there is no reason for me to introduce classes here.
But I suppose you are coming from the other side: You have a lot of classes and you try to figure out why not to use one here...
I think we are laying the foundation for a whole new blog post here right now... :D
Haha I think we really are laying the foundation for a blog post indeed. I'm not sure what about yet, perhaps a few different kinds really at this point. But anyways...
I hear what you mean on statefulness. Statefulness is indeed public enemy number one, I like the class with static methods approach a bit more just from the standpoint of classes are approachable to non-JS devs. What I've done recently which I really like is the following:
I like this because it lets you see that you might have added stuff to the file that doesn't belong in the idea that you can throw a function into it, export it, and go on, where as if I throw a function into the class that isn't used outside, then something has gone wrong, its also easier to keep track of what's being exported and hat isn't. And if I need one of the helper functions somewhere else, then I might have some un-needed coupling or another. Obviously this isn't real OOP by any stretch of the word and is really just a slightly different way of doing things, but I'm a fan. :P
being hard to resolve is true, but is
actually easier to resolve?
Now full disclosure here, I haven't actually gotten to look through the source code yet so I'm not 100% on how squid does things, but I am familiar with how tagged templates functions work. I would assume that the
sql
part is just a tag to identify it later, and the template function doesn't do anything fancy at all to it. And so later when we parse everything out, we still have to resolve that function.Unless the sql safety check is at run time? I just looked at the function in question I believe(raw.githubusercontent.com/andywer/...) and it seems to do the actual sql query parameter wrapping thing, but doesn't actually check the validity of the query in the DB for you.
But for the checking that the query is valid syntactically with the schema, that is an extra build step right? And that will require the same bit regardless of the identifier unless there's something I'm really truly missing. I get that the template literal functions(this is getting annoying to repeatedly type out lol) get the params passed in the second argument, but that only helps you providing like I said that someone hasn't forgotten the sql tag, or done like
sql`SELECT * FROM Users WHERE ` + `${invalidatedInput};`;
though at this point I suppose the developer is just trying to hurt themselves and we should allow darwinism to run its course.
This is kind of iffy to me as a solution. Checking every query invocation means that squid HAS to be tied to a specific framework or set of frameworks. And only working on template strings means that someone not using one, because either they're not a fan of string interpolation or just kind of accidentally forgets they exists, is left in the dust.
I find trying to guard against developer accidents and incompetence as much as possible is good practice, so I like a more general solution. Also since its in a build step, even if this process takes a bit longer, I think its worth it. This is where having a Repository Interface that requires them to implement those methods can help I feel.
The above points I guess bring me to two other things that I'd like to say that maybe are the source of some of our disconnect here:
I think perhaps that decoupling the two sub point, and the two main points could be quite beneficial. Also I feel that whatever querying library you use should be wrapping stuff for you so doing it yourself may not be necessary, though that might stick us back into query builder kind of land, I'm not sure about it without a lot more investigation, but I feel end of the day it's
DB.query(sql, params);
as the call, so allowing the developer to decide how to wrap it and call it is up to them, we just do some fancy validation and parsing. :P Not care about the implementation of it at all nor the library used.This is a fun discussion that I hope can continue! Also I've been using the inclusive 'we' here, but Its not my project and I haven't made any OSS to it, so forgive me!
Nahhh, I appreciate the "we". Sounds like a community effort already :)
I try to keep it shorter this time... ^
So the big picture overview as of right now is:
Squid:
sql
template tagPostguard
No, squid could basically be completely db-agnostic, since all it does is super generic. All the validation happens on build time by postguard.
The reason why it's SQL-injection proofed is that squid doesn't merge the expressions into the SQL query string yet, but produces a string with parameters Γ‘ la
$1
and an array of the expression values, so the database driver can figure out how to escape them in a water-proof way.I hope that shines some light on things, because it seems there was some confusion about the big picture concepts (-> will need to improve the readme files, i guess).
But sure, there is still room for various improvements and I love to hear your thoughts :)
I've seen your squid project as well, and I'm glad to see someone exploring this direction. π
Running static analysis on the SQL bits is interesting - but appears to miss out on refactoring safety? I believe there is a much simpler and refactoring-safe way to approach this - have a look at this old thing I built in PHP some years back:
github.com/mindplay-dk/sql#select-...
That is, to use static references for all table and column symbols, e.g. interpolating schema/column objects and values into template literals:
This approach would work much better with JS/TS than it did with PHP, since
user
anduser.id
would have types that thesql
function can recognize. (in PHP the string interpolation happens before the call, which creates some corner cases and inconsistencies for e.g.INSERT
statements in my library.)Now you can rename
user
oruser.id
(with TS at least) and have automatic refactoring. You also get errors highlighted (by TS) for example after removing a column. Running static analysis on the SQL queries is probably much less necessary then, and the whole thing could be quite a bit simpler?I know some users are averse to interpolation-heavy syntax - personally, considering how much this pays back in terms of safety and refactoring, I don't mind at all.
Prisma, TypeORM and Sequelize just don't do it for me - they're either too complex or too removed from SQL. I hope the idea of leveraging string interpolation catches on more widely in the future.
Cheers. π
Interesting library Andy. I as well have run into the 'learn SQL engine AND query builder API" issue in the past. If you target this library squarely in the 'you have to know your SQL' and keep high performance I think you could be on to something here.
Good point, David. One thing that jumps to my mind:
The "keep high performance" part makes perfect sense, but even if that's not your ambition you might still prefer SQL over any query builder's custom API.
Would you agree? Of course you would also want to have some simple IDE integration for a better dev experience. I think this can be solved with reasonable effort, though.