loading...

JOIN Semiotics and MassiveJS v6

dmfay profile image Dian Fay Originally published at di.nmfay.com on ・7 min read

MassiveJS version 6 is imminent. This next release closes the widest remaining gap between Massive-generated APIs and everyday SQL, not to mention other higher-level data access libraries: JOINs.

This is something of a reversal for Massive, which until now has had very limited functionality for working with multiple database entities at once. I've even written about this as a constraint not without benefits (and, for the record, I think that still -- ad-hoc joins are a tool to be used judiciously in application code!).

But the main reason for this lack was always that I'd never come up with any solution that didn't fit awkwardly into an already-awkward options object. Deep insert and resultset decomposition were quite enough to keep track of. I am naturally loath to concede any inherent advantages to constructing models, but this really seemed like one for the longest time.

There are, however, ways. Here's what Massive joins look like, if we invade the imaginary privacy of an imaginary library system's imaginary patrons:

const whoCheckedOutCalvino = await db.libraries.join({
  books: {
    on: {library_id: 'id'},
    patron_books: {
      type: 'LEFT OUTER',
      pk: ['patron_id', 'book_id'],
      on: {book_id: 'books.id'},
      omit: true
    },
    who_checked_out: {
      type: 'LEFT OUTER',
      relation: 'patrons',
      on: {id: 'patron_books.patron_id'}
    }
  }
}).find({
  state: 'EV',
  'books.author ILIKE': 'calvino, %'
});

(relation in this sense indicates a table or view.)

And the output:

[{
  "id": 2,
  "name": "East Virginia State U",
  "state": "EV",
  "books": [{
    "author": "Calvino, Italo",
    "id": 1,
    "library_id": 2,
    "title": "Cosmicomics",
    "who_checked_out": [{
      "id": 1,
      "name": "Lauren Ipsum"
    }]
  }]
}, {
  "id": 3,
  "name": "Neitherfolk Public Library",
  "state": "EV",
  "books": [{
    "author": "Calvino, Italo",
    "id": 2,
    "library_id": 3,
    "title": "Cosmicomics",
    "who_checked_out": [{
      "id": 2,
      "name": "Daler S. Ahmet"
    }]
  }, {
    "author": "Calvino, Italo",
    "id": 4,
    "library_id": 3,
    "title": "Invisible Cities",
    "who_checked_out": []
  }]
}]

Or in other words, exactly what you'd hope it would look like -- and what, if you use Massive, you may previously have been dealing with a view and decomposition schema to achieve. This is a moderately complex example, and between defaults (e.g. type to INNER) and introspection, declaring a join can be as simple as naming the target: db.libraries.join('books').

The join schema is something of an evolution on the decomposition schema, sharing the same structure but inferring column lists, table primary keys, and even some on conditions where unambiguous foreign key relationships exist. It's more concise, less fragile, and still only defined exactly when and where it's needed. Even better, compound entities created from tables can use persistence methods, meaning that join() can replace many if not most existing usages of deep insert and resultset decomposition.

It might seem a little unconventional to just invent ersatz database entities out of whole cloth. There's some precedent -- Massive already treats scripts like database functions -- but the compound entities created by Readable.join() are a good bit more complex than that. There's a method to this madness though, and its origins date back to before Ted Codd came up with the idea of the relational database itself.

Semiotics from 30,000 Feet

Semiotics is, briefly, the study of meaning-making, with 19th-century roots in both linguistics and formal logic. It's also a sprawling intellectual tradition in dialogue with multifarious other sprawling intellectual traditions, so I am not remotely going to do it justice here. The foundational idea is credited on the linguistics side to Ferdinand de Saussure: meaning is produced in the relation of a signifier to a signified, or taken together a sign. Smoke to fire, letter to sound, and so forth. Everything else proceeds from that relationship. There is, of course, a lot more of that everything else, and like so many other foundational ideas the original Saussurean dyad is something of a museum piece.

But the idea of theorizing meaning itself in almost algebraic terms would outlive de Saussure. The logician Charles Sanders Peirce had already come to similar conclusions, and had realized to boot that the interpreted value of the signifier's relationship to its signified is as important as the other two. Peirce, following this line of reasoning, understood this "interpretant" itself to be a sign comprising its own signifier and signified which in turn yield their own interpretant, in infinite chains of signification. Louis Hjelmslev, meanwhile, reimagined de Saussure's dyad as a relation of expression to content, and added a second dimension of form and substance. To Hjelmslev, a sign is a function, in the mathematical sense, mapping the "form of expression" to the "form of content", naming as the "substance of expression" and "substance of content" the raw materials formed into the sign.

The use of the term "substance" sounds kind of like some sort of philosophically-détourned jargon, but there are no tricks here: it's just stuff. There's no more specific designation than the likes of "substance" for "that which has been made into a sign"; the category includes everything from physical materials to light, gesture, positioning, electricity, more, in endless combinations. A sign is created by these matters being selected and formed into content and expression: fuel, oxygen, and heat organized into fire and smoke, or sounds uttered in an order corresponding to a known linguistic quantity. It should be said also that consciousness need not enter into it: anything can make a sign, and even a plant can interpret one.

This all is to say: there's stuff out there, and what it has in common is that it is made to mean things. Most stuff, in fact, is constantly meaning many things at the same time, as long as there's an interpreting process -- and there's always something. The philosopher-psychologist tag team of Gilles Deleuze and Felix Guattari envisioned the primordial soup of matters-awaiting-further-formation as a spatial dimension: the plane of consistency or plane of immanence. Signification, as they proposed in 1000 Plateaus, happens on and above the plane of consistency, as matters are selected and drawn up from it to become substance and sign. The recursive nature of signification means that these signs are then selected into the substance of yet other signs, becoming layers or strata on the plane in a fashion they compare to the formation of sedimentary rock.

Signs and Databases

A database management system, like any other program, is an immensely complex system of signs. However, what sets DBMSs (and some other categories of software, like ledgers and version control systems) apart is that they're designed to manage other systems of signs. Thanks to this recursive aspect, a database can be imagined as a plane of consistency, a space from which any combination of unformed bytes might be drawn up into column-signs and row-signs which in turn are gathered into table-signs and view-signs and query-signs.

And if tables and views and queries are all still signs at base, where exactly do the differences come in? Tables store persistent data and are therefore mutable, while views and queries do not and are not, and must be constituted from tables themselves and (in the case of views) from each other. Tables constitute a lower stratum of signs, with views forming table- and view-substance into signs on higher strata, and queries higher still, at a sufficient remove from the plane of consistency that they're no longer stored in the database itself.

This is, of course, arriving at inheritance the long way around. In Massive terms, database entities are first instances of a base Entity class, after which they inherit a second prototype: one of Sequence, Executable, or Readable. Some of the latter may be further articulated as Writables, as well; there are no Writables which are not also Readables.

But there's more than one thing happening here, and the ordering of tables, views, and database functions into class-strata is the second step -- matters must be chosen before they can be formed into signs. It's in this first step of stratification that Massive adds script files to the API system of signs, treating them (almost) identically to functions and procedures.

Readable.join() takes the same idea further to expand on the database's relations: before, a Readable mapped one-to-one with a single table or view. But as long as SQL can be generated to suit, there's no reason one Readable couldn't map to multiple relations. Writables too, for that matter:

const librariesWithBooks = db.libraries.join('books');
const libraryMembers = db.patrons.join('libraries');

// inserts work exactly like deep insert, persisting an
// entire object tree
const newLibrary = await librariesWithBooks.insert({
  name: 'Lichfield Public Library',
  state: 'EV',
  books: [{
    library_id: undefined,
    title: 'Jurgen: A Comedy of Justice',
    author: 'Cabell, James Branch'
  }, {
    library_id: undefined,
    title: 'If On a Winter\'s Night a Traveller',
    author: 'Calvino, Italo'
  }]
});

// updates make changes in the origin table, based on
// criteria which can reference the joined tables
const withCabell = await librariesWithBooks.update({
  'books.author ilike': 'cabell, %'
}, {
  has_cabell: true
});

// deletes, like updates, affect the origin table only
const iplPatrons = await libraryMembers.destroy({
  'libraries.name ilike': 'Imaginary Public Library'
});

Try it Out!

The first v6 prerelease is available now: npm i massive@next. There's now a prerelease section of the docs going over what's new and different in detail. But to sum up the other changes:

  • Node < 7.6 is no longer supported.
  • Implicit ordering has been dropped.
  • Resultset decomposition now yields arrays instead of objects by default. The array schema field is no longer recognized, and you'll need to remove it from your existing decomposition schemas. To yield objects, set decomposeTo: 'object' instead.
  • JSON and JSONB properties are now sorted as their original type instead of being processed as text.
  • The type property of the order option has been deprecated in favor of Postgres-style field::type casting as used elsewhere. It will continue to work through the 6.x lifecycle but may be removed in a subsequent major release.

This is a feature I've been wishing I could make happen somehow ever since I first published the original resultset decomposition Gist more than two years ago. It's involved extensive changes to table loading, criteria parsing, and statement generation. I've endeavored not to break these areas, and have informally experimented by dropping pre-prerelease versions into an existing codebase. Results have been good, but should you find an issue with this or any other Massive functionality, please let me know!

I'm really excited to see just how far joins expand Massive's capabilities, but in truth there's just one thing I think I and most other Massive users will get the most mileage out of: plain old query predicate generation with criteria objects, without having to define and manage a plethora of views to cover basic JOINs. Stratification is a useful way to think about the production of meaning -- but strata themselves can also be dead weight.

Posted on by:

dmfay profile

Dian Fay

@dmfay

It's pronounced Diane. I do data architecture, operations, and backend development. In my spare time I maintain Massive.js, a data mapper for Node.js and PostgreSQL.

Discussion

markdown guide