DEV Community

Cover image for Saving the UNION with HoneySQL
Daniel Fitzpatrick
Daniel Fitzpatrick

Posted on

Saving the UNION with HoneySQL

SQL might be an unparalleled success story of narcissistic design, but it's a robust apparatus. Consequently, you've probably brushed against some rough edges.

The problem

Have you ever noticed that UNION breaks DRY principles?

SELECT x.foo, x.bar, '' as baz FROM x
UNION
SELECT '' as foo, '' as bar, y.baz FROM y;
Enter fullscreen mode Exit fullscreen mode

The code above requires more references to foo, bar, and baz than necessary. Adding additional queries to the union is an exercise in masochism because this unnecessary code duplication increases exponentially.

If each query introduces just a single column, and you have 5 of them, that's 20 times need to repeat yourself! We can use HoneySQL to put this unruly child in timeout. 1

TL;DR

If you just want the code and don't want to follow along, I have uploaded a gist which includes a solution for the problem mentioned at the end of this post.

Spread some honey...sql

Translating the previous query into HoneySQL makes a good beginning.

(require '[honeysql.helpers :as hh]
         '[honeysql.core :as hsql])

{:union
 [(-> (hh/select :foo :bar ["" :baz])
      (hh/from :x))
  (-> (hh/select ["" :foo] ["" :bar] :baz)
      (hh/from :y))]}

;=>
{:union
 [{:select (:foo :bar ["" :baz]), :from (:x)}
  {:select (["" :foo] ["" :bar] :baz), :from (:y)}]}
Enter fullscreen mode Exit fullscreen mode

Where is the SQL? You can view it from the repl with (hsql/format *1).

HoneySQL displayed its first big win in the previous output. Did you spot it? It emits regular Clojure data structures, which we can easily manipulate with the standard library. It also allows us to describe our desired output succinctly.

{:union [{:select [:foo]}
         {:select [:bar]}]}

;=>
{:union [{:select [:foo [nil :bar]]}
         {:select [[nil :foo] :bar]}]}
Enter fullscreen mode Exit fullscreen mode

As in the example above, we will supply nil as the value for "back-filled" columns.

Columns in HoneySQL will commonly look like one of three patterns.

:column
:table.column
[:processing-stuff :column]
Enter fullscreen mode Exit fullscreen mode

A function that parses out the column name we need for back-filling would look like this.

(defn get-column-name [col]
  (cond
    (vector? col) (second col)
    (and (keyword? col)
         (str/includes? (name col) ".")) (-> col name
                                             (str/split #"\.")
                                             second keyword)
    (keyword? col) col))
Enter fullscreen mode Exit fullscreen mode

A quick test proves that this works.

(get-column-name :baz)
;=> :baz
(get-column-name :bar.baz)
;=>:baz
(get-column-name [:foo.bar :baz])
;=>:baz
Enter fullscreen mode Exit fullscreen mode

Next, we should contrive some way to get an ordered collection of all our column names. That will simplify everything else. Ironically this will require another set theory UNION, but we want to preserve element order.

(defn ordered-union
  [cmp coll1 coll2]
  (loop [xs [] coll1 coll1 coll2 coll2]
    (cond
      (and (empty? coll1) (empty? coll2)) xs
      (empty? coll1) (vec (concat xs coll2))
      (empty? coll2) (vec (concat xs coll1))
      (cmp (first coll1) (first coll2)) (recur (conj xs (first coll2))
                                               (rest coll1) (rest coll2))
      :else (recur (conj xs (first coll1)) (rest coll1) coll2))))
Enter fullscreen mode Exit fullscreen mode

Has it been a while since you last saw loop/recur?

The function above takes a comparator cmp and two collections. It assumes that coll1 "comes before" coll2 - an assumption made visible in the last branch. The first three branches are obvious exit conditions. The fourth branch ensures that collisions are not duplicated in the resultset, thus enforcing the set theory aspect of our function.

We can employ the usual tricks to support more than two inputs. In this case, it's just a reduction over the inputs. Because ordered-union is generally helpful outside of the scope of this article, feel free to add a [cmp coll1 coll2 & colls] arity and repurpose.

We will apply this trick to get our complete ordered list of columns.

(defn get-all-columns-in-union [m]
  (reduce
   (fn [a b]
     (ordered-union = a (map get-column-name b)))
   [] (map :select (:union m))))
Enter fullscreen mode Exit fullscreen mode

We now have something worth testing.

(get-all-columns-in-union
 {:union [{:select [["first" :key] :a :b :c]}
          {:select [["second" :key] :d :e :f]}
          {:select [["third" :key] :g :h :i]}]})
;=> [:key :a :b :c :d :e :f :g :h :i]
Enter fullscreen mode Exit fullscreen mode

A crucial insight is that we can use ordered-union to rewrite the SELECT clauses. Check this out.

(defn format-nil-column [column] [nil column])

(ordered-union
 #(= (get-column-name %1) (get-column-name %2))
 (mapv format-nil-column [:key :a :b :c :d :e :f :g :h :i])
 [["first" :key] :a :b :c])
;=> [["first" :key] :a :b :c [nil :d] [nil :e] [nil :f] [nil :g] [nil :h] [nil :i]]

(ordered-union
 #(= (get-column-name %1) (get-column-name %2))
 (mapv format-nil-column [:key :a :b :c :d :e :f :g :h :i])
 [["second" :key] :d :e :f])
;=> [["second" :key] [nil :a] [nil :b] [nil :c] :d :e :f [nil :g] [nil :h] [nil :i]]

(ordered-union
 #(= (get-column-name %1) (get-column-name %2))
 (mapv format-nil-column [:key :a :b :c :d :e :f :g :h :i])
 [["third" :key] :g :h :i])
;=> [["third" :key] [nil :a] [nil :b] [nil :c] [nil :d] [nil :e] [nil :f] :g :h :i]
Enter fullscreen mode Exit fullscreen mode

I plugged our previous three SELECT statements into ordered-union, and it returned what we wanted in each case.

Put it all together

We are now in the red zone. So, finally, let's wrap things up.

(defn add-columns [nil-columns provided-columns]
  (ordered-union
   #(= (get-column-name %1) (get-column-name %2))
   nil-columns provided-columns))

(defn normalize-union
  ([m]
   (let [cols (get-all-columns-in-union m)]
     (update m :union (partial normalize-union cols))))
  ([columns queries]
   (let [nil-columns (mapv format-nil-column columns)]
     (mapv #(update % :select (partial add-columns nil-columns)) queries))))
Enter fullscreen mode Exit fullscreen mode

To test

(normalize-union
 {:union [{:select [["first" :key] :a :b :c]}
          {:select [["second" :key] :d :e :f]}
          {:select [["third" :key] :g :h :i]}]})
{:union
 [{:select
   [["first" :key]
    :a
    :b
    :c
    [nil :d]
    [nil :e]
    [nil :f]
    [nil :g]
    [nil :h]
    [nil :i]]}
  {:select
   [["second" :key]
    [nil :a]
    [nil :b]
    [nil :c]
    :d
    :e
    :f
    [nil :g]
    [nil :h]
    [nil :i]]}
  {:select
   [["third" :key]
    [nil :a]
    [nil :b]
    [nil :c]
    [nil :d]
    [nil :e]
    [nil :f]
    :g
    :h
    :i]}]}
Enter fullscreen mode Exit fullscreen mode

The devil in the details

It's time for me to confess a problem that I have been hesitant to mention.

Types.

Postgres will treat nil as type TEXT for a UNION with more than two queries, and it forces us to care about this because column foo must have a matching type across every row in the resultset.

I will offer some clues about resolving this but won't go into an in-depth solution. The solution is less work than you probably think. It would be best to start by allowing normalize-union to accept an optional type-map. It should look like this.

{:thing1 :integer
 :thing2 :uuid
 :thing3 :timestamp}
Enter fullscreen mode Exit fullscreen mode

If you pass that map into format-nil-column, then you can leverage honeysql.core/call to force nil to be the type you want. Something like this should do the trick.

(honeysql.core/call :cast nil (get type-map column))
Enter fullscreen mode Exit fullscreen mode

Do it right, and you might even be able to provide limited support for wildcards. I hope this has been a fun trip down the dark side of SQL.


  1. I was reluctant to embrace HoneySQL when someone first introduced me, but its core feature has won me over with time: eschewing cleverness. For example, other ORM users talk about "fighting the query generator" to win their desired performance target - a story foreign to my user experience. 

Top comments (1)

Collapse
 
dvliman profile image
David Liman

honeysql treating query as just data map is extremely useful to build complex queries based on input!