Joseph Moore

Posted on Aug 9, 2018

Database Views Don't Really Exist

#database

I keep running into a misconception about database views, even when talking to fairly experienced developers. People seem to be under the impression that, in a relational database, a view is something that actually exists. They don't!

Now, let me explain. Obviously, when you create a view, SOMETHING exists. But many developers seem to misunderstand what it is that exists. Most often, I encounter this misconception when trying to address performance problems - "Hey, if your join is so bad and slow, why don't you just create a view! That'll make your SQL faster, with no joins or anything!"

When I say that the view doesn't exist, I mean that, at any given time, there is no data stored that says "here's what's in this view!" A view is not a table that gets constantly updated when the underlying tables get updated.

So what is a view, really?

A view is a stored query. When you create a database view, the database stores the SQL you gave it. Then, when you come along and query that view, the database takes the stored view query, adds in the extras from the query against the view, and executes it. That's it! Essentially, a view is just an automatic query rewriter, letting you write SQL which is simpler than what you would otherwise write. It's kind of like writing a function to hide a bunch of reusable nasty code. That code still gets executed, it just gets hidden behind a simpler, reusable facade.

Examples, please!

Let's say you have two shiny tables, Cars and Drivers, which have a many-to-many relationship:

People can drive many cars, and one car can be driven by many people.

Now, let's say I need to be able to query for the name of everyone who drives a blue Toyota. What might my SQL look like?

select distinct(d.name) from car_drivers cd
left join car c on c.vin = cd.vin
left join driver d on d.driver_id = cd.driver_id
where c.color = 'blue' and c.make = 'toyota';

(This isn't the most performant version of this query, but we'll get to that in a minute.)

While this is a trivial example, it's still a lot of stuff to get right. In a real-world example, those joins can get out of hand quickly.

So, let's make a view, so our queries can be simpler!

create view cars_and_drivers_vw as
select c.vin, c.color, c.make, c.model, c.year,
       d.driver_id, d.name, d.birthday
from car_drivers cd
left join car c on c.vin = cd.vin
left join driver d on d.driver_id = cd.driver_id;

Now, my query becomes:

select distinct(name) from cars_and_drivers_vw
where color = 'blue' and make = 'toyota';

Nice! Simple! Pretty!

But what's really happening?

Here's what happens when we set all of this up in an Oracle database and ask Oracle to show it's execution plan for our simple query:

If you don't know what you're looking at, never fear! That's what I'm here for!

What this shows us is that Oracle is taking this:

select distinct(name) from cars_and_drivers_vw
where color = 'blue' and make = 'toyota';

and turning it into this:

select distinct(d.name) from car_drivers cd
left join car c on c.vin = cd.vin
left join driver d on d.driver_id = cd.driver_id
where c.color = 'blue' and c.make = 'toyota';

(Hey, that looks familiar!)

Nice!

Stop Telling Me Views are Faster!

Now we come to the thing that made me write this post. Views do not improve performance! In one of my team's more complex applications, we have some very large tables with complicated relationships and even more complicated queries that need to be performed on them. Every time we run into a slow query, someone suggests that we can solve the slowness with a view. By creating a view that does all of our joins for us, we can magically make our query faster! However, if you realize that a view is just a fancy query re-writer, not a persistent data structure, you know that this is not true. Using a view will not be any faster than the raw query; in fact, it could easily make things worse. The answer to performance problems is not a view; it is often a process of trial and error: finding different queries that yield the same results, analyzing query execution plans, and finding good indexes to help the database query things more efficiently.

As an example of a query that might be more efficient than the slow one we've been using, here's a query that is equivalent to our first query, but may be more efficient over a large data set.

select d.name from driver d
where exists (
    select 1 from car_drivers cd 
    where cd.driver_id = d.driver_id 
    and exists (
        select 1 from car c
        where c.vin = cd.vin 
        and c.color = 'blue' 
        and c.make = 'toyota'
    )
);

What about materialized views?

You just had to bring those up, didn't you.

Materialized Views are a concept which exists in Oracle and in Postgres (and some other, smaller engines). The term is really a misnomer, and Oracle has replaced the name 'Materialized View' with 'Snapshot', as it is a more accurate description of what is happening.

When you create a ~~materialized view~~ snapshot, the database runs the query you specified immediately, then stores the results. The important thing is that, as the original tables change, the contents of the materialized view do not change. You have to manually refresh the view whenever you want to update the contents, which can be a slow operation. So, materialized views are fast, but they rapidly become stale. They're mostly useful for non-real-time analytics and analysis, where you would periodically refresh the view, then perform lots of queries on it.

The exception to this (since there always is one in SQL databases) is Microsoft SQL Server, which allows you to create an 'indexed view,' which is like an Oracle snapshot, except that updates to the underlying table are reflected in the view. However, this comes with limitations to the queries which can be used to construct the view.

Epilogue

So, obviously, we didn't use a view to solve our recent performance problems. In fact, the initial version of the application used views for these queries, and we got rid of them because they actually made many of our queries much slower. We also can't solve our specific problem with indexes because the most complex parts of the queries are split across two tables 🤦.

Instead, we leveraged some mechanisms we'd already built into the app to implement a small bit of CQRS - we created a small table that keeps the important, hard-to-optimize parts of our common queries. We already had hooks in our DB layer that allowed us to run additional updates as side effects to the updates on our main tables, so we took advantage of that to make sure that our query table is always up-to-date. This allowed us to add some very nice indexes, and one of our queries went from taking about 20 seconds to taking about 0.15 seconds.

Top comments (4)

geraldew • Jul 5 '21

A big issue with views is how they get treated by various parts of the data engine as well as some other tools.

On larger platforms it seems common for query optimisers to see right through the views, which can have some interesting effects.

As a benefit, it can (should?) mean that columns defined in a view that don't get used anywhere will not be implemented at any stage of the query plan. Indeed, for systems that let you see the query plan, it's common to not see the view mentioned at all, only the tables it called upon. This allows for the definition of views that provide a great many variations of column usage, with it being very unlikely in practice that many will ever be used at the same time. At run time, all the unused ones simply never come into existence.

While that sounds good, there can be a catch with view transparency. Because if metadata capture is done from the query plans then you can find your lineage tool also never shows the views either. When you were expecting the metadata lineage to show you which scripts or processes were using various views, that invisibility has now become a maintenance liability. FWIW this is why I prefer my lineage analysis to be based on the SQL rather than the inside knowledge of the query plan - but as I had to write my own to achieve that, I probably would say that.