SQL source control was introduced into our stack because of a number of pain points. We had to do full database refreshes any time we spun up a new environment, we had limited options for running databases locally, we had no clean way to properly version databases, and SQL scripts were often manual operations. We had come up with a couple of processes to alleviate these pain points, but they often weren't sustainable or flexible.
SQL source control is a way to treat your SQL database as a proper application (a Data Tier Application, to be precise). Changes made to schema and data get the same attention that a standard application would. This allows you to publish the database to a target, which updates the target to match the schema you provided.
There are a number of SQL source control providers; we went with SSDT because it fits into our stack very nicely and is free.
SSDT, or SQL Server Data Tools, is managed very similarly to a standard .NET application. The solution and project layout is very similar, and you can build it with MSBuild or build and publish it from Visual Studio, all in a way that'll feel familiar to any .NET developer.
The first step is creating a new SQL Server Database Project in Visual Studio (under the SQL Server section).
Visual Studio then provides a very handy schema import tool. Right-click on the project, highlight Import, and you can choose to import the schema from a database. We opted for the Schema\Object Type folder structure, as it helps provide better boundaries between different schemata. Once a connection is selected (ours pointed at a QA server), click Start and it'll do the rest.
For the initial commit, it became apparent there was a lot of generated output that shouldn't go into source control. After playing around a bit, the .gitignore came out looking like:

```
.vs
bin
obj
*.user
*.dbmdl
*.jfm
*.log
```
When making changes you will occasionally see files called .refactorlog. Do not gitignore these; they are required for properly performing refactors like changing column names, and need to be checked in.
Over the last few years I've put a lot of effort into pruning our database. We inherited a rather gross setup with over 1000 stored procedures, half of which were not used anywhere anymore, and hundreds of dead tables. I've gotten a number of these deleted (see https://dev.to/dealeron/using-graphdbs-to-visualize-code-sql-dependencies-3370), but there was still a lot of clutter.
Lo and behold, there were a couple dozen stored procedures and views that simply would not build, as well as a few foreign keys. This provided immediate value, identifying a ton of dead weight that could be removed.
The foreign keys were broken because their cascade deletes did not form a clean hierarchy and could produce multiple delete paths. They were created on an older version of SQL Server and managed to carry over to a newer version that no longer allows this.
The goal then became to delete all the broken stuff. I had assumed that just deleting objects from source control and republishing would delete them from the target. This proved not to be true unless I selected a "Delete objects in target not in source control" option. Even then it initially failed because of a failsafe that prevents you from accidentally dropping tables that still have rows in them.
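For reference, both behaviors map to publish options. In a publish profile (a .publish.xml file) they look roughly like the fragment below — DropObjectsNotInSource corresponds to the "delete objects not in source" option, and BlockOnPossibleDataLoss is the failsafe that blocks drops that would lose rows. Treat this as a sketch of the relevant properties, not a complete profile:

```xml
<?xml version="1.0" encoding="utf-8"?>
<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
  <PropertyGroup>
    <TargetDatabaseName>MyDatabase</TargetDatabaseName>
    <!-- Drop objects that exist in the target but not in the project -->
    <DropObjectsNotInSource>True</DropObjectsNotInSource>
    <!-- The failsafe: leave True unless you genuinely intend to lose data -->
    <BlockOnPossibleDataLoss>True</BlockOnPossibleDataLoss>
  </PropertyGroup>
</Project>
```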
The solution was to provide DROP IF EXISTS scripts in a data file (see next section). This adds a bit of long-lived bloat, but dropping tables is uncommon enough that it doesn't create much of a pain point.
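A manual drop script can be as simple as the sketch below (the object names here are hypothetical, and DROP ... IF EXISTS requires SQL Server 2016 or later):

```sql
-- Drop dead objects that the publish won't remove on its own.
-- Safe to re-run: IF EXISTS makes each statement a no-op once the object is gone.
DROP VIEW IF EXISTS dbo.vw_ObsoleteReport;
DROP PROCEDURE IF EXISTS dbo.usp_ObsoleteProcess;
DROP TABLE IF EXISTS dbo.ObsoleteTable;
```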
It's pretty common to have table data that is considered part of the schema. Lookup tables used for referential integrity or required by applications are the most common example. To make sure these get included and maintained via deployments/publishes, you need to add a post-deployment script.
I created a data folder and added a LookupTables folder to it for all tables that are considered lookup tables. As of writing this, I'm up to four folders. ManualDrops holds the DROP IF EXISTS scripts mentioned earlier. ConfigurationTables mostly consists of data that should be initialized by the supporting applications, but isn't yet. DataChanges is for scripts used to change the shape of data or migrate data.
I then added a Post-Deployment Script (under User Scripts in Add Item when you right-click on the data folder). You can only have one post-deployment script per project, but that script can pull in scripts from other files.
This post-deployment script ends up looking like:

```
:r ./LookupTables/SomeLookupTable.sql
:r ./LookupTables/AnotherLookupTable.sql
:r ./ConfigurationTables/AConfigurationTable.sql
:r ./DataChanges/Ticket#.sql
```
:r is a SQLCMD command that pulls in a SQL script from another file. Make sure those other files are not included in the build (set their Build Action to None), or else the compiler will try to treat them as tables/stored procedures/views/etc. and give you compile-time errors.
When deploying a dacpac (the "build" of a Data Tier Application), ALL data scripts run EVERY time you deploy. This means everything must be re-runnable, and you may need to clear out scripts that will no longer run after a schema change.
So far I haven't seen this become much of a concern. There have been a few occasionally-run scripts that we identified should stop being SQL scripts and become functionality living in an application, but other than that, re-runnability is solved with simple IF NOT EXISTS(SELECT * FROM TableName) wrappers.
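A re-runnable lookup table script in that style might look like this (the table and its values are hypothetical):

```sql
-- Seed dbo.OrderStatus only if it has never been populated.
-- Post-deployment scripts run on every publish, so the
-- IF NOT EXISTS guard keeps the insert from duplicating rows.
IF NOT EXISTS (SELECT * FROM dbo.OrderStatus)
BEGIN
    INSERT INTO dbo.OrderStatus (OrderStatusId, StatusName)
    VALUES (1, N'Pending'),
           (2, N'Shipped'),
           (3, N'Delivered');
END
```

A MERGE statement is a common alternative when you also want existing rows corrected to match source control, not just inserted once.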
I've been beta testing SSDT for our company for a few months now. It's become apparent that SSDT, dacpac deploys, and SQL Source Control solve many more problems than I had initially set out to address.
It's now extremely simple to spin up a super lightweight database on a local machine. Simply stand up a new SQL Server instance (Docker, LocalDB, SQL Server Express, whatever), then do a dacpac deploy or publish to it from Visual Studio, and you have a functional database.
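Outside of Visual Studio, a dacpac deploy is typically done with the SqlPackage command-line tool. A minimal invocation looks something like this (the server, database, and file names are placeholders):

```
SqlPackage /Action:Publish ^
    /SourceFile:"bin\Release\MyDatabase.dacpac" ^
    /TargetServerName:"localhost" ^
    /TargetDatabaseName:"MyDatabase"
```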
New environments in general are much easier to spin up, and have much lower maintenance cost since we can start them with small databases and build them up instead of doing a full pull of production data with a data scrub on sensitive information.
There doesn't need to be communication between development and other teams (dev ops, QA, etc.) about what scripts to execute alongside other tickets. Just like an application, as long as they deploy a specific build, it'll all be there.
Our database was a bit of a monolith. Since SSDT by default doesn't drop objects that exist in the target but aren't in the project, we are able to virtually break up the database into multiple projects that all just happen to deploy to the same instance of SQL Server.
This makes it trivial to migrate tables related to a specific team/domain into their own micro database (with a bit of a migration plan). It essentially decouples tables from the database that hosts them.
Visual Studio provides a ton of helper functionality for changing schema. Writing a SQL script to rename a column by hand can be a bit gross, but in Visual Studio you can just go into the designer view, change the name, and it'll generate a refactor log entry to make the change for you without you needing to write any SQL.
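Under the hood, the refactor log lets the publish treat the change as a rename rather than a drop-and-recreate — roughly the equivalent of the manual script below, which it spares you from writing (table and column names are hypothetical):

```sql
-- What a manual column rename would look like without the refactor log.
EXEC sp_rename 'dbo.Customer.EmailAddr', 'EmailAddress', 'COLUMN';
```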
As stated earlier, it's now very apparent when something about the database is not lining up. Deleting a table that's required in a view won't even make it past the build step, let alone making it to trying to run it against a database.
I know it sounds odd, but merge conflicts are really a good thing. They are a strong indicator that a piece of functionality is possibly being modified to do two different things. Previously it was always possible that two people making changes to the same stored procedure, function, or view would just overwrite each other's changes, and no one would be the wiser until the bug reports started coming in.
I haven't played around with tests, but being able to provide tests around triggers, functions, views, constraints, and stored procedures is really powerful. Putting logic in the database has always seemed iffy because of how opaque it can be. I still stand by trying to have most logic live at the application layer, but having the database be testable massively helps ease concerns when we want/need to put logic in the database for performance purposes.
There are definitely some issues that come with using SSDT.
As more and more DataChanges go through SQL source control, you may occasionally need to go in and trim the ones that were meant to run only once. Ideally you'd do this after verifying that all databases are up to the latest version, so they already have the data change you're deleting. The same applies as RefactorLogs start to pile up.
While merge conflicts in stored procedures and the like can be helpful, I've seen a lot of merge conflicts come out of the post-deployment script, as well as the .sqlproj file. These can cause a bit of a headache and some uncertainty, but most have been resolvable with a take-right-then-left resolution.
Overall SSDT has been a fantastic experience. I've really enjoyed learning it, and it has proven to be a very powerful tool. I would love if it had a better pattern for it to handle data changes, but that's really the only pain point.