SQL source control was introduced into our stack because of a number of pain points. We had to do full database refreshes any time we spun up a new environment, we had limited options for running databases locally, we had no clean way to properly version databases, and SQL scripts were often manual operations. We had come up with a couple of processes to alleviate these pain points, but they often weren't sustainable or flexible.
SQL source control is a way to treat your SQL database as a proper application (a Data Tier Application, to be precise). Changes made to schema and data get the same attention that a standard application would. This allows you to publish the database to a target, which updates the target to match the schema you provided.
There are a number of SQL source control providers; we went with SSDT because it fits into our stack very nicely and is free.
SSDT, or SQL Server Data Tools, is managed very similarly to a standard .NET application. The solution and project layout is very similar, and you can build it with MSBuild or build and publish it from Visual Studio, all in a way that'll feel familiar to any .NET developer.
The first step is creating a new SQL Server Database Project in Visual Studio (under the SQL Server section).
Visual Studio then provides a very handy schema import tool. Right-click on the project, highlight Import, and you can choose to import the schema from a database. We opted for the Schema\Object Type folder structure, as it helps provide better boundaries between different schemata. Once a connection is selected (ours pointed at a QA server), click Start and it'll do the rest.
For the initial commit, it became apparent there was a lot of generated output that shouldn't go into source control. After playing around a bit, the .gitignore came out looking like:

```
.vs
bin
obj
*.user
*.dbmdl
*.jfm
*.log
```
When making changes you will occasionally see files called .refactorlog. Do not gitignore these; they are required for properly performing refactors like changing column names, and need to be checked in.
Over the last few years I've put a lot of effort into pruning our database. We inherited a rather gross setup with over 1000 stored procedures, half of which were not used anywhere anymore, and hundreds of dead tables. I've gotten a number of these deleted (see https://dev.to/dealeron/using-graphdbs-to-visualize-code-sql-dependencies-3370), but there was still a lot of clutter.
Lo and behold, there were a couple dozen stored procedures and views that simply would not build, as well as a few foreign keys. This provided immediate value, identifying a ton of dead weight that could be removed.
The foreign keys were broken because their cascade deletes did not form a clean hierarchy and could produce multiple delete paths. They were created on an older version of SQL Server and managed to carry over to a newer version that no longer allows this.
The goal then became to delete all the broken stuff. I had assumed that just deleting objects from source control and republishing would delete them from the target. This proved not to be true unless I selected a "Delete objects in target not in source control" option. Even then it initially failed because of a failsafe that prevents you from accidentally dropping tables that still have rows in them.
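For reference, both behaviors map to publish options. In a publish profile (a .publish.xml file) they look roughly like the fragment below — DropObjectsNotInSource corresponds to the "delete objects not in source" option, and BlockOnPossibleDataLoss is the failsafe that blocks drops that would lose rows. Treat this as a sketch of the relevant properties, not a complete profile:

```xml
<?xml version="1.0" encoding="utf-8"?>
<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
  <PropertyGroup>
    <TargetDatabaseName>MyDatabase</TargetDatabaseName>
    <!-- Drop objects that exist in the target but not in the project -->
    <DropObjectsNotInSource>True</DropObjectsNotInSource>
    <!-- The failsafe: leave True unless you genuinely intend to lose data -->
    <BlockOnPossibleDataLoss>True</BlockOnPossibleDataLoss>
  </PropertyGroup>
</Project>
```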
The solution was to provide DROP IF EXISTS scripts in a data file (see next section). This adds a bit of long-lived bloat, but dropping tables is uncommon enough that it doesn't create much of a pain point.
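A manual drop script can be as simple as the sketch below (the object names here are hypothetical, and DROP ... IF EXISTS requires SQL Server 2016 or later):

```sql
-- Drop dead objects that the publish won't remove on its own.
-- Safe to re-run: IF EXISTS makes each statement a no-op once the object is gone.
DROP VIEW IF EXISTS dbo.vw_ObsoleteReport;
DROP PROCEDURE IF EXISTS dbo.usp_ObsoleteProcess;
DROP TABLE IF EXISTS dbo.ObsoleteTable;
```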
It's pretty common to have table data that is considered part of the schema. Lookup tables used for referential integrity or required by applications are the most common example. To make sure these get included and maintained via deployments/publishes, you need to add a post-deployment script.
I created a data folder and added a LookupTables folder to it for all tables that are considered lookup tables. As of writing this, I'm up to four folders. ManualDrops holds the DROP IF EXISTS scripts mentioned earlier. ConfigurationTables mostly consists of data that should be initialized by the supporting applications, but isn't yet. DataChanges is for scripts used to change the shape of data or migrate data.
I then added a Post-Deployment Script (under User Scripts in Add Item when you right-click on the data folder). You can only have one post-deployment script per project, but that script can pull in scripts from other files.
This post-deployment script ends up looking like:

```
:r ./LookupTables/SomeLookupTable.sql
:r ./LookupTables/AnotherLookupTable.sql
:r ./ConfigurationTables/AConfigurationTable.sql
:r ./DataChanges/Ticket#.sql
```
:r is a SQLCMD command that pulls in a SQL script from another file. Make sure those other files are not included in the build (set their Build Action to None), or else the compiler will try to treat them as tables/stored procedures/views/etc. and give you compile-time errors.
When deploying a dacpac (the "build" of a Data Tier Application), ALL data scripts run EVERY time you deploy. This means everything must be re-runnable, and you may need to clear out scripts that will no longer run after a schema change.
So far I haven't seen this become much of a concern. There have been a few occasionally-run scripts that we identified should stop being SQL scripts and become functionality living in an application, but other than that, re-runnability is solved with simple IF NOT EXISTS(SELECT * FROM TableName) wrappers.
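A re-runnable lookup table script in that style might look like this (the table and its values are hypothetical):

```sql
-- Seed dbo.OrderStatus only if it has never been populated.
-- Post-deployment scripts run on every publish, so the
-- IF NOT EXISTS guard keeps the insert from duplicating rows.
IF NOT EXISTS (SELECT * FROM dbo.OrderStatus)
BEGIN
    INSERT INTO dbo.OrderStatus (OrderStatusId, StatusName)
    VALUES (1, N'Pending'),
           (2, N'Shipped'),
           (3, N'Delivered');
END
```

A MERGE statement is a common alternative when you also want existing rows corrected to match source control, not just inserted once.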
I've been beta testing SSDT for our company for a few months now. It's become apparent that SSDT, dacpac deploys, and SQL Source Control solve many more problems than I had initially set out to address.
It's now extremely simple to spin up a super lightweight database on a local machine. Simply stand up a new SQL Server instance (Docker, LocalDB, SQL Server Express, whatever), then do a dacpac deploy or publish to it from Visual Studio, and you have a functional database.
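Outside of Visual Studio, a dacpac deploy is typically done with the SqlPackage command-line tool. A minimal invocation looks something like this (the server, database, and file names are placeholders):

```
SqlPackage /Action:Publish ^
    /SourceFile:"bin\Release\MyDatabase.dacpac" ^
    /TargetServerName:"localhost" ^
    /TargetDatabaseName:"MyDatabase"
```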
New environments in general are much easier to spin up, and have much lower maintenance cost since we can start them with small databases and build them up instead of doing a full pull of production data with a data scrub on sensitive information.
There doesn't need to be communication between development and other teams (dev ops, QA, etc.) about what scripts to execute alongside other tickets. Just like an application, as long as they deploy a specific build, it'll all be there.
Our database was a bit of a monolith. Since SSDT by default doesn't drop objects that exist in the target but aren't in the project, we are able to virtually break up the database into multiple projects that all just happen to deploy to the same instance of SQL Server.
This makes it trivial to migrate tables related to a specific team/domain into their own micro database (with a bit of a migration plan). It essentially decouples tables from the database that hosts them.
Visual Studio provides a ton of helper functionality for changing schema. Writing a SQL script to rename a column by hand can be a bit gross, but in Visual Studio you can just go into the designer view, change the name, and it'll generate a refactor log entry to make the change for you without you needing to write any SQL.
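Under the hood, the refactor log lets the publish treat the change as a rename rather than a drop-and-recreate — roughly the equivalent of the manual script below, which it spares you from writing (table and column names are hypothetical):

```sql
-- What a manual column rename would look like without the refactor log.
EXEC sp_rename 'dbo.Customer.EmailAddr', 'EmailAddress', 'COLUMN';
```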
As stated earlier, it's now very apparent when something about the database is not lining up. Deleting a table that's required in a view won't even make it past the build step, let alone making it to trying to run it against a database.
I know it sounds odd, but merge conflicts are really a good thing. They are a strong indicator that a piece of functionality is possibly being modified to do two different things. Previously it was always possible that two people making changes to the same stored procedure, function, or view would just overwrite each other's changes, and no one would be the wiser until the bug reports started coming in.
I haven't played around with tests, but being able to provide tests around triggers, functions, views, constraints, and stored procedures is really powerful. Putting logic in the database has always seemed iffy because of how opaque it can be. I still stand by trying to have most logic live at the application layer, but having the database be testable massively helps ease concerns when we want/need to put logic in the database for performance purposes.
There are definitely some issues that come with using SSDT.
As more and more DataChanges go through SQL source control, you may occasionally need to go in and trim the ones that were meant to run only once. Ideally you'd do this after verifying that all databases are up to the latest version, so they already have the data change you're deleting. The same applies as RefactorLogs start to pile up.
While merge conflicts in stored procedures and the like can be helpful, I've seen a lot of merge conflicts come out of the post-deployment script, as well as the .sqlproj file. These can cause a bit of a headache and some uncertainty, but most have been resolvable with a take-right-then-left resolution.
Overall SSDT has been a fantastic experience. I've really enjoyed learning it, and it has proven to be a very powerful tool. I would love if it had a better pattern for it to handle data changes, but that's really the only pain point.