Thanks for sharing your experience. Sounds like it was a pretty frustrating p...

[Comment from a deleted post]

Replies for: From a Data Analyst perspective how fuzzy do you think the boundaries between these Data * types are, how often do you think they are broken and ho...

Thanks for sharing your experience. Sounds like it was a pretty frustrating project.

There’s no rules to say that everything in a relational database has to be completely normalised with primary/foreign keys on each table. There’s nothing to stop your team storing what is effectively a collection of spreadsheets. It’s an expensive way to go , but not flat out wrong.

Database development is a lot like software development. Sometimes a decision made a few years back causes headaches down the line - setting too strict data types, building metrics on dim tables, retrospectively adding keys, dropping tables without checking dependcies ... so a big bang overhaul to the architecture probably isn’t the best option.

What was the outcome for your project?

simonhaisz • Oct 25 '18

That's a good point about technical debt. I'm always cognizant that good database design is hard, but I suppose I should remember that it's not necessarily that someone sat down one day and said "I think this is a good design". These sort of situations can occur over time. And unlike software you cannot refactor a database, so any serious change will need an upgrade/transform step. And I've been involved in enough upgrades to know how scary that can seem.

We actually got it to a POC where its heuristics of inferring schema from unstructured/semi-structured sources worked fairly well, generating UNIQUE queries to find primary keys and JOIN queries for foreign keys. Then show our 'best guess' to a user who could fiddle with the diagram correcting mistakes before committing to the new schema and starting the transfer. But at a certain point if you had a very large, convoluted unstructured database it became a 'garbage in, garbage out' situation.

The actual frustrating part of the project is that priorities shifted and it got put on the shelf :(