Some commented out lines of code, which supposedly apply a particular business rule over another, are left hanging without what they were supposed to do. Some comments allude to warnings about the consequences of changing particular join conditions, without an explanation as to why.
Many SQL codes can go over a dozen layers of join paths deep, with multiple subqueries, badly written split-apply-combine methods, and inexplicably named aliases.
Executives of the business may one day get an automatically generated report one day and question "how the figures were generated" (Think "why is this number so low?"). With people moving from one job to another, undocumented SQL code jibberish is all that the analyst has to go by (if the analyst is given access to code at least).
A lot of data pipelines rely on SQL, and they really should be documented better - explain to the user the code that they are looking at. Throw in a diagram perhaps.
But we shouldn't have to write a separate document that does these things. Separating the documentation from the code increases the chances that the documentation will not be updated. Documentation should really live in the code as comments. I like the simplicity of markdown, and think it would be nice if I could use markdown within the comment block of the sql to document the code.
A little while ago, I posted about documenting Python code for data science. I had built a package called mindoc that allowed me to document my pipeline code using markdown within my code.
Now I have expanded the tool to help document SQL code and added support for diagrams (using mermaid js)!
See this github page for how it works: https://minchulkim87.github.io/mindoc/