The Setup
As a developer there are fewer things more frightening than an ask or requirement phrased similar to the following:
I have this excel spreadsheet, but we need it as an application.
Invariably the spreadsheet has a convoluted sequence of functions and back-references, look-ups and unexplainable inconsistencies in format and display.
Ideally sometime before a developer is handed a spreadsheet and the the above “requirement” there could be a detailed analysis of business requirements including data sources, and all transformations expected of that data including test cases and example data. However in a fast paced modern working environment that is a lofty luxury indeed.
So what is our hard-working, indefatigable developer to do? Trying to suss out the hierarchy of column formulas extending up to and possibly column AX, some referencing sums of other formula derived columns, is a head scratchier indeed.
The Situation
In a recent exercise the (nearly) exact situation described above was encountered. The resulting artifact was to be in the form of a Database View. The data was (or would be) available in tables in the target database so that was at least covered. Still, I needed to map out the column inter-relationships and devise which structure of nested select statements would be appropriate to capture the same functionality of the Excel spreadsheet.
I first considered mapping out the column relationships on paper, but quickly realized that doing so would need a substantial amount of erasing as I moved on to the next column and had to re-draw a portion of my map. Clearly a more automatic solution was needed and being a developer... I had an idea for a tool to use.
The Search
A few months ago I stumbled across a diagramming tool that could take diagram descriptions in a markup language and produce decent quality graphical outputs. This was an amazing discovery as I’ve always struggled with the visual documentation tools such as draw.io or Visio. I much prefer to construct things like this in a declarative way rather than using visual tools. For me the benefits of declarative ways include being able to copy-paste easier, check into source control and even programmatically create the declarative statements.
The tooling I found is called Mermaid and it has a really nifty live editor for quick work and learning: Since discovering Mermaid, I’ve used it for some light documentation of processes, data-flows and... some unorthodox things.
The Solution
Back to our problem of trying to get a handle on the rat’s nest of excel formulas. Let’s look at one...
Doesn’t look too bad... bet let’s look again.
$AA$1
is a “constant” to be provided from fairly static table. Not doing too bad so far.
P7
is another formula=SUM(M7:O7)
OK so now referencing another formula. Now we're getting more complicated.
$P$19
is =SUM(P6:P18)
Now things are interesting... a sum of the set of data.
The whole sheet is littered with things like this. Very often the results of one formula are used in later or earlier formulas.
I figured I could use a type diagram to help me understand these relationships.flowchart
So to start with....
flowchart TD
AA7 --> $AA$1 & P7 & $P$19
Line 1 Flowchart
tells Mermaid that this is a type diagram.
Line 2 is indented for readability. Let’s take it piece by piece
AA7
This is the first node in the diagram. This is the simplest way to define a node, just an identifier.
-->
This is a line, in this case an arrow line. It indicates a directional relationship. In this case, that the identifier (column) to the left is related (references) the identifiers (columns) the right.
$AA$1
The next identifier (column) that is the referenced item of the relationship.
& P7
We can indicate that more than 1 thing is related, we could have put the relationship on a separate line as AA7 -> P7
if we wanted to, but I like this one line per referencing column format for clarity.
& $P$19
The next column referenced.
So far we’ve defined a flowchart
that has a node AA7
that references $AA$1
, P7
and $P$19
Let’s see what it looks like:
We can clearly see that AA7
references the other three columns.
let’s expand on that...
P7
is =SUM(M7:O7)
and becomes a line in our diagram P7 --> M7 & N7 & O7
Mermaid doesn’t understand Excel range notation, so I manually unrolled this.
$AA$1
is a value so we don’t need to expand it anymore.
$P$19
is =SUM(P6:P18)
I’m really interested at a line-by-line model of this so I can just say that it references back to P7
This becomes a line in the diagram $P$19 --> P7
With our full (so far) diagram markup of
flowchart
AA7 --> $AA$1 & P7 & $P$19
P7 --> M7 & N7 & O7
$P$19 --> P7
The resulting diagram is
Now we can more clearly see the hierarchy of our Excel formulas.
Using this method I created a flowchart of all the formulas for one row of the “spreadsheet that is a requirement”. Here is what the full markup looks like:
flowchart BT
H6 --> F6 & G6
I6 --> AT6
J6 --> H6 & I6
KG --> AU6
P6 --> M6 & N6 & O6
T6 --> R6 & S6
AA6 --> P6
AA6 --> P19_SUM_P6
P19_SUM_P6 --> P6
AB6 --> Y6 & Y19_SUM_Y6 & W6 & W19_SUM_W6 & T6 & T19_SUM_T6 & V6 & V19_SUM_V6
Y19_SUM_Y6 --> Y19
W19_SUM_W6 --> W6
T19_SUM_T6 --> T6
V19_SUM_V6 --> V6
AC6 --> P6 & T6 & V6 & W6 & Y6 & AA6
AD6 --> P6 & T6 & V6 & W6 & Y6 & AB6
AE6 --> AL6
AF7 --> AL6
AJ6 --> AG6
AK6 --> AH6
AL6 --> AJ6 & AK6
AM6 --> P6
AN6 --> AD6 & AM6
AO6 --> AN6 & AL6
AP6 --> AO6 & AO19_SUM_AO6
AO19_SUM_AO6 --> AO6
AQ6 --> AO6 & AP6 & AN6
AR6 --> AN6 & H6
AS6 --> AQ6 & AQ19_SUM_AQ19
AT6 --> AS6 & AQ6
AQ19_SUM_AQ19 --> AQ6
AU6 --> F6 & J6
AV6 --> J6 & H6
AW6 --> AU6
AX6 --> AV6 & AW6
It’s big.
I did change the direction of the flowchart by adding BT
after the flowchart
definition. This stands for Bottom-to-Top. Other directions supported are LR
and RL
in addition to TD
the default for Top-Down.
Here is the resulting diagram:
Is your mind blown yet?
This diagram clearly shows me the conceptual layers of the SQL View I’ll be creating. I could now see which columns I needed to have ready for
each following layer.
At initial count I see 18 layers. I count by horizontal groupings. This a naive count of layers as many of the smaller functions can be in-lined into a sibling layer. In practice the very large functions that include checks for divide by zero were not rolled together with other layers and... the summing columns could not be rolled in without some correlated-sub-query mess. It was much cleaner to have these as separate layers.
The Summary
Had I not used Mermaid to help me diagram the requirements, it would have taken me multiple times longer than it did with using the diagram to assist in understanding the chain of data formulas.
That I don’t need to manually adjust the positions of nodes or connect and draw lines myself, saved me a ton of time. Being able to save the
markup to a text file for later editing or adding to this very same blog-post... priceless (insert appropriate credit-card commercial here💳).
I plan on using Mermaid diagrams in this manner in the future to better capture the true nature of requirements.
I strongly encourage you to check out the Live Editor and Documentation to learn more about how you can use Mermaid diagrams.
Of note, in this task we used just ONE type of Mermaid diagram out of many... and more are being added. Mermaid is open-source and being constantly improved on.
Here are the diagram types that Mermaid has as of this writing:
- Flowchart
- Sequence Diagram
- Class Diagram
- State Diagram
- Entity Relationship Diagram
- User Journey
- Gantt
- Pie Chart
- Gitgraph (Git) Diagram
- C4C Diagram (Context) Diagram
- Mindmaps
- Timeline
Top comments (0)