We all learned long ago that LOC (lines of code) are a terrible unit of measurement.
Well, at least most of us learned that.
This Friday I sat down to work on some internal magic that gets some text from your console to a dashboard (easier said than done; I've found CouchDB to be the tool of choice). By the end I was doubting my productivity.
The Result
At the end I had two things:
- three hours in our time tracking, might have been quite a bit more though
- 104 (one hundred and four) lines of beautiful shell code in our GitLab
That's even less than a line of code per minute.
Hell, that even included a block of code that's duplicated five times, each time with a different variable name (it's a non-trivial case to DRY that part).
Granted, we did port the whole thing from a terrible vim file; pandoc file | curl home-grown-nodejs-daemon pipeline to a cleaner database solution with revisions and stuff, but the discussion part was just about an hour or so.
The Script
So what does the script do?
Basically it
- uses getopts to get some variables filled
- reads missing variables from stdin if the tty is interactive
- fails if mandatory variables are missing
- downloads the current document
- merges the current document with the new entry
- pushes that back to CouchDB wrapped with the correct revision
Seems like an easy task, right?
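A minimal sketch of the option parsing and the mandatory check from the list above. The option letters (-a, -m) and the variable names are made up for illustration; the real script's options are not shown in this post.

```shell
#!/bin/sh
# Hypothetical options: -a AUTHOR and -m MESSAGE (illustrative names only).
parse_args() {
    author='' message=''
    OPTIND=1                      # reset so the function can be called repeatedly
    while getopts 'a:m:' opt; do
        case "$opt" in
            a) author=$OPTARG ;;
            m) message=$OPTARG ;;
            *) return 2 ;;        # getopts has already printed a diagnostic
        esac
    done
}

# require LABEL VALUE: fail with a message if VALUE is empty.
require() {
    [ -n "$2" ] || { printf 'missing mandatory value: %s\n' "$1" >&2; return 1; }
}

# Possible use at the top of a script:
#   parse_args "$@" || exit 2
#   require author "$author" || exit 1
#   require message "$message" || exit 1
```

Because every expansion is quoted, a value such as my"name\":{}\x123 survives parsing unchanged; the escaping for JSON happens later.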
Why Lines of Code Are a Bad Measurement
I've put a lot of effort into making the script as robust as possible.
If at some point you enter something like a literal my"name\":{}\x123, it will be stored the very same way in the database.
Everyone who has ever dealt with shell scripts knows that it's a hell of an effort not to fail at this.
There is your shell which has escaping.
There is the JSON merging, which needs the string input to be escaped.
There is the curl which could possibly fail.
There is so much that could go wrong.
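To make the merging and the curl failure case concrete, here is a rough sketch of the download, merge, and push steps. It assumes the document keeps its entries in an array called "entries" and that the URL points at a single CouchDB document; both are guesses for illustration, not the actual layout of our database.

```shell
#!/bin/sh
# fetch_doc URL: print the current document; --fail turns HTTP errors into a non-zero exit.
fetch_doc() {
    curl --silent --show-error --fail "$1"
}

# merge_entry DOC_JSON ENTRY_TEXT: append the entry to the (hypothetical) "entries" array.
# --arg hands the text to jq as a string, so jq does the JSON escaping, not the shell.
merge_entry() {
    printf '%s' "$1" | jq -c --arg entry "$2" '.entries += [$entry]'
}

# push_doc URL DOC_JSON: PUT the merged document back. The _rev field from the
# download is still inside DOC_JSON, so CouchDB can reject a stale update.
push_doc() {
    printf '%s' "$2" |
        curl --silent --show-error --fail -X PUT \
             -H 'Content-Type: application/json' --data-binary @- "$1"
}

# Possible use (URL is a placeholder):
#   url=http://localhost:5984/dashboard/log
#   doc=$(fetch_doc "$url") || exit 1
#   merged=$(merge_entry "$doc" "$message") || exit 1
#   push_doc "$url" "$merged" || exit 1
```

Checking the exit status after each step means a failed download never leads to pushing a half-built document.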
It took five lines of (maybe too) tightly packed shell that take a variable, read it if it is not set (but only if the tty is interactive, failing otherwise), escape it (properly, not only with a simple s/"/\"/g) and store it in a new read-only variable.
This works for all inputs, including special characters that need special escaping in JSON (think: "binary" characters, multibyte, hell even emoji).
That's five lines.
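For illustration, this is roughly what such a block can look like. It is not the original five lines: it is spread out and wrapped in a helper function (json_field, a name made up here) so it can be read and tested, and it assumes jq does the escaping.

```shell
#!/bin/sh
# json_field LABEL VALUE: print VALUE as a JSON string literal.
# If VALUE is empty, prompt for it, but only when stdin is an interactive tty.
json_field() {
    value=$2
    if [ -z "$value" ] && [ -t 0 ]; then
        printf '%s: ' "$1" >&2
        IFS= read -r value          # -r keeps backslashes, IFS= keeps leading blanks
    fi
    [ -n "$value" ] || { printf '%s is mandatory\n' "$1" >&2; return 1; }
    jq -n --arg v "$value" '$v'     # jq handles quotes, backslashes and control characters
}

# Possible use:
#   name_json=$(json_field name "${name-}") || exit 1
#   readonly name_json
```

Given the input my"name\":{}\x123, the function prints "my\"name\\\":{}\\x123", which decodes back to exactly the original text.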
You'd have trouble packing that so tightly even in programming languages that don't need super special escaping.
There is no meaning to the number of lines of code, because it's an artificial number that can be changed at will (blank lines, moving lines together, comments, …).
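A toy demonstration of that point: the same function written twice, once packed onto one line and once spread out with a comment and blank lines. The function greet and the helper count_lines are invented for this example.

```shell
#!/bin/sh
# count_lines TEXT: number of lines in TEXT.
count_lines() { printf '%s\n' "$1" | grep -c ''; }

# Same behavior, one line.
packed='greet() { [ -n "$1" ] || return 1; printf "hello %s\n" "$1"; }'

# Same behavior, seven lines.
spread='greet() {
    # refuse an empty name

    [ -n "$1" ] || return 1

    printf "hello %s\n" "$1"
}'

# count_lines "$packed"   reports 1
# count_lines "$spread"   reports 7
```

Both definitions behave identically once evaluated, yet one counts seven times as many lines as the other.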
Beyond that, there are tasks that seem thoroughly trivial but end up being a lot of work. Sometimes they even turn out to be plain simple, but that might not be obvious at first. There is often an elegant solution to a simple problem that is so elegant and plain that you simply don't see it.
Top comments (5)
I find lines of code to be a useful measure of complexity, whether inherent or incidental. A high line count per file or function is a good starting point for finding problematic sections.
It is definitely not a measure of work. As developers, we don't produce code, we use code to solve a problem.
I don't think it's like that.
I can easily use several tens of lines of code for list comprehension/transformation and aggregation, but for the same task can also use one mathematical formula.
However, it can also mean that the programmer formatted the code to better reflect the actual task accomplished.
It can also show that a module/file/class contains a lot of code that can/should be split up.
There is no "one reason" for file size, no matter whether it is small, medium, or large.
A high number of lines of code says literally nothing at all.
Do not ever rely on that.
For instance, said script contained about 100 lines of code (getting/reading parameters) that wrapped 5 lines of actual code (send to CouchDB).
Of course, LOC should not be taken as a hard rule, but they can be an indicator of code that needs to be refactored, especially if it is touched frequently.
In the case of your script, it is very likely that the reader does not care about the i/o boilerplate, so that could as well be in another file or at the bottom of the same file. Then you would have one short script that is read often and a long script that is read rarely.
BTW, despite thinking that it is useful as an indicator of complexity, I absolutely agree with you that LOC per time is not a good way to measure effectiveness or productivity.
Three hours is quite fast. With breaks and co-worker interruptions, this would take me at least a whole day. And I'm seen as a productive engineer here.
I think part of that is owed to my extensive knowledge of shell scripting and jq (which made up most of the shell script) and the ease of use of CouchDB.
Seriously, we had that home-grown (also mine) nodejs thing before, I'm glad we don't need that anymore.