Your bash scripts are rubbish, use another language

Tai Kedzierski ・5 min read

(Headline photo from nixcraft's post to which I was reacting below)

So the below was a rant I posted in response to some pushback - someone suggested using Python instead of bash, and a few people complained about how it's overkill, how there are two versions of Python you need to get right, or you have to get it onto the machines in the first place, or suchlike.

I love shell scripting, and I still use it a lot, but I'm no fool. There are so many issues with it that blindly defending it for all use cases is foolhardy.

Not least because in company settings, most other people you will work with haven't the slightest idea what they're doing with shell scripts. They already get brownie points for thinking of putting the scripts in source control, kind of like getting grades for writing your name on your test.

SHELL SCRIPTING IS REAL PROGRAMMING. It should be source controlled, code reviewed and written to clean, maintainable standards. Because that code is meant for production.

So this is what I retorted:


If it's a personal machine, install Python - or other language of choice.

If it's the enterprise machines, have a policy to ensure Python - or other language of choice.

Shell is good and great, I'm a command line junkie myself, and I still turn to shell scripts for a lot of my work, but a shell script that's for more than wrapping a long pipe or two quickly becomes madness unless you actually put in the time and effort to learn the language PROPERLY. (The article feature pic is an example of utterly shitty code and their problem is not the filename spaces YET, but their RUBBISH handling of variables, and the lack of any effort to write cleanly)

I am an extensive bash scripter, and historically became one only for the reasons listed around here, which I once saw as valid. They can all be worked around - and solving "language X is not on our machine farm" is often a better return on effort than years of brittle shell scripts that you write by the dozen and never maintain, or that your colleagues cannot fathom and won't touch for love nor money. Unfathomable shell scripts that run deployments, builds, farm management and more - the backbones of many a company.

I've seen bash scripts by seasoned developers, and those are also utter trash. Your code may be cleaner than the average, but that's an extremely low bar.

Shell scripting is good and powerful in its own right, it is true, and I have advocated that people give it a proper try and actually learn it, but the sad reality is that nobody does. If it's any "proper" language, they learn the ins and outs gladly, by peer expectation or inherited bias; but shell, even if you learn it properly, someone else will come f/ck up your clean code because they can't be arsed.

I am endlessly pushing for developers and admins to actually learn to use bash/shell properly, make use of functions, encapsulate steps of logic, write clean code. What I hear back: "It's just a shell script," "that's overkill," "it's fine like this," "I don't want to sink time into learning this, it's not a real language anyway."

The other truth is that, in the sysadmin space, most can't write clean Python/Ruby/Perl/PHP/JavaScript/chosen-lang either. Given the number of gotchas and the things shell lets you get away with until you hit a catastrophic bug (the Steam bug, anyone?), they'd be better off in a safer environment than shell scripting. Such bugs come down to the coder's carelessness, since the misunderstood shell behaviour is documented.

Shell scripting perennial issues that are not the fault of the coder:

  • will gladly let you get away by default with undefined variables (unless you explicitly set -ue)
  • whitespace around = changes the meaning entirely (a=b assigns, while a = b tries to run a command named a with the arguments = and b; COMPLETELY different statements, wtf)
  • you cannot return arrays from functions, only a stream of text (the power and the Achilles' heel) (this one issue compounds many of the others, by preventing workaround functions from being written)
  • arrays cannot be passed down to functions as distinct items alongside other arguments (you can use references as a way around, but how many bash scripters know those?) (easier to use global variables right? yuck)
  • variables are global by default. unless you make your iteration counter local, you stand to see some weeeeird bugs
  • string splitting is done around an inherent part of the string, not as a function operating on it (do we all know about IFS, does everybody know how to use it? didn't think so)
  • Is it really the shell you thought you were running? Ever deployed bash scripts only to find that the only interpreter on the machine is sh? Or the environment forces you into sh by default? Or that in fact you're not running bash but ash? Or maybe the system default is dash. Anyhow, you now have to write everything in plain sh and lose any improvements that bash ever brought to make the task more bearable.
  • Inconsistent environments for common commands are rife. Your script uses the "mail" function? GNU or BSD, which options to use? You use netcat? Which variant, which options? You use tar, grep, rsync? You using GNU, BSD or Busybox implementations? (these variations happen endlessly when mixing Ubuntu, CentOS and Alpine deployments, and that's just the surface)
  • (I scoff at any pushback of "ensuring the right version of Python on the company systems")
  • attempting anything remotely event-driven yields a nasty pile of workarounds (I've tried, with muted success) Granted, this is a space which shell is definitely not designed for, but that's to say how far I tried to do everything in shell at one point. It's possible, but it's damn hard work where another language would have been better.
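
For contrast, here is a minimal Python sketch of how a few of the items above play out in a language that fails loudly. The helper names (list_hosts, count_items) are made up for illustration, and it only covers undefined names, returning and passing lists, local variables and string splitting.

```python
# Illustrative only: list_hosts and count_items are hypothetical helpers.

def list_hosts():
    # A function can return a real list, not just a stream of text.
    return ["web 01", "db 01"]

def count_items(items, label):
    # The list arrives as one distinct argument alongside the label.
    i = 0                      # local by default; the caller's i is untouched
    for _ in items:
        i += 1
    return f"{label}: {i}"

i = 99
summary = count_items(list_hosts(), "hosts")

# Splitting is a method on the string; there is no global IFS to set and restore.
fields = "alice:x:1000".split(":")

# Reading an undefined name raises immediately instead of expanding to "".
try:
    print(undefined_name)
    failed_loudly = False
except NameError:
    failed_loudly = True
```

None of this makes Python code clean by itself, but each mistake here surfaces at the line that caused it.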

Most fundamentally, the view of shell as "not a proper language" hampers any impetus at large to learn it correctly and extensively, and to understand its own idiosyncrasies. At least with one of the other languages, developers have an inherited mindset that their skill in that language needs continual improvement, and will work towards this.

I still write tons of bash scripting. I love it. But recommending other people use another language is much more sane. Personally, I chose Python too. But in the end, I wouldn't recommend it unless you are going to do your darndest to learn. It. Properly.

Tai Kedzierski

@taikedz

GNU/Linux and free software enthusiast; organiser of Edinburgh Linux User Group and the Edinburgh Language Exchange; Computer Smacker General; ramen guzzler, quiche murderer. A friendly cat.

Discussion

I once saw a Go library designed to mimic a UNIX shell and some of the UNIX utilities. The idea was to use Go for what would traditionally be shell scripts.

The two main strengths of a UNIX shell are effortless pipes (try that with Python!) and external command execution as a first-class citizen.

 

Most modern languages allow function composition (pipes between functions). And they have extensive, stable repositories of libraries.

This boils down to shipping the right packages with the distribution - something Linux distros already do, of course.

 

Python's support for pipes is reasonable and not that hard to use, but it's painful compared to actually using pipes directly in bash/tcsh/etc. Even using FIFOs on the command line feels more natural than the way you have to compose them in Python. The Unix way of composing operations is probably the best implementation of functional/stream programming ever.

And I am a Python lover, so don't think I'm knocking Python!

Absolutely right if you're going in and out of Python to create a pipeline of full-fledged processes (I suppose you refer to the subprocess module and the like).
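
For reference, a sketch of what a two-stage process pipeline looks like with the standard subprocess module. The two child commands are stand-ins of my own (small Python one-liners playing the roles of a producer and grep), chosen so the example does not depend on which utilities are installed.

```python
import subprocess
import sys

# Stand-ins for real tools: the producer prints three lines,
# the filter keeps only lines containing "http://".
producer_cmd = [sys.executable, "-c",
                "print('no link here'); print('http://example.com/a'); print('done')"]
filter_cmd = [sys.executable, "-c",
              "import sys\n"
              "for line in sys.stdin:\n"
              "    if 'http://' in line:\n"
              "        sys.stdout.write(line)"]

# Equivalent of: producer | filter
producer = subprocess.Popen(producer_cmd, stdout=subprocess.PIPE)
consumer = subprocess.Popen(filter_cmd, stdin=producer.stdout,
                            stdout=subprocess.PIPE, text=True)
producer.stdout.close()   # so the producer is not held open if the filter exits early
output, _ = consumer.communicate()
producer.wait()

matches = output.splitlines()
```

That is a lot of ceremony for what the shell spells with a single | character.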

What I meant is: you can stay in the environment that supports 'tacit/point-free programming', and make sure you have everything you need:

# 'function' composition is a pipe
# functions are processes
tac logs.txt | grep "http://" | xargs wget

Could be just as easy:

# function composition not built-in
# functions are native
compose( read_file("logs.txt"), filter_lines("http://"), wget )

provided you have these functions lying around somewhere.
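
A minimal sketch of what such functions could look like. The names compose, filter_lines and reverse_lines are made up for this comment (they are not standard library functions), and the download step is left out so the example stays offline.

```python
from functools import reduce

def compose(*fns):
    """Return a function that applies fns left to right, like a shell pipe."""
    return lambda value: reduce(lambda acc, fn: fn(acc), fns, value)

def filter_lines(needle):
    """Build a stage that keeps only the lines containing needle (like grep)."""
    return lambda lines: [line for line in lines if needle in line]

def reverse_lines(lines):
    """Stage that reverses the line order (like tac)."""
    return list(reversed(lines))

# Roughly: tac | grep "http://", minus the download step.
pipeline = compose(str.splitlines, reverse_lines, filter_lines("http://"))

log_text = "GET http://a.example/1\nstartup complete\nGET http://b.example/2"
urls = pipeline(log_text)
```

Here every stage is an ordinary function call inside one process, so nothing is forked.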

Granted: Python has these FP concepts built-in, but not as nicely as the unix way. There are better languages for that: Haskell, F#, Erlang...

-- my haskell is rusty - but function composition is a dot,
-- and it reads right to left: the last function listed runs first
-- functions are native
pipeline = web_get . filter_lines "http://" . read_file
pipeline "logs.txt"

Interestingly enough, someone already thought up a Haskell shell: Turtle

Yes, I was talking about composing processes like you would at the shell.

I see what you're saying about point-free programming, though. I still think the Unix style is the cleanest, most natural implementation of point-free programming, and I think the fact that it is a genuine stream of processing is a big point in its camp. However, if your Haskell example is accurate, I like it. The examples the Wikipedia article gives seem less intuitive and a lot more LISP-y.

I think most programmers would probably find the use of compose in Python a lot less intuitive than nested generator functions, and it's certainly an inelegant implementation of point-free programming. I also wonder if it can eliminate some of the advantages of the generators? It probably doesn't, based on the sample implementation, but I'd have to think carefully about whether applying partial like that would have unintended consequences, at least in some cases.
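
For comparison, a sketch of the nested-generator style I have in mind. The stage names (grep, strip_prefix) are invented for the example; each stage pulls lines lazily from the one it wraps, so nothing runs until the result is consumed.

```python
def grep(needle, lines):
    """Lazily yield only the lines that contain needle."""
    for line in lines:
        if needle in line:
            yield line

def strip_prefix(prefix, lines):
    """Lazily drop prefix from the start of each line that has it."""
    for line in lines:
        yield line[len(prefix):] if line.startswith(prefix) else line

log_lines = ["GET http://a.example/1", "startup complete", "GET http://b.example/2"]

# Stages are nested inside-out rather than written left to right.
stream = strip_prefix("GET ", grep("http://", log_lines))

first = next(stream)      # only now does any filtering happen
rest = list(stream)
```

The laziness is the advantage I would not want a compose helper to lose.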

I think so, too. You can make a nice 'fluent' DSL out of it, though.


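from functools import reduce

class Chain:
    """Left-to-right function pipeline built with the | operator."""
    def __init__(self, *fns):
        self.fns = fns
    def __or__(self, fn):
        # chain | fn returns a new Chain with fn appended
        return Chain(*self.fns, fn)
    def __call__(self, arg):
        # feed arg through each function in turn
        return reduce(lambda ret, f: f(ret), self.fns, arg)

# the earlier example would read (read_file, create_filter and web_get
# are not defined here, so it is left as a comment):
# Chain() | read_file | create_filter("https://") | web_get

def double(x):
    return 2*x

def fromstr(s):
    return int(s)

def inc(x):
    return x+1

def repeat(n):
    return lambda s: s * n

c = Chain() | fromstr | inc | double | str | repeat(3)

assert c("1") == "444"
assert c("20") == "424242"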

..... ingenious.

I'm still trying to brain this, its possibilities and its limitations but... wow.

A bit of commentary would be very welcome :-)

I'll expand it in a full-fledged post :) Or 'leave it as an exercise'?

I either love this or hate it, I can't decide. Bravo, sir!

Somehow, I have hit the 'publish' button on dev.to/xtofl/i-want-my-bash-pipe-34i2.

I've only skimmed the article so far, but it looks like a good one. I like the title! 😁

 

Yeah, doing pipes is a nightmare in anything other than shells, and the reason why I still love using shell scripts - chaining tools.

But there's a lot of logic that can often be sub-moduled out to other languages to make a cohesive whole. Some systems people don't like the idea of a program not being self-contained in a single file and you end up with 2000+ lines of sub-optimal code at best, more often than not downright horrendous though... take a peek at the install scripts of some of your favorite software to see what I mean...

 

As for having to choose between 2 versions of Python: no, not anymore. 2.7 is end of life, and at minimum everyone should be using 3.6+. But you should be on the latest stable, currently 3.8.

 

Yeah, but legacy scripts are totally a thing. Some outfits don't even know you can have both installed side-by-side so you can do a progressive script migration. So they did no migration. And they mandate all Python2 even now. It's not pretty. Doing my best on my side to further educate my colleagues, but that's just my corner...

 

Yeah, legacy is a different thing, I agree. But we should all be encouraging people to upgrade.

 

I usually start with a bash script, because it's just easy. Then I need to do something too complicated, like concatenate a string or use arrays, hashmaps etc., and I end up rewriting the entire thing in Python.

I'm sure there are many people that are amazing at writing bash scripts but it's never been very readable. Once you get just a bit complicated, languages like Perl start to look pretty by comparison.

Also, my biggest pet peeve: bash, sed and awk are not compatible between Linux, Mac and likely other variants.

 

I'm sure there are many people that are amazing at writing bash scripts

No there are only a handful :-D (I'm only half kidding)

Also, my biggest pet peeve: bash, sed and awk are not compatible between Linux, Mac and likely other variants.

My biggest gripe with shell scripting is the tooling dependency.

I used to quip "in bash, ANY language is your library!". The obverse of course is true: any tool can subsume any other in any given environment, and you won't know until your script crashes.

macOS uses the BSD Utils by default, as do the BSDs in general ; Fedora uses GNU Coreutils, except when they use a BSD adaptation ; Ubuntu is GNU Coreutils most of the time; Alpine uses BusyBox (a great tool in and of itself, but a thorn for cross-platform shell scripting) ; ...

 

It's just limited. It makes it harder to write tests and follow most coding patterns.

Granted, there are tools like this: github.com/sstephenson/bats, but I'm not sure if anyone uses them. Also... libraries!! How many times do we need to rewrite the same fix to the same problem?

macOS uses the BSD Utils by default, as do the BSDs in general ; Fedora uses GNU Coreutils,
except when they use a BSD adaptation ; Ubuntu is GNU Coreutils most of the time; Alpine uses
BusyBox (a great tool in and of itself, but a thorn for cross-platform shell scripting) ; ...

So.. you're saying that it's a very repeatable and consistent ecosystem? O.o

Yeah, that's part of the issue. It's easy, it's everywhere, just run bash foobar.sh... except when it doesn't work and you have to write 7 versions to support all the various edge cases.

It's not as easy to write, but I'm really liking Go. It's way more complicated and verbose than bash, but at the end of the day I end up with 1 file to copy around.

Yeah, libraries... that's why I started my bash-builder project and its sibling bash-libs. Build the script and have... a single file to copy around ;-)

The backbone of most of my bash scripting nowadays...

 

As an Oracle Engineer for a large corporation, 95% of the automation I write is in ksh: there is no python3 in RHEL 7 where I work, and old legacy hosts don't even have bash, so ksh works everywhere. Much of the automation my team writes is ksh, for a whole host of tasks, and while that last 5% is a bit of Python, I can't see that dynamic changing anytime soon, as shell scripting is so embedded in the way the team and the business function.

 

Once a technology is adopted it's hard to unstick it... That being said in your case it sounds like everybody gets the training required to write clean ksh? So long as they've learnt ksh fully and not "just the easy bits," then my argument remains the same: so long as there is impetus and requirement to learn the language properly, there's no (or much less of a) problem!

 

Bash is real programming. I have written tons of modular, complex logic and functions in nothing but Bash, running on many critical backbone servers. Bash is a beautiful language to me. Once you come to a Linux environment on server or IoT platforms, Bash is a mandatory language. If I need my program to do floating-point maths, I'll use C or C++ to create a tiny, efficient engine with added APIs for the Bash script to sit on top of. You know... Bash loves to piggyback on anything. It's cute.

 

Yep it's real programming alright. That said, I've seen many cases (and been guilty of a few) where the logic would have been better moved to other languages, in their own succinct modules, and used bash to tie the pieces together. It all depends on the use case of course :-)

 

"you cannot return arrays from functions, only a stream of text"

Never tell an engineer they can't do something. Please see:
dev.to/jrbrtsn/returning-an-array-...

 

That's not "returning" an array, but passing a reference to be manipulated in-place :-)

Still, a useful way of passing arrays down/up. You just.... need to know the technique... forget about recursive functions though. The name in the receiving function must be different from the one in the caller function.

 

Passing a result buffer by reference into a function is exactly what C++ does under the hood to present the illusion that functions may return something too large to fit in a CPU register. Recursive functions in Bash are a bit trickier, but I'll post a solution to that later today ;-)

I dunno, last I heard (and I am not versed in C so I could be talking complete nonsense), C functions return memory addresses... and of course, it is up to the programmer to know what any one function is passing back, be it an actual int, or a reference to a data structure of any kind...

I look forward to whatever solution you have to enable getting around the name clash during recursion with reference variables :-)

Tai,
C functions can only "return" something which can fit in a CPU register, because that is an efficient place to stash information which can be retrieved after the function "returns" - meaning the content of the CPU register gets copied into a stack variable or some other place useful to the programmer. C++ presents the illusion of returning something larger than a CPU register by silently creating space on the stack (a return buffer, if you will) before the function gets called, and then silently passing a reference to this return buffer into the function where it will get populated. Semantic shenanigans if you ask me.

That is wrong indeed.

C functions return typed values. Indeed once the structures grow bigger, programmers tend to revert to output arguments, which are often pointers to structures. Also, for lack of exceptions in C, functions often return error codes, and accept 'output' arguments as well.

In C++, types are much more used to advance the correctness of the program. Exceptions, variants, ... all make this a lot easier.

But... you can pass mere void* around. Severe pain should be your punishment - though there are legitimate cases for it.

xtofl - you may wish to review the assembly code produced by a function call in C before declaring that I have made an error ;-)

I'm sure the assembler does emit low level stuff like that. The semantics of the language are copying a value out of the function, though.

In C, the semantics of what a function can return are limited to something which will fit in a CPU register, because that is what the term return means. The C compiler provides an additional service of promoting or truncating this returned value to match a simple return type. You can't successfully return any arbitrary local-to-the-called-function stack based struct from a C function, because the content of this struct is undefined as soon as the function returns. You can pass the address of any arbitrary struct into a C function, where it may then get populated.
In other words, for C there is a bright line of distinction found in what may be returned, while in C++ this line is syntactically blurred with some compiler tricks.

I am very confused now. It's off topic here, but what does 6.9.1/3 of n1256 mean, then?

The return type of a function shall be void or an object type other than array type.

And then there is paragraph 12...

12 If the } that terminates a function is reached, and the value of the function call is used by
the caller, the behavior is undefined.

Maybe I should stick to C++ then.

If you don't care about the assembly generated by your compiler, sticking with C++ is a good choice. That said, I picked up C 32 years ago, and it remains one of the most popular programming languages. I am roughly 5x as productive in C as C++, and I've been programming in C++ for 27 years. It's getting worse with each subsequent C++ standard.

Not debating language superiority.

But I'm truly dazzled you would not be allowed to return a perfectly valid object. Let the compiler inject the correct assembly to accommodate it.

What about this one stackoverflow.com/a/9653083/6610? Wrong, too?

The stackoverflow example returns a value which is the size of an int. That fits just dandy into a CPU register. One of the focuses of the C++ standards committee has been trying to squeeze out all the superfluous copying which has been implemented to support clear semantics - so there's your reason.

Sorry, I keep finding counterarguments to your statement, unless I misunderstood.

This article clarifies some things, too: uninformativ.de/blog/postings/2020....

Could you be talking about 'early versions' of C?

I'll try compiler explorer tomorrow - now's nap time here.

From the Wikipedia page:

In regard to how to return values, some compilers return simple data structures with a length of
2 registers or less in the register pair EAX:EDX, and larger structures and class objects requiring
special treatment by the exception handler (e.g., a defined constructor, destructor, or assignment)
are returned in memory. To pass "in memory", the caller allocates memory and passes a pointer to it
as a hidden first parameter; the callee populates the memory and returns the pointer, popping the
hidden pointer when returning.[2]

There's your compiler trick of silently allocating stack space, and then passing it by reference to the function to get populated. After the function returns, C++ compilers historically have then copied the result from the invisible-to-the-programmer return buffer into something the programmer knows about. For large objects this is a significant performance penalty to pay for some syntactic sugar.

Can you put a date on that? "Historically", that must mean pre-2003: since then, every major C++ compiler does copy elision. C++ has wildly drifted away from what C used to be, to follow its credo "don't pay for what you don't use".

But thanks. It is indeed very interesting to see how you're drawing totally opposite conclusions depending on your viewpoint. Here's a (reasonably) honest analysis of the schism between C and C++, by someone in the standardization world: cor3ntin.github.io/posts/c/index.html

  • hardware-focused: a C compiler has to perform assembly 'tricks' to accommodate struct returning
  • problem-focused: returning structs is the most trivial thing a compiler should allow

I conclude that

  1. the C standard totally allows returning structs
  2. it leaves the semantics to compiler builders ('undefined')

Btw. with godbolt.org/z/a4exrM, you can compare the assembly generated by a large number of compilers.

I'll concede that my understanding of C++ compilers is dated - in 2003 I had already been coding in C++ for 10 years.
The crux of this disagreement is the definition of the word "return". In my opinion, creating the semantics of returning an object by silently passing in the reference to a possibly significantly sized return buffer of which the programmer may or may not be aware is not "returning" an object at all. Passing return buffers to a function by reference has been possible and clear since 1972, but I still cannot write in C or C++:

BigClass1, BigClass2, BigClass3 funcname(int arg1, double arg2)
{
   BigClass1 rtn1;
   BigClass2 rtn2;
   BigClass3 rtn3;
   // do stuff with arg1, arg2. Populate rtn1, rtn2, rtn3
   return rtn1, rtn2, rtn3;
}

However, I have always been able to write:

int funcname (BigClass1 *rtn1, BigClass2 *rtn2, BigClass3 *rtn3, int arg1, double arg2)
{
   // do stuff with arg1, arg2. Populate *rtn1, *rtn2, *rtn3
   return SOME_ERROR_CODE;
}

So, is the second example "returning" 3 objects? If not, then why?
By implementing the semantic charade of returning a single arbitrary object, modern compilers have accommodated a single case where it appears in source code that a function can return something which will not fit in CPU registers:

HugeSizeClass funcname (int arg1, double arg2)
{
   HugeSizeClass rtn;
   // do stuff with arg1, arg2, populate rtn
   // Throw exception on error
   return rtn;
}

So, where (stack|heap|static data segment) does rtn exist? How do the contents of rtn make it back to the caller? How is this not superfluous copying?
I contend that the following code is much clearer and guaranteed to be at least as efficient:

HugeSizeClass* funcname (HugeSizeClass *rtn, int arg1, double arg2)
{
   // do stuff with arg1, arg2, populate *rtn
   // Return NULL or throw exception on error
   return rtn;
}

"Returning" is indeed an abstract concept. Each platform makes it concrete in its own way. To the CPU, there's no such thing as 'returning'. I don't think opinions should matter here; it's always a choice.

To me, returning a tuple is far more readable than mixing input and output arguments of a function. Modern languages accommodate this, and together with 'destructuring', it leads to code that most developers can readily understand.

# python
def explode_url(url):
  ...
  return prot, dom, path

protocol, domain, path = explode_url("https://x.y.z.com/a/b/c")

// C++
auto explode_url(const string_view& url) {
  ...
  return make_tuple(prot, dom, path);
}
auto [protocol, domain, path] = explode_url("https://x.y.z.com/a/b/c");
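
A runnable version of the Python half, as a sketch: it assumes the standard library's urllib.parse.urlsplit is an acceptable way to take the URL apart, which the snippet above leaves open.

```python
from urllib.parse import urlsplit

def explode_url(url):
    """Return (protocol, domain, path) as a plain tuple."""
    parts = urlsplit(url)
    return parts.scheme, parts.netloc, parts.path

# Destructuring on the caller's side: no output arguments needed.
protocol, domain, path = explode_url("https://x.y.z.com/a/b/c")
```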

As an application programmer (less so as a driver implementer or kernel hacker), I value this expressiveness in a language. Any compiler that does not know this 'magic' forces me to work around it.

Hey, as an aside - cor3ntin wrote a nice overview of what divides C and C++ worlds: cor3ntin.github.io/posts/c/. He shouldn't have called it 'The Problem', but I like his analysis, which goes way beyond the technical.

I had already read the article, thanks for sharing. "Expressiveness" is an entirely subjective term which merits little discussion. In my opinion, any pointer passed in which is not marked const refers to a return buffer. If you study libc's prototypes, you can see the established convention of passing in the address of the return buffer(s) first.
Back in 1993 I would have asserted confidently that C++ will overtake C in a decade or so. Live and learn.

 

I use bash day in and day out. It's my favorite shell. I've been using it for a long time.

But... whenever I need to do something for production, I reach for Python 3.x. I use git for source control. I have my code peer reviewed.

I've converted some other people's bash scripts into Python. I've converted some other people's Perl scripts into Python. I've converted some other people's JavaScript on Node.js into Python.

Because I love Python? (Well, I am fond of it, true.) No, because the other code was hard to understand and hard to maintain and hadn't been code reviewed and wasn't abiding by the approved scripting engines. (Node.js is actually approved, but the other points hold.)

The "hard" part isn't a shortcoming of the language (after all, these aren't PHP), it's a shortcoming of the programmer making a tasty pot of spaghetti code -- and spaghetti code can be written in any language.

 

spaghetti code can be written in any language.

Yes, but some languages (and their baggage) can be more conducive to spaghetti ;-)

If you look for examples of good Python, Java, JavaScript, C, Golang, etc, you can find them, and there are LOTS of people trying to demonstrate how to do it properly. Examples are everywhere.

Shell (and to a lesser extent Perl) is plentiful in the wild - and it is mostly the awful stuff that is most readily available. (this was my point about peer expectations to improve skills in some languages but not others).

If you did the conversions in a company, and everyone else can write clean Python then great :) Although perhaps getting people to write clean lang-x in the first place would have been just as productive. I speak from experience when I say trying to make people learn and write clean shell is a pain.

 

That is a valid counterpoint, and I concur. Good code can be written in any language, except PHP of course. (I would have said PHP or Perl, but I've actually seen good code in Perl, so I know it is actually possible.)

 

I support you. I love bash scripting; it is my favourite language and it is very powerful. When I started with bash, my big ego made me write complex scripts, you know, to prove that I knew how to code in bash. Nowadays I always try to keep my code very, very simple, so that anyone can understand it easily. I like one-liners a lot, and using the [[ ]] test with && for flow control on one line, but that makes my code harder to read, so the best approach is to use indentation 😁 and write your code in the simplest way 😁👍✌️

 

I've been learning/writing a lot of industrial-grade shell code lately. The process made me think about this special language's place in the world. I came to the following conclusion:

Shell is a real language. It's focused in specific areas of software development. If you try to venture out of those areas, you'll hit a wall. If you know you'll never venture out, then it's totally fine to use it.

Shell lacks data structures and proper maths, and its string operations are awkward at best, so it's not a good language for working on data.

Shell lacks object-oriented features so it's not a good language to model business rules.

Where shell shines, is in its portability in the unix world (posix shell scripts), its direct line to the OS and its I/O handling.

It's perfectly suited for DevOps, gluing different programs together and day-to-day development low-level tasks.

As these are all important areas of software development, shell code should be treated as importantly as the code you write that works with data or business rules; it's also part of the system you're building!

 

I'd say that's pretty much 99% correct. It is a language with a specific use case in mind, that of chaining tools. The mistake often made is trying to, as you say, venture into incompatible areas, where problems quickly pile up.

POSIX compatibility is a bit of a gripe of mine - it's there to ensure cross-compatibility, so that's definitely a bonus, but those specifications are so restrictive that they cripple a number of capabilities various shells have come up with to "resolve" the issues of POSIX.

One of them is array iteration - decently well fixed in bash, and probably workable in most other shells, but quite horrific to handle in POSIX sh. With properly set-up server management practices, it should be possible to mandate any software required, so bash would be my choice to mandate, but other teams prefer other shells, and that's fine too.

Mind that I'm not saying that bash and shell scripting is bad - on the contrary I love it. I'm mostly saying that people generally don't bother to learn to use it well and for its use-case, and so we end up with a default quality of scripting across the board that is horrific - and that level of quality is passed on because that's what's expected. And so my message became a distorted one: "for pity's sake.. learn it properly... stop adding to the problem..." ;-)

 

I really can't say I find much to disagree with here.

If I can think of how to do a task 80-90% of the way on the command line without much head-scratching, I almost immediately reach for Bash. If it's a clearly OS- or file-system-centric or especially stream-oriented task, same. But as soon as it seems like there's a lot of in-place mutation needed and certainly if I feel like an array/list/hashmap will be necessary -- arrays are such a pain in Bash -- then I reach for Python. If it seems like it's going to be complex with optional arguments or different types of runs, straight to Python. As soon as what was a straight-forward Bash script starts getting requests to do more stuff, rewrite it in Python.

I once wrote a 650+ line Bash script -- including the generous comments and formatting -- to generate reports to streamline security auditing on a network of user systems and servers, because one of the sys admins didn't know Python and said it would be easier if they ever needed to modify it. It was solid, effective, fast, and even configurable with the ability to add new checks using regular expressions. But that was the point at which I said "never again" for something like that. The last I heard, it's still in use a decade later and has hardly been touched except for a little tweak here or there due to some network changes. They actually use it to vet any new tool they bring in. So I'm pretty proud of that, even if I'd never write such a monstrosity now.

 

I know the pain of such pride :-)

I did a couple projects where a fair bit from one was being re-used in the other (generic stuff, like output control, string validations etc) and I started putting bits into individual files etc and eventually ended up putting together a build tool that allows using scripts from a "library" and writing #%include lines in the main script. There's some extra management in there to prevent double-including files (two dependencies with a shared third dependency) etc. Everything gets built into a single script that can then be passed around/deployed.
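As a rough idea of what such an include step involves, here is a hypothetical sketch. It is not the actual build tool's code; the names `expand_includes` and `BUILD_SEEN` are made up, and it assumes bash 4+ for associative arrays. It expands `#%include NAME` lines from a library directory and emits each library file at most once, which is the double-include protection mentioned above.

```shell
#!/usr/bin/env bash

# Names of library files that have already been emitted (bash 4+).
declare -A BUILD_SEEN=()

# Usage: expand_includes MAIN_SCRIPT LIB_DIR > built-script.sh
expand_includes() {
    local file="$1" libdir="$2" line name
    local re='^#%include[[:space:]]+([^[:space:]]+)'
    # `|| [[ -n "$line" ]]` keeps a final line that lacks a trailing newline.
    while IFS= read -r line || [[ -n "$line" ]]; do
        if [[ "$line" =~ $re ]]; then
            name="${BASH_REMATCH[1]}"
            # Skip files already pulled in by an earlier dependency.
            if [[ -z "${BUILD_SEEN[$name]:-}" ]]; then
                BUILD_SEEN[$name]=1
                expand_includes "$libdir/$name" "$libdir"
            fi
        else
            printf '%s\n' "$line"
        fi
    done < "$file"
}
```

If `a.sh` and `b.sh` both include `c.sh`, the built script contains `c.sh` once, at the point where it was first needed.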

Some of my scripts are now beyond the 1000-line mark but that's only because they pull in a few "external" libs :-)

And yeah, your heuristic for transition point sounds eminently sensible :)

 

Ha! Bash can definitely give you a bit of a Frankenstein complex, simultaneously proud and horrified at what you've created.

(Not to be confused with Asimov's definition of "Frankenstein complex".)

 

This is where PowerShell shines. It's cross-platform and shell-capable, with rich objects and a rich ecosystem. The Pester test framework supports building robust, tested modules. If dotnet is in your environment it works great and the pipeline is powerful.
I was surprised at how much of a transition I had leaving Windows behind and moving to macOS and docker-based workspaces. Most of my stuff runs smoothly on all 3 systems.

 

I never gave PowerShell a go on anything other than Windows... I found it there to be excessively verbose for a command language, but I can see where that might in fact improve its use as a scripting language.

The availability of object output from tools is certainly a step up from parsing text streams (especially those that are designed for on-screen display, instead of automations), but then that puts the onus on the tool writer to write for that compatibility. I'd more readily have a JSON parser in lang-x and use it to extract from the outputs from other tools, and standardise other tools around JSON (seems to be the most common thing to handle these days). This could even unify web-API-based programming and shell programming nicely...

Question remains though - is most of its functionality derived from built-in functions, or calls to commands (actual executables) in the local system? Not sure, but as I understand you, you are running the same PS in Windows and non-Windows environments?

Also what's the average quality of examples out in the wild to learn from? From Python / JS / C etc there are plenty of articles, blogs, tutorials and the likes where "clean code" is insisted on, and professional environments may insist on it.

Shell scripts - including PowerShell so far as I have seen - are rarely seen as targets for the same zeal of cleanliness and suffer as a result....

 

I say, write in the language that makes you feel happiest, but write it like you're the one who is going to have to debug it/add to it in a year's time. Comment, separate variables from content, and separate concerns into separate functions (or files in the case of bash).

If someone comes to you and asks you what the code you wrote a year ago is doing, (1) you already messed up, and (2) if you can't figure it out in a matter of moments, then you messed up.

Most of us have been in this situation, so we can only strive to do better. For me, the last thing I want to do is debug a very large bash script, so I go for another language (Python, C, etc.)

 

Totally!

That said, I've been reviewing code recently for various CI developers. If I show them how their Python is not clean and there is a better way they're "huh yeah I get it, I'll try to remember that for next time". I do the same thing with their shell and they're "yeah, my shell skills are bad haha, anyway, moving on." Whether I press the point or not is moot here -- that initial lack of interest is the core problem... and I do think that "perception" of shell languages is in part a problem... hence for these kinds of people, all-out moving towards other languages, as you recognise, is probably more conducive to clean code...

 

Bash (and sh relatives) is probably the most valuable and yet poorly understood scripting language in existence. This owes largely to confusion about the nature of subshells and ignorance of the 'source' command, as well as ignorance of the builtin regular expression parsing facility and associative arrays. I can't think of a single instance where I would prefer Python over Bash or C or C++, or ...

 

Shell languages are great when used as "glue" logic, it becomes iffy when you start trying to do "business" (read: "non-glue") logic in them...

I did try to base a coding practice / workflow around sourcing, but sourcing is always relative. Function re-usability is important for good practice and reducing repetition etc, but if two re-used files re-source the same file, things can get messy... This problem was actually my primary motivation for creating bash-builder: a set of re-usable "libraries" that could be re-used in building other scripts... without worrying about where to source from, or what the end-point's script setup was.
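For anyone who stays with plain sourcing, one common way to soften the "two files source the same third file" problem is a source-once wrapper. This is only a sketch under my own assumptions (the name `source_once` is invented, bash 4+ is needed for the associative array, and it is not how bash-builder works): it resolves each path to an absolute one and refuses to source the same file twice.

```shell
#!/usr/bin/env bash

# Absolute paths of files that have already been sourced (bash 4+).
declare -A SOURCED_ONCE=()

# Usage: source_once path/to/lib.sh
source_once() {
    local dir path
    # Resolve the directory so the same file reached via different
    # relative paths is still recognised as already loaded.
    dir="$(cd "$(dirname "$1")" && pwd)" || return 1
    path="$dir/$(basename "$1")"
    if [[ -n "${SOURCED_ONCE[$path]:-}" ]]; then
        return 0
    fi
    SOURCED_ONCE[$path]=1
    source "$path"
}
```

It does not solve the "relative to what?" question on its own; callers still need a stable base directory, for example one derived from `${BASH_SOURCE[0]}`.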

I love the [[ $X =~ regex ]] operation. Use it lots. Pain when the environment is not bash but dash or ash or POSIX ... There are ways around it (install), but it's not always possible...
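A small sketch of what that difference looks like in practice (the version-string format and function names are just an example I made up). The bash variant uses `[[ =~ ]]` and reads capture groups from `BASH_REMATCH`; the second variant sticks to `case` patterns and parameter expansion, which dash, ash and other POSIX shells also understand.

```shell
#!/usr/bin/env bash

# bash only: regex match with capture groups in BASH_REMATCH.
parse_version_bash() {
    local re='^v([0-9]+)\.([0-9]+)$'
    if [[ "$1" =~ $re ]]; then
        echo "${BASH_REMATCH[1]} ${BASH_REMATCH[2]}"
    else
        return 1
    fi
}

# POSIX-friendly: glob patterns and parameter expansion, no regex.
# (Variables are global here, since `local` is not in POSIX.)
parse_version_posix() {
    case "$1" in
        v*.*) ;;
        *) return 1 ;;
    esac
    pv_rest=${1#v}            # strip the leading "v"
    pv_major=${pv_rest%%.*}   # text before the first dot
    pv_minor=${pv_rest#*.}    # text after the first dot
    # Reject empty parts and anything that is not purely digits.
    case "$pv_major" in ''|*[!0-9]*) return 1 ;; esac
    case "$pv_minor" in ''|*[!0-9]*) return 1 ;; esac
    echo "$pv_major $pv_minor"
}
```

The POSIX version is about twice as long for the same check, which is roughly the cost being complained about.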

Well... I did once write a web server in bash... required very specific versions of netcat and grep, and one mail notification script I wrote wouldn't run the same under the Ubuntu and Fedora servers (difference in mail implementations)

 

if i'm not wrong, all the reasons you stated to not use bash fall with the coder. yes, there are language-based limitations (can't pass arrays, all variables global, etc.), but that's the point, isn't it?

so when the job is done by a good coder, is maintained, and you don't need to do something fancy like pass arrays around, it works just fine. i know, i have a production system of thousands of IoT devices all over the world that uses scripts for all of its small one-off tasks (real work is done in C and docker). they are clean, controlled and upgradeable.

python requires a moving dependency, not a good choice for distributed systems where you might not have complete control of the overall environment, imo.

richard

 

all the reasons you stated to not use bash fall with the coder

Yes, exactly. And it would be SO MUCH BETTER if this situation improved!

so when the job is done by a good coder

Like I said, I've seen utter trash written by otherwise competent coders. I don't know why the discipline goes out the window as soon as they face bash. Probably if I threw Haskell at them they'd try to learn it properly. But shell? It's similar enough to C and JavaScript that they can write stuff, but sufficiently idiosyncratic that they just want to be over with it already.

python requires a moving dependency

However, shell is FULL of moving dependencies. If you're writing shell, you're probably wanting to target *nix systems - OK fine. So you have a CentOS farm, Ubuntu instances, some Dockers with Alpine, and some BSD just to keep some diversity.

Each of these (can) have in varying forms different implementations of mail, grep, rsync, sed. Some use GNU Utils, some rely on BSD Utils in certain situations, some use Busybox, some use other things... I have been deep in that hole before and cried many times "why is this ONE environment different in this ONE way making me write an ENTIRE shell submodule to cope with it???" (and every environment has that one item)

Python as a moving target between 2 and 3 is a piece of cake in comparison.
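One concrete instance of those tool differences: GNU sed accepts `sed -i 's/a/b/' file`, while BSD sed requires a backup-suffix argument after `-i`, so the same line does not behave the same everywhere. A hedged sketch of the usual workaround follows (the helper name `replace_in_file` is made up, and it assumes `mktemp` is available, which is not strictly POSIX): avoid `-i` entirely and go through a temporary file.

```shell
#!/usr/bin/env bash

# Usage: replace_in_file PATTERN REPLACEMENT FILE
# PATTERN and REPLACEMENT are used as-is in a sed s/// expression,
# so they must not contain unescaped "/" characters.
replace_in_file() {
    local pattern="$1" replacement="$2" file="$3" tmp
    tmp="$(mktemp)" || return 1
    if sed "s/${pattern}/${replacement}/g" "$file" > "$tmp"; then
        # Write back with cat rather than mv, so the original file
        # keeps its permissions and ownership.
        cat "$tmp" > "$file"
        rm -f "$tmp"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

Multiply this kind of wrapper by every tool that differs across your environments and you get the "entire shell submodule" described above.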

 

There have been so many times I've started writing a shell script, midway thought that the task would be so much better solved in Python, but continued writing shell anyway. I do agree, for most people Python with a shell script calling the Python function is going to be a better solution.

 

I have my own rules regarding bash.

Use bash for:

  1. init scripts
  2. scripts that will barely grow in the future
  3. scripts that are less than 50 lines of code

Use python for everything else.

 

Yes in principle, however:

init scripts

If that script grows arms and legs, you'll have wanted it to be just the glue that calls the more complex parts (in other-lang)

scripts that will barely grow in the future

I'm sure you are seasoned enough to know, this NEVER goes to plan :-D

scripts that are less than 50 lines of code

I would say, less than 20 lines, but I've written ones of 300+ lines. I just wrote them cleanly, namespaced the heck out of the functions, and re-used code as much as possible (the reason behind my bash-builder project)

I mean yes, I do agree with your points, but life has surprises, of which Inexperienced Colleagues is but one of many...!

 

Yeah, I learned shell scripting, but not all the good encapsulation techniques. PowerShell has a lot going for it as a better language than bash.

However, I came to a very different conclusion from you. I decided shell scripting wasn't worth the additional learning and maintaining, 'your language of choice' being the best option. However I can get our IT to make a decent text editor available on every server build, everyone uses notepad so...

 
 

Good in principle, but it's unrelated. Shellcheck will perform syntax linting.

It will not highlight bad style or practices ;-)

But good shout out for linting tools!