DEV Community

ShellPipe.py | A Hacky Remedy to Overkill Shell Scripting

TaiKedz on December 09, 2020

In my previous post, I had a bit of a rant about people not learning the idiosyncrasies of the language that is bash, and more generally those of s...

Thomas Kluyver • Feb 3 '21

This is neat, but I wonder if allowing bare strings for commands is an attractive nuisance, especially with the convenience of f-strings. If filenames may have spaces in, f'cat {path}' is going to need the same kind of quoting or escaping as cat $path in bash, whereas ['cat', path] avoids that issue.
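A minimal sketch of the quoting concern, assuming a bare command string is tokenized the way `shlex.split` does it (how ShellPipe itself splits strings is not stated in this thread). The filename is hypothetical.

```python
import shlex

path = "my notes.txt"  # hypothetical filename containing a space

# Bare string: shell-style splitting sees two separate arguments.
naive = shlex.split(f"cat {path}")

# Quoting restores a single argument, but has to be remembered every time.
quoted = shlex.split(f"cat {shlex.quote(path)}")

# List form: the path is one argument by construction.
as_list = ["cat", path]

print(naive)    # ['cat', 'my', 'notes.txt']
print(quoted)   # ['cat', 'my notes.txt']
print(as_list)  # ['cat', 'my notes.txt']
```

The list form needs no escaping step, which is the point being made about `['cat', path]`.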

I'm trying to think about how you could make it a proper pipeline, i.e. running commands in parallel, without sacrificing convenience for simple cases.

TaiKedz • Feb 3 '21

If you do any sort of shell scripting, it would simply be bad practice to not quote a variable. Unfortunately, I've seen plenty of native shell scripts where variables have been unquoted - invariably because of the author not quite knowing how shell variable substitution works.

That said, I didn't point it out in the write-up here, but this is also valid

sh() | ['cat', path] | "grep mystring"

(there's just the one example with ['du', '-sh'] in my post which I didn't call out....)

As for the parallelization, I did start looking at threading at one point, along with generators, but I never really got much momentum on it, as other concerns drew my attention away after the first working implementation...

Thomas Kluyver • Feb 3 '21

I think the tricky bit for running them in parallel is when do you stop and wait for the processes to complete? If you have sh() | a | b, the operator before a doesn't know if that's the end of the pipeline (so it should wait for a), or there's going to be another pipe (so it shouldn't).

I think this is the same sort of thing xtofl was talking about below - if the pipeline is lazy, you need some kind of marker in the code to tell it to run everything to completion and check for errors.

TaiKedz • Feb 4 '21

In principle, if you have mypipe = sh() | "A" | "B"

A's __or__() gets called telling it explicitly to build a new pipe with incoming B

B then gets created, but its own __or__() is not called. From its point of view, it just exists, and is stored in mypipe

Implementing a wait() on that, which would probably accumulate until end of stream, could be an idea. Actually, now that I think about it, it does seem a little more complex than that.... hmmm.....
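The call order described here can be checked with a toy class. `Stage` is a hypothetical stand-in written for this illustration, not ShellPipe's real class; it only records whether its own `__or__()` was ever invoked.

```python
class Stage:
    """Toy pipe stage (hypothetical); records whether `|` was applied to it."""

    def __init__(self, command=None):
        self.command = command
        self.or_called = False

    def __or__(self, command):
        # Python calls this on the LEFT operand, so this stage learns
        # that another stage follows it.
        self.or_called = True
        return Stage(command)


first = Stage()
a = first | "A"
mypipe = a | "B"

print(first.or_called, a.or_called, mypipe.or_called)  # True True False
```

The last stage never sees an `__or__()` call, so it has no built-in signal that the pipeline has ended. That is why some explicit step, such as a `wait()` or a terminating marker, would be needed.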

Ben Sinclair • Dec 9 '20

Me looking at this thinking it's pseudocode:

OP: What ShellPipe does is define its own __or__() function

Me: oh hell yes

TaiKedz • Dec 10 '20

Ben Sinclair • Dec 10 '20

Oh wait you're another Edinburgh dev! waves from a social distance.

TaiKedz • Dec 10 '20

👋😷

xtofl • Dec 10 '20

Lovely!

Maybe the pipeline processes could be constructed by the or-chain, and 'ignited' by a sentinel.

sh() | "ls" | "sort" | go()
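A rough sketch of how such a sentinel could work, built on `subprocess.Popen` from the standard library. `LazyPipe` and this `go` class are hypothetical names for illustration and are not part of ShellPipe. The `|` operator only collects commands; meeting `go()` starts all processes connected by OS pipes, so they run concurrently, then waits and checks exit codes.

```python
import shlex
import subprocess
import sys


class go:
    """Sentinel marking the end of the pipeline (hypothetical)."""


class LazyPipe:
    """Collects commands on `|` and only runs them when it meets go()."""

    def __init__(self, commands=()):
        self.commands = list(commands)

    def __or__(self, other):
        if isinstance(other, go):
            return self._run()
        argv = shlex.split(other) if isinstance(other, str) else list(other)
        return LazyPipe(self.commands + [argv])

    def _run(self):
        if not self.commands:
            return ""
        procs = []
        upstream = None
        for argv in self.commands:
            proc = subprocess.Popen(argv, stdin=upstream, stdout=subprocess.PIPE)
            if upstream is not None:
                # Close our copy so the writer notices if the reader exits early.
                upstream.close()
            upstream = proc.stdout
            procs.append(proc)
        output = procs[-1].communicate()[0]  # read the final stage's stdout
        for proc in procs[:-1]:
            proc.wait()
        for proc in procs:
            if proc.returncode != 0:
                raise subprocess.CalledProcessError(proc.returncode, proc.args)
        return output.decode()


# Portable stand-ins for "ls" and "sort", run through the current interpreter.
emit = [sys.executable, "-c", "print('b'); print('a')"]
sort = [sys.executable, "-c", "import sys; sys.stdout.writelines(sorted(sys.stdin))"]
print(LazyPipe() | emit | sort | go())
```

Without the trailing `go()`, the expression evaluates to an unstarted `LazyPipe`, which is the trade-off discussed in the replies.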

TaiKedz • Dec 11 '20 • Edited

Indeed but if the goal is to "just run" that one command it becomes something like

sh() | "docker-compose down" | go()

Which works, but is a bit meh... and like I said, getting < was low priority given that you could in fact just use an actual file handle.

One thing that is annoying is that Popen() expects a proper file descriptor in stdin - passing in an io.StringIO object fails with io.UnsupportedOperation on fileno()
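A small sketch of the `fileno()` limitation and one possible workaround, using only the standard library. In CPython, calling `fileno()` on an in-memory stream raises `io.UnsupportedOperation`. The workaround shown passes the string through `subprocess.run(..., input=...)`, which creates a real pipe; whether that fits ShellPipe's design is an open question.

```python
import io
import subprocess
import sys

text = io.StringIO("hello\n")

# In-memory streams have no OS-level descriptor, so fileno() raises.
try:
    text.fileno()
    has_fileno = True
except io.UnsupportedOperation:
    has_fileno = False

# Workaround: let subprocess create a pipe and write the string into it.
upper = [sys.executable, "-c", "import sys; print(sys.stdin.read().upper(), end='')"]
result = subprocess.run(upper, input=text.getvalue(), capture_output=True, text=True)

print(has_fileno, result.stdout.strip())  # False HELLO
```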

I've overloaded OR here, I've overloaded __gt__ and ~~XOR~~ __ge__ to write respectively the process's stdout and stderr to the console, or even a file, it might be time for me to stop... ;-)

Nathan Rapport • May 5 '21

social distancing intensifies

Mišo • Dec 14 '20

I know a very similar project in Python called Plumbum. Maybe you can draw some inspiration from how its author(s) solved some of these issues.