Been a while, but here's a handy one I discovered over the weekend.
Back to the trusty xargs, that rather blunt and brutish chainsaw for processing a long list of files (or whatever) that someone gave you.
In this case, I had a list of files that I knew with 99%+ certainty had been created on our old server and were thus encoded in iso-8859-1, contained characters that are represented differently in utf-8 (which we had switched to on the new server), and so needed converting. I also had a handy script wrapper around iconv that converts one file at a time.
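The wrapper itself isn't shown here, but a minimal sketch of that kind of script might look like this (my assumption: it converts in place via a temporary file; the real one may well do more):

#!/bin/sh
# Hypothetical sketch of a one-file-at-a-time iconv wrapper:
# convert a single iso-8859-1 file to utf-8 in place, via a temp file.
f="$1"
iconv -f iso-8859-1 -t utf-8 "$f" > "$f.tmp" && mv "$f.tmp" "$f"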
All 41,000 of them. The list took four hours to generate, during which time it occurred to me that I really ought to take advantage of the new server's 40 cores of Xeon goodness. So we ought to be able to process this list in parallel now we've got it, right? And ideally without bothering with GNU Parallel or Perl's Parallel::ForkManager?
Turns out we can!
xargs -P <n> (if supported on your OS) runs the commands it generates up to n at a time, in parallel.
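As a toy sanity check (not part of the original task): four one-second sleeps driven through xargs with -P 4 finish in about one second of wall time, rather than four.

# Each input line becomes one "sleep 1"; -P 4 runs all four at once.
printf '1\n1\n1\n1\n' | xargs -n 1 -P 4 sleep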
So:
cat <list of 41K files> | xargs -n 1 -P 100 <iconv wrapper>
We need the -n 1 because the wrapper only takes one file at a time, and that's how we tell xargs to pass a single argument per invocation. Deep breath. Hit RETURN.
Whoosh. Load on the server briefly rockets to 45, then falls just as fast back to its steady 1-and-a-bit. About one minute flat, for all 41,000 files.
Not bad.
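One caveat for anyone borrowing this: piping a plain newline-separated list into xargs will trip up on filenames containing spaces or quotes, since xargs does its own word-splitting. If that's a risk for your list, NUL-delimit it instead (assuming your tr and xargs handle NUL bytes, which the GNU and BSD versions do):

cat <list of 41K files> | tr '\n' '\0' | xargs -0 -n 1 -P 100 <iconv wrapper>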