Dave Cross

Originally published at perlhacks.com

Bowing to the inevitable

Data Munging with Perl was published in February 2001. That was over 23 years ago. It’s even 10 years since Manning took the book out of print and the rights to the content reverted to me. Over that time, I’ve been to a lot of Perl conferences and met a lot of people who have bought and read the book. Many of them have been kind enough to say nice things about how useful they have found it. And many of those readers have followed up by asking if there would ever be a second edition.

My answer has always been the same. It’s a lot of effort to publish a book. The Perl book market (over the last ten years, at least) is pretty much dead. So I really didn’t think the amount of time I would need to invest in updating the book would be worth it for the number of sales I would get.

But times change.

You may have heard of Perl School. It’s a small publishing brand that I’ve been using to publish Perl ebooks for a few years. You may have even read the interview that brian d foy did with me for perl.com a few years ago about Perl School and the future of Perl publishing. In it, I talk a lot about how much easier (and, therefore, cheaper) it is to publish books when you’re just publishing ebook versions. I end the interview by inviting anyone to come to me with proposals for Perl School books, but brian is one of only two people who have ever taken me up on that invitation.

In fact, I haven’t really written enough Perl School books myself. There are only two – Perl Taster and The Best of Perl Hacks.

A month or so ago, brian was passing through London and we caught up over dinner. Of course, Perl books was one of the things we discussed and brian asked if I was ever going to write a second edition of Data Munging with Perl. I was about to launch into my standard denial when he reminded me that I had already extracted the text from the book into a series of Markdown files which would be an excellent place to start from. He also pointed out that most of the text was still relevant – it was just the Perl that would need to be updated.

I thought about that conversation over the next week or so and I’ve come to the conclusion that he was right. It’s actually not going to be that difficult to get a new edition out.

I think he was a little wrong though. I think there are a few more areas that need some work to bring the book up to date.

  • Perl itself has changed a lot since 2001. Version 5.6.0 was released while I was writing the book – so I was mostly targeting 5.005 (that was the point at which the Perl version numbering scheme changed). I was using “-w” and bareword filehandles. It would be great to have a version that contains “use warnings” and uses lexical filehandles. There are dozens of other new Perl features that have been introduced in the last twenty years (there’s a rough sketch of the kind of modernisation I mean just after this list).
  • There are many new and better CPAN modules. I feel slightly embarrassed that the current edition contains examples that use Date::Manip and Date::Calc. I’d love to replace those with DateTime and Time::Piece. Similarly, I’d like to expand the section on DBI, so it also covers DBIx::Class. There’s a lot of room for improvement in this area.
  • And then there’s the way that the world of computing has changed. The current edition talks about HTTP “becoming ubiquitous” – which was an accurate prediction, but rather dates the book. There are discussions on things like FTP and NFS – stuff I haven’t used for years. And there are new things that the book doesn’t cover at all – file formats like YAML and JSON, for example.
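
To make that first point concrete, here is the kind of change I have in mind. This is just a rough sketch, not an example from the book – the file name and record layout are made up – but it contrasts the 5.005-era idioms with lexical filehandles, “use warnings”, Time::Piece and JSON::PP (both shipped with Perl for years now).

```perl
#!/usr/bin/perl
# The current edition would have shown something like:
#
#   #!/usr/bin/perl -w
#   open IN, 'sales.txt' or die "Can't open sales.txt: $!";
#   while (<IN>) { ... }
#
# A modernised sketch of the same idea:
use strict;
use warnings;

use Time::Piece;                 # in core since 5.10 - replaces Date::Calc here
use JSON::PP qw(encode_json);    # in core since 5.14

# sales.txt is a made-up file: one "YYYY-MM-DD<TAB>amount" record per line
open my $in, '<', 'sales.txt' or die "Can't open sales.txt: $!";

my @records;
while (my $line = <$in>) {
    chomp $line;
    my ($date, $amount) = split /\t/, $line;

    my $tp = Time::Piece->strptime($date, '%Y-%m-%d');
    push @records, {
        day    => $tp->fullday,   # e.g. "Monday"
        date   => $tp->ymd,
        amount => $amount,
    };
}
close $in;

print encode_json(\@records), "\n";
```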

The more I thought about it, the more I realised that I’d really like to see this new edition exist. I think the current version is still useful and contains good advice. But I don’t want to share it with many people because I worry that they would pick up an out-of-date idea of what constitutes best practices in Perl programming.

So that has now become my plan. Over the next couple of months, I’ll be digging through the existing book and changing it into something that I’m still proud to see people reading. I don’t want to predict when it will be ready, but I’d hope to have it released in the autumn.

I’d be interested to hear what you think about this plan. Have you read the book? Are there parts of it that you would like to see updated? What new syntax should I use? What new CPAN modules are essential?

Let me know what you think.



Top comments (5)

Bruce Van Allen

Dave, your book was a major help as I took on more and more research and data analysis projects - many thanks! Also, I always enjoy your posts in the Perl community outlets.
Things to add or expand coverage of:

  • Agree with @brunocontreras_b37739d245: Formats such as JSON, YAML or Markdown, and I would add XML;
  • Lately I've been getting some good use from refaliasing in newer Perls; it would be great to have some prospective coverage of what is becoming possible with that (assuming it survives past experimental) – there's a small sketch of what I mean after this list;
  • In my work, data comes to me in lots of forms, and my clients need to get results in various forms. So a good deal of my own code is devoted to importing from text files with a variety of encodings, Excel (old and new) and corresponding open source docs, assorted databases, and exports from proprietary systems. Then I need to provide my results in a wide variety of formats - Excel, DBs, PDFs, HTML, Word/RTF, GIS & other mapping formats. So some coverage of interfacing with non-Perl resources would be great, and of course CPAN has many more tools for this stuff than when you first wrote the book.
  • Configs, %ENV, dotenv and other ways to construct versatile setups with minimal rewriting each time;
  • Approaches to Mocking - for example, when I deal with voter data, I can do my own testing with real voter records, but I can't publish those tests or provide output examples because it would amount to publishing specific personal info about thousands of voters.
  • Maybe collect a few links to online databases that may be queried with complex searches for people learning DBI/DBIx;
  • PDL, interfaces with Wolfram, Perl manipulation of other languages
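
The refaliasing sketch I mentioned above – a made-up example of my own, and the feature is still marked experimental:

```perl
use v5.22;
use warnings;
use feature 'refaliasing';
no warnings 'experimental::refaliasing';

my %row = ( name => 'Ada', visits => 3 );

# Alias $visits to the hash element instead of copying it
\my $visits = \$row{visits};
$visits++;                                  # updates %row in place

say "$row{name} has $row{visits} visits";   # Ada has 4 visits
```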

OK, that's probably enough. Happy to expand on these or look at drafts when the time comes.

Best wishes for this project, and thanks again for all I've learned from you!

John Poole

I have not read your book, but I am familiar with Manning and respect their publications; I have several of their books from the early 2000s which still occupy space on my limited shelves. Perl changed my life and I completely missed the fact that it has fallen out of vogue. (I was a developer at Oracle for 24 years, dealing with native XML documentation and obtaining efficiencies in translation.) But as I look around today, I see lemming-like criticism of Perl by those who have obviously never coded in Perl.

I now have a pursuit dealing with ADS-B data from aircraft, which involves millions of data packets per day, so "big data" is on my mind and how best to deal with it. I've found awk to be a wonderful resource for isolating what I want from my captured packets. I've been using Perl to then juggle my data, e.g. populating PostGIS tables or performing spatial queries, and I expect to use it heavily for visuals: maps with overlays and possibly some 3D deliverables. I share where I am coming from to let you know that, as bigger challenges await us, Perl is still a very quick way to accomplish tasks and remains my #1 choice as a tool to manage/munge large data sets. (I do recognize Rust's efficiency for specialized intensive tasks such as converting text data strings into epoch time: 22 minutes in a bash script vs. 1 second in a compiled Rust program.) Anything you can do to generate interest or educate will be a service to the cause, and presumably you have the luxury of not letting profit drive your decision process – like old-world craftsmen, who become fewer and fewer these days. Perhaps the lemmings will redirect their focus back to the essentials – as they seem to be doing with YAML.

Go for it!

brunocontreras

I haven't read the book but had a quick look online; my suggestions would be:
1) I think Perl is strongest at this kind of task, so an updated book is a great way of showing people out there the value of the language in this context.
2) Formats such as JSON, YAML or Markdown need to be included.
3) One-liners are great for these tasks as they can be easily combined with standard Linux tools (quick example below).
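
For example, something like this (the log file name and layout are invented, just to show the shape):

```bash
# sum the third whitespace-separated column of a (hypothetical) sales.log
perl -lane '$total += $F[2]; END { print $total }' sales.log

# or combine it with other standard tools in a pipeline
grep 2024-06 sales.log | perl -lane '$total += $F[2]; END { print $total }'
```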

Hope this helps.

Boyd Duffee

One of the messier topics will be dealing with other languages' expectations. Our workplace mixes Node with Perl (and others), with the JSON true/false values being one of the hangups.
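
A tiny illustration of what I mean, using core JSON::PP (the keys and values are made up):

```perl
use strict;
use warnings;
use JSON::PP;

my $json = JSON::PP->new;

# Perl historically has no native boolean type (5.36 added one),
# so JSON true/false decode to JSON::PP::Boolean objects, not plain 1/0
my $data = $json->decode('{"active": true}');

print ref $data->{active}, "\n";              # JSON::PP::Boolean
print $data->{active} ? "yes" : "no", "\n";   # still truthy: yes

# Round-tripping: use JSON::PP::true/false to emit real JSON booleans
print $json->encode({ ok => JSON::PP::true }), "\n";   # {"ok":true}
```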

DragosTrif

I would love a new edition of this book. What I would add are problem sets, in order to make a student sweat a little and offer them the chance to practice.