Reading from Zip archives in Ruby

The war isn't going anywhere for now, so every couple of days I have to go through the following steps to update my Russian losses tracker:

download zip from Kaggle
unzip it with unall utility
run update_csv script
verify that data looks right with git diff, as occasionally there's a typo which makes losses go backwards (always corrected on next update)
delete archive folder
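
The "losses go backwards" check in the git diff step could be partly automated. Here's a rough sketch, not part of the actual script: decreasing_values is a hypothetical helper, and it assumes the rows are ordered by date and that every all-digit column is a cumulative total, which may not hold for every column in the dataset.

```ruby
require "csv"

# Hypothetical helper: returns [data_row_number, column, previous, current]
# for every cell whose integer value is lower than in the row above it.
# Data rows are counted from 1, not counting the header line.
def decreasing_values(csv_text)
  rows = CSV.parse(csv_text, headers: true)
  to_int = ->(cell) { cell.to_s.match?(/\A\d+\z/) ? cell.to_i : nil }
  problems = []
  rows.each_cons(2).with_index(2) do |(prev, curr), row_number|
    rows.headers.each do |column|
      a = to_int.call(prev[column])
      b = to_int.call(curr[column])
      next unless a && b # skip dates, blanks, and other non-integer cells
      problems << [row_number, column, a, b] if b < a
    end
  end
  problems
end
```

Calling decreasing_values(Pathname(path).read) on each updated file and printing anything it returns would flag suspicious rows, but the diff is still worth a look.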

The part that I'd like to get rid of is unzipping. We know exactly the path inside the zip, so why bother?

Old script

The script is very straightforward. The methods updated_equipment and updated_personnel are the ones that need replacing to get data from a zip.

#!/usr/bin/env ruby

require "pathname"

class UpdateCSV
  def initialize(archive_path)
    @archive_path = Pathname(archive_path)
  end

  def updated_equipment
    @updated_equipment ||= (@archive_path + "russia_losses_equipment.csv").read
  end

  def updated_personnel
    @updated_personnel ||= (@archive_path + "russia_losses_personnel.csv").read
  end

  def csv_files
    @csv_files ||= `git ls-files`.lines.map(&:chomp).grep(/\.csv\z/)
  end

  def call
    csv_files.each do |path|
      case path
      when /russia_losses_equipment/
        Pathname(path).write(updated_equipment)
      when /russia_losses_personnel/
        Pathname(path).write(updated_personnel)
      else
        puts "Unknown CSV file: #{path}"
      end
    end
  end
end

unless ARGV[0]
  STDERR.puts "Usage: #{$0} path_to_updated_archive"
  exit 1
end

UpdateCSV.new(ARGV[0]).call

Gem `rubyzip` vs `zip`

There's a bit of a problem with Zip for Ruby, as there are two gems: rubyzip and zip. rubyzip is the correct one; zip is an obsolete fork that somehow got the better name.

To use either of them, you write require "zip", which loads whichever one is installed. This is a leftover mess from the early days of RubyGems, and such things don't really happen anymore. Just don't gem install zip, and you'll be fine.
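
If you're not sure which of the two is on your machine, RubyGems can tell you. This is a small sketch rather than anything the script needs; installed_zip_gems is a hypothetical helper name.

```ruby
# Hypothetical helper: returns which of the candidate gem names are installed.
# Gem::Specification.find_all_by_name returns an empty array for unknown gems.
def installed_zip_gems(candidates = %w[rubyzip zip])
  candidates.select { |name| Gem::Specification.find_all_by_name(name).any? }
end
```

If the result includes "zip", that's the gem to uninstall; you want only "rubyzip" in there.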

Abstract archive handling

There's going to be some shared code between updated_equipment and updated_personnel, so let's move it into a new method read_file:

  def updated_equipment
    @updated_equipment ||= read_file("russia_losses_equipment.csv")
  end

  def updated_personnel
    @updated_personnel ||= read_file("russia_losses_personnel.csv")
  end

Read file from either directory or archive

And now the read_file method:

  def read_file(path)
    if @archive_path.directory?
      (@archive_path + path).read
    else
      # Block form closes the archive once the entry has been read
      Zip::File.open(@archive_path.to_s) { |zip| zip.read(path) }
    end
  end

There are a lot more things we could be doing with zips, but just reading is all we need.

And that's it! It saves me one of the steps.

Full code

Here's the full script:

#!/usr/bin/env ruby

require "pathname"
require "zip"

class UpdateCSV
  def initialize(archive_path)
    @archive_path = Pathname(archive_path)
  end

  def read_file(path)
    if @archive_path.directory?
      (@archive_path + path).read
    else
      # Block form closes the archive once the entry has been read
      Zip::File.open(@archive_path.to_s) { |zip| zip.read(path) }
    end
  end

  def updated_equipment
    @updated_equipment ||= read_file("russia_losses_equipment.csv")
  end

  def updated_personnel
    @updated_personnel ||= read_file("russia_losses_personnel.csv")
  end

  def csv_files
    @csv_files ||= `git ls-files`.lines.map(&:chomp).grep(/\.csv\z/)
  end

  def call
    csv_files.each do |path|
      case path
      when /russia_losses_equipment/
        Pathname(path).write(updated_equipment)
      when /russia_losses_personnel/
        Pathname(path).write(updated_personnel)
      else
        puts "Unknown CSV file: #{path}"
      end
    end
  end
end

unless ARGV[0]
  STDERR.puts "Usage: #{$0} path_to_updated_archive"
  exit 1
end

UpdateCSV.new(ARGV[0]).call
