
Jesse vB


Ruby - Convert CSV File to Two Dimensional Array


Ruby comes out of the box with a library for reading and parsing CSV files:

require 'csv'
two_dimensional_array = CSV.read('/path/to/file.csv')

This article will cover the basics of working with CSVs in Ruby. I will be operating on a macOS Unix-like file system with a Zsh terminal shell, but I'm sure Windows users can benefit as well!

What is a CSV File?

Popular applications like Excel and Numbers can read and write pure CSV, but technically their default extensions are .xlsx and .numbers.

CSV means 'comma separated values'. A pure .csv file is really just a string with values separated by commas and newlines. The commas separate the columns, and the newlines separate the rows.

Do you want to see what CSV data looks like?

Navigate to a directory in your terminal where you have a pure CSV file saved.

$ pwd
$ ls

Then use the cat command in the terminal with the file name as the argument, and you will see what a pure CSV really is!

$ cat contacts.csv
ID,First Name,Last Name,Age,Gender
5,Marc ,Stockton,64,M
18,Joan ,Traxler,82,F

Notice how entries #5 and #18 have spaces after the first name. That's because spaces were accidentally left in the file.

So there it is. CSVs are just values, commas, and newlines.
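
To make that concrete, here is a minimal sketch that pulls the sample above apart by hand, using nothing but String#split. The variable names are my own, and this naive approach is only an illustration: it breaks as soon as a value contains a quoted comma, which is one reason to reach for the CSV library instead.

```ruby
# Sample text copied from the `cat contacts.csv` output above.
raw = "ID,First Name,Last Name,Age,Gender\n5,Marc ,Stockton,64,M\n18,Joan ,Traxler,82,F\n"

# Newlines separate the rows; commas separate the columns.
naive_rows = raw.split("\n").map { |line| line.split(",") }

# The stray spaces survive the split, so strip every value to clean them up.
clean_rows = naive_rows.map { |row| row.map(&:strip) }

p naive_rows[1]  # => ["5", "Marc ", "Stockton", "64", "M"]
p clean_rows[1]  # => ["5", "Marc", "Stockton", "64", "M"]
```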

The Ruby CSV Module

Ruby ships with two libraries: the Core and the Std-lib (Standard Library). The Core contains the classes that make up the Ruby language, stuff like Strings, Arrays, Classes, Integers, Files, etc. Everything in Ruby is an object that ultimately inherits from BasicObject, which you can see by walking up the superclass chain.

$ irb
> Array.class
 => Class
> Array.class.superclass
 => Module
> Array.class.superclass.superclass
 => Object
> Array.class.superclass.superclass.superclass
 => BasicObject
> Array.class.superclass.superclass.superclass.superclass
 => nil

Since the Core is the core of Ruby, everything is included whenever you are coding in Ruby.

The Std-lib contains extensions to Ruby. They are modules that need to be required, just like gems, only they are already installed on your computer (unless you deleted them of course). They are worth checking out and contain some really cool and helpful modules.

You can inspect all the code by navigating to where they are stored.

Open up an IRB session and type the global variable $:. It returns an array of the paths Ruby searches when a module is required. Your paths might be different, especially if you don't use RVM.

$ irb
> $:

That neat little variable helps me remember where they are located. The second to last path is where the Std-lib resides.

$ pwd
$ ls
English.rb        expect.rb         open-uri.rb       ripper
abbrev.rb         fiddle            open3.rb          ripper.rb
arm64-darwin20    fiddle.rb         openssl           rubygems
base64.rb         fileutils.rb      openssl.rb        rubygems.rb
benchmark         find.rb           optionparser.rb   securerandom.rb
benchmark.rb      forwardable       optparse          set
bigdecimal        forwardable.rb    optparse.rb       set.rb
bigdecimal.rb     getoptlong.rb     ostruct.rb        shellwords.rb
bundler           io                pathname.rb       singleton.rb
bundler.rb        ipaddr.rb         pp.rb             socket.rb
cgi               irb               prettyprint.rb    syslog
cgi.rb            irb.rb            prime.rb          tempfile.rb
coverage.rb       json              pstore.rb         time.rb
csv               json.rb           psych             timeout.rb
csv.rb            kconv.rb          psych.rb          tmpdir.rb
date.rb           logger            racc              tracer.rb
debug.rb          logger.rb         racc.rb           tsort.rb
delegate.rb       matrix            rdoc              un.rb
did_you_mean      matrix.rb         rdoc.rb           unicode_normalize
did_you_mean.rb   mkmf.rb           readline.rb       uri
digest            monitor.rb        reline            uri.rb
digest.rb         mutex_m.rb        reline.rb         weakref.rb
drb               net               resolv-replace.rb yaml
drb.rb            objspace.rb       resolv.rb         yaml.rb
erb.rb            observer.rb       rinda

Since they are plain .rb files, you can open them up to see their inner workings. You can even modify them, although don't do it unless you know what you're doing. 😃
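
If you'd rather not count positions in $:, here is a small sketch that asks Ruby for the same locations directly. It assumes a standard Ruby install; the variable names are my own. Note that on newer Rubies, where csv ships as a bundled gem, csv.rb may live under a gems directory rather than in the Std-lib directory itself.

```ruby
require 'rbconfig'
require 'csv'

# $LOAD_PATH is the more readable alias for $:
same_object = $LOAD_PATH.equal?($:)

# RbConfig reports the Std-lib directory of the Ruby that is running.
stdlib_dir = RbConfig::CONFIG['rubylibdir']

# After a require, $LOADED_FEATURES records the file that was actually loaded.
csv_file = $LOADED_FEATURES.find { |path| path.end_with?('/csv.rb') }

puts same_object
puts stdlib_dir
puts csv_file
```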

As was mentioned, each module in the Std-lib needs to be required. So if you want to use the CSV class, make sure you require 'csv'.

# Otherwise you'll get this:
(irb):15:in `<main>': uninitialized constant CSV (NameError)
# Don't stress, just do this:
> require 'csv'
 => true
> CSV
 => CSV
> CSV.class
 => Class

It's always a great idea to hit up CSV.methods.sort to reference all its capabilities.
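
That list is long because it includes everything CSV inherits. As a rough sketch, subtracting the methods every class already has narrows it down to what CSV adds (the variable name is my own):

```ruby
require 'csv'

# Subtracting Object's methods leaves the ones CSV brings to the table.
csv_only = (CSV.methods - Object.methods).sort

puts csv_only.first(10)
p csv_only.include?(:read)   # => true
p csv_only.include?(:parse)  # => true
```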

Using the CSV Module to Read and Parse CSVs

There are two main methods for reading and parsing CSVs: CSV.read and CSV.parse. Use CSV.read to read an actual file, and CSV.parse to parse a properly formatted string. Let's compare the two.

$ irb
> require 'csv'
 => true
> my_csv_string = "this,is,a,csv\ncan,you,believe,it?"
 => "this,is,a,csv\ncan,you,believe,it?"
> parsed_data = CSV.parse(my_csv_string)
 => [["this", "is", "a", "csv"], ["can", "you", "believe", "it?"]]

There it is! A two dimensional array from a CSV!
Just make sure you use double quotes when your string contains the \n newline escape. In single quotes, \n stays a literal backslash followed by the letter n.
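
Here is a quick sketch of what goes wrong with single quotes (the variable names are my own). The same text parses into two rows or one, depending on the quotes:

```ruby
require 'csv'

# Double quotes: \n is a real newline, so we get two rows.
double_quoted = CSV.parse("a,b\nc,d")

# Single quotes: \n is just a backslash and an n, so everything lands in one row.
single_quoted = CSV.parse('a,b\nc,d')

p double_quoted  # => [["a", "b"], ["c", "d"]]
p single_quoted  # => [["a", "b\\nc", "d"]]
```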

CSV.parse takes two parameters: a string to parse and a hash of options. Maybe for some odd reason we want to parse a string that's separated by semicolons... so an SSV? We can pass in the col_sep option like so.

> CSV.parse("this;is;an;ssv\ncan;you;believe;it?", col_sep: ';')
 => [["this", "is", "an", "ssv"], ["can", "you", "believe", "it?"]]

The CSV.parse method can also parse an actual file, but you have to open the file first. For instance, CSV.parse(File.open('path/to/file.csv')). Thankfully, this is what CSV.read is for!
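
To see that the two routes end up in the same place, here is a small sketch that writes a throwaway file and reads it back both ways. The temporary directory, file contents, and variable names are my own assumptions for illustration.

```ruby
require 'csv'
require 'tmpdir'

via_parse = nil
via_read = nil

# Work inside a temporary directory that is removed when the block ends.
Dir.mktmpdir do |dir|
  path = File.join(dir, 'contacts.csv')
  File.write(path, "ID,First Name\n1,Victoria\n2,Jamar\n")

  via_parse = CSV.parse(File.read(path)) # load the text ourselves, then parse it
  via_read = CSV.read(path)              # let CSV open and parse the file
end

p via_read               # => [["ID", "First Name"], ["1", "Victoria"], ["2", "Jamar"]]
p via_parse == via_read  # => true
```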

Extracting Data from CSV Files

I created a simple CSV shown in this screenshot:

[Screenshot: contacts.csv]

Now let's find the path so we can use Ruby to extract those values with CSV.read!

$ pwd 
$ ls
$ irb
> require 'csv'
 => true
# Make sure you remember the first forward slash in your path
> contacts_csv = CSV.read('/Users/jvon1904/csv/contacts.csv')
> contacts_csv
 =>
[["ID", "First Name", "Last Name", "Age", "Gender"],
 ["1", "Victoria", "Waite", "38", "F"],
 ["2", "Jamar", "Hayes", "37", "M"],
 ["3", "Leonard", "Brendle", "39", "M"],
 ["4", "Abby", "Atchison", "57", "F"],
 ["5", "Marc ", "Stockton", "64", "M"],
 ["6", "Geraldine", "Roybal", "52", "F"],
 ["7", "James", "Coles", "57", "M"],
 ["8", "Hiram", "Spellman", "58", "M"],
 ["9", "Bradford", "Vela", "41", "M"],
 ["10", "William", "Haskell", "74", "M"],
 ["11", "Christopher", "Mason", "70", "M"],
 ["12", "Thomas", "Atkinson", "68", "M"],
 ["13", "Peggy", "Underwood", "37", "F"],
 ["14", "Charles", "Wilson", "66", "M"],
 ["15", "Joanne", "Sanchez", "42", "F"],
 ["16", "Leo", "Sanders", "58", "M"],
 ["17", "Robert", "Castillo", "39", "M"],
 ["18", "Joan ", "Traxler", "82", "F"],
 ["19", "Dana", "Pitts", "78", "F"],
 ["20", "Susan", "Dupont", "34", "F"]]

Great! With this data, you now have the power to create class instances from each row, or save them to a database, or whatever you want! In a future article I will write about just that. For now, here are some ideas of how you can play around with this.

# getting a record is easy now
> contacts_csv.last
 => ["20", "Susan", "Dupont", "34", "F"]

# retrieve all female contacts
> contacts_csv.select { |row| row[4] == 'F' }
 =>
[["1", "Victoria", "Waite", "38", "F"],
 ["4", "Abby", "Atchison", "57", "F"],
 ["6", "Geraldine", "Roybal", "52", "F"],
 ["13", "Peggy", "Underwood", "37", "F"],
 ["15", "Joanne", "Sanchez", "42", "F"],
 ["18", "Joan ", "Traxler", "82", "F"],
 ["19", "Dana", "Pitts", "78", "F"],
 ["20", "Susan", "Dupont", "34", "F"]]

# retrieve the first names of contacts under 40
> contacts_csv.select { |row| row[3].to_i < 40 }.map { |row| row[1] }
 => ["First Name", "Victoria", "Jamar", "Leonard", "Peggy", "Robert", "Susan"]

Oops! See how we got the "First Name" there? That's a header, so it shouldn't be part of the records. There's a way to get around this, but instead of getting an array back, we'll get a CSV::Table object. Let's check it out!

# we just need to pass in the headers option
> parsed_data = CSV.read('/Users/jvon1904/csv/contacts.csv', headers: true)
 => #<CSV::Table mode:col_or_row row_count:21>
> parsed_data.class
 => CSV::Table

Be aware that every time you pass in the headers: true option, it will return a CSV::Table.
We can access indices the same way as with arrays.

# only it will return a CSV::Row class now
> parsed_data[0]
 => #<CSV::Row "ID":"1" "First Name":"Victoria" "Last Name":"Waite" "Age":"38" "Gender":"F">
> parsed_data[16][4]
 => "M"
> parsed_data[6].to_h
 => {"ID"=>"7",
 "First Name"=>"James",
 "Last Name"=>"Coles",
 "Age"=>"57",
 "Gender"=>"M"}

We can access columns by using the #by_col method.

> parsed_data.by_col[2]

# use the bang sign `!` to change the orientation of the table
> parsed_data.by_col!
 => #<CSV::Table mode:col row_count:21>
# now switch it back
> parsed_data.by_row!
 => #<CSV::Table mode:row row_count:21>
> parsed_data[14]["First Name"]
 => "Joanne"
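
If you don't have a contacts file handy, here is a self-contained sketch of the same ideas using CSV.parse on an inline string. The three rows are a trimmed sample of the data above, and the variable names are my own. With headers: true, the earlier "under 40" query no longer picks up the header row, and columns can be looked up by name.

```ruby
require 'csv'

# A trimmed sample of the contacts data, including one stray space.
csv_text = <<~CSV_DATA
  ID,First Name,Last Name,Age,Gender
  1,Victoria,Waite,38,F
  5,Marc ,Stockton,64,M
  20,Susan,Dupont,34,F
CSV_DATA

table = CSV.parse(csv_text, headers: true)

# Rows are CSV::Row objects, so fields can be read by header name.
under_40 = table.select { |row| row["Age"].to_i < 40 }.map { |row| row["First Name"] }

p under_40              # => ["Victoria", "Susan"]
p table["Last Name"]    # => ["Waite", "Stockton", "Dupont"]
p table.by_col[2]       # => ["Waite", "Stockton", "Dupont"]
```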

Two more things. Let's see if we can convert the Age values into floats, so they behave more like currency, and then write the table back to a CSV file.

> parsed_data.each do |row|
>   row["Age"] = row["Age"].to_f
> end
 => #<CSV::Table mode:row row_count:21>
> parsed_data.by_col[3]

Now we'll write to a new file. For this we'll use the CSV.open method with two arguments, the path, and a 'w' for 'write'.

> CSV.open('ruby_made_csv.csv', 'w') do |file|
# we start by pushing the headers into the file
>   file << parsed_data.headers
# next we'll push each line in one by one
>   parsed_data.each do |row|
>     file << row
>   end
> end
 => #<CSV::Table mode:col_or_row row_count:21>
# you can execute shell commands by using back-ticks! 😎
> `ls`
 => "contacts.csv\nruby_made_csv.csv\n"
# there they are!
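
Here is the whole round trip as one runnable sketch, from converting the ages to reading the written file back as plain text. The two-row sample, the temporary directory, and the variable names are my own assumptions, so nothing gets written next to your real files.

```ruby
require 'csv'
require 'tmpdir'

table = CSV.parse("ID,Age\n1,38\n2,37\n", headers: true)

# Convert each age from a string to a float, in place.
table.each { |row| row["Age"] = row["Age"].to_f }

written_text = nil

Dir.mktmpdir do |dir|
  path = File.join(dir, 'ruby_made_csv.csv')

  CSV.open(path, 'w') do |file|
    file << table.headers               # header line first
    table.each { |row| file << row }    # then each row, one by one
  end

  written_text = File.read(path)
end

puts written_text
# ID,Age
# 1,38.0
# 2,37.0
```

If you only need the text, CSV::Table#to_csv should give you the same string without opening a file yourself.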

Hopefully this has given you a sample of all you can do with CSVs in Ruby!

To learn how to persist this data to Postgres, read my article here.
