TLDR;
Using one Ruby method, you can take each row of a CSV file and insert all values into a relational database.
If you want to learn more about CSVs and parsing them using Ruby, please check out my previous article here: Ruby - Convert CSV File to Two Dimensional Array.
Pre-reqs
I'm using MacOS and ZSH for my shell. For our database, we will use PostgreSQL and access it through the 'pg' gem. If this sounds complicated, don't worry. This article will walk you through step by step!
After reading this, you will possess the skill to persist Ruby objects to databases without the help of frameworks! No Rails here, just Ruby. Cool right?! Let's get started.
Getting Started
First you need to make sure you have Postgres and the 'pg' gem installed.
Postgres is a free relational database system that has a great reputation. Relational databases are basically tables with rows and columns. Tables can relate to each other in the database.
The 'pg' gem is how Ruby communicates with Postgres. It's just a module of ruby code that gives us an API to interface with the database. Rails applications that use Postgres also use the 'pg' gem, but you can use it in a plain ruby file and even IRB.
First, check if you have the 'pg' gem.
$ gem info pg
*** LOCAL GEMS ***
pg (1.3.4)
Authors: Michael Granger, Lars Kanis
Homepage: https://github.com/ged/ruby-pg
License: BSD-2-Clause
Installed at: /Users/jvon1904/.rvm/gems/ruby-3.0.3
Pg is the Ruby interface to the PostgreSQL RDBMS
If you didn't get that response, run gem i pg
, then run gem info pg
to make sure you've got it. Don't worry if you're version is different than mine. Next, check on Postgres.
$ psql --version
psql (PostgreSQL) 14.2
If you didn't get that response, you need to install Postgres. You can do so from the website, or you can run brew install postgres
. Now you have total control over Postgres from your terminal. Just run psql
any time and you have a complete CLI to execute any database queries and maintenance. You should also recieve a desktop app, to show your databases. If you double click on one, it will open up a terminal session in psql
.
Creating a Database Using PSQL
Knowing a little SQL will be helpful here but it's not required. Open a terminal session and run psql
.
$ psql
psql (14.2)
Type "help" for help.
jvon1904=#
# most commands start with backslash, \h, \q, \d
First, let's create a Database called 'csv'.
jvon1904=# CREATE DATABASE csv;
CREATE DATABASE
# don't forget the semicolon!
If you mess up, you can always DROP DATABASE <name>;
. Be careful though, because this can't be undone.
Next we need to connect to the database so we can make a table.
# the '\c' command means 'connect'
jvon1904=# \c csv;
You are now connected to database "csv" as user "jvon1904".
csv=#
# notice the prompt changed to your database name.
Now that we're connected, let's make a table.
# I like to name my tables in the plural
# Creating a table requires us to specify each column name and type, there are lot's of additional options as well.
csv=# CREATE TABLE contacts(
id INT PRIMARY KEY NOT NULL,
first_name TEXT,
last_name TEXT,
age INT,
gender CHAR(1)
);
CREATE TABLE
Now you can use the \d
command to 'describe' the table. This lets you see your table schema from the terminal.
csv=# \d contacts
Table "public.contacts"
Column | Type | Collation | Nullable | Default
------------+--------------+-----------+----------+---------
id | integer | | not null |
first_name | text | | |
last_name | text | | |
age | integer | | |
gender | character(1) | | |
Indexes:
"contacts_pkey" PRIMARY KEY, btree (id)
Now that we have a table set up, let's add some data to test it out. Run a SQL INSERT INTO
statement.
csv=# INSERT INTO contacts (id, first_name, last_name, age, gender)
csv-# VALUES (298, 'Rudolph', 'Smith', 50, 'M');
INSERT 0 1
Now run a SQL SELECT
statement to see your data.
csv=# SELECT * FROM contacts;
id | first_name | last_name | age | gender
-----+------------+-----------+-----+--------
298 | Rudolph | Smith | 50 | M
(1 row)
You can try inserting in bad data to make sure your constraints and data types are working properly.
# char(1) should only accept one character.
# so what happens when I try to insert 'xx'?
csv=# INSERT INTO contacts (id, first_name, last_name, age, gender)
VALUES (298, 'Rudolph', 'Smith', 50, 'XX');
ERROR: value too long for type character(1)
# or an entry with a duplicate id
csv=# INSERT INTO contacts (id, first_name, last_name, age, gender)
VALUES (298, 'Rudolphina', 'Smith', 50, 'F');
ERROR: duplicate key value violates unique constraint "contacts_pkey"
DETAIL: Key (id)=(298) already exists.
Awesome! Looks like our new d.b. table is working just fine! And you made this with no UI. Great job!
Connecting to Postgres in Ruby
Pop open an IRB session and access the 'pg' gem by typing require 'pg'
. Any time you are using a new gem, it's a great idea to check out the documentation. Often times there are guides to help beginners. PG Readme.
Connecting to a local Postgres database in Ruby is extremely easy. Just use the #connect
method.
> require 'pg'
=> true
> connection = PG.connect( dbname: 'csv' )
=> #<PG::Connection:0x000000013401bbe0>
Alright! We've got an instance of the PG::Connection class. Now you can do all the SQL you want using Ruby. Just use the #exec
method. In fact, let's see if Rudolph Smith is still in there.
> res = connection.exec("SELECT * FROM contacts")
=> #<PG::Result:0x0000000143b54818 status=PGRES_TUPLES_OK ntuples=1 nfiel...
> res[0]
=>
{"id"=>"298",
"first_name"=>"Rudolph",
"last_name"=>"Smith",
"age"=>"50",
"gender"=>"M"}
# yep, he's still there
# notice we can use bracket notation on the PG::Result class
# and this is how we access a specific field
> res[0]['first_name']
=> "Rudolph"
Sadly, Rudolph has to go, in order to make room for our CSV friends.
> connection.exec("DELETE FROM contacts WHERE first_name = 'Rudolph'"
)
=> #<PG::Result:0x0000000143ad4eb0 status=PGRES_COMMAND_OK ntuples=0 nfields=0 cmd_tuples=1>
As a quick challenge, open up another terminal tab, and check if any data is there now using psql
. Do you remember how?
CSV File => Ruby => Database
My previous article here talks all about CSVs and Ruby. I won't go through the details again, so make sure you're familiar with this topic.
I'm in the directory where I have a CSV file saved. You can make your own, just make sure the columns in the the file match the database fields.
# you can type shell command with backticks
> `pwd`
=> "/Users/jvon1904/csv\n"
You'll need to require Ruby's CSV module with is part of the Ruby Standard Library (Std-lib). Then we'll use the #foreach
method. This will go through the CSV file line by line and insert each into the database.
> require 'csv'
=> true
> require 'pg'
=> true
> connection = PG.connect( dbname: 'csv')
=> #<PG::Connection:0x000000010d81f840>
> CSV.foreach('ruby_made_csv.csv', headers: true) do |row|
> connection.exec("INSERT INTO contacts (id, first_name,
last_name, age, gender)
VALUES (#{row[0]}, '#{row[1]}', '#{row[2]}', #{row[3]},
'#{row[4]}')
")
> end
Now, let's head back to PSQL and see if all the data is there.
psql
psql (14.2)
Type "help" for help.
jvon1904=# \c csv
You are now connected to database "csv" as user "jvon1904".
csv=# SELECT * FROM contacts;
id | first_name | last_name | age | gender
----+-------------+-----------+-----+--------
1 | Victoria | Waite | 38 | F
2 | Jamar | Hayes | 37 | M
3 | Leonard | Brendle | 39 | M
4 | Abby | Atchison | 57 | F
5 | Marc | Stockton | 64 | M
6 | Geraldine | Roybal | 52 | F
7 | James | Coles | 57 | M
8 | Hiram | Spellman | 58 | M
9 | Bradford | Vela | 41 | M
10 | William | Haskell | 74 | M
11 | Christopher | Mason | 70 | M
12 | Thomas | Atkinson | 68 | M
13 | Peggy | Underwood | 37 | F
14 | Charles | Wilson | 66 | M
15 | Joanne | Sanchez | 42 | F
16 | Leo | Sanders | 58 | M
17 | Robert | Castillo | 39 | M
18 | Joan | Traxler | 82 | F
19 | Dana | Pitts | 78 | F
20 | Susan | Dupont | 34 | F
(20 rows)
We did it! Everything in the CSV file has been parsed by Ruby and saved to a relational database.
Top comments (0)