Cris Crawford

Final project part 2

In my last post, I uploaded the voter database to Google Cloud Storage. Then, using a Jupyter notebook, I copied the file to my Google Cloud VM instance and unzipped it. Now I have 351 text files that I will convert into Parquet files and load into Google BigQuery.

I needed to figure out how to define the schema of the file before reading it in, because pandas guesses the wrong data type for some of the columns.

I started by going to https://console.cloud.google.com and clicking VM instances. Then I started my VM instance and copied the new external IP address. I opened a terminal and cd'd to .ssh, edited config, and pasted in the new IP address. Then I typed ssh de-zoomcamp and cd'd to notebooks, where I keep my Jupyter notebooks. In VS Code, I opened a directory on my virtual machine by invoking the command palette (Command-Shift-P) and selecting de-zoomcamp. I opened the terminal window in VS Code, and in the Ports tab, I forwarded port 8888 to localhost:8888. Then I typed jupyter notebook in the terminal. In my browser, I opened localhost:8888 and pasted in the token from the terminal window. Then I opened the notebook. This is a pretty simple blow-by-blow of how I start, but I need to have it written out somewhere so I don't forget it.
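The step of pasting the new IP address into the config file by hand could also be scripted. Here is a minimal sketch, assuming the config has a Host de-zoomcamp entry with a HostName line under it, spelled with exactly that capitalization. The function name update_hostname and the example IP address are hypothetical and not part of the original setup.

```shell
# Hypothetical helper: replace the HostName value under one Host entry.
# Usage: update_hostname <config-file> <host-alias> <new-ip>
# Assumes the keywords are written exactly as "Host" and "HostName".
update_hostname() {
  config_file="$1"
  host_alias="$2"
  new_ip="$3"
  tmp_file="$(mktemp)"
  awk -v host="$host_alias" -v ip="$new_ip" '
    # Track whether the current line belongs to the entry we want.
    $1 == "Host" { in_block = ($2 == host) }
    # Inside that entry, swap the address but keep the indentation.
    in_block && $1 == "HostName" { sub(/HostName[ \t]+.*/, "HostName " ip) }
    { print }
  ' "$config_file" > "$tmp_file" && cat "$tmp_file" > "$config_file"
  rm -f "$tmp_file"
}
```

A call would look like update_hostname ~/.ssh/config de-zoomcamp 203.0.113.10, where the address stands in for whatever external IP the console shows after the VM starts. Other Host entries in the file are left alone.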

After messing around with the columns and the schema, I decided it was time to set up a repository for the project. I created a repository on GitHub called "voter-data". Then I set up a directory on the virtual machine, put all my files in it, and tried to connect it to my new repo. Well, that was a chore, because I needed to set up an SSH key for my virtual machine to connect to GitHub, and I forgot how to do that. Basically, I had to ask ChatGPT how to do it. Here's what I did:

% ssh-keygen -t rsa -b 4096 -C 'crawford.cris@gmail.com'
% eval "$(ssh-agent -s)"
% ssh-add ~/.ssh/id_rsa
% cat ~/.ssh/id_rsa.pub

Then I copied the public key to my clipboard. I'm not sure what all the arguments were for or what the eval command did, but nothing broke, and it worked.
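For reference, in the ssh-keygen command, -t rsa picks the key type, -b 4096 sets the key size in bits, and -C attaches a comment (here an email address) to the key. ssh-agent -s does not change the shell by itself. It prints shell commands that set variables such as SSH_AUTH_SOCK, and eval runs that printed text in the current shell so that ssh-add can find the agent. Below is a small sketch of the eval part only. It uses a made-up stand-in function (fake_agent) and a made-up variable (DEMO_AUTH_SOCK) so that it runs without starting a real agent.

```shell
# Hypothetical stand-in for `ssh-agent -s`: it only prints shell commands.
fake_agent() {
  echo 'DEMO_AUTH_SOCK=/tmp/demo.sock; export DEMO_AUTH_SOCK;'
}

# Capturing the output just produces text; no variable is set yet.
printed="$(fake_agent)"

# eval runs that text in the current shell, so the variable now exists here.
eval "$printed"
echo "$DEMO_AUTH_SOCK"
```

The real command works the same way, with ssh-agent supplying the text to run instead of the stand-in.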

On GitHub, I clicked my profile icon in the top-right corner and selected "Settings". In the left sidebar, I clicked "SSH and GPG keys". Then I clicked the "New SSH key" button, pasted my SSH public key into the "Key" field, and clicked "Add SSH key" to save the key to my GitHub account.

Then I was able to push my changes to the voter-data repository on GitHub. I'll write about putting my files back in Google Cloud Storage as Parquet files in the next post.
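Connecting the directory on the virtual machine to the new repo could be sketched roughly as follows. This is an assumption about the general shape of the steps, not a record of the exact commands used: the function name connect_repo, the directory path, the branch name main, and the username placeholder are all hypothetical.

```shell
# Sketch: point an existing project directory at a GitHub repo over SSH.
# Both arguments are placeholders supplied by the caller.
connect_repo() {
  project_dir="$1"   # directory that holds the project files
  repo_url="$2"      # SSH URL copied from the GitHub repo page
  git -C "$project_dir" init --quiet
  git -C "$project_dir" remote add origin "$repo_url"
}

# Example calls, left commented out because they need a real repo and key:
# connect_repo ~/voter-data git@github.com:<your-username>/voter-data.git
# ssh -T git@github.com        # asks GitHub whether it accepts the key
# git -C ~/voter-data push -u origin main   # after at least one commit
```

Using the SSH form of the URL (git@github.com:...) rather than the HTTPS form is what makes the push rely on the key added in the GitHub settings.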
