Kinyungu Denis

How to Upload a File to Google Colab.

To my dear readers, today I discovered Google Colab, a tool that is very handy when working with huge datasets. In my case, datasets larger than 10 gigabytes count as huge, and I would rather not overwork my computer's fan. There are no prerequisites for this article beyond basic knowledge of computers and of working on the internet.

What is Google Colab?

Google Colab is a tool that allows you to write and execute Python in your browser. It requires zero configuration, gives you free access to GPUs, and makes it easy to share your code.
Colab is essentially the Google Suite version of a Jupyter Notebook.

Google Colab can be used by students, artificial intelligence researchers, machine learning engineers, data scientists, and data engineers.

You need a good internet connection. Open your favorite browser (Brave is mine), search for "google colab", and click the first link.

Google Colab Search

Google Colab is easy to use: you can write your Python code, run it, and share it with others, and installing packages and sharing documents is easier too. However, uploading a file or folder to Google Colab is quite a hassle.

To Upload a File or a Folder to Google Colab

Most people download a CSV file, upload it to Google Colab, and then load it into a data frame. After a while they have to repeat everything, because the uploaded file does not persist in the Colab session. This article solves that issue.

In this article, I will show you how to use PyDrive to read a CSV file directly from your Google Drive using Python 3 in the Google Colab environment.

First Step: Install PyDrive

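The first step is to install PyDrive in our Colab notebook and import what we need.

# Install PyDrive quietly (-q), upgrading it if already present (-U).
!pip install -U -q PyDrive
# PyDrive classes for authentication and Drive access.
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
# Colab's helper for signing in with your Google account.
from google.colab import auth
# Supplies the default credentials once you are signed in.
from oauth2client.client import GoogleCredentials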

In Colab, a leading exclamation mark (!) tells the notebook to run the line as a shell command, which is why pip is invoked as !pip.

To install PyDrive

Step Two: Authenticate and Authorize.

We need to authenticate and create a PyDrive client.

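# Sign in to your Google account from the Colab runtime.
auth.authenticate_user()
# Hand the resulting default credentials to PyDrive.
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
# Create the Drive client used in the following steps.
drive = GoogleDrive(gauth)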


Running Authentication for our PyDrive

When you run the above code, you will be prompted to give Google Colab permission to access your Drive. Click Allow to proceed.

Prompt for Permission

Step Three: Generate a Shareable Link

Once you have completed verification, go to Google Drive:

  • find your file and click on it;
  • click on the “share” button;
  • generate a shareable link with “get link”.

The link is copied to your clipboard. Paste it into a string variable in Colab.

Step Four: Get the File ID

Do not share your link with others, so that unauthorized users cannot access your file. The link below is for demonstration only, to help you identify the file ID you need.

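# Example sharing link (for demonstration only):
# https://drive.google.com/file/d/25XVhnRJvieQMAEC9TfrWBNG6ERmtU7X/view?usp=sharing
# The file ID is the segment between /d/ and /view.
your_file = drive.CreateFile({'id': '25XVhnRJvieQMAEC9TfrWBNG6ERmtU7X'})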


The file ID is the part of the link between /d/ and /view. Pass it to drive.CreateFile({'id': 'id_value'}) and assign the result to a variable, here your_file.
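The ID can also be pulled out of the link in code instead of copying it by hand. This is a minimal sketch, assuming the link follows the /file/d/<id>/view pattern of the demonstration link; extract_file_id is a hypothetical helper name and is not part of PyDrive.

```python
import re
from typing import Optional

# Assumption: the ID is the run of letters, digits, "_" and "-" that
# follows "/file/d/" in a Drive sharing link.
_FILE_ID_PATTERN = re.compile(r"/file/d/([A-Za-z0-9_-]+)")


def extract_file_id(share_link: str) -> Optional[str]:
    """Return the ID segment after /file/d/, or None if the link has none."""
    match = _FILE_ID_PATTERN.search(share_link)
    return match.group(1) if match else None


# The demonstration link from this article.
link = "https://drive.google.com/file/d/25XVhnRJvieQMAEC9TfrWBNG6ERmtU7X/view?usp=sharing"
file_id = extract_file_id(link)
print(file_id)
```

The returned string can then be passed as the 'id' value in drive.CreateFile.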

Step Five: Load the File and Show the Results

I was uploading a CSV file, so let's check that the process succeeded by loading the file and displaying some output.

GetContentFile downloads the Drive file into the Colab runtime and saves it under the file name you pass in.

your_file.GetContentFile('matches.csv')

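Steps Four and Five can be wrapped into one small function that also confirms the file really landed in the runtime. This is only a sketch: download_from_drive is a hypothetical name, and it assumes that drive is the GoogleDrive client created in Step Two.

```python
import os


def download_from_drive(drive, file_id: str, local_name: str) -> int:
    """Download a Drive file into the runtime and return its size in bytes.

    Assumption: `drive` is the GoogleDrive client created in Step Two.
    """
    drive_file = drive.CreateFile({"id": file_id})  # same call as Step Four
    drive_file.GetContentFile(local_name)           # same call as Step Five
    if not os.path.exists(local_name):
        raise FileNotFoundError(f"{local_name} was not written to disk")
    return os.path.getsize(local_name)
```

In the notebook it would be called as download_from_drive(drive, 'id_value', 'matches.csv').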

I use pandas to turn this into a data frame and display its first rows. I import pyforest, a package that makes many common Python libraries, including pandas, available to me (install it first with !pip install pyforest if it is not already installed).

import pyforest 

df = pd.read_csv('matches.csv', delimiter=';')

df.head()

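My file happens to use semicolons as separators. If you are not sure which delimiter your own file uses, Python's standard csv module can guess it before you call read_csv. This is a rough sketch; detect_delimiter and its list of candidate characters are assumptions for illustration.

```python
import csv


def detect_delimiter(path: str, candidates: str = ";,\t|") -> str:
    """Guess the column delimiter from the first few kilobytes of a text file.

    csv.Sniffer raises csv.Error when it cannot decide.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        sample = handle.read(4096)
    return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter


# Usage in the notebook (assumes matches.csv was downloaded in Step Five):
# df = pd.read_csv('matches.csv', delimiter=detect_delimiter('matches.csv'))
```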

File uploaded successfully to Google Colab

As the picture above shows, the CSV file was uploaded successfully and we were able to operate on the data using pandas.

Now you know how to upload files to Google Colab. This saves you from doing everything locally on your machine, so you can work comfortably with huge datasets.

We are still learning data engineering together. I previously wrote an article on installing Apache PySpark in Ubuntu; you can read it here. Installing PySpark in a local environment was indeed involved.

In Google Colab, I only have to run the following command to install PySpark and the py4j library:

!pip install pyspark==3.3.0 py4j==0.10.9.5

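After the install, a quick check that the pinned versions are the ones actually present can save some confusion later. This is a small sketch using only the standard library; installed_version is a hypothetical helper name.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Print what is installed for the two packages pinned above.
for name in ("pyspark", "py4j"):
    print(name, installed_version(name))
```

A value of None means the package is not installed in the current runtime.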

Then I can move on to using Apache PySpark in my work. To learn about Apache PySpark, read it here.

This was a short article about a challenge I faced and solved. Feel free to leave your comments and suggestions.
