Python – How to import data into google colab from google drive

google-colaboratoryjupyter-notebookpython

I have some data files uploaded on my google drive.
I want to import those files into google colab.

The REST API method and PyDrive method show how to create a new file and upload it on drive and colab. Using that, I am unable to figure out how to read the data files already present on my drive in my python code.

I am a total newbie to this. Can someone help me out?

Best Answer

(Update April 15 2018: The gspread is frequently being updated, so to ensure stable workflow I specify the version)

For spreadsheet file, the basic idea is using packages gspread and pandas to read spreadsheets in Drive and convert them to pandas dataframe format.

In the Colab notebook:

#install packages
!pip install gspread==2.1.1
!pip install gspread-dataframe==2.1.0
!pip install pandas==0.22.0


#import packages and authorize connection to Google account:
import pandas as pd
import gspread
from gspread_dataframe import get_as_dataframe, set_with_dataframe
from google.colab import auth
auth.authenticate_user()  # verify your account to read files which you have access to. Make sure you have permission to read the file!
from oauth2client.client import GoogleCredentials
gc = gspread.authorize(GoogleCredentials.get_application_default()) 

Then I know 3 ways to read Google spreadsheets.

By file name:

spreadsheet = gc.open("goal.csv") # Open file using its name. Use this if the file is already anywhere in your drive
sheet =  spreadsheet.get_worksheet(0)  # 0 means the first sheet in the file
df2 = pd.DataFrame(sheet.get_all_records())
df2.head()

By url:

 spreadsheet = gc.open_by_url('https://docs.google.com/spreadsheets/d/1LCCzsUTqBEq5pemRNA9EGy62aaeIgye4XxwReYg1Pe4/edit#gid=509368585') # use this when you have the complete url (the edit#gid means permission)
    sheet =  spreadsheet.get_worksheet(0)  # 0 means the first sheet in the file
    df2 = pd.DataFrame(sheet.get_all_records())
    df2.head()

By file key/ID:

spreadsheet = gc.open_by_key('1vpukIbGZfK1IhCLFalBI3JT3aobySanJysv0k5A4oMg') # use this when you have the key (the string in the url following spreadsheet/d/)
sheet =  spreadsheet.get_worksheet(0)  # 0 means the first sheet in the file
df2 = pd.DataFrame(sheet.get_all_records())
df2.head()

I shared the code above in a Colab notebook: https://drive.google.com/file/d/1cvur-jpIpoEN3vAO8Fd_yVAT5Qgbr4GV/view?usp=sharing

Source: https://github.com/burnash/gspread