How to Easily Upload a CSV File into Google Colab
I wrote this because I kept seeing a lot of confusion among data analysis/data science learners about how to upload data sets (csv files) into Google Colab and then share those notebooks. (Colab is a cloud-hosted tool, based on Jupyter Notebook, where you can write and execute Python code.)
There are several ways to upload a csv file into your Google Colab notebook, and the one you choose affects how others can interact with your notebook. If you’re just doing research and don’t need anyone else to run the code cells, then you don’t need to share your data. But if you are working with someone else on a project, or you share your notebook publicly and want others to be able to run it, they will need easy access to the data set(s). I will go over the specific code snippets/steps below.
If you’re not sharing your notebook:
1. Do it locally. Run files.upload() (from the google.colab module) in a code cell. When run, this cell brings up the “Choose Files” button. Click it, then select the files you want to upload from your computer.
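Here’s a minimal sketch of what that cell can look like. files.upload() returns a dict that maps each file name to its raw bytes; the helper name read_uploaded_csv is my own invention for illustration, and the import guard is only there so the cell doesn’t crash if you run it outside Colab.

```python
import io

import pandas as pd


def read_uploaded_csv(uploaded, filename):
    """Parse one uploaded file into a DataFrame.

    `uploaded` is the dict returned by files.upload(): it maps each
    chosen file name to that file's raw bytes.
    (read_uploaded_csv is a hypothetical helper name, not a Colab API.)
    """
    return pd.read_csv(io.BytesIO(uploaded[filename]))


try:
    # google.colab only exists inside a Colab runtime.
    from google.colab import files
except ImportError:
    files = None

if files is not None:
    uploaded = files.upload()  # shows the "Choose Files" button
    for name in uploaded:
        df = read_uploaded_csv(uploaded, name)
        print(name, df.shape)
```

Keep in mind that files uploaded this way live in the notebook’s temporary session, so anyone else who opens the notebook has to upload their own copy.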
2. You can also mount your Google Drive in your Colab notebook by calling drive.mount('/content/drive') from the google.colab module.
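A minimal sketch of the mount cell, assuming the usual /content/drive mount point and that your own files show up under a “MyDrive” folder (check the file browser if yours looks different). The drive_path helper and the data/my_file.csv path are hypothetical, just for illustration.

```python
import posixpath

MOUNT_POINT = "/content/drive"  # conventional mount location in Colab


def drive_path(*parts, mount_point=MOUNT_POINT):
    """Build a path to a file under "MyDrive" on the mounted Drive.

    Hypothetical helper; assumes your files appear under MyDrive.
    """
    return posixpath.join(mount_point, "MyDrive", *parts)


try:
    # google.colab only exists inside a Colab runtime.
    from google.colab import drive
except ImportError:
    drive = None

if drive is not None:
    import pandas as pd

    drive.mount(MOUNT_POINT)  # asks you to authorize access to your Drive
    # Hypothetical file location inside your Drive.
    df = pd.read_csv(drive_path("data", "my_file.csv"))
    print(df.head())
```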
The above code will prompt you to follow a link where you get an authorization code. Paste the auth code into the box in the cell output. (Newer versions of Colab may show a sign-in pop-up instead.) Now you can navigate your Drive and add files to your notebook. If you don’t want to type the code, you can grab ready-made code snippets from the pane on the left side of the Colab window.
I like that one can now easily add code to their notebooks. Thanks, Google!
3. My new favorite — and in my opinion the easiest way — to get a csv file into Colab is by adding it via your GitHub repo. This way anyone can run the code cells in the notebook because the link to the data is coming from your public repository on GitHub. (I am also assuming that you’re using Pandas and not the csv library in Python.)
I have a repo on GitHub that holds a bunch of csv files; for specific projects, the csv files live in those projects’ repos. (How you organize things is up to you.)
a. Upload the csv file to the repo.
b. Click on the csv file and view it in “raw” format.
c. Grab the entire URL.
d. Paste the URL, in quotes, into pd.read_csv(), as in the example below.
Ex: df = pd.read_csv("https://raw.githubusercontent.com/yourGHProfile/filepath/filename.csv")
Boom! You’re done.
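If you’d rather skip the click on “raw,” steps b and c can be sketched in code. As far as I can tell, the raw URL is the regular file page URL with the host swapped to raw.githubusercontent.com and the /blob segment removed. The function name github_blob_to_raw and the someuser/somerepo URL below are made up for illustration.

```python
from urllib.parse import urlparse


def github_blob_to_raw(blob_url):
    """Turn a github.com file page URL (".../blob/...") into its raw URL.

    Hypothetical helper, not part of any library.
    """
    parsed = urlparse(blob_url)
    parts = parsed.path.strip("/").split("/")
    # Expected shape: user / repo / "blob" / branch / path/to/file.csv
    if parsed.netloc != "github.com" or len(parts) < 5 or parts[2] != "blob":
        raise ValueError(f"not a GitHub file page URL: {blob_url}")
    user, repo, _blob, *rest = parts
    return "https://raw.githubusercontent.com/" + "/".join([user, repo, *rest])


# Placeholder URL; swap in the page URL of your own csv file.
page_url = "https://github.com/someuser/somerepo/blob/main/data/file.csv"
print(github_blob_to_raw(page_url))
```

From there it’s the same one-liner as before: df = pd.read_csv(github_blob_to_raw(page_url)). This only works for public repos, which is the whole point if you want others to run your notebook.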