I've been inspired by Nityesh Agarwal and his Build to Learn community to build a data science project I don't mind showing to the world, and to document the steps in a way that beginners can follow along with. The first iteration of that exists here as a repl on repl.it.
What's a repl? A repl is an interactive coding environment - basically an online IDE. Here is some basic info about repls.
In a recent dev post filled with project ideas, Nityesh suggested analyzing a database of TED talks, which contains metadata and fully transcribed texts of the speeches.
It was also Nityesh who suggested using Repl.it as a low-activation-energy way to get started in a preset programming environment. No need to worry about installation, dependencies, or different environments.
My idea in doing a "Hello, World" project is to do a minimum viable proof that things are working. Printing the first speech in the database seemed like a reasonable place to start.
That task took me longer than I'd like to admit, but I partly blame that on being unfamiliar with the environment and the way it displays pandas dataframes. But I'm getting ahead of myself.
The key steps are: download the file; save it to a local file (local to the cloud-based IDE, that is) so I don't need to download the data every time I test the code; read that file back into the program on subsequent runs; then load it into pandas and print the relevant parts.
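For readers who want a picture of those steps before the line-by-line walkthrough, here is a minimal sketch of the download, cache, reload, and print flow. It is not the code from my repl: the URL, the file name, and the "transcript" column name are placeholders I made up for illustration, and it assumes the dataset is a CSV file.

```python
import os

import pandas as pd
import requests

# Hypothetical placeholders: swap in the real dataset URL and file name.
DATA_URL = "https://example.com/ted_transcripts.csv"
CACHE_PATH = "ted_transcripts.csv"


def load_talks(url=DATA_URL, cache_path=CACHE_PATH):
    """Return the dataset as a DataFrame, downloading only when no local copy exists."""
    if not os.path.exists(cache_path):
        # First run: fetch the file and save it next to the script.
        response = requests.get(url, timeout=30)
        response.raise_for_status()  # stop early on a 4xx/5xx response
        with open(cache_path, "wb") as f:
            f.write(response.content)
    # Every run (first or later): read the local copy into pandas.
    return pd.read_csv(cache_path)


def first_transcript(talks, column="transcript"):
    """Return the full text in the first row of the given column.

    The column name is an assumption; check talks.columns for the real one.
    """
    return talks[column].iloc[0]
```

Calling print(first_transcript(load_talks())) would then print the first speech. Pulling a single cell out with .iloc[0] and printing it as a plain string sidesteps the shortened view pandas uses when it displays a whole dataframe.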
I was going to use the beginners tag, but I realized that this probably isn't material for complete beginners, since it involves the Python packages pandas and requests, as well as reading and writing files.
Later, I will write a post for beginners which walks through the code line by line and explains my process of building it. My hope is that people can use that forthcoming post to get their feet wet with data for the first time.
But for now, I just want to get the first step of this project out to the world and start what will be a series of posts on this topic.
More to come soon!