In this post I'll wrap up my final project for DataTalks.Club's Data Engineering Zoomcamp and share some thoughts on the project as a whole.
First, the visualization: I used Google's Looker Studio. I had wanted to make a heat map of Massachusetts showing the percentage of registered Libertarians in every city and town, but I couldn't figure out how to do it, even with the help of ChatGPT. I think I could work it out in Looker Studio if I devoted more time to it, but I had to back off and just create some bar charts. This actually turned out to be better. I sorted the towns in both ascending and descending order of registered Libertarians. Then I created a chart of registered Republicans and Democrats in the same towns. This data was revealing. In the towns that had a large number of registered Democrats, there were few registered Libertarians. In the towns where registered Democrats were a lower percentage of voters and there were more registered Republicans, there were more registered Libertarians.
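The comparison behind those bar charts comes down to a percentage per town and a sort. Here is a minimal Python sketch of that calculation. The town names, the counts, and the `party_share` helper are all made up for illustration; they are not the project's data or code.

```python
# Hypothetical enrollment counts per town (not real data).
enrollment = {
    "Town A": {"Democrat": 5000, "Republican": 1000, "Libertarian": 20, "Unenrolled": 3980},
    "Town B": {"Democrat": 2000, "Republican": 2500, "Libertarian": 45, "Unenrolled": 4455},
    "Town C": {"Democrat": 3000, "Republican": 1500, "Libertarian": 30, "Unenrolled": 5470},
}


def party_share(counts, party):
    """Return the given party's percentage of all registered voters in a town."""
    total = sum(counts.values())
    return 100.0 * counts.get(party, 0) / total if total else 0.0


# Towns ranked from highest to lowest Libertarian share.
ranked = sorted(
    enrollment,
    key=lambda town: party_share(enrollment[town], "Libertarian"),
    reverse=True,
)

for town in ranked:
    counts = enrollment[town]
    print(
        f"{town}: L {party_share(counts, 'Libertarian'):.2f}%  "
        f"D {party_share(counts, 'Democrat'):.1f}%  "
        f"R {party_share(counts, 'Republican'):.1f}%"
    )
```

Dropping `reverse=True` gives the ascending order, which is the other chart I described.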
For the voter activity table, I created a bar chart of state elections between 2000 and 2020, with the number of registered voters in each third party stacked on each bar. I also had a smaller table of the total number of voters enrolled in third parties over the same period. What I learned from this data is that the percentage of voters who chose to identify with a third party grew over that interval, doubling from approximately half a percent to one percent.
The problem with the chart is that the filter I used to remove Democrats, Republicans, and unenrolled voters keeps disappearing, so I had to include instructions for reproducing it in the project repository's README.md. I looked for a solution and found many other questions posted about filters not working, but none of them answered my problem. So my conclusion is that Google Looker Studio has a less-than-stellar user interface. It has many charts and capabilities, but they're hard to use.
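One possible workaround, which is not what this project did, is to drop the major parties and unenrolled voters from the data before it ever reaches the chart, so the dashboard no longer depends on a filter. The Python sketch below shows the idea. All of the figures are invented (I picked them to mirror the rough half-a-percent-to-one-percent trend), and the row layout, party labels, and `third_party_share` function are assumptions for illustration only.

```python
# Rows of (election_year, party, registered_voters). All figures are invented.
rows = [
    (2000, "Democrat", 1_400_000),
    (2000, "Republican", 550_000),
    (2000, "Unenrolled", 2_030_000),
    (2000, "Libertarian", 12_000),
    (2000, "Green", 8_000),
    (2020, "Democrat", 1_500_000),
    (2020, "Republican", 460_000),
    (2020, "Unenrolled", 2_000_000),
    (2020, "Libertarian", 25_000),
    (2020, "Green", 15_000),
]

# Categories to exclude so that only third parties remain.
EXCLUDED = {"Democrat", "Republican", "Unenrolled"}


def third_party_share(rows):
    """Return {year: percent of registered voters enrolled in a third party}."""
    totals = {}
    third = {}
    for year, party, voters in rows:
        totals[year] = totals.get(year, 0) + voters
        if party not in EXCLUDED:
            third[year] = third.get(year, 0) + voters
    return {year: 100.0 * third.get(year, 0) / totals[year] for year in totals}


shares = third_party_share(rows)
for year in sorted(shares):
    print(year, f"{shares[year]:.2f}%")
```

The same exclusion could live in a SQL or dbt model instead; the point is only that the chart would then receive third-party rows and nothing else.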
Final thoughts on the project
Overall, the data I chose for this project prevented me from using some of the tools we learned in class. I couldn't automate the process of extracting data using Terraform. Mage was a total no-go: when I tried to read my file, it either hung or was going to take more than half an hour to load. I had to edit some of the data by hand, importing about 16 of the 351 text files into Excel and exporting them as .csv files. But I was able to use Python in Jupyter notebooks to unpack the data and write it to Google Cloud Storage in Parquet format. I could create partitioned tables in BigQuery using SQL. I think I learned the most from using dbt to transform the data for presentation to "stakeholders." I was able to build models, upload them to GitHub, and create a schema.yml file for documentation and testing.
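For text files that are cleanly delimited, the Excel step could in principle be scripted. The sketch below is an illustration, not what I actually ran: it assumes tab-delimited input (the real files may well be messier), and the `text_to_csv` function, file names, and sample columns are hypothetical.

```python
import csv
import tempfile
from pathlib import Path


def text_to_csv(src: Path, dest: Path, delimiter: str = "\t") -> int:
    """Copy a delimited text file to CSV and return the number of rows written."""
    rows_written = 0
    with src.open(newline="", encoding="utf-8") as fin, \
         dest.open("w", newline="", encoding="utf-8") as fout:
        reader = csv.reader(fin, delimiter=delimiter)
        writer = csv.writer(fout)
        for row in reader:
            # Strip stray whitespace around each field and skip blank lines.
            cleaned = [field.strip() for field in row]
            if any(cleaned):
                writer.writerow(cleaned)
                rows_written += 1
    return rows_written


# Demo with a throwaway file; the column names and values are made up.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "town.txt"
    src.write_text("town\tparty\tvoters\nTown A\tLibertarian\t20\n\n", encoding="utf-8")
    dest = src.with_suffix(".csv")
    count = text_to_csv(src, dest)
    converted = dest.read_text(encoding="utf-8")
    print(count)
    print(converted)
```

Files that don't follow a single delimiter would still need hand editing, which is probably why a handful of mine ended up in Excel.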
I have to give credit to ChatGPT. I don't know what I would have done without it. ChatGPT helped me solve bugs every step of the way, from authenticating my Google Cloud credentials, to writing SQL queries, to debugging dbt. It's a brave new world.