Interactions Tracker, Part 3: Why I Stopped and Lessons Learned

#interactionstracker #redshift #beginners

I took a break from writing posts to assess which projects I really wanted to work on and to do some more research into the field of data engineering. Unfortunately, the Interactions Tracker didn't make the cut. This article explains why and shows where I'm looking for project inspiration now.

Topics for Future Posts

As you've seen, it's been a while since I've posted. I'm still planning to contribute posts on dev.to, and the break from writing posts and from this project has been enriching. I have 3 posts planned, each of which I'll write and publish within the next 3 weeks:

Project Design: Data-Informed Car Purchases: scraping data off the Edmunds car website to answer which car would be the best buy. Unlike the interactions project, the Edmunds data scraping project already has a proof-of-concept ETL demo working, with AWS infrastructure up and running (Redshift, S3, Lambda, IAM, and IAM Identity Center). I'll show how I implemented this entire setup.
A Resource Review: Kaggle: I break down how the resources on Kaggle can be used for data engineering, data science, and data analytics projects.
A Book Review: How to Read a Book: While it was certainly written for a broader audience than developers, How to Read a Book offers a solution to the problems of imposter syndrome and an ever-expanding knowledge base. I show how I've applied the principles in the book to my computer science projects.

As you can see, I still have plans to contribute interesting articles. I know this section seems strange. Keep in touch! Subscribe!

Why I Stopped

1. Accessing the Data Source

Every ETL job starts with extraction, and every extraction needs a source to extract from. Sources hidden behind passwords aren't public. Sources that contain personally identifiable information (PII) are entrusted only to particular entities, and I am not one of the entities that can access that data.

2. Value to Work Ratio

The actual value that an interactions tracker adds is fairly minimal: assuming I only made the Minimum Viable Product, all it does is add the ability to check which students on campus aren't being reached.

There was another project of significantly larger scope that I considered: developing an event-planning tool that would publish events, track timelines, and enable collaboration with University officials, with calendar and mail add-ons. It would add value to the degree that it integrated with other University services. Ideally, a poster, an email, and a push notification would all go out when you pressed publish on the service, and photos taken at events could be attached to event attendance records to make developing yearbooks much easier. But at least at the moment, working on this would take up more of my time than I can allow. To the back burner it goes!

What I Learned

The details of modeling data

I learned that when representing the real world through data, I need to think at the lowest possible grain of detail. One problem I ran into in the dataset was mixing up the grains: I had an "event" grain for interactions grouped by event, and an "interactions" grain, which was one level below that. As you might have seen in the last article, having both only confused the issue.
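To make the grain idea concrete, here is a minimal sketch in Python. It is not the schema from the previous article: the Interaction class, its fields, the sample rows, and the rollup_by_event helper are all hypothetical names made up for illustration. The point is only that if every stored row sits at the lowest grain (one student at one event), the "event" grain can be derived on demand instead of being stored alongside it.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Interaction:
    """One row at the lowest grain: one student at one event (hypothetical fields)."""
    event_id: str
    student_id: str


# Hypothetical sample rows; none of these values come from the real project.
interactions = [
    Interaction("E1", "S1"),
    Interaction("E1", "S2"),
    Interaction("E2", "S1"),
]


def rollup_by_event(rows):
    """Derive the event grain by counting interaction rows per event."""
    attendees = defaultdict(int)
    for row in rows:
        attendees[row.event_id] += 1
    return dict(attendees)


event_summary = rollup_by_event(interactions)
print(event_summary)
```

With this layout there is only one table to keep correct, and the event-level view is just a query over it, so the two grains can't drift apart.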

Make simple, complete projects to start

Every project has to start with a bare-bones implementation. But to make those projects worthwhile, you have to introduce value early on. That's what keeps the motivation going. A project like my database modeling for student interactions is an interesting thought experiment, but it does not immediately deliver value. That's why my next project is tied to something I'll need to do anyway: research for buying a car.

Don't be afraid to start

Even though I didn't complete the project, I think I gained some insight into data modeling that I would not otherwise have. None of this learning would have happened if I had simply read books about the subject; I needed to jump in and try to articulate how this database would work. As long as I limit the scope, any of these projects is worthwhile from a learning standpoint.

Conclusion

I hope other aspiring data engineers can look at my mistakes and avoid them, or at the very least get started. I had sat around waiting for the ideal project to fall into my lap. That strategy hadn't really worked. When you're someone like me, searching for jobs without great success, the winning strategy is to experiment. Just beginning helps. It was only after I started tinkering with this (admittedly poor) project idea that I began to develop other ideas. So get started! I wish you all the best on your development journey.
