In my final year, for the Data Engineering minor, a group of fellow students and I were asked to build a solution to label a dataset so that a machine learning algorithm could later be trained on the results.
This project 'Bursting the Bubble' was provided by ACED (Institute for Art & Journalism).
The team mainly consisted of:
With the project 'Bursting the Bubble' ACED aims to research the impact of clickbait on the online news scene. Are articles intended to inform, attract attention or manipulate?
ACED started a project to crawl all popular Dutch news sites to collect as many articles as possible. With this data they hope to be able to determine which articles are or are not clickbait.
This is where we came in. We had to come up with a solution to start labelling the article titles as 'clickbait' or 'not clickbait'. The big problem was that labelling this data definitely needed human input, since clickbait is quite a controversial topic.
We started brainstorming and almost immediately came up with the idea to create some sort of a 'swiper'. We have seen swipers applied in a few apps where a binary action is needed, and we were pretty sure everyone is familiar with the concept.
We ended up building a 'clickbait swiper' web app where people could swipe articles for us anonymously, but linked to an ID so we could be sure they didn't swipe the same article twice. This way we also knew that all the user input was unique and that the dataset had some value.
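The duplicate check boils down to remembering which (user, article) pairs have already been seen. A minimal sketch of that logic (names are hypothetical, and an in-memory dict stands in for the real MongoDB collection):

```python
class SwipeStore:
    """In-memory stand-in for the swipe collection (illustration only)."""

    def __init__(self):
        self.swipes = {}  # (user_id, article_id) -> label

    def record_swipe(self, user_id, article_id, label):
        """Store a swipe once; reject duplicates from the same user."""
        key = (user_id, article_id)
        if key in self.swipes:
            return False  # this user already swiped this article
        self.swipes[key] = label
        return True


store = SwipeStore()
store.record_swipe("user-1", "article-42", "clickbait")      # stored -> True
store.record_swipe("user-1", "article-42", "not clickbait")  # duplicate -> False
```

Because the ID is anonymous but stable, the same person can come back later and still be blocked from re-labelling an article.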
A pinch of gamification was also added to see if we could engage the users a bit and make them swipe just a bit more. We did this by showing a score of how many articles you had swiped and some milestones the user could reach. This really helped improve the average swipe count per user.
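The milestone side of the gamification can be as simple as comparing the user's swipe count against fixed thresholds. A sketch, with made-up threshold values (the real ones differed):

```python
# Hypothetical milestone thresholds; purely illustrative values.
MILESTONES = [10, 50, 100, 250]


def highest_milestone(swipe_count):
    """Highest milestone the user has passed so far, or None."""
    passed = [m for m in MILESTONES if swipe_count >= m]
    return max(passed) if passed else None
```

Showing the next unreached milestone alongside the score is what gives users a reason to keep going "just a bit more".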
In the end we also presented the collected data in a dashboard to get a glimpse of how the dataset was taking shape. This also helped us tweak the algorithm that made sure users were served articles that had not been swiped as much yet.
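That balancing algorithm can be sketched as: from the articles this user hasn't swiped yet, pick one with the lowest total swipe count, so coverage across the dataset stays even. A simplified pure-Python version (the real one queried MongoDB):

```python
import random


def next_article(articles, swipe_counts, already_swiped):
    """Pick a least-swiped article the user hasn't seen yet.

    articles: list of article ids
    swipe_counts: dict mapping article id -> total swipe count
    already_swiped: set of ids this user has already labelled
    """
    candidates = [a for a in articles if a not in already_swiped]
    if not candidates:
        return None  # user has swiped everything
    lowest = min(swipe_counts.get(a, 0) for a in candidates)
    # Choose randomly among the least-swiped so users don't all get the same one.
    return random.choice(
        [a for a in candidates if swipe_counts.get(a, 0) == lowest]
    )
```

The dashboard's per-article counts are exactly what lets you verify this is working: the histogram of swipes per article should stay roughly flat.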
An API to have control of all the swipes.
For this project you need Python 3. The packages are:
How to run the project
In the root of the folder, type on your command line `python start.py`. The project will automatically pick up changes on refresh. The project will run on port 5000.
To run with Docker: in the console, go to the root of the project and type `docker build -t (name of your folder):latest .`. Docker will build the image at this point; the server isn't running yet. After the build is done, type `docker run -p 5000:5000 (name of your folder):latest`. This will run the server and start everything; the server will run on port 5000.
For the front-end, React was mainly used to set up the base of the app, and Redux was used to handle all the API calls and data flow. In the end this was a bit overkill, but it's always nice to get more comfortable with Redux.
We also used the framer-motion library to build the main 'swipe' functionality.
For the dashboard pretty much the same stack was used. ApexCharts was used for the bar chart and line graphs.
For the backend we decided to write it in Python. We chose Python because we initially thought we were also going to implement a machine learning framework in our API, and since most ML platforms target Python, it made sense to use it there too. With Python we decided to use Flask to turn our program into a REST API. Flask is a microframework, which means it doesn't cost a lot to run and it was easy to learn and set up.
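To give an idea of how little Flask asks of you, here is a minimal sketch of the kind of REST endpoints we mean. The route names and payload shape are illustrative, not the project's actual API, and an in-memory list stands in for MongoDB:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# In-memory store standing in for the MongoDB collection (illustration only).
swipes = []


@app.route("/swipes", methods=["POST"])
def add_swipe():
    """Record one swipe, e.g. {"user_id": ..., "article_id": ..., "label": ...}."""
    swipes.append(request.get_json())
    return jsonify({"count": len(swipes)}), 201


@app.route("/swipes", methods=["GET"])
def list_swipes():
    """Return every recorded swipe, e.g. for the dashboard."""
    return jsonify(swipes)
```

That's the whole "micro" appeal: a couple of decorated functions and you have a working JSON API.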
For our database we used MongoDB. We chose MongoDB because we wanted a flexible database that we could easily change along the way, so a NoSQL option made sense, and we were also already familiar with MongoDB.
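That flexibility matters because a swipe is just a document: adding a new field later (say, which milestone was active at swipe time) needs no schema migration. A sketch of what such a document could look like (the field names are hypothetical, not the project's actual schema):

```python
from datetime import datetime, timezone


def make_swipe_doc(user_id, article_id, label, **extra):
    """Build a swipe document for MongoDB.

    **extra lets us bolt new fields onto future documents without
    touching existing ones -- the schemaless upside of a document store.
    """
    doc = {
        "user_id": user_id,
        "article_id": article_id,
        "label": label,
        "swiped_at": datetime.now(timezone.utc).isoformat(),
    }
    doc.update(extra)
    return doc


# Mid-project we could simply start recording an extra field:
doc = make_swipe_doc("u1", "a1", "clickbait", milestone=50)
```

With a relational schema, that same change would have meant an `ALTER TABLE` and backfilling old rows; here old and new documents simply coexist.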
One of the things we found out later is that it can be quite a problem to deploy a Flask API on a server. We ran into quite a few problems, and we really didn't know what we were doing. After some research we found out that we had to use nginx and gunicorn to set Flask up in its own environment on the server (check out this lifesaving tutorial).
We ended up learning a lot from this project as web developers, both on the development side of things and on the more practical side, like researching the definition of clickbait and finding out it is a more controversial topic than we thought.
We enjoyed talking to the stakeholders, and there was a lot of enthusiasm from everyone during the entire project. Unfortunately we never got to the point where we were able to test the labelled dataset with a machine learning algorithm, but that could be a nice project for a new group of students!