DEV Community

Mark Vonk
Clickbait Identifier

My Final Year Project

In the final year of my Data Engineering minor, a group of fellow students and I were asked to build a solution to classify a dataset, so that a machine learning algorithm could be built with the results.
This project 'Bursting the Bubble' was provided by ACED (Institute for Art & Journalism).

The team mainly consisted of:

  • Roel, Back-End developer
  • Robin, Web and Mobile developer
  • Mark, Full-stack developer

With the project 'Bursting the Bubble' ACED aims to research the impact of clickbait on the online news scene. Are articles intended to inform, attract attention or manipulate?
ACED started a project to crawl all popular Dutch news sites to collect as many articles as possible. With this data they hope to be able to determine which articles are or are not clickbait.

This is where we came in. We had to come up with a solution to start labelling the article titles with 'clickbait' or 'not clickbait'. The big problem was that labelling this data definitely needed human input, since clickbait is quite a controversial topic.
We started brainstorming and almost immediately came up with the idea to create some sort of a 'swiper'. We have seen swipers applied in a few apps where a binary action is needed, and we were pretty sure everyone is familiar with the concept.

We ended up building a 'clickbait swiper' web app where people could swipe articles for us anonymously, but linked to an id so we can be sure they don't swipe the same article twice. This way we are also sure that all the user input is unique and the data set has some value.
A pinch of gamification was also added to see if we could engage the users a bit and make them swipe just a bit more. We did this by adding a score of how many articles you swiped and some milestones the user could reach. This really helped improve the average swipe count per user.
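As a rough illustration of the score and milestones idea, here is a minimal Python sketch. The milestone thresholds and function names are assumptions made up for this example; the post does not say which values the app actually used.

```python
from __future__ import annotations

from bisect import bisect_right

# Hypothetical milestone thresholds (number of swiped articles).
# The real values used in the app are not given in this post.
MILESTONES = [10, 25, 50, 100]


def milestones_reached(swipe_count: int) -> list[int]:
    """Return every milestone the user has already reached."""
    return MILESTONES[: bisect_right(MILESTONES, swipe_count)]


def next_milestone(swipe_count: int) -> int | None:
    """Return the next milestone to aim for, or None when all are reached."""
    index = bisect_right(MILESTONES, swipe_count)
    return MILESTONES[index] if index < len(MILESTONES) else None
```

Showing the next milestone next to the current score gives the user a concrete, nearby goal, which is the kind of nudge the gamification was meant to provide.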
In the end we also presented the collected data in a dashboard to get a glimpse of how the dataset was taking shape. This also helped us tweak the algorithm that made sure users swiped articles that had not been swiped as much yet.
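The selection idea described above (never show a user the same article twice, and prefer articles with few swipes so far) could be sketched roughly as follows. This is not the project's actual algorithm; the tuple layout, the ids, and the function name are assumptions for illustration.

```python
from __future__ import annotations

from collections import Counter

# Each swipe is a hypothetical (user_id, article_id, label) tuple.
Swipe = tuple[str, str, str]


def next_article(article_ids: list[str], swipes: list[Swipe], user_id: str) -> str | None:
    """Pick the least-swiped article that this user has not swiped yet."""
    # How often every article has been swiped by anyone.
    counts = Counter(article_id for _, article_id, _ in swipes)
    # Articles this particular user has already judged.
    seen = {article_id for uid, article_id, _ in swipes if uid == user_id}
    candidates = [a for a in article_ids if a not in seen]
    if not candidates:
        return None  # the user has swiped everything
    # Ties are broken by the order of article_ids.
    return min(candidates, key=lambda a: counts[a])
```

A small usage example with made-up data:

```python
swipes = [
    ("u1", "a1", "clickbait"),
    ("u2", "a1", "not clickbait"),
    ("u1", "a2", "clickbait"),
]
print(next_article(["a1", "a2", "a3"], swipes, "u2"))
```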

Demo Links

Swiper
Dashboard

Link to Code

acedinstitute / swiper-app

Application setup for the swiper classifier app

Swiper app



acedinstitute / swiper-dashboard

Dashboard to create insight into the swiper results

Swiper dashboard

acedinstitute / swipingApi

An API that has control of all the swipes

swipingApi

Requirements

For this project you need Python 3. The required packages are:

  • Flask
  • flask_pymongo
  • mongoengine
  • flask_cors

How to run the project

From the root of the project folder, run python start.py on the command line. Changes are picked up automatically on refresh. The project will run on http://127.0.0.1:5000.

Docker

In the console, go to the root of the project and type docker build -t (name of your folder):latest . Docker will build the image at this point; the server isn't running yet. After the build is done, type docker run -p 5000:5000 (name of your folder):latest. This will run the server and start everything. The server will run on port 5000.

How we built it

Front-end

For the front-end, React was mainly used to set up the base of the app, and Redux was used to handle all the API calls and data flow. In the end this was a bit overkill, but it's always nice to get more comfortable with Redux.
We also used the framer-motion library to build the main 'swipe' functionality.
For the dashboard pretty much the same stack was used. ApexCharts was used for the bar chart and line graphs.
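To give an idea of the kind of data a dashboard bar chart needs, here is a hedged Python sketch that turns raw swipes into per-article label counts. The swipe tuples, the label strings, and the output layout (a list of categories plus one named data series per label, similar to what charting libraries such as ApexCharts accept) are assumptions for this example, not the project's real data model.

```python
from collections import Counter, defaultdict


def label_counts(swipes):
    """Count how often each label was given to each article.

    swipes is a list of hypothetical (user_id, article_id, label) tuples.
    """
    counts = defaultdict(Counter)
    for _user_id, article_id, label in swipes:
        counts[article_id][label] += 1
    return counts


def bar_chart_series(swipes, labels=("clickbait", "not clickbait")):
    """Shape the counts as chart categories plus one data series per label."""
    counts = label_counts(swipes)
    categories = sorted(counts)  # article ids along the x-axis
    series = [
        {"name": label, "data": [counts[a][label] for a in categories]}
        for label in labels
    ]
    return {"categories": categories, "series": series}
```

Putting both labels side by side per article makes it easy to see which titles people disagree on, which matters given how controversial the definition of clickbait turned out to be.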

Back-end

For the backend we decided to use Python. We chose Python because initially we thought we were also going to implement a machine learning framework in our API, and a lot of ML platforms are built for Python. With Python we decided to use Flask to build a REST API. Flask is a microframework, which means it is lightweight to run and was easy to learn and set up.

For our database we used MongoDB. We chose MongoDB because we wanted a flexible database that we could easily change along the way. So the choice for a NoSQL option made sense, and we were also already familiar with MongoDB.

One of the things we found out later is that it can be quite a problem to deploy a Flask API on a server. We ran into quite a few problems, and we really didn't know what we were doing. After some research we found out that we had to use nginx and gunicorn to set up Flask in its own environment on the server (check out this lifesaving tutorial).
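What helped us understand this setup: gunicorn serves a WSGI callable, and a Flask app object is such a callable. The sketch below uses only the Python standard library instead of Flask, purely to show that interface; the /health route and the module layout are made up for this example and are not part of our API.

```python
import json


def app(environ, start_response):
    """A bare WSGI callable; a Flask app object exposes this same interface."""
    if environ.get("PATH_INFO") == "/health":  # hypothetical route
        status, payload = "200 OK", {"status": "ok"}
    else:
        status, payload = "404 Not Found", {"error": "not found"}
    body = json.dumps(payload).encode("utf-8")
    headers = [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ]
    start_response(status, headers)
    return [body]
```

Assuming this is saved as wsgi.py (a name chosen for the example), gunicorn could serve it with gunicorn wsgi:app, while nginx sits in front and forwards incoming requests to the gunicorn process.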

Additional Thoughts / Feelings / Stories

We ended up learning a lot from this project as web developers, both on the development side of things and on the more practical side, such as researching the definition of clickbait and finding out it is a more controversial topic than we thought.
We enjoyed talking to the stakeholders, and there was a lot of enthusiasm from everyone during the entire project. Unfortunately we never got to the point where we were able to test the labelled dataset with a machine learning algorithm, but this can be a nice project for a new group of students!

Top comments (2)

dioio-io

Hello, interesting article and useful idea!
Just alerting you that the first demo link is down

Mark Vonk

Thanks for the comment, I fixed the link :)