ilyocoris
DragonFly - real time entity sentiment analysis

Foreword

I despise trading, whether in stocks or crypto. Political manipulation I can see is morally dubious, but I do not care so much about it. With that out of the way, let's build Dragonfly! A tool to get real-time sentiment analysis on entities (stocks, coins, political parties) from scraped chunks of text.

Overview of My Submission

The Dragonfly buzzes many riverbeds across the world. The Human will marvel at their flight, their colours, and see the creation of a God that gifts us Beauty in unsuspecting creatures.

But the Dragonfly is a Demon. Mid-flight, it will capture flies to devour while they are still alive. Its big glossy eyes have a peripheral view of all that is under its wings. The Dragonfly Sees, the Dragonfly Hunts. Lean and pretty.

This project aims to provide an infrastructure for real-time monitoring of websites in any domain where it is valuable to know the opinion on an entity. In the basic example submitted to this hackathon, the system monitors the sentiment towards listed companies on subreddits and generates a timeline of events, that is, positive or negative comments on a certain entity.

The focus of the project is flexibility: all computationally intensive and domain-dependent elements (website scrapers or ML models) are wrapped in gRPC microservices. These get called by consumers to populate an event-driven architecture (some Kafka topics) that ends up dumping curated events to a MongoDB Time Series. This time series can then be the backbone for dashboards or opinion triggers built on top of it.
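
To make that flow concrete, here is a minimal in-memory sketch. It is not the project's code: `queue.Queue` objects stand in for Kafka topics, plain functions stand in for the gRPC microservices, and a list stands in for the MongoDB Time Series. Every name, the toy word lists and the event shape are assumptions made for illustration.

```python
import queue

KNOWN_TICKERS = {"GME", "TSLA"}          # hypothetical watch list
POSITIVE = {"moon", "buy", "great"}      # toy word lists, not a real model
NEGATIVE = {"crash", "sell", "awful"}


def find_entities(text):
    """Stand-in for the entity-recognition service (a gRPC call in the real stack)."""
    return sorted(KNOWN_TICKERS & set(text.upper().split()))


def score_sentiment(text):
    """Stand-in for the sentiment service: returns 1, -1 or 0."""
    words = set(text.lower().split())
    if words & POSITIVE:
        return 1
    if words & NEGATIVE:
        return -1
    return 0


def enrich_consumer(raw_topic, events_topic):
    """Read scraped text, call both 'services', emit one event per entity found."""
    while not raw_topic.empty():
        msg = raw_topic.get()
        sentiment = score_sentiment(msg["text"])
        for entity in find_entities(msg["text"]):
            events_topic.put({
                "timestamp": msg["timestamp"],
                "metadata": {"entity": entity, "source": msg["source"]},
                "sentiment": sentiment,
            })


def sink_consumer(events_topic, time_series):
    """Drain curated events into the (stand-in) time series."""
    while not events_topic.empty():
        time_series.append(events_topic.get())


def run_pipeline(messages):
    """Push messages through raw topic -> enrichment -> events topic -> sink."""
    raw_topic, events_topic, time_series = queue.Queue(), queue.Queue(), []
    for message in messages:
        raw_topic.put(message)
    enrich_consumer(raw_topic, events_topic)
    sink_consumer(events_topic, time_series)
    return time_series
```

The point of the shape is that each stage only knows about the topic it reads from and the topic it writes to, so a scraper or model can be swapped without touching the rest.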

Architecture overview

The MongoDB Time Series events collection is meant to be a historical archive of heterogeneous events (the sentiment of Twitter/Reddit comments on a company/crypto/party, and the market fluctuations of the stock/coin or political polls). Both the easy-to-scale nature of the gRPC microservices and the asynchronous capabilities of Kafka topics, backed by the consistency of MongoDB, build towards a real-time opinion temperature of an entity in this fast-moving internet, where a subreddit or Telegram channel can pump a crypto by 900% in hours or a Twitter influencer can cascade a political opinion into virality.
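
One way to keep such heterogeneous events in a single collection is to give them all the same envelope: a timestamp, a metadata sub-document describing the entity and the kind of event, and a value. The sketch below assumes that layout; the field names (`timestamp`, `metadata`, `kind`, `value`), the collection name `events` and the connection defaults are hypothetical, not taken from the project. The `timeseries` options passed to PyMongo's `create_collection` are the standard ones for MongoDB 5.0+ time series collections.

```python
from datetime import datetime, timezone


def make_event(entity, kind, value, source, when=None):
    """Build one event document; sentiment and price events share this shape."""
    return {
        "timestamp": when or datetime.now(timezone.utc),
        "metadata": {"entity": entity, "kind": kind, "source": source},
        "value": value,
    }


def create_events_collection(uri="mongodb://localhost:27017", db_name="dragonfly"):
    """Create the time series collection (needs pymongo and a MongoDB 5.0+ server)."""
    from pymongo import MongoClient  # imported lazily so make_event works without it

    db = MongoClient(uri)[db_name]
    return db.create_collection(
        "events",
        timeseries={
            "timeField": "timestamp",   # must match the key used in make_event
            "metaField": "metadata",    # groups measurements per entity/kind/source
            "granularity": "seconds",
        },
    )
```

With this envelope, a Reddit comment and a market tick on the same entity differ only in `metadata.kind` and `value`, so both can be queried by entity and time window in the same way.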

Streamlit App

To provide a small frontend I built a Streamlit app, where you can call the API to scrape some of the best-known stock-market subreddits and view the opinion events added to the Mongo Time Series. As I have only had the pipeline ready for a few days and the entity-recognition gRPC microservice is very simple (keyword matching for the tickers), there is not much data in the system yet.
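
For reference, keyword matching on tickers can be as small as the sketch below. This is an illustration under assumptions, not the microservice itself: the `TICKERS` table and the `match_tickers` name are made up, and matching is case-sensitive on whole words so that "games" does not count as a mention of GME.

```python
import re

# Hypothetical watch list: ticker -> company name.
TICKERS = {"GME": "GameStop", "TSLA": "Tesla", "AMC": "AMC Entertainment"}

# Whole-word, case-sensitive match on any known ticker.
_TICKER_RE = re.compile(r"\b(" + "|".join(map(re.escape, TICKERS)) + r")\b")


def match_tickers(text):
    """Return the known tickers mentioned in text, in order of first appearance."""
    found = []
    for match in _TICKER_RE.finditer(text):
        ticker = match.group(1)
        if ticker not in found:
            found.append(ticker)
    return found
```

A matcher like this misses company names, lowercase mentions and misspellings, which is where a fine-tuned NER model would come in.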

I am sure there is some non-neoliberal use for this stack. Actually no, not that sure...

This was a fun way to learn some basics of event-driven architectures. There are still parts of the structure I had no idea how to build, and I am pretty sure they are quite cringe. Next steps involve general improvements, deploying it to build a juicy Time Series™, fine-tuning some NER and SA models for different domains, and doing entity-conditioned SA.

Submission Category: Prime Time

Link to Code

ilyocoris / dragonfly

Real time sentiment analysis on entities from scraped text.

DragonFly


Overview

The vision behind this project is to create an events collection meant to be a historical archive of heterogeneous events (the sentiment of Twitter/Reddit comments on a company/crypto/party, and the market fluctuations of the stock/coin or political polls). Both the easy-to-scale nature of the gRPC microservices and the asynchronous capabilities of Kafka topics, backed by the consistency of MongoDB, build towards a real-time opinion temperature of an entity in this fast-moving internet, where a subreddit or Telegram channel can pump a crypto by 900% in hours or a Twitter influencer can cascade a political opinion into virality.

Streamlit frontend to showcase the capabilities of the project. In the first block, we can scrape the latest posts from some well-known stock subreddits. In a more serious setting, this should be done via scheduled API calls, in order to get…

Additional Resources / Info

Epic UI for Kafka Topics:

cloudhut / kowl

Kowl is a Web UI for Apache Kafka that allows exploring messages, consumers, configurations and more with a focus on a good UI & UX.

Kowl - A Web UI for Apache Kafka


Kowl (previously known as Kafka Owl) is a web application that helps you to explore messages in your Apache Kafka cluster and get better insights on what is actually happening in your Kafka cluster in the most comfortable way:


Features

  • Message viewer: Explore your topics' messages in our message viewer through ad-hoc queries and dynamic filters. Find any message you want using JavaScript functions to filter messages. Supported encodings are: JSON, Avro, Protobuf, XML, MessagePack, Text and Binary (hex view). The encoding used (except Protobuf) is recognized automatically.
  • Consumer groups: List all your active consumer groups along with their active group offsets, edit group offsets (by group, topic or partition) or delete a consumer group.
  • Topic overview: Browse through the list of your Kafka topics, check their configuration, space usage, list all consumers who consume a single topic or watch partition details…
