Vidushi Gupta

Storytelling in Heterogeneous Twitter Entity Network based on Hierarchical Cluster Routing

Section-wise summary of the research paper titled "Storytelling in heterogeneous Twitter entity network based on hierarchical cluster routing" by X. Zhang, Z. Chen, W. Zhong, A. P. Boedihardjo and C. Lu.

Abstract

  • Connecting dots between diverse entities is essential to find hidden relationships.
  • Existing approaches find these relationships in news reports, documents and abstracts. They do not work well with Twitter data, which is unstructured, heterogeneous and massive.
  • This study designs an entity similarity model that combines Twitter-specific features with traditional entity-related features.
  • The effectiveness of this new approach has also been evaluated.

Introduction

  • Features of Twitter:
    • Promptness: instant posting
    • Freedom of expression
    • Social properties: adds value to the information
  • Traditional methods to connect the dots are based on the strong assumption that textual data is robust and well-presented. This is not the case with Twitter data.
  • Challenges:
    • Modeling heterogeneous features such as location, likes and mentions.
    • Handling large data size: hundreds of millions of tweets are generated daily.
  • Key contributions of the study:
    • Presents a novel method to handle the heterogeneous features in a bipartite graph and generate an entity similarity graph for story modeling.
    • Proposes a story generation algorithm based on hierarchical clustering.
    • Connects different entities in Twitter.
    • Provides an extensive experimental performance evaluation.

Related work

Entity Similarity Modeling

  • Existing work focuses on the analysis of semantic, syntactic and spatio-temporal entity similarity.
  • Entity similarity methods are used for story generation:
    • Hossain: computes entity similarity by combining Soergel distance and a k-clique nearest neighbor approach.
    • Shahaf: linear programming method to measure word coherence between documents.
    • Dos Santos: ConceptRank method and spatial closeness to infer relationships between entities in Twitter data.
    • Goel: regression model to compute Twitter users' similarity based on their common followers, PageRank score and historic follow-through rate.
  • These methods do not consider features in combination.
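
As a concrete reference for the Soergel distance mentioned in the Hossain entry, here is a minimal sketch. It uses the standard definition (sum of absolute differences divided by the sum of element-wise maxima) on non-negative feature vectors. The function name and the example vectors are illustrative assumptions, not taken from the paper.

```python
from typing import Sequence


def soergel_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Soergel distance between two non-negative feature vectors.

    Returns 0.0 for identical vectors and 1.0 for vectors with no overlap.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    numerator = sum(abs(a - b) for a, b in zip(x, y))
    denominator = sum(max(a, b) for a, b in zip(x, y))
    # Two all-zero vectors are treated as identical (assumption).
    return numerator / denominator if denominator else 0.0


# Illustrative term weights for two entities (made-up numbers).
print(soergel_distance([1.0, 0.0, 2.0], [1.0, 1.0, 0.0]))
```

A similarity score can be derived as one minus this distance when all vector entries are non-negative.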

Microblog Summarization

  • Sharifi: Phrase Reinforcement algorithm
  • Inouye: Hybrid TF-IDF and cluster classifier methods
  • Harabagiu: introduced a framework to synthesize multiple microblog posts on the same topic into a prose summary.
  • Takamura: p-median problem for a stream of microblog posts along a timeline.
  • Lidan: an online tweet stream clustering algorithm and TCV-Rank summarization method

All these methods extract semantic meaning from Twitter but do not capture the relationships between entities.

Connecting the dots

  • Hossain: A* search algorithm to construct a shortest path
  • Dos Santos: connects entities via a greedy approach
  • Zhu: divide-and-conquer algorithm to append a median node with maximum transition probability
  • Faloutsos: finds connected subgraphs
  • Shahaf: maximizes a story's weakest edge for a fixed story length

Most of these techniques use traditional ways of connecting the dots and cannot be applied to heterogeneous Twitter data.

Overview and Problem Formulation

Problem setting

  • Reveal valuable relationships between two entities via a sequence of intermediate entities in a Twitter dataset.

Definitions:
Entity Similarity Graph

MinEdge

Storytelling in an entity network
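
The definition figures are not reproduced in this summary. As a rough illustration only, the sketch below assumes that MinEdge denotes the smallest similarity between consecutive entities in a story, which matches the "weakest edge" idea listed under Connecting the dots. The graph representation, the function name and the sample weights are hypothetical and may differ from the paper's formal definitions.

```python
from typing import Dict, FrozenSet, List

# Hypothetical entity similarity graph: undirected edges with weights in [0, 1].
Similarity = Dict[FrozenSet[str], float]


def min_edge(story: List[str], sim: Similarity) -> float:
    """Return the weakest similarity between consecutive entities of a story.

    A missing edge counts as similarity 0.0 (assumption).
    """
    if len(story) < 2:
        raise ValueError("a story needs at least two entities")
    return min(sim.get(frozenset((a, b)), 0.0) for a, b in zip(story, story[1:]))


# Made-up similarities between three entities.
sim = {
    frozenset(("A", "B")): 0.8,
    frozenset(("B", "C")): 0.4,
    frozenset(("A", "C")): 0.1,
}

print(min_edge(["A", "B", "C"], sim))  # the A-B-C story is limited by the B-C edge
print(min_edge(["A", "C"], sim))       # the direct A-C story is weaker
```

Under this reading, a story through an intermediate entity can be preferable to a direct but weak link between the two endpoints.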

Architecture Overview:

System Architecture

  • Data Preprocessing:
    • Query expansion (to get tweets only in a targeted domain)
    • Entity extraction
    • Twitter user mapping
  • Entity Similarity Modeling

Twitter Heterogeneous Information Network

  • Story generation: performs directed exploration toward a desired entity through an entity similarity graph
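
To make the idea of exploring an entity similarity graph toward a target entity more concrete, here is a minimal sketch of a maximin (widest-path) search: it returns the chain of entities whose weakest link is as strong as possible. This is a plain Dijkstra-style stand-in and not the paper's hierarchical cluster routing. The graph layout, the function name and the weights are assumptions made for illustration.

```python
import heapq
from typing import Dict, List, Optional, Tuple

# Adjacency map: entity -> {neighbor: similarity}, similarities assumed in (0, 1].
Graph = Dict[str, Dict[str, float]]


def strongest_story(graph: Graph, start: str, goal: str) -> Optional[Tuple[float, List[str]]]:
    """Find the path from start to goal that maximizes its weakest edge."""
    best = {start: float("inf")}          # best known bottleneck per entity
    parent: Dict[str, str] = {}
    heap = [(-float("inf"), start)]       # max-heap via negated bottleneck
    while heap:
        neg_bottleneck, node = heapq.heappop(heap)
        bottleneck = -neg_bottleneck
        if bottleneck < best.get(node, 0.0):
            continue                      # stale heap entry
        if node == goal:
            path = [goal]
            while path[-1] != start:
                path.append(parent[path[-1]])
            return bottleneck, path[::-1]
        for neighbor, weight in graph.get(node, {}).items():
            candidate = min(bottleneck, weight)
            if candidate > best.get(neighbor, 0.0):
                best[neighbor] = candidate
                parent[neighbor] = node
                heapq.heappush(heap, (-candidate, neighbor))
    return None                           # goal not reachable


# Made-up undirected similarity graph.
graph = {
    "A": {"B": 0.9, "C": 0.6},
    "B": {"A": 0.9, "D": 0.3},
    "C": {"A": 0.6, "D": 0.5},
    "D": {"B": 0.3, "C": 0.5},
}

print(strongest_story(graph, "A", "D"))
```

A flat search like this visits the whole graph in the worst case, which is the cost that routing over a cluster hierarchy is meant to reduce on large Twitter datasets.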

Entity Similarity model

Entity tweet features:

  • Tweets
  • Hashtags
  • Mentions
  • Additional links

Twitter user features:

  • Common followers
  • Spatial attribution
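
One simple way to combine the feature groups above into a single score is a weighted average of per-feature set overlaps. The sketch below uses Jaccard similarity for each feature type. The weighting scheme, the feature names and the sample profiles are assumptions for illustration, and the paper's actual similarity model may be defined differently.

```python
from typing import Dict, Set

# Hypothetical entity profile: feature type -> set of observed values.
Profile = Dict[str, Set[str]]


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two sets; two empty sets score 0.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def entity_similarity(p: Profile, q: Profile, weights: Dict[str, float]) -> float:
    """Weighted average of per-feature Jaccard similarities, in [0, 1]."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    score = sum(
        w * jaccard(p.get(name, set()), q.get(name, set()))
        for name, w in weights.items()
    )
    return score / total


# Made-up profiles mixing tweet features and user features.
entity_a = {
    "hashtags": {"#a", "#b"},
    "mentions": {"@x"},
    "followers": {"u1", "u2", "u3"},
}
entity_b = {
    "hashtags": {"#b", "#c"},
    "mentions": {"@x"},
    "followers": {"u3", "u4"},
}
weights = {"hashtags": 0.5, "mentions": 0.25, "followers": 0.25}

print(entity_similarity(entity_a, entity_b, weights))
```

Scores computed this way for every pair of entities could serve as edge weights of an entity similarity graph.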

Experimental results

  • The effectiveness of εCluster is compared against existing state-of-the-art methods on connecting-the-dots tasks.
  • Hierarchy building time increases linearly as the number of hierarchy layers grows from one to one hundred.

Evaluation

Reference

X. Zhang, Z. Chen, W. Zhong, A. P. Boedihardjo and C. Lu, "Storytelling in heterogeneous Twitter entity network based on hierarchical cluster routing," 2016 IEEE International Conference on Big Data (Big Data), 2016, pp. 1522-1531, doi: 10.1109/BigData.2016.7840760.