Representing Emoji Usage Using Directed Networks: A Twitter Case Study
Abstract
In online social media, people use emojis to reduce the ambiguity of short texts and to express their feelings more clearly. Some text messages contain more than one emoji, which suggests that the sequence of emojis may carry useful information that can help us better understand user behavior. One way to analyze emoji sequences is to study the directed network of emojis that emerges from the actual sequences produced by many users. In this paper, in addition to extracting a simple undirected co-occurrence network and analyzing its main statistical properties, we build and analyze a directed co-occurrence network from various datasets collected from Twitter.
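To make the two network types concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each tweet has already been reduced to its ordered list of emojis. It also assumes that a directed edge a -> b is counted whenever emoji b immediately follows emoji a in a tweet (with immediate repeats skipped), and that an undirected edge joins every pair of distinct emojis appearing in the same tweet. The function names `directed_edges` and `undirected_edges` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def directed_edges(sequences):
    """Weighted directed edges: (a, b) counts how often b directly follows a.

    Assumption: only consecutive emojis are linked, and self-loops from
    repeated emojis (e.g. "😂😂") are skipped.
    """
    edges = Counter()
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            if a != b:
                edges[(a, b)] += 1
    return edges


def undirected_edges(sequences):
    """Weighted undirected co-occurrence: each unordered pair of distinct
    emojis in the same tweet adds 1, regardless of order or position."""
    edges = Counter()
    for seq in sequences:
        for a, b in combinations(sorted(set(seq)), 2):
            edges[(a, b)] += 1
    return edges


if __name__ == "__main__":
    tweets = [["😂", "❤️", "😂"], ["❤️", "😂"]]
    print(directed_edges(tweets))    # direction is preserved
    print(undirected_edges(tweets))  # order is discarded
```

The directed version keeps the order information that the undirected co-occurrence network throws away. This is why the two networks can have different statistical properties even though they are built from the same tweets.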
The results show that the distributions in the directed network are not random and follow a truncated power-law distribution. Furthermore, the important emojis in each dataset are conceptually related to the dataset's subject. Community analysis shows that most emojis tend to be grouped in the four largest communities. Finally, a category-based entropy analysis of the communities suggests that the entropy remains roughly constant across datasets with different themes. This indicates that emojis are not used together merely because they belong to the same category.
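The category-based entropy of a community can be illustrated with Shannon entropy over the emoji categories of its members. The sketch below rests on several assumptions: that this is the entropy measure intended, that each emoji maps to exactly one category (for example "Smileys" or "Animals"), and that base-2 logarithms are used. `category_entropy` and the `category_of` mapping are hypothetical names. A community whose emojis all come from one category scores 0, and higher values mean the community mixes categories more evenly.

```python
import math
from collections import Counter


def category_entropy(community, category_of):
    """Shannon entropy (bits) of the category distribution in a community.

    community   -- iterable of emojis in one detected community
    category_of -- dict mapping each emoji to its (assumed single) category
    """
    counts = Counter(category_of[e] for e in community)
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


if __name__ == "__main__":
    cats = {"😂": "Smileys", "😍": "Smileys", "🐶": "Animals", "🐱": "Animals"}
    print(category_entropy(["😂", "😍"], cats))              # single category
    print(category_entropy(["😂", "😍", "🐶", "🐱"], cats))  # evenly mixed
```

If emojis co-occurred mainly because they share a category, communities would tend toward low entropy. Entropy that stays roughly constant across themes is consistent with the abstract's conclusion that category alone does not explain co-usage.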