Automatic Construction of an Emoji Sentiment Lexicon
Abstract
Emojis are frequently used to express users’ sentiments, emotions, and feelings in text-based communication. To facilitate sentiment analysis of users’ posts, an emoji sentiment lexicon with positive, neutral, and negative scores was recently constructed from manually labeled tweets. However, the lexicon covers fewer emojis than currently exist, and expanding it manually requires the time and effort of rebuilding the labeled dataset.
This paper presents a simple and efficient method for automatically constructing an emoji sentiment lexicon with arbitrary sentiment categories. The proposed method extracts sentiment words from WordNet-Affect and calculates the co-occurrence frequency between the sentiment words and each emoji. Based on the ratio of each emoji's occurrences across the sentiment categories, each emoji is assigned a multidimensional vector whose elements indicate the strength of the corresponding sentiment. In experiments conducted on a collection of tweets, we show a high correlation between the conventional lexicon and our lexicon for three sentiment categories. We also show the results for a new lexicon constructed with additional sentiment categories.
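The co-occurrence and ratio steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `build_emoji_lexicon`, the tiny hand-written word lists (standing in for words drawn from WordNet-Affect), whitespace tokenization, and normalizing each emoji's counts so that they sum to 1 are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical seed words per category; the paper takes these from
# WordNet-Affect, which is not loaded here.
SENTIMENT_WORDS = {
    "positive": {"happy", "love", "great"},
    "negative": {"sad", "angry", "awful"},
}


def build_emoji_lexicon(posts, emojis, sentiment_words=SENTIMENT_WORDS):
    """Return {emoji: {category: share}} from co-occurrence counts.

    Assumption: an emoji co-occurs with a sentiment word when both
    appear in the same post, and shares are the per-category counts
    divided by the emoji's total count over all categories.
    """
    counts = defaultdict(Counter)
    for post in posts:
        tokens = post.lower().split()  # naive whitespace tokenization
        present = {e for e in emojis if e in post}
        for category, words in sentiment_words.items():
            hits = sum(1 for t in tokens if t in words)
            if hits:
                for e in present:
                    counts[e][category] += hits

    lexicon = {}
    for e, per_category in counts.items():
        total = sum(per_category.values())
        # One vector element per category; a missing category counts as 0.
        lexicon[e] = {c: per_category[c] / total for c in sentiment_words}
    return lexicon


posts = ["so happy today 😀", "love this 😀", "sad news 😀", "awful day 😢"]
lexicon = build_emoji_lexicon(posts, ["😀", "😢"])
```

Because the categories are just the keys of the word dictionary, adding a category such as "surprise" only requires another entry with its own word set, which mirrors the abstract's point about arbitrary sentiment categories. Emojis that never co-occur with any sentiment word are left out of the result in this sketch.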