Varying Linguistic Purposes of Emoji in (Twitter) Context
Abstract
Early research into emoji in textual communication has focused largely on high-frequency usages and ambiguity of interpretations. Investigation of a wide range of emoji usage shows these glyphs serving at least two very different purposes: as content and function words, or as multimodal affective markers. Identifying where an emoji replaces textual content allows NLP tools to parse it like any other word or phrase. Recognizing the import of non-content emoji can also be a significant part of understanding a message.
We report on an annotation task on English Twitter data with the goal of classifying emoji uses by these categories, and on the effectiveness of a classifier trained on these annotations. We find that, even with a small amount of training data, a classifier can be trained to distinguish emoji used as linguistic content words from those used as paralinguistic or affective multimodal markers, but that accurate sub-classification of these multimodal emoji into specific classes such as attitude, topic, or gesture will require more data and more feature engineering.