Emojis have quickly become a universal language that is used by worldwide users, for everyday tasks, across language barriers, and in different apps and platforms. The prevalence of emojis has quickly attracted great attentions from various research communities such as natural language processing, Web mining, ubiquitous computing, and human-computer interaction, as well as other disciplines including social science, arts, psychology, and linguistics.
This talk summarizes the recent efforts made by my research group and our collaborators on analyzing large-scale emoji data. The usage of emojis by worldwide users presents interesting commonality as well as divergence. In our analysis of emoji usage by millions of smartphone users in 212 countries, we show that the different preferences and usage of emojis provide rich signals for understanding the cultural differences of Internet users, which correlate with the Hofstede’s cultural dimensions.
Emojis play different roles when used alongside text. Through jointly learning the embeddings and topological structures of words and emojis, we reveal that emojis present both complementary and supplementary relations to words. Based on the structural properties of emojis in the semantic spaces, we are able to untangle several factors behind the popularity of emojis.
This talk also highlights the utility of emojis. In general, emojis have been used by Internet users as text supplements to describe objects and situations, express sentiments, or express humor and sarcasm; they are also used as communication tools to attract attention, adjust tones, or establish personal relationships. The benefit of using emojis goes beyond these intentions. In particular, we show that including emojis in the description of an issue report on GitHub results in the issue being responded to by more users and resolved sooner.
Large-scale emoji data can also be utilized by AI systems to improve the quality of Web mining services. In particular, a smart machine learning system can infer the latent topics, sentiments, and even demographic information of users based on how they use emojis online. Our analysis reveals a considerable difference between female and male users of emojis, which is big enough for a machine learning algorithm to accurately predict the gender of a user. In Web services that are customized for gender groups, gender inference models built upon emojis can complement those based on text or behavioral traces with fewer privacy concerns.
Emojis can be also used as an instrument to bridge Web mining tasks across language barriers, especially to transfer sentiment knowledge from a language with rich training labels (e.g., English) to languages that have been difficult for advanced natural language processing tasks. Through this bridge, developers of AI systems and Web services are able to reduce the inequality in the quality of services received by the international users that has been caused by the imbalance of available human annotations in different languages.
In general, emojis have evolved from visual ideograms to a brand-new world language in the era of AI and a new Web. The popularity, roles, and utility of emojis have all gone beyond people’s original intentions, which have created a huge opportunity for future research that calls for joint efforts from multiple disciplines.