The tweets were collected using the GetOldTweets-python fork that includes emoji support. The Python script bypasses some limitations of the official Twitter API, such as restricted access to old tweets and request rate limits.
The scraping speed is around 3.7 million tweets per hour when running the script in parallel. Specifically, one instance of the script was run for each day and for each emoji.
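
The per-day, per-emoji parallelism described above could be organized roughly as in the sketch below. It is only an illustration of the job fan-out, not the actual collection script: `build_jobs`, `scrape_one`, and `run_all` are hypothetical names, the dates and emojis are made up, and `scrape_one` is a placeholder that returns a label instead of calling the scraper (whose exact invocation is not shown here).

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import date, timedelta
from itertools import product


def daterange(start, end):
    """Yield every day from start (inclusive) to end (exclusive)."""
    for offset in range((end - start).days):
        yield start + timedelta(days=offset)


def build_jobs(start, end, emojis):
    """Build one job per (day, emoji) pair, mirroring one script instance each."""
    return list(product(daterange(start, end), emojis))


def scrape_one(job):
    """Hypothetical placeholder for a single scraper instance.

    A real worker would launch the scraper for this day and emoji and
    write its tweets to disk; here it only returns a label for the job.
    """
    day, emoji = job
    return f"{day.isoformat()}_{emoji}"


def run_all(jobs, max_workers=8):
    """Run the jobs concurrently; results come back in job order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(scrape_one, jobs))


if __name__ == "__main__":
    # Illustrative inputs only: three days and two emojis give six jobs.
    jobs = build_jobs(date(2018, 1, 1), date(2018, 1, 4), ["😂", "❤"])
    for label in run_all(jobs):
        print(label)
```

Splitting the work by day and emoji keeps every job independent, so a failed job can be rerun on its own without touching the others. Threads are used here only to keep the sketch short; launching separate processes would match "one instance of the script" more literally.
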
In terms of accuracy, the scraper misses some tweets and misclassifies some tweets written in other languages as English. However, the extracted data provides good insight into emoji frequency.
- Total tweets: 3,015,922,953
- Dataset size: 798 GB
- Tweets scraped per hour: 3.7 million (approx.)
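
As a rough sanity check on the figures above, the short sketch below derives two numbers that the list does not state: the implied total scraping time and the average storage per tweet. It assumes the hourly rate was sustained for the whole run and that "GB" means decimal gigabytes (10^9 bytes); both are assumptions, and the function names are illustrative.

```python
TOTAL_TWEETS = 3_015_922_953
DATASET_BYTES = 798 * 10**9  # assumption: decimal gigabytes
TWEETS_PER_HOUR = 3.7e6      # approximate rate quoted above


def scraping_hours(total_tweets, tweets_per_hour):
    """Hours needed to collect total_tweets at a constant hourly rate."""
    return total_tweets / tweets_per_hour


def bytes_per_tweet(dataset_bytes, total_tweets):
    """Average on-disk size of one tweet record."""
    return dataset_bytes / total_tweets


if __name__ == "__main__":
    hours = scraping_hours(TOTAL_TWEETS, TWEETS_PER_HOUR)
    print(f"Scraping time: {hours:.0f} hours ({hours / 24:.0f} days)")
    print(f"Average size: {bytes_per_tweet(DATASET_BYTES, TOTAL_TWEETS):.0f} bytes per tweet")
```

Under these assumptions the run works out to roughly 815 hours (about 34 days) of parallel scraping and about 265 bytes per stored tweet.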