US elections 2020: a retrospective analysis of the Twitter corpus


Social media play a crucial role during election periods, serving communication, administration, and the dissemination of information. Twitter is one of the most popular social media platforms, with millions of active users, and a significant part of online discourse takes place on it. Analyzing its content can shed light on the discussions and main topics that prevail in online discourse. However, the reliability of the circulated content is disputed, considering that a notable percentage of posting accounts are bots working towards specific malicious goals. Twitter has been misused to circulate propaganda [1, 2], manipulate public opinion [3], and influence the electorate towards a specific ideology or party [4]. To these ends, bots are used to rapidly reproduce low-credibility content by mentioning other users and popular accounts, and to spread fake news, rumors, or hate speech.

The nature of social media discourse demands advanced natural language processing techniques, as well as volume and sentiment analysis. Identifying bots on Twitter, on the other hand, is a complicated task that can be approached with a diverse set of methods; of the many architectures proposed for bot identification, the most successful rely on machine learning. To perform these tasks, we first obtain the main corpus on the US election from Twitter, based on the most popular hashtags such as #Trump2020, #Vote, #Biden, and #Election2020. Data acquisition from 19 July 2020 until 3 November 2020 yielded a dataset of 7.5M tweets and 1.4M users, together with 22,784 related YouTube links found in the tweets and the likes, dislikes, and comments of the linked videos.
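The text describes the corpus as selected by popular hashtags but does not say which collection API or matching rule was used. As a minimal sketch, assuming tweets are already available as plain strings, a hashtag-based selection could look like this; the function names are hypothetical, and the hashtag set contains only the examples the text lists, not the full tracked list.

```python
from __future__ import annotations

from typing import Iterable

# Illustrative subset only: the text names these hashtags, but the full tracked
# list ("etc.") is not given, so this set is an assumption.
TRACKED_HASHTAGS = {"#trump2020", "#vote", "#biden", "#election2020"}


def extract_hashtags(text: str) -> set[str]:
    """Return the lowercased hashtags in a tweet, trimming trailing punctuation."""
    tags = set()
    for token in text.split():
        tag = token.rstrip(".,!?;:")
        if tag.startswith("#") and len(tag) > 1:
            tags.add(tag.lower())
    return tags


def filter_corpus(tweets: Iterable[str], tracked: set[str] = TRACKED_HASHTAGS) -> list[str]:
    """Keep only tweets containing at least one tracked hashtag (case-insensitive)."""
    return [t for t in tweets if extract_hashtags(t) & tracked]
```

For instance, `filter_corpus(["Get out and #Vote!", "Nice weather today", "#BidenHarris rally"])` keeps only the first tweet: this sketch compares whole hashtags, so a variant such as `#BidenHarris` is not matched unless it is added to the tracked set.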

First, we clean the data by removing common terms and characters that can hinder processing, and we recognize the main entities in the corpus, such as ‘Trump’ and ‘Biden’. Next, we apply state-of-the-art techniques to uncover the main topics and the sentiment towards the recognized entities.
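The text names neither the cleaning rules nor the entity recognizer. As a minimal sketch, assuming a small stop-word list and plain keyword matching in place of a trained named-entity recognizer, the cleaning and entity-tagging step might look like this; the stop words, keyword lists, and function names are all hypothetical.

```python
from __future__ import annotations

import re

# Hypothetical stop-word list; the text does not specify which "common terms" are removed.
STOPWORDS = {"the", "a", "an", "and", "or", "is", "are", "to", "of", "in", "on", "for", "rt"}

# Hypothetical keyword lists standing in for a real entity recognizer.
ENTITY_KEYWORDS = {
    "Trump": {"trump", "donald", "realdonaldtrump"},
    "Biden": {"biden", "joe", "joebiden"},
}

URL_RE = re.compile(r"https?://\S+")
NON_ALNUM_RE = re.compile(r"[^a-z0-9\s]")


def clean(text: str) -> list[str]:
    """Lowercase, drop URLs, replace non-alphanumeric characters, remove stop words.

    '@' and '#' are stripped as punctuation, so handles and hashtags keep their text.
    """
    text = URL_RE.sub(" ", text.lower())
    text = NON_ALNUM_RE.sub(" ", text)
    return [tok for tok in text.split() if tok not in STOPWORDS]


def tag_entities(tokens: list[str]) -> set[str]:
    """Return the target entities whose keywords appear among the cleaned tokens."""
    return {ent for ent, kws in ENTITY_KEYWORDS.items() if kws.intersection(tokens)}
```

For example, `clean("RT @JoeBiden: The debate is on! https://t.co/xyz")` yields `["joebiden", "debate"]`, which `tag_entities` maps to `{"Biden"}`. Exact token matching keeps the sketch simple but misses forms such as `trump2020`; a trained recognizer would handle such variants more robustly.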

The sentiment analysis results indicate that on Twitter 35.2% of users express positive sentiment towards Trump and 28% towards Biden, whereas in the collected YouTube metadata 18% of users express positive sentiment towards Trump and 12% towards Biden. Additionally, we perform bot analysis on the same dataset and achieve a classification accuracy of 99.7% on the training set and 93.1% on the test set; in a one-month subsample, we identify 4,569 bot accounts out of 86,927 accounts in total (about 5.3%).
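The text reports these percentages without saying how they are computed, or whether they are taken over tweets, comments, or users. Assuming each scored item carries a signed sentiment score where values above zero count as positive (consistent with the zero baseline described for the sentiment figure), the positive shares and the bot ratio could be computed as follows; the record layout and function names are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def positive_share(records: list[tuple[str, str, float]]) -> dict[tuple[str, str], float]:
    """Percentage of items with score > 0 for each (platform, entity) pair.

    Each record is assumed to be (platform, entity, signed_sentiment_score);
    a score of exactly zero is treated as not positive.
    """
    totals: dict[tuple[str, str], int] = defaultdict(int)
    positives: dict[tuple[str, str], int] = defaultdict(int)
    for platform, entity, score in records:
        key = (platform, entity)
        totals[key] += 1
        if score > 0:
            positives[key] += 1
    return {key: 100.0 * positives[key] / n for key, n in totals.items()}


def percent(part: int, whole: int) -> float:
    """Share of `part` in `whole`, as a percentage."""
    return 100.0 * part / whole
```

With the figures reported for the one-month subsample, `percent(4569, 86927)` is about 5.26, i.e. roughly one flagged account in every nineteen.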

The figure shows the number of daily tweets and video comments, which increase from July to November and exhibit diurnal patterns. There is a peak on September 29, potentially explained by the first presidential debate, and, for Twitter, a second peak on election day. The sentiment figure shows the average sentiment for the entities ‘Biden’ and ‘Trump’: the solid line is the average sentiment in YouTube comments and the dotted line is the average sentiment in the tweet corpus. Values below zero indicate negative sentiment and values above zero indicate positive sentiment on each platform.
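Neither the aggregation nor the sentiment model is specified in the text. Assuming each tweet or comment has already been assigned a platform, an entity, and a signed sentiment score (negative below zero, positive above zero, as in the figure), the daily counts and daily average sentiment behind such plots could be computed like this; the record layout and function name are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from datetime import date, datetime
from typing import Iterable


def daily_sentiment(
    records: Iterable[tuple[str, str, str, float]],
) -> dict[tuple[date, str, str], tuple[int, float]]:
    """Group (iso_timestamp, platform, entity, score) records by calendar day.

    Returns {(day, platform, entity): (item_count, mean_score)}. The day is taken
    from each timestamp as given, so all timestamps should share one time zone (e.g. UTC).
    """
    counts: dict[tuple[date, str, str], int] = defaultdict(int)
    sums: dict[tuple[date, str, str], float] = defaultdict(float)
    for ts, platform, entity, score in records:
        key = (datetime.fromisoformat(ts).date(), platform, entity)
        counts[key] += 1
        sums[key] += score
    return {key: (n, sums[key] / n) for key, n in counts.items()}
```

Plotting the mean per day for each (platform, entity) pair, with one line style per platform, gives curves of the kind described above; the item counts give the daily volume.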


[1] L. Neudert, B. Kollanyi, and P. N. Howard. 2017. Junk news and bots during the German parliamentary election: What are German voters sharing over Twitter? Technical Report. 1–6 pages.
[2] Marc Owen Jones. 2019. The Gulf information war | Propaganda, fake news, and fake trends: The weaponization of Twitter bots in the Gulf crisis. International Journal of Communication 13 (2019), 27.
[3] Gillian Bolsover and Philip Howard. 2019. Chinese computational propaganda: Automation, algorithms and the manipulation of information about Chinese politics on Twitter and Weibo. Information, Communication & Society 22, 14 (2019), 2063–2080.
[4] Yevgeniy Golovchenko, Cody Buntain, Gregory Eady, Megan A. Brown, and Joshua A. Tucker. 2020. Cross-Platform State Propaganda: Russian Trolls on Twitter and YouTube During the 2016 US Presidential Election. The International Journal of Press/Politics (2020), 1940161220912682.

(By Despoina Antonakaki, FORTH)