Spamley web app: studying the cognitive aspect behind the phishing problem

For several years now, the digital revolution has radically changed people’s daily lives, both private and professional. Although digitalization has been growing for decades, the coronavirus pandemic accelerated it further, making it necessary to move many activities to the Internet. E-mail has consolidated its position as an essential tool for exchanging information and coordinating business processes: 269 billion emails were sent and received every day in 2017, and this figure is expected to reach almost 362 billion in 2024 [6].

At the same time, e-mail is the most used channel for cyber-attacks, exposing companies and people to frequent attempted security breaches. The email threat landscape is constantly evolving and is increasingly difficult to counter, even for carrier-grade spam filters, which block most of the typically innocuous unwanted messages but are ineffective against the most dangerous ones.

In recent years, emails that impersonate authoritative sites or companies to trick the victim into performing harmful actions, such as installing malicious software (e.g., ransomware) or issuing a fraudulent credit transfer, have become more and more common. These attacks have attracted the attention of government agencies (e.g., Europol [5], FBI [8]), which have stated that email attacks are increasing in number and severity, with about 12 billion US dollars in financial losses worldwide from incidents reported between 2013 and 2018. According to Google, its Gmail service blocks more than 100 million phishing emails every day, and in the last year 18 million were related to COVID-19 [7].

The spectrum of email attacks is varied. It ranges from legacy attacks that rely on purely technical aspects, still feasible due to vulnerabilities and misconfigurations of the SMTP protocol [9][10], to more sophisticated socio-technical methods made possible by modern machine learning and social engineering techniques. The most dangerous attacks are “tailored” to specific organizations or groups of people and differ significantly from generic ones.

As emerged from the CONCORDIA project results [1], and in line with the current literature [2][3][4], there is an urgent need to understand in detail the cognitive aspect of the phishing phenomenon in order to design practical and effective solutions. Examples of such solutions include ergonomic mail clients that help users (especially less experienced ones) to recognize phishing, training programmes tailored to the specific cognitive vulnerabilities of each user, and automatic phishing detection tools based on cognitive aspects. The human factor is the most vulnerable one and the most difficult to ‘secure’.

A project called Spamley has recently been launched with the aim of mitigating the long-standing problem of phishing. To truly understand this phenomenon, the first and most important step is to observe it in depth. Once this knowledge is acquired, it can be transferred to people, so that they no longer fall victim to it, and to automated systems, so that they detect attacks effectively.

Work on this idea has already started. A web application has been launched (https://spamley.comics.unina.it/) to collect data on how users behave when they receive an email. The purpose of the project is to correlate the characteristics of people (biographical characteristics, education, personality, etc.) with the characteristics of the emails (e.g., sense of urgency, presence of a logo, domain squatting in the email address or URL, etc.) that they do or do not consider phishing. In this way, it becomes possible to learn what is going on at a cognitive level, and therefore to implement personalized and specific solutions. The web app collects users’ data securely in order to understand why and when phishing succeeds. The intention is to share this dataset with other experts in the scientific community on request.
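As a rough illustration of what correlating people and email characteristics could look like, the following Python sketch groups answers by one user trait and one email feature and computes the share of wrong answers in each group. It is only a sketch under stated assumptions: the record layout, the field names (training, email_feature, is_phishing, answered_phishing) and all the values are invented for the example and do not describe the actual Spamley dataset.

```python
from collections import defaultdict

# Hypothetical answers collected by the test. Field names and values are
# invented for illustration; they are not taken from the Spamley dataset.
responses = [
    {"training": "trained", "email_feature": "urgency", "is_phishing": True, "answered_phishing": True},
    {"training": "trained", "email_feature": "urgency", "is_phishing": True, "answered_phishing": False},
    {"training": "trained", "email_feature": "logo", "is_phishing": False, "answered_phishing": False},
    {"training": "untrained", "email_feature": "urgency", "is_phishing": True, "answered_phishing": False},
    {"training": "untrained", "email_feature": "urgency", "is_phishing": True, "answered_phishing": False},
    {"training": "untrained", "email_feature": "logo", "is_phishing": False, "answered_phishing": True},
    {"training": "untrained", "email_feature": "logo", "is_phishing": False, "answered_phishing": False},
]


def error_rate_by(records, user_key, email_key):
    """Return {(user value, email value): share of wrong answers}."""
    totals = defaultdict(int)
    wrong = defaultdict(int)
    for record in records:
        group = (record[user_key], record[email_key])
        totals[group] += 1
        # An answer is wrong when it disagrees with the email's true label.
        if record["is_phishing"] != record["answered_phishing"]:
            wrong[group] += 1
    return {group: wrong[group] / totals[group] for group in totals}


rates = error_rate_by(responses, "training", "email_feature")
for group, rate in sorted(rates.items()):
    print(group, f"{rate:.2f}")
```

A table of this kind is only a first descriptive step; with real data, each group would need enough answers for the rates to be meaningful.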

The concrete upcoming goal is to let this dataset scale by reaching as many people as possible. It is a kind of test, completely anonymous, in which you have to indicate whether the emails shown look like phishing or legitimate. The test starts with a short survey, necessary to obtain reliable results, then the interesting part begins; in total it takes about 5-10 minutes. Please feel free to take the test and share it with your friends and family. The test is available in English and in Italian, and it is important to take it on the device you usually use to read email (PC or smartphone).

Immediately after this data collection, the data will be analyzed in order to infer the connections between the characteristics of people and those of emails that lead to a phishing incident. The analysis will be driven by modern machine learning methodologies, which are a great tool for extracting information that traditional statistical methods cannot reveal. Both supervised and unsupervised algorithms will be considered: the former to train classification models and to derive the importance of each feature from them; the latter to profile users and behavior patterns through clustering techniques.
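To make the unsupervised part more concrete, here is a minimal Python sketch of user profiling through clustering. The article does not say which algorithm will be used, so plain k-means is an assumption made for the example; the two features per user (share of phishing emails missed, share of legitimate emails flagged as phishing) and all the values are likewise invented.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means. The first k points seed the centroids, so the
    result is deterministic (a simplification chosen for this sketch)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(axis) / len(members) for axis in zip(*members)]
    return labels, centroids


# Hypothetical per-user profiles: (share of phishing emails missed,
# share of legitimate emails flagged as phishing). Values are invented.
profiles = [
    (0.10, 0.05),
    (0.80, 0.10),
    (0.15, 0.10),
    (0.75, 0.05),
    (0.05, 0.10),
    (0.85, 0.15),
]

labels, centroids = kmeans(profiles, k=2)
for profile, label in zip(profiles, labels):
    print(profile, "-> cluster", label)
```

With this toy input the users separate into a group that rarely misses phishing emails and a group that often does. On real data the number of clusters and the features would have to be chosen and validated carefully.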

Finally, the work will continue towards translating the insights extracted from the data into actual guidelines for the design of practical solutions (suggestions on how to train users, new security policies, new graphical interfaces of email clients, etc.).

The project also offers the test to companies and public bodies, to reach more people, providing in return a detailed report on the “state of awareness” of employees, including extensive information and statistics.

The need for greater security awareness is clear, and this platform aims to facilitate awareness campaigns by providing useful information on the behavioral traits of many different categories of users. Since many users who took the test stated that they had never taken part in a phishing awareness campaign, the Spamley platform has fulfilled a dual role: a tool for collecting data useful for research and a means of introducing the phenomenon of phishing to many people all over the world.

In general, understanding how the human mind operates when reading e-mails could be the key to finding the best way to protect users from falling for phishing. To this end, this work has focused on gathering as much information as possible about the background of users (anonymized and treated as required by EU privacy regulations), in order to support future research on the human factor involved in phishing and the devising of new techniques for phishing prevention. The value of this work can be expected to increase with time: the more data it collects, the more accurate the statements that can be made about users’ behaviors.

Take your test and get your feedback -> https://spamley.comics.unina.it/

References

[1] Luigi Gallo, Alessandro Maiello, Alessio Botta, Giorgio Ventre, 2 Years in the anti-phishing group of a large company, Computers & Security, Volume 105, 2021, 102259, ISSN 0167-4048, https://doi.org/10.1016/j.cose.2021.102259. (https://www.sciencedirect.com/science/article/pii/S0167404821000833)

[2] Luigi Gallo, Alessio Botta, and Giorgio Ventre. 2019. Identifying threats in a large company’s inbox. In Proceedings of the 3rd ACM CoNEXT Workshop on Big DAta, Machine Learning and Artificial Intelligence for Data Communication Networks (Big-DAMA ’19). Association for Computing Machinery, New York, NY, USA, 1–7. DOI:https://doi.org/10.1145/3359992.3366637

[3] Van Der Heijden, Amber, and Luca Allodi. “Cognitive triaging of phishing attacks.” 28th USENIX Security Symposium (USENIX Security 19). 2019.

[4] L. Allodi, T. Chotza, E. Panina and N. Zannone, “The Need for New Antiphishing Measures Against Spear-Phishing Attacks,” in IEEE Security & Privacy, vol. 18, no. 2, pp. 23-34, March-April 2020, doi: 10.1109/MSEC.2019.2940952.

[5] EC3, E., 2019. Spear phishing, a law enforcement and cross-industry perspective. https://www.europol.europa.eu/sites/default/files/documents/report_on_phishing_-_a_law_enforcement_perspective.pdf

[6] J. Clement. Number of e-mails per day worldwide 2017-2024. 10 2020. https://www.statista.com/statistics/456500/daily-number-of-e-mails-worldwide/.

[7] Neil Kumaran and Sam Lugani. Protecting businesses against cyber threats during COVID-19 and beyond. 04 2020. https://cloud.google.com/blog/products/identity-security/protecting-against-cyber-threats-during-covid-19-and-beyond

[8] Federal Bureau of Investigation, Internet Crime Complaint Center. 2019 Internet Crime Report. https://pdf.ic3.gov/2019_IC3Report.pdf

[9] Chen, Jianjun, Vern Paxson, and Jian Jiang. “Composition kills: A case study of email sender authentication.” 29th USENIX Security Symposium (USENIX Security 20). 2020.

[10] Maroofi, Sourena, Maciej Korczynski, and Andrzej Duda. “From defensive registration to subdomain protection: evaluation of email anti-spoofing schemes for high-profile domains.” Proc. Network Traffic Measurement and Analysis Conference (TMA). 2020.

(By De Lutiis Paolo, Telecom Italia)