Research: Anti-Spam Algorithms Showed Political Bias in the 2020 US Election

The spam filtering algorithms (SFA) of three of the world’s largest email providers exhibited political bias in the 2020 US election, with Google’s Gmail leaning left, and Microsoft’s Outlook and Yahoo Mail favoring right-wing candidate emails.

The paper states:

‘Our [observations] revealed that all SFAs exhibited political bias in the months leading up to the 2020 US election. Gmail leaned left (Democrats) while Outlook and Yahoo leaned right (Republicans). Gmail marked 59.3% more emails from right-wing candidates as spam compared to left-wing candidates, while Outlook and Yahoo marked 20.4% and 14.2% more emails, respectively, from left-wing candidates as spam compared to right-wing candidates.’

The authors’ analysis, they claim, demonstrates “aggregate biases” in SFA activity.

The paper also acknowledges the potential for “cultured” spam flagging, where actors seeking to silence opposing voices could solicit or gain access to official communications from “hostile” parties and affiliations with the intention of reporting those communications as spam, thereby influencing the algorithms that determine how likely a particular sender is to be treated as a source of spam.

However, the researchers observe, this does not explain the marked variations in how different email providers seem to have configured actions based on end-user feedback:

“It can be argued that it is also possible that the SFAs of the email services learned from the choices of some voters marking certain campaign emails as spam, and started marking these or similar campaign emails as spam for other voters. While we have no reason to believe that there were any deliberate attempts by these email services to create these biases in order to influence voters, the fact remains that their SFAs learned to mark more emails from one political affiliation as spam than from the other.

“As these leading email services are actively used by a significant portion of the voting population, and many voters today rely on information they see (or don’t see) online, such biases may have a non-negligible impact on the outcome of an election.”

The paper is titled A Peek into the Political Biases in Email Spam Filtering Algorithms During US Election 2020, and comes from four researchers at the Department of Computer Science at North Carolina State University.

Around the houses

The researchers’ study covers a five-month period from July 2020 to the end of November of the same year, during which they created 102 new email addresses across the three email platforms and subscribed them to the mailing lists of two presidential candidates, 78 Senate candidates and 156 House candidates.

To account for demographics, the email accounts were created with varying demographic details for each (fictitious) end user. The study itself was split into two parts: the first investigated general trends in the bias of spam filtering algorithms across all the email services combined, for presidential, House and Senate candidates; the second examined how various email interactions (such as the end user marking or unmarking messages as spam) seemed to affect the behavior of the algorithmic spam filters.

Several key observations emerged during the study. The authors report that Gmail “leaned to the left”, while Outlook and Yahoo leaned to the right. Yahoo kept 55.2% of all political emails in the user’s inbox, while Outlook filtered 71.8% of emails from political candidates of all stripes.

“Gmail, however, retained the majority of left-wing candidate emails in its inbox (

“We further observed that the percentage of emails marked by Gmail as spam from right-wing candidates increased steadily as the election date approached, while the percentage of emails marked as spam from left-wing candidates remained about the same.”

Choosing candidates

While the presidential candidates subscribed to in the study were limited to Joe Biden and Donald Trump, the researchers had to take care to make representative choices when deciding which Senate and House candidates’ email communications to subscribe to, for a number of reasons.

First, states have a varying number of seats in the House, depending on the population of the state. Second, the number of Senate and House candidates from the two major political parties varies from state to state. Additionally, some candidates were represented only by official .gov websites, which are legally prohibited from sending campaign emails; and finally, some of the candidates’ subscription lists were protected by CAPTCHAs, which could not be automated by the researchers’ custom data collection framework.

Political Affiliation Breakdown of Senate and House Candidates’ Email Subscriptions. Source: https://arxiv.org/pdf/2203.16743.pdf

To even out the resulting imbalance between Democratic and Republican candidates, the researchers subscribed, in each state, to the campaign emails of the maximum number of candidates for which left- and right-wing candidates were equal in number, except in states such as Alaska, which had only one Republican Senate candidate.

In total, the authors had to make such allowances for 11 states, and ultimately ended up with all 50 states represented. The 78 Senate subscriptions, spread across 36 states, covered 44 Democratic and 34 Republican candidate lists, while the 156 House subscriptions, spread across 42 states, covered 81 Democrats and 75 Republicans.

Data analysis

The researchers collected 318,108 emails across the three email services during the study’s active data collection period, which was truncated after November 20 due to the rapid drop in email volume after that date. The data collected for each email included the MIME-Version, Content-Type, Subject, From, To, Date, Message-ID, Delivered-To, Received-SPF and Received-By fields.
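As an illustration of what collecting these fields involves, the header fields listed above can be read from a raw message with Python's standard `email` package. This is only a minimal sketch: the sample message, the addresses and the `extract_headers` helper are invented for the example, and the researchers' own collection framework is not described in this article.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime

# Header fields named in the article. "Received-By" is left out here because
# the sample message below (which is invented) does not carry it.
FIELDS = ["MIME-Version", "Content-Type", "Subject", "From", "To",
          "Date", "Message-ID", "Delivered-To", "Received-SPF"]

# Hypothetical raw campaign email; every value is made up for illustration.
RAW = """\
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Subject: Chip in before the deadline
From: Example Campaign <news@campaign.example>
To: voter01@mail.example
Date: Tue, 06 Oct 2020 14:30:00 -0400
Message-ID: <abc123@campaign.example>
Delivered-To: voter01@mail.example
Received-SPF: pass

Hypothetical body text.
"""

def extract_headers(raw: str) -> dict:
    """Return the selected header fields plus a parsed send time."""
    msg = message_from_string(raw)
    record = {name: msg.get(name) for name in FIELDS}
    # Convert the RFC 5322 date string into a timezone-aware datetime.
    record["sent_at"] = parsedate_to_datetime(msg["Date"])
    return record

record = extract_headers(RAW)
print(record["Subject"], record["sent_at"].date())
```

Whether a message landed in the inbox or the spam folder is not part of the headers, so in any real pipeline that label would have to be recorded separately from the folder in which the message was found.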

Due to the challenges of fairly representing communications from both political parties, propensity score analysis (PSA) was chosen as the statistical method for the data. PSA is intended for unbalanced observational data: it adjusts for differences in covariates between the groups being compared, evening out their distributions in circumstances where control groups and traditional statistical comparisons are not readily applicable.
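To make the idea concrete, here is a toy sketch of one common propensity-score technique, inverse-probability weighting. It is not the authors' procedure, which this article does not detail. The data are invented, and the single covariate `x` (which might stand for something like "the email contains a donation link") is purely hypothetical.

```python
from collections import Counter

# Toy records: (covariate x, party, marked_spam). All counts are invented.
def build(x, party, total, spam):
    return [(x, party, 1)] * spam + [(x, party, 0)] * (total - spam)

emails = (build(0, "left", 80, 8) + build(0, "right", 20, 4) +
          build(1, "left", 20, 10) + build(1, "right", 80, 48))

def spam_rate(rows, weights=None):
    """Weighted share of rows marked as spam (unweighted by default)."""
    weights = weights or [1.0] * len(rows)
    total = sum(weights)
    return sum(w * spam for w, (_, _, spam) in zip(weights, rows)) / total

left = [r for r in emails if r[1] == "left"]
right = [r for r in emails if r[1] == "right"]

# Naive comparison: ignores that right-wing emails mostly have x = 1.
naive_gap = spam_rate(right) - spam_rate(left)

# Propensity score e(x) = P(party = right | x), estimated per stratum of x.
n_x = Counter(x for x, _, _ in emails)
n_right_x = Counter(x for x, party, _ in emails if party == "right")
propensity = {x: n_right_x[x] / n_x[x] for x in n_x}

# Inverse-probability weights: 1/e(x) for right, 1/(1 - e(x)) for left.
w_right = [1 / propensity[x] for x, _, _ in right]
w_left = [1 / (1 - propensity[x]) for x, _, _ in left]
adjusted_gap = spam_rate(right, w_right) - spam_rate(left, w_left)

print(f"naive gap:    {naive_gap:.2f}")     # 0.34
print(f"adjusted gap: {adjusted_gap:.2f}")  # 0.10
```

In this toy data the spam rate for right-wing emails is 10 points higher within each stratum of `x`, but the naive comparison reports 34 points because the two parties' emails differ in `x`. Weighting by the propensity score recovers the 10-point gap.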

The authors conclude that the SFAs for the messaging services studied exhibit political bias and that the initial relative consistency between services diverges into rather more specific behavior over time.

Gmail marks a far higher percentage (67.6%) of right-wing political emails as spam, compared to just 8.2% of left-wing emails, but responds more dynamically than its rivals to user interactions that flag emails as spam. Outlook, on the other hand, marks 95.8% of political emails from the left as spam, compared to 75.4% of emails from the right, and Yahoo marks 14.2% more emails from the left as spam than emails from the right.
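The headline figures quoted earlier appear to be percentage-point gaps between these per-party spam rates, which a few lines of Python can check. This is a reading of the numbers quoted in this article rather than a calculation taken from the paper; the `bias_gap` helper is invented for the example, and Yahoo is omitted because its per-party rates are not given here.

```python
# Spam rates (percent of emails marked as spam) as quoted in this article.
spam_rates = {
    "Gmail": {"left": 8.2, "right": 67.6},
    "Outlook": {"left": 95.8, "right": 75.4},
}

def bias_gap(rates):
    """Return (direction the filter leans, gap in percentage points)."""
    gap = rates["right"] - rates["left"]
    # Filtering more right-wing email means the filter favors the left.
    lean = "left" if gap > 0 else "right"
    return lean, round(abs(gap), 1)

for provider, rates in spam_rates.items():
    lean, points = bias_gap(rates)
    print(f"{provider}: leans {lean} by {points} percentage points")
```

Outlook's gap comes out at the reported 20.4 points. Gmail's comes out at 59.4 against the reported 59.3, a difference that presumably reflects rounding in the published rates.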

Cumulative breakdown of the percentage of Democratic (blue) and Republican (red) emails marked as spam in each of the 22 email accounts on each service.

Further, the results suggest that over the study period, Gmail responded fairly generically to an increased volume of emails across all political affiliations by increasingly marking them as spam, regardless of their origin. Yahoo consistently marked left-wing emails as spam as the campaigns progressed, while decreasing the number of right-wing emails flagged as spam. Outlook seemed least affected by the increased volume of mail from either political party, maintaining a general right-wing bias.

Percentage of emails marked as spam across the two political parties and three email providers during the 153-day study period.

Response to user interaction

When we mark a spam email as “Not Spam”, the intention is to stop the mail system from flagging similar emails in the future, although the type of underlying rule (email-based, content-based, etc.) may not always be entirely clear.

The results of the study revealed that of the three email providers examined, only Gmail responded noticeably to a user’s “Not Spam” input. By contrast, this user-driven spam-to-inbox (S→I) interaction had a very limited long-term effect in Outlook and Yahoo.

The researchers observe:

‘[Due] to the S→I interaction, the political bias in Gmail decreased significantly. However, unexpectedly, it increased in both Outlook and Yahoo, because neither service reacted noticeably to the user’s wish not to have emails marked as spam that both services had marked as spam.’

Conclusion

The authors conclude that Gmail responds “significantly” to user interaction compared to Outlook and Yahoo, despite its own left-leaning bias.

The authors state:

“While political bias in Gmail remained unchanged after the read interaction, it significantly decreased due to I→S and S→I interactions.”

And continue:

“While political biases shifted in response to various interactions, Gmail maintained its left-leaning while Outlook and Yahoo maintained their right-leaning in all scenarios.”

The researchers acknowledge that end users generally expect that spam filters can and will adapt their behavior based on user intervention (such as moving an email from the spam folder to the inbox, or marking an email as “not spam”), but that this mechanism is unreliable, and certainly not consistent across the three email providers studied.

The paper notes:

‘[We] found no consistent action that could be recommended to users to help them reduce the bias in how the SFA handles political emails sent to them.’

First published April 4, 2022.
