Researchers Dump 2 Billion Scraped Discord Messages Online

If you were in a public Discord server any time over the past decade, you weren’t just chatting with your friends; you were participating in a massive sociological experiment. According to 404 Media, a team of researchers at the Federal University of Minas Gerais in Brazil scraped more than 2 billion Discord messages from public servers and published the anonymized data online. So hopefully you were very cordial in your messages, because they’re forever now.

The exact tally of all the messages was published as part of the research group’s paper, “Discord Unveiled: A Comprehensive Dataset of Public Communication (2015-2024).” The messages come from about 10% of the platform’s open servers.

The reason the researchers give for publishing the massive dataset of user messages is to give scientists a sizable sample of human activity that can be used for other research. “Our dataset enables researchers to explore the impact of digital platforms on political discourse, the propagation of misinformation, and the development of effective management and regulation strategies tailored to [...] environments,” the paper’s authors wrote. The paper suggests potential applications of the data such as discourse analysis, studying the relationship between social media and mental health, and training AI chatbots.

There’s almost certainly interesting information in the dataset, as Discord’s lax moderation approach makes it a particularly good place to look for the evolution of the very online. But it’s at least a little uncomfortable to know that this data was just scraped willy-nilly and published without users knowing or consenting to it.

The researchers did anonymize the data, which included replacing usernames with randomly generated pseudonyms, hashing and truncating user and message identifiers, and removing certain features. But that process often is not as effective as one might think. Especially when there is the potential to piece together conversations and series of messages, it may be possible for a person to glean details that could identify users.
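For a rough sense of what that kind of anonymization looks like in practice, here is a minimal Python sketch. It is not the researchers’ code: the salt, the truncation length, the pseudonym format, and the message field names are all hypothetical, chosen only to illustrate the general idea of pseudonymizing usernames and hashing and truncating identifiers.

```python
import hashlib
import secrets

# Hypothetical values: the salt, truncation length, and pseudonym format
# are illustrative assumptions, not details taken from the paper.
SALT = b"example-salt"
ID_HEX_LENGTH = 16


def hash_and_truncate(identifier: str) -> str:
    """Return a truncated, salted SHA-256 hex digest of an identifier."""
    digest = hashlib.sha256(SALT + identifier.encode("utf-8")).hexdigest()
    return digest[:ID_HEX_LENGTH]


class Pseudonymizer:
    """Map each username to a random pseudonym, reused for that user."""

    def __init__(self) -> None:
        self._names: dict[str, str] = {}

    def pseudonym(self, username: str) -> str:
        # Generate a random pseudonym the first time a username is seen.
        if username not in self._names:
            self._names[username] = "user_" + secrets.token_hex(8)
        return self._names[username]


def anonymize(message: dict, names: Pseudonymizer) -> dict:
    """Return a copy of a message with identifying fields replaced."""
    return {
        "message_id": hash_and_truncate(message["message_id"]),
        "author_id": hash_and_truncate(message["author_id"]),
        "author_name": names.pseudonym(message["author_name"]),
        "content": message["content"],  # the message text is left untouched
    }
```

Note that in a scheme like this, every message from the same person still carries the same pseudonym and the original text. That is what makes it possible, in principle, to stitch a user’s messages together and look for identifying details.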

Also, it’s not entirely clear that the project is kosher with Discord’s own rules. While the researchers argue that the messages are from public groups, 404 Media pointed out that Discord’s terms of service explicitly state, “Do not mine or scrape any data, content, or information available on or through Discord services.” That rule has been in place since at least 2020.

If nothing else, the paper is a good reminder to watch what you say. You never know who might be listening (or, in this case, reading it a decade later).
