A new study published in PLOS Digital Health suggests that hate speech and misinformation on Reddit may reflect language patterns similar to those found in online communities centered around certain psychiatric disorders, particularly those involving Cluster B personality traits. By using large language models and a mathematical technique called topological data analysis, researchers mapped how online discourse in hate speech and misinformation communities aligns with language used in forums for conditions like borderline, antisocial, and narcissistic personality disorders.

The explosion of online communication, especially on platforms like Reddit, has raised concerns about the spread of hate speech and misinformation. These types of content have been linked to real-world consequences, including increased prejudice, public health confusion, and even violence.

Prior work has connected toxic online behavior to personality features often discussed under the “Dark Triad” — narcissism, Machiavellianism, and psychopathy. Yet this is only part of the clinical terrain. In psychiatry’s diagnostic manuals, a set of conditions known as Cluster B personality disorders (antisocial, borderline, histrionic, and narcissistic) are grouped together because they tend to involve heightened emotionality, impulsive or risk-taking behavior, and persistent difficulties in relationships.

People who meet criteria for these disorders may show patterns such as intense and rapidly shifting emotions, a strong need for admiration or attention, disregard for rules or others’ rights, manipulative or hostile interactions, and fluctuating self-image. These features can manifest as antagonistic or combative language, quick escalation during conflict, and a tendency to frame interactions in terms of status, betrayal, or threat.

Researchers have had less success establishing how hate speech and misinformation might relate to psychiatric conditions more broadly, especially across the diagnostic spectrum in the DSM-5. A central question is whether distinctive linguistic patterns in toxic content align with features commonly associated with Cluster B—such as hostility, impulsivity, or unstable self-concept—or whether they reflect a more general psychological profile that cuts across diagnoses.

One challenge is the massive scale of online posts, which makes it difficult to analyze patterns across communities. Another difficulty lies in identifying whether speech patterns linked to hate or misinformation reflect a general psychological profile or traits more consistent with specific mental health conditions.

But advances in artificial intelligence, particularly large language models like GPT-3, have created new opportunities to address these challenges. By extracting high-dimensional representations—called embeddings—of written text, researchers can now study the underlying structure and similarity of speech patterns across different types of content.
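As a rough illustration of what an embedding is, the sketch below extracts vectors for two short texts and compares them with cosine similarity. It assumes the OpenAI Python client and the text-embedding-ada-002 model (which returns 1,536-dimensional vectors, matching the figure described below); the study's exact tooling may have differed, and the sample posts are placeholders.

```python
# Minimal sketch: embed two texts and measure how similar they are.
# Assumptions: the OpenAI Python client and the "text-embedding-ada-002" model;
# the study's actual pipeline is not specified at this level of detail.
import numpy as np
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def embed(texts: list[str]) -> np.ndarray:
    """Return one embedding vector per input text."""
    response = client.embeddings.create(model="text-embedding-ada-002", input=texts)
    return np.array([item.embedding for item in response.data])


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: values near 1.0 mean the texts point the same way in embedding space."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


posts = ["Example post from one community.", "Example post from another community."]
vectors = embed(posts)
print(cosine_similarity(vectors[0], vectors[1]))
```

Similarity between such vectors, rather than surface word overlap, is what lets researchers compare the tone and content of posts drawn from very different communities.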

“I had noticed that there was a certain antisocial component to hate speech, and had become curious if there would be a connection between hate speech and Cluster B personality disorders,” said study author Andrew Alexander, a second-year psychiatry resident at the University of Alabama at Birmingham Medical Center.

“When I came up with the idea of analyzing posts from these communities with large language models, I thought about misinformation as well and was curious, so I added misinformation communities to the analysis as well. I also expanded beyond solely Cluster B disorders and included more psychiatric disorders for the same reason.”

To explore this, the researchers turned to Reddit, a platform organized into thousands of thematic forums known as subreddits. Because Reddit hosts a wide range of communities—including support groups for mental health, forums focused on conspiracies or misinformation, and communities known for controversial or hateful speech—it offered a rich dataset for comparing linguistic features across different types of discourse.

The research team selected 54 subreddits for their analysis, dividing them into four categories: hate speech, misinformation, psychiatric disorder forums, and control groups. Communities flagged as spreading hate or misinformation were identified based on prior academic literature or Reddit’s own moderation decisions. Psychiatric disorder forums were included if they self-identified as such and contained user posts that reflected those conditions. Control communities were those unrelated to mental health, hate, or misinformation.

From each subreddit, the team gathered 1,000 posts. Two kinds of data representations were then created. First, “individual user post” embeddings were generated, based on single Reddit posts. Second, “distilled embeddings” were constructed by combining multiple posts from a community to reflect its overall tone and content. The researchers used GPT-3’s embedding model to translate each post into a 1,536-dimensional vector representing its semantic content.
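The paper's description of "distilled" embeddings does not specify exactly how posts were combined, so the sketch below makes a simple, clearly labeled assumption: average groups of post vectors into one community-level vector.

```python
# Sketch of building community-level ("distilled") embeddings from per-post vectors.
# Averaging groups of post vectors is an assumption; the study only states that
# multiple posts were combined to reflect a community's overall tone and content.
import numpy as np


def distill(post_embeddings: np.ndarray, group_size: int = 10) -> np.ndarray:
    """Average consecutive groups of post embeddings into distilled vectors."""
    n_groups = len(post_embeddings) // group_size
    groups = post_embeddings[: n_groups * group_size].reshape(n_groups, group_size, -1)
    return groups.mean(axis=1)


# Stand-in for one subreddit: 1,000 posts, each a 1,536-dimensional embedding.
community_posts = np.random.rand(1000, 1536)
distilled = distill(community_posts)  # shape: (100, 1536)
print(distilled.shape)
```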

After constructing these embeddings, the researchers applied a method called zero-shot classification. This technique involves training a machine learning model on one type of data and then asking it to classify new data from a category it has not seen before. For this study, models were trained on embeddings from psychiatric disorder and control communities, and then used to classify embeddings from hate speech and misinformation communities. The goal was to see whether the speech patterns in hate or misinformation forums more closely resembled those found in psychiatric disorder communities or in neutral control forums.
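In practice, this step can be approximated with an ordinary classifier fitted on the labeled embeddings and then applied to embeddings from categories it never saw in training. The sketch below uses scikit-learn's logistic regression and synthetic vectors purely for illustration; the classifier choice, labels, and data are assumptions, not the authors' exact pipeline.

```python
# Sketch of the classification step: fit a model on embeddings labeled by
# psychiatric-disorder or control community, then label hate speech or
# misinformation embeddings it never saw during training.
# The logistic-regression choice and the toy data are assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy training data: embeddings from known community types.
X_train = rng.normal(size=(600, 1536))
y_train = np.array(["borderline_pd"] * 200 + ["depression"] * 200 + ["control"] * 200)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Embeddings from hate speech / misinformation communities (unseen categories).
X_unseen = rng.normal(size=(50, 1536))
predicted = clf.predict(X_unseen)

# Tally which known community type each unseen embedding most resembles.
labels, counts = np.unique(predicted, return_counts=True)
print(dict(zip(labels, counts)))
```

The tallied predictions, aggregated across many embeddings, are what allow statements such as "nearly 80% of hate speech embeddings were classified as resembling psychiatric disorder communities."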

In addition to this classification approach, the team used topological data analysis to visualize relationships between different types of communities. This method allowed them to create a map of the embedding space, showing how closely different clusters—such as hate speech, misinformation, or depression forums—were related based on the underlying structure of their language.
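A common way to build such a map is the Mapper algorithm from topological data analysis. The sketch below uses the KeplerMapper library with a PCA lens and k-means clustering on stand-in data; the library, projection, and parameter choices are assumptions for illustration and may not match the authors' pipeline.

```python
# Sketch of a Mapper-style topological map of the embedding space.
# Assumptions: the KeplerMapper library, a PCA lens, and k-means clustering;
# the paper's exact TDA configuration may differ.
import numpy as np
import kmapper as km
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

embeddings = np.random.rand(500, 1536)  # stand-in for community embeddings

mapper = km.KeplerMapper(verbose=0)

# Project the high-dimensional vectors onto a low-dimensional "lens".
lens = mapper.fit_transform(embeddings, projection=PCA(n_components=2))

# Cover the lens with overlapping bins and cluster the points inside each bin;
# clusters that share points across overlapping bins become linked nodes.
graph = mapper.map(
    lens,
    embeddings,
    cover=km.Cover(n_cubes=10, perc_overlap=0.3),
    clusterer=KMeans(n_clusters=2, n_init=10),
)

# Write an interactive HTML view of the resulting graph.
mapper.visualize(graph, path_html="embedding_map.html", title="Community embedding map")
```

Nodes in the resulting graph correspond to clusters of embeddings, and edges indicate overlap, so communities whose language occupies nearby regions of the embedding space end up connected on the map.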

The classification results suggested that the language in hate speech communities was more similar to that of psychiatric disorder communities than to control groups. Nearly 80% of the hate speech embeddings were classified as resembling psychiatric disorder communities, with the strongest associations found for Cluster B personality disorders: antisocial, borderline, and narcissistic personality disorders. Another frequent match was schizoid personality disorder, a Cluster A condition characterized by social detachment.

While hate speech posts often matched these disorder-associated embeddings, misinformation posts were more mixed. About a quarter of the misinformation embeddings were classified as resembling psychiatric disorder communities, with some resemblance to anxiety-related conditions. The rest were more closely aligned with control communities, suggesting that the linguistic markers in pure misinformation posts are not as tightly linked to psychiatric disorder forums as those found in hate speech communities.

Topological data analysis reinforced these findings. On the map generated from the embeddings, psychiatric disorder forums clustered in a region that was directly connected to hate speech embeddings. Control communities, in contrast, were generally located in more distant or isolated regions. Interestingly, some misinformation communities also showed a bridge-like connection to the psychiatric disorder cluster, especially those involving COVID-related conspiracy theories, which have been associated with anxiety in prior studies.

Further analysis of the spatial layout revealed a gradient of similarity. Communities with the most overlap with hate speech—such as those focused on narcissistic and schizoid personality disorders—were positioned closest to the hate speech region. Others, like depression or post-traumatic stress disorder forums, were located further away. When the researchers re-ran their classification models while removing narcissistic and schizoid communities from the training data, hate speech embeddings were redistributed to other Cluster B disorders like borderline and antisocial personality disorder, as well as Complex Post-Traumatic Stress Disorder (CPTSD).

This finding suggests that hate speech shares a broad linguistic resemblance to communities characterized by emotional dysregulation, impulsivity, or hostility, rather than to just one specific disorder. Meanwhile, PTSD forums did not appear to share the same proximity to hate speech in the map, suggesting that Complex PTSD and classic PTSD may differ in meaningful linguistic ways—possibly supporting arguments that CPTSD is more closely aligned with personality disorders than previously thought.

Misinformation communities showed weaker and more variable connections. The clearest pattern emerged for anxiety-related content, especially in forums focused on COVID-19 misinformation. However, these links were less consistent across the dataset, and most misinformation embeddings clustered closer to control forums on the topological map.

“I think the most important takeaways are that there seems to be similarities in speech between hate speech and communities for certain psychiatric disorders, though it is not possible at the moment to conclusively determine why,” Alexander told PsyPost. “It could be that people with these disorders are more likely to engage in hate speech, or it is possible that these communities foster a lack of empathy that is reflected by speech similar to those seen in these disorders. For misinformation, I think the big takeaway is that there seems to be a slight similarity seen with the speech patterns in anxiety disorder communities, but psychiatric disorders likely are not the driving force behind misinformation.”

However, the researchers emphasized that their findings should not be interpreted as implying that individuals with psychiatric disorders are more likely to spread hate speech or misinformation. Instead, the results suggest that certain linguistic features present in hate speech resemble those found in online communities where users self-identify as having particular mental health conditions. These associations do not establish causality and should not be used to stigmatize people with mental illness.

One limitation of the study lies in its reliance on self-identified communities rather than clinically verified diagnoses. It remains uncertain whether the users in these psychiatric forums actually meet diagnostic criteria for the conditions discussed. Additionally, the study focused only on Reddit, which may not generalize to other social platforms with different user demographics or moderation practices.

“It should be emphasized that this is based on social media community posts, not directly confirmed diagnoses,” Alexander added. “This means that this study would need to be followed up with one evaluating individuals with confirmed diagnoses to be more certain of the results, though I believe that these results are a fairly good indicator nonetheless.”

Future research could benefit from incorporating more communities across platforms, expanding the range of mental health conditions studied, and collecting clinical data when possible. A more detailed understanding of how hate speech and misinformation evolve in different online environments could also help inform efforts to counteract their spread.

The study, “Topological data mapping of online hate speech, misinformation, and general mental health: A large language model based study,” was authored by Andrew William Alexander and Hongbin Wang.