Datameer Blog post

Trump vs. Clinton: The Surprising Results I Discovered From Data

by on Feb 23, 2018

In an election, personal opinions matter to a great extent. When we make voting decisions, we often look to the viewpoints of others. These public viewpoints influence us one way or another.

Increasingly, these opinions circulate across the web and social media on a massive scale. Using big data analytics, I tapped into social media to understand what online users think about the two major-party candidates in the 2016 U.S. presidential election, Donald Trump and Hillary Clinton.

I collected over 10,000 tweets from more than 5,000 users over a period of 30 days. These tweets were in English, were posted in the US and, of course, were about Donald Trump or Hillary Clinton.

The data required some cleansing and preparation, from unifying records to removing stop words. I used Datameer’s text mining capabilities and its Excel-like interface to derive some very interesting results. I’ll summarize some of the results in this post.
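To give a feel for what the preparation step involves, here is a minimal Python sketch of stop-word removal. It is not the Datameer workflow used for this analysis. The `clean_tweet` helper, the tiny stop-word list, and the sample tweet are all assumptions made for illustration.

```python
import re

# Tiny illustrative stop-word list (an assumption); a real analysis would use a fuller one.
STOP_WORDS = {"a", "an", "and", "are", "at", "for", "in", "is", "of", "on", "the", "to"}

def clean_tweet(text):
    """Lowercase a tweet, strip links and punctuation, and drop stop words."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)   # remove links
    tokens = re.findall(r"[#@]?\w+", text)      # keep hashtags and mentions intact
    return [t for t in tokens if t not in STOP_WORDS]

# Made-up sample tweet, not taken from the collected data.
sample = "The rally in Akron is on the news! https://example.com #Trump"
print(clean_tweet(sample))  # ['rally', 'akron', 'news', '#trump']
```

Keeping the `#` and `@` prefixes on tokens makes it easier to pull out hashtags and mentions later.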

Surprisingly, at the time of this writing, my analysis of social media posts shows that overall sentiment levels for Trump and Clinton are the same.

When I first created the US Presidential Candidates application, four major candidates ranked in the following order, from most popular to least popular:

  • Bernie Sanders
  • Donald Trump
  • Hillary Clinton
  • Ted Cruz

With primaries and caucuses now over, we’re left with one candidate for each of the two major U.S. political parties, narrowing the field to Trump and Clinton. While the two candidates are vastly different, they receive the same sentiment level from social media users.

Sentiment levels range from 0 to 100, with a higher number indicating more positive sentiment. Both Trump and Clinton have sentiment scores under 50, so users’ online conversations about them tend to carry slightly more negative than positive connotations.
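This post does not describe how the sentiment level is computed internally, so the following Python sketch shows only one simple way a 0-to-100 scale can arise. It is a lexicon-based approach, and it is an assumption rather than the method behind the numbers above. The word lists, function names, and sample tweets are made up for illustration.

```python
# Hypothetical mini-lexicons; real sentiment lexicons contain thousands of entries.
POSITIVE = {"great", "win", "love", "strong"}
NEGATIVE = {"bad", "lose", "hate", "weak"}

def sentiment_level(tokens):
    """Map a tweet's tokens to a 0-100 level, where 50 is neutral."""
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    if pos + neg == 0:
        return 50.0                            # no sentiment words: treat as neutral
    polarity = (pos - neg) / (pos + neg)       # ranges from -1 to 1
    return 50.0 * (polarity + 1.0)             # rescale to 0-100

def average_level(tweets):
    """Average the per-tweet levels to get one score for a set of tweets."""
    levels = [sentiment_level(t.lower().split()) for t in tweets]
    return sum(levels) / len(levels)

# Made-up tweets, not taken from the collected data.
sample_tweets = ["great rally love it", "bad night", "weak and bad but strong"]
print(round(average_level(sample_tweets), 1))
```

With a scale like this, an average under 50 means negative words outweigh positive ones across the set of tweets.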

Sentiment Level and Tweets Over Time

Over the 30-day period, there was considerable fluctuation in the sentiment score derived from users’ Twitter conversations about each candidate. Looking at the graph overall, sentiment about Trump might seem to be more positive than sentiment about Clinton. Yet with the fluctuation, the positive and negative sentiments tend to average out.

The number of tweets per day varies greatly and is often related to political events. For example, we see a spike in tweets on August 8, when a Clinton rally took place in St. Petersburg, Florida. We see a similar spike during a Trump rally in Akron, Ohio on August 22.

These political events may excite a candidate’s supporters or frustrate their critics, resulting in an increase in opinion postings on social media.
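Spikes like these can be found by counting tweets per day and flagging the days that sit well above the average. The Python sketch below is one rough way to do that. The threshold rule (mean plus 1.5 standard deviations) and the sample dates are assumptions for illustration, not the data or method behind the chart.

```python
from collections import Counter
from datetime import date
from statistics import mean, pstdev

def daily_counts(tweet_dates):
    """Count how many tweets were posted on each calendar day."""
    return Counter(tweet_dates)

def spike_days(counts, threshold=1.5):
    """Return days whose count exceeds mean + threshold * standard deviation."""
    values = list(counts.values())
    cutoff = mean(values) + threshold * pstdev(values)
    return sorted(day for day, n in counts.items() if n > cutoff)

# Made-up posting dates: a quiet week with one busy day.
sample_dates = (
    [date(2016, 8, 6)] * 3 + [date(2016, 8, 7)] * 4 + [date(2016, 8, 8)] * 12
    + [date(2016, 8, 9)] * 3 + [date(2016, 8, 10)] * 4
)
print(spike_days(daily_counts(sample_dates)))  # [datetime.date(2016, 8, 8)]
```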

Candidate Sentiment Level by Selected States

The four states with the highest number of tweets are California, Florida, New York and Texas. Together, these four states account for approximately 40 percent of the tweets collected in the US.

The results show that sentiment levels are higher for Clinton in New York and Florida, but higher for Trump in California and Texas.

Followers of Users on Twitter

Online public opinion can be a key influencer in voting decisions and can spread across users on social media platforms like Twitter very quickly. How quickly those messages spread is related to how many followers the user has.

I looked at how many followers the users tweeting about each candidate have, to see which group had potentially more online social influence.

The result shows that users who talk about Clinton have more followers than users who talk about Trump: a median of 301 followers per user, compared with a median of 233.
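The median is a sensible statistic here because a handful of accounts with very large followings would distort an average. Below is a minimal Python sketch of the per-group calculation. The group labels, the follower counts, and the `median_followers` helper are made up for illustration and are not the collected data.

```python
from statistics import mean, median

def median_followers(users):
    """users: iterable of (group, follower_count) pairs, one per distinct user."""
    by_group = {}
    for group, followers in users:
        by_group.setdefault(group, []).append(followers)
    return {group: median(counts) for group, counts in by_group.items()}

# Made-up follower counts; each group includes one very large account.
sample_users = [
    ("candidate_a", 50), ("candidate_a", 200), ("candidate_a", 90000),
    ("candidate_b", 40), ("candidate_b", 150), ("candidate_b", 60000),
]
print(median_followers(sample_users))  # {'candidate_a': 200, 'candidate_b': 150}
print(mean([50, 200, 90000]))          # the mean is pulled far above the typical user
```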

Hashtags and Mentions

I also extracted hashtags and mentions, which are shown in the word cloud above, to discover the most frequently discussed topics and conversations among the Clinton- and Trump-focused tweets.

Many of the conversations mention political media such as Fox News, CNN and Politico. The POTUS handle, which is associated with Barack Obama, is the second most mentioned after Fox News.

As for hashtags, here are the top four, ranked by how often they appeared in the Clinton- and Trump-focused tweets:

  • #imwithher
  • #trump
  • #nevertrump
  • #neverhillary
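Rankings like the ones above come down to extracting hashtags and mentions from each tweet and counting them. Here is a minimal Python sketch of that idea. The regular expressions, the `top_tags` helper, and the sample tweets are assumptions for illustration, not the Datameer workflow or the collected data.

```python
import re
from collections import Counter

HASHTAG = re.compile(r"#(\w+)")
MENTION = re.compile(r"@(\w+)")

def top_tags(tweets, pattern, n=4):
    """Count case-insensitive matches of pattern across tweets; return the n most common."""
    counts = Counter()
    for text in tweets:
        counts.update(tag.lower() for tag in pattern.findall(text))
    return counts.most_common(n)

# Made-up tweets for illustration.
sample_tweets = [
    "#ImWithHer all the way @CNN",
    "Watching @FoxNews tonight #Trump",
    "#imwithher #NeverTrump",
    "@FoxNews says otherwise #trump #ImWithHer",
]
print(top_tags(sample_tweets, HASHTAG, 2))  # [('imwithher', 3), ('trump', 2)]
print(top_tags(sample_tweets, MENTION, 1))  # [('foxnews', 2)]
```

Lowercasing before counting keeps variants such as #ImWithHer and #imwithher from being tallied separately.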

In Conclusion

By analyzing a large volume of unstructured data on social media, we can learn a lot about online opinions, views, sentiment and topics of discussion.

If you have any questions or comments, please feel free to reach out to me via Twitter, @SokhnaVor.

Posted in Big Data & Brews

