A Sentiment Analyzer for 2016 Candidates based on Realtime Tweets
This project visualizes what Twitter users are saying about the 2016 presidential election and how they feel about the different candidates, based on realtime Twitter data. All visualizations are updated every 5 minutes. A tree map shows the composition and share of the most frequently mentioned words on Twitter. The results of the sentiment analysis on tweets are shown in a scatter plot whose two axes correspond to the two dimensions used for sentiment analysis: polarity and subjectivity/objectivity. In addition, a bar chart shows the polarity of tweets specifically.
What is the problem I want to solve and who has this problem?
The result of a presidential election poll reflects the views of all voters. However, because a poll covers the whole electorate, it cannot answer every question. What if I am interested in the candidate preferences of one specific group of people, such as Twitter users? In this project, I investigate this question and develop a UI that helps people interested in the topic find out what Twitter users are saying about the 2016 presidential election and which candidates they prefer.
People who are interested in politics can use my UI to get an intuitive idea of how social media can contribute to campaign success. In addition, the candidates' campaign teams can use my UI to make and adjust their campaign strategies to attract a particular group of voters.
What questions do I want to be able to answer with my visualization?
Who is the most popular candidate on Twitter according to the realtime data? The number of tweets related to each candidate differs each time I fetch the data. The percentage composition of tweets is visualized to show which candidate is talked about more than the others.
Which words are discussed most in connection with each candidate on Twitter? People usually talk about different things for different candidates. Which words are mentioned most when people talk about a specific candidate?
What sentiments are expressed when people use the words mentioned above? For words or topics that are mentioned frequently, is the emotion expressed positive or negative?
What are people’s attitudes towards each candidate in terms of sentiment? Based on the sentiment analysis of the tweets, what do people think of each candidate? Two dimensions, polarity and subjectivity, are used to evaluate people’s likely attitudes.
Which candidate is receiving more positive or negative attitudes than the others? A positive/negative attitude value is assigned to each tweet based on the sentiment analysis, and the average value over all tweets related to a candidate is calculated and compared with the others. This shows who is getting a higher or lower score and is thus more or less likely to be supported by Twitter users.
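The first question above comes down to a simple share calculation. The following is a minimal sketch of how it could be done, not the project's actual code: the function name `candidate_shares`, the substring-matching rule, and the candidate names in the usage example are all assumptions made for illustration.

```python
from collections import Counter


def candidate_shares(tweets, candidates):
    """Return each candidate's percentage share of all candidate mentions.

    Assumption: a tweet counts once for every candidate whose name appears
    in it (case-insensitive substring match), so a tweet naming two
    candidates contributes to both.
    """
    counts = Counter({name: 0 for name in candidates})
    for text in tweets:
        lowered = text.lower()
        for name in candidates:
            if name.lower() in lowered:
                counts[name] += 1

    total = sum(counts.values())
    if total == 0:
        # No candidate was mentioned at all; avoid dividing by zero.
        return {name: 0.0 for name in candidates}
    return {name: 100.0 * counts[name] / total for name in candidates}


if __name__ == "__main__":
    # Hypothetical tweets and candidate names, for illustration only.
    sample = [
        "Trump rally tonight",
        "I like Clinton",
        "trump vs clinton debate",
        "Trump again",
        "nothing political here",
    ]
    print(candidate_shares(sample, ["Trump", "Clinton"]))
```

Because the shares are computed from mentions instead of from tweets, they always sum to 100 even when one tweet names several candidates. Whether that is the right denominator for the tree map is a design choice the text does not settle.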
What does my data look like? Where does it come from? What real-world phenomena does it capture?
Since I used the Twitter Streaming API to fetch realtime tweets, the response data was a text file in JSON format that, in addition to the tweet text, included extra information such as the timestamp, user location, and geotag.
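As a rough illustration of reading such a file, the sketch below assumes one JSON object per line and the field names of the Twitter API v1.1 tweet object (`text`, `created_at`, `user.location`, `coordinates`). The function name `parse_tweets` and the choice of which fields to keep are hypothetical and not taken from the project.

```python
import json


def parse_tweets(lines):
    """Yield a small dict per tweet from an iterable of JSON strings.

    Assumes one JSON object per line. Lines that are blank, malformed,
    or that carry no "text" field are skipped.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # blank line, nothing to parse
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # truncated or malformed record
        if not isinstance(record, dict) or "text" not in record:
            continue  # not a tweet (for example, a delete notice)
        user = record.get("user") or {}
        yield {
            "text": record["text"],
            "created_at": record.get("created_at"),
            "user_location": user.get("location"),
            "coordinates": record.get("coordinates"),
        }


if __name__ == "__main__":
    # Hypothetical file name; any line-delimited JSON dump would work.
    with open("tweets.txt", encoding="utf-8") as handle:
        for tweet in parse_tweets(handle):
            print(tweet["created_at"], tweet["text"])
```

Taking an iterable of lines instead of a file path keeps the parser easy to test and lets it be fed directly from a live stream.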
However, before I visualize the information, the text is processed first. Specifically, I performed sentiment analysis on it: the subjectivity/objectivity and polarity of each tweet were calculated with the Python library TextBlob. Two sentiment scores (one positive and one negative) were then calculated for each candidate by averaging the polarity scores of that candidate's tweets. For example, if 100 tweets about Trump express positive sentiment with a total polarity score of 70, the positive sentiment score for Trump is 0.7. In addition, I computed the frequency of each word by counting its occurrences across all tweets after the necessary preprocessing (removing stop words, lemmatizing, stemming) had been applied to the tweet corpus.
The attributes I visualized in the UI are:
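The averaging and counting steps can be sketched as follows. This is one reading of the description above, not the project's actual implementation. It assumes that the positive score is the mean polarity of the tweets with polarity above zero, that the negative score is the mean of those below zero, and that neutral tweets are left out of both. The polarity values themselves would come from TextBlob (for instance `TextBlob(text).sentiment.polarity`); here they are passed in as plain numbers so the sketch runs without extra dependencies. The stop-word list is a tiny illustrative stand-in, and lemmatizing and stemming are omitted.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption); a real run would use
# a fuller list and would also lemmatize and stem the tokens.
STOP_WORDS = {"a", "an", "and", "the", "is", "to", "of", "in", "for", "on"}


def sentiment_scores(polarities):
    """Return (positive_score, negative_score) for one candidate.

    Each score is the mean polarity of the tweets on that side of zero.
    Tweets with polarity exactly 0 are excluded from both averages.
    """
    positive = [p for p in polarities if p > 0]
    negative = [p for p in polarities if p < 0]
    pos_score = sum(positive) / len(positive) if positive else 0.0
    neg_score = sum(negative) / len(negative) if negative else 0.0
    return pos_score, neg_score


def word_frequencies(tweets):
    """Count non-stop-word tokens across all tweets."""
    counts = Counter()
    for text in tweets:
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts


if __name__ == "__main__":
    # Made-up polarity values in [-1, 1], for illustration only.
    print(sentiment_scores([0.5, 0.5, -0.25, -0.75, 0.0]))
    print(word_frequencies(["The wall is the wall", "Build a wall"]).most_common(2))
```

With these inputs, the two positive tweets average to 0.5 and the two negative tweets to -0.5, which mirrors the 70 / 100 = 0.7 arithmetic in the example above.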
|Attribute Names||Attribute Type||Meaning|
|candidate names||text||Names of the candidates|
|keywords||text||Popular words that appear in the tweets|
|keyword popularity||quantitative [0, ∞)||The number of occurrences of the keyword in all tweets|
|subjectivity/objectivity||quantitative [0, 1]||The score that indicates the extent of expressed subjectivity/objectivity in a tweet (0.0 is very objective and 1.0 is very subjective)|
|polarity||quantitative [-1, 1]||The score that indicates the extent of expressed positive/negative sentiment in a tweet (-1.0 is very negative and 1.0 is very positive)|
|sentiment score||quantitative [-1, 1]||The average polarity of all tweets about a specific candidate|