We’re in the midst of the 2016 election right now. As most Americans have noticed, politicians’ tweets have been making up half of the headlines (despite being as long as headlines themselves). In this series I’ll be using the Python library Tweepy to look at the popularity and loyalty of the candidates’ Twitter followings, as well as the topics the candidates have chosen to discuss over time.
Marco Bonzanini’s Tweepy guide held my hand through this entire process. His more comprehensive Tweepy tutorials can be found here.
Using Tweepy requires a few preliminary steps, such as creating a Twitter app to interact with Twitter’s API. This has been well documented in other Tweepy guides, so I will breeze over it for now. Just note that creating the “auth” object as I have done below requires a “consumer key” and a “consumer secret”, which are unique to your Twitter app, along with an access token and access secret generated for that app.
Setting up Tweepy:
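    import tweepy

    # Credentials from your Twitter app's settings page
    key = 'put yours here'
    secret = 'put yours here'
    access_token = 'put yours here'
    access_secret = 'put yours here'

    # OAuthHandler lives in the tweepy namespace, so qualify it explicitly
    auth = tweepy.OAuthHandler(key, secret)
    auth.set_access_token(access_token, access_secret)
    api = tweepy.API(auth)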
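Hard-coding keys is fine for a quick experiment, but it makes it easy to publish them by accident. One possible alternative, sketched here, is to read the four values from environment variables. The variable names and the `load_credentials` helper are illustrative assumptions of mine, not part of Tweepy.

```python
import os
from typing import Mapping, NamedTuple


class TwitterCredentials(NamedTuple):
    consumer_key: str
    consumer_secret: str
    access_token: str
    access_secret: str


# Hypothetical environment variable names; use whatever fits your setup.
ENV_NAMES = {
    "consumer_key": "TWITTER_CONSUMER_KEY",
    "consumer_secret": "TWITTER_CONSUMER_SECRET",
    "access_token": "TWITTER_ACCESS_TOKEN",
    "access_secret": "TWITTER_ACCESS_SECRET",
}


def load_credentials(env: Mapping[str, str] = os.environ) -> TwitterCredentials:
    """Collect the four credentials from a mapping, failing loudly if any is absent."""
    missing = [name for name in ENV_NAMES.values() if not env.get(name)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return TwitterCredentials(**{field: env[name] for field, name in ENV_NAMES.items()})
```

The returned fields can then be passed to the handler and `set_access_token` in place of the string literals.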
Relative Popularity, Loyalty, and Engagement
Before finding out anything else, I was curious how popular Donald Trump and Hillary Clinton were among their own followers in relative and absolute terms.
Twitter’s API offers a pretty vast range of fields for each tweet. A list of all these possibilities can be found here. For the first test I looked at the favorite count and retweet count of each politician’s last 1,000 tweets. I also looked at the follower count of both, which is not a tweet-specific field but a user-specific field.
    userID = 'HillaryClinton'
    followers = api.get_user(id=userID).followers_count
    setsize = 1000  # number of most recent tweets to sample

    fav_avg = 0
    rt_avg = 0

    # Sum favorites and retweets over the sample
    for status in tweepy.Cursor(api.user_timeline, id=userID).items(setsize):
        fav_avg += status.favorite_count
        rt_avg += status.retweet_count

    # Turn the sums into averages (assumes the account has at least setsize tweets)
    fav_avg = fav_avg / setsize
    rt_avg = rt_avg / setsize

    print(userID)
    print('Followers: ' + str(followers))
    print('Average Favorites: ' + str(fav_avg))
    print('Average Retweets: ' + str(rt_avg))
    print('Average Favorites (% of Followers): ' + '%f' % (fav_avg / followers * 100) + '%')
    print('Average Retweets (% of Followers): ' + '%f' % (rt_avg / followers * 100) + '%')
The above segment outputs:
    HillaryClinton
    Followers: 8,327,832
    Average Favorites: 6,005.776
    Average Retweets: 2,866.363
    Average Favorites (% of Followers): 0.072117%
    Average Retweets (% of Followers): 0.034419%
Just by changing userID to “realDonaldTrump”, we can see the equivalent numbers for Donald:
    realDonaldTrump
    Followers: 10,934,978
    Average Favorites: 23,026.52
    Average Retweets: 8,071.84
    Average Favorites (% of Followers): 0.210577%
    Average Retweets (% of Followers): 0.073817%
The differences in absolute terms are pretty striking, but the differences as a percentage of total followers are more telling. Donald’s retweet rate is double Hillary’s, and his favorite rate is nearly triple hers. Not only does Donald have more followers and favorites, but individual tweets engage his audience more. Follower count and engagement don’t always scale together, though. Here are Barack Obama’s numbers for comparison:
    BarackObama
    Followers: 76,754,938
    Average Favorites: 4,033.78
    Average Retweets: 1,851.41
    Average Favorites (% of Followers): 0.005255%
    Average Retweets (% of Followers): 0.002412%
Obama has a ton of followers (76.8 million), likely because he’s the President. But his tweets have even fewer favorites and retweets on average than Hillary’s.
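To make the comparison concrete, here is a small offline sketch that recomputes the percentages and the Trump-to-Clinton ratios from the figures printed above. It makes no API calls; the `engagement_rate` helper and the dictionary layout are my own illustrative choices.

```python
def engagement_rate(avg_interactions: float, followers: int) -> float:
    """Average interactions per tweet, expressed as a percentage of followers."""
    if followers <= 0:
        raise ValueError("followers must be positive")
    return avg_interactions / followers * 100


# Figures copied from the outputs above
accounts = {
    "HillaryClinton": {"followers": 8_327_832, "favorites": 6005.776, "retweets": 2866.363},
    "realDonaldTrump": {"followers": 10_934_978, "favorites": 23026.52, "retweets": 8071.84},
    "BarackObama": {"followers": 76_754_938, "favorites": 4033.78, "retweets": 1851.41},
}

# Per-account favorite and retweet rates
rates = {
    name: {
        "favorites": engagement_rate(stats["favorites"], stats["followers"]),
        "retweets": engagement_rate(stats["retweets"], stats["followers"]),
    }
    for name, stats in accounts.items()
}

for name, rate in rates.items():
    print(f"{name}: favorites {rate['favorites']:.6f}%, retweets {rate['retweets']:.6f}%")

# How many times higher Trump's rates are than Clinton's
fav_ratio = rates["realDonaldTrump"]["favorites"] / rates["HillaryClinton"]["favorites"]
rt_ratio = rates["realDonaldTrump"]["retweets"] / rates["HillaryClinton"]["retweets"]
print(f"Trump vs. Clinton: {fav_ratio:.2f}x favorites, {rt_ratio:.2f}x retweets")
```

Dividing by follower count is what lets accounts of very different sizes be compared on the same scale.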
I wondered if there was a distinct difference in tweeting volume here (there is), so I decided to check the average number of tweets per day for both candidates using the following code segment:
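    import datetime

    # In the Tweepy versions of this era, created_at is a naive UTC datetime,
    # so the 30-day cutoff is built in UTC as well
    test_date = datetime.datetime.utcnow() - datetime.timedelta(days=30)
    userID = 'HillaryClinton'
    tweetCount = 0

    # The timeline comes back newest first, so stop at the first tweet older than the cutoff
    for status in tweepy.Cursor(api.user_timeline, id=userID).items():
        if status.created_at > test_date:
            tweetCount += 1
        else:
            break

    print(userID)
    print(tweetCount)
    print(tweetCount / 30)  # average tweets per day over the window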
I found that over the last 30 days, Hillary tweeted an average of 16.5 times a day, whereas Donald tweeted an average of 9.4 times. I imagine this affects the number of favorites the average tweet garners, as followers naturally spread their favorites across whatever tweets come in that day.
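The per-day average is just the number of tweets inside the window divided by the window length. The sketch below shows that arithmetic on a synthetic list of timestamps, so it runs without API access. The `tweets_per_day` helper, the reference time, and the sample data are all made up for illustration.

```python
from datetime import datetime, timedelta


def tweets_per_day(timestamps, now, window_days=30):
    """Average daily tweet count over the window_days leading up to now."""
    cutoff = now - timedelta(days=window_days)
    recent = sum(1 for ts in timestamps if cutoff < ts <= now)
    return recent / window_days


# Synthetic data: a made-up reference time and one tweet every six hours before it
now = datetime(2016, 8, 1, 12, 0)
sample = [now - timedelta(hours=6 * i) for i in range(1, 200)]

print(tweets_per_day(sample, now))
```

Unlike the loop above, this version does not rely on the timestamps being sorted newest first, at the cost of scanning the whole list.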