Exago BI vs Public Data: Tweeting @ElonMusk
Anyone who knows me well knows I have a (potentially unhealthy) obsession with Elon Musk. I follow the entrepreneur on Twitter and often read his replies to questions that seemingly “normal people” tweet to him on a regular basis. This got me wondering if there is a best time to ask a question of Elon in order to maximize the likelihood of receiving a response. So, I decided to pull all his tweets to do some time-series analysis.
The Data Model
For this mini-project, the data model was just a collection of tweets from a single user. I wanted to pull down all of Elon’s tweets and examine the tweet content and meta information like the time at which the tweet was posted.
Thankfully, not only does Twitter have a public REST API, but wrappers for it exist in pretty much any language you could dream up. I got set up with Tweepy, a Twitter API wrapper for Python, and the resulting script to dump Elon’s tweets to CSV ended up fitting comfortably into 63 lines. Python!
I did run into a couple of minor issues working with Python in this context (I’m no Python expert). The first was getting Python to write the correct line breaks to the CSV file. I use Windows (for better or worse), and I had a hard time getting the correct “\r\n” line breaks because the script kept outputting an additional carriage return for some reason. This was remedied by using the csv module built into Python, which abstracts things sufficiently that I didn’t need to worry about manually outputting line breaks. The second issue was string encoding. There is a wealth of information out there on the subject, but somehow as programmers we (or at least I) tend to learn just enough to get our current project working, then move on! After I figured out that the data returned by Twitter was UTF-8 encoded and got past a separate double-encoding problem, I had my nice clean CSV.
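For anyone hitting the same line-break problem, here is a minimal sketch of the CSV-writing step. It assumes Python 3, and the function name and column names are made up for illustration; this is not the original 63-line script. The key detail is `newline=""`, which the csv module documentation recommends so that the “\r\n” the writer emits is not translated a second time on Windows.

```python
import csv

def write_tweets_csv(path, rows):
    """Write (id, created_at, text) rows to a UTF-8 CSV file.

    The function name and column names are hypothetical.
    """
    # newline="" disables newline translation, so the "\r\n" that
    # csv.writer emits by default is not turned into "\r\r\n" on Windows.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "created_at", "text"])
        writer.writerows(rows)
```

Opening the file with an explicit `encoding="utf-8"` also keeps the output encoding independent of the Windows default code page.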
Elon is involved with several different ventures, so I made a histogram plotting how often he mentioned each of his companies in tweets. This was done simply in Exago BI by creating a formula group to split the data into tweets that mentioned one of the below companies, or mentioned none of the companies.
No surprise here: Elon has publicly admitted to Tesla being drama-prone, and the Boring Company...well...I’ll avoid the urge to come up with some fitting pun here and move on.
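If you want to try the same company split outside Exago BI, it can be roughly approximated in a few lines of Python. This is only a sketch: the keyword list is my own illustrative guess rather than the exact set of companies in the chart, and a tweet that mentions more than one company is counted under the first match only.

```python
from collections import Counter

# Illustrative keyword list (an assumption, not the chart's exact categories).
COMPANY_KEYWORDS = {
    "Tesla": ("tesla",),
    "SpaceX": ("spacex",),
    "The Boring Company": ("boring company", "boringcompany"),
}

def company_bucket(text):
    """Return the first company whose keyword appears in the tweet, else "None"."""
    lowered = text.lower()
    for company, keywords in COMPANY_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return company
    return "None"

def count_mentions(tweets):
    """Count tweets per company bucket."""
    return Counter(company_bucket(text) for text in tweets)
```

Calling `count_mentions` on the text column of the CSV gives the numbers a bar chart would need.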
Let’s get to the good stuff: when is this guy actually active online? After trying a few different schemes, I ended up deciding to slice the tweets by the time of day each tweet was posted. Since I was primarily interested in Elon’s interaction with other users, I wanted to plot the number of times he replied to a user compared to when he sent out a tweet to the community at large. I did this in Exago BI by creating two groups in the Advanced Report designer, with the following break conditions:
- Time of day, by hour
- Does the tweet start with the ‘@’ symbol (user mention)
Yes, a tweet doesn’t have to start with the user mention to be a reply, but for the purposes of my limited analysis I determined this to be an adequate criterion for designating a reply tweet. So in actuality, Elon’s reply rate might be higher than shown below!
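The two break conditions above boil down to grouping on a pair of values: the hour of the day and a reply flag. Here is a rough Python equivalent, assuming each tweet is a (timestamp, text) pair whose timestamp has already been converted to the time zone you care about. The function names are hypothetical, and this is not how Exago BI implements groups internally.

```python
from collections import Counter
from datetime import datetime

def is_reply(text):
    """Treat a tweet as a reply if it starts with '@' (the simplification used above)."""
    return text.startswith("@")

def hourly_counts(tweets):
    """Count tweets per (hour of day, is_reply) pair.

    `tweets` is an iterable of (datetime, text) tuples.
    """
    counts = Counter()
    for created_at, text in tweets:
        counts[(created_at.hour, is_reply(text))] += 1
    return counts
```

For example, `hourly_counts(tweets)[(10, True)]` is the number of replies posted between 10:00 and 10:59.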
It appears that you are more than 40% more likely to receive a tweet back from Elon between the hours of 10am and 11am, US West Coast time, than during any other time of day. So the only thing left to do here is to jump on Twitter at 1pm EST (maybe 1:15??) and give it a shot!
To learn more about using Exago BI, check out our Training and Support Lab videos. Have you done any interesting data scrapes? Tell us what you learned in the comments!