So I finally got the Twitter mining to run without any major interruptions. I decided to look at how many tweets and retweets are sent in the United States as a function of time. This gives me some insight into when people are most active (for example, whether Twitter users mostly follow a normal sleep schedule). I should have focused on a single region, since the data currently spans 6 different time zones. But the two time zones outside the contiguous US have drastically smaller populations, so it may not be the biggest issue.
We can see that the bulk of the tweets come in between 9 AM and midnight EST, so activity more or less matches a normal sleep schedule. There are some anomalies in the data. Towards the end of the collection, I had to take my laptop off Ethernet and bring it to a lab. The Wi-Fi was a little spotty and I dropped the stream every so often. Furthermore, earlier today I ran into more rate limit issues with the Twitter API, and thus missed some more tweets.
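
For anyone who wants to try something similar, here is a minimal sketch of how tweet counts could be binned by hour of day. It is an illustration under assumptions, not the actual collection script: it assumes each tweet carries a timestamp in the classic Twitter `created_at` layout, it uses a fixed UTC-5 offset as a stand-in for EST (so daylight saving is ignored), and the name `tweets_per_hour` and the sample timestamps are made up for the example.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Assumption: timestamps use the classic Twitter "created_at" layout,
# e.g. "Wed Oct 10 20:19:24 +0000 2018".
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"

# Assumption: a fixed UTC-5 offset stands in for EST (no daylight saving).
EST = timezone(timedelta(hours=-5))


def tweets_per_hour(created_at_values):
    """Return a 24-element list of tweet counts, indexed by hour of day in EST."""
    counts = Counter()
    for raw in created_at_values:
        # Parse the timezone-aware timestamp, then shift it to EST.
        when = datetime.strptime(raw, CREATED_AT_FORMAT).astimezone(EST)
        counts[when.hour] += 1
    # Hours with no tweets come back as 0.
    return [counts[hour] for hour in range(24)]


# Hypothetical sample timestamps, for illustration only.
sample = [
    "Wed Oct 10 20:19:24 +0000 2018",  # 15:19 EST
    "Wed Oct 10 20:45:00 +0000 2018",  # 15:45 EST
    "Thu Oct 11 03:05:10 +0000 2018",  # 22:05 EST on the previous day
]
hourly = tweets_per_hour(sample)
```

Because every timestamp is converted to a single reference zone, a tweet sent at 9 AM on the West Coast lands in the noon EST bin. That is the same smearing across time zones mentioned above, and restricting the input to one region would avoid it.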