A set of MapReduce scripts to get descriptive stats from Twitter objects stored in MongoDB.
To use this script, you need:
- A MongoDB database with a collection containing BSON documents imported from a raw Twitter dataset in JSON format.
- PyMongo
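
If the raw dataset still needs to be imported, the following is a minimal sketch of one way to do it with PyMongo. It assumes the file holds one JSON tweet per line, which may not match your dataset; `load_tweets`, `batch_size`, and the database, collection, and file names in the usage comment are hypothetical and not part of tweetStats.py.

```python
import json


def load_tweets(path, collection, batch_size=1000):
    """Insert one JSON tweet per line of `path` into `collection`; return the count.

    `collection` can be any object with an insert_many(list_of_dicts) method,
    such as a PyMongo collection.
    """
    batch, total = [], 0
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines between tweets
            batch.append(json.loads(line))
            if len(batch) >= batch_size:
                collection.insert_many(batch)
                total += len(batch)
                batch = []
    if batch:
        # Flush whatever is left after the last full batch.
        collection.insert_many(batch)
        total += len(batch)
    return total


# Example use with PyMongo (hypothetical names, assumes a local MongoDB server):
#   from pymongo import MongoClient
#   collection = MongoClient()["twitter"]["tweets"]
#   load_tweets("tweets.json", collection)
```

Inserting in batches keeps memory use bounded for large dumps; the `mongoimport` tool that ships with MongoDB is an alternative if you prefer not to script the import.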
- Download tweetStats.py
- Make sure your MongoDB server is running
- In the same directory where tweetStats.py is located, type:
  python tweetStats.py -cm COMMAND -db DATABASE -coll COLLECTION [-regen REGENERATE] [-lim LIMIT]
- -cm[--command]: The command you want tweetStats to execute. Available commands include:
  - getDescriptives: Generate basic descriptives, such as TotalNumTweets, TotalNumberOfUsers, NumberOfTweetsPerUser, MostMentionedUsers, MostUsedHashtags, and MostLinkedToUrls
  - getTotalNumberOfRTd
  - getMostRepliedToUsers
- -db[--database]: The name of the MongoDB database to use.
- -coll[--collection]: The name of the collection to use.
- [-regen[--regenerate]]: (True/False) Boolean indicating whether you would like the results to be recalculated. Default: True.
- [-lim[--limit]]: (Int) Number of results to return. Default: 10.
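
To make the flags and their defaults concrete, here is a minimal sketch of how such a command line could be parsed with Python's standard `argparse` module. It is an illustration under assumptions, not the code inside tweetStats.py: `parse_bool`, `build_parser`, and `COMMANDS` are hypothetical names, and the command list contains only the three commands named above.

```python
import argparse

# Only the commands listed in this README; the real script may accept more.
COMMANDS = ["getDescriptives", "getTotalNumberOfRTd", "getMostRepliedToUsers"]


def parse_bool(value):
    """Turn the strings 'True'/'False' (any case) into a bool."""
    lowered = value.strip().lower()
    if lowered == "true":
        return True
    if lowered == "false":
        return False
    raise argparse.ArgumentTypeError("expected True or False, got %r" % value)


def build_parser():
    """Build a parser mirroring the documented flags (assumed behaviour)."""
    parser = argparse.ArgumentParser(prog="tweetStats.py")
    parser.add_argument("-cm", "--command", required=True, choices=COMMANDS)
    parser.add_argument("-db", "--database", required=True)
    parser.add_argument("-coll", "--collection", required=True)
    # Optional flags fall back to the defaults stated in the list above.
    parser.add_argument("-regen", "--regenerate", type=parse_bool, default=True)
    parser.add_argument("-lim", "--limit", type=int, default=10)
    return parser
```

With this sketch, `build_parser().parse_args(["-cm", "getDescriptives", "-db", "twitter", "-coll", "tweets"])` yields `regenerate=True` and `limit=10`, where `twitter` and `tweets` are placeholder names. A custom `parse_bool` is used because `type=bool` would treat the string "False" as true.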