
Troll Tweets-- Version 2 #25

Closed


patrick-lee-warren

Removes the original files and replaces them with a more complete version, which adds new variables and removes some tweets that were accidentally included. These new files fix the UTF-8 double-encoding problems and should be sufficiently "bite-sized". The README is updated to reflect both of these changes.


EvanCarroll commented Aug 27, 2018

I have a branch of this with:

  • duplicates removed
  • data files as CSV text for easy import
  • a PostgreSQL loader and dumper script that works in tweet_id order

This would allow us to predictably regenerate the files when changing them, so the distribution would stop growing by a GB every time. It would also serve as a reference implementation for people bringing this into a database.
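As a rough illustration of the idea (not the PostgreSQL script from the branch), here is a minimal Python sketch of deterministic regeneration: drop duplicate rows, order by `tweet_id`, and write CSV with a fixed column order and line ending. The function names and the `content` column in the usage note are hypothetical; only `tweet_id` comes from the description above.

```python
import csv
import io


def normalize_rows(rows, key="tweet_id"):
    """Drop duplicate rows (first occurrence wins) and sort by numeric key."""
    seen = {}
    for row in rows:
        # setdefault keeps the first row seen for each key value
        seen.setdefault(row[key], row)
    return sorted(seen.values(), key=lambda r: int(r[key]))


def dump_csv(rows, fieldnames):
    """Serialize rows to CSV text with a fixed column order and line ending."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

For example, `dump_csv(normalize_rows(rows), ["tweet_id", "content"])` should give the same text regardless of the order in which identical input rows arrive, which is what keeps regenerated files from showing up as wholesale changes in the repository.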

You can see my branch here: https://github.com/EvanCarroll/russian-troll-tweets/tree/version_2

The only holdup is whether you're OK with the CSV files changing along with what @patrick-lee-warren calls "Version 2". If you are, please get back to me soon so I can fix this up and send you a new pull request with all of his changes.

Contributor

dmil commented Aug 27, 2018

Thanks. I've just posted the latest version of the data (#28) that @patrick-lee-warren sent us.

dmil closed this Aug 27, 2018

3 participants