Missing regions and shifted columns #18

Closed
chrisgherbert opened this issue Aug 4, 2018 · 2 comments

Comments

@chrisgherbert

There are a handful of tweets that are missing a region and have every subsequent column shifted one position to the left:

external_author_id author content region language publish_date harvested_date following followers updates post_type account_type new_june_2018 retweet account_category date_mysql
1670762347 ADAMCHAPMANJR Check out the video I made with @LAEducators to #ThankaLAEducator https://t.co/N7J70kSDsn,United States English 10/6/2016 21:04 10/6/2016 21:07 582 934 1807 0 left 0 1 0 NULL 2016-10-06 21:07:00
1850866398 BRICEGELLER Check out the video I made with @LAEducators to #ThankaLAEducator https://t.co/N7J70kSDsn,United States English 10/6/2016 20:54 10/6/2016 20:55 851 852 1587 0 left 0 1 0 NULL 2016-10-06 20:55:00
1626302035 CLAYPAIGEBOO Check out the video I made with @LAEducators to #ThankaLAEducator https://t.co/N7J70kSDsn,United States English 10/6/2016 20:58 10/6/2016 20:58 776 916 1562 0 left 0 1 0 NULL 2016-10-06 20:58:00
1692501152 CORNELLBURCHET Check out the video I made with @LAEducators to #ThankaLAEducator https://t.co/N7J70kSDsn,United States English 10/6/2016 20:55 10/6/2016 20:55 725 774 1536 0 left 0 1 0 NULL 2016-10-06 20:55:00
2882037326 DANAGEEZUS I need a dance break right meow! (•_•) <) )╯all the single ladies / \ (•_•) \( (> all the single ladies / \ (•_•) <) )╯oh oh oh / ,United States English 7/6/2015 15:03 7/6/2015 15:03 3740 9351 1849 NULL Hashtager 0 0 0 NULL 2015-07-06 15:03:00
2577152109 DENN_NIKITIN курточка моя так хорошо сидит на ней \ты пропел эти слова в голове,Unknown Russian 4/15/2017 13:19 4/15/2017 13:19 68 113 6277 0 Russian 0 1 0 NULL 2017-04-15 13:19:00
@bet4a

bet4a commented Aug 4, 2018

It looks like something may have gone wrong with CSV parsing. All of these entries do have a region (or an Unknown region), but in your table the region appears at the end of the content column.

Looking at some of these entries on my computer, they share one thing: the final character of the content text is a backslash. So it seems your CSV parser is treating \, as an escape sequence for a literal comma. But the flavor of CSV used by the dataset doesn't use backslash escapes; that combination should instead be parsed as a literal backslash followed by a , column separator.
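
For illustration, here is a minimal sketch of the two behaviors using Python's standard csv module. The SAMPLE row is a made-up, shortened line in the spirit of the entries above, not a line copied from the dataset, and the importer in question may well be a different tool with its own escape setting.

```python
import csv
import io

# Hypothetical shortened row: the content field ends with a backslash,
# followed by the region and language columns.
SAMPLE = 'DANAGEEZUS,oh oh oh / \\,United States,English\n'

def parse(text, escapechar=None):
    """Parse CSV text and return the list of rows."""
    return list(csv.reader(io.StringIO(text), escapechar=escapechar))

# No escape character: the backslash is literal and the comma separates columns.
plain = parse(SAMPLE)

# Backslash escapes enabled: "\," is read as a literal comma inside the content
# field, so the region is absorbed into content and later columns shift left.
escaped = parse(SAMPLE, escapechar='\\')

print(plain[0])
print(escaped[0])
```

With escapes disabled the row keeps its four columns; with escapes enabled it collapses to three, and the content picks up a trailing ",United States", which is the pattern visible in the table above.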

@chrisgherbert
Author

Ah, thanks. I'll reimport this and make sure my importer isn't doing that.
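
One possible sanity check after a reimport, sketched in Python under the assumption that the parsed rows are available as lists: flag any row whose column count differs from the header's. The helper name find_misaligned and the sample data are hypothetical.

```python
def find_misaligned(header, rows):
    """Return (index, row) pairs whose column count differs from the header's."""
    expected = len(header)
    return [(i, row) for i, row in enumerate(rows) if len(row) != expected]

# Hypothetical, shortened header and rows for illustration only.
header = ["author", "content", "region", "language"]
rows = [
    ["ADAMCHAPMANJR", "some tweet", "United States", "English"],
    # Region absorbed into content by a backslash-escaping parser.
    ["DANAGEEZUS", "oh oh oh / ,United States", "English"],
]

bad = find_misaligned(header, rows)
print(bad)
```

An empty result would suggest that no rows were shifted by the parser.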

@dmil dmil closed this as completed Aug 9, 2018