@@ -32,3 +51,85 @@ Birdwatch generates the following in the snapshots folder:
9.png
tweets.json # Metadata of each screen-capped tweet.
```

### Self Boosted Tweet Detection

A self-boosted tweet is a tweet that its original author retweets.
Such tweets are marked with `potential_boost` set to `true` in `tweets.json`.
The script detects them by matching exact metadata, e.g. duplicate posts.

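The matching idea can be sketched roughly as follows. This is a minimal illustration, not the script's actual implementation: the function name `mark_potential_boosts` and the choice of `handle`, `timestamp`, and `tweet_text` as the compared fields are assumptions.

```python
from collections import Counter

# Assumption: these are the fields compared; the script's real key may differ.
MATCH_FIELDS = ("handle", "timestamp", "tweet_text")


def mark_potential_boosts(tweets):
    """Return copies of the tweets with `potential_boost` set for exact duplicates."""
    # Build one comparison key per tweet from the assumed metadata fields.
    keys = [tuple(tweet.get(field) for field in MATCH_FIELDS) for tweet in tweets]
    counts = Counter(keys)
    # A tweet is flagged when its key appears more than once in the capture.
    return [
        {**tweet, "potential_boost": counts[key] > 1}
        for tweet, key in zip(tweets, keys)
    ]
```

Every copy of a duplicated entry is flagged here, and the input list is left unmodified.
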
## Schemas

Assume all data is UTF-8 compliant.

### Input File

The input is the `.js` file that the Twitter exporter generates for the users you are following:

```json
window.* = [
  {
    "following": {
      "accountId": <id>,
      "userLink": <url>
    }
    ...
  }
]
```

You can rename the file to `.json` or specify it via the input flags to have it parsed. Twitter generates the `window.* =` prefix by default, and the script removes it automatically. You can also remove the prefix manually to parse the file as JSON directly.

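As a rough sketch of the prefix handling described above (not the script's own code), the following strips an optional `window.* =` prefix and parses the remainder as JSON. The names `parse_following_export` and `following_links` are hypothetical, and the regular expression assumes the prefix contains no `=` before the assignment.

```python
import json
import re

# Assumed prefix shape: "window.<anything> = " in front of the JSON array.
_PREFIX = re.compile(r"^\s*window\.[^=]*=\s*")


def parse_following_export(text):
    """Strip an optional `window.* =` prefix and parse the rest as JSON."""
    payload = _PREFIX.sub("", text, count=1)
    return json.loads(payload)


def following_links(entries):
    """Collect the `userLink` of every entry that has a `following` object."""
    return [e["following"]["userLink"] for e in entries if "following" in e]
```

A file whose prefix was already removed by hand passes through unchanged, since the pattern only matches at the very start of the text.
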
### tweets.json

```json
[
  {
    "id": int,
    "tag_text": str,
    "name": str,
    "handle": str,
    "timestamp": str,
    "tweet_text": str,
    "retweet_count": str,
    "like_count": str,
    "reply_count": str,
    "potential_boost": bool
  }
]
```

Invalid string entries will be marked as "NULL".

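A consumer of `tweets.json` might handle the "NULL" marker and the `potential_boost` flag along these lines. This is a sketch under the schema above; the helper names `normalize_tweet`, `load_tweets`, and `potential_boosts` are hypothetical and not part of the project.

```python
import json

NULL_MARKER = "NULL"  # value the schema uses for invalid string entries


def normalize_tweet(tweet):
    """Return a copy of the tweet with "NULL" string entries replaced by None."""
    return {key: (None if value == NULL_MARKER else value) for key, value in tweet.items()}


def load_tweets(text):
    """Parse the contents of tweets.json and normalize every entry."""
    return [normalize_tweet(tweet) for tweet in json.loads(text)]


def potential_boosts(tweets):
    """Keep only the tweets flagged as potential self-boosts."""
    return [tweet for tweet in tweets if tweet.get("potential_boost")]
```

Note that the count fields stay as strings, as the schema declares them; converting them to numbers is left to the caller.
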
### metadata.json

```json
{
  "bio": str,
  "name": str,
  "username": str,
  "location": str,
  "website": str,
  "join_date": str,
  "following": str,
  "followers": str
}
```

Invalid string entries will be marked as "NULL".

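To check a `metadata.json` against the schema above, a small validator could look like the following sketch. The function name `validate_metadata` is hypothetical; the field list is taken from the schema, and the check assumes every field is a string.

```python
METADATA_FIELDS = (
    "bio", "name", "username", "location",
    "website", "join_date", "following", "followers",
)


def validate_metadata(data):
    """Return a list of problems found; an empty list means the dict fits the schema."""
    problems = []
    for field in METADATA_FIELDS:
        if field not in data:
            problems.append(f"missing field: {field}")
        elif not isinstance(data[field], str):
            problems.append(f"{field} should be a string")
    return problems
```
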

## Troubleshoot

* My scraper terminates early?

Your images may be taking some time to load. Consider using `-s` to adjust the load time.
Alternatively, your scrolling height may be too low or too high. Consider using `--scroll-algorithm` to change the type of algorithm,
then pass a value to the algorithm with `--scroll-value`.

The help text has more information on what `--scroll-value` encodes.

## Future Updates

* Support Running Multiple Sessions to Resume Per-Profile