How to retrieve only a single Tweet, not subtweets #99
Comments
@cguess Is this what you were looking for?
search_tweets_task = stweet.SearchTweetsTask(from_username=tweet_author, replies_filter=RepliesFilter.ONLY_ORIGINAL)
stweet/stweet/search_runner/replies_filter.py Lines 6 to 10 in fe34e98
@oneroyalace are you aware of any way to apply such filters to TweetsByIdTask? I have not found any options in the code. I tried playing with the context, but it seems to work only as a counter for stats and doesn't appear to offer any way to limit the number of replies or requests.
I've done some quick research, and the relevant code seems to be:
stweet/stweet/tweets_by_ids_runner/tweets_by_id_runner.py Lines 71 to 77 in fe34e98
If you only want the exact tweet you asked for, you may force the cursor to be None:
parsed_list = get_all_tweets_from_json(response.text)
# cursors = [it for it in parsed_list if isinstance(it, Cursor)]
# cursor = cursors[0] if len(cursors) > 0 else None
cursor = None  # force the cursor to be None
user_tweet_raw = [it for it in parsed_list if isinstance(it, UserTweetRaw)]
self.tweets_by_id_context.add_downloaded_tweets_count_in_request(len(user_tweet_raw))
self.tweets_by_id_context.cursor = cursor
self._process_new_tweets_to_output(user_tweet_raw)
This edit still results in a "one-level" scrape, so if the tweet has replies, some (or all) of them will still be included. You can then pick out the desired tweet by checking "id_str" in the raw dictionary. Even so, this dramatically increases the speed, especially for tweets with a large number of interactions.
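The "id_str" check described above could look roughly like the following sketch. It assumes each scraped item has already been parsed into a Python dict with a top-level "id_str" key; how you get from stweet's raw output to such dicts is not shown here, and pick_tweet_by_id and the sample data are hypothetical names made up for illustration, not part of stweet.

```python
from typing import Any, Dict, Iterable, Optional


def pick_tweet_by_id(
    raw_tweets: Iterable[Dict[str, Any]], tweet_id: Any
) -> Optional[Dict[str, Any]]:
    """Return the first dict whose "id_str" equals tweet_id, or None if absent."""
    wanted = str(tweet_id)  # accept both "1001" and 1001
    for tweet in raw_tweets:
        if str(tweet.get("id_str")) == wanted:
            return tweet
    return None


# Made-up sample: the requested tweet mixed in with two replies.
scraped = [
    {"id_str": "1002", "full_text": "a reply"},
    {"id_str": "1001", "full_text": "the tweet we asked for"},
    {"id_str": "1003", "full_text": "another reply"},
]

match = pick_tweet_by_id(scraped, "1001")
print(match["full_text"] if match else "not found")
```

Comparing as strings avoids a silent mismatch when the ID is passed as an int in one place and read as a string in another.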
I've actually found a much easier approach that doesn't require changing the source code:
from typing import Any

from stweet.tweets_by_ids_runner.tweets_by_id_context import TweetsByIdContext


class DummyContext(TweetsByIdContext):
    def __setattr__(self, name: str, value: Any) -> None:
        # Discard any cursor the runner tries to store, so it always reads back None.
        if name == "cursor":
            value = None
        super().__setattr__(name, value)

Pass an instance of DummyContext to the runner. Example usage:
# task and output are whatever you already pass to the runner
stweet.TweetsByIdRunner(
    tweets_by_id_task=task,
    raw_data_outputs=[output],
    tweets_by_ids_context=DummyContext(),
).run()
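To see why overriding __setattr__ is enough, here is a small self-contained sketch that does not import stweet at all. StandInContext is a hypothetical stand-in for TweetsByIdContext (its fields are assumptions, not the library's actual definition), and PinnedCursorContext plays the role of DummyContext. The point is only that every assignment to cursor, including the one a runner would make after each response, ends up storing None.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class StandInContext:
    """Hypothetical stand-in for TweetsByIdContext, used only for this demo."""
    cursor: Optional[str] = None
    requests_count: int = 0


class PinnedCursorContext(StandInContext):
    def __setattr__(self, name: str, value: Any) -> None:
        # Replace whatever is being assigned to "cursor" with None.
        if name == "cursor":
            value = None
        super().__setattr__(name, value)


ctx = PinnedCursorContext()
ctx.cursor = "next-page-token"  # a runner would do this after each response
ctx.requests_count += 1         # other attributes behave normally
print(ctx.cursor, ctx.requests_count)
```

Because the interception happens on the context object, nothing in the runner has to change. Whether a None cursor actually stops further requests depends on the runner's loop, as discussed earlier in the thread.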
Many thanks @junyilou! I have had success with the strategy you provided. Results still need filtering, as you foresaw, but speed is unharmed.
Running the sample code for getting a tweet by a single ID actually scrapes dozens of tweets, takes 30+ seconds, and returns a massive list. Is it possible to get just the first tweet object and then stop, so it's more time-efficient?
I've tried messing around with the context, but I can't wrap my head around how it's actually used in this project. I may just be being dense, though, so any clarification would be greatly appreciated.