


@kustra commented Nov 12, 2025

This PR enables combine_folder_multiprocess.py to reuse an existing pushshift_working folder with a different --file_filter value.

When a new --file_filter is provided, the generated output files will include only data from input files that match the filter, even if pushshift_working contains additional data from a previous run.

This is particularly useful for partitioning output after encountering unexpectedly large results.
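The filtering behavior described above can be sketched in Python, since the script is Python. This is a minimal illustration under stated assumptions, not the PR's actual code. It assumes, hypothetically, that each intermediate result in pushshift_working can be traced back to the input file it came from, and that --file_filter is a regular expression matched against that input file's base name. WorkingEntry and select_entries are illustrative names only.

```python
import os
import re
from dataclasses import dataclass


@dataclass
class WorkingEntry:
    # Hypothetical record for one intermediate result in pushshift_working:
    # the input file it was produced from and where its intermediate output lives.
    input_file: str
    output_path: str


def select_entries(entries, file_filter):
    """Keep only intermediate results whose source input file matches file_filter.

    Assumption: file_filter is a regex applied with re.search to the base name
    of the input file. Entries left over from a previous run with a broader
    filter are skipped rather than deleted, so the working folder can be reused.
    """
    pattern = re.compile(file_filter)
    return [e for e in entries if pattern.search(os.path.basename(e.input_file))]


if __name__ == "__main__":
    # Hypothetical working-folder contents from an earlier, broader run.
    entries = [
        WorkingEntry("dumps/RC_2023-01.zst", "working/a"),
        WorkingEntry("dumps/RC_2023-02.zst", "working/b"),
        WorkingEntry("dumps/RS_2023-01.zst", "working/c"),
    ]
    # Re-run with a narrower filter to partition the output.
    for entry in select_entries(entries, r"^RC_"):
        print(entry.output_path)
```

Because only the selection step changes, a narrower filter can partition an oversized output into smaller pieces without reprocessing the input files that are already in the working folder.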

@Watchful1 (Owner)

I'm traveling for the next couple of weeks and won't be able to test this until mid-December, but I will get to it then.
