Metadata crawler improvements #665
base: main
some comments
@@ -38,8 +44,14 @@ def crawl_uri(metadata_uri: str) -> Any:
    result = None
    while retry < 3:
I suggest rewriting this function so that the retry variable is not incremented only inside the exception handlers. Instead, increment it by default at the end of each loop iteration, and if response.status == 200, write the result and break out of the while loop.
But if the current version works well, then there is probably no need for any changes.
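A minimal sketch of the loop shape suggested above, not the actual code from this PR. It assumes the response object exposes `status` and `read()` (as `urllib.request.urlopen` responses do) and that the body is JSON; the `opener` and `max_retries` parameters are hypothetical additions that only make the sketch easy to exercise.

```python
import json
from typing import Any, Callable, Optional
from urllib.request import urlopen


def crawl_uri(
    metadata_uri: str,
    opener: Callable[..., Any] = urlopen,  # hypothetical injection point, not in the PR
    max_retries: int = 3,  # the diff hard-codes 3; a parameter is an assumption
) -> Any:
    result: Optional[Any] = None
    retry = 0
    while retry < max_retries:
        try:
            response = opener(metadata_uri, timeout=5)
            if response.status == 200:
                # Success: store the parsed body and leave the loop.
                result = json.loads(response.read())
                break
        except Exception as err:
            # Network and parsing errors are only reported here;
            # the counter is handled in one place below.
            print(f"Attempt {retry + 1} for {metadata_uri} failed: {err}")
        # Incremented on every unsuccessful pass, not only on exceptions.
        retry += 1
    return result
```

With the increment at the end of the loop body, a non-200 response that raises no exception still counts as an attempt, so the loop cannot spin forever.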
already_parsed = get_current_metadata_for_address(
    db_session=db_session, blockchain_type=blockchain_type, address=address
logger.info(
    f"Start crawling {len(not_updated_tokens)} tokens of address {address}"
)

for requests_chunk in [
This is hard to read; it would probably be better to build the list in a separate variable above this for loop.
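One way to read this suggestion, as a sketch only: compute the chunks into a named variable first and iterate over that. The `chunked` helper, `batch_size`, and the sample `tokens` list are hypothetical names for illustration and do not come from the PR.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], size: int) -> List[Sequence[T]]:
    """Split items into consecutive slices of at most `size` elements."""
    return [items[i : i + size] for i in range(0, len(items), size)]


tokens = list(range(7))  # stand-in for the tokens that still need crawling
batch_size = 3  # assumed chunk size

# Named on its own line, so the loop header stays short.
requests_chunks = chunked(tokens, batch_size)

for requests_chunk in requests_chunks:
    print(requests_chunk)
```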
if token_uri_data.token_id not in already_parsed:
    metadata = crawl_uri(token_uri_data.token_uri)
with ThreadPoolExecutor(
With a thread pool executor, it is better to store the results and wait for them, as in this code: https://github.com/bugout-dev/moonstream/blob/e5240d2fd85685383bb40ea4cad741ec90b4fd6d/crawlers/mooncrawl/mooncrawl/blockchain.py#L367
That way, the threads started inside the outer loop over the list-comprehension range will be under better control :)
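A rough sketch of the "store the futures and wait for them" pattern, written from the standard `concurrent.futures` API rather than copied from the linked file. The `crawl_uri` stand-in, `crawl_chunk`, and `max_workers` are hypothetical and only illustrate the idea.

```python
from concurrent.futures import Future, ThreadPoolExecutor, as_completed
from typing import Any, Dict


def crawl_uri(metadata_uri: str) -> Any:
    # Stand-in for the real crawler so the sketch runs without network access.
    return {"uri": metadata_uri}


def crawl_chunk(token_uris: Dict[str, str], max_workers: int = 4) -> Dict[str, Any]:
    """Crawl one chunk of token URIs and return metadata keyed by token id."""
    results: Dict[str, Any] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Keep every future so each task can be awaited and checked.
        futures: Dict[Future, str] = {
            executor.submit(crawl_uri, uri): token_id
            for token_id, uri in token_uris.items()
        }
        for future in as_completed(futures):
            token_id = futures[future]
            try:
                results[token_id] = future.result()
            except Exception as err:
                # A failed task is recorded instead of being silently lost.
                print(f"Token {token_id} failed: {err}")
                results[token_id] = None
    return results
```

Calling `crawl_chunk` once per chunk from the outer loop means every thread of a chunk has finished, and its result or error has been seen, before the next chunk starts.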
Changes
How to test these changes?
Related issues