Skip to content
This repository has been archived by the owner on Jan 4, 2023. It is now read-only.

Add number of extracted, loaded, and failed resources in batch responses #693

Closed
wants to merge 13 commits into from

Conversation

Jasopaum
Copy link
Contributor

@Jasopaum Jasopaum commented Oct 25, 2021

TODO: front

Fixes

Fixes #668 by @MiskoG

Description

  • Back: add a progressions field to Batch
  • Front: display this information in the "ETL view"

Technical details

If the batch is running, the progression is fetched from redis. Otherwise, it's persisted and fetched from the DB

Definition of Done

  • I followed the Arkhn Code Book (I swear!).
  • I have written tests for the code I added or updated.
  • I have updated the documentation according to my changes.
  • I have updated the deployment configuration if needed.

@Jasopaum Jasopaum changed the title Display number of extracted, loaded, and failed resources in ETL view [WIP] Display number of extracted, loaded, and failed resources in ETL view Oct 25, 2021
@Jasopaum Jasopaum requested review from tevariou and simonvadee and removed request for tevariou October 25, 2021 14:19
Copy link
Contributor

@tevariou tevariou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you update river-schema.yml as well?

django/river/api/serializers/serializers.py Outdated Show resolved Hide resolved
django/river/api/serializers/serializers.py Show resolved Hide resolved
@Jasopaum Jasopaum requested a review from tevariou October 25, 2021 15:26
Copy link
Contributor

@simonvadee simonvadee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO it's best to use a dedicated table for Progression. having a jsonfield with a custom structure in the Batch table doesn't make sense in a relational db, especially if we want to make queries involving a progression counter by resource_id.
@tevariou what's your opinion ?

@tevariou
Copy link
Contributor

IMO it's best to use a dedicated table for Progression. having a jsonfield with a custom structure in the Batch table doesn't make sense in a relational db, especially if we want to make queries involving a progression counter by resource_id. @tevariou what's your opinion ?

I agree it may be more practical to have a dedicated table for filtering

@Jasopaum
Copy link
Contributor Author

Also, @tevariou, @simonvadee, do you know if there is an easy way to know the number of resources that have been updated compared to inserted?

@simonvadee
Copy link
Contributor

do you know if there is an easy way to know the number of resources that have been updated compared to inserted?

good question: there isn't. In loader, we use dao.update() whether we want to insert or update a resource, and we always increment the loaded_counters. We could make a distinction between inserted and updated depending on the return value of dao.update() maybe ? I encourage you to create an issue.

@Jasopaum Jasopaum requested a review from simonvadee October 26, 2021 10:46
@Jasopaum Jasopaum changed the title [WIP] Display number of extracted, loaded, and failed resources in ETL view Display number of extracted, loaded, and failed resources in ETL view Oct 26, 2021
@@ -11,6 +15,7 @@ class Meta:

class BatchSerializer(serializers.ModelSerializer):
errors = ErrorSerializer(many=True, read_only=True)
progressions = serializers.SerializerMethodField()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we created a model for this field, maybe we need to do something like that https://www.django-rest-framework.org/api-guide/relations/#nested-relationships

But it adds complexity (and I know Valentin didn't like nested serializers).. maybe it would be better to create a new route

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, but I didn't know how to combine a SerializerMethodField and model Serializers... Do you know how we could do that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to stop using SerializerMethodField

@Jasopaum Jasopaum changed the title Display number of extracted, loaded, and failed resources in ETL view Add number of extracted, loaded, and failed resources in batch responses Oct 26, 2021
@Jasopaum Jasopaum force-pushed the jason/display-extract-load branch from 9d1791c to b6981ee Compare October 27, 2021 10:07
Copy link
Contributor

@simonvadee simonvadee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only small comments 💯

@@ -21,6 +42,40 @@ class Meta:
"completed_at": {"allow_null": True},
}

def get_progressions(self, obj) -> List[Tuple[str, Dict]]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is not used anymore: you use ProgressionSerializer and ResourceForProgressionSerializer instead

Comment on lines +66 to +67
progression_factory.create(batch=batch, resource=r1)
progression_factory.create(batch=batch, resource=r2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you define extracted/loaded/failed here instead of directly in the factory ? easier to understand

Comment on lines +125 to +126
progression_factory.create(batch=batch, resource=r1)
progression_factory.create(batch=batch, resource=r2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same


response_get = api_client.get(url)
assert response_get.json()["canceled_at"] is not None
print(response_get.json()["progressions"])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove

Comment on lines +47 to +49
extracted = 100
loaded = 50
failed = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO: should be removed

@Jasopaum Jasopaum closed this Mar 24, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Show number of instances created or updated within a batch
3 participants