Skip to content

Conversation

@AlexVelezLl
Copy link
Member

@AlexVelezLl AlexVelezLl commented Oct 8, 2025

Summary

  • Adds support for a retry_on argument in the @task decorator to specify a list of potential non-deterministic exceptions that can be retried if the job failed because of them.
    • The user won't see the task failed until the same task was re-attempted 3 times.
  • Updates the setupwizard frontend to handle failed tasks.
  • Updates the setupwizard frontend to persists users being imported.
  • Disable back button in the import users page when importing users to prevent unexpected page layouts.

References

Closes #11836.

Reviewer guidance

@AlexVelezLl AlexVelezLl requested a review from rtibbles October 8, 2025 23:22
@github-actions github-actions bot added DEV: backend Python, databases, networking, filesystem... APP: Setup Wizard Re: Setup Wizard (facility import, superuser creation, settings, etc.) DEV: frontend SIZE: medium labels Oct 8, 2025
@AlexVelezLl AlexVelezLl force-pushed the fix-lod-import-multi-users branch from 4e2fbc9 to 236f654 Compare October 8, 2025 23:26
@github-actions
Copy link
Contributor

github-actions bot commented Oct 8, 2025

@rtibbles rtibbles self-assigned this Oct 9, 2025
Copy link
Member

@rtibbles rtibbles left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can maintain the current separation of concerns, and it may be worth the effort of adding a new column to track the retries rather than keeping it in the extra_metadata.

To allow us to migrate the SQLAlchemy table, adding alembic as a dependency feels a bit heavy duty. So perhaps the answer is to clear the jobs table of any finished tasks, then dump the remainder to a temporary CSV, clear the table, recreate, and then reload the data?

permission_classes=None,
long_running=False,
status_fn=None,
retry_on=None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job avoiding a classic Python gotcha! (passing mutable values as default arguments, such as [] is a very common mistake that can cause issues)

total_progress=0,
result=None,
long_running=False,
retry_on=None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like we don't need to store this in the job object - we're not allowing this to be customized per job, only per task - so I think we can just reference this from the task itself, rather than having to pass it in at job initialization. This also saves us having to coerce the exception classes to import paths.

)
setattr(current_state_tracker, "job", None)

def should_retry(self, exception):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd rather defer all this logic to the reschedule_finished_job_if_needed method on the storage class, rather than having it in the job class.


def should_retry(self, exception):
retries = self.extra_metadata.get("retries", 0) + 1
self.extra_metadata["retries"] = retries
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a bit iffy about using extra_metadata for tracking this - I think if we want to hack the existing schema, 'repeat' is probably a better place for this, but I wonder if instead we should add to the job table schema to add error_retries so that we can put a sensible default in place for failing tasks so they don't endlessly repeat.

I also think I'd rather have the retry interval defined by the task registration (we could also set a sensible default if retryable exceptions are set).

@AlexVelezLl AlexVelezLl force-pushed the fix-lod-import-multi-users branch from 55e3dba to 431cfd0 Compare December 10, 2025 02:48
@AlexVelezLl AlexVelezLl force-pushed the fix-lod-import-multi-users branch from a2765cb to 1a2f204 Compare December 10, 2025 15:18
Comment on lines -145 to -149
from django import db

# Destroy current connections and create new ones:
db.connections.close_all()
db.connections = db.ConnectionHandler()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have removed these db.connections overrides and have used the patch("django.db.connections" instead. These overrides were having some side effects on the job tests that involved having multiple threads, and it was messing things up in the teardown process.

However, not sure if removing these lines may cause somehow a false positive in the test.

Comment on lines -206 to -208
from django import db

db.connections["default"].connection = None
Copy link
Member Author

@AlexVelezLl AlexVelezLl Dec 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

idem, will instead rely on the django.db.connections patch. But not sure if this may cause false positives.

@AlexVelezLl AlexVelezLl requested a review from rtibbles December 10, 2025 22:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

APP: Setup Wizard Re: Setup Wizard (facility import, superuser creation, settings, etc.) DEV: backend Python, databases, networking, filesystem... DEV: frontend SIZE: large SIZE: medium SIZE: very large

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Setup Wizard - Confusing behavior when importing multiple learners

3 participants