Always delete associated jobs when deleting datasets #468

Merged: stijn-uva merged 2 commits into master on Feb 18, 2025

Conversation

stijn-uva
Member

Currently, deleting a dataset does not also delete the jobs that reference it. The webtool does this cleanup when a dataset is deleted through the interface, but other code paths can call dataset.delete() directly and potentially leave orphaned jobs behind.

There is no scenario in which a job with a reference to a deleted dataset should exist and keep running, so this PR moves the logic for cleaning up those jobs into the dataset.delete() method.
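To illustrate the idea, here is a minimal sketch of a delete method that removes the referencing jobs together with the dataset. It is not the code from this PR: the `Dataset` class, the `datasets.dataset_key` and `jobs.remote_id` columns, and the in-memory SQLite database (standing in for PostgreSQL) are all assumptions made for the example.

```python
import sqlite3


class Dataset:
    """Hypothetical stand-in for a dataset object; not 4CAT's actual class."""

    def __init__(self, db, key):
        self.db = db
        self.key = key

    def delete(self):
        """Delete the dataset and every job that references it, atomically."""
        with self.db:  # commits on success, rolls back if either statement fails
            # Assumed schema: jobs.remote_id holds the key of the dataset
            # the job operates on.
            self.db.execute("DELETE FROM jobs WHERE remote_id = ?", (self.key,))
            self.db.execute(
                "DELETE FROM datasets WHERE dataset_key = ?", (self.key,)
            )


def make_demo_db():
    """Build a small in-memory database with two datasets and three jobs."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE datasets (dataset_key TEXT PRIMARY KEY)")
    db.execute(
        "CREATE TABLE jobs (id INTEGER PRIMARY KEY, jobtype TEXT, remote_id TEXT)"
    )
    db.executemany("INSERT INTO datasets VALUES (?)", [("abc",), ("xyz",)])
    db.executemany(
        "INSERT INTO jobs (jobtype, remote_id) VALUES (?, ?)",
        [("search", "abc"), ("export", "abc"), ("search", "xyz")],
    )
    db.commit()
    return db
```

Because the cleanup lives inside `delete()`, every caller gets it, not only the web interface. Running both statements in one transaction means a failure cannot leave a job pointing at a dataset that is already gone.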

It's a relatively simple change but worth testing properly, so doing this via a PR rather than a direct commit.

@rphlmcosta

I tried using an extension, but it kept running indefinitely, so I deleted the dataset. Unfortunately, the job remained in the queue, and I couldn’t find a way to get rid of it.

I’ve already tried restarting 4CAT and disabling the data source (web-studies), but nothing worked.

My issue seems somewhat similar to yours, so I thought this might be the right place to ask: do you know how I can remove the job from the queue? I’ll eventually reinstall the extension to try again, but with those jobs stuck in the queue, it might not work properly.

Thanks in advance!

@dale-wahl
Member

dale-wahl commented Feb 14, 2025

Check your backend logs; they should be visible through the UI in the control panel. Most likely the dataset crashed or failed with an error that was logged there. That would explain why it appeared to run “indefinitely”: if it crashes without informing the user why, it will appear to still be running.

As for removing the job itself, you would have to do that directly in the database: connect with psql and delete the record from the jobs table. But I would check the error and attempt to fix it first, then restart 4CAT, which will reattempt the job with the fixed code.
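The manual cleanup described above could look roughly like the following sketch. It uses an in-memory SQLite database as a stand-in for PostgreSQL, and the table layout (`datasets.dataset_key`, `jobs.id`, `jobs.jobtype`, `jobs.remote_id`) and the helper names are assumptions for illustration, so check the actual schema before running anything similar against a real installation.

```python
import sqlite3


def find_orphaned_jobs(db):
    """Return (id, jobtype, remote_id) for jobs whose dataset no longer exists."""
    return db.execute(
        "SELECT id, jobtype, remote_id FROM jobs "
        "WHERE remote_id NOT IN (SELECT dataset_key FROM datasets) "
        "ORDER BY id"
    ).fetchall()


def delete_job(db, job_id):
    """Delete a single job by id and return the number of rows removed."""
    with db:  # commit on success, roll back on error
        return db.execute("DELETE FROM jobs WHERE id = ?", (job_id,)).rowcount


def make_demo_db():
    """One live dataset with a job, plus one job left behind by a deleted dataset."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE datasets (dataset_key TEXT PRIMARY KEY)")
    db.execute(
        "CREATE TABLE jobs (id INTEGER PRIMARY KEY, jobtype TEXT, remote_id TEXT)"
    )
    db.execute("INSERT INTO datasets VALUES ('live-key')")
    db.executemany(
        "INSERT INTO jobs (id, jobtype, remote_id) VALUES (?, ?, ?)",
        [(1, "search", "live-key"), (2, "search", "deleted-key")],
    )
    db.commit()
    return db
```

Listing candidates first and then deleting by explicit id keeps the operation reviewable. Some jobs may legitimately reference something other than a dataset, so a blanket delete of everything the first query returns is not assumed to be safe.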

The queue ignores jobs that have failed: they remain in the queue but do not block other jobs, even those of the same type. The only way they would block is if they are actually running but stuck in a loop. All existing jobs of the same type will run with the updated code when 4CAT restarts.

@stijn-uva stijn-uva merged commit e75eaf0 into master Feb 18, 2025
1 check passed
3 participants