Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Latest release breaks Hangfire #153

Open
soenneker opened this issue Jun 17, 2024 · 17 comments
Open

Latest release breaks Hangfire #153

soenneker opened this issue Jun 17, 2024 · 17 comments

Comments

@soenneker
Copy link
Contributor

Not sure if you guys have tried the new version, but we had to revert to 1.9.3 because it would not complete jobs.

Related to this work: #150

@marcoCasamento
Copy link
Owner

No, not tried the latest, but the merge contains nothing new.
What does it means that it doesn't complete the jobs? Are jobs being executed but the status is not updated? Or are jobs not fetched?

@soenneker
Copy link
Contributor Author

No, not tried the latest, but the merge contains nothing new.

I'm not sure I follow, it looks like a lot of changes from 1.9.3. I strongly suggest delisting the latest NuGet package!

What does it means that it doesn't complete the jobs?

I mean once the job starts processing, it doesn't finish it once all of the code has run. Additionally it looks like parallelization may be broken? Please, give it a try and you'll see what I'm talking about.

@marcoCasamento
Copy link
Owner

I'm not sure I follow, it looks like a lot of changes from 1.9.3. I strongly suggest delisting the latest NuGet package!

Sure, it contains changes from 1.9.5 and the two PR about job requeiung and changes for uniforming async use. But ok, unlisting it's the safest option here.

Please, give it a try and you'll see what I'm talking about.

I'll do for sure.

@marcoCasamento
Copy link
Owner

I can confirm that a job is being requeued after being succesfully executed, in an apparently infinite loop. Didn't spot anything about parallelization.
I think that involves the code at commit e437998, that touch both RedisFetchedJobs and FetchedJobsWatcher.
I'm going to try the code at commit immediately before and immediately after

@marcoCasamento
Copy link
Owner

It turns out that the problems arise with:
JobStorageFeatures.Transaction.RemoveFromQueue(typeof(RedisFetchedJob)), true
if RedisStorage returns true for that feature, the above problem arise.
Unfortunately I don't know what that options means in Hangfire's logics.

@Lexy2 Can you shed a light on that flag ?

@marcoCasamento
Copy link
Owner

marcoCasamento commented Jun 18, 2024

It seems that at least IWriteOnlyTransaction.RemoveFromQueue([NotNull] IFetchedJob fetchedJob) should be implemented
See https://github.com/HangfireIO/Hangfire/blob/main/src/Hangfire.Core/Storage/JobStorageTransaction.cs

public virtual void RemoveFromQueue([NotNull] IFetchedJob fetchedJob)
{
    throw JobStorageFeatures.GetNotSupportedException(JobStorageFeatures.Transaction.RemoveFromQueue(fetchedJob.GetType()));
}

@Lexy2
Copy link
Contributor

Lexy2 commented Jun 18, 2024

I'll take a look. For now I think the best way is to delist the package, revert #152 and publish a version without this particular change.

@Lexy2
Copy link
Contributor

Lexy2 commented Jun 18, 2024

Yeah, it's completely busted. I wonder how this happened, I'll take a look tomorrow.

marcoCasamento pushed a commit that referenced this issue Jun 18, 2024
…b fetchedJob). Seems to to address #153 but further investigatios needed.
@marcoCasamento
Copy link
Owner

I believe Hangfire modifies its behavior if the storage indicates support for RemoveFromQueue. It seems to depend on IWriteOnlyTransaction.RemoveFromQueue to dequeue a job once it is completed.

I've just implemented this method with:

public override void RemoveFromQueue([NotNull] IFetchedJob fetchedJob)
{
    fetchedJob.RemoveFromQueue();
}

It appears to process the jobs correctly, but further tests and documentation are required for this function.

I plan to release a pre-release version, 1.10.1-Beta1, to facilitate testing.

@marcoCasamento
Copy link
Owner

I wonder how this happened, I'll take a look tomorrow.

I wonder the same, I was sure the code was already running before the lasts merges and this problem predates it.

As you're at it, would you please check my last two commits on RedisFetchedJob ? There was probably a copy-and-paste error on RemoveFromQueue that was actually Requeuing the job (if no UseTransactions), I believe it could be implied in this issue

@Lexy2
Copy link
Contributor

Lexy2 commented Jun 19, 2024

It appears to process the jobs correctly

The jobs stay fetched, so it's not yet complete :-(

@marcoCasamento
Copy link
Owner

True. It appears it depends because of line 63 of RedisFetchedJob, the condition returns false.

if (fetchedAt == FetchedAt) { RemoveFromFetchedListAsync(transaction); }

Do you remember why this check was added ?

@Lexy2
Copy link
Contributor

Lexy2 commented Jun 19, 2024

Yes, this check is actually a fix for #135. Hangfire orders the storage to remove the timed out job, and storage should only do this if it's the same job (and not its requeued copy).

@Lexy2
Copy link
Contributor

Lexy2 commented Jun 19, 2024

Wouldn't it be easier just to remove the capability from the storage so that Hangfire would use the other methods?

@marcoCasamento
Copy link
Owner

Wouldn't it be easier just to remove the capability from the storage so that Hangfire would use the other methods?

I don't know what the RemoveFromQueue is supposed to do, I was hoping you know more. Problem is, however, that even removing the feature, the job stay fetched as the check still returns false.

@Lexy2
Copy link
Contributor

Lexy2 commented Jun 19, 2024

Yuck. I'll try to find time to look into it. Still don't understand why it worked then and does not now.

@Lexy2
Copy link
Contributor

Lexy2 commented Jun 21, 2024

Okay, so I added this
{ JobStorageFeatures.Transaction.RemoveFromQueue(typeof(RedisFetchedJob)), true } to RedisStorage.cs, but did not properly test it. Like, this was a last minute thing.

As for jobs not being removed from dequeued state, this seems to be a recent change from Hangfire.
I tested everything on 1.8.5, and it works (just if you remove this RemoveFromQueue transactional feature), but as soon as you upgrade to 1.8.12, it stops working.

Change in behaviour. I'm reviewing changes to Hangfire core to see if I can spot the issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants