
How to start batches from cron jobs? #1611

Open
francois opened this issue Feb 26, 2025 · 2 comments

@francois
Contributor

Hi! We are migrating existing jobs from Heroku Scheduler to GoodJob cron. We already have two cases where the job started from cron enqueues a bunch of other jobs. One example is a job that examines all orders and sets tags appropriately: the main job finds the orders that need to be updated, then enqueues a bunch of jobs to do the actual work. Ideally, we would like a batch around the N+1 jobs.

What we currently have is a job we call the Launcher job that creates a batch, then adds the main job:

class RebuildOrderTagsLauncherJob < ApplicationJob
  def perform
    batch = GoodJob::Batch.new
    batch.add(RebuildOrderOrderTagsJob.new)
    batch.enqueue
  end
end

class RebuildOrderOrderTagsJob < ApplicationJob
  include GoodJob::ActiveJobExtensions::Batches # GoodJob Batch extension

  def perform
    orders = find_orders_with_tags_that_need_to_change
    orders.each_slice(50).each_with_index do |slice, index|
      batch.add(RebuildOrderTagsJob.new(slice).set(wait: 10.seconds * index))
    end
  end
end

class RebuildOrderTagsJob < ApplicationJob
  def perform(order_ids)
    # do the work
  end
end

We were wondering if the cron entry's class could be a lambda/proc so we could build the batch from there. Reading the code, it looks like we can pass a lambda, but the lambda must still return a job class; it can't enqueue the job directly.
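
For reference, the cron entry that starts the launcher currently looks roughly like this (sketch; the schedule and entry name are illustrative):

Rails.application.configure do
  config.good_job.enable_cron = true
  config.good_job.cron = {
    rebuild_order_tags: {
      cron: "0 6 * * *", # illustrative schedule
      class: "RebuildOrderTagsLauncherJob",
    },
  }
end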

The other thing we tried was to build the batch in the job, then add the current job to the batch:

class RebuildOrderOrderTagsJob < ApplicationJob
  include GoodJob::ActiveJobExtensions::Batches # GoodJob Batch extension

  def perform
    self.batch = GoodJob::Batch.new
    batch.add(self)

    orders = find_orders_with_tags_that_need_to_change
    orders.each_slice(50).each_with_index do |slice, index|
      batch.add(RebuildOrderTagsJob.new(slice).set(wait: 10.seconds * index))
    end
  end
end

class RebuildOrderTagsJob < ApplicationJob
  def perform(order_ids)
    # do the work
  end
end

This crashes with a NoMethodError on #batch=, which frankly was kind of expected.

@bensheldon, do you have other suggestions on how we could achieve our goal? Thanks a bunch!

@bensheldon
Copy link
Owner

Interesting! Why are you not adding all the jobs to the batch immediately, e.g. with only 2 jobs instead of 3:

class RebuildOrderTagsLauncherJob < ApplicationJob
  def perform
    orders = find_orders_with_tags_that_need_to_change
    
    batch = GoodJob::Batch.new
    orders.each_slice(50).each_with_index do |slice, index|
      batch.add(RebuildOrderTagsJob.new(slice).set(wait: 10.seconds * index))
    end
    batch.enqueue
  end
end

class RebuildOrderTagsJob < ApplicationJob
  def perform(order_ids)
    # do the work
  end
end

I also think it could/should work to add the currently-executing job to a batch; that seems like a reasonable feature.

Lastly, I think maybe what you really want here is an iterable job, rather than necessarily a batch. I say that because I see that you're doing job smearing (scheduling at increments into the future), which is generally discouraged as suboptimal, e.g.:

class RebuildOrderTagsIteratingJob < ApplicationJob
  def perform
    next_orders = find_orders_with_tags_that_need_to_change.limit(50)
    if next_orders.any?
      next_orders.each { |order| process_order(order) }

      retry_job # <-- that's what makes it iterating
    else
      # everything is processed, allow to finish
    end
  end
end

@francois
Contributor Author

Why are you not adding all the jobs to the batch immediately, e.g. with only 2 jobs instead of 3:

Because I would have preferred the first job to be in the batch too. It's not a requirement, just a nice-to-have. With the launcher job, I can add the top-level job to the batch. I might be overthinking this.

I think maybe what you really want here is an iterable job, rather than necessarily a batch.

I didn't know the name "iterable job". I have used this pattern before, and you're right that it might be a better fit here. Is that ActiveJob::Exceptions#retry_job? That method does not accept new job arguments, so I must re-execute with the exact same parameters.
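
For reference, my understanding is that retry_job re-enqueues the job with its original arguments and only takes scheduling options such as wait:, queue:, or priority: (a minimal sketch, not code from our app):

class RebuildOrderTagsIteratingJob < ApplicationJob
  def perform
    # ...process one slice of orders...

    # retry_job re-enqueues this job with the SAME arguments;
    # only scheduling options (wait:, queue:, priority:) can be changed.
    retry_job(wait: 10.seconds)
  end
end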

I would switch to this pattern instead:

class RebuildOrderTagsIteratingJob < ApplicationJob
  include GoodJob::ActiveJobExtensions::Batches # GoodJob Batch extension

  def perform(last_seen_id)
    next_orders = find_orders_with_tags_that_need_to_change(last_seen_id).to_a
    next_orders.each { |order| process_order(order) }
    return if next_orders.empty?

    batch.add(
      RebuildOrderTagsIteratingJob
        .new(next_orders.map(&:id).max) # resume after the highest id processed so far
        .set(wait: 10.seconds)
    )
  end

  def find_orders_with_tags_that_need_to_change(last_seen_id)
    scope = Order.order(:id)
    scope = scope.where('id > ?', last_seen_id) if last_seen_id
    # more filtering logic
    scope.limit(50) # process one slice of 50 orders per iteration
  end
end

The disadvantage is that the expensive #find_orders_with_tags_that_need_to_change query runs multiple times (once per slice), instead of once at the beginning of the batch.
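
If that ever becomes a problem, one option would be to run the expensive query once and pass the remaining ids along between iterations (a rough sketch, assuming the full id list is small enough to serialize as a job argument; find_orders_with_tags_that_need_to_change takes no argument in this version, and process_order is the same placeholder as above):

class RebuildOrderTagsIteratingJob < ApplicationJob
  def perform(remaining_ids = nil)
    # Run the expensive query only on the first iteration.
    remaining_ids ||= find_orders_with_tags_that_need_to_change.pluck(:id)
    return if remaining_ids.empty?

    Order.where(id: remaining_ids.first(50)).find_each { |order| process_order(order) }

    # Hand the rest of the ids to the next iteration.
    rest = remaining_ids.drop(50)
    self.class.set(wait: 10.seconds).perform_later(rest) if rest.any?
  end
end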

Thanks for helping me think this through. Feel free to close this issue, or keep it open if you want to track being able to add the current job to a newly created batch.
