
Searching for job ID fails on Heroku due to 30s request limit #1618

Open
francois opened this issue Mar 12, 2025 · 2 comments · May be fixed by #1619

Comments

@francois
Contributor

We are on 4.9.3 and have run good_job:update and db:migrate.

On the dashboard, searching for a job ID fails on Heroku because the query against our 120k-row good_jobs table takes far longer than 30 seconds to complete. Of course, it is possible to open any job, then hack the URL and replace the UUID with the job ID we have on hand. This works, and takes less than 30s, but it is error-prone.

The actual query that runs is:

SELECT "good_jobs"."id", "good_jobs"."queue_name", "good_jobs"."priority",
       "good_jobs"."serialized_params", "good_jobs"."scheduled_at",
       "good_jobs"."performed_at", "good_jobs"."finished_at", "good_jobs"."error",
       "good_jobs"."created_at", "good_jobs"."updated_at", "good_jobs"."active_job_id",
       "good_jobs"."concurrency_key", "good_jobs"."cron_key", "good_jobs"."cron_at",
       "good_jobs"."batch_id", "good_jobs"."batch_callback_id",
       "good_jobs"."executions_count", "good_jobs"."job_class",
       "good_jobs"."error_event", "good_jobs"."labels",
       "good_jobs"."locked_by_id", "good_jobs"."locked_at"
FROM "good_jobs"
WHERE ((to_tsvector('english', id::text)
     || to_tsvector('english', COALESCE(active_job_id::text, ''))
     || to_tsvector('english', serialized_params)
     || to_tsvector('english', COALESCE(serialized_params->>'arguments', ''))
     || to_tsvector('english', COALESCE(error, ''))
     || to_tsvector('english', COALESCE(array_to_string(labels, ' '), '')))
    @@ websearch_to_tsquery('38979b0b-af58-4bf4-969e-017b10fa3bb7'))
ORDER BY ts_rank((to_tsvector('english', id::text)
     || to_tsvector('english', COALESCE(active_job_id::text, ''))
     || to_tsvector('english', serialized_params)
     || to_tsvector('english', COALESCE(serialized_params->>'arguments', ''))
     || to_tsvector('english', COALESCE(error, ''))
     || to_tsvector('english', COALESCE(array_to_string(labels, ' '), ''))),
    websearch_to_tsquery('38979b0b-af58-4bf4-969e-017b10fa3bb7')) DESC,
  created_at DESC, id DESC
LIMIT 25

In our database, this means scanning 125,000 rows, calling to_tsvector() on six expressions per row, and converting JSONB, UUID and TEXT columns back to TEXT. That's almost a million to_tsvector calls!

There are two ways forward:

  1. Add a full-text search index
  2. Teach JobsFilter to recognize a "common" case (a bare UUID) and issue a more performant query for it

I'm worried that adding an index may negatively impact performance. We can't forget that GoodJob's tables live inside the application's own database, and if GoodJob consumes more CPU/RAM/disk than the application's data, then GoodJob would be considered a bad citizen.
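The UUID fast path (option 2) could look something like this. This is only a sketch: `UUID_PATTERN` and `uuid_search_term?` are names I made up, not GoodJob's actual JobsFilter internals.

```ruby
# Sketch of the "recognize a UUID" fast path (option 2). A search term that
# is a bare UUID can skip full-text search entirely and instead be matched
# directly against the id / active_job_id columns, which are indexed.

# \h matches a hex digit; the anchors ensure the *whole* term is a UUID,
# e.g. "38979b0b-af58-4bf4-969e-017b10fa3bb7".
UUID_PATTERN = /\A\h{8}-\h{4}-\h{4}-\h{4}-\h{12}\z/

def uuid_search_term?(term)
  !!term&.strip&.match?(UUID_PATTERN)
end
```

A filter could then branch on it, e.g. `uuid_search_term?(q) ? jobs.where(id: q.strip).or(jobs.where(active_job_id: q.strip)) : jobs.search_text(q)` (relation and method names hypothetical).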

francois added a commit to francois/good_job that referenced this issue Mar 12, 2025
Because there are no indices for full-text search, searching for a job
ID takes an inordinate amount of time. This commit examines the search
query and if the only item is a UUID, it specifically searches for a job
matching that ID.

Fixes bensheldon#1618
@francois
Contributor Author

On our production cluster, we preserve job records for only 4 days. That is enough to cover Saturday, Sunday and Monday when we come back on a Tuesday after a 3-day weekend.

Search is a really interesting feature, but unfortunately it isn't quite usable for us at the moment.

@francois
Contributor Author

francois commented Mar 12, 2025

Ick, array_to_string is only STABLE, not IMMUTABLE. A full-text search index cannot be built on the existing expression; the query would have to change to support one.

# create index index_good_jobs_full_text on good_jobs using gin (
    ((to_tsvector('english', id::text)
   || to_tsvector('english', COALESCE(active_job_id::text, ''))
   || to_tsvector('english', serialized_params)
   || to_tsvector('english', COALESCE(serialized_params->>'arguments', ''))
   || to_tsvector('english', COALESCE(error, ''))
   || to_tsvector('english', COALESCE(array_to_string(labels, ' '), ''))))
  );
ERROR:  functions in index expression must be marked IMMUTABLE

# \df+ array_to_string
                                                                                                                    List of functions
   Schema   |      Name       | Result data type | Argument data types  | Type | Volatility | Parallel |  Owner   | Security | Access privileges | Language |   Internal name    |                              Description
------------+-----------------+------------------+----------------------+------+------------+----------+----------+----------+-------------------+----------+--------------------+------------------------------------------------------------------------
 pg_catalog | array_to_string | text             | anyarray, text       | func | stable     | safe     | postgres | invoker  |                   | internal | array_to_text      | concatenate array elements, using delimiter, into text
 pg_catalog | array_to_string | text             | anyarray, text, text | func | stable     | safe     | postgres | invoker  |                   | internal | array_to_text_null | concatenate array elements, using delimiter and null string, into text
(2 rows)
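For what it's worth, the usual PostgreSQL workaround (not something GoodJob ships today) is to wrap array_to_string in a SQL function explicitly declared IMMUTABLE, which is safe for text[] because the output depends only on the input values, and then build the index on the wrapper. The function name below is mine, and the search query itself would have to call the same wrapper for the planner to use the index:

```sql
-- Hypothetical workaround sketch: an IMMUTABLE wrapper around the
-- STABLE array_to_string, so it can appear in a GIN index expression.
CREATE FUNCTION immutable_array_to_string(text[], text)
RETURNS text
LANGUAGE sql IMMUTABLE PARALLEL SAFE
AS $$ SELECT array_to_string($1, $2) $$;

-- The index expression would then use
--   COALESCE(immutable_array_to_string(labels, ' '), '')
-- in place of the array_to_string call that triggered the ERROR above.
```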

I was curious to see how large the index would be. Original database size:

# select pg_size_pretty(pg_total_relation_size('good_jobs'));
 pg_size_pretty
----------------
 109 MB

After adding the index, but without labels:

# select pg_size_pretty(pg_total_relation_size('good_jobs'));
 pg_size_pretty
----------------
 159 MB
# select count(*) from good_jobs;
 count
--------
 133982

We don't use labels yet.
