-
-
Notifications
You must be signed in to change notification settings - Fork 292
export RQ statistics as prometheus metrics #666
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
288bbf5
to
30e3eb4
Compare
79f6394
to
70f07f8
Compare
de7de29
to
84c84bf
Compare
django_rq/metrics_collector.py
Outdated
|
||
with self.summary.time(): | ||
rq_workers = GaugeMetricFamily('rq_workers', 'RQ workers', labels=['name', 'state', 'queues']) | ||
rq_workers_success = CounterMetricFamily('rq_workers_success', 'RQ workers success count', labels=['name', 'queues']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you rename rq_workers_success
to successful_job_count
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought the names should all be prefixed by "rq_", but I agree this is about the jobs not the workers so I renamed it to rq_job_successful_total
. I used that prefix to keep both success/failure counts to be prefixed by "rq_job_" and used "total" instead of "count" because Prometheus' naming conventions suggest that the unit should be a suffix and "total" is preferred over count:
See: https://prometheus.io/docs/practices/naming/ & prometheus/docs#2465 (comment)
django_rq/metrics_collector.py
Outdated
rq_workers = GaugeMetricFamily('rq_workers', 'RQ workers', labels=['name', 'state', 'queues']) | ||
rq_workers_success = CounterMetricFamily('rq_workers_success', 'RQ workers success count', labels=['name', 'queues']) | ||
rq_workers_failed = CounterMetricFamily('rq_workers_failed', 'RQ workers fail count', labels=['name', 'queues']) | ||
rq_workers_working_time = CounterMetricFamily('rq_workers_working_time', 'RQ workers spent seconds', labels=['name', 'queues']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
total_working_time
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the same reasons in the thread #666 (comment) I went with rq_working_seconds_total
I don't mind merging this in, but could you document this feature? A simple unit test to show that this view works would also be great. |
84c84bf
to
8066be6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't mind merging this in, but could you document this feature? A simple unit test to show that this view works would also be great.
Sorry for the delay getting this updated, but here it is. The tests took a bit of finagling to get working, but I hope this is acceptable (running tests twice).
You can let me know if you have a strong preference about the metric names, but I reviewed what I put and adjusted them to match the Prometheus naming conventions (they were close but could use some updating), and I factored in the comments you made.
django_rq/metrics_collector.py
Outdated
|
||
with self.summary.time(): | ||
rq_workers = GaugeMetricFamily('rq_workers', 'RQ workers', labels=['name', 'state', 'queues']) | ||
rq_workers_success = CounterMetricFamily('rq_workers_success', 'RQ workers success count', labels=['name', 'queues']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought the names should all be prefixed by "rq_", but I agree this is about the jobs not the workers so I renamed it to rq_job_successful_total
. I used that prefix to keep both success/failure counts to be prefixed by "rq_job_" and used "total" instead of "count" because Prometheus' naming conventions suggest that the unit should be a suffix and "total" is preferred over count:
See: https://prometheus.io/docs/practices/naming/ & prometheus/docs#2465 (comment)
django_rq/metrics_collector.py
Outdated
rq_workers = GaugeMetricFamily('rq_workers', 'RQ workers', labels=['name', 'state', 'queues']) | ||
rq_workers_success = CounterMetricFamily('rq_workers_success', 'RQ workers success count', labels=['name', 'queues']) | ||
rq_workers_failed = CounterMetricFamily('rq_workers_failed', 'RQ workers fail count', labels=['name', 'queues']) | ||
rq_workers_working_time = CounterMetricFamily('rq_workers_working_time', 'RQ workers spent seconds', labels=['name', 'queues']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the same reasons in the thread #666 (comment) I went with rq_working_seconds_total
3dda38a
to
0463893
Compare
It looks like the test failures are related to the fix I added in #700 |
@terencehonles sorry I released version 3.0.1 in a rush because Django 5.2 is out and it's breaking admin pages in various projects. Do you mind pulling in master to this PR? I'll make a follow up release for this feature. |
0463893
to
10192f5
Compare
No worries, I understand. I rebased just now |
It looks like #704 broke the workflow file, I can possibly look at it tomorrow, but will be offline for the night |
10192f5
to
de17a2d
Compare
Looks like it was actually a bad merge on git's part since the indentation of the workflow file changed and the line it was reporting wan't the line that broke the indentation it was the container of that line. |
de17a2d
to
39c9999
Compare
39c9999
to
5ebc28c
Compare
The test failures appear to be because of redis/redis-py#3614 / redis/redis-py#3622 and I fixed with #706, I had to update my test environment locally in order to reproduce the error. |
django_rq/views.py
Outdated
@@ -49,6 +58,25 @@ def stats_json(request, token=None): | |||
) | |||
|
|||
|
|||
@never_cache | |||
@staff_member_required | |||
def prometheus_metrics(request): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mind moving this to a different view file? (probably stats_views.py
). Since views.py
is becoming large, I want to split the current views.py
into workers_view.py
, queues_views.py
etc.
This change follows the implementation of
rq_exporter
and the prometheus client to export the RQ statistics as a Django view.fixes: #503