Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(#144): additional configuration for cht-user-management #146

Merged
merged 3 commits into from
Mar 19, 2025

Conversation

kennsippell
Copy link
Member

@kennsippell kennsippell commented Mar 12, 2025

#144

cc @ernestoteo since he was working on UMT grafana monitoring

@kennsippell kennsippell changed the title Additional Configuration - cht-user-management feat(#144): additional configuration for cht-user-management Mar 12, 2025
@@ -0,0 +1,7 @@
scrape_configs:
- job_name: cht-user-management-metrics
metrics_path: /metrics
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default path is /metrics so we can drop this for simplicity:

Suggested change
metrics_path: /metrics

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

per other comment, still seeing metrics_path: /metrics in there - so assume local changes weren't pushed

@mrjones-plip
Copy link
Contributor

mrjones-plip commented Mar 13, 2025

@kennsippell - yay! Love the simplicity of the request here - just enough, not too much. Thanks for opening the PR!

That said there's a disconnect in that it assumes metrics are on /metrics and they're actually on two different URLs of /bullmq-metrics AND /fastify-metrics. If it's not too late (I know, UMT code as already shipped), I suggest we actually make two changes to the upstream UMT repo to compliment this watchdog PR:

  1. concat both outputs to one URL instead having two separate ones
  2. make that URL be /metrics

The reason is that the two URLs introduce a lot of complexity in how you monitor them. Further, Prometheus assumes that you're going to use /metrics and it's a minor hassle to keep track of which custom URLs apps use when then deviate from the norm.

(sorry - forgot to include this summary on my request for changes)

@kennsippell
Copy link
Member Author

concat both outputs to one URL instead having two separate ones

I didn't realise this was the best practice for prometheus, but you're right. I'll make that change.

Copy link
Contributor

@mrjones-plip mrjones-plip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey again! I wanted to actually test the dashboard you created in the PR. Once installed, the dashboard works great - nice work!

That said, I had some issues getting the compose file to work with the new dashboard json file and made a suggestion on how to simplify things.

Copy link
Contributor

@mrjones-plip mrjones-plip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello again! I did some more testing to ensure two things:

  1. system scales with N instances
  2. all panels are getting data

system scales with N instances

For # 2 - works great! I stood up a second UMT instance on port 3501 and put it in the scrape.yml like this:

scrape_configs:
  - job_name: cht-user-management-metrics
    metrics_path: /fastify-metrics
    static_configs:
      - targets:
         - 172.17.0.1:3500
         - 172.17.0.1:3501

And I was able to see the two discretely in watchdog:

image

However, this made me realize that we're going to be editing the scrape.yml by definition to add which ever instance we want to monitor. This means if there's a change to it upstream (like metrics_path: /fastify-metrics -> metrics_path: /metrics) there will be a conflict in this file and you won't be able to do a git pull origin or similar call.

I suggest we move scrape.yml file to be something like ./exporters/cht-user-management/scrape-dist.yml and then we instruct people to copy it.

all panels are getting data

There's a number of panels that are not getting data. The top half is missing data for:

  1. Transferred
  2. Transfer Rate
  3. Number of Open connections

image

On the bottom half:

  1. Response Latency (90th Percentile), Top 5 request Duration and Top 5 request count all only ever showed /fastify-metrics . In a worst case this could mean that all the stats that the /fastify-metrics URL is gathering ONLY show data for the /fastify-metrics endpoint? In a best case it's just these 3 panels. We should double check that all these stats monitor the entire holistic UMT app and not just the one Fastify endpiont
  2. Top Error Duration has no data - this may be correct because there's no errors?
  3. Top Error Count has no data - this may be correct because there's no errors?
  4. Response Size has no data

image

@kennsippell
Copy link
Member Author

I suggest we move scrape.yml file to be something like ./exporters/cht-user-management/scrape-dist.yml and then we instruct people to copy it

I had been proposing here scrape-custom.yml to match the naming and setup procedure from the postgres exporter.

Further, Prometheus assumes that you're going to use /metrics and it's a minor hassle to keep track of which custom URLs apps use

Ready here medic/cht-user-management#278

all panels are getting data

There appears to be a bug in fastify-metrics around default argument values. Workaround is that we need to explicitly set defaults. I also think we were setting summary: false which I don't think is necessary. Good catch thanks!

@mrjones-plip
Copy link
Contributor

I had been proposing here scrape-custom.yml to match the naming and setup procedure from the postgres exporter.

Ah - sure enough - we do indeed tell folks to edit the revision controlled files. Ok - let's fix this another time then. OK to leave as is for now!

Copy link
Contributor

@mrjones-plip mrjones-plip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you have local changes not yet pushed! Spot checking, I see nothing has changed.

@@ -0,0 +1,7 @@
scrape_configs:
- job_name: cht-user-management-metrics
metrics_path: /metrics
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

per other comment, still seeing metrics_path: /metrics in there - so assume local changes weren't pushed

Copy link
Contributor

@mrjones-plip mrjones-plip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking good! Just one quick file name change to match things up.

Copy link
Contributor

@mrjones-plip mrjones-plip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is good to go! Thanks for iterating with me.

We should merge this and the other UMT PR!

@kennsippell kennsippell merged commit 8cf360b into main Mar 19, 2025
5 checks passed
@kennsippell kennsippell deleted the 144-umt branch March 19, 2025 08:57
@medic-ci
Copy link
Collaborator

🎉 This PR is included in version 1.18.0 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants