@kanecko (Contributor) commented Nov 19, 2025

To ease and accelerate the PR submission, please use the template below to provide all required information.
Please see "Contributing to Rockstor documentation" in the Rockstor documentation.

Fixes #572

This pull request's proposal

An installation and first-dashboard setup guide for the Observability Starter Kit rock-on.

Checklist

  • With the proposed changes no Sphinx errors or warnings are generated.
  • I have added my name to the AUTHORS file, if required (descending alphabetical order).

This document provides an overview of the Observability Starter Kit rock-on, including installation instructions, prerequisites, and component descriptions.
Added detailed instructions for setting up the Observability Starter Kit, including user and group creation, share configuration, and Grafana dashboard setup.
Updated the observability starter kit documentation for clarity and consistency, including changes to image widths and text phrasing.
@kanecko kanecko marked this pull request as ready for review November 20, 2025 09:30
@phillxnet (Member) commented Nov 20, 2025

@kanecko Just a quick note here: there seems to be a new broken link in these additions:

(interface/docker-based-rock-ons/observability-starter-kit: line  103) broken    http://osk-victoria-metrics:8428 - HTTPConnectionPool(host='osk-victoria-metrics', port=8428): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x7f85373d2c30>: Failed to resolve 'osk-victoria-metrics' ([Errno -3] Temporary failure in name resolution)"))

[EDIT] I've now addressed the prior failure of this sort, i.e. an example URL being picked up as a real one and failing our link check. Take a look at the following PR to see how we look to have standardized on formatting for this type of example URL:

I.e. we code-format them. The above PR has the details of the fix now required in this branch.

@phillxnet (Member) commented

@kanecko It looks like we now have a timeout error with one of the new links introduced in this branch:

  • (interface/docker-based-rock-ons/observability-starter-kit: line 298) timeout https://opentelemetry.io/community/ - HTTPSConnectionPool(host='opentelemetry.io', port=443): Read timed out. (read timeout=10)

Could you take a look at that, i.e. does this also show up when you run the linkcheck locally? Thanks, by the way, for merging in the existing fix re the zoneminder doc entry.

@kanecko (Contributor, Author) commented Nov 21, 2025

Nope, it must be an intermittent problem. Run locally, the same link checks out fine:
(interface/docker-based-rock-ons/observability-starter-kit: line 298) ok https://opentelemetry.io/community/
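
If intermittent timeouts like this one keep recurring, Sphinx's linkcheck builder can be made more tolerant through `conf.py`. The fragment below is only a sketch: `linkcheck_timeout` and `linkcheck_retries` are standard Sphinx options, but the values are illustrative assumptions and not this repository's settings (the log above only suggests a 10-second read timeout).

```python
# conf.py fragment: illustrative values, NOT taken from this repository.

# Seconds to wait for each response before the link is reported as a timeout.
linkcheck_timeout = 30

# Number of attempts made per link before it is reported as broken.
linkcheck_retries = 3
```

Raising the retry count trades a slower check for fewer false alarms from slow external sites.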

@phillxnet added the "needs review" label (must have clean Sphinx build and readability in html) on Nov 22, 2025
@Hooverdan96 (Member) left a comment

I think this is a great writeup for setup and a tutorial on how to do one's first dashboard! I left a few comments/suggestions below.

#. Under the *SYSTEM* menu, select the *Users* menu item.
#. Add two users: osk-grafana and osk-opentelemetry. Assign the group with the same name to each of them.
#. Under the *STORAGE* menu, select the *Shares* menu item.
#. Add three shares: osk-opentelemetry-collector, osk-victoria-metrics, osk-grafana

I've noticed that the example in the Rockon installation wizard is different compared to what you have here in the documentation: osk-opentelemetry-collector.

Image

Not a big deal, but since the other two are aligned with your example in the Rockon definition, it might make sense to harmonize this one as well (either change it here or in the Rockon definition).

Comment on lines +63 to +64
#. *Example*: :code:`chown -R 472:472 /mnt2/osk-grafana`
#. *Example*: :code:`chown -R 10001:10001 /mnt2/osk-opentelemetry-collector`

Is changing the ownership setting in the WebUI after creating the share not sufficient? As it stands, the ownership shown in the UI and the actual ownership on disk differ from each other. Not a huge deal, but I want to make sure there was no specific reason it had to be done via the command line.
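
One way to compare the two approaches is to look at the on-disk ownership after each. Here is a minimal sketch, assuming the share paths and numeric IDs from the quoted `chown` examples; the helper name `ownership_matches` is hypothetical, and unlike `chown -R` it inspects only the top-level directory.

```python
import os

# Paths and uid:gid pairs copied from the quoted chown examples.
EXPECTED = {
    "/mnt2/osk-grafana": (472, 472),
    "/mnt2/osk-opentelemetry-collector": (10001, 10001),
}


def ownership_matches(path: str, uid: int, gid: int) -> bool:
    """Return True if path itself is owned by the given numeric uid and gid."""
    st = os.stat(path)
    return st.st_uid == uid and st.st_gid == gid


if __name__ == "__main__":
    for path, (uid, gid) in EXPECTED.items():
        if not os.path.exists(path):
            print(f"{path}: missing")
        elif ownership_matches(path, uid, gid):
            print(f"{path}: owned by {uid}:{gid}")
        else:
            print(f"{path}: ownership differs from {uid}:{gid}")
```

Running it once after a WebUI-only change and once after the `chown` would show whether the command line step changes anything on disk.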

#. Once logged-in, click on **Data sources**. You can find it on the left-side menu, below *Connections*.
#. Click on the big **Add data source** button.
#. Find and select "VictoriaMetrics" in the list.
#. In the HTTP **URL** field, input :code:`http://osk-victoria-metrics:`.

Suggested change
- #. In the HTTP **URL** field, input :code:`http://osk-victoria-metrics:`.
+ #. In the HTTP **URL** field, input :code:`http://osk-victoria-metrics:8428`.

That's the only way I was able to save & test the data source.
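
The difference between the two lines is simply whether a port follows the trailing colon. A small sketch using Python's standard `urllib.parse` shows the check; the helper name `has_explicit_port` is hypothetical and this is not how Grafana validates the field.

```python
from urllib.parse import urlsplit


def has_explicit_port(url: str) -> bool:
    """Return True if the URL names a numeric port after the host."""
    try:
        return urlsplit(url).port is not None
    except ValueError:
        # Raised when the port is non-numeric or out of range.
        return False


if __name__ == "__main__":
    for candidate in (
        "http://osk-victoria-metrics:",      # original doc line
        "http://osk-victoria-metrics:8428",  # suggested change
    ):
        print(candidate, has_explicit_port(candidate))
```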

#. Select the "Code" option. You will find it at the bottom, next to the "Run queries" button.
#. Replace the current metric with the following expression: ``rate(system.disk.io[$__interval])``
#. Change the unit from "Data / bytes (IEC)" to "Data rate / bytes/sec(IEC)"
#. Scroll up to the "Tooltip" section, and select the "All" option, and then the "Descending" option.

Suggested change
- #. Scroll up to the "Tooltip" section, and select the "All" option, and then the "Descending" option.
+ #. Scroll up to the "Tooltip" section, and select the "All" option, and then the "Descending" option.
+ #. Click on the "Run queries" button to update the dashboard, if it hasn't automatically already.

Just to be consistent
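
As background for the query and unit steps quoted in this thread: applying `rate(...)` to a cumulative byte counter yields bytes per second, which is why the unit changes to a data rate. The sketch below is a deliberately simplified illustration, assuming `system.disk.io` is a cumulative byte counter; the function name `simple_rate` is hypothetical, and the real `rate()` also handles counter resets and window boundaries, which this does not.

```python
def simple_rate(samples: list[tuple[float, float]]) -> float:
    """Average per-second increase between the first and last sample.

    samples: (unix_timestamp_seconds, counter_value) pairs, oldest first.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("timestamps must increase")
    return (v1 - v0) / (t1 - t0)


if __name__ == "__main__":
    # Made-up samples: 3000 bytes transferred over a 60-second window.
    window = [(0.0, 1000.0), (30.0, 2500.0), (60.0, 4000.0)]
    print(simple_rate(window), "bytes/sec")
```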

The time window in this case is denoted by ``[$__interval]`` which is bound to the time range
that you can select above the chart (next to the "Refresh" button).

The changed tooltip mode has enabled us to mouse-over the chart to see a list of current values of all devices.
@Hooverdan96 (Member) commented Dec 16, 2025

Since this is not an NVMe-based system, my legend has some awfully long names. Is there a setting that I missed to reduce that to the device name only?

Image

I guess this can be done by changing the label in the query options from auto to custom, with something like: {{device}}.

Image

It might be worthwhile to add that in as well, to reflect the label format you are showing in the screenshots.
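
To illustrate what a custom `{{device}}` legend does conceptually, here is a small sketch of placeholder substitution. The function `render_legend` and the sample label values are hypothetical, and rendering unknown labels as an empty string is an assumption of this sketch rather than documented Grafana behaviour.

```python
import re


def render_legend(template: str, labels: dict[str, str]) -> str:
    """Replace each {{name}} placeholder with the matching label value."""

    def substitute(match: re.Match) -> str:
        # Assumption of this sketch: unknown labels render as an empty string.
        return labels.get(match.group(1), "")

    return re.sub(r"\{\{\s*([\w.]+)\s*\}\}", substitute, template)


if __name__ == "__main__":
    # Hypothetical label set for one disk series.
    series_labels = {"device": "sda", "direction": "read"}
    print(render_legend("{{device}}", series_labels))
    print(render_legend("{{device}} ({{direction}})", series_labels))
```

With a template of just `{{device}}`, every other label is dropped from the legend entry, which is what shortens the long auto-generated names.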

Successfully merging this pull request may close these issues.

Observability Starter Kit writeup