Commit 27e9998

katib metrics-collector: mention supported writers (#3999)
* katib metrics-collector: mention supported writers

  See kubeflow/katib#2467

  Signed-off-by: Gary Miguel <[email protected]>

* add 'metrics' word

  Signed-off-by: Gary Miguel <[email protected]>

---------

Signed-off-by: Gary Miguel <[email protected]>
1 parent 8ad90c5 commit 27e9998

1 file changed (+9, -6 lines)

content/en/docs/components/katib/user-guides/metrics-collector.md

@@ -12,7 +12,7 @@ Before running your hyperparameter tuning Katib Experiment with Python SDK,
 ensure the namespace label `katib.kubeflow.org/metrics-collector-injection: enabled`
 is present. This label enables the sidecar container injection for pull-based metrics collectors to collect metrics during the experiment.
 
-You can configure the namespace by adding the following label `katib.kubeflow.org/metrics-collector-injection: enabled`
+You can configure the namespace by adding the following label `katib.kubeflow.org/metrics-collector-injection: enabled`
 as is shown in the sample code:
 
 ```yaml
@@ -44,7 +44,7 @@ define how Katib should collect the metrics from each Trial, such as the accurac
 
 ## Pull-based Metrics Collector
 
-Your training code can record the metrics into `StdOut` or into arbitrary output files.
+Your training code can record the metrics into `StdOut` or into arbitrary output files.
 
 To define the pull-based metrics collector for your Experiment:
 
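For the `StdOut` case, the training code only has to print metrics in a text form the collector can match. The sketch below prints space-separated `name=value` pairs and shows how a collector-side regex could pull the wanted names back out. The regex is an assumption modeled on Katib's default `name=value` filter (check the docs for your Katib version), and `format_metrics` / `parse_metrics` are hypothetical helpers, not Katib APIs.

```python
import re

# Assumption: a "name=value" pattern modeled on Katib's default StdOut filter;
# check your Katib version's docs for the exact default regex.
METRIC_PATTERN = re.compile(r"([\w|-]+)\s*=\s*([+-]?\d*(\.\d+)?([Ee][+-]?\d+)?)")


def format_metrics(metrics: dict[str, float]) -> str:
    """Training side: render metrics as space-separated name=value pairs."""
    return " ".join(f"{name}={value}" for name, value in metrics.items())


def parse_metrics(line: str, wanted: set[str]) -> dict[str, float]:
    """Collector side (hypothetical): keep only the wanted metric names."""
    found: dict[str, float] = {}
    for match in METRIC_PATTERN.finditer(line):
        name, value = match.group(1), match.group(2)
        if name not in wanted:
            continue
        try:
            found[name] = float(value)
        except ValueError:
            # The value group can match "", a bare sign, or "e5"; ignore those.
            continue
    return found
```

In a training loop, `print(format_metrics({"accuracy": acc}), flush=True)` would emit one matchable line per epoch.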
@@ -73,6 +73,9 @@ To define the pull-based metrics collector for your Experiment:
 
 - `TensorFlowEvent`: Katib collects the metrics from a directory path
 containing a [tf.Event](https://www.tensorflow.org/api_docs/python/tf/compat/v1/Event).
+These are typically written by [tensorflow.summary](https://www.tensorflow.org/api_docs/python/tf/summary).
+As of Katib 0.18, [torch.utils.tensorboard](https://pytorch.org/docs/stable/tensorboard.html) or
+[tensorboardX](https://tensorboardx.readthedocs.io/en/latest/index.html) may also be used to write metrics.
 You should specify the path in the `.source.fileSystemPath.path` field. Check the
 [TFJob example](https://github.com/kubeflow/katib/blob/ea46a7f2b73b2d316b6b7619f99eb440ede1909b/examples/v1beta1/kubeflow-training-operator/tfjob-mnist-with-summaries.yaml#L17-L23).
 The default directory path is `/var/log/katib/tfevent/`.
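The newly documented writers can be used from PyTorch code roughly as sketched below. This assumes the Experiment's objective metric name equals the scalar tag and that `.source.fileSystemPath.path` points at the same directory; `make_writer` and `log_history` are hypothetical helpers. Both `torch.utils.tensorboard` and `tensorboardX` expose a `SummaryWriter` with `add_scalar` and `flush`, and the library is only imported when `make_writer` is called.

```python
from typing import Dict, List

# Default directory read by the TensorFlowEvent collector (from the docs above).
DEFAULT_TFEVENT_DIR = "/var/log/katib/tfevent/"


def make_writer(log_dir: str = DEFAULT_TFEVENT_DIR):
    """Prefer torch.utils.tensorboard; fall back to tensorboardX."""
    try:
        # torch.utils.tensorboard also needs the `tensorboard` package installed.
        from torch.utils.tensorboard import SummaryWriter
    except ImportError:
        from tensorboardX import SummaryWriter
    # The log directory is the first positional argument in both libraries.
    return SummaryWriter(log_dir)


def log_history(writer, history: List[Dict[str, float]]) -> None:
    """Write each metric as a scalar tagged with its name, one step per epoch."""
    for step, metrics in enumerate(history):
        for name, value in metrics.items():
            writer.add_scalar(name, value, global_step=step)
    # Flush buffered events to the event file on disk.
    writer.flush()
```

A training script might call `writer = make_writer()`, then `log_history(writer, history)`, then `writer.close()`. Passing the writer in keeps `log_history` independent of which library is installed.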
@@ -110,10 +113,10 @@ To define the pull-based metrics collector for your Experiment:
 
 ## Push-based Metrics Collector
 
-Your training code needs to call [`report_metrics()`](https://github.com/kubeflow/katib/blob/e251a07cb9491e2d892db306d925dddf51cb0930/sdk/python/v1beta1/kubeflow/katib/api/report_metrics.py#L26) function in Python SDK to record metrics.
-The `report_metrics()` function works by parsing the metrics in `metrics` field into a gRPC request, automatically adding the current timestamp for users, and sending the request to Katib DB Manager.
+Your training code needs to call [`report_metrics()`](https://github.com/kubeflow/katib/blob/e251a07cb9491e2d892db306d925dddf51cb0930/sdk/python/v1beta1/kubeflow/katib/api/report_metrics.py#L26) function in Python SDK to record metrics.
+The `report_metrics()` function works by parsing the metrics in `metrics` field into a gRPC request, automatically adding the current timestamp for users, and sending the request to Katib DB Manager.
 
-But before that, `kubeflow-katib` package should be installed in your training container.
+But before that, `kubeflow-katib` package should be installed in your training container.
 
 To define the push-based metrics collector for your Experiment, you have two options:
 
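With the push-based collector, the training function reports its own metrics. The sketch below is modeled on the objective-function style used with the SDK's `tune()` call; the parameter names `a` and `b`, the formula, and the metric name `result` are hypothetical, and `result` would have to match the Experiment's objective metric name. The import sits inside the function, as in the SDK examples, so the function body is self-contained when it runs in the Trial container.

```python
def objective(parameters):
    # Import inside the function so the body is self-contained in the Trial.
    import kubeflow.katib as katib

    # Hypothetical objective: a closed-form score of two tuned parameters.
    result = 4 * int(parameters["a"]) - float(parameters["b"]) ** 2

    # Send the metric to Katib DB Manager; report_metrics() adds the timestamp.
    katib.report_metrics({"result": result})
```

Because `report_metrics()` sends a gRPC request to Katib DB Manager, this function is only meaningful inside a Trial, for example when passed to `tune()` together with `metrics_collector_config={"kind": "Push"}`.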
@@ -146,7 +149,7 @@ To define the push-based metrics collector for your Experiment, you have two opt
     max_trial_count=2,
     metrics_collector_config={"kind": "Push"},
     # When SDK is released, replace it with packages_to_install=["kubeflow-katib==0.18.0"].
-    # Currently, the training container should have `git` package to install this SDK.
+    # Currently, the training container should have `git` package to install this SDK.
     packages_to_install=["git+https://github.com/kubeflow/katib.git@master#subdirectory=sdk/python/v1beta1"],
 )
 ```
