 msgstr ""
 "Project-Id-Version: InternLM \n"
 "Report-Msgid-Bugs-To: \n"
-"POT-Creation-Date: 2024-08-30 16:07+0800\n"
+"POT-Creation-Date: 2024-11-20 15:01+0800\n"
 "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
 "Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
 "Language: en\n"
@@ -16,7 +16,7 @@ msgstr ""
 "MIME-Version: 1.0\n"
 "Content-Type: text/plain; charset=utf-8\n"
 "Content-Transfer-Encoding: 8bit\n"
-"Generated-By: Babel 2.15.0\n"
+"Generated-By: Babel 2.14.0\n"

 #: ../../source/monitor.rst:2
 msgid "监控和告警"
@@ -56,25 +56,12 @@ msgstr ""
 "``internlm.monitor.alert.send_feishu_msg_with_webhook()``."

 #: ../../source/monitor.rst:25
-msgid "轻量监控"
-msgstr "Light Monitoring"
+msgid "监控告警配置"
+msgstr "Monitor Config"

-#: ../../source/monitor.rst:27
+#: ../../source/monitor.rst:28
 msgid ""
-"InternEvo轻量级监控工具采用心跳机制实时监测训练过程中的各项指标,如loss、grad_norm、训练阶段的耗时等。同时,InternEvo还可以通过"
-" `grafana dashboard <https://grafana.com/grafana/dashboards/>`_ "
-"直观地呈现这些指标信息,以便用户进行更加全面和深入的训练分析。"
-msgstr ""
-"The InternEvo light monitoring tool employs a heartbeat mechanism to "
-"real-time monitor various metrics during the training process, such as "
-"loss, grad_norm, and training phase duration. Additionally, InternEvo can"
-" present these metric details through a `grafana dashboard "
-"<https://grafana.com/grafana/dashboards/>`_, allowing users to conduct "
-"more comprehensive and in-depth training analysis in an intuitive manner."
-
-#: ../../source/monitor.rst:29
-msgid ""
-"轻量监控的配置由配置文件中的 ``monitor`` 字段指定, 用户可以通过修改配置文件 `config file "
+"配置由配置文件中的 ``monitor`` 字段指定, 用户可以通过修改配置文件 `config file "
 "<https://github.com/InternLM/InternEvo/blob/develop/configs/7B_sft.py>`_ "
 "来更改监控配置。以下是一个监控配置的示例:"
 msgstr ""
@@ -84,23 +71,17 @@ msgstr ""
 "<https://github.com/InternLM/InternEvo/blob/develop/configs/7B_sft.py>`_."
 " Here is an example of a monitoring configuration:"

-#: ../../source/monitor.rst:42
+#: ../../source/monitor.rst:40
 msgid "enable_feishu_alert (bool):是否启用飞书告警。默认值:False。"
 msgstr "enable_feishu_alert: Whether to enable Feishu alerts. Defaults: False."

-#: ../../source/monitor.rst:43
+#: ../../source/monitor.rst:41
 msgid "feishu_alert_address (str):飞书告警的 Webhook 地址。默认值:None。"
 msgstr ""
 "feishu_alert_address: The webhook address for Feishu alerts. Defaults: "
 "None."

-#: ../../source/monitor.rst:44
-msgid "light_monitor_address (str):轻量监控的地址。默认值:None。"
-msgstr ""
-"light_monitor_address: The address for lightweight monitoring. Defaults: "
-"None."
-
-#: ../../source/monitor.rst:45
+#: ../../source/monitor.rst:42
 msgid "alert_file_path (str):告警存储路径。默认值:None。"
 msgstr "alert_file_path: path of alert. Defaults: None."

@@ -213,60 +194,3 @@ msgstr "alert_file_path: path of alert. Defaults: None."

 #~ msgid "示例"
 #~ msgstr "Example"
-
-#~ msgid ""
-#~ "Initialize the monitoring module with "
-#~ "the default address ``initialize_light_monitor()``"
-#~ msgstr ""
-
-#~ msgid "Send a heartbeat message to a monitoring server."
-#~ msgstr ""
-
-#~ msgid ""
-#~ "The type of heartbeat message, e.g., "
-#~ "\"train_metrics\", \"init_time\", \"stage_time\"."
-#~ msgstr ""
-
-#~ msgid "A dictionary containing message data to be included in the heartbeat."
-#~ msgstr ""
-
-#~ msgid ""
-#~ "Sending a heartbeat message for training"
-#~ " metrics ``send_heartbeat(\"train_metrics\", {\"loss\":"
-#~ " 0.1, \"accuracy\": 0.95})``"
-#~ msgstr ""
-
-#~ msgid ""
-#~ "Sending a heartbeat message for "
-#~ "initialization time ``send_heartbeat(\"init_time\", "
-#~ "{\"import_time\": 0.25})``"
-#~ msgstr ""
-
-#~ msgid ""
-#~ "Sending a heartbeat message for stage"
-#~ " time ``send_heartbeat(\"stage_time\", {\"fwd_time\":"
-#~ " 2.3, \"bwd_time\": 6.2})``"
-#~ msgstr ""
-
-#~ msgid ""
-#~ "InternEvo 使用 "
-#~ "``internlm.monitor.alert.initialize_light_monitor`` "
-#~ "来初始化轻量监控客户端。一旦初始化完成,它会建立与监控服务器的连接。在训练过程中,使用 "
-#~ "``internlm.monitor.alert.send_heartbeat`` "
-#~ "来发送不同类型的心跳信息至监控服务器。监控服务器会根据这些心跳信息来检测训练是否出现异常,并在需要时发送警报消息。"
-#~ msgstr ""
-#~ "InternEvo uses "
-#~ "``internlm.monitor.alert.initialize_light_monitor`` to "
-#~ "initialize the lightweight monitoring client."
-#~ " Once initialization is complete, it "
-#~ "establishes a connection with the "
-#~ "monitoring server. During the training "
-#~ "process, it uses "
-#~ "``internlm.monitor.alert.send_heartbeat`` to send "
-#~ "various types of heartbeat messages to"
-#~ " the monitoring server. The monitoring "
-#~ "server uses these heartbeat messages to"
-#~ " detect if the training encounters "
-#~ "any abnormalities and sends alert "
-#~ "messages as needed."
-