Add an event-handling flow for node state changes #191

Open
Ashlee1994 opened this issue Sep 28, 2023 · 1 comment
Labels
enhancement New feature or request

Comments

Ashlee1994 (Member) commented Sep 28, 2023

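We need to add a feature for changing node state, and to store node state-change information in the database.

The following functionality needs to be implemented:

  1. Modify node state: once a node's state is changed to drain or down, no tasks are submitted to that node.
  2. Restore node state: a node can be restored to the idle state, re-enter the task resource pool, and run tasks again.
  3. View why a node cannot run jobs: jobs cannot run on a node in the drain or down state, and the reason is displayed.

For the specific modification approach, see the figure below: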
image
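
To make the three requirements above concrete, here is a minimal sketch of the node state model. All names (`NodeState`, `Node`, `set_unavailable`, `resume`, `schedulable_nodes`) are hypothetical and not taken from the existing code base. Requiring a non-empty reason for drain/down is also an assumption, chosen so that requirement 3 always has something to display.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class NodeState(Enum):
    IDLE = "idle"
    DRAIN = "drain"
    DOWN = "down"


# States in which a node must not receive new tasks (requirement 1).
UNSCHEDULABLE = {NodeState.DRAIN, NodeState.DOWN}


@dataclass
class Node:
    name: str
    state: NodeState = NodeState.IDLE
    reason: str = ""  # why the node cannot run jobs; empty while idle

    def set_unavailable(self, state: NodeState, reason: str) -> None:
        """Move the node to drain or down and remember why."""
        if state not in UNSCHEDULABLE:
            raise ValueError("use resume() to return a node to idle")
        if not reason:
            # Assumption: a reason is mandatory for drain/down.
            raise ValueError("a reason is required for drain/down")
        self.state = state
        self.reason = reason

    def resume(self) -> None:
        """Return the node to idle so it re-enters the resource pool (requirement 2)."""
        self.state = NodeState.IDLE
        self.reason = ""

    def is_schedulable(self) -> bool:
        return self.state not in UNSCHEDULABLE

    def why_unavailable(self) -> Optional[str]:
        """Return the displayable reason, or None if the node can run jobs (requirement 3)."""
        if self.is_schedulable():
            return None
        return f"{self.state.value}: {self.reason}"


def schedulable_nodes(nodes: List[Node]) -> List[Node]:
    """Filter the pool down to nodes that may receive new tasks."""
    return [n for n in nodes if n.is_schedulable()]
```

For example, after `node.set_unavailable(NodeState.DRAIN, "disk replacement")` the node is dropped by `schedulable_nodes` until `node.resume()` is called.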

Ashlee1994 (Member, Author) commented

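An event table needs to be created in the current system; the Slurm table design can be used as a reference:

slurm event_table: records cluster node event information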
{ "time_start", "bigint unsigned not null" }, // Start of period
{ "time_end", "bigint unsigned default 0 not null" }, // End of period
{ "node_name", "tinytext default '' not null" }, // Name of node (only set in a node event)
{ "cluster_nodes", "text not null default ''" }, // Node list in cluster during time period (only set in a cluster event)
{ "reason", "tinytext not null" }, // Reason the node is in this state during time period (only set in a node event)
{ "reason_uid", "int unsigned default 0xfffffffe not null" }, // uid of the user who set the reason
{ "state", "int unsigned default 0 not null" }, // State of node during time period (only set in a node event)
{ "tres", "text not null default ''" }, // TRES touched by this event
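
As a rough illustration of how such a table could be used, the sketch below mirrors the columns in SQLite (chosen only so the example runs without a server; the MySQL types are mapped to SQLite `INTEGER`/`TEXT`). The helper names `open_db`, `record_node_event`, and `close_node_event` are hypothetical, the integer state codes are placeholders, and treating `time_end = 0` as "event still open" is an assumption based on the column default.

```python
import sqlite3
import time

# Columns follow the event table listed above; 4294967294 == 0xfffffffe.
SCHEMA = """
CREATE TABLE IF NOT EXISTS event_table (
    time_start    INTEGER NOT NULL,
    time_end      INTEGER NOT NULL DEFAULT 0,
    node_name     TEXT    NOT NULL DEFAULT '',
    cluster_nodes TEXT    NOT NULL DEFAULT '',
    reason        TEXT    NOT NULL,
    reason_uid    INTEGER NOT NULL DEFAULT 4294967294,
    state         INTEGER NOT NULL DEFAULT 0,
    tres          TEXT    NOT NULL DEFAULT ''
)
"""


def open_db(path=":memory:"):
    """Open (or create) the database and make sure the event table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def close_node_event(conn, node_name, now=None):
    """End the node's open event, if any (assumption: time_end = 0 means open)."""
    now = int(time.time()) if now is None else now
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "UPDATE event_table SET time_end = ? "
            "WHERE node_name = ? AND time_end = 0",
            (now, node_name),
        )


def record_node_event(conn, node_name, state, reason, reason_uid, now=None):
    """Close any open event for the node, then start a new one for the new state."""
    now = int(time.time()) if now is None else now
    close_node_event(conn, node_name, now)
    with conn:
        conn.execute(
            "INSERT INTO event_table "
            "(time_start, node_name, reason, reason_uid, state) "
            "VALUES (?, ?, ?, ?, ?)",
            (now, node_name, reason, reason_uid, state),
        )
```

With this shape, draining a node would call `record_node_event(...)`, and restoring it to idle would call `close_node_event(...)`, so each row covers one continuous period in a given state.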

All historical event information for the system can be viewed with the sacctmgr show event command:
image
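
A comparable history view for our own system could be a simple query over the event table. The sketch below is illustrative only: `make_demo_db`, `list_events`, and `format_events` are hypothetical names, the demo table contains just the columns the query needs, the output format is made up and does not imitate sacctmgr, and `time_end = 0` is again assumed to mean the event is still ongoing.

```python
import sqlite3


def make_demo_db():
    """Build a small in-memory table with two sample node events."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE event_table ("
        "time_start INTEGER NOT NULL, "
        "time_end INTEGER NOT NULL DEFAULT 0, "
        "node_name TEXT NOT NULL DEFAULT '', "
        "reason TEXT NOT NULL, "
        "state INTEGER NOT NULL DEFAULT 0)"
    )
    conn.executemany(
        "INSERT INTO event_table (time_start, time_end, node_name, reason, state) "
        "VALUES (?, ?, ?, ?, ?)",
        [
            (100, 250, "cn01", "power failure", 2),  # finished event
            (300, 0, "cn02", "maintenance", 1),      # still open
        ],
    )
    conn.commit()
    return conn


def list_events(conn, node_name=None):
    """Return event rows oldest first, optionally restricted to one node."""
    sql = "SELECT node_name, state, reason, time_start, time_end FROM event_table"
    params = ()
    if node_name is not None:
        sql += " WHERE node_name = ?"
        params = (node_name,)
    sql += " ORDER BY time_start"
    return conn.execute(sql, params).fetchall()


def format_events(rows):
    """Render rows as one line per event; an open event is shown as 'ongoing'."""
    lines = []
    for node, state, reason, start, end in rows:
        end_text = "ongoing" if end == 0 else str(end)
        lines.append(
            f"{node} state={state} start={start} end={end_text} reason={reason}"
        )
    return lines
```

Calling `format_events(list_events(make_demo_db()))` yields one line per recorded event, which is roughly what a "show event" style command would print.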

Ashlee1994 changed the title from "Add an event-handling flow, including node state changes" to "Add an event-handling flow for node state changes" Sep 28, 2023
L-Xiafeng added the enhancement (New feature or request) label and removed the NewFeatures label Nov 11, 2024