
handling no jobs in a QoS in identify_problems #11

@paciorek

In this stanza of identify_problems:

    if pending_job['REASON'] in ['QOSGrpNodeLimit', 'QOSGrpCpuLimit', 'QOSGrpGRES']:

we get an error if there are no running jobs in the QoS. That is, a job is pending because of, say, QOSGrpCpuLimit, yet somehow no other jobs are running in the QoS.

This should never happen, but it did for one user, possibly because of Slurm issues after a downtime.

Traceback (most recent call last):
  File "sq.py", line 527, in <module>
  File "sq.py", line 466, in display_queued_jobs
  File "pandas/core/frame.py", line 7547, in apply
  File "pandas/core/apply.py", line 180, in get_result
  File "pandas/core/apply.py", line 255, in apply_standard
  File "pandas/core/apply.py", line 284, in apply_series_generator
  File "sq.py", line 418, in inner
TypeError: sequence item 0: expected str instance, float found

Here's the view from pdb:

Traceback (most recent call last):
  File "sq.py", line 530, in <module>
    if slurm_info.has_current_jobs(username) and (not args.all_jobs):
  File "sq.py", line 470, in display_queued_jobs
    df['PROBLEMS'] = df.apply(identify_problems(slurm_info), axis=1)
  File "/global/home/users/paciorek/.conda/envs/sq/lib/python3.8/site-packages/pandas/core/frame.py", line 7547, in apply
    return op.get_result()
  File "/global/home/users/paciorek/.conda/envs/sq/lib/python3.8/site-packages/pandas/core/apply.py", line 180, in get_result
    return self.apply_standard()
  File "/global/home/users/paciorek/.conda/envs/sq/lib/python3.8/site-packages/pandas/core/apply.py", line 255, in apply_standard
    results, res_index = self.apply_series_generator()
  File "/global/home/users/paciorek/.conda/envs/sq/lib/python3.8/site-packages/pandas/core/apply.py", line 284, in apply_series_generator
    results[i] = self.f(v)
  File "sq.py", line 422, in inner
    qos_running_jobs_str = ', '.join(qos_running_jobs['JOBID'] + ' (' + qos_running_jobs.apply(lambda x: filter_keys(qos_resource_limit, parse_tres_queue_job(x)), axis=1).apply(display_grp_tres) + ')')
TypeError: sequence item 0: expected str instance, float found
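The final TypeError is what str.join raises when any element of the sequence is not a str. One plausible route to a float there (an assumption, not yet confirmed against the freeze directory): pandas aligns Series on their index when adding them, so if the empty JOBID column is combined with a result whose index does not match it, every element of the sum is NaN. The values and index labels below are made up purely for illustration.

```python
import pandas as pd

# Hypothetical stand-ins for the operands on the failing line:
# an empty JOBID column, and a per-job description Series whose
# index does not line up with it (labels chosen arbitrarily).
jobids = pd.Series([], dtype=object)
descriptions = pd.Series(['cpu=4', 'cpu=8'], index=['JOBID', 'QOS'], dtype=object)

# Addition aligns on the union of the two indexes. Every label is
# missing on one side, so every element of the result is NaN (a float).
combined = jobids + ' (' + descriptions + ')'

try:
    ', '.join(combined)
except TypeError as err:
    print(err)  # sequence item 0: expected str instance, float found
```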

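The problem is in the ', '.join(...) call on that line: when there are no running jobs in the QoS, the inputs take a structure that leaves a float (presumably NaN) rather than a string in the sequence being joined.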
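A minimal sketch of one way to guard this case, assuming the summary string only needs one "JOBID (resources)" entry per running job. The helper name format_qos_running_jobs, its describe_job parameter, and the fallback message are hypothetical; describe_job stands in for the filter_keys / parse_tres_queue_job / display_grp_tres chain in sq.py, whose exact signatures are not shown here.

```python
import pandas as pd


def format_qos_running_jobs(qos_running_jobs: pd.DataFrame, describe_job) -> str:
    """Return a "JOBID (resources), ..." summary of the running jobs in a QoS.

    Hypothetical helper: describe_job is assumed to map one job row
    (a Series) to a string describing its resources.
    """
    # Guard the case from this issue: a job is pending on a QoS group
    # limit, yet no running jobs in that QoS were found.
    if qos_running_jobs.empty:
        return 'no running jobs found in this QoS'

    # Format row by row so every element handed to join is a str;
    # no index alignment is involved, so no NaN can appear.
    parts = [
        f"{row['JOBID']} ({describe_job(row)})"
        for _, row in qos_running_jobs.iterrows()
    ]
    return ', '.join(parts)


jobs = pd.DataFrame({'JOBID': ['101', '102'], 'CPUS': [4, 8]})
print(format_qos_running_jobs(jobs, lambda row: f"cpu={row['CPUS']}"))
# 101 (cpu=4), 102 (cpu=8)
print(format_qos_running_jobs(jobs.iloc[0:0], lambda row: ''))
# no running jobs found in this QoS
```

Building the strings row by row trades a little speed for predictability; the number of running jobs in a single QoS is usually small, so the vectorized version buys little here.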

I have a freeze directory for this example.

Metadata

Assignees: No one assigned
Labels: bug (Something isn't working)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests