qiita.slurm_resource_allocations insertion #3415
Conversation
- Added functionality for updating resource allocation data in the db in qiita_db/util.py
- Added tests for the added functionality in qiita_db/test/test_util.py
- Moved the MaxRSS_helper function from qiita_core/util.py to qiita_db/util.py
- Moved the MaxRSS_helper test from qiita_core/tests/test_util.py to qiita_db/test/test_util.py
qiita_db/util.py
Outdated
sacct = ['sacct', '-p', '--format=JobID,ElapsedRaw,MaxRSS,Submit,Start,'
         'CPUTimeRAW,ReqMem,AllocCPUs,AveVMSize', '-j']
Rather than what could be a very large number of calls to sacct, why not batch jobs and query ','.join(list_of_job_ids)? The primary concern is creating a high burden for the resource manager, and secondary is the overhead of system calls.
I actually discussed this with @Gossty, and this decision was made based on my previous experience with sacct and pgsql. As far as I was able to tell, sacct didn't create any burden for the resource manager - I think it's because it is an actual database, which is not directly connected to the resource manager. Then, in the past we have had issues with large inserts (vs. multiple single inserts), so I think basic inserts are the way to go here.
See the warning here
Thank you for the link; if I read this correctly, smaller/simpler calls (a single job per query) are preferred over larger multi-job queries, no?
"That database needs to be modified constantly as jobs start and complete, so we don't want it tied up answering sacct queries."
As you know, SQL engines are really good at bulk queries as there is a lot of overhead for a single query.
for index, row in df.iterrows():
    with qdb.sql_connection.TRN:
        sql = """
            INSERT INTO qiita.slurm_resource_allocations (
The resources used are sensitive to extrinsic factors like other work on the system or cluster. Scoping to include that type of information carries a reasonable amount of complexity, so I don't think we want to attempt that. However, if the date the job was started on was tracked, then it would at least allow for excluding periods of known high burden that artificially increase resource use.
There are also intrinsic factors though that are important for interpretation of resource use, such as the node which ran the job, and ideally the model of processor (/proc/cpuinfo) on the node (as hostnames are not assured to be unique over time). It would be great to have information about the memory of the system, but that may require sudo (see e.g., dmidecode) so could be out of scope. These values are important as our compute cluster is heterogeneous.
@wasade, if I understand the comment correctly, you are basically suggesting storing:
a. the node name
b. the submission timestamp
correct?
(a) yes; (b) the job start timestamp, not submission; and ideally (c) the actual CPU model on the node.
Got it, thank you for the clarification. BTW, why explicitly (c) vs. only (a)? Also, will sacct return (c)?
"... hostnames are not assured to be unique over time..." so just getting hte name isn't reliable over time. I'm not sure what information is or isn't easily available from slurm
A few requests. Please let me know if you would like to chat to clarify any of the requests.
qiita_db/util.py
Outdated
return y


def update_resource_allocation_table(dates=['2023-04-28', '2023-05-05'],
Can you add a docstring explaining the parameters?
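For example, something like this (covering only the parameter visible in the diff; the described semantics are inferred from the surrounding code and should be adjusted to match the final behavior):

def update_resource_allocation_table(dates=['2023-04-28', '2023-05-05']):
    """Insert new rows into qiita.slurm_resource_allocations.

    Parameters
    ----------
    dates : list of str, optional
        [start, end] dates in 'YYYY-MM-DD' format bounding the window of
        jobs to pull from sacct; the start is replaced by the most recent
        job_start already stored, and the end is derived as the start
        plus one week.
    """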
qiita_db/util.py
Outdated
sql_timestamp = """
    SELECT
        sra.job_start
    FROM
        qiita.slurm_resource_allocations sra
    ORDER BY job_start DESC;
    """
You could add LIMIT 1 so you only get 1 result; this will make the query faster.
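That is, the query would become:

SELECT sra.job_start
FROM qiita.slurm_resource_allocations sra
ORDER BY job_start DESC
LIMIT 1;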
qiita_db/util.py
Outdated
columns = ['timestamp']
timestamps = pd.DataFrame(res, columns=columns)
if len(timestamps['timestamp']) == 0:
    dates[0] = datetime.strptime('2023-04-28', '%Y-%m-%d')
else:
    dates[0] = str(timestamps['timestamp'].iloc[0]).split(' ')[0]
    date0 = datetime.strptime(dates[0], '%Y-%m-%d')
    date1 = date0 + timedelta(weeks=1)
    dates[1] = date1.strftime('%Y-%m-%d')
Why convert to a dataframe? I mean, the database should already return a timestamp object that can be used directly.
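For example, a sketch assuming the TRN fetch helpers used elsewhere in qiita_db (names are illustrative):

from datetime import datetime, timedelta

with qdb.sql_connection.TRN:
    qdb.sql_connection.TRN.add(sql_timestamp)
    res = qdb.sql_connection.TRN.execute_fetchflatten()

# res[0] is already a datetime.datetime, so no DataFrame round-trip
# through strings is needed
date0 = res[0] if res else datetime(2023, 4, 28)
dates = [date0.strftime('%Y-%m-%d'),
         (date0 + timedelta(weeks=1)).strftime('%Y-%m-%d')]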
qiita_db/util.py
Outdated
sql_command = """
    SELECT
        pj.processing_job_id AS processing_job_id,
        pj.external_job_id AS external_job_id
    FROM
        qiita.software_command sc
    JOIN
        qiita.processing_job pj ON pj.command_id = sc.command_id
    JOIN
        qiita.processing_job_status pjs
        ON pj.processing_job_status_id = pjs.processing_job_status_id
    LEFT JOIN
        qiita.slurm_resource_allocations sra
        ON pj.processing_job_id = sra.processing_job_id
    WHERE
        pjs.processing_job_status = 'success'
        AND
        pj.external_job_id ~ '^[0-9]+$'
        AND
        sra.processing_job_id IS NULL;
    """
Note that this query will return all the rows, which will be a lot, so it would be good to limit based on the job_id of the latest result in the previous block (what you are using to extract the time). Could you limit it to return just what's newer than that job?
Would we determine whether the job is newer through external_id? I don't think qiita.processing_job stores information about when the job was submitted.
That's correct, the external id is an incremental integer so it can be used to find what's newer.
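e.g., the WHERE clause above could gain a filter like the following (a sketch; %s would be the largest external_job_id already recorded):

AND pj.external_job_id ~ '^[0-9]+$'
AND pj.external_job_id::integer > %s  -- only jobs newer than the last one stored
AND sra.processing_job_id IS NULL;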
Looks great, thank you!