-
Notifications
You must be signed in to change notification settings - Fork 0
Add ESMF profiling tool #39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
minghangli-uni
wants to merge
15
commits into
main
Choose a base branch
from
add-ESMF_Profiling_iss38
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
Show all changes
15 commits
Select commit
Hold shift + click to select a range
007ca4f
Add ESMF profiling tool
minghangli-uni 7a0b2e5
Code format
minghangli-uni 5fdbfdb
Add README.md
minghangli-uni db09a87
Update README.md to include instructions for generating ESMF profilin…
minghangli-uni 3761c9d
Add try except, where esmf files are not ready, preventing errors but…
minghangli-uni 95b576f
remove unused lines
minghangli-uni 4be6f4d
Support prefix-based matching for regionNames in runtime collection.
minghangli-uni ae45de4
Add comments for the provided example.
minghangli-uni bd5a276
Update README.md
minghangli-uni d72b3d9
code format
minghangli-uni 922c63f
Update ESMF path for the provided example test
minghangli-uni 59f6666
Update ESMF_profiling/esmfFileParser.py
minghangli-uni 7f884b3
Update ESMF_profiling/esmfFileParser.py
minghangli-uni 3cda772
Update ESMF_profiling/esmfRegion.py
minghangli-uni 5d19323
Fix bug: in case ESMF profiling data contains only a single space sep…
minghangli-uni File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,55 @@ | ||
# ESMF Profiling tool | ||
The **ESMF Profiling tool** is a Python-based tool designed to read and process performance profile data from ESMF profiling log files for **ACCESS-OM3**. It provides a structured way to extract hierachical timing and computational stats for various regions within ESMF log files, enabling detailed performance analysis. | ||
|
||
## Directory structure | ||
``` | ||
├── esmfFileParser.py | ||
├── esmfRegion.py | ||
└── README.md | ||
``` | ||
|
||
### Components: | ||
1. `esmfFileParser.py`: | ||
- handles the input ESMF profile files. | ||
- constructs the hierarchical data structure. | ||
- outputs runtimes for specific regions. | ||
- supports both exact and prefix matching for `regionNames` in runtime collection - an example is provided in this script. | ||
2. `esmfRegion.py`: | ||
- defines the ESMFRegion class. | ||
- represents individual ESMF regions, capturing performance metrics. | ||
|
||
The tool supports two profiling data formats, where metrics are included in the brackets for indexing: | ||
1. `ESMF_Profile.xxxx`: (0. count, 1. total, 2. self_time, 3. mean , 4. min_time, 5. max_time ) | ||
2. `ESMF_Profile.summary`: (0. count, 1. PETs , 2. mean , 3. min_time, 4. max_time, 5. min_PET, 6. max_PET) | ||
|
||
## Usage | ||
Before using the scripts, ensure that ESMF profiling log files are generated first. To enable profiling, simply set the `ESMF_RUNTIME_PROFILE` and `ESMF_RUNTIME_PROFILE_OUTPUT` options in the `config.yaml` file for the desired configuration: | ||
```yaml | ||
env: | ||
ESMF_RUNTIME_PROFILE: "ON" | ||
ESMF_RUNTIME_PROFILE_OUTPUT: "TEXT SUMMARY" | ||
``` | ||
where `TEXT` collects profiling logs per PET, and `SUMMARY` collects profiling data in a single summary text file that aggregates timings over multiple PETs. Details can be found [here](https://earthsystemmodeling.org/docs/nightly/develop/ESMF_refdoc/node6.html#SECTION060140000000000000000). | ||
|
||
After running the configuration, specify the path to the profiling logs, along with any specific regions of interest. | ||
|
||
One example for demonstration: | ||
---------- | ||
```python | ||
# Collect runtime info for specific regions | ||
ESMF_path = ['/path/to/your/ESMF/output/files'] # '/g/data/tm70/ml0072/ESMF_Profile_test_example' | ||
regionNames=[ | ||
"[ESMF]", | ||
"[ESMF]/[ensemble] RunPhase1/[ESM0001] RunPhase1/[OCN] RunPhase1", | ||
"[ESMF]/[ensemble] RunPhase1/[ESM0001] RunPhase1/[MED]", | ||
] | ||
profile_prefix = 'ESMF_Profile.' | ||
esmf_summary = True # Set to True for summary profiling | ||
index = 2 # Choose the metric above to extract (e.g., mean time for summary profiling) | ||
runtime_totals = collect_runtime_tot(ESMF_path, | ||
regionNames=region_names, | ||
profile_prefix=profile_prefix, | ||
esmf_summary=esmf_summary, | ||
index=index) | ||
print(runtime_totals) | ||
``` |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,285 @@ | ||
#!/usr/bin/env python3 | ||
""" | ||
ESMF Profiling tool - OM3 | ||
The ESMF Profiling tool is a Python-based tool designed to read and process | ||
performance profile data from ESMF profiling log files. It provides a | ||
structured way to extract hierachical timing and computational stats for | ||
various regions within ESMF runs, enabling detailed performance analysis. | ||
|
||
1. esmfFileParser.py: | ||
- handles the input ESMF profile files. | ||
- constructs the hierarchical data structure. | ||
- outputs runtimes for specific regions. | ||
2. esmfRegion.py: | ||
- defines the ESMFRegion class. | ||
- represents individual ESMF regions, capturing performance metrics. | ||
|
||
The tool supports two profiling data formats: | ||
1. ESMF_Profile.xxxx: (0. count, 1. total, 2. self_time, 3. mean , 4. min_time, 5. max_time ) | ||
2. ESMF_Profile.summary: (0. count, 1. PETs , 2. mean , 3. min_time, 4. max_time, 5. min_PET, 6. max_PET) | ||
|
||
One example for demonstration: | ||
---------- | ||
# Collect runtime info for specific regions | ||
ESMF_path = ['/path/to/your/ESMF/output/files'] | ||
region_names = ['[ESMF]', '[ESMF]/[ensemble] RunPhase1/[ESM0001] RunPhase1/[OCN] RunPhase1'] | ||
profile_prefix = 'ESMF_Profile.' | ||
esmf_summary = True # Set to True for summary profiling | ||
index = 2 # Choose the metric to extract (e.g., mean time for summary profiling) | ||
|
||
runtime_totals = collect_runtime_tot(ESMF_path, regionNames=region_names, profile_prefix=profile_prefix, esmf_summary=esmf_summary, index=index) | ||
print(runtime_totals) | ||
|
||
Latest version: xxx | ||
Author: Minghang Li | ||
Email: [email protected] | ||
License: Apache 2.0 License http://www.apache.org/licenses/LICENSE-2.0.txt | ||
""" | ||
|
||
|
||
# =========================================================================== | ||
import os | ||
import re | ||
import warnings | ||
from esmfRegion import ESMFRegion | ||
import time | ||
|
||
|
||
def collect_runtime_tot( | ||
ESMF_path, | ||
regionNames=["[ESMF]"], | ||
profile_prefix="ESMF_Profile.", | ||
esmf_summary=True, | ||
index=2, | ||
): | ||
|
||
runtime_tot = [] | ||
for i in range(len(ESMF_path)): | ||
subfiles_path = _list_esmf_files( | ||
ESMF_path[i], | ||
profile_prefix, | ||
esmf_summary, | ||
summary_profile="ESMF_Profile.summary", | ||
) | ||
esmf_region_all = [] | ||
for subfile in subfiles_path: | ||
with open(subfile, "r") as f: | ||
esmf_region = ESMFProfileTrees(esmf_summary).build_ESMF_trees(f) | ||
esmf_region_all.append(esmf_region) | ||
|
||
runtime = _region_time_consumption( | ||
regionNames, esmf_region_all, index, esmf_summary | ||
) | ||
runtime_tot.append(runtime) | ||
return runtime_tot | ||
|
||
|
||
def _list_esmf_files(dir_path, profile_prefix, esmf_summary, summary_profile): | ||
"""Lists ESMF files based on a prefix.""" | ||
try: | ||
files = os.listdir(dir_path) | ||
if not esmf_summary: | ||
# ESMF_Profile.xxxx | ||
matching_files = [ | ||
file | ||
for file in files | ||
if file.startswith(profile_prefix) and file != summary_profile | ||
] | ||
matching_files.sort(key=lambda x: int(x[len(profile_prefix) :])) | ||
else: | ||
# ESMF_Profile.summary | ||
matching_files = [summary_profile] | ||
|
||
matching_files_path = [ | ||
os.path.join(dir_path, matching_file) for matching_file in matching_files | ||
] | ||
except FileNotFoundError: | ||
warnings.warn(f"Directory {dir_path} does not exist! Skipping!") | ||
matching_files_path = [] | ||
return matching_files_path | ||
|
||
|
||
def _region_time_consumption(regionNames, esmf_region_all, index, esmf_summary): | ||
"""Calculates time consumption for specific regions""" | ||
runtime = {} | ||
|
||
for regionName in regionNames: | ||
region_values = [ | ||
_find_region_values(sub_ESMF_region, regionName, esmf_summary) | ||
for sub_ESMF_region in esmf_region_all | ||
] | ||
|
||
# Process each region value | ||
for region_value in region_values: | ||
for key, val in region_value.items(): | ||
if val and val[0] is not None: | ||
if key not in runtime: | ||
runtime[key] = [] | ||
runtime[key].append(val[index]) | ||
return runtime | ||
|
||
|
||
def _find_region_values(region, target_region, esmf_summary): | ||
""" | ||
Searches for all region values based on a hierarchical path prefix. | ||
""" | ||
target_parts = target_region.split("/") | ||
matches = {} | ||
default_nans = (None,) * 6 | ||
|
||
def search_region(region, path, current_path): | ||
"""Recursive helper function to search regions.""" | ||
if region.name.startswith(path[0]): | ||
if len(path) == 1: | ||
matches[current_path] = _get_region_data(region, esmf_summary) | ||
return | ||
|
||
# Continue searching in children if more path components remain | ||
for child in region.children: | ||
search_region(child, path[1:], f"{current_path}/{child.name}") | ||
|
||
def _get_region_data(region, esmf_summary): | ||
"""Returns the region data tuple based on esmf_summary setting.""" | ||
return ( | ||
( | ||
region.count, | ||
region.total, | ||
region.self_time, | ||
region.mean, | ||
region.min_time, | ||
region.max_time, | ||
) | ||
if not esmf_summary | ||
else ( | ||
region.count, | ||
region.PETs, | ||
region.mean, | ||
region.min_time, | ||
region.max_time, | ||
region.min_PET, | ||
region.max_PET, | ||
) | ||
) | ||
|
||
# recursive search | ||
search_region(region, target_parts, region.name) | ||
|
||
if not matches: | ||
print(f"No matches found for target prefix: {target_region}") | ||
return matches if matches else {f"{target_region}": default_nans} | ||
|
||
|
||
class ESMFProfileTrees(object): | ||
def __init__(self, esmf_summary=True): | ||
self.esmf_summary = esmf_summary | ||
|
||
def build_ESMF_trees(self, lines): | ||
"""Builds a hierarchical tree of ESMF regions.""" | ||
stack = [] | ||
esmf_region = None | ||
skip = False | ||
for line in lines: | ||
if not line.strip(): | ||
continue | ||
if line.startswith(" [ESMF]"): | ||
skip = True | ||
if skip: | ||
indent_level = len(line) - len(line.lstrip()) | ||
collect_data = self._parse_line(line) | ||
region = ESMFRegion(collect_data, self.esmf_summary) | ||
if not stack: | ||
# the first esmf_region | ||
esmf_region = region | ||
else: | ||
while stack and stack[-1][1] >= indent_level: | ||
stack.pop() | ||
if stack: | ||
stack[-1][0].add_child(region) | ||
stack.append((region, indent_level)) | ||
return esmf_region | ||
|
||
def _parse_line(self, line): | ||
"""Parses a line from an ESMF file based on summary flag.""" | ||
line = re.sub(r"(\d+)(\s+)(\d+\.\d+)", r"\1 \3", line) | ||
parts = re.split(r"\s{2,}", line.strip()) | ||
|
||
if not self.esmf_summary: | ||
name = parts[0] | ||
count = int(parts[1]) | ||
total = float(parts[2]) | ||
self_time = float(parts[3]) | ||
mean = float(parts[4]) | ||
min_time = float(parts[5]) | ||
max_time = float(parts[6]) | ||
collect_data = (name, count, total, self_time, mean, min_time, max_time) | ||
else: | ||
name = parts[0] | ||
PETs = int(parts[1]) | ||
PEs = int(parts[2]) | ||
count = int(parts[3]) | ||
mean = float(parts[4]) | ||
min_time = float(parts[5]) | ||
min_PET = int(parts[6]) | ||
max_time = float(parts[7]) | ||
max_PET = int(parts[8]) | ||
collect_data = ( | ||
name, | ||
count, | ||
PETs, | ||
PEs, | ||
mean, | ||
min_time, | ||
max_time, | ||
min_PET, | ||
max_PET, | ||
) | ||
return collect_data | ||
|
||
|
||
if __name__ == "__main__": | ||
# index for different profiling files (ESMF_Profile.xxxx or ESMF_Profile.summary) | ||
# ESMF_Profile.xxxx : (0. count, 1. total, 2. self_time, 3. mean , 4. min_time, 5. max_time ) | ||
# ESMF_Profile.summary: (0. count, 1. PETs , 2. mean , 3. min_time, 4. max_time, 5. min_PET, 6. max_PET) | ||
# It takes ~20 seconds to read and output runtimes for 1440 ESMF_Profile.xxxx files, | ||
# while reading the ESMF_Profile.summary file only takes about 0.02 seconds. | ||
time_start = time.time() | ||
ESMF_path = ["/g/data/tm70/ml0072/ESMF_Profile_test_example"] | ||
runtime_tot = collect_runtime_tot( | ||
ESMF_path, | ||
regionNames=[ | ||
"[ESMF]", | ||
"[ESMF]/[ensemble] RunPhase1/[ESM0001] RunPhase1/[OCN] RunPhase1", | ||
"[ESMF]/[ensemble] RunPhase1/[ESM0001] RunPhase1/[MED]", | ||
], | ||
profile_prefix="ESMF_Profile.", | ||
esmf_summary=True, | ||
index=2, | ||
) | ||
time_end = time.time() | ||
elaps = time_end - time_start | ||
print(elaps) | ||
print(runtime_tot) | ||
|
||
# The above example presents three items in `regionNames` variable, | ||
# where "[ESMF]" and "[ESMF]/[ensemble] RunPhase1/[ESM0001] RunPhase1/[OCN] RunPhase1" are the exact region names in ESMF profile files, | ||
# while "[ESMF]/[ensemble] RunPhase1/[ESM0001] RunPhase1/[MED]" is a prefix for: | ||
# '[ESMF]/[ensemble] RunPhase1/[ESM0001] RunPhase1/[MED] med_phases_aofluxes_run', | ||
# '[ESMF]/[ensemble] RunPhase1/[ESM0001] RunPhase1/[MED] med_phases_diag_accum', | ||
# '[ESMF]/[ensemble] RunPhase1/[ESM0001] RunPhase1/[MED] med_phases_diag_atm', | ||
# '[ESMF]/[ensemble] RunPhase1/[ESM0001] RunPhase1/[MED] med_phases_diag_ice_ice2med', | ||
# '[ESMF]/[ensemble] RunPhase1/[ESM0001] RunPhase1/[MED] med_phases_diag_ice_med2ice', | ||
# '[ESMF]/[ensemble] RunPhase1/[ESM0001] RunPhase1/[MED] med_phases_diag_ocn', | ||
# '[ESMF]/[ensemble] RunPhase1/[ESM0001] RunPhase1/[MED] med_phases_diag_print', | ||
# '[ESMF]/[ensemble] RunPhase1/[ESM0001] RunPhase1/[MED] med_phases_diag_rof', | ||
# '[ESMF]/[ensemble] RunPhase1/[ESM0001] RunPhase1/[MED] med_phases_history_write', | ||
# '[ESMF]/[ensemble] RunPhase1/[ESM0001] RunPhase1/[MED] med_phases_ocnalb_run', | ||
# '[ESMF]/[ensemble] RunPhase1/[ESM0001] RunPhase1/[MED] med_phases_post_atm', | ||
# '[ESMF]/[ensemble] RunPhase1/[ESM0001] RunPhase1/[MED] med_phases_post_ice', | ||
# '[ESMF]/[ensemble] RunPhase1/[ESM0001] RunPhase1/[MED] med_phases_post_ocn', | ||
# '[ESMF]/[ensemble] RunPhase1/[ESM0001] RunPhase1/[MED] med_phases_post_rof', | ||
# '[ESMF]/[ensemble] RunPhase1/[ESM0001] RunPhase1/[MED] med_phases_prep_ice', | ||
# '[ESMF]/[ensemble] RunPhase1/[ESM0001] RunPhase1/[MED] med_phases_prep_ocn_accum', | ||
# '[ESMF]/[ensemble] RunPhase1/[ESM0001] RunPhase1/[MED] med_phases_prep_ocn_avg', | ||
# '[ESMF]/[ensemble] RunPhase1/[ESM0001] RunPhase1/[MED] med_phases_profile', | ||
# '[ESMF]/[ensemble] RunPhase1/[ESM0001] RunPhase1/[MED] med_phases_restart_write' | ||
## Hence, this tool supports runtime collection for both exact match and prefix-based match. |
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.