Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

coll/tuned dynamic file in json format #13104

Merged
merged 6 commits into from
Apr 7, 2025
Merged

Conversation

lrbison
Copy link
Contributor

@lrbison lrbison commented Feb 17, 2025

The existing coll/tuned dynamic file format allows for limited extensibility. Recently two PRs (#12827 and #12321) have wanted to add a field to the file, and either were never merged or merged with some file peeking logic to see if the field was there or not.

This PR offers a new file format which will be implemented in json. The goal is to easily support optional new fields and allow them to be introduced without breaking old tuning files that users already have installed on their system.

Additionally in contrib there is a converter python script that can be used to update old-format files into the new version.

As an example, an old file like this:

rule-file-version-2
2   # num of collectives
1   # collective ID
3   # number of comm sizes
#=====================
64  # comm size
1   # number of rules
0            1 0 0 0
100 # comm size
1   # number of rules
0            2 0 0 0
105 # comm size
1   # number of rules
0            3 0 0 0

2   # collective ID
1   # number of comm sizes
0   # comm size
4   # number of rules
# Bytes   alg topo segs reqs
#----------------------
0         1 0 0 0
64        2 0 0 64
100       3 0 0 64
105       4 0 0 64

Will now look like this:

{
    "rule_file_version": 3, 
    "module": "tuned", 
    "collectives": {
        "allgatherv": [
            {
                "comm_size_min": 0, 
                "comm_size_max": 63, 
                "rules": [
                    {
                        "msg_size_min": 0, 
                        "alg": 1
                    }
                ]
            }, 
            {
                "comm_size_min": 64, 
                "comm_size_max": 99, 
                "rules": [
                    {
                        "msg_size_min": 0, 
                        "alg": 2
                    }
                ]
            }, 
            {
                "comm_size_min": 100, 
                "rules": [
                    {
                        "msg_size_min": 0, 
                        "alg": 3
                    }
                ]
            }
        ], 
        "allreduce": [
            {
                "comm_size_min": 0, 
                 "comm_rank_distribution" : "one-per-node",
                "rules": [
                    {
                        "msg_size_min": 0, 
                        "msg_size_max": 63, 
                        "alg": 1
                    }, 
                    {
                        "msg_size_min": 64, 
                        "msg_size_max": 99, 
                        "alg": 2, 
                        "reqs": 64
                    }, 
                    {
                        "msg_size_min": 100, 
                        "msg_size_max": 104, 
                        "alg": 3, 
                        "reqs": 64
                    }, 
                    {
                        "msg_size_min": 105, 
                        "alg": 4, 
                        "reqs": 64
                    }
                ]
            }
        ]
    }
}

Additionally, we change the matching rules to be first-match-wins, rather than necessarily ordered by message size. This allows the introduction of new flags, such as comm_rank_distribution which can be either any (previous behavior), single-node or one-per-node and have such rules be non-unique in communicator size. Similarly the message sizes could be extended as well.

Still Todo: align on the format details:

  • Align on some minor format details: list of collectives by name, case insensitive, with/without MPI_ prefix?
  • Introduce the disjoint flag
  • Complete the reader implementation
  • [ ] Do something similar in han?

Copy link

Hello! The Git Commit Checker CI bot found a few problems with this PR:

396ac28: WIP: writing coll/tuned dynamic file reader in jso...

  • check_signed_off: does not contain a valid Signed-off-by line

54fbd56: coll/tuned: simplify arguments to dynamic file cre...

  • check_signed_off: does not contain a valid Signed-off-by line

81f8c38: coll/tuned: fix warning about fscanf with sign dis...

  • check_signed_off: does not contain a valid Signed-off-by line

Please fix these problems and, if necessary, force-push new commits back up to the PR branch. Thanks!

@bosilca
Copy link
Member

bosilca commented Feb 19, 2025

I like this, the idea is sound.

I would love to see this extended to all collectives, aka the "module" will be the name of a collective component (han, accelerator, adapt, sm, ...).

  • the names for the collective could follow the MPI scheme, i for nonblocking, p for persistent, n for neighborhood.
  • most of the time we want ranges for the indexed values such as comm_size and bytes so maybe we can accept both form:comm_size : 5 and comm_size : [min, max] (accepting empty min or/and max to allow for unbounded intervals)
  • don't predefine the keys to allow each module to have its own. The modules will need to generate warnings if they encounter a key they don't understand.
  • allow for unbounded (or default) rules, that will stop the matching for the remainder of the rules. As an example
    {
        "comm_size": [,],
        "rules" : [ { "alg" : 0, "reqs" : 0 } ]
    }
  • it would be nice to have qualifiers such as comm_type (intra or inter), ppn, single_node, memkind and certainly more.

@lrbison
Copy link
Contributor Author

lrbison commented Feb 19, 2025

I would love to see this extended to all collectives, aka the "module" will be the name of a collective component (han, accelerator, adapt, sm, ...).

Precisely why I added the module field. I toyed with adding a similar reader for han but honestly I got somewhat lost in the "number of configurations" for each topological rule. If you want to propose an example json for a han file I'd be happy to write up the reader.

However I'm not yet sure it's wise to try and unify all the readers into some generic json reader. For example, I believe the han file has a somewhat different structure having another layer of configurations. Beyond that we could probably unify the outer-layers and let each module finish parsing the inner layer.

Other than han and tuned, I also see fcoll has a "dynamic_file.c" and I see acoll has dynamic rules (but I didn't quickly find a file reader).

Maybe we can target a future PR/version for such extension?

we want ranges for the indexed values such as comm_size

I was hesitant to change the selection logic in this PR. Right now the json format follows the original format pretty closely which just requires monotonically increasing comm_size and byte_size rules. It's not the most pleasant user experience, but it is relatively easy to understand what's happening. If we add the ability for users to select blocks of byte size or comm size then we need to either error or resolve what to do about overlaps. If we don't allow overlaps then it seems that we aren't adding much usability to the format (but see below).

don't predefine the keys to allow each module to have its own. The modules will need to generate warnings if they encounter a key they don't understand.

Agreed. There is some structure that is required, but other fields are optional, and we shouldn't warn too loudly on unexpected fields, since this is how we expect to introduce new features in the future.

Come to think of it we shouldn't warn too loudly on repeated byte or comm sizes, since in the future we might introduce a flag to distinguish the two (like comm_type), so maybe we should allow overlaps and then enable ranges.

allow for unbounded (or default) rules, that will stop the matching for the remainder of the rules.

While I understand the desire for a default rule, I don't understand your motivation for stopping the matching. Is this a sneaky way to comment out a section? Can you describe the purpose? (also: The statement "comm_size": [,]" is not valid json, so our utility reader won't parse it, but `comm_size": "default" would work fine).

it would be nice to have qualifiers such as comm_type (intra or inter), ppn, single_node, memkind and certainly more.

Agree. At the moment I was thinking of putting such specifiers in the comm rule list, so that it would indicate something along the lines of "when you have a comm_size > 12 of disjoint nodes" or "a comm size >100 and ppn < 16". Given the number of combinations this is one of the reasons I think a simple first-matching-rule-wins approach is best.

@lrbison
Copy link
Contributor Author

lrbison commented Feb 19, 2025

I think in my next revision I will:

  • Allow comm rules to have a range of sizes
  • Allow message size rules to have a range of sizes
  • Explicitly allow "default" selection in both comm rules and message size rules.
  • Change the matching logic to include the ranges, and select the first matching rule.

@lrbison
Copy link
Contributor Author

lrbison commented Feb 28, 2025

I'm trying to unify a format for tuning files for both tuned and han. I think this might be the common version:

"collectives": { 
  "<some-collective-name>" : [
      <comm-selector1>,  
      <comm-selector2> ] }

Tuned has comm selectors based on comm size only right now. I will likely add things like disjoint or singlenode in this PR or later.
Han has topologic levels "global_communicator", "inter_node", "intra_node".

Then within each of those selectors, is the typical message-size rules.

I'm working towards that today.

@bosilca
Copy link
Member

bosilca commented Mar 1, 2025

singlenode is understandable, but disjoint is not. What if instead we go for npernode, and then use 1 for disjoint and all for single node.

Tuned should have a selector based on message size. Based on few issues we had with support for non-contiguous types, it seems smart to add a selector for memory layout (contiguous or not).

@ggouaillardet
Copy link
Contributor

my apology for being late to the party, here are my two cents:

At first glance, the file looks pretty simple, so would YAML be a better fit than JSON for that purpose?
The rationale is YAML is easier and less verbose to generate by hand compared to JSON

@lrbison
Copy link
Contributor Author

lrbison commented Mar 4, 2025

@ggouaillardet I prefer yaml as well, but I leaned towards json based on #12531.

@lrbison lrbison force-pushed the tuned_json branch 3 times, most recently from e7447ef to e32d08b Compare March 5, 2025 22:13
@lrbison lrbison marked this pull request as ready for review March 6, 2025 01:55
@lrbison
Copy link
Contributor Author

lrbison commented Mar 6, 2025

@bosilca I was hesitant to use your concept of the "npernode" directly because values of arbitrary N are hard to verify and perhaps have limited value, but I made a string that accepts a few options indicating the same thing.

In writing files myself I certainly agree that I wish this were in YAML, but I just don't practically see how I can switch it over to yaml in a timely way. If anyone has a favorite C yaml library with compatible licensing we can include let me know. The bulk of the effort would be re-writing the opal util to use the yaml functions.

lrbison added 5 commits March 6, 2025 14:25
Previously we passed n_collectives which we hard-coded to COLLCOUNT.
Now we just include coll_base_functions.h and use COLLCOUNT directly.

Signed-off-by: Luke Robison <[email protected]>
Retrieves values from a json object according to their index.

Signed-off-by: Luke Robison <[email protected]>
For smaller things like type mismatch, errors should be handled by
caller rather than show_help

Signed-off-by: Luke Robison <[email protected]>
This commit provides a new format for the coll/tuned dynamic rules
file using json.  It also modifies matching rules to use a
first-valid-match strategy, which is intended to be simpler in cases
where a new matching rule may be added.

A converter written in python is available in the contrib directory,
however we still accept files in either format.

Signed-off-by: Luke Robison <[email protected]>
Previously the coll_tuned output stream didn't use
opal_output_set_verbosity at all.  This made it difficult to
distinguish problems from noisy logging.

Signed-off-by: Luke Robison <[email protected]>
@lrbison
Copy link
Contributor Author

lrbison commented Mar 6, 2025

Updated again just to add the Copyright lines

@lrbison lrbison changed the title WIP: coll/tuned dynamic file in json format coll/tuned dynamic file in json format Mar 31, 2025
@lrbison lrbison merged commit 96db7e2 into open-mpi:main Apr 7, 2025
24 checks passed
@lrbison lrbison deleted the tuned_json branch April 7, 2025 13:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants