-
Notifications
You must be signed in to change notification settings - Fork 891
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
coll/tuned dynamic file in json format #13104
Conversation
Hello! The Git Commit Checker CI bot found a few problems with this PR: 396ac28: WIP: writing coll/tuned dynamic file reader in jso...
54fbd56: coll/tuned: simplify arguments to dynamic file cre...
81f8c38: coll/tuned: fix warning about fscanf with sign dis...
Please fix these problems and, if necessary, force-push new commits back up to the PR branch. Thanks! |
I like this, the idea is sound. I would love to see this extended to all collectives, aka the "module" will be the name of a collective component (han, accelerator, adapt, sm, ...).
|
Precisely why I added the module field. I toyed with adding a similar reader for han but honestly I got somewhat lost in the "number of configurations" for each topological rule. If you want to propose an example json for a han file I'd be happy to write up the reader. However I'm not yet sure it's wise to try and unify all the readers into some generic json reader. For example, I believe the han file has a somewhat different structure having another layer of configurations. Beyond that we could probably unify the outer-layers and let each module finish parsing the inner layer. Other than han and tuned, I also see fcoll has a "dynamic_file.c" and I see acoll has dynamic rules (but I didn't quickly find a file reader). Maybe we can target a future PR/version for such extension?
I was hesitant to change the selection logic in this PR. Right now the json format follows the original format pretty closely which just requires monotonically increasing comm_size and byte_size rules. It's not the most pleasant user experience, but it is relatively easy to understand what's happening. If we add the ability for users to select blocks of byte size or comm size then we need to either error or resolve what to do about overlaps. If we don't allow overlaps then it seems that we aren't adding much usability to the format (but see below).
Agreed. There is some structure that is required, but other fields are optional, and we shouldn't warn too loudly on unexpected fields, since this is how we expect to introduce new features in the future. Come to think of it we shouldn't warn too loudly on repeated byte or comm sizes, since in the future we might introduce a flag to distinguish the two (like comm_type), so maybe we should allow overlaps and then enable ranges.
While I understand the desire for a default rule, I don't understand your motivation for stopping the matching. Is this a sneaky way to comment out a section? Can you describe the purpose? (also: The statement
Agree. At the moment I was thinking of putting such specifiers in the comm rule list, so that it would indicate something along the lines of "when you have a comm_size > 12 of disjoint nodes" or "a comm size >100 and ppn < 16". Given the number of combinations this is one of the reasons I think a simple first-matching-rule-wins approach is best. |
I think in my next revision I will:
|
I'm trying to unify a format for tuning files for both tuned and han. I think this might be the common version:
Tuned has comm selectors based on comm size only right now. I will likely add things like Then within each of those selectors, is the typical message-size rules. I'm working towards that today. |
Tuned should have a selector based on message size. Based on few issues we had with support for non-contiguous types, it seems smart to add a selector for memory layout (contiguous or not). |
my apology for being late to the party, here are my two cents: At first glance, the file looks pretty simple, so would YAML be a better fit than JSON for that purpose? |
@ggouaillardet I prefer yaml as well, but I leaned towards json based on #12531. |
Signed-off-by: Luke Robison <[email protected]>
e7447ef
to
e32d08b
Compare
@bosilca I was hesitant to use your concept of the "npernode" directly because values of arbitrary N are hard to verify and perhaps have limited value, but I made a string that accepts a few options indicating the same thing. In writing files myself I certainly agree that I wish this were in YAML, but I just don't practically see how I can switch it over to yaml in a timely way. If anyone has a favorite C yaml library with compatible licensing we can include let me know. The bulk of the effort would be re-writing the opal util to use the yaml functions. |
Previously we passed n_collectives which we hard-coded to COLLCOUNT. Now we just include coll_base_functions.h and use COLLCOUNT directly. Signed-off-by: Luke Robison <[email protected]>
Retrieves values from a json object according to their index. Signed-off-by: Luke Robison <[email protected]>
For smaller things like type mismatch, errors should be handled by caller rather than show_help Signed-off-by: Luke Robison <[email protected]>
This commit provides a new format for the coll/tuned dynamic rules file using json. It also modifies matching rules to use a first-valid-match strategy, which is intended to be simpler in cases where a new matching rule may be added. A converter written in python is available in the contrib directory, however we still accept files in either format. Signed-off-by: Luke Robison <[email protected]>
Previously the coll_tuned output stream didn't use opal_output_set_verbosity at all. This made it difficult to distinguish problems from noisy logging. Signed-off-by: Luke Robison <[email protected]>
Updated again just to add the Copyright lines |
The existing coll/tuned dynamic file format allows for limited extensibility. Recently two PRs (#12827 and #12321) have wanted to add a field to the file, and either were never merged or merged with some file peeking logic to see if the field was there or not.
This PR offers a new file format which will be implemented in json. The goal is to easily support optional new fields and allow them to be introduced without breaking old tuning files that users already have installed on their system.
Additionally in contrib there is a converter python script that can be used to update old-format files into the new version.
As an example, an old file like this:
Will now look like this:
Additionally, we change the matching rules to be first-match-wins, rather than necessarily ordered by message size. This allows the introduction of new flags, such as
comm_rank_distribution
which can be eitherany
(previous behavior),single-node
orone-per-node
and have such rules be non-unique in communicator size. Similarly the message sizes could be extended as well.Still Todo: align on the format details:
MPI_
prefix?[ ] Do something similar inhan
?