[Tuner] Improving ease of use for the tuner #814
Comments
Another issue is that the tuner does not support matching constants that may be used in the bodies of linalg ops.
Another big action item is to automatically collect profiles so that users don't have to collect Tracy traces and manually select ops to tune. This is described in the original tuner issue: iree-org/iree#16952. This will require compiler support as well. One more thing: support dispatches with dynamic shapes. This requires adding support for generating benchmarks for dynamic shapes: iree-org/iree#19518
Thanks for the suggestions! I'll add them to the task list. When you say automatically collect profiles, do you specifically mean Tracy profiles? One of my tasks above talks about adding some simple hooks in the compiler to track total run time, but I did not include automating the full Tracy trace, since I didn't think it was necessary for the tuning loop.
Not exactly Tracy profiles, but something equivalent with enough fidelity for the tuner to identify the top dispatches. Ideally we should survey existing profile data formats used in PGO/AutoFDO and pick something portable, if one exists.
I have added some more bullets to the list at the top, but we have a lot of tasks to work on here. Let's try to order this a bit in terms of priority. I'll start in this comment with what I think is the best first task to tackle, and we can build from there and create sub-issues.

1. Support More Dispatch Types

Immediate first priority in my mind is to add support for tuning more dispatch types, since it has a direct impact on how far we can tune a given model. This requires us to lay some initial groundwork, though:
The above 3 tasks are the important things on my mind right now before we start adding tuning for more dispatch types. We can start with these tasks, and build on them or break them down as needed.
Overview and Goals
This issue is for listing out the goals for the future state of the tuner, focusing on better testing and ease of setup/use.
In the simplest terms, the end goal of this issue is for the tuner to have little to no setup time, and if a user is able to compile and run a program, then the user should be able to (nearly) just as easily tune the program using the tuner. This means that nearly all of the current process for tuning needs to be automated, and hooked into components that are directly generated from the compiler, which leads to the next point:
Another focus of this issue is to continue hooking the tuner into components directly generated by the compiler. The current state of the tuner requires the user to know about many special flags (marking root ops, dumping benchmarks, etc.), and then manually arrange the necessary inputs (flag file, benchmark files) and outputs (concatenated tuning TD spec). All inputs to the tuner should be directly generated by the compiler, and all outputs should be directly generated by the tuner.
Future Tasks
There is a lot to be done, so I will try to break down some of the work into smaller sub-projects:
Extracting Dispatches to Tune
In the current state, the first manual step of the tuner is to collect a tracy profile, and pick out the top dispatches to tune based on the runtime percentage in the model. This should ultimately be automated somehow.
Offload Work to the Compiler
There is a lot of Python code in the tuner to go from benchmark -> candidate TD spec. Ideally, the compiler should generate something that is easy for the tuner to ingest, and the TD spec should be very simple to create. The current specs use `transform.iree.match.cast_compatible_dag_from_root` to match the operation, but this op is very sensitive to extra attributes, and we need to be careful about which attributes are present in the TD spec. Ideally there should be a TD op designed for tuning spec matching that is less sensitive to extraneous attributes.

Tuner Ease of Use
This refers to an overall easier user experience. This means reducing the number of flags required by the user, and automating the setup process for the tuner.
There is an `examples/simple` example for tuning, but that is only meant to be an example of how to make a tuning client. There should be a central tuning loop, and it should be obvious to the user how to use it.

Improve Benchmarking Stability
This is partly documentation, partly implementation. We can implement features to attempt to reduce noise as much as possible, and warn when noise is detected, but it is impossible to prevent all noise, so a user should also be aware of things that cause noisy tuning results.
Further Tuning Support and Maintainability
Improve Test Coverage in the Tuner
The poor test coverage became very clear in the last sprint for SDXL tuning: many bugs were found in the new tuner path once real model tuning loops were being used. There needs to be better overall test coverage and error handling in the tuner, since each bug hit at the end of tuning costs a lot of time, which matters greatly under time pressure.
Packaging Default Tuning Specs with IREE
We should also have a good solution for packaging tuning specs with IREE, so we can get good performance out of the box with certain important ops/shapes.