
Add more logging to sharktank data tests #1017

Draft · erman-gurses wants to merge 4 commits into main from users/erman-gurses/add-more-logging
Conversation

@erman-gurses commented Mar 3, 2025

Add more logging to sharktank data tests
This continues the work from PR #917.

@erman-gurses added the infra label (General category for infrastructure-related requests for common triaging and prioritization) on Mar 3, 2025
@erman-gurses (Author) commented Mar 3, 2025

Will double check and add more logs for the other tests.

@erman-gurses (Author) commented:
@ScottTodd, I raised a draft PR; let me know if you have any feedback and I can iterate on it.

@ScottTodd (Member) commented:
Can you link to example logs before/after in the PR description? That would make review easier.

@erman-gurses (Author) commented Mar 6, 2025

Before logging: https://github.com/nod-ai/shark-ai/actions/runs/13727645858/job/38397644679?pr=1017#step:7:108
After logging: https://github.com/nod-ai/shark-ai/actions/runs/13667985379/job/38212871488?pr=1017#step:7:282
My understanding is that the position of the logs below is wrong; I will correct them tomorrow.

INFO:tests.models.llama.benchmark_amdgpu_test BenchmarkLlama3_1_8B...
INFO:tests.models.llama.benchmark_amdgpu_test Testing Benchmark8B_f16_Non_Decomposed_Input_Len_128...
INFO:tests.models.llama.benchmark_amdgpu_test Testing Benchmark8B_f16_Non_Decomposed_Input_Len_2048...
INFO:tests.models.llama.benchmark_amdgpu_test Testing Benchmark8B_fp8_Non_Decomposed...
INFO:tests.models.llama.benchmark_amdgpu_test Starting BenchmarkLlama3_1_70B...
INFO:tests.models.llama.benchmark_amdgpu_test Testing Benchmark70B_f16_TP8_Non_Decomposed_Input_Len_128...
INFO:tests.models.llama.benchmark_amdgpu_test Testing Benchmark70B_f16_TP8_Non_Decomposed_Input_Len_2048...
INFO:tests.models.llama.benchmark_amdgpu_test Testing Benchmark70B_fp8_TP8_Decomposed...
INFO:tests.models.llama.benchmark_amdgpu_test Testing Benchmark70B_fp8_TP8_Non_Decomposed...
INFO:tests.models.llama.benchmark_amdgpu_test Starting BenchmarkLlama3_1_405B...
INFO:tests.models.llama.benchmark_amdgpu_test Testing Benchmark405B_f16_TP8_Non_Decomposed_Input_Len_128...
INFO:tests.models.llama.benchmark_amdgpu_test Testing Benchmark405B_f16_TP8_Non_Decomposed_Input_Len_2048...
INFO:tests.models.llama.benchmark_amdgpu_test Testing Benchmark405B_fp8_TP8_Decomposed...
INFO:tests.models.llama.benchmark_amdgpu_test Testing Benchmark405B_fp8_TP8_Non_Decomposed...

I also added these kinds of logs; I believe their positions are correct:

INFO:tests.models.llama.benchmark_amdgpu_test Compiling MLIR file...
INFO:tests.models.llama.benchmark_amdgpu_test IREE Benchmark Prefill...
INFO:tests.models.llama.benchmark_amdgpu_test IREE Benchmark Decode...
...

Signed-off-by: erman-gurses <[email protected]>
@erman-gurses force-pushed the users/erman-gurses/add-more-logging branch from 296bb2d to b086fdf on March 7, 2025 at 07:47
Signed-off-by: erman-gurses <[email protected]>
@erman-gurses force-pushed the users/erman-gurses/add-more-logging branch from d854f6a to a09d45b on March 7, 2025 at 16:24
@ScottTodd (Member) left a comment:

Sorry this took so long to review. The extra logs are generally good to add. I would at least remove the class-scoped logging though.

Comment on lines 110 to +111
class BenchmarkLlama3_1_8B(BaseBenchmarkTest):
logger.info("Testing BenchmarkLlama3_1_8B...")
@ScottTodd (Member) commented:

I would remove these class-scoped log lines and instead rely on the -v option already included on the pytest commands in the workflow files.


I don't think logs at the class scope are doing what you want, or that they are useful: https://github.com/nod-ai/shark-ai/actions/runs/13667985379/job/38212871488?pr=1017#step:7:22

snippet:

collecting ... 
----------------------------- live log collection ------------------------------
INFO:tests.models.llama.benchmark_amdgpu_test BenchmarkLlama3_1_8B...
INFO:tests.models.llama.benchmark_amdgpu_test Testing Benchmark8B_f16_Non_Decomposed_Input_Len_128...
INFO:tests.models.llama.benchmark_amdgpu_test Testing Benchmark405B_fp8_TP8_Non_Decomposed...
collected 11 items

Pytest has a few phases while running tests. This is the "collection" phase, where pytest discovers test cases from the directory tree based on any conftest.py files, settings/configuration files, and command line options. These log lines run during that collection phase, but you really want them to fire when the tests actually start. Only after collection completes are individual tests actually run, followed by a few other phases.
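The import-time behavior behind this can be shown with a minimal, hypothetical module (the class and test names below are illustrative, not from the PR): Python executes a class body the moment the module is imported, which is exactly what pytest does during collection, so a `logger.info` at class scope fires before any test runs.

```python
# Sketch of why class-scoped logging fires during collection: Python runs
# the class body at import time, which is when pytest imports test modules
# to collect them -- long before any test executes.
import logging

logger = logging.getLogger(__name__)
events = []  # records when each statement actually executes

class BenchmarkExample:
    # This runs once, while the module is being imported
    # (i.e. during pytest's collection phase).
    events.append("class body executed")
    logger.info("Testing BenchmarkExample...")  # fires during collection

    def test_something(self):
        # This only runs when pytest actually invokes the test.
        events.append("test body executed")

# At import time the class body has already run, but no test has:
assert events == ["class body executed"]
```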

I can't find docs on the phases, but https://docs.pytest.org/en/stable/reference/reference.html#hooks is close enough to get the point across:

  1. bootstrapping
  2. initialization
  3. collection
  4. test running
  5. reporting

However, pytest can already log when a test starts. See https://docs.pytest.org/en/stable/how-to/output.html, and the -v option in particular, which changes the output from

=========================== test session starts ============================
collected 4 items

test_verbosity_example.py .FFF                                       [100%]

================================= FAILURES =================================

to

=========================== test session starts ============================
collecting ... collected 4 items

test_verbosity_example.py::test_ok PASSED                            [ 25%]
test_verbosity_example.py::test_words_fail FAILED                    [ 50%]
test_verbosity_example.py::test_numbers_fail FAILED                  [ 75%]
test_verbosity_example.py::test_long_text_fail FAILED                [100%]

================================= FAILURES =================================
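If per-test start messages are still wanted beyond what -v prints, one option would be pytest's `setup_method` hook, which runs immediately before each test in a class. This is a sketch only; the class and test names are illustrative:

```python
# Hypothetical alternative to class-scoped logging: emit the "starting"
# message from setup_method, which pytest calls right before each test
# method, so the log line appears when the test actually begins.
import logging

logger = logging.getLogger("tests.models.llama.benchmark_amdgpu_test")

class BenchmarkLlama3_1_8B:
    def setup_method(self, method):
        # pytest invokes this immediately before each test in the class.
        logger.info("Starting %s...", method.__name__)

    def test_benchmark_8b_f16_prefill(self):
        pass  # benchmark body would go here
```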

Comment on lines +216 to 217
logger.info("Compiling MLIR file...")
self.llama8b_f16_torch_sdpa_artifacts.compile_to_vmfb(
@ScottTodd (Member) commented:
This helper already does some logging, but the extra logs don't hurt. What we could do in these places is provide context for what or why we're compiling/running. The "IREE Benchmark Prefill..." logs you added are a good example of that. Just seeing iree-run-module ... in the logs doesn't immediately provide such context, but "benchmark prefill" does.

INFO:tests.models.llama.benchmark_amdgpu_test Compiling MLIR file...

INFO:eval      Launching compile command:
cd /home/runner/_work/shark-ai/shark-ai && iree-compile /home/runner/_work/shark-ai/shark-ai/2025-03-05/llama-8b/f16_torch_128.mlir --iree-hip-target=gfx942 -o=/home/runner/_work/shark-ai/shark-ai/2025-03-05/llama-8b/f16_torch_128.vmfb --iree-hal-target-device=hip --iree-hal-dump-executable-files-to=/home/runner/_work/shark-ai/shark-ai/2025-03-05/llama-8b/f16_torch_128/files --iree-dispatch-creation-enable-aggressive-fusion=true --iree-global-opt-propagate-transposes=true --iree-opt-aggressively-propagate-transposes=true --iree-opt-data-tiling=false --iree-preprocessing-pass-pipeline='builtin.module(util.func(iree-preprocessing-generalize-linalg-matmul-experimental))' --iree-stream-resource-memory-model=discrete --iree-hal-indirect-command-buffers=true --iree-hal-memoization=true --iree-opt-strip-assertions

INFO:eval      compile_to_vmfb: 16.77 secs
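The "context first" suggestion could look like the following sketch. The helper name and paths are hypothetical; the iree-compile flags are copied from the log above.

```python
# Hypothetical helper illustrating purpose-level logging: say *why* a
# command runs before launching it, so CI logs read as a narrative
# rather than a bare command line.
import logging

logger = logging.getLogger(__name__)

def build_compile_invocation(mlir_path, vmfb_path, hip_target="gfx942"):
    # Context first, per the review suggestion.
    logger.info("Compiling MLIR file %s for the prefill benchmark...", mlir_path)
    # Flags mirror the iree-compile invocation shown in the log above.
    return [
        "iree-compile", mlir_path,
        f"--iree-hip-target={hip_target}",
        "--iree-hal-target-device=hip",
        "-o", vmfb_path,
    ]

cmd = build_compile_invocation("f16_torch_128.mlir", "f16_torch_128.vmfb")
```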

Comment on lines 602 to +603
# benchmark prefill
logger.info("IREE Benchmark Prefill...")
@ScottTodd (Member) commented:
nit: when comments and logs say the same thing, you can remove the comments.

Labels
infra General category for infrastructure-related requests for common triaging and prioritization