Introduce RuntimeSeriesReward #762

lqwk · 2022-08-25T19:11:08Z

Introduce RuntimeSeriesReward

Introduce a new way of comparing program runtimes: the reward is the difference between the medians of the current and previous runtime series, and it is computed only if the two series are significantly different, as determined by the Kruskal–Wallis test.

Source: https://htor.inf.ethz.ch/publications/img/hoefler-scientific-benchmarking.pdf
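
The reward rule described above can be sketched roughly as follows. This is an illustration only, not the code in `compiler_gym/spaces/runtime_series_reward.py`. The function names, the `alpha=0.05` significance threshold, the sign convention (previous median minus current median, so a speedup is positive), and returning zero when the series are not significantly different are all assumptions. The test is hand-rolled for the two-series case to keep the sketch dependency-free; `scipy.stats.kruskal` implements the same test.

```python
import math
from statistics import median


def kruskal_p_value(a, b):
    """Two-group Kruskal-Wallis test; returns the asymptotic p-value.

    Assumes both series are non-empty.
    """
    pooled = sorted((value, group) for group, xs in enumerate((a, b)) for value in xs)
    n = len(pooled)
    rank_sums = [0.0, 0.0]
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values share the average of the 1-based ranks they occupy.
        avg_rank = (i + 1 + j) / 2.0
        for _, group in pooled[i:j]:
            rank_sums[group] += avg_rank
        t = j - i
        tie_term += t**3 - t
        i = j
    correction = 1.0 - tie_term / (n**3 - n)
    if correction == 0.0:
        # Every observation is identical, so there is nothing to distinguish.
        return 1.0
    sizes = (len(a), len(b))
    h = 12.0 / (n * (n + 1)) * sum(r * r / m for r, m in zip(rank_sums, sizes))
    h = (h - 3.0 * (n + 1)) / correction
    # Survival function of a chi-square distribution with one degree of freedom.
    return math.erfc(math.sqrt(max(h, 0.0) / 2.0))


def runtime_series_reward(previous, current, alpha=0.05):
    """Median difference, granted only for a significant change (assumed rule)."""
    if kruskal_p_value(previous, current) >= alpha:
        return 0.0  # assumption: no reward when the series are indistinguishable
    return median(previous) - median(current)


if __name__ == "__main__":
    slow = [2.0, 2.1, 2.2, 2.05, 2.15]  # made-up runtimes in seconds
    fast = [1.0, 1.1, 1.2, 1.05, 1.15]
    print(runtime_series_reward(slow, fast))
    print(runtime_series_reward(slow, slow))
```

Gating on a significance test means that noisy but statistically indistinguishable measurements do not produce spurious rewards, which is the motivation given in the linked paper on scientific benchmarking.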

Testing

I ran a series of tests comparing the new implementation with the existing implementation using the LLVM autotuner on the cbench-v1 benchmark. The rewards are shown below:

Benchmark	Runtime	Runtime Series
cbench-v1/bitcount	0.847564	0.780826
cbench-v1/blowfish	0.997087	0.997496
cbench-v1/bzip2	2.505352	2.516984
cbench-v1/crc32	1.003222	0.995800
cbench-v1/dijkstra	1.001125	1.014525
cbench-v1/gsm	0.785371	0.812009
cbench-v1/jpeg-c	0.976613	0.989570
cbench-v1/jpeg-d	1.004371	1.010437
cbench-v1/patricia	0.992379	1.010707
cbench-v1/qsort	0.958153	0.986507
cbench-v1/sha	1.000886	0.990302
cbench-v1/stringsearch	1.004281	1.006058
cbench-v1/stringsearch2	1.009237	1.050334
cbench-v1/susan	0.959058	0.975618
cbench-v1/tiff2bw	1.016761	1.018933
cbench-v1/tiff2rgba	1.105831	1.104215
cbench-v1/tiffdither	1.042336	1.022278
cbench-v1/tiffmedian	1.005050	1.010557

The new implementation is on par with the existing implementation, and even beats the existing implementation on 12/17 of the benchmarks.

I propose merging this upstream; we can then work on further optimizations such as early stopping.

codecov-commenter · 2022-08-25T20:20:54Z

Codecov Report

❌ Patch coverage is 34.00000% with 33 lines in your changes missing coverage. Please review.
✅ Project coverage is 55.63%. Comparing base (1c40e5b) to head (a53276a).
⚠️ Report is 134 commits behind head on development.

Files with missing lines	Patch %	Lines
compiler_gym/spaces/runtime_series_reward.py	28.57%	25 Missing ⚠️
compiler_gym/wrappers/llvm.py	33.33%	8 Missing ⚠️

❗ There is a different number of reports uploaded between BASE (1c40e5b) and HEAD (a53276a): HEAD has 43 fewer uploads than BASE.

Flag	BASE (1c40e5b)	HEAD (a53276a)
	45	2

Additional details and impacted files

@@               Coverage Diff                @@
##           development     #762       +/-   ##
================================================
- Coverage        89.29%   55.63%   -33.67%     
================================================
  Files              130      131        +1     
  Lines             7912     7961       +49     
================================================
- Hits              7065     4429     -2636     
- Misses             847     3532     +2685
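
As a quick sanity check on the table above, the coverage percentages can be recomputed from the hit and line counts. This is a minimal sketch; the helper name `coverage_percent` is hypothetical, and the numbers are copied from the report.

```python
def coverage_percent(hits, lines):
    """Line coverage as a percentage of total lines."""
    return 100.0 * hits / lines


# (hits, misses, lines) as reported for the base and head commits.
base = (7065, 847, 7912)
head = (4429, 3532, 7961)

for name, (hits, misses, lines) in (("base", base), ("head", head)):
    # Hits and misses should add up to the total line count.
    assert hits + misses == lines
    print(f"{name}: {coverage_percent(hits, lines):.2f}%")
```

The drop coincides with HEAD having far fewer uploaded reports than BASE, as the report itself notes.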

Files with missing lines	Coverage Δ
compiler_gym/spaces/__init__.py	`100.00% <100.00%> (ø)`
compiler_gym/wrappers/__init__.py	`100.00% <100.00%> (ø)`
compiler_gym/wrappers/llvm.py	`44.82% <33.33%> (-55.18%)`	⬇️
compiler_gym/spaces/runtime_series_reward.py	`28.57% <28.57%> (ø)`

... and 87 files with indirect coverage changes

lqwk · 2022-08-25T20:25:43Z

Note: needs #761 to land first

ChrisCummins · 2022-10-14T02:13:59Z

Hi @lqwk, I'm very sorry for my delay in reviewing this. I've not forgotten about it, I just have a backlog of issues to fix on the CI so that I can run the tests against these changes.

Cheers,
Chris

ChrisCummins · 2022-11-03T01:32:56Z

Hi @lqwk, okay, I finally pushed through the backlog of issues and have a newly stable v0.2.5 release. Sorry again that it took me so long to get around to this.

Are you still working on this? If so, could you please rebase this on top of the development branch so that we can use the CI to verify that all tests pass?

Cheers,
Chris

lqwk added 21 commits July 19, 2022 12:39

Compare two runtime serires medians using Kruskal-Wallis test

bd240e1

Add RuntimeSeriesEstimateReward wrapper

642644a

Compare two runtime serires medians using Kruskal-Wallis test

bb78bf9

Add RuntimeSeriesEstimateReward wrapper

cbf8e6b

Fix unknown optimization target error

348ab6b

Merge branch 'runtime' of https://github.com/lqwk/CompilerGym into ru…

39b5965

…ntime

Increase reward if two series are significantly different

e1f68af

Fix bug in reward calculation

2492418

rename runtime to runtimeseries

7e9f511

add check

ceb4e21

Compare two runtime serires medians using Kruskal-Wallis test

812e07c

Add RuntimeSeriesEstimateReward wrapper

2ade2ff

Fix unknown optimization target error

eb150d4

Increase reward if two series are significantly different

7d9c7c2

Fix bug in reward calculation

262ef55

rename runtime to runtimeseries

b797e93

add check

4030316

Merge branch 'runtime' of https://github.com/lqwk/CompilerGym into ru…

b927cf2

…ntime

remove duplicate code

330c1bb

documentation

77eba6f

make RuntimeSeriesReward more readable

a53276a

facebook-github-bot added the CLA Signed label Aug 25, 2022

ChrisCummins added this to the v0.2.6 milestone Nov 2, 2022
