Skip to content

Conversation

@smuzaffar
Copy link
Contributor

@smuzaffar smuzaffar commented Mar 20, 2025

workaround to suppress the array-bound warning in LTO builds


This is an attempt to fix the array bound warning we get in GCC14 IBs [a].
Size of array I m_v[S]; at https://github.com/cms-sw/cmssw/blob/master/HeterogeneousCore/AlpakaInterface/interface/FlexiStorage.h#L21 is 257 via

totbins() at https://github.com/cms-sw/cmssw/blob/master/HeterogeneousCore/CUDAUtilities/interface/HistoContainer.h#L125 return 257 and then https://github.com/cms-sw/cmssw/blob/master/RecoVertex/PixelVertexFinding/plugins/alpaka/clusterTracksByDensity.h#L61-L63 tries to initialize/access m_v[257] which is out of array bound.

Either totbins() should return NHISTS * NBINS instead of NHISTS * NBINS +1 or we loop over 1 less element.

@fwyzard, if this is not the correct fix then feel free to propose a correct fix.

[a] https://cmssdt.cern.ch/SDT/cgi-bin/buildlogs/el8_amd64_gcc14/CMSSW_15_1_X_2025-03-18-2300/RecoVertex/PixelVertexFinding

In member function 'operator[]',
    inlined from 'clusterTracksByDensity' at src/RecoVertex/PixelVertexFinding/plugins/alpaka/clusterTracksByDensity.h:62:17,
    inlined from 'operator().isra' at src/RecoVertex/PixelVertexFinding/plugins/alpaka/clusterTracksByDensity.h:247:29:
  [src/HeterogeneousCore/AlpakaInterface/interface/FlexiStorage.h:14](https://github.com/cms-sw/cmssw/blob/CMSSW_15_1_X_2025-03-18-2300/HeterogeneousCore/AlpakaInterface/interface/FlexiStorage.h#L14):48: warning: array subscript 257 is above array bounds of 'unsigned int[257]' [-Warray-bounds=]
    14 |     constexpr I& operator[](int i) { return m_v[i]; }
      |                                                ^
src/HeterogeneousCore/AlpakaInterface/interface/FlexiStorage.h: In function 'operator().isra':
src/HeterogeneousCore/AlpakaInterface/interface/FlexiStorage.h:21:7: note: while referencing 'm_v'
   21 |     I m_v[S];
      |       ^
In member function 'operator[]',
    inlined from 'clusterTracksByDensity' at src/RecoVertex/PixelVertexFinding/plugins/alpaka/clusterTracksByDensity.h:62:17,
    inlined from 'operator().isra' at src/RecoVertex/PixelVertexFinding/plugins/alpaka/clusterTracksByDensity.h:247:29:
  [src/HeterogeneousCore/AlpakaInterface/interface/FlexiStorage.h:14](https://github.com/cms-sw/cmssw/blob/CMSSW_15_1_X_2025-03-18-2300/HeterogeneousCore/AlpakaInterface/interface/FlexiStorage.h#L14):48: warning: array subscript 257 is above array bounds of 'unsigned int[257]' [-Warray-bounds=]
    14 |     constexpr I& operator[](int i) { return m_v[i]; }

@cmsbuild
Copy link
Contributor

cmsbuild commented Mar 20, 2025

cms-bot internal usage

@cmsbuild
Copy link
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-47634/44166

  • There are other open Pull requests which might conflict with changes you have proposed:

@cmsbuild
Copy link
Contributor

A new Pull Request was created by @smuzaffar for master.

It involves the following packages:

  • RecoVertex/PixelVertexFinding (reconstruction)

@cmsbuild, @jfernan2, @mandrenguyen can you please review it and eventually sign? Thanks.
@GiacomoSguazzoni, @VinInn, @VourMa, @dgulhan, @fabiocos, @martinamalberti, @missirol, @mmusich, @mtosi, @rovere this is something you requested to watch as well.
@antoniovilela, @mandrenguyen, @rappoccio, @sextonkennedy you are the release manager for this.

cms-bot commands are listed here

@smuzaffar
Copy link
Contributor Author

please test for el8_amd64_gcc14

@fwyzard
Copy link
Contributor

fwyzard commented Mar 20, 2025

@fwyzard, if this is not the correct fix then feel free to propose a correct fix.

I'll have a look.

@cmsbuild
Copy link
Contributor

-1

Failed Tests: UnitTests RelVals AddOn
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-f42cc9/45098/summary.html
COMMIT: a27faa3
CMSSW: CMSSW_15_1_X_2025-03-19-2300/el8_amd64_gcc14
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/47634/45098/install.sh to create a dev area with all the needed externals and cmssw changes.

Unit Tests

I found 2 errors in the following unit tests:

---> test deviceVertexFinderByDensity_tSerialSync had ERRORS
---> test deviceVertexFinderOneKernel_tSerialSync had ERRORS

RelVals

  • 145.408A fatal system signal has occurred: abort signal
  • 145.713A fatal system signal has occurred: abort signal
  • 145.301A fatal system signal has occurred: abort signal
Expand to see more relval errors ...

AddOn Tests

A fatal system signal has occurred: abort signal
A fatal system signal has occurred: abort signal
A fatal system signal has occurred: abort signal
Expand to see more addon errors ...

@fwyzard
Copy link
Contributor

fwyzard commented Mar 21, 2025

I'm still going through the details of the indices and ranges, but to first order I think GCC is wrong.

I've run the deviceVertexFinderByDensity_t tests for different back-ends after instrumenting the code of the clusterTracksByDensity function with

    for (auto j : cms::alpakatools::uniform_elements(acc, Hist::totbins())) {
      printf("set hist.off[%d] to 0\n", j);
      hist.off[j] = 0;
    }

On the CPU back-end:

./deviceVertexFinderByDensity_tSerialSync | grep 'set hist.off' | sort -k2.11,2.13g -u
set hist.off[0] to 0
set hist.off[1] to 0
set hist.off[2] to 0
set hist.off[3] to 0
set hist.off[4] to 0
...
set hist.off[253] to 0
set hist.off[254] to 0
set hist.off[255] to 0
set hist.off[256] to 0

On the CUDA back-end:

set hist.off[0] to 0
set hist.off[1] to 0
set hist.off[2] to 0
set hist.off[3] to 0
set hist.off[4] to 0
...
set hist.off[253] to 0
set hist.off[254] to 0
set hist.off[255] to 0
set hist.off[256] to 0

We never try to write to index 257.

@fwyzard
Copy link
Contributor

fwyzard commented Mar 21, 2025

In fact, with the proposed changes, the tests fail :-/

@smuzaffar
Copy link
Contributor Author

smuzaffar commented Mar 24, 2025

@fwyzard , yes I also have added some debug print outs and index 257 was never accessed. Yes this seems like compiler bug

Previously we have suppressed this for RecoTracker/PixelSeeding via removing the array-bounds warnings from the compilation flags ( using pragma to suppress this warnings did not work for LTO). So I can update this PR to add <use name="no-array-bounds-flag" for="alpaka/serial"/>

@cmsbuild
Copy link
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-47634/44209

@cmsbuild
Copy link
Contributor

Pull request #47634 was updated. @cmsbuild, @jfernan2, @mandrenguyen can you please check and sign again.

@smuzaffar
Copy link
Contributor Author

please test for el8_amd64_gcc14

@fwyzard
Copy link
Contributor

fwyzard commented Mar 24, 2025

+heterogeneous

@smuzaffar
Copy link
Contributor Author

please test

lets run the tests for production arch too

@cmsbuild
Copy link
Contributor

+1

Size: This PR adds an extra 20KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-f42cc9/45175/summary.html
COMMIT: 8643ca7
CMSSW: CMSSW_15_1_X_2025-03-24-1100/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/47634/45175/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

@jfernan2
Copy link
Contributor

+1

@cmsbuild
Copy link
Contributor

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @mandrenguyen, @rappoccio, @sextonkennedy, @antoniovilela (and backports should be raised in the release meeting by the corresponding L2)

@cmsbuild
Copy link
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-f42cc9/45166/summary.html
COMMIT: 8643ca7
CMSSW: CMSSW_15_1_X_2025-03-21-1100/el8_amd64_gcc14
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/47634/45166/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • You potentially added 239 lines to the logs
  • ROOTFileChecks: Some differences in event products or their sizes found
  • Reco comparison results: 107708 differences found in the comparisons
  • DQMHistoTests: Total files compared: 52
  • DQMHistoTests: Total histograms compared: 4137425
  • DQMHistoTests: Total failures: 503001
  • DQMHistoTests: Total nulls: 418
  • DQMHistoTests: Total successes: 3633986
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 8.831000000000001 KiB( 51 files compared)
  • DQMHistoSizes: changed ( 10224.0 ): 0.415 KiB SiStrip/MechanicalView
  • DQMHistoSizes: changed ( 13034.0 ): 0.938 KiB SiStrip/MechanicalView
  • DQMHistoSizes: changed ( 140.045,... ): -0.004 KiB JetMET/SUSYDQM
  • DQMHistoSizes: changed ( 141.042 ): 0.043 KiB JetMET/SUSYDQM
  • DQMHistoSizes: changed ( 145.014,... ): 0.008 KiB JetMET/SUSYDQM
  • DQMHistoSizes: changed ( 145.202 ): 0.023 KiB JetMET/SUSYDQM
  • DQMHistoSizes: changed ( 145.408 ): -0.016 KiB JetMET/SUSYDQM
  • DQMHistoSizes: changed ( 145.604 ): 0.090 KiB JetMET/SUSYDQM
  • DQMHistoSizes: changed ( 145.713 ): -0.008 KiB JetMET/SUSYDQM
  • DQMHistoSizes: changed ( 17034.0 ): 3.490 KiB SiStrip/MechanicalView
  • DQMHistoSizes: changed ( 250202.181 ): ...
  • Checked 223 log files, 194 edm output root files, 52 DQM output files
  • TriggerResults: found differences in 23 / 50 workflows

@mandrenguyen
Copy link
Contributor

+1

@cmsbuild cmsbuild merged commit 538ac87 into cms-sw:master Mar 25, 2025
17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants