@kpedro88
Contributor

@kpedro88 commented Aug 29, 2023

The change to the abseil version in #8565 introduced an inconsistency in TensorFlow, leading to errors like this:

/cvmfs/cms.cern.ch/slc7_amd64_gcc11/external/gcc/11.4.1-30ebdc301ebd200f2ae0e3d880258e65/bin/../lib/gcc/x86_64-unknown-linux-gnu/11.4.1/../../../../x86_64-unknown-linux-gnu/bin/ld.bfd: tmp/slc7_amd64_gcc11/src/RecoMET/METPUSubtraction/plugins/RecoMETMETPUSubtraction_plugins/ccjAPFCF.ltrans0.ltrans.o: in function `DeepMETProducer::DeepMETProducer(edm::ParameterSet const&, tensorflow::SessionCache const*)':
<artificial>:(.text+0x6d6c): undefined reference to `tensorflow::TensorShapeBase<tensorflow::TensorShape>::TensorShapeBase(absl::lts_20230125::Span<long const>)'
/cvmfs/cms.cern.ch/slc7_amd64_gcc11/external/gcc/11.4.1-30ebdc301ebd200f2ae0e3d880258e65/bin/../lib/gcc/x86_64-unknown-linux-gnu/11.4.1/../../../../x86_64-unknown-linux-gnu/bin/ld.bfd: <artificial>:(.text+0x6da7): undefined reference to `tensorflow::TensorShapeBase<tensorflow::TensorShape>::TensorShapeBase(absl::lts_20230125::Span<long const>)'
collect2: error: ld returned 1 exit status
gmake: *** [config/SCRAM/GMake/Makefile.rules:1793: tmp/slc7_amd64_gcc11/src/RecoMET/METPUSubtraction/plugins/RecoMETMETPUSubtraction_plugins/libRecoMETMETPUSubtraction_plugins.so] Error 1

The problem as I understand it:

  1. The TensorFlow shared libraries are linked to the CMSSW abseil library and have the correct symbols, but the TensorFlow headers use a different abseil namespace (lts_20220623 rather than lts_20230125).
  2. This appears to happen because TensorFlow still uses its local Bazel workspace information for abseil, even with our bazel-absl.patch file, which updates the central Bazel information for this package.
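A quick way to confirm this kind of mismatch is to compare the abseil LTS tag that the linker is looking for with the tag defined in the copy of `absl/base/options.h` that the TensorFlow headers actually pick up. Below is a minimal Python sketch. It assumes `options.h` defines the tag with a plain `#define ABSL_OPTION_INLINE_NAMESPACE_NAME lts_YYYYMMDD` line, and the helper names (`lts_tags`, `header_namespace`, `mismatched_tags`) are hypothetical, not part of CMSSW, TensorFlow, or abseil.

```python
from __future__ import annotations

import re

# Abseil LTS releases tag their inline namespace as lts_YYYYMMDD. The tag shows up
# in demangled names ("absl::lts_20230125::Span") and in mangled ones
# ("...4absl12lts_202301254Span..."), so no word boundaries are used here.
_LTS_TAG = re.compile(r"lts_\d{8}")

# Assumed form of the relevant line in absl/base/options.h.
_NAMESPACE_DEFINE = re.compile(
    r"^\s*#\s*define\s+ABSL_OPTION_INLINE_NAMESPACE_NAME\s+(\S+)", re.MULTILINE
)


def lts_tags(text: str) -> set[str]:
    """Return every abseil LTS namespace tag found in text (e.g. linker output)."""
    return set(_LTS_TAG.findall(text))


def header_namespace(options_h: str) -> str | None:
    """Return the inline-namespace name defined in the text of options.h, if any."""
    match = _NAMESPACE_DEFINE.search(options_h)
    return match.group(1) if match else None


def mismatched_tags(options_h: str, linker_output: str) -> set[str]:
    """Tags referenced by the linker that differ from the header's namespace.

    If the header defines no namespace name, every tag counts as mismatched.
    """
    expected = header_namespace(options_h)
    return {tag for tag in lts_tags(linker_output) if tag != expected}


if __name__ == "__main__":
    # Hypothetical options.h content matching the older tag, plus one of the
    # undefined references from the build log above.
    options_h = "#define ABSL_OPTION_INLINE_NAMESPACE_NAME lts_20220623\n"
    ld_error = (
        "undefined reference to `tensorflow::TensorShapeBase<tensorflow::TensorShape>"
        "::TensorShapeBase(absl::lts_20230125::Span<long const>)'"
    )
    print(mismatched_tags(options_h, ld_error))  # {'lts_20230125'}
```

An empty result means the header and the symbols the linker wants agree on the abseil release; any returned tag points at a header/library pair built from different abseil versions.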

I patched every place I could find where the abseil version used by Bazel was hardcoded in the TensorFlow source, following the upstream abseil version update in TensorFlow 2.13.0, one minor version after ours (tensorflow/tensorflow@ad938db). With these patches I was able to remove the DeepMETProducer workaround from cms-sw/cmssw#42228 (see cms-sw/cmssw#42682). I'm happy to reimplement the patches in whatever way is preferred, but it's going to be ugly no matter what.

attn: @iarspider @yongbinfeng

@cmsbuild
Contributor

A new Pull Request was created by @kpedro88 (Kevin Pedro) for branch IB/CMSSW_13_3_X/master.

@cmsbuild, @smuzaffar, @aandvalenzuela, @iarspider can you please review it and eventually sign? Thanks.
@perrotta, @dpiparo, @antoniovilela, @rappoccio you are the release manager for this.

@kpedro88
Contributor Author

please test

@iarspider
Contributor

@kpedro88 thanks for the fix. Can you open a PR in https://github.com/cms-externals/tensorflow/ targeting cms/v2.12.0 with your changes?

@kpedro88
Contributor Author

Done, see cms-externals/tensorflow#12

@cmsbuild
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-6ea053/34534/summary.html
COMMIT: ed0e178
CMSSW: CMSSW_13_3_X_2023-08-29-1100/el8_amd64_gcc11
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/8672/34534/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 30 differences found in the comparisons
  • DQMHistoTests: Total files compared: 48
  • DQMHistoTests: Total histograms compared: 3153095
  • DQMHistoTests: Total failures: 1493
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3151580
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB (47 files compared)
  • Checked 207 log files, 159 edm output root files, 48 DQM output files
  • TriggerResults: no differences found

@smuzaffar
Contributor

closing it in favor of #8675

@smuzaffar closed this on Aug 31, 2023