
Conversation

@hahnjo
Member

@hahnjo hahnjo commented Mar 26, 2025

  • Hash inner template arguments: The code is applied from ODRHash::AddDecl with the reasoning given in the comment, to reduce collisions. This was particularly visible with STL types templated on std::pair, whose template arguments were not taken into account.
  • Complete only needed partial specializations: It is unclear (to me) why this needs to be done "for safety", but this change significantly improves the effectiveness of lazy loading.
  • Load only needed partial specializations: Similar to the last commit, it is unclear why we need to load all specializations, including non-partial ones, when we have a TPL.
  • Remove bail-out logic in TemplateArgumentHasher: While it is correct to assign a single fixed hash to all template
    arguments, it can reduce the effectiveness of lazy loading and is not actually needed: we are allowed to ignore parts that cannot be handled because they will be analogously ignored by all hashings.

@llvmbot added the clang (Clang issues not falling into any other category), clang:frontend (Language frontend issues, e.g. anything involving "Sema"), and clang:modules (C++20 modules and Clang Header Modules) labels on Mar 26, 2025
@llvmbot
Member

llvmbot commented Mar 26, 2025

@llvm/pr-subscribers-clang-modules

@llvm/pr-subscribers-clang

Author: Jonas Hahnfeld (hahnjo)

Changes
  • Hash inner template arguments: The code is applied from ODRHash::AddDecl with the reasoning given in the comment, to reduce collisions. This was particularly visible with STL types templated on std::pair, whose template arguments were not taken into account.
  • Complete only needed partial specializations: It is unclear (to me) why this needs to be done "for safety", but this change significantly improves the effectiveness of lazy loading.
  • Load only needed partial specializations: Similar to the last commit, it is unclear why we need to load all specializations, including non-partial ones, when we have a TPL.
  • Remove bail-out logic in TemplateArgumentHasher: While it is correct to assign a single fixed hash to all template
    arguments, it can reduce the effectiveness of lazy loading and is not actually needed: we are allowed to ignore parts that cannot be handled because they will be analogously ignored by all hashings.

Full diff: https://github.com/llvm/llvm-project/pull/133057.diff

3 Files Affected:

  • (modified) clang/lib/AST/DeclTemplate.cpp (-6)
  • (modified) clang/lib/Serialization/ASTReader.cpp (+2-8)
  • (modified) clang/lib/Serialization/TemplateArgumentHasher.cpp (+18-33)
diff --git a/clang/lib/AST/DeclTemplate.cpp b/clang/lib/AST/DeclTemplate.cpp
index c0f5be51db5f3..8560c3928aa84 100644
--- a/clang/lib/AST/DeclTemplate.cpp
+++ b/clang/lib/AST/DeclTemplate.cpp
@@ -367,12 +367,6 @@ bool RedeclarableTemplateDecl::loadLazySpecializationsImpl(
   if (!ExternalSource)
     return false;
 
-  // If TPL is not null, it implies that we're loading specializations for
-  // partial templates. We need to load all specializations in such cases.
-  if (TPL)
-    return ExternalSource->LoadExternalSpecializations(this->getCanonicalDecl(),
-                                                       /*OnlyPartial=*/false);
-
   return ExternalSource->LoadExternalSpecializations(this->getCanonicalDecl(),
                                                      Args);
 }
diff --git a/clang/lib/Serialization/ASTReader.cpp b/clang/lib/Serialization/ASTReader.cpp
index 0cd2cedb48dd9..eb0496c97eb3b 100644
--- a/clang/lib/Serialization/ASTReader.cpp
+++ b/clang/lib/Serialization/ASTReader.cpp
@@ -7891,14 +7891,8 @@ void ASTReader::CompleteRedeclChain(const Decl *D) {
     }
   }
 
-  if (Template) {
-    // For partitial specialization, load all the specializations for safety.
-    if (isa<ClassTemplatePartialSpecializationDecl,
-            VarTemplatePartialSpecializationDecl>(D))
-      Template->loadLazySpecializationsImpl();
-    else
-      Template->loadLazySpecializationsImpl(Args);
-  }
+  if (Template)
+    Template->loadLazySpecializationsImpl(Args);
 }
 
 CXXCtorInitializer **
diff --git a/clang/lib/Serialization/TemplateArgumentHasher.cpp b/clang/lib/Serialization/TemplateArgumentHasher.cpp
index 3c7177b83ba52..5fb363c4ab148 100644
--- a/clang/lib/Serialization/TemplateArgumentHasher.cpp
+++ b/clang/lib/Serialization/TemplateArgumentHasher.cpp
@@ -21,17 +21,6 @@ using namespace clang;
 namespace {
 
 class TemplateArgumentHasher {
-  // If we bail out during the process of calculating hash values for
-  // template arguments for any reason. We're allowed to do it since
-  // TemplateArgumentHasher are only required to give the same hash value
-  // for the same template arguments, but not required to give different
-  // hash value for different template arguments.
-  //
-  // So in the worst case, it is still a valid implementation to give all
-  // inputs the same BailedOutValue as output.
-  bool BailedOut = false;
-  static constexpr unsigned BailedOutValue = 0x12345678;
-
   llvm::FoldingSetNodeID ID;
 
 public:
@@ -41,14 +30,7 @@ class TemplateArgumentHasher {
 
   void AddInteger(unsigned V) { ID.AddInteger(V); }
 
-  unsigned getValue() {
-    if (BailedOut)
-      return BailedOutValue;
-
-    return ID.computeStableHash();
-  }
-
-  void setBailedOut() { BailedOut = true; }
+  unsigned getValue() { return ID.computeStableHash(); }
 
   void AddType(const Type *T);
   void AddQualType(QualType T);
@@ -92,8 +74,7 @@ void TemplateArgumentHasher::AddTemplateArgument(TemplateArgument TA) {
   case TemplateArgument::Expression:
     // If we meet expression in template argument, it implies
     // that the template is still dependent. It is meaningless
-    // to get a stable hash for the template. Bail out simply.
-    BailedOut = true;
+    // to get a stable hash for the template.
     break;
   case TemplateArgument::Pack:
     AddInteger(TA.pack_size());
@@ -110,10 +91,9 @@ void TemplateArgumentHasher::AddStructuralValue(const APValue &Value) {
 
   // 'APValue::Profile' uses pointer values to make hash for LValue and
   // MemberPointer, but they differ from one compiler invocation to another.
-  // It may be difficult to handle such cases. Bail out simply.
+  // It may be difficult to handle such cases.
 
   if (Kind == APValue::LValue || Kind == APValue::MemberPointer) {
-    BailedOut = true;
     return;
   }
 
@@ -135,14 +115,11 @@ void TemplateArgumentHasher::AddTemplateName(TemplateName Name) {
   case TemplateName::DependentTemplate:
   case TemplateName::SubstTemplateTemplateParm:
   case TemplateName::SubstTemplateTemplateParmPack:
-    BailedOut = true;
     break;
   case TemplateName::UsingTemplate: {
     UsingShadowDecl *USD = Name.getAsUsingShadowDecl();
     if (USD)
       AddDecl(USD->getTargetDecl());
-    else
-      BailedOut = true;
     break;
   }
   case TemplateName::DeducedTemplate:
@@ -167,7 +144,6 @@ void TemplateArgumentHasher::AddDeclarationName(DeclarationName Name) {
   case DeclarationName::ObjCZeroArgSelector:
   case DeclarationName::ObjCOneArgSelector:
   case DeclarationName::ObjCMultiArgSelector:
-    BailedOut = true;
     break;
   case DeclarationName::CXXConstructorName:
   case DeclarationName::CXXDestructorName:
@@ -194,16 +170,29 @@ void TemplateArgumentHasher::AddDeclarationName(DeclarationName Name) {
 void TemplateArgumentHasher::AddDecl(const Decl *D) {
   const NamedDecl *ND = dyn_cast<NamedDecl>(D);
   if (!ND) {
-    BailedOut = true;
     return;
   }
 
   AddDeclarationName(ND->getDeclName());
+
+  // If this was a specialization we should take into account its template
+  // arguments. This helps to reduce collisions coming when visiting template
+  // specialization types (eg. when processing type template arguments).
+  ArrayRef<TemplateArgument> Args;
+  if (auto *CTSD = dyn_cast<ClassTemplateSpecializationDecl>(D))
+    Args = CTSD->getTemplateArgs().asArray();
+  else if (auto *VTSD = dyn_cast<VarTemplateSpecializationDecl>(D))
+    Args = VTSD->getTemplateArgs().asArray();
+  else if (auto *FD = dyn_cast<FunctionDecl>(D))
+    if (FD->getTemplateSpecializationArgs())
+      Args = FD->getTemplateSpecializationArgs()->asArray();
+
+  for (auto &TA : Args)
+    AddTemplateArgument(TA);
 }
 
 void TemplateArgumentHasher::AddQualType(QualType T) {
   if (T.isNull()) {
-    BailedOut = true;
     return;
   }
   SplitQualType split = T.split();
@@ -213,7 +202,6 @@ void TemplateArgumentHasher::AddQualType(QualType T) {
 
 // Process a Type pointer.  Add* methods call back into TemplateArgumentHasher
 // while Visit* methods process the relevant parts of the Type.
-// Any unhandled type will make the hash computation bail out.
 class TypeVisitorHelper : public TypeVisitor<TypeVisitorHelper> {
   typedef TypeVisitor<TypeVisitorHelper> Inherited;
   llvm::FoldingSetNodeID &ID;
@@ -245,9 +233,6 @@ class TypeVisitorHelper : public TypeVisitor<TypeVisitorHelper> {
 
   void Visit(const Type *T) { Inherited::Visit(T); }
 
-  // Unhandled types. Bail out simply.
-  void VisitType(const Type *T) { Hash.setBailedOut(); }
-
   void VisitAdjustedType(const AdjustedType *T) {
     AddQualType(T->getOriginalType());
   }

@ChuanqiXu9
Member

While I may not be able to look into them in detail any time soon, it may be helpful to split this into separate patches to review and to land.

@hahnjo
Member Author

hahnjo commented Mar 26, 2025

While I may not be able to look into them in detail any time soon, it may be helpful to split this into separate patches to review and to land.

I initially considered this, but @vgvassilev said in root-project/root#17722 (comment) he prefers a single PR, also for external testing.

@vgvassilev
Contributor

@ilya-biryukov, would you mind giving this PR a test on your infrastructure and if it works maybe share some performance results?

@hahnjo
Member Author

hahnjo commented Mar 26, 2025

Performance measurements with LLVM

I tested these patches for building LLVM itself with modules (LLVM_ENABLE_MODULES=ON). To work around #130795, I apply #131354 before building Clang. For the overall build, I'm not able to measure a difference in memory consumption because that is dominated by the linker. The run-time performance is very noisy, so it's hard to make accurate statements, but it looks unaffected as well.

I did some measurements for individual files, chosen by searching for large object files and excluding generated files. For each version, I first build LLVM completely to populate the module.cache and then delete and rebuild only one object file. Run-time performance is not hugely affected; it seems to get slightly faster with this PR.

Maximum resident set size (kbytes) from /usr/bin/time -v:

object file | before* | main | this PR
lib/Analysis/CMakeFiles/LLVMAnalysis.dir/ScalarEvolution.cpp.o | 543100 | 515184 | 445784
lib/Passes/CMakeFiles/LLVMPasses.dir/PassBuilder.cpp.o | 923036 | 884160 | 805960
lib/Transforms/IPO/CMakeFiles/LLVMipo.dir/AttributorAttributes.cpp.o | 639184 | 600076 | 522512
lib/Transforms/Vectorize/CMakeFiles/LLVMVectorize.dir/SLPVectorizer.cpp.o | 876580 | 857404 | 776572

before*: reverting fb2c9d9, c5e4afe, 30ea0f0, 20e9049 on current main

hahnjo added a commit to devajithvs/root that referenced this pull request Mar 26, 2025
@ilya-biryukov
Contributor

@ilya-biryukov, would you mind giving this PR a test on your infrastructure and if it works maybe share some performance results?

Sure, let me try kicking it off. Note that our infrastructure is much better at detecting compilations that time out than at providing proper benchmarking at scale (there are a few targeted benchmarks too, though).
That means we're good at detecting big regressions, but won't be able to provide very reliable performance measurements.

I'll try to give you what we have, though.

@ChuanqiXu9
Member

While I may not be able to look into them in detail any time soon, it may be helpful to split this into separate patches to review and to land.

I initially considered this, but @vgvassilev said in root-project/root#17722 (comment) he prefers a single PR, also for external testing.

Maybe you can test it with this and land it as separate patches, so that we can revert one of them if it is problematic but the other parts are fine.

@ChuanqiXu9
Member

Complete only needed partial specializations: It is unclear (to me) why this needs to be done "for safety", but this change significantly improves the effectiveness of lazy loading.

This comes from the logic: if we have a partial template specialization A<int, T, U> and we need a full specialization for A<int, double, double>, we expect the partial specialization to be loaded.

hahnjo added a commit to devajithvs/root that referenced this pull request Mar 28, 2025
@hahnjo
Member Author

hahnjo commented Mar 28, 2025

Maybe you can test it with this and land it as separate patches, so that we can revert one of them if it is problematic but the other parts are fine.

I'm ok with pushing the commits one-by-one after the PR is reviewed, just let me know.

Complete only needed partial specializations: It is unclear (to me) why this needs to be done "for safety", but this change significantly improves the effectiveness of lazy loading.

This comes from the logic: if we have a partial template specialization A<int, T, U> and we need a full specialization for A<int, double, double>, we expect the partial specialization to be loaded.

Sure, but in my understanding, that's not needed on the ASTReader side but is taken care of by Sema (?). For the following example:

//--- partial.cppm
export module partial;

export template <typename S, typename T, typename U>
struct Partial {
  static constexpr int Value() { return 0; }
};

export template <typename T, typename U>
struct Partial<int, T, U> {
  static constexpr int Value() { return 1; }
};

//--- partial.cpp
import partial;

static_assert(Partial<int, double, double>::Value() == 1);

(I assume that's what you have in mind?) I see two calls to ASTReader::CompleteRedeclChain (with this PR applied): The first asks for the full instantiation Partial<int, double, double> and regardless of what we load, the answer to the query is that it's not defined yet. The second asks for the partial specialization Partial<int, T, U> and then instantiation proceeds to do the right thing.

@vgvassilev
Contributor

While I may not be able to look into them in detail any time soon, it may be helpful to split this into separate patches to review and to land.

I initially considered this, but @vgvassilev said in root-project/root#17722 (comment) he prefers a single PR, also for external testing.

Maybe you can test it with this and land it as separate patches, so that we can revert one of them if it is problematic but the other parts are fine.

This is a relatively small patch focused on reducing the round trips to modules deserialization. I see this as an atomic change that, if it went in only partially, would defeat its purpose. What's the goal of a partial optimization?

@ChuanqiXu9
Member

Maybe you can test it with this and land it as separate patches, so that we can revert one of them if it is problematic but the other parts are fine.

I'm ok with pushing the commits one-by-one after the PR is reviewed, just let me know.

Complete only needed partial specializations: It is unclear (to me) why this needs to be done "for safety", but this change significantly improves the effectiveness of lazy loading.

This comes from the logic: if we have a partial template specialization A<int, T, U> and we need a full specialization for A<int, double, double>, we expect the partial specialization to be loaded.

Sure, but in my understanding, that's not needed on the ASTReader side but is taken care of by Sema (?). For the following example:

//--- partial.cppm
export module partial;

export template <typename S, typename T, typename U>
struct Partial {
  static constexpr int Value() { return 0; }
};

export template <typename T, typename U>
struct Partial<int, T, U> {
  static constexpr int Value() { return 1; }
};

//--- partial.cpp
import partial;

static_assert(Partial<int, double, double>::Value() == 1);

(I assume that's what you have in mind?) I see two calls to ASTReader::CompleteRedeclChain (with this PR applied): The first asks for the full instantiation Partial<int, double, double> and regardless of what we load, the answer to the query is that it's not defined yet. The second asks for the partial specialization Partial<int, T, U> and then instantiation proceeds to do the right thing.

If it works, I feel good with it.

@ChuanqiXu9
Member

While I may not be able to look into them in detail any time soon, it may be helpful to split this into separate patches to review and to land.

I initially considered this, but @vgvassilev said in root-project/root#17722 (comment) he prefers a single PR, also for external testing.

Maybe you can test it with this and land it as separate patches, so that we can revert one of them if it is problematic but the other parts are fine.

This is a relatively small patch focused on reducing the round trips to modules deserialization. I see this as an atomic change that, if it went in only partially, would defeat its purpose. What's the goal of a partial optimization?

I think partial optimizations are optimizations too. If these changes are not dependent on each other, it would be better to split them.

Given the scale of the patch, it may not be a serious problem actually. I still think it is better to land them separately, but if you want to save some typing, I don't feel too bad.

@vgvassilev
Contributor

While I may not be able to look into them in detail any time soon, it may be helpful to split this into separate patches to review and to land.

I initially considered this, but @vgvassilev said in root-project/root#17722 (comment) he prefers a single PR, also for external testing.

Maybe you can test it with this and land it as separate patches, so that we can revert one of them if it is problematic but the other parts are fine.

This is a relatively small patch focused on reducing the round trips to modules deserialization. I see this as an atomic change that, if it went in only partially, would defeat its purpose. What's the goal of a partial optimization?

I think partial optimizations are optimizations too. If these changes are not dependent on each other, it would be better to split them.

Given the scale of the patch, it may not be a serious problem actually. I still think it is better to land them separately, but if you want to save some typing, I don't feel too bad.

Honestly, I am more concerned about the tests that @ilya-biryukov is running. As long as they are happy, I do not particularly care about commit style, although it'd be weird to land a 40-line patch in many commits :)

@ChuanqiXu9
Member

While I may not be able to look into them in detail any time soon, it may be helpful to split this into separate patches to review and to land.

I initially considered this, but @vgvassilev said in root-project/root#17722 (comment) he prefers a single PR, also for external testing.

Maybe you can test it with this and land it as separate patches, so that we can revert one of them if it is problematic but the other parts are fine.

This is a relatively small patch focused on reducing the round trips to modules deserialization. I see this as an atomic change that, if it went in only partially, would defeat its purpose. What's the goal of a partial optimization?

I think partial optimizations are optimizations too. If these changes are not dependent on each other, it would be better to split them.
Given the scale of the patch, it may not be a serious problem actually. I still think it is better to land them separately, but if you want to save some typing, I don't feel too bad.

Honestly, I am more concerned about the tests that @ilya-biryukov is running. As long as they are happy, I do not particularly care about commit style, although it'd be weird to land a 40-line patch in many commits :)

I don't find it odd. I remember it is (or was) LLVM's policy that smaller patches are preferred : )

hahnjo added a commit to devajithvs/root that referenced this pull request Mar 28, 2025
@ilya-biryukov
Contributor

The small-scale benchmarks we ran show a 10% improvement in CPU and a 23% improvement in memory usage for some compilations!

We did hit one compiler error that does not reproduce without modules, however:
error: use of overloaded operator '=' is ambiguous

We're in the process of getting a small reproducer (please bear with us, it takes some time) that we can share. @emaxx-google is working on it.

@vgvassilev
Contributor

The small-scale benchmarks we ran show a 10% improvement in CPU and a 23% improvement in memory usage for some compilations!

That's very good news. I think we can further reduce these times. IIRC, we still deserialize declarations that we do not need. One of the places to look is the logic that kicks in at module loading time:

llvm::Error ASTReader::ReadASTBlock(ModuleFile &F,

We did hit one compiler error that does not reproduce without modules, however: error: use of overloaded operator '=' is ambiguous

Ouch. If that's the only issue on your infrastructure, that's probably not so bad.

We're in the process of getting a small reproducer (please bear with us, it takes some time) that we can share. @emaxx-google is working on it.

@emaxx-google
Contributor

emaxx-google commented Apr 1, 2025

Here's the (almost) minimized reproducer for the use of overloaded operator '=' is ambiguous error: https://pastebin.com/Ux7TiQhw . (The minimization tool isn't perfect, we know, but we opted to share this result sooner rather than later.)

UPD: To run the reproducer, first "unpack" the archive into separate files using LLVM's split-file (e.g., split-file repro.txt repro/), then run the makefile: CLANG=path/to/clang make -k -C repro.

@hahnjo
Member Author

hahnjo commented Apr 1, 2025

Here's the (almost) minimized reproducer for the use of overloaded operator '=' is ambiguous error: https://pastebin.com/Ux7TiQhw . (The minimization tool isn't perfect, we know, but we opted to share this result sooner rather than later.)

UPD: To run the reproducer, first "unpack" the archive into separate files using LLVM's split-file (e.g., split-file repro.txt repro/), then run the makefile: CLANG=path/to/clang make -k -C repro.

Thanks for the efforts! I only had a very quick look and it seems the paste is not complete. For example, head1.h has

class Class1 {
public:

and many other definitions look incomplete as well. Can you check if there was maybe a mistake?

@emaxx-google
Contributor

emaxx-google commented Apr 1, 2025

Here's the (almost) minimized reproducer for the use of overloaded operator '=' is ambiguous error: https://pastebin.com/Ux7TiQhw . (The minimization tool isn't perfect, we know, but we opted to share this result sooner rather than later.)
UPD: To run the reproducer, first "unpack" the archive into separate files using LLVM's split-file (e.g., split-file repro.txt repro/), then run the makefile: CLANG=path/to/clang make -k -C repro.

Thanks for the efforts! I only had a very quick look and it seems the paste is not complete. For example, head1.h has

class Class1 {
public:

and many other definitions look incomplete as well. Can you check if there was maybe a mistake?

That's what it looks like - the minimizer tool (based on C-Reduce/C-Vise) basically works by randomly removing chunks of code, which often ends up with code that looks corrupted. The tool could do a better job by merging/inlining unnecessary headers, macros, etc., but the output, as shared, should be sufficient to trigger the error in question (error: use of overloaded operator '=' is ambiguous). Let me know whether this works for you.

@hahnjo
Member Author

hahnjo commented Apr 1, 2025

I had a closer look, but I get plenty of compile errors already on main - including

./head15.h:20:7: error: use of overloaded operator '=' is ambiguous (with operand types 'std::vector<absl::string_view>' and 'strings_internal::Splitter<typename strings_internal::SelectDelimiter<char>::type, AllowEmpty, std::string>' (aka 'Splitter<char, strings_internal::AllowEmpty, basic_string>'))

I haven't even applied the change in this PR - what am I missing?

@hahnjo
Member Author

hahnjo commented Jan 14, 2026

Thanks for waiting! I ran a build of most of our code with this PR and didn't find any issues.

Great! In that case, I would like to proceed with landing the commits individually so they can be more easily reverted if needed (see discussion at the beginning of the PR), and then we can see if we can / want to backport for release/22.x. Please object if you see a problem with that plan.

@ChuanqiXu9
Member

Thanks for waiting! I ran a build of most of our code with this PR and didn't find any issues.

Great! In that case, I would like to proceed with landing the commits individually so they can be more easily reverted if needed (see discussion at the beginning of the PR), and then we can see if we can / want to backport for release/22.x. Please object if you see a problem with that plan.

As this patch doesn't add any new functionality, I think we don't need to backport it.

@hahnjo
Member Author

hahnjo commented Jan 15, 2026

Great! In that case, I would like to proceed with landing the commits individually so they can be more easily reverted if needed (see discussion at the beginning of the PR), and then we can see if we can / want to backport for release/22.x. Please object if you see a problem with that plan.

As this patch doesn't add any new functionality, I think we don't need to backport it.

No, it doesn't add new functionality, but it finally makes the lazy template loading feature actually work. Backporting is not a requirement, though; we can continue to carry the patches downstream in ROOT.

@ChuanqiXu9
Member

Great! In that case, I would like to proceed with landing the commits individually so they can be more easily reverted if needed (see discussion at the beginning of the PR), and then we can see if we can / want to backport for release/22.x. Please object if you see a problem with that plan.

As this patch doesn't add any new functionality, I think we don't need to backport it.

No, it doesn't add new functionality, but it finally makes the lazy template loading feature actually work. Backporting is not a requirement, though; we can continue to carry the patches downstream in ROOT.

Hmmm, by functionality I mean something the end user can notice. Lazy template loading is not a functionality in my mind either. It (including this patch) is an optimization. But this doesn't matter : )

hahnjo added a commit that referenced this pull request Jan 15, 2026
The code is applied from ODRHash::AddDecl with the reasoning given
in the comment, to reduce collisions. This was particularly visible
with STL types templated on std::pair where its template arguments
were not taken into account.

Reviewed as part of #133057
@vgvassilev
Contributor

vgvassilev commented Jan 15, 2026

Except that it will make using the STL and related modules much faster, which is a significant quality of life improvement imo.

EDIT: And lower clang's peak memory when using modules :)

@ChuanqiXu9
Member

Except that it will make using the STL and related modules much faster, which is a significant quality of life improvement imo.

EDIT: And lower clang's peak memory when using modules :)

Of course, this is why we want this. But for releases, I don't think we should take the risk for an effectively NFC change.

llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Jan 15, 2026
The code is applied from ODRHash::AddDecl with the reasoning given
in the comment, to reduce collisions. This was particularly visible
with STL types templated on std::pair where its template arguments
were not taken into account.

Reviewed as part of llvm/llvm-project#133057
@vgvassilev
Contributor

Except that it will make using the STL and related modules much faster, which is a significant quality of life improvement imo.
EDIT: And lower clang's peak memory when using modules :)

Of course, this is why we want this. But for releases, I don't think we should take the risk for an effectively NFC change.

That was the whole point of asking folks at Google to test it, and the release is a release candidate. I'd say that's a perfect occasion to get this in.

@ChuanqiXu9
Member

Except that it will make using the STL and related modules much faster, which is a significant quality of life improvement imo.
EDIT: And lower clang's peak memory when using modules :)

Of course, this is why we want this. But for releases, I don't think we should take the risk for an effectively NFC change.

That was the whole point of asking folks at Google to test it, and the release is a release candidate. I'd say that's a perfect occasion to get this in.

I won't block if you really want to backport it.

@vgvassilev
Contributor

vgvassilev commented Jan 15, 2026

Except that it will make using the STL and related modules much faster, which is a significant quality of life improvement imo.
EDIT: And lower clang's peak memory when using modules :)

Of course, this is why we want this. But for releases, I don't think we should take the risk for an effectively NFC change.

That was the whole point of asking folks at Google to test it, and the release is a release candidate. I'd say that's a perfect occasion to get this in.

I won't block if you really want to backport it.

Now is an excellent opportunity to add it because we are still in release candidate mode. I really want this in -- it's been quite a journey for this patch, let's make it happen in LLVM 22!

hahnjo added a commit that referenced this pull request Jan 15, 2026
It is unclear (to me) why this needs to be done "for safety", but
this change significantly improves the effectiveness of lazy loading.

Reviewed as part of #133057
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Jan 15, 2026
It is unclear (to me) why this needs to be done "for safety", but
this change significantly improves the effectiveness of lazy loading.

Reviewed as part of llvm/llvm-project#133057
hahnjo added a commit that referenced this pull request Jan 15, 2026
Similar as the last commit, it is unclear why we need to load all
specializations, including non-partial ones, when we have a TPL.

Reviewed as part of #133057
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Jan 15, 2026
Similar as the last commit, it is unclear why we need to load all
specializations, including non-partial ones, when we have a TPL.

Reviewed as part of llvm/llvm-project#133057
hahnjo added a commit that referenced this pull request Jan 15, 2026
While it is correct to assign a single fixed hash to all template
arguments, it can reduce the effectiveness of lazy loading and is
not actually needed: we are allowed to ignore parts that cannot be
handled because they will be analogously ignored by all hashings.

Reviewed as part of #133057
@hahnjo
Member Author

hahnjo commented Jan 15, 2026

All four commits are in main now.

@hahnjo hahnjo closed this Jan 15, 2026
@hahnjo hahnjo deleted the clang-modules-lazy branch January 15, 2026 12:17
@AaronBallman
Copy link
Collaborator

Except that it will make using the stl and related modules much faster which is a significant quality of life improvement imo.
EDIT: And make the clang peak memory when using modules lower :)

Of course, this is why we want this. But for releases, I don't think we should take the risk for an effectively NFC change.

We're sufficiently early in the rc process that we can take the changes if we think they're not risky, but if there's a potential that we need to revert this again, then I'd say nothing is on fire and we don't need to backport. I leave it to the templates and serialization maintainers to decide whether this is too risky or not.

@ChuanqiXu9
Copy link
Member

I won't say "too", as Google spent almost 10 months testing it. But given the complexity of serialization and templates, I can't confidently say it won't be asked to be reverted during the RC cycle.

Priyanshu3820 pushed a commit to Priyanshu3820/llvm-project that referenced this pull request Jan 18, 2026
The code is adapted from ODRHash::AddDecl, with the reasoning given
in the comment, to reduce collisions. This was particularly visible
with STL types templated on std::pair, whose template arguments
were not taken into account.

Reviewed as part of llvm#133057