Skip to content

Conversation

Prince781
Copy link
Contributor

@Prince781 Prince781 commented Sep 30, 2025

Each call to findemandedEltsBySingleUser() returns a new APInt that must be OR'd with the current APInt. For large vectors with many uses this can be slow, if the total number of operations is {# uses} x {size of vector}. Instead or OR'ing, use setBit() on the passed-in APInt.

@Prince781 Prince781 requested a review from nikic as a code owner September 30, 2025 20:33
@llvmbot llvmbot added llvm:instcombine Covers the InstCombine, InstSimplify and AggressiveInstCombine passes llvm:transforms labels Sep 30, 2025
@llvmbot
Copy link
Member

llvmbot commented Sep 30, 2025

@llvm/pr-subscribers-llvm-transforms

Author: Princeton Ferro (Prince781)

Changes

Don't use APInt::operator|=() since for large vectors with many uses this can be slow, if the total number of operations is {# uses} x {size of vector}. Instead, make findDemandedEltsBySingleUser() take constant time by using setBit().


Full diff: https://github.com/llvm/llvm-project/pull/161436.diff

1 Files Affected:

  • (modified) llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp (+18-13)
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp b/llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
index 6ef30663bf3ce..a099aaf9e6223 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
@@ -320,11 +320,12 @@ Instruction *InstCombinerImpl::foldBitcastExtElt(ExtractElementInst &Ext) {
 }
 
 /// Find elements of V demanded by UserInstr.
-static APInt findDemandedEltsBySingleUser(Value *V, Instruction *UserInstr) {
-  unsigned VWidth = cast<FixedVectorType>(V->getType())->getNumElements();
+static void findDemandedEltsBySingleUser(Value *V, Instruction *UserInstr,
+                                         APInt &UnionUsedElts) {
+  const unsigned VWidth = cast<FixedVectorType>(V->getType())->getNumElements();
 
-  // Conservatively assume that all elements are needed.
-  APInt UsedElts(APInt::getAllOnes(VWidth));
+  // Whether we can determine the elements accessed at compile time.
+  bool KnownIndices = false;
 
   switch (UserInstr->getOpcode()) {
   case Instruction::ExtractElement: {
@@ -332,32 +333,36 @@ static APInt findDemandedEltsBySingleUser(Value *V, Instruction *UserInstr) {
     assert(EEI->getVectorOperand() == V);
     ConstantInt *EEIIndexC = dyn_cast<ConstantInt>(EEI->getIndexOperand());
     if (EEIIndexC && EEIIndexC->getValue().ult(VWidth)) {
-      UsedElts = APInt::getOneBitSet(VWidth, EEIIndexC->getZExtValue());
+      UnionUsedElts.setBit(EEIIndexC->getZExtValue());
+      KnownIndices = true;
     }
     break;
   }
   case Instruction::ShuffleVector: {
     ShuffleVectorInst *Shuffle = cast<ShuffleVectorInst>(UserInstr);
-    unsigned MaskNumElts =
+    const unsigned MaskNumElts =
         cast<FixedVectorType>(UserInstr->getType())->getNumElements();
 
-    UsedElts = APInt(VWidth, 0);
-    for (unsigned i = 0; i < MaskNumElts; i++) {
-      unsigned MaskVal = Shuffle->getMaskValue(i);
+    KnownIndices = true;
+    for (auto I : llvm::seq(MaskNumElts)) {
+      unsigned MaskVal = Shuffle->getMaskValue(I);
       if (MaskVal == -1u || MaskVal >= 2 * VWidth)
         continue;
       if (Shuffle->getOperand(0) == V && (MaskVal < VWidth))
-        UsedElts.setBit(MaskVal);
+        UnionUsedElts.setBit(MaskVal);
       if (Shuffle->getOperand(1) == V &&
           ((MaskVal >= VWidth) && (MaskVal < 2 * VWidth)))
-        UsedElts.setBit(MaskVal - VWidth);
+        UnionUsedElts.setBit(MaskVal - VWidth);
     }
     break;
   }
   default:
     break;
   }
-  return UsedElts;
+
+  // Conservatively assume all elements are accessed if indices are unknown
+  if (!KnownIndices)
+    UnionUsedElts.setAllBits();
 }
 
 /// Find union of elements of V demanded by all its users.
@@ -370,7 +375,7 @@ static APInt findDemandedEltsByAllUsers(Value *V) {
   APInt UnionUsedElts(VWidth, 0);
   for (const Use &U : V->uses()) {
     if (Instruction *I = dyn_cast<Instruction>(U.getUser())) {
-      UnionUsedElts |= findDemandedEltsBySingleUser(V, I);
+      findDemandedEltsBySingleUser(V, I, UnionUsedElts);
     } else {
       UnionUsedElts = APInt::getAllOnes(VWidth);
       break;

@Prince781 Prince781 requested a review from mjulian31 September 30, 2025 20:38
@Prince781 Prince781 force-pushed the dev/pferro/instcombine-extractelt-compile-time branch from 426bcb1 to fe8ef23 Compare September 30, 2025 21:18
Each call to findemandedEltsBySingleUser() returns a new APInt that must
be OR'd with the current APInt. For large vectors with many uses this
can be slow, if the total number of operations is {# uses} x {size of
vector}. Instead or OR'ing, use setBit() on the passed-in APInt.
@Prince781 Prince781 force-pushed the dev/pferro/instcombine-extractelt-compile-time branch from fe8ef23 to 2dbd45d Compare September 30, 2025 21:21
@Prince781 Prince781 requested a review from nikic September 30, 2025 21:28
@Prince781 Prince781 changed the title [InstCombine] linearize complexity of findDemandedEltsByAllUsers() [InstCombine] linearize complexity of findDemandedEltsByAllUsers() Sep 30, 2025
@Prince781 Prince781 requested a review from dtcxzyw September 30, 2025 22:45
Copy link
Member

@dtcxzyw dtcxzyw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thank you!

Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

Copy link
Contributor

@mjulian31 mjulian31 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@Prince781 Prince781 merged commit 9e04291 into llvm:main Oct 1, 2025
9 checks passed
mahesh-attarde pushed a commit to mahesh-attarde/llvm-project that referenced this pull request Oct 3, 2025
…lvm#161436)

Each call to `findemandedEltsBySingleUser()` returns a new APInt that
must be OR'd with the current APInt. For large vectors with many uses
this can be slow, if the total number of operations is `{# uses} x {size
of vector}`. Instead or OR'ing, use `setBit()` on the passed-in APInt.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
llvm:instcombine Covers the InstCombine, InstSimplify and AggressiveInstCombine passes llvm:transforms
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants