@dabund24 dabund24 commented Nov 5, 2025

First part of #1805.
Second case will be handled in a separate PR.

To be handled

Non-transitive version

When creating $t_1$, $t_0$ must hold a lock $l$. If $l$ is not released before $t_1$ is definitely joined into $t_0$, $t_1$ is protected by $l$.

Examples

graph TB;
subgraph t1;
    E["..."]-->F["return;"];
end;
subgraph t0;
    A["lock(l);"]-->B;
    B["create(t1);"]-->C;
    C["join(t1);"]-->D["unlock(l);"]
end;
B-.->E
F-.->C
graph TB;
subgraph t1;
    E["..."]-->F["return;"];
end;
subgraph t0;
    A["lock(l);"]-->B;
    B["create(t1);"]-->C["return;"];
end;
B-.->E
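As a rough illustration of the first graph, here is a minimal pthread sketch. The shared counter `g`, the extra thread `t2`, and the function name `run_protected_example` are assumptions made for this example and do not come from the PR. `t1` increments `g` without taking `l` itself, yet it cannot race with `t2`, because `t0` holds `l` from before `create(t1)` until after `join(t1)`.

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t l = PTHREAD_MUTEX_INITIALIZER;
static int g = 0; /* shared data (hypothetical) */

/* t1: touches g without locking l; it relies on t0 holding l. */
static void *t1_body(void *arg) {
    (void)arg;
    g++;
    return NULL;
}

/* t2: an unrelated thread that takes l explicitly. */
static void *t2_body(void *arg) {
    (void)arg;
    pthread_mutex_lock(&l);
    g++;
    pthread_mutex_unlock(&l);
    return NULL;
}

/* Plays the role of t0 and returns the final value of g. */
int run_protected_example(void) {
    pthread_t t1, t2;
    int rc;

    g = 0;
    rc = pthread_create(&t2, NULL, t2_body, NULL);
    assert(rc == 0);

    pthread_mutex_lock(&l);                        /* lock(l);    */
    rc = pthread_create(&t1, NULL, t1_body, NULL); /* create(t1); */
    assert(rc == 0);
    pthread_join(t1, NULL);                        /* join(t1);   */
    pthread_mutex_unlock(&l);                      /* unlock(l);  */

    pthread_join(t2, NULL);
    return g;
}
```

Without ancestor locksets, an MHP-based race check sees an unprotected write in `t1` and a write under `l` in `t2`, and would have to report a potential race on `g`.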

General version

Let $t_1$ be $t_d$ itself or a must-ancestor of $t_d$. When creating $t_1$, $t_0$ must hold a lock $l$. If $l$ is not released before $t_d$ is definitely joined into $t_0$, $t_d$ is protected by $l$.

Example

graph TB;
subgraph td;
    G["..."]-->H["return;"];
end;
subgraph t1;
    E["create(td);"]-->F["return;"];
end;
subgraph t0;
    A["lock(l);"]-->B;
    B["create(t1);"]-->C;
    C["join(td);"]-->D["unlock(l);"]
end;
B-.->E
E-.->G
H-.->C

Dependency Analyses

  • $\mathcal T$: Ego Thread Id at program point
  • $\mathcal L$: Must-Lockset at program point
  • $\mathcal C$: May-Creates of ego thread before program point
  • $\mathcal J$: Transitive Must-Joins of ego thread before program point
  • $\mathcal{DES}\ t$: Descendant threads of $t$ (implemented in this PR)
  • $t_a\in\mathcal{A}\ t$: $t_a$ is a must-ancestor thread of $t$

Conditions to satisfy

  1. $t_0\in\mathcal A\ t_1\land (t_1=t_d\lor t_1\in\mathcal A\ t_d)$
  2. maybe $\exists$ create(t1) in $t_0$ with $l\in\mathcal L$
  3. $\forall$ create(t1) in $t_0:l\in\mathcal L$
  4. $\forall$ unlock(l) in $t_0:t_d\notin\left(\mathcal C\cup\bigcup_{c\in\mathcal C}\mathcal{DES}\ c\right)\setminus\mathcal J$

Possible solutions

1. Explicitly listing all descendants

  • $\mathcal{CL}\subseteq T\to T\to 2^L$
  • $T\to 2^L$ is MapBot
  • $2^L$ is Must-Set
  • Flow-Insensitive
  • $(t_1\mapsto\{t_0\mapsto L\})\in\mathcal{CL}$ means " $t_1$ is protected by all mutexes in $L$ locked in $t_0$ and by nothing else".

Contributions

  • create(t1):
    $\forall t\in \{t_1\}\cup\mathcal{DES}\ t_1:$
    $$\mathcal{CL}\ t\sqsupseteq\{\mathcal T\mapsto\mathcal L\}$$

  • unlock(l):
    $\forall t\in \left(\mathcal C\cup\bigcup_{c\in\mathcal C}\mathcal{DES}\ c \right)\setminus\mathcal J:$
    $$\mathcal{CL}\ t\sqsupseteq \{\mathcal T\mapsto(\mathcal{CL}\ t\ \mathcal T)\setminus \{l\}\}$$

  • unlock of unknown mutex:
    $\forall t\in \left(\mathcal C\cup\bigcup_{c\in\mathcal C}\mathcal{DES}\ c \right)\setminus\mathcal J:$
    $$\mathcal{CL}\ t\sqsupseteq \{\mathcal T\mapsto\emptyset\}$$
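The three contributions can be sketched in C as follows. This is only a toy model under stated assumptions, not the analysis code: thread ids and mutexes are small integers, sets are bitmasks, and the names `cl_map`, `bitset32`, `cl_on_create`, `cl_on_unlock` and `cl_on_unknown_unlock` are hypothetical. The join of must-sets is modelled as intersection, a missing entry models bottom, and an unlock leaves bottom entries untouched (also an assumption).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_THREADS 8 /* hypothetical bound; ids and mutexes are small ints */

typedef uint32_t bitset32; /* bit i set = element i is in the set */

typedef struct {
    bool     present[MAX_THREADS][MAX_THREADS]; /* false = bottom */
    bitset32 locks[MAX_THREADS][MAX_THREADS];   /* locks[t][t0] models CL t t0 */
} cl_map;

/* Join for must-sets is intersection; bottom is simply replaced. */
static void cl_join(cl_map *cl, int t, int t0, bitset32 lockset) {
    assert(0 <= t && t < MAX_THREADS && 0 <= t0 && t0 < MAX_THREADS);
    if (!cl->present[t][t0]) {
        cl->present[t][t0] = true;
        cl->locks[t][t0] = lockset;
    } else {
        cl->locks[t][t0] &= lockset;
    }
}

/* (C united with DES c for all c in C) minus J; des[c] = descendants of c. */
static bitset32 unjoined(bitset32 creates, const bitset32 *des, bitset32 joins) {
    bitset32 all = creates;
    for (int c = 0; c < MAX_THREADS; c++)
        if (creates & (1u << c))
            all |= des[c];
    return all & ~joins;
}

/* create(t1) executed by thread ego while holding must-lockset `lockset`. */
void cl_on_create(cl_map *cl, int ego, bitset32 lockset, int t1,
                  const bitset32 *des) {
    bitset32 targets = (1u << t1) | des[t1];
    for (int t = 0; t < MAX_THREADS; t++)
        if (targets & (1u << t))
            cl_join(cl, t, ego, lockset);
}

/* unlock(l) executed by thread ego. */
void cl_on_unlock(cl_map *cl, int ego, int l, bitset32 creates,
                  const bitset32 *des, bitset32 joins) {
    bitset32 targets = unjoined(creates, des, joins);
    for (int t = 0; t < MAX_THREADS; t++)
        if ((targets & (1u << t)) && cl->present[t][ego]) /* bottom stays bottom */
            cl->locks[t][ego] &= ~(1u << l);
}

/* unlock of an unknown mutex: no lock held by ego can be relied on. */
void cl_on_unknown_unlock(cl_map *cl, int ego, bitset32 creates,
                          const bitset32 *des, bitset32 joins) {
    bitset32 targets = unjoined(creates, des, joins);
    for (int t = 0; t < MAX_THREADS; t++)
        if (targets & (1u << t))
            cl_join(cl, t, ego, 0);
}
```

The sketch applies each contribution once, in call order. A flow-insensitive global invariant would instead be iterated by the solver until nothing changes, so the order of contributions would not matter there.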

Rules for MHP exclusion

Program points $s_1$ with $\mathcal T_1$, $\mathcal L_1$ and $\mathcal{CL}_1$ and $s_2$ with $\mathcal T_2$, $\mathcal L_2$ and $\mathcal{CL}_2$ cannot happen in parallel if at least one condition holds:

  • $\exists (t_a\mapsto L_a)\in\mathcal{CL}_1:L_a\cap\mathcal L_2\neq\emptyset,t_a\neq \mathcal T_2$
  • $\exists (t_a\mapsto L_a)\in\mathcal{CL}_2:L_a\cap\mathcal L_1\neq\emptyset,t_a\neq \mathcal T_1$
  • $\exists(t_{a1}\mapsto L_{a1})\in\mathcal{CL}_ 1,(t_{a2}\mapsto L_{a2})\in\mathcal {CL}_ 2: L_{a1}\cap L_{a2}\neq\emptyset\land t_{a1}\neq t_{a2}$
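A minimal C sketch of the three exclusion rules, under the assumption that $\mathcal{CL}_i$ is read as the map stored for the thread $\mathcal T_i$ running at $s_i$. The type `prog_point`, the bitmask encoding, and all function names are hypothetical and only serve this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_THREADS 8 /* hypothetical bound on thread ids */

typedef uint32_t bitset32; /* bit i set = mutex i is in the set */

typedef struct {
    int      tid;                     /* T at the program point */
    bitset32 lockset;                 /* L at the program point */
    bool     cl_present[MAX_THREADS]; /* entry t_a -> L_a exists in CL */
    bitset32 cl_locks[MAX_THREADS];   /* L_a for creator thread t_a */
} prog_point;

/* Rules 1 and 2: an ancestor lock of p is also held at q, and the
 * ancestor holding it is not the thread running q. */
static bool ancestor_lock_vs_lockset(const prog_point *p, const prog_point *q) {
    assert(0 <= q->tid && q->tid < MAX_THREADS);
    for (int a = 0; a < MAX_THREADS; a++)
        if (p->cl_present[a] && a != q->tid &&
            (p->cl_locks[a] & q->lockset) != 0)
            return true;
    return false;
}

/* Rule 3: both points are protected by a common lock that is held by
 * two different ancestor threads. */
static bool ancestor_lock_vs_ancestor_lock(const prog_point *p,
                                           const prog_point *q) {
    for (int a1 = 0; a1 < MAX_THREADS; a1++)
        for (int a2 = 0; a2 < MAX_THREADS; a2++)
            if (p->cl_present[a1] && q->cl_present[a2] && a1 != a2 &&
                (p->cl_locks[a1] & q->cl_locks[a2]) != 0)
                return true;
    return false;
}

bool cannot_happen_in_parallel(const prog_point *s1, const prog_point *s2) {
    return ancestor_lock_vs_lockset(s1, s2)
        || ancestor_lock_vs_lockset(s2, s1)
        || ancestor_lock_vs_ancestor_lock(s1, s2);
}
```

For instance, a point in $t_1$ with ancestor entry $t_0\mapsto\{l\}$ is excluded from a point in some other thread that holds $l$, but not from a point in $t_0$ itself, since $t_0$ holding $l$ does not stop $t_0$ from running alongside its child.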

@sim642 sim642 changed the title from "Improve mhp precision using ancestor locksets" to "Improve MHP precision using ancestor locksets" Nov 10, 2025
@michael-schwarz
Member

michael-schwarz commented Dec 10, 2025

> Before discussing this, I'll start by first explaining the potential fix of the bug just in case this is part of the necessary considerations. The reason the bug happens is the fact that the lockset analysis does some path splitting, thus there exists a create(t2)-statement (the one after node 14) with mutex in the must-lockset.

Stupid question: Is that not a problem of how CL is constructed that is in principle independent of path-sensitivity and can, e.g., also arise as a result of context-sensitivity?

void evil(int x) {
    if (x) lock(a);
    create(t1);
    if (x) unlock(a);
}

void main() {
    // Branching ensures the created thread has a unique tid, even though it is potentially created in two places :-)
    if (top) {
        evil(0);
    } else {
        evil(1);
    }
}

Assuming evil is analyzed context-sensitively with x in context (which it usually is), we have the same problem here? Or not?

Would $T \to T \to 2^L$ with $2^L$ a must set not be the better choice?
(Alternatively, you may want to write it as $T \to 2^{(T \times 2^L)}$ with the invariant that there is only one tuple (t,L) for each t)

Then, $l \in \mathcal{CL}\ t_d\ t_0$ means that $t_0$ is a must parent of $t_d$ and it always holds $l$ when creating $t_d$.

Afterthought: how do you deal with ambiguous creators? I guess giving up when the thread id is no longer unique?


The push for a more modular solution was that I implemented something that looks somewhat similar on the surface (#1065) which turned out to cause a slow-down by a factor of 4 (#1120), which we have still not fixed.

But maybe we can go with the descendant global invariant for now and then check later if it causes any slowdown we're unwilling to pay on real programs? We can still go for the more involved local solution later if this is the case?
(Probably something for @dabund24 and @DrMichaelPetter to decide).

@dabund24
Author

dabund24 commented Dec 10, 2025

> > Before discussing this, I'll start by first explaining the potential fix of the bug just in case this is part of the necessary considerations. The reason the bug happens is the fact that the lockset analysis does some path splitting, thus there exists a create(t2)-statement (the one after node 14) with mutex in the must-lockset.
>
> Stupid question: Is that not a problem of how CL is constructed that is in principle independent of path-sensitivity and can, e.g., also arise as a result of context-sensitivity?
>
> void evil(int x) {
>     if (x) lock(a);
>     create(t1);
>     if (x) unlock(a);
> }
>
> void main() {
>     // Branching ensures the created thread has a unique tid, even though it is potentially created in two places :-)
>     if (top) {
>         evil(0);
>     } else {
>         evil(1);
>     }
> }
>
> Assuming evil is analyzed context-sensitively with x in context (which it usually is), we have the same problem here? Or not?

I hadn't thought about that, but you are right: we do have the same problem here. In 1db14cb I added another test that covers this. However, I think the fix for the path-sensitivity problem should fix this too, since in that case we would again have another creation statement without the mutex locked and would add it to the tainted set (or use your approach).

> Afterthought: how do you deal with ambiguous creators? I guess giving up when the thread id is no longer unique?

Initially I assumed that the three cases in the section "Notes on non-unique thread ids" of the PR summary are the only relevant cases in which ambiguous creators could be a problem. Thinking about it, though, that is a bold claim that I should not make without a proof. Checking whether the descendant threads have a unique thread id wouldn't even lose precision in most cases[1], since the threadJoin analysis also gives up on non-unique TIDs.

> Would $T \to T \to 2^L$ with $2^L$ a must set not be the better choice?
> (Alternatively, you may want to write it as $T \to 2^{(T \times 2^L)}$ with the invariant that there is only one tuple (t,L) for each t)

I'm amazed, that is so much nicer O_O

> But maybe we can go with the descendant global invariant for now and then check later if it causes any slowdown we're unwilling to pay on real programs? We can still go for the more involved local solution later if this is the case?
> (Probably something for @dabund24 and @DrMichaelPetter to decide).

I am going to implement the other case too, since I am now somewhat intrigued by how the two approaches will compare. Thanks a lot for your remarks!


[1] it would if we never unlock and never join, but I think that this wouldn't be too tragic

@dabund24 dabund24 marked this pull request as draft December 10, 2025 11:22
@dabund24
Author

Is there a way of accessing all (must-)ancestors of a thread? Getting all threads in general, or all keys of a global analysis, would also work, but at first glance I don't see that any of this is possible. If that's the case, I can add a MustAncestors analysis so that $\bigcup_{a\in\mathcal A}\ldots$ becomes possible to implement.

@dabund24
Author

dabund24 commented Dec 11, 2025

> I have not yet completely reviewed it (please re-request my review once you have fixed the bug you found).

The simpler version is now done (at least that's what I believe) and in sync with the PR-summary. If you want to review it already, you can do so. Otherwise, feel free to ignore the review request until the alternative solution is also implemented.

@michael-schwarz
Member

> If you want to review it already, you can do so.

I probably won't get around to it until some time next week, but I added it to my TODO (list / stack / multiset).

@michael-schwarz
Member

> Is there a way of accessing all (must-)ancestors of a thread?

I think in general no. However, if you only need the must ancestors of definite thread ids (which I guess is true in your case?), you can reconstruct them, since each new create edge is simply appended to the parent's sequence.

Such a function must_ancestors: TID -> TID list option could then be added to the generic thread id interface, and just return None, i.e., "information not available" for the thread ids which don't allow this trick.
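To make the prefix idea concrete, here is a small C sketch. It is not the thread id interface itself: the struct `thread_id`, the bound `MAX_DEPTH`, and the convention of returning -1 in place of `None` are assumptions for illustration. It further assumes that a definite thread id is the sequence of create edges leading from main to the thread, so that the must-ancestors are exactly its proper prefixes, and that every prefix of a unique id is itself unique.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_DEPTH 8 /* hypothetical bound on nested create edges */

typedef struct {
    int  path[MAX_DEPTH]; /* create edges from main down to this thread */
    int  len;             /* 0 = main thread */
    bool unique;          /* false = id may stand for several threads */
} thread_id;

/* Writes the must-ancestors of t to out (outermost first) and returns
 * their number, or -1 for "information not available". */
int must_ancestors(const thread_id *t, thread_id *out) {
    if (!t->unique)
        return -1;
    assert(0 <= t->len && t->len <= MAX_DEPTH);
    for (int n = 0; n < t->len; n++) {
        /* The ancestor at depth n is the prefix consisting of n edges. */
        memset(&out[n], 0, sizeof out[n]);
        memcpy(out[n].path, t->path, (size_t)n * sizeof t->path[0]);
        out[n].len = n;
        out[n].unique = true;
    }
    return t->len;
}
```

For a thread created at a (hypothetical) node 27 by a thread that was itself created at node 14 in main, this yields two ancestors: main with the empty sequence, and the thread with sequence `[14]`.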
