Toxicity Analysis

The Toxicity Analyzer identifies modules that cannot be safely snapshotted and restored.

Overview

Some Python code creates state that cannot be reset via memory snapshots:

Threading: Background threads, locks, condition variables
Networking: Open sockets, connections
Subprocesses: Child processes, file descriptors
FFI: C extensions with global state

Tach detects these patterns statically and marks affected tests as "toxic", forcing them to run in isolated processes that exit after each test.

flowchart TB
    subgraph Analysis["LOCAL ANALYSIS"]
        Scan["Scan .py files"]
        Parse["Parse AST"]
        Detect["Detect toxic patterns"]
        Report["ToxicityReport"]
    end

    subgraph Graph["GRAPH PROPAGATION"]
        Build["Build dependency graph"]
        Propagate["Fixed-point iteration"]
        Tag["Tag all reachable modules"]
    end

    subgraph Output["OUTPUT"]
        Safe["Safe Tests<br/>(Hypervisor Mode)"]
        Toxic["Toxic Tests<br/>(Isolation Mode)"]
    end

    Analysis --> Graph --> Output

Data Structures

ToxicityReport

Result of analyzing a single file.

pub struct ToxicityReport {
    pub is_toxic: bool,
    pub reasons: Vec<String>,
    pub imports: Vec<String>,
}

Field	Description
`is_toxic`	Whether the file contains toxic patterns
`reasons`	Human-readable explanations
`imports`	All detected imports (for graph construction)
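The relationship between the three fields can be illustrated with a small sketch. The struct mirrors the definition above; `report_for` and the wording of its reason string are hypothetical and exist only for illustration, they are not part of Tach's API.

```rust
/// Mirrors the `ToxicityReport` struct shown above.
#[derive(Debug, Clone, PartialEq, Default)]
pub struct ToxicityReport {
    pub is_toxic: bool,
    pub reasons: Vec<String>,
    pub imports: Vec<String>,
}

/// Hypothetical helper: builds a report from a file's import list and the
/// set of module names considered toxic. Every import is recorded (the graph
/// needs all of them); only flagged imports produce a reason.
pub fn report_for(imports: &[&str], flagged: &[&str]) -> ToxicityReport {
    let reasons: Vec<String> = imports
        .iter()
        .filter(|m| flagged.contains(*m))
        .map(|m| format!("Imports blocklisted module '{}'", m))
        .collect();

    ToxicityReport {
        // A file is toxic exactly when at least one reason was collected.
        is_toxic: !reasons.is_empty(),
        reasons,
        imports: imports.iter().map(|m| m.to_string()).collect(),
    }
}
```

In this sketch `imports` keeps non-toxic modules too, since later graph construction needs every edge, not only the toxic ones.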

ModuleNode

Data stored in each graph node.

pub struct ModuleNode {
    pub name: String,
    pub path: PathBuf,
    pub is_toxic: bool,
    pub reasons: Vec<String>,
}

ToxicityGraph

The dependency graph for toxicity propagation.

pub struct ToxicityGraph {
    graph: DiGraph<ModuleNode, ()>,
    name_to_node: HashMap<String, NodeIndex>,
    path_to_node: HashMap<PathBuf, NodeIndex>,
}

Uses petgraph::graph::DiGraph where an edge A -> B means "A imports B".
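To make the edge direction and the two lookup maps concrete, here is a simplified stand-in that uses only the standard library. It is not the petgraph-backed implementation: `SketchGraph`, `add_module`, `add_import`, and `imports_of` are hypothetical names, and node indices are plain `usize` positions instead of `NodeIndex`.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// Simplified stand-in for the petgraph-backed structure (assumption:
/// edges are stored as `(importer, imported)` index pairs).
#[derive(Default)]
pub struct SketchGraph {
    names: Vec<String>,
    edges: Vec<(usize, usize)>,
    name_to_node: HashMap<String, usize>,
    path_to_node: HashMap<PathBuf, usize>,
}

impl SketchGraph {
    /// Registers a module under both its name and its file path.
    /// Registering the same name twice returns the existing index.
    pub fn add_module(&mut self, name: &str, path: &Path) -> usize {
        if let Some(&idx) = self.name_to_node.get(name) {
            return idx;
        }
        let idx = self.names.len();
        self.names.push(name.to_string());
        self.name_to_node.insert(name.to_string(), idx);
        self.path_to_node.insert(path.to_path_buf(), idx);
        idx
    }

    /// Adds the edge `importer -> imported`, read as "importer imports imported".
    pub fn add_import(&mut self, importer: usize, imported: usize) {
        self.edges.push((importer, imported));
    }

    /// Names of the modules that the file at `path` imports directly.
    pub fn imports_of(&self, path: &Path) -> Vec<&str> {
        let idx = match self.path_to_node.get(path) {
            Some(&idx) => idx,
            None => return Vec::new(),
        };
        self.edges
            .iter()
            .filter(|(from, _)| *from == idx)
            .map(|(_, to)| self.names[*to].as_str())
            .collect()
    }
}
```

Keeping both a name index and a path index lets import statements (which refer to module names) and test files (which are known by path) resolve to the same node.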

Toxic Patterns

Standard Library Blocklist

const TOXIC_STD_LIB: &[&str] = &[
    "threading",
    "_thread",
    "multiprocessing",
    "socket",
    "ctypes",
    "signal",
    "concurrent.futures",
];

External Module Blocklist

const TOXIC_EXTERNAL_MODULES: &[&str] = &[
    "grpc",
    "pandas",      // OpenMP threads
    "tensorflow",  // CUDA state
    "torch",       // CUDA state
    "cv2",         // OpenCV threads
    "gevent",      // Greenlets
    "cffi",
];
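A minimal sketch of how an import name might be checked against the two blocklists above. The matching rule is an assumption: an import matches when a blocklist entry equals the module or is one of its dotted-path prefixes, so `torch.nn` matches `torch` but `socketserver` does not match `socket`. The function name `blocklist_match` is hypothetical.

```rust
const TOXIC_STD_LIB: &[&str] = &[
    "threading",
    "_thread",
    "multiprocessing",
    "socket",
    "ctypes",
    "signal",
    "concurrent.futures",
];

const TOXIC_EXTERNAL_MODULES: &[&str] = &[
    "grpc", "pandas", "tensorflow", "torch", "cv2", "gevent", "cffi",
];

/// Returns the blocklist entry that `module` falls under, if any.
/// Assumption: matching is done on whole dotted segments, not raw prefixes.
pub fn blocklist_match(module: &str) -> Option<&'static str> {
    TOXIC_STD_LIB
        .iter()
        .chain(TOXIC_EXTERNAL_MODULES.iter())
        .copied()
        .find(|entry| {
            module == *entry
                || (module.starts_with(*entry)
                    // The character after the prefix must be a dot, so that
                    // "socketserver" is not mistaken for "socket".
                    && module[entry.len()..].starts_with('.'))
        })
}
```

Matching on segment boundaries avoids false positives from unrelated modules that merely share a textual prefix with a blocklisted name.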

Dynamic Import Patterns

Pattern	Example	Reason
`__import__`	`__import__("threading")`	Runtime module loading
`exec`	`exec("import socket")`	Arbitrary code execution
`importlib.import_module`	`importlib.import_module("ctypes")`	Dynamic imports
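The table can be read as a lookup from callee name to reason. The sketch below assumes the callee has already been extracted from the AST as a dotted name; `dynamic_import_reason` is a hypothetical function, and whether the analyzer also inspects the call's argument is not stated here.

```rust
/// Maps a callee name to the reason listed in the table above, or `None`
/// when the call is not one of the dynamic import patterns.
pub fn dynamic_import_reason(callee: &str) -> Option<&'static str> {
    match callee {
        "__import__" => Some("Runtime module loading"),
        "exec" => Some("Arbitrary code execution"),
        "importlib.import_module" => Some("Dynamic imports"),
        _ => None,
    }
}
```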

Star Imports

from threading import *  # Toxic - imports Thread, Lock, etc.

A star import from a toxic module always marks the importing file toxic, regardless of which names are actually used.

Toxic Calls

import threading
t = threading.Thread(target=fn)  # Toxic call detected

Direct calls to functions from toxic modules are detected even with aliasing.
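One way alias handling could work is to record each import's local name and resolve call receivers through that table. This is a sketch under stated assumptions, not Tach's actual mechanism: `AliasTable`, `record_import`, and `toxic_call` are hypothetical, and only top-level modules (no dotted imports) are handled.

```rust
use std::collections::HashMap;

/// Hypothetical alias table: local name -> real module name.
#[derive(Default)]
pub struct AliasTable {
    map: HashMap<String, String>,
}

impl AliasTable {
    /// Records `import <module>` (alias = None) or `import <module> as <alias>`.
    pub fn record_import(&mut self, module: &str, alias: Option<&str>) {
        let local = alias.unwrap_or(module);
        self.map.insert(local.to_string(), module.to_string());
    }

    /// Resolves a call such as `th.Thread` through the alias table and
    /// returns the fully qualified callee when its module is in `toxic`.
    pub fn toxic_call(&self, call: &str, toxic: &[&str]) -> Option<String> {
        let (receiver, attr) = call.split_once('.')?;
        let module = self.map.get(receiver)?;
        if toxic.contains(&module.as_str()) {
            Some(format!("{}.{}", module, attr))
        } else {
            None
        }
    }
}
```

With `import threading as th` recorded, a call written as `th.Thread(...)` resolves to `threading.Thread` and is flagged, while calls on unknown or non-toxic receivers are ignored.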

Propagation Algorithm

Toxicity propagates transitively through the import graph:

graph TD
    A[test_user.py] --> B[auth.py]
    B --> C[crypto_utils.py]
    C --> D[ctypes]

    style D fill:#f66
    style C fill:#f96
    style B fill:#fc6
    style A fill:#ff6

    subgraph Legend
        L1[Directly Toxic]
        L2[Transitively Toxic]
    end

Fixed-Point Iteration

1. Build directed graph: Module -> Imports
2. Analyze each module for LOCAL toxicity
3. Fixed-point iteration:
   REPEAT:
     FOR each edge (from, to):
       IF to.is_toxic AND NOT from.is_toxic:
         from.is_toxic = true
         from.reasons.push("Imports toxic module '{to.name}'")
   UNTIL no changes
4. Result: Complete transitive closure of toxicity

Implementation

The propagate method is an internal helper called by build():

impl ToxicityGraph {
    /// Private method - called internally by build()
    fn propagate(&mut self) {
        loop {
            let mut changed = false;

            // Collect edges to avoid borrow issues
            let edges: Vec<(NodeIndex, NodeIndex)> = self
                .graph
                .edge_indices()
                .filter_map(|e| self.graph.edge_endpoints(e))
                .collect();

            for (from_idx, to_idx) in edges {
                let to_toxic = self.graph[to_idx].is_toxic;
                let to_name = self.graph[to_idx].name.clone();

                if to_toxic && !self.graph[from_idx].is_toxic {
                    self.graph[from_idx].is_toxic = true;
                    self.graph[from_idx]
                        .reasons
                        .push(format!("Imports toxic module '{}'", to_name));
                    changed = true;
                }
            }

            if !changed {
                break;
            }
        }
    }
}
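To see the loop converge, the same algorithm can be run on the four-module chain from the diagram above. The version below is a dependency-free sketch: it replaces petgraph with a slice of nodes and a list of `(importer, imported)` index pairs, and `Node`, `propagate`, and `example_chain` are illustrative names. It also treats `ctypes` as an ordinary graph node, as the diagram does.

```rust
#[derive(Debug, Clone)]
pub struct Node {
    pub name: String,
    pub is_toxic: bool,
    pub reasons: Vec<String>,
}

/// Runs the fixed-point loop and returns the number of passes over the
/// edge list, including the final pass that detects no change.
pub fn propagate(nodes: &mut [Node], edges: &[(usize, usize)]) -> usize {
    let mut passes = 0;
    loop {
        passes += 1;
        let mut changed = false;
        for &(from, to) in edges {
            if nodes[to].is_toxic && !nodes[from].is_toxic {
                let reason = format!("Imports toxic module '{}'", nodes[to].name);
                nodes[from].is_toxic = true;
                nodes[from].reasons.push(reason);
                changed = true;
            }
        }
        if !changed {
            return passes;
        }
    }
}

/// Builds the chain test_user -> auth -> crypto_utils -> ctypes,
/// with only `ctypes` toxic at the start.
pub fn example_chain() -> (Vec<Node>, Vec<(usize, usize)>) {
    let names = ["test_user", "auth", "crypto_utils", "ctypes"];
    let nodes: Vec<Node> = names
        .iter()
        .map(|n| Node {
            name: n.to_string(),
            is_toxic: *n == "ctypes",
            reasons: Vec::new(),
        })
        .collect();
    let edges = vec![(0, 1), (1, 2), (2, 3)];
    (nodes, edges)
}
```

With the edges visited in this order, toxicity moves one hop per pass, so the chain needs three passes to mark everything and a fourth to confirm nothing changed. Each module records only the first toxic import that tainted it, which matches the `!from.is_toxic` guard in the implementation.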

Integration with Test Pipeline

sequenceDiagram
    participant Disc as Discovery
    participant Tox as Toxicity
    participant Sched as Scheduler
    participant Work as Worker

    Disc->>Tox: TestModule[]
    Tox->>Tox: analyze_all()
    Tox->>Tox: build_graph()
    Tox->>Tox: propagate()
    Tox->>Sched: RunnableTest[] with is_toxic

    loop For each test
        Sched->>Work: TestPayload{is_toxic}
        alt is_toxic = false
            Work->>Work: Apply Seccomp
            Work->>Work: Run test
            Work->>Work: Reset memory
        else is_toxic = true
            Work->>Work: Skip Seccomp
            Work->>Work: Run test
            Work->>Work: exit(0)
        end
    end

False Positive Mitigation

TYPE_CHECKING Blocks

Imports inside if TYPE_CHECKING: blocks are skipped:

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    import threading  # NOT toxic - only for type hints

Conditional Imports

Currently, all imports are detected regardless of runtime conditions:

if sys.platform == "win32":
    import ctypes  # Still marked toxic

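This is conservative but safe: a test may be isolated unnecessarily, but a toxic import is never missed because of a runtime condition.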

Key Functions

analyze_file

Analyzes a single Python file for local toxicity.

pub fn analyze_file(source: &str, path: &Path) -> ToxicityReport

Parameter	Description
`source`	Python source code as a string
`path`	Path to the file (used for error messages)

Returns a ToxicityReport directly (not wrapped in Result).

ToxicityGraph::build

Constructs the dependency graph from all project files.

pub fn build(paths: &[PathBuf], project_root: &Path) -> Self

Parameter	Description
`paths`	List of Python file paths to analyze
`project_root`	Root directory for module name resolution

This method:

1. Indexes all files (path to module name)
2. Analyzes each file for local toxicity
3. Builds import edges
4. Propagates toxicity transitively

ToxicityGraph::is_toxic

Queries whether a module is toxic (including transitively).

pub fn is_toxic(&self, path: &Path) -> bool

Worker Behavior

Test Type	Seccomp	After Execution	Worker Fate
Safe	Applied	Memory reset	Continues in pool
Toxic	Skipped	`exit(0)`	Replaced

Toxic workers skip Seccomp because they may legitimately need:

fork/exec for subprocess tests
socket for network tests
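The table above amounts to a small decision function keyed on `is_toxic`. The types and names below (`AfterExecution`, `WorkerPlan`, `plan_for`) are hypothetical and only encode the two rows of the table; they do not perform any sandboxing or process management themselves.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum AfterExecution {
    /// Restore the memory snapshot and keep the worker alive.
    ResetMemory,
    /// Terminate the worker process with exit code 0.
    Exit,
}

#[derive(Debug, PartialEq, Eq)]
pub struct WorkerPlan {
    pub apply_seccomp: bool,
    pub after: AfterExecution,
    pub returns_to_pool: bool,
}

/// Encodes the Safe and Toxic rows of the worker behavior table.
pub fn plan_for(is_toxic: bool) -> WorkerPlan {
    if is_toxic {
        WorkerPlan {
            // Toxic tests may need fork/exec or sockets, so no filter is applied.
            apply_seccomp: false,
            after: AfterExecution::Exit,
            returns_to_pool: false,
        }
    } else {
        WorkerPlan {
            apply_seccomp: true,
            after: AfterExecution::ResetMemory,
            returns_to_pool: true,
        }
    }
}
```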

Research References

This implementation is informed by the following research papers (see docs/pdfs/txt/ for full text):

Paper	Key Contribution
Fork Safety of Python C-Extensions	Orphaned lock scenarios, async-signal-safety, "Poison Fork" triggers (OpenMP, CUDA, gRPC)
Rust Static Analysis for Toxic Python Modules	Taxonomy of import-time toxicity, `ruff_python_parser` integration, fixed-point iteration
Python Monorepo Zygote Tree Design	Toxicity propagation rules, contagion model ("if A imports toxic B, A is toxic")

Implementation Note: Tach uses rustpython-parser for AST analysis. The research paper analyzed ruff_python_parser as an alternative but the implementation chose rustpython-parser for API stability.

Key Technical Details from Research

Orphaned Locks: fork() only clones the calling thread - background threads (BLAS workers, gRPC pollers) vanish, leaving mutexes permanently locked
POSIX Constraint: After fork() in a multithreaded process, the child may safely call only async-signal-safe functions until it calls exec - the Python interpreter is NOT async-signal-safe
Detection Patterns: threading.Thread().start(), ssl.create_default_context(), multiprocessing.Pool() at module scope (depth=0)
C-Extension Blindspot: Static analysis cannot see into compiled .so files - consider ld-linux.so auditing for thread spawning detection
if __name__ == "__main__" Guard: Must not flag code inside this guard as toxic (only runs when executed as main)

See Research Overview for complete analysis.
