Skip to content

perf: optimize topological sort to O(N+E) complexity #803

@hartym

Description

@hartym

Summary

The topological sort implementation in ApplicationsRegistry.resolve_dependencies() has O(N² + NE) time complexity due to a nested loop pattern. While this performs adequately for HARP's typical use cases (<10 applications), it could be optimized to O(N + E) by building a reverse dependency graph.

Current Implementation

File: harp/config/applications.py:319-323

# Find apps that depend on this node
for app_name in original_order:  # O(N) iteration
    if node in graph[app_name]:   # O(E) check for each
        in_degree[app_name] -= 1
        if in_degree[app_name] == 0:
            queue.append(app_name)

Current Complexity: O(N² + NE)

  • For each processed node (N iterations in while loop)
  • We iterate through all applications (N iterations)
  • Checking if node is in dependencies list (up to E checks)

Proposed Optimization

Build a reverse dependency graph upfront:

# Build reverse dependency graph (add after line 295)
reverse_graph = defaultdict(list)  # dependents_of[dep] = [apps that depend on dep]
for app_name in self._applications:
    for dep in graph[app_name]:
        reverse_graph[dep].append(app_name)

# Then replace lines 319-323 with direct lookup:
for dependent in reverse_graph[node]:
    in_degree[dependent] -= 1
    if in_degree[dependent] == 0:
        queue.append(dependent)

Optimized Complexity: O(N + E)

  • Building reverse graph: O(N + E) - iterate apps and their deps once
  • Processing queue: O(N + E) - each app and dependency processed once

Why This Isn't Critical

Current Performance:

This optimization would save: ~0.05ms at startup (unmeasurable)

Why defer:

  • One-time startup operation, not in hot path
  • Current implementation is correct and well-tested
  • No user-facing performance issues
  • Optimization would require re-testing all 33 dependency tests

When to Implement

This optimization becomes relevant if:

  1. HARP supports 50+ applications in a single system (unlikely)
  2. Startup time becomes a bottleneck (currently not an issue)
  3. Someone wants to contribute a "good first issue" (learning opportunity)

Labels

  • enhancement - improves asymptotic complexity
  • performance - minor performance improvement
  • good first issue - well-defined scope, clear solution
  • low priority - works fine as-is

References

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions