
feat(evolution): add IC-based factor deduplication penalty #38

Merged
NeuZhou merged 2 commits into master from feat/ic-dedup
Apr 13, 2026

@NeuZhou NeuZhou commented Apr 13, 2026

P0: IC-Based Factor Deduplication

Add a fitness multiplier based on Information Coefficient (IC) correlation to prevent the GA from converging on redundant, highly correlated factor combinations.

What it does

  • Computes pairwise correlation between active factors for each DNA evaluation
  • High correlation (>0.7): linear penalty from 1.0x at 0.7 down to 0.7x at 1.0, which discourages redundant factor combos
  • Low correlation (<0.3): linear bonus from 1.0x at 0.3 up to 1.15x at 0.0, which rewards diverse factor selection
  • Mid-range (0.3 to 0.7, inclusive): neutral, no effect (1.0x)
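
The three bands above can be sketched as a single piecewise-linear function. This is a minimal illustration based only on the thresholds stated in this description, not the code from scoring.py; the function name `ic_correlation_multiplier` and the choice to clamp the absolute correlation to [0, 1] are assumptions.

```python
def ic_correlation_multiplier(corr: float) -> float:
    """Map a pairwise-correlation summary to a fitness multiplier.

    Hypothetical helper: thresholds come from the PR description, while
    the name and the clamping behaviour are assumptions.
    """
    # Assumption: only the magnitude matters, clamped to [0, 1].
    corr = min(max(abs(corr), 0.0), 1.0)

    if corr > 0.7:
        # 1.0x at corr = 0.7, falling linearly to 0.7x at corr = 1.0
        return 1.0 - 0.3 * (corr - 0.7) / 0.3
    if corr < 0.3:
        # 1.0x at corr = 0.3, rising linearly to 1.15x at corr = 0.0
        return 1.0 + 0.15 * (0.3 - corr) / 0.3
    # Neutral band, boundaries included
    return 1.0
```

Under these assumptions, a correlation of 0.85 sits halfway through the penalty band and yields roughly 0.85x, while 0.15 sits halfway through the bonus band and yields roughly 1.075x.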

Why it matters

The existing HHI-based factor_diversity_bonus only checks weight concentration (whether weights are spread across many factors). But it doesn't check whether the factors themselves are redundant — two perfectly correlated factors with equal weights would get a diversity bonus despite being functionally identical.

IC dedup addresses this: factors with similar signals get penalized, pushing the GA toward genuinely diverse strategies.
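
A small worked example may make the gap concrete. The sketch below uses the textbook Herfindahl-Hirschman Index (sum of squared weight shares) and a plain Pearson correlation; the exact formula behind factor_diversity_bonus is not shown in this PR, so `hhi`, `pearson`, and the toy factor series are all illustrative assumptions.

```python
import math


def hhi(weights):
    """Textbook HHI of normalized weights (lower means more spread out)."""
    total = sum(weights)
    return sum((w / total) ** 2 for w in weights)


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 for constant input (an assumption)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0.0 or vy == 0.0:
        return 0.0
    return cov / math.sqrt(vx * vy)


# Toy factor values, invented for illustration only.
factor_a = [1.0, 2.0, 3.0, 4.0, 5.0]
factor_b = [2.0 * x + 1.0 for x in factor_a]  # linear copy of factor_a
factor_c = [1.0, -2.0, 0.0, 2.0, -1.0]        # uncorrelated with factor_a

weights = [0.5, 0.5]  # equal weights in both strategies

# Weight concentration is identical for both pairs...
redundant = (hhi(weights), pearson(factor_a, factor_b))
# ...but only the correlation tells the two strategies apart.
diverse = (hhi(weights), pearson(factor_a, factor_c))
```

Both strategies have the same HHI because their weights are identical, so a weight-only check cannot separate them. The correlation differs completely: the first pair is a linear copy, the second is uncorrelated.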

Implementation

  • stratevo/evolution/scoring.py: New ic_correlation_penalty param in compute_fitness()
  • stratevo/evolution/auto_evolve.py: _compute_ic_correlation() method with efficient sampling (20 stocks × 50 dates) and per-generation caching
  • tests/test_ic_dedup.py: 36 comprehensive tests (penalty math, diverse vs redundant, Pearson edge cases, caching, backward compat)
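
The sampling and caching described above could look roughly like the sketch below. It is not the code from auto_evolve.py: the class name `ICCorrelationCache`, the layout of `indicators` (factor name to stock to values by date index), the use of the mean absolute pairwise Pearson correlation as the summary, and returning `None` when fewer than two factors are active are all assumptions. Only the 20 × 50 sample size and the cache key (gen_seed plus the active factor set) come from this PR.

```python
import itertools
import math
import random
from typing import Dict, Iterable, List, Optional

# Assumed layout: factor name -> stock symbol -> values indexed by date.
Indicators = Dict[str, Dict[str, List[float]]]


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation; 0.0 for constant or too-short input (assumption)."""
    n = len(xs)
    if n < 2:
        return 0.0
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0.0 or vy == 0.0:
        return 0.0
    return cov / math.sqrt(vx * vy)


class ICCorrelationCache:
    """Hypothetical per-generation cache for sampled factor correlation."""

    def __init__(self, n_stocks: int = 20, n_dates: int = 50) -> None:
        self.n_stocks = n_stocks
        self.n_dates = n_dates
        self.misses = 0  # number of times the correlation was recomputed
        self._cache: Dict[tuple, float] = {}

    def mean_abs_correlation(
        self, indicators: Indicators, active_factors: Iterable[str], gen_seed: int
    ) -> Optional[float]:
        factors = sorted(set(active_factors))
        if len(factors) < 2:
            return None  # no pairs; caller would apply a neutral 1.0x

        key = (gen_seed, frozenset(factors))
        if key in self._cache:
            return self._cache[key]
        self.misses += 1

        # Seeding with gen_seed keeps the sample stable within a generation.
        rng = random.Random(gen_seed)
        first = indicators[factors[0]]  # assumes all factors share stocks/dates
        stocks = rng.sample(sorted(first), min(self.n_stocks, len(first)))
        n_available = min(len(first[s]) for s in stocks)
        dates = rng.sample(range(n_available), min(self.n_dates, n_available))

        # Flatten each factor into one vector over the sampled (stock, date) grid.
        series = {
            f: [indicators[f][s][d] for s in stocks for d in dates] for f in factors
        }
        pairs = list(itertools.combinations(factors, 2))
        result = sum(abs(pearson(series[a], series[b])) for a, b in pairs) / len(pairs)
        self._cache[key] = result
        return result
```

Keying on a frozenset means the order in which a DNA lists its factors does not matter, and including gen_seed in the key invalidates entries when a new generation draws a new sample.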

Testing

  • 36 new tests, all pass
  • 1835 existing tests pass (1 pre-existing version mismatch failure unrelated to this change)
  • Penalty is mild (0.7x at worst) to avoid killing convergence

Add ic_correlation_penalty parameter to compute_fitness() that penalizes
strategies whose active factors are highly correlated (measuring the same
signal). This complements the existing HHI-based factor_diversity_bonus
which only checks weight concentration, not actual factor similarity.

- scoring.py: ic_correlation_penalty multiplier (0.7x–1.15x)
  - corr > 0.7: penalty down to 0.7x at corr=1.0
  - corr < 0.3: bonus up to 1.15x at corr=0.0
  - [0.3, 0.7]: neutral (1.0x)
- auto_evolve.py: lightweight IC correlation computation in evaluate()
  - Samples 20 stocks × 50 dates from pre-computed indicators
  - Cached per generation via gen_seed + active factor set
- tests/test_ic_dedup.py: 36 tests covering penalty math, edge cases,
  caching, backward compatibility, and Pearson correlation helper

Co-Authored-By: Claude Opus 4.6 <[email protected]>

codecov bot commented Apr 13, 2026

Codecov Report

❌ Patch coverage is 94.11765% with 6 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
stratevo/evolution/auto_evolve.py 93.68% 6 Missing ⚠️

@NeuZhou NeuZhou merged commit 42dbd12 into master Apr 13, 2026
4 checks passed