bytesandroses/alignment-scorer

MSA Alignment Comparison Tool

A command-line tool that automates the validation and comparison of multiple sequence alignments (MSAs). It computes accuracy metrics between reference alignments and realignments using vectorized NumPy operations.


What the Tool Does

For each pair of alignments, the script:

  1. Reads the FASTA files and normalizes the sequences.
  2. Converts each alignment into a numeric “coordinate matrix,” where each gap is 0 and each nucleotide is assigned its 1-based position in the ungapped sequence.
  3. Compares the two matrices to count how many positions differ.
  4. Calculates an overall structural accuracy score.
  5. Writes all results to a clean CSV file.
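
The coordinate-matrix idea in steps 2 to 4 can be sketched as follows. This is a minimal illustration, not the script's actual code: the function names `coordinate_matrix` and `compare_alignments` are hypothetical, the gap character is assumed to be `-`, both alignments are assumed to have the same shape and sequence order, and accuracy is assumed to be the percentage of matrix cells that match.

```python
import numpy as np


def coordinate_matrix(seqs, gap="-"):
    """Map aligned sequences to a matrix: gaps -> 0, residues -> 1-based ungapped position."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("all sequences in an alignment must have the same length")
    # One row per sequence, one column per alignment column (upper-cased to normalize).
    chars = np.array([list(s.upper()) for s in seqs])
    is_residue = chars != gap
    # The running count of residues along each row gives the ungapped position.
    positions = np.cumsum(is_residue, axis=1)
    return np.where(is_residue, positions, 0)


def compare_alignments(ref_seqs, real_seqs):
    """Return (differences, total positions, accuracy in percent). Hypothetical helper."""
    ref = coordinate_matrix(ref_seqs)
    real = coordinate_matrix(real_seqs)
    if ref.shape != real.shape:
        raise ValueError("this sketch assumes both alignments have the same shape")
    differences = int(np.count_nonzero(ref != real))
    total = ref.size
    # Assumed definition: share of matching cells, expressed as a percentage.
    accuracy = 100.0 * (1.0 - differences / total)
    return differences, total, accuracy


if __name__ == "__main__":
    print(coordinate_matrix(["A-CG", "AC-G"]))
    print(compare_alignments(["A-CG", "AC-G"], ["AC-G", "AC-G"]))
```

Because the encoding and the comparison are whole-array operations, no Python-level loop over individual columns is needed. The real script may handle alignments of differing lengths or sequence order differently.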

Installation

Requires Python 3.10+.

Install dependencies:

pip install biopython numpy natsort

Usage Examples

usage: msa_scorer.py [-h] -r REF -a REAL [-o OUTPUT] [-e EXTENSIONS [EXTENSIONS ...]] [-v] [--strict]

options:
  -h, --help            show this help message and exit
  -r REF, --ref REF     Reference alignment directory
  -a REAL, --real REAL  Realignment directory
  -o OUTPUT, --output OUTPUT
                        Output CSV path (default: results.csv)
  -v, --verbose         Enable debug logging
  --strict              Exit pipeline on first error

Compare two directories of alignments:

python msa_scorer.py --ref ref_dir --real real_dir

Save results under a custom filename:

python msa_scorer.py -r ref -a real -o comparison_results.csv

Enable debugging output:

python msa_scorer.py -r ref -a real -v

Stop immediately on error:

python msa_scorer.py -r ref -a real --strict

Specify your own set of acceptable file extensions:

python msa_scorer.py -r ref -a real -e .fa .fasta .aln

Output Format

The script writes a CSV with one row per paired comparison. Each row includes:

  • simulation_id
  • reference_file
  • realignment_file
  • total_differences
  • total_positions
  • accuracy_percent
  • sequences_count
  • alignment_length

Example:

0, sample1.fasta, sample1.fa, 42, 10500, 99.60, 12, 880
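
For orientation, a row with the columns listed above could be produced with Python's standard `csv` module as sketched below. The column names come from the list above; everything else is an assumption for illustration, including the helper name `results_to_csv`, the presence of a header row, and the two-decimal formatting of `accuracy_percent`.

```python
import csv
import io

# Column names as listed in the Output Format section.
FIELDS = [
    "simulation_id",
    "reference_file",
    "realignment_file",
    "total_differences",
    "total_positions",
    "accuracy_percent",
    "sequences_count",
    "alignment_length",
]


def results_to_csv(rows):
    """Serialize result dicts to CSV text (hypothetical helper; header row is assumed)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        # Format the accuracy with two decimals, matching the example row.
        formatted = {**row, "accuracy_percent": f"{row['accuracy_percent']:.2f}"}
        writer.writerow(formatted)
    return buf.getvalue()


if __name__ == "__main__":
    example = {
        "simulation_id": 0,
        "reference_file": "sample1.fasta",
        "realignment_file": "sample1.fa",
        "total_differences": 42,
        "total_positions": 10500,
        "accuracy_percent": 99.6,
        "sequences_count": 12,
        "alignment_length": 880,
    }
    print(results_to_csv([example]))
```

The same field list can be passed to `csv.DictReader` to load a results file for downstream filtering, provided the file actually starts with a header row.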
