HELIX is a research-grade, algorithmic pipeline and web dashboard designed for genomic sequencing, assembly, and analysis. It combines over 20 classical and advanced computer science algorithms to simulate the end-to-end process of reading, assembling, aligning, and analyzing DNA sequences.
Unlike basic sequence simulators, HELIX uses a suite of dynamic programming, backtracking, greedy, and graph theory techniques to handle complex biological simulations such as tumor aneuploidy, sex determination, and primer placement.
- Eulerian vs Hamiltonian Assembly: Demonstrates the performance shift from NP-Complete (Hamiltonian Path via Backtracking) to linear time $O(V+E)$ (Eulerian Path via De Bruijn Graphs) when assembling shredded `k-mer` reads.
- Smith-Waterman Alignment: Uses dynamic programming to find optimal local alignments and flag mutations (Substitutions, Insertions, Deletions, Frameshifts).
- Tumor Aneuploidy Simulation: Uses k-Color Graph Coloring to phase haplotypes and determine copy number variations (e.g., detecting `3N` or `4N` karyotypes in simulated cancer cells).
- Sex Determination Consensus: Aggregates three independent algorithms (Coverage ratio, SRY gene detection, and Heterozygosity rates) to confidently predict the biological sex of the sample.
- Gene-Level Tracking: Tracks the status of critical clinical markers (e.g., `TP53`, `BRCA1`, `KRAS`), especially when cancer simulation is toggled.
- Huffman Compression: Compresses standard `A/T/G/C` streams based on symbol frequency to save disk space (often achieving ~20-30% savings).
- 0/1 Knapsack Read Selection: Discards low-quality overlapping reads within a strict RAM budget, trading read quality against memory footprint.
- Job Sequencing with Deadlines: Prioritizes the sequencing of high-value clinical genes (like `BRCA1`) over intergenic "junk" regions using weighted profit metrics.
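To make the Eulerian assembly idea from the feature list concrete, here is a minimal sketch of k-mer assembly over a De Bruijn graph using Hierholzer's algorithm. It is not taken from the HELIX source: the function names (`build_de_bruijn`, `eulerian_path`, `assemble`) are hypothetical, and the sketch assumes error-free k-mers, a non-empty input, and a graph that actually contains an Eulerian path.

```python
from collections import defaultdict


def build_de_bruijn(kmers):
    """Map each (k-1)-mer prefix to the list of (k-1)-mer suffixes it points to."""
    graph = defaultdict(list)
    for kmer in kmers:
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph):
    """Hierholzer's algorithm: walk every edge exactly once in O(V + E)."""
    in_deg = defaultdict(int)
    for targets in graph.values():
        for target in targets:
            in_deg[target] += 1
    # Start at the node with one more outgoing than incoming edge, if any;
    # otherwise (Eulerian cycle) any node with outgoing edges will do.
    start = next(
        (node for node in graph if len(graph[node]) - in_deg[node] == 1),
        next(iter(graph)),
    )
    # Work on a copy so the caller's graph is left untouched.
    remaining = {node: list(targets) for node, targets in graph.items()}
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if remaining.get(node):
            stack.append(remaining[node].pop())  # follow an unused edge
        else:
            path.append(stack.pop())  # dead end: node is final in the path
    return path[::-1]


def assemble(kmers):
    """Spell the sequence: first node plus the last base of each later node."""
    path = eulerian_path(build_de_bruijn(kmers))
    return path[0] + "".join(node[-1] for node in path[1:])
```

For example, the 3-mers of `ATGCGTAC` reassemble to the same string regardless of input order. When the genome contains repeated (k-1)-mers, several Eulerian paths can exist, so the result may be a different sequence with the same k-mer content.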
- Backend Engine: Python 3, FastAPI, Uvicorn
- Frontend Dashboard: React, TypeScript, Vite, Tailwind CSS
- Design Pattern: Glassmorphism UI with real-time "Algorithm Explorer" and "Intelligence" visualizers.
Make sure you have Node.js (for the React frontend) and Python 3.8+ installed.
Navigate to the root directory and install the required dependencies (if you haven't already):
pip install fastapi uvicorn pydantic
Run the FastAPI server:
python api.py
(The server will start on http://localhost:8000)
Open a new terminal window, navigate to the frontend folder, and run the Vite dev server:
cd frontend
npm install
npm run dev
(The frontend will be available at http://localhost:5173)
- Configure Parameters: On the left panel, select your target species (e.g., Human, Mouse, SARS-CoV-2) or input your own custom DNA sequence.
- Adjust Simulation Specs: Tweak the genome length, sequencing coverage depth, and expected mutation count. Toggle "Simulate Cancer/Tumor" to observe aneuploidy and targeted gene mutations.
- Launch the Pipeline: Click `Launch HELIX Pipeline`. The frontend fetches data from the Python backend and populates the results.
- Explore the Tabs:
  - Overview: View the reconstructed genome sequence and the Boy/Girl consensus results.
  - Intelligence: View research-grade assembly metrics (N50, L50, GC Content), deep mutation analytics (SNPs vs INDELs), and human-readable genomic insights.
  - Algorithm Explorer: Click on individual algorithms (like De Bruijn Graphs or Huffman Compression) to see exactly how your specific DNA data was mathematically processed.
  - Comparisons: View the "Anytime Algorithm" charts, comparing the time complexities of the Greedy, DP, and B&B (Branch and Bound) approaches.
  - Advanced: View specific outputs for the Reliability DP, Job Sequencing algorithms, and N-Queens primer placement.
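The assembly metrics named for the Intelligence tab have standard definitions, sketched below. This is an illustration only, not HELIX's own code: `gc_content` and `n50_l50` are hypothetical helper names. N50 is the length of the shortest contig among the longest contigs that together cover at least half of the assembly, and L50 is how many contigs that takes.

```python
def gc_content(sequence):
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    seq = sequence.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def n50_l50(contig_lengths):
    """Return (N50, L50) for a list of contig lengths."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for count, length in enumerate(lengths, start=1):
        running += length
        if running >= half_total:
            # `length` is the contig that pushes coverage past 50%.
            return length, count
    return 0, 0  # no contigs supplied
```

With contig lengths `[80, 70, 50, 40, 30, 20]` (total 290), the two longest contigs cover 150 bases, so N50 is 70 and L50 is 2.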
| Category | Algorithm | Purpose in HELIX |
|---|---|---|
| Graphs | De Bruijn Graph | Reconstructing DNA sequences via Eulerian paths. |
| Graphs | K-Coloring | Phasing haplotypes and estimating aneuploidy. |
| Dynamic Programming | Smith-Waterman | Optimal local sequence alignment. |
| Dynamic Programming | 0/1 Knapsack | Selecting the highest quality reads within a RAM budget. |
| Greedy | Huffman Coding | Compressing output DNA text efficiently. |
| Greedy | Job Sequencing | Prioritizing clinical cancer genes for processing. |
| Backtracking | N-Queens | Placing primers cleanly without overlapping repeat regions. |
| Backtracking | Hamiltonian Path | Used purely to demonstrate the inefficiency of NP-Complete approaches vs Eulerian paths. |
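As an illustration of the Smith-Waterman row in the table, the following is a minimal sketch of local alignment with a linear gap penalty. The scoring values (`match=2`, `mismatch=-1`, `gap=-2`) and the function name `smith_waterman` are assumptions chosen for the example; HELIX's actual scoring scheme is not documented here.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best score, aligned part of a, aligned part of b)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the DP table; scores are clamped at 0 so alignments can restart.
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(0, diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero score is reached.
    i, j = best_pos
    aligned_a, aligned_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if score[i][j] == diag:  # match or substitution
            aligned_a.append(a[i - 1])
            aligned_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:  # gap in b
            aligned_a.append(a[i - 1])
            aligned_b.append("-")
            i -= 1
        else:  # gap in a
            aligned_a.append("-")
            aligned_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(aligned_a)), "".join(reversed(aligned_b))
```

Calling `smith_waterman("TTACGTAA", "GGACGTCC")` returns `(8, "ACGT", "ACGT")`: the shared `ACGT` core scores four matches, and the flanking mismatches are excluded because the alignment is local. In the aligned strings, a `-` marks an insertion or deletion relative to the other sequence.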
If you are modifying the Python engine (`helix_main.py`), note that `api.py` does not hot-reload by default. If you make changes to the backend algorithms, you must restart the `python api.py` terminal process for the frontend to receive the new data.
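If restarting by hand becomes tedious during development, Uvicorn's own auto-reload can be used instead. The sketch below is a possible helper, not part of HELIX: it assumes `api.py` exposes its FastAPI instance under the name `app` (adjust the import string if the name differs), and `run_dev_server` is a hypothetical function name. Uvicorn requires the application to be passed as an import string when `reload=True`.

```python
def run_dev_server():
    """Start the backend with auto-reload enabled (development use only)."""
    # Imported lazily so this helper can be defined without uvicorn installed.
    import uvicorn

    # "api:app" is an ASSUMPTION: module api.py, FastAPI instance named `app`.
    uvicorn.run("api:app", host="127.0.0.1", port=8000, reload=True)
```

Calling `run_dev_server()` from the project root watches the working directory and restarts the server when Python files change, so edits to the engine are picked up without a manual restart.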
