Prateek2007-cmd/HELIX-genomic-sequencing-engine

HELIX Genomic Sequencing Engine 🧬

HELIX is a research-grade, algorithmic pipeline and web dashboard designed for genomic sequencing, assembly, and analysis. It combines over 20 classical and advanced computer science algorithms to simulate the end-to-end process of reading, assembling, aligning, and analyzing DNA sequences.

Unlike basic sequence simulators, HELIX uses dynamic programming, backtracking, greedy algorithms, and graph theory techniques to handle complex biological simulations such as tumor aneuploidy, sex determination, and primer placement.

HELIX Dashboard Preview


🌟 Key Features

1. Advanced Assembly & Alignment

  • Eulerian vs Hamiltonian Assembly: Contrasts two ways of assembling shredded k-mer reads: finding a Hamiltonian path by backtracking (an NP-complete problem, exponential time in the worst case) versus finding an Eulerian path through a De Bruijn graph in linear time, $O(V+E)$.
  • Smith-Waterman Alignment: Uses dynamic programming to find optimal local alignments and flag mutations (substitutions, insertions, deletions, and the frameshifts that indels can cause).
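
The Eulerian approach can be sketched as follows. This is a minimal illustration, not HELIX's actual code: the function names (kmers, build_de_bruijn, eulerian_path, assemble) are hypothetical, and the reads are assumed to be error-free k-mers with no repeated (k-1)-mers, so the path is unique.

```python
from collections import defaultdict
from typing import Dict, List


def kmers(sequence: str, k: int) -> List[str]:
    """Shred a sequence into all overlapping k-mers (idealized, error-free reads)."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def build_de_bruijn(reads: List[str]) -> Dict[str, List[str]]:
    """Each k-mer becomes one edge from its (k-1)-prefix to its (k-1)-suffix."""
    graph: Dict[str, List[str]] = defaultdict(list)
    for kmer in reads:
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph: Dict[str, List[str]]) -> List[str]:
    """Hierholzer's algorithm: walk every edge exactly once in O(V + E)."""
    in_deg: Dict[str, int] = defaultdict(int)
    for targets in graph.values():
        for target in targets:
            in_deg[target] += 1
    # Start where out-degree exceeds in-degree by one; otherwise use the first node.
    start = next(iter(graph))
    for node, targets in graph.items():
        if len(targets) - in_deg[node] == 1:
            start = node
            break
    remaining = {node: list(targets) for node, targets in graph.items()}
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if remaining.get(node):
            stack.append(remaining[node].pop())  # follow an unused edge
        else:
            path.append(stack.pop())  # dead end: node is final in the path
    return path[::-1]


def assemble(reads: List[str]) -> str:
    """Spell the genome: first node, then the last base of each later node."""
    path = eulerian_path(build_de_bruijn(reads))
    return path[0] + "".join(node[-1] for node in path[1:])


genome = "ATGGCGTGCA"
reads = sorted(kmers(genome, 4))  # sorting discards the original read order
print(assemble(reads))
```

Every read is touched a constant number of times, which is where the linear bound comes from. A Hamiltonian formulation would instead treat each read as a node and search over orderings of the reads.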

2. Biological Simulation & Intelligence

  • Tumor Aneuploidy Simulation: Uses k-Color Graph Coloring to phase haplotypes and determine copy number variations (e.g., detecting 3N or 4N karyotypes in simulated cancer cells).
  • Sex Determination Consensus: Aggregates three independent algorithms (Coverage ratio, SRY gene detection, and Heterozygosity rates) to confidently predict the biological sex of the sample.
  • Gene-Level Tracking: Tracks the status of critical clinical markers (e.g., TP53, BRCA1, KRAS), especially when cancer simulation is toggled.
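
A minimal sketch of k-coloring by backtracking follows. How HELIX builds its graph is not described here, so the model below is an assumption: nodes are hypothetical read clusters, and an edge means two clusters disagree at a shared variant site and so cannot come from the same chromosome copy. Under that assumption, the smallest k that admits a valid coloring is a lower bound on the number of distinct copies.

```python
from typing import Dict, List, Optional


def k_color(adjacency: Dict[str, List[str]], k: int) -> Optional[Dict[str, int]]:
    """Return a valid assignment of k colors to nodes, or None if none exists."""
    # Try high-degree nodes first so conflicts surface early.
    nodes = sorted(adjacency, key=lambda n: -len(adjacency[n]))
    colors: Dict[str, int] = {}

    def place(i: int) -> bool:
        if i == len(nodes):
            return True
        node = nodes[i]
        used = {colors[n] for n in adjacency[node] if n in colors}
        for color in range(k):
            if color not in used:
                colors[node] = color
                if place(i + 1):
                    return True
                del colors[node]  # backtrack and try the next color
        return False

    return dict(colors) if place(0) else None


def min_colors(adjacency: Dict[str, List[str]], max_k: int = 6) -> Optional[int]:
    """Smallest k (up to max_k) for which a valid coloring exists."""
    for k in range(1, max_k + 1):
        if k_color(adjacency, k) is not None:
            return k
    return None


# Hypothetical conflict graph: three clusters that all disagree with each other.
conflicts = {
    "cluster_a": ["cluster_b", "cluster_c"],
    "cluster_b": ["cluster_a", "cluster_c"],
    "cluster_c": ["cluster_a", "cluster_b"],
}
print(min_colors(conflicts))
```

Three mutually conflicting clusters cannot be split across two copies, so the sketch reports 3 for this input.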

3. Resource Optimization & Efficiency

  • Huffman Compression: Compresses A/T/G/C streams with variable-length codes based on base frequency, often achieving ~20-30% space savings.
  • 0/1 Knapsack Read Selection: Selects the set of reads that maximizes total quality within a strict RAM budget, discarding low-quality overlapping reads.
  • Job Sequencing with Deadlines: Prioritizes the sequencing of high-value clinical genes (like BRCA1) over intergenic "junk" regions using weighted profit metrics.
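
A minimal sketch of frequency-based Huffman coding over a DNA alphabet follows. It is not HELIX's implementation, and the function names are illustrative. Savings depend on the baseline and on how skewed the base frequencies are: against a fixed 2-bit-per-base encoding, a uniform A/T/G/C mix saves nothing, while skewed mixes save more.

```python
import heapq
from collections import Counter
from typing import Dict


def huffman_codes(sequence: str) -> Dict[str, str]:
    """Build a prefix-free code where frequent bases get shorter bit strings."""
    freq = Counter(sequence)
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: code so far}).
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Merging two subtrees prepends one bit to every code beneath them.
        merged = {sym: "0" + code for sym, code in left.items()}
        merged.update({sym: "1" + code for sym, code in right.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


def encoded_bits(sequence: str, codes: Dict[str, str]) -> int:
    """Total number of bits needed to encode the sequence with the given codes."""
    return sum(len(codes[base]) for base in sequence)


skewed = "A" * 8 + "T" * 4 + "G" * 2 + "C" * 2  # 16 bases, A-rich
codes = huffman_codes(skewed)
huffman = encoded_bits(skewed, codes)
fixed = 2 * len(skewed)  # fixed 2 bits per base
print(codes, huffman, fixed)  # 28 bits vs 32 bits for this input
```

For this A-rich input the code lengths are 1, 2, 3, and 3 bits, giving 28 bits against 32 for the fixed encoding (12.5% smaller).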

🛠️ Tech Stack

  • Backend Engine: Python 3, FastAPI, Uvicorn
  • Frontend Dashboard: React, TypeScript, Vite, Tailwind CSS
  • Design Pattern: Glassmorphism UI with real-time "Algorithm Explorer" and "Intelligence" visualizers.

🚀 Quick Start Guide

Prerequisites

Make sure you have Node.js (for the React frontend) and Python 3.8+ installed.

1. Start the Backend API

Navigate to the root directory and install the required dependencies (if you haven't already):

pip install fastapi uvicorn pydantic

Run the FastAPI server:

python api.py

(The server will start on http://localhost:8000)

2. Start the Frontend Dashboard

Open a new terminal window, navigate to the frontend folder, and run the Vite dev server:

cd frontend
npm install
npm run dev

(The frontend will be available at http://localhost:5173)


🧪 How to Use the Dashboard

  1. Configure Parameters: On the left panel, select your target species (e.g., Human, Mouse, SARS-CoV-2) or input your own custom DNA sequence.
  2. Adjust Simulation Specs: Tweak the genome length, sequencing coverage depth, and expected mutation count. Toggle "Simulate Cancer/Tumor" to observe aneuploidy and targeted gene mutations.
  3. Launch the Pipeline: Click Launch HELIX Pipeline. The frontend requests the results from the Python backend and populates the dashboard once they arrive.
  4. Explore the Tabs:
    • Overview: View the reconstructed genome sequence and the sex determination ("Boy/Girl") consensus result.
    • Intelligence: View research-grade assembly metrics (N50, L50, GC Content), deep mutation analytics (SNPs vs INDELs), and human-readable genomic insights.
    • Algorithm Explorer: Click on individual algorithms (like De Bruijn Graphs or Huffman Compression) to see exactly how your specific DNA data was mathematically processed.
    • Comparisons: View the "Anytime Algorithm" charts, which compare the time complexities of the Greedy, Dynamic Programming (DP), and Branch and Bound (B&B) approaches.
    • Advanced: View specific outputs for the Reliability DP, Job Sequencing algorithms, and N-Queens primer placement.
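
The assembly metrics shown on the Intelligence tab have standard definitions, sketched below. This illustrates the definitions only and is not HELIX's code; the function names and the sample contig lengths are made up.

```python
from typing import List, Tuple


def n50_l50(contig_lengths: List[int]) -> Tuple[int, int]:
    """N50 and L50 of an assembly.

    Sort contigs longest first and accumulate lengths until at least half of
    the total is covered. N50 is the length of the contig that crosses that
    point; L50 is how many contigs were needed to get there.
    """
    ordered = sorted(contig_lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
    raise ValueError("contig_lengths must be non-empty")


def gc_content(sequence: str) -> float:
    """Fraction of bases that are G or C."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must be non-empty")
    return (seq.count("G") + seq.count("C")) / len(seq)


lengths = [80, 70, 50, 40, 30, 20]  # total 290, half is 145
print(n50_l50(lengths))             # (70, 2): 80 + 70 = 150 >= 145
print(gc_content("ATGC"))           # 0.5
```

A higher N50 with a lower L50 generally indicates a more contiguous assembly.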

🧠 Core Algorithms Implemented

| Category | Algorithm | Purpose in HELIX |
|---|---|---|
| Graphs | De Bruijn Graph | Reconstructing DNA sequences via Eulerian paths. |
| Graphs | K-Coloring | Phasing haplotypes and estimating aneuploidy. |
| Dynamic Programming | Smith-Waterman | Optimal local sequence alignment. |
| Dynamic Programming | 0/1 Knapsack | Selecting the highest quality reads within a RAM budget. |
| Greedy | Huffman Coding | Compressing output DNA text efficiently. |
| Greedy | Job Sequencing | Prioritizing clinical cancer genes for processing. |
| Backtracking | N-Queens | Placing primers cleanly without overlapping repeat regions. |
| Backtracking | Hamiltonian Path | Used purely to demonstrate the inefficiency of NP-Complete approaches vs Eulerian paths. |
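
As an illustration of the dynamic programming entries, here is a minimal Smith-Waterman sketch with a linear gap penalty. The scoring values (match 2, mismatch -1, gap -2) and the function name are assumptions chosen for the example; HELIX's actual scoring scheme may differ.

```python
from typing import Tuple


def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> Tuple[int, str, str]:
    """Return (best score, aligned slice of a, aligned slice of b)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 lets an alignment restart anywhere (local, not global).
            score[i][j] = max(0, diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until the score drops to 0.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if score[i][j] == diag:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")  # base present in a but missing from b
            i -= 1
        else:
            out_a.append("-")  # base present in b but missing from a
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman("TTACGTAA", "GGACGTCC"))  # (8, 'ACGT', 'ACGT')
```

Columns where the two aligned strings differ correspond to substitutions, and "-" characters mark insertions or deletions relative to the other sequence.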

👨‍💻 Developer Notes

If you are modifying the Python engine (helix_main.py), note that api.py does not hot-reload by default. After changing the backend algorithms, stop and restart the python api.py process so the frontend receives results from the updated code.
