An AI-powered biomedical text summarization system that produces structured, evidence-backed insights from research literature.
BioSum Reliable is an intelligent biomedical NLP system that converts complex research documents into structured summaries. The system combines extractive NLP, transformer-based abstractive summarization, and biomedical entity recognition to produce interpretable and evidence-supported summaries of biomedical literature.
🚀 Live Demo 👉 https://huggingface.co/spaces/saloni-1919/biosum-reliable
- Evidence-based extractive summarization
- Transformer-based abstractive summarization
- Biomedical entity recognition
- Structured research summaries (Objective, Methods, Results, Conclusion)
- Evidence scoring for transparency
- REST API with FastAPI
- Interactive Swagger API documentation
- Docker-ready deployment
BioSum Reliable can be used in several biomedical and research workflows:
- Summarizing biomedical research papers
- Extracting key findings from clinical studies
- Supporting literature review for researchers
- Assisting medical professionals in reviewing research
- Integrating biomedical summarization into AI pipelines
Objective
This study evaluated metformin response in adults with diabetes.
Methods
We reviewed 100 patients and compared baseline glucose and HbA1c values.
Results
HbA1c improved after treatment and fewer patients required insulin rescue therapy.
Conclusion
Metformin improved glycemic control in this cohort.
Extractive summary: important sentences selected directly from the research document using statistical scoring and biomedical heuristics.
Abstractive summary: a refined summary generated by a transformer model fine-tuned on biomedical literature.
Example extracted entities:
- Disease: Diabetes
- Drug: Metformin
- Drug: Insulin
- Measure: HbA1c
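The entity types listed above can be illustrated with a minimal dictionary-lookup sketch. This is not the project's recognizer (the tech stack lists spaCy, and the actual model is not described here). The `LEXICON` table and the `find_entities` helper are hypothetical names used only for illustration.

```python
import re

# Hypothetical mini-lexicon for illustration only; a real system would use a
# trained biomedical NER model rather than a hand-written table.
LEXICON = {
    "diabetes": "Disease",
    "metformin": "Drug",
    "insulin": "Drug",
    "hba1c": "Measure",
}


def find_entities(text):
    """Return (label, surface form) pairs in order of first appearance."""
    seen = set()
    entities = []
    for match in re.finditer(r"[A-Za-z0-9]+", text):
        token = match.group(0)
        key = token.lower()
        label = LEXICON.get(key)
        # Report each lexicon entry once, keeping the casing found in the text.
        if label and key not in seen:
            seen.add(key)
            entities.append((label, token))
    return entities


sample = (
    "This study evaluated metformin response in adults with diabetes. "
    "HbA1c improved and fewer patients required insulin rescue therapy."
)
for label, surface in find_entities(sample):
    print(f"{label}: {surface}")
```

A lookup table like this misses synonyms, abbreviations, and multi-word terms, which is one reason a trained recognizer is usually preferred.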
BioSum Reliable processes biomedical text through several NLP stages.
- Input biomedical research text
- Sentence segmentation and preprocessing
- Biomedical entity extraction
- Evidence-based sentence scoring
- Extractive summarization
- Transformer-based abstractive summarization
- Structured summary generation
- Final summary with supporting evidence
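The scoring and extractive stages above can be sketched with a simple word-frequency heuristic. This is an assumption made for illustration: the project describes its scoring only as statistical scoring with biomedical heuristics, so the stopword list, the `rank_sentences` helper, and the averaging formula below are hypothetical. The `target_sentences` parameter borrows its name from the API request field.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not the project's list).
STOPWORDS = {"the", "a", "an", "and", "of", "in", "with", "we", "this", "after", "to", "for"}


def split_sentences(text):
    """Naive splitter: break after ., ! or ? when followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase alphanumeric tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z0-9]+", sentence.lower()) if w not in STOPWORDS]


def rank_sentences(text, target_sentences=2):
    """Score each sentence by the average document frequency of its words,
    then return the top-scoring sentences in their original order."""
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in tokenize(s))
    scored = []
    for index, sentence in enumerate(sentences):
        words = tokenize(sentence)
        score = sum(freq[w] for w in words) / len(words) if words else 0.0
        scored.append((score, index, sentence))
    # Highest score first; ties are broken by position in the document.
    top = sorted(scored, key=lambda item: (-item[0], item[1]))[:target_sentences]
    # Restore document order so the extract reads naturally.
    top.sort(key=lambda item: item[1])
    return [(sentence, round(score, 2)) for score, _, sentence in top]


text = "Metformin improved HbA1c. The weather was mild. Metformin reduced insulin use."
for sentence, score in rank_sentences(text, target_sentences=2):
    print(score, sentence)
```

Returning the score alongside each sentence mirrors the idea of evidence scoring: a reader can see why a sentence was selected, not only that it was.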
- Python
- FastAPI
- PyTorch
- HuggingFace Transformers
- spaCy NLP
- Uvicorn ASGI Server
This project demonstrates a production-style machine learning pipeline combining NLP, transformer models, and API deployment.
git clone https://github.com/saloni-1919/biosum-reliable.git
cd biosum-reliable
Install all required Python packages using the requirements file.
pip install -r requirements.txt
Start the FastAPI application.
uvicorn app.main:app --reload
Once the server starts, open your browser and navigate to:
http://127.0.0.1:8000/docs
This opens the interactive Swagger API interface, where you can test the summarization API.
POST /api/summarize
{
  "text": "Biomedical research text...",
  "target_sentences": 5,
  "abstractive": true
}
The response includes:
- Structured summary
- Extractive summary
- Final summarized output
- Key biomedical entities
- Evidence-ranked sentences
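The request above can also be sent from Python. The following is a minimal sketch using only the standard library. It assumes the server is running at the local address shown for the Swagger page. The helper names `build_summarize_request` and `summarize` are hypothetical, and the response field names are not documented here, so the sketch returns the decoded JSON without assuming its keys.

```python
import json
import urllib.request

# Assumed local development address, taken from the Swagger URL in this README.
BASE_URL = "http://127.0.0.1:8000"


def build_summarize_request(text, target_sentences=5, abstractive=True, base_url=BASE_URL):
    """Build a POST request for /api/summarize with a JSON body."""
    payload = {
        "text": text,
        "target_sentences": target_sentences,
        "abstractive": abstractive,
    }
    return urllib.request.Request(
        url=f"{base_url}/api/summarize",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize(text, **kwargs):
    """Send the request and return the decoded JSON response.

    Requires a running server. Inspect the returned dict (or the Swagger
    page) for the actual field names.
    """
    request = build_summarize_request(text, **kwargs)
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `summarize("Biomedical research text...", target_sentences=5)` returns the parsed response. The generous timeout is a guess to allow for model inference time.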
biosum-reliable
│
├── app
│   ├── api
│   ├── core
│   ├── ml
│   ├── services
│   └── main.py
│
├── docs
│   ├── demo.gif
│   ├── interface.png
│   ├── output1.png
│   ├── output2.png
│   └── output3.png
│
├── tests
├── LICENSE
├── Dockerfile
├── Procfile
├── pyproject.toml
├── requirements.txt
└── README.md
Note: Model weights are excluded from the repository due to GitHub file size limits. The model can be reproduced by running the training script, app/ml/train_biomedical_summarizer.py.
The abstractive summarization component was trained using a transformer-based model on biomedical research literature.
Dataset: PubMed Scientific Papers Dataset
Base model: BART Large CNN
| Parameter | Value |
|---|---|
| Training samples | 200 |
| Validation samples | 50 |
| Epochs | 1 |
| Learning rate | 2e-5 |
- Final training loss: 2.57
- Validation loss: 1.97
python app/ml/train_biomedical_summarizer.py
BioSum Reliable follows a hybrid summarization pipeline combining extractive NLP techniques with transformer-based abstractive summarization.
The system processes biomedical research articles through the following stages:
- Text Input
  - Biomedical research abstract or document is provided by the user.
- Preprocessing
  - Sentence segmentation
  - Text normalization
  - Section detection (Objective, Methods, Results, Conclusion)
- Biomedical Entity Recognition
  - Identification of domain-specific entities such as:
    - Diseases
    - Drugs
    - Clinical measurements
    - Treatments
- Evidence-Based Sentence Scoring
  - Sentences are ranked based on relevance using statistical scoring.
- Extractive Summarization
  - Top-ranked sentences are selected to produce an evidence-supported summary.
- Transformer-Based Abstractive Summarization
  - A fine-tuned BART model generates a concise natural-language summary.
- Structured Research Output
  - Results are organized into structured sections:
    - Objective
    - Methods
    - Results
    - Conclusion
- Evidence Transparency
  - The system highlights supporting sentences used to generate the summary.
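As one illustration of the section detection and structured output stages, the sketch below assigns sentences to sections using keyword cues. How the project actually detects sections is not described here, so the cue lists and the `detect_section` and `group_by_section` helpers are hypothetical and chosen only to make the idea concrete.

```python
import re

# Hypothetical cue phrases per section, for illustration only. Dict order sets
# priority: Conclusion is checked first so that a sentence such as
# "... improved ... in this cohort" is not claimed by the Results cues.
SECTION_CUES = {
    "Conclusion": ("conclude", "suggest", "in this cohort", "in conclusion"),
    "Objective": ("aim", "objective", "evaluated", "investigated"),
    "Methods": ("reviewed", "compared", "enrolled", "randomized"),
    "Results": ("improved", "reduced", "increased", "fewer"),
}

SECTION_ORDER = ("Objective", "Methods", "Results", "Conclusion")


def detect_section(sentence):
    """Return the first section whose cue appears as a whole word or phrase."""
    lowered = sentence.lower()
    for section, cues in SECTION_CUES.items():
        if any(re.search(rf"\b{re.escape(cue)}\b", lowered) for cue in cues):
            return section
    return None


def group_by_section(sentences):
    """Group sentences under the four structured headings, in display order."""
    grouped = {section: [] for section in SECTION_ORDER}
    for sentence in sentences:
        section = detect_section(sentence)
        if section is not None:
            grouped[section].append(sentence)
    return grouped


sentences = [
    "This study evaluated metformin response in adults with diabetes.",
    "We reviewed 100 patients and compared baseline glucose and HbA1c values.",
    "HbA1c improved after treatment and fewer patients required insulin rescue therapy.",
    "Metformin improved glycemic control in this cohort.",
]
for section, items in group_by_section(sentences).items():
    print(section, "->", items)
```

Keyword cues are brittle on real abstracts. They are used here only because they make the mapping from sentences to sections easy to follow.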
Biomedical Text
│
▼
Sentence Segmentation
│
▼
Biomedical Entity Recognition
│
▼
Evidence Sentence Scoring
│
▼
Extractive Summary
│
▼
Transformer Abstractive Model (BART)
│
▼
Structured Research Summary
The system generates multiple outputs for interpretability:
- Structured summary (Objective, Methods, Results, Conclusion)
- Extractive summary
- Abstractive final summary
- Key biomedical entities
- Evidence-ranked supporting sentences
This multi-stage architecture is designed for interpretability: each abstractive summary is accompanied by the extractive sentences and entities that support it.
- Train on larger biomedical datasets
- Improve entity recognition accuracy
- Add citation extraction
- Support full research paper summarization
- Deploy model using HuggingFace inference API
This project is licensed under the MIT License.
See the LICENSE file for details.
Saloni Nathani
GitHub: https://github.com/saloni-1919




