Adobe India Hackathon 2025

Welcome to the "Connecting the Dots" Challenge

This repository contains the solution for the Adobe India Hackathon 2025, including the persona-driven document intelligence system for Challenge 1b.

Challenge 1b: Persona-Driven Document Intelligence

This solution is a Python-based application that analyzes a collection of PDF documents and extracts the most relevant sections based on a given persona and job-to-be-done.
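To make the idea of persona-driven relevance concrete, here is a minimal sketch that scores sections against a persona and job-to-be-done using cosine similarity over plain word counts. It is an illustration only and is not taken from src/ranking.py; the function names `tokenize` and `rank_sections` and the sample data are assumptions. The actual method is described in approach_explanation.md.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_sections(sections, persona, job):
    """Return (score, title) pairs sorted from most to least relevant.

    `sections` maps a section title to its text. The score is the cosine
    similarity between word-count vectors of the query and the section.
    This is a hypothetical stand-in, not the repository's ranking logic.
    """
    query = Counter(tokenize(persona + " " + job))
    query_norm = math.sqrt(sum(c * c for c in query.values()))
    ranked = []
    for title, text in sections.items():
        counts = Counter(tokenize(text))
        norm = math.sqrt(sum(c * c for c in counts.values()))
        dot = sum(query[t] * counts[t] for t in query)
        score = dot / (query_norm * norm) if query_norm and norm else 0.0
        ranked.append((score, title))
    return sorted(ranked, reverse=True)


if __name__ == "__main__":
    # Made-up sample sections, for illustration only.
    sample = {
        "Coastal day trips": "Plan a day trip along the coast with friends",
        "Visa paperwork": "Forms required for a long stay visa application",
    }
    for score, title in rank_sections(
        sample, "Travel Planner", "plan a trip for a group of friends"
    ):
        print(f"{score:.3f}  {title}")
```

A word-count model like this ignores synonyms and word order, so treat it as a baseline for understanding the task rather than a description of the shipped ranking.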

Features

Persona-Based Analysis: Ranks document sections based on their relevance to a user's persona and task.
Extractive Summarization: Provides concise summaries of the most important sections.
Dockerized Solution: The application can optionally be built and run as a Docker container.
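As a rough illustration of extractive summarization, the sketch below keeps the sentences whose words occur most often in the text. It is a simplified, frequency-based stand-in and is not taken from src/summarizer.py; the function name `summarize` and its `max_sentences` parameter are assumptions made for this example.

```python
import re
from collections import Counter


def summarize(text, max_sentences=2):
    """Return the highest-scoring sentences, kept in their original order.

    Hypothetical sketch: a sentence's score is the average frequency of its
    words across the whole text.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(re.findall(r"[a-z0-9]+", text.lower()))

    def score(sentence):
        tokens = re.findall(r"[a-z0-9]+", sentence.lower())
        # Average rather than sum, so long sentences are not favoured automatically.
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    # Pick the best sentence indices, then restore document order.
    best = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(best[:max_sentences])
    return " ".join(sentences[i] for i in keep)


if __name__ == "__main__":
    print(summarize("Cats sleep. Cats sleep a lot and cats purr. Dogs bark.", 1))
```

Because the output is built only from sentences already present in the input, the summary never introduces wording that is absent from the source PDFs.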

Project Structure

.
├── Dockerfile
├── README.md
├── approach_explanation.md
├── requirements.txt
├── src/
│   ├── main.py
│   ├── pdf_parser.py
│   ├── ranking.py
│   └── summarizer.py
└── Challenge_1b/
    └── [Collection Name]/
        ├── PDFs/
        │   ├── document1.pdf
        │   └── ...
        └── challenge1b_input.json
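The layout above implies that a collection is any folder under Challenge_1b that holds a PDFs/ subdirectory and a challenge1b_input.json file. A minimal sketch of how such folders could be discovered follows; the helper name `find_collections` is an assumption, and src/main.py may locate collections differently.

```python
from pathlib import Path


def find_collections(root="Challenge_1b"):
    """Return collection folders that contain both PDFs/ and the input JSON.

    Hypothetical helper: it only mirrors the directory layout shown in
    this README and does not open or parse any file.
    """
    root_path = Path(root)
    if not root_path.is_dir():
        return []
    return sorted(
        child
        for child in root_path.iterdir()
        if (child / "PDFs").is_dir()
        and (child / "challenge1b_input.json").is_file()
    )


if __name__ == "__main__":
    for collection in find_collections():
        print(collection)
```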

⚙️ How to Run the Solution

You can run the solution with or without Docker.

✅ Recommended: Run Locally Without Docker (CPU + Offline Compatible)

This solution is fully functional on CPU and does not require an internet connection once dependencies are installed.

Install Dependencies (Manually):

Important: Installing spaCy inside Docker can be time-consuming. To avoid long build times, we recommend installing all packages directly on your system before running the solution.

Run the following command in the root directory:
```
pip install -r requirements.txt
```
Download Required spaCy Model (only once):

If you're online during setup, run:
```
python -m spacy download en_core_web_sm
```
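💡 Skip this step if the model is already installed. The download needs an internet connection, so when running offline the model must already be present from an earlier setup.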
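If you are unsure whether the model is already installed, a quick check like the following can help. It relies on the fact that spaCy models are installed as regular Python packages, so the standard library can look them up by name without loading spaCy itself. The helper name `model_installed` is an assumption for this sketch.

```python
import importlib.util


def model_installed(name="en_core_web_sm"):
    """Return True if a top-level package with this name can be found.

    This does not load the model; it only checks that Python can locate
    a package called `name` in the current environment.
    """
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    if model_installed():
        print("en_core_web_sm is installed; the download step can be skipped.")
    else:
        print("en_core_web_sm was not found; run the download command while online.")
```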
Prepare Input Data:
- Inside the Challenge_1b directory, create a new folder for your collection.
- Add a PDFs/ subdirectory with all your input PDFs.
- Add a challenge1b_input.json file at the root of that collection folder (use examples as reference).
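Before running the application, it can be useful to confirm that a collection folder matches the steps above. The sketch below checks only the layout and that the input file parses as JSON; it makes no assumption about the fields inside challenge1b_input.json. The function name `check_collection` and the example path are hypothetical.

```python
import json
from pathlib import Path


def check_collection(collection_dir):
    """Return a list of problems; an empty list means the layout looks fine."""
    collection = Path(collection_dir)
    problems = []

    pdf_dir = collection / "PDFs"
    if not pdf_dir.is_dir():
        problems.append("missing PDFs/ subdirectory")
    elif not any(p.suffix.lower() == ".pdf" for p in pdf_dir.iterdir()):
        problems.append("PDFs/ contains no .pdf files")

    input_file = collection / "challenge1b_input.json"
    if not input_file.is_file():
        problems.append("missing challenge1b_input.json")
    else:
        try:
            # Only checks that the file is well-formed JSON, not its schema.
            json.loads(input_file.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            problems.append(f"challenge1b_input.json is not valid JSON: {exc}")

    return problems


if __name__ == "__main__":
    # "My Collection" is a placeholder folder name.
    for problem in check_collection("Challenge_1b/My Collection"):
        print("Problem:", problem)
```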
Run the Application:

From the root project directory:
```
python src/main.py
```
This will process all collections under the Challenge_1b directory and generate corresponding challenge1b_output.json files inside each collection folder.

🐳 Optional: Run with Docker (if needed)

Build Docker Image:

⚠️ Note: Building the Docker image may take a while because of the spaCy installation.
```
docker build --platform linux/amd64 -t document-intelligence .
```
Run the Docker Container (the command below uses Windows cmd.exe syntax; on Linux or macOS, replace "%cd%" with "$(pwd)"):
```
docker run --rm -v "%cd%/Challenge_1b":/app/Challenge_1b document-intelligence
```
This will process all collections and write each challenge1b_output.json file into its collection folder inside the mounted Challenge_1b directory.

📄 For More Details

Refer to approach_explanation.md for a full explanation of the methodology, design, and component structure.
