🔍 HTML Search SPA

A full-stack semantic search tool for crawling, chunking, and vectorizing HTML content from any public website URL.

🎥 Demo

🌐 Features

Accepts any public website URL.
Parses the page's HTML and splits its text into chunks.
Indexes the chunks in a Weaviate vector database.
Supports semantic search across the indexed website content.
Shows a match percentage, a context preview, and the raw HTML for each result.
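The parse-and-chunk step can be sketched with Python's standard-library `html.parser`. The sketch collects the visible text, skipping `<script>` and `<style>` contents, and then splits that text into overlapping word windows. It is illustrative only and is not the code in backend/utils/. The name `chunk_html`, the 50-word window and the 10-word overlap are all assumptions.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, ignoring the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so a plain set lookup is enough.
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def chunk_html(html: str, max_words: int = 50, overlap: int = 10) -> list[str]:
    """Hypothetical helper: split a page's visible text into overlapping word chunks."""
    if overlap < 0 or overlap >= max_words:
        raise ValueError("overlap must be >= 0 and smaller than max_words")
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    words = " ".join(parser.parts).split()
    step = max_words - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    page = "<p>one two three</p><script>ignored()</script><p>four five</p>"
    print(chunk_html(page, max_words=3, overlap=1))
```

The overlap keeps a sentence that straddles a chunk boundary searchable from either side. The cost is a small amount of duplicated text in the index.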

🛠️ Tech Stack

Frontend: Next.js
Backend: Python (Flask)
Vector DB: Weaviate
NLP Embedding: HuggingFace SentenceTransformer (all-MiniLM-L6-v2)
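Semantic search compares the query's embedding with each chunk's embedding. all-MiniLM-L6-v2 produces 384-dimensional vectors, and cosine similarity is the usual comparison. The sketch below is a brute-force stand-in for what Weaviate does server-side. It uses toy 2-D vectors in place of real model output. The function names and the similarity-to-percentage mapping are assumptions and are not taken from this repo. If the backend uses Weaviate's default cosine metric, a result's reported `distance` equals 1 minus the cosine similarity.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_percentage(similarity: float) -> float:
    # Assumed mapping: clamp to [0, 1] and scale to a percentage with one decimal.
    return round(min(1.0, max(0.0, similarity)) * 100, 1)


def rank_chunks(query_vec: list[float], chunk_vecs: list[list[float]], top_k: int = 3):
    """Return (chunk_index, match_percentage) pairs for the top_k most similar chunks."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [(i, match_percentage(s)) for i, s in scored[:top_k]]


if __name__ == "__main__":
    # Toy vectors; real all-MiniLM-L6-v2 embeddings have 384 dimensions.
    print(rank_chunks([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], top_k=2))
```

A vector database replaces this linear scan with an approximate nearest-neighbour index, so queries stay fast as the number of chunks grows.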

🚀 Running the Project

Backend

cd backend
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python app.py

Frontend (in a second terminal, starting from the repository root)

cd frontend
npm install
npm run dev

Vector DB (from the repository root; docker-compose.yml lives in weaviate/)

cd weaviate
docker-compose up -d
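You can confirm the container is ready before starting the backend by polling Weaviate's `/v1/.well-known/ready` endpoint. This minimal sketch uses only the standard library. It assumes the compose file exposes Weaviate on its default port 8080 on localhost; adjust `base_url` if weaviate/docker-compose.yml maps a different port. The helper name `weaviate_ready` is hypothetical and is not part of this repo.

```python
import urllib.request


def weaviate_ready(base_url: str = "http://localhost:8080", timeout: float = 2.0) -> bool:
    """Return True if Weaviate's readiness endpoint answers with HTTP 200."""
    url = base_url.rstrip("/") + "/v1/.well-known/ready"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError (e.g. connection refused), HTTPError (e.g. 503 while starting)
        # and socket timeouts are all OSError subclasses: treat them as "not ready".
        return False


if __name__ == "__main__":
    print("Weaviate ready:", weaviate_ready())
```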

📁 Folder Structure

html-search-spa/
├── backend/
│   ├── app.py
│   ├── utils/
│   └── requirements.txt
├── demo/
├── frontend/
│   ├── public/
│   ├── src/
│   │   ├── App.jsx
│   │   └── components/
│   ├── tailwind.config.js
│   └── package.json
├── weaviate/
│   └── docker-compose.yml
├── README.md
└── .gitignore

To Push to GitHub

git init
git add .
git commit -m "Initial commit: HTML Search SPA"
git branch -M main
git remote add origin https://github.com/your-username/your-repo.git
git push -u origin main
