MAMA Search is a multi-threaded search engine built in Java that efficiently indexes and searches documents using MongoDB. It supports concurrent queries and highlights practical applications of parallelism and NoSQL data handling.

AliAlaa88/MAMA-Search

 
 

MAMA Search

MAMA Search is a Java‑based search engine that demonstrates the core functionalities of a modern search platform — including web crawling, indexing, ranking, and user query processing. It features a modular design, multithreading, and a React‑powered web interface.

Features

🌐 Web Crawler

  • Multithreaded crawler with configurable thread count
  • Respects robots.txt and skips duplicates or non‑HTML pages
  • URL normalization and persistent crawl state
  • Seed‑based queue system — crawled up to 6 000 pages
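
As an illustration of the URL normalization and duplicate skipping listed above, here is a minimal sketch. The class `UrlFrontier` and its methods are hypothetical names chosen for this example and are not taken from the project's source; the normalization rules (lowercasing scheme and host, dropping default ports and fragments) are one reasonable choice, not necessarily the ones the crawler applies.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical helper: canonicalizes URLs and remembers which ones were already seen. */
final class UrlFrontier {
    // Concurrent set so several crawler threads can share one frontier.
    private final Set<String> seen = ConcurrentHashMap.newKeySet();

    /** Returns a canonical form of the URL, or null if it is malformed or not HTTP(S). */
    static String normalize(String raw) {
        try {
            URI uri = new URI(raw.trim()).normalize(); // collapses "." and ".." path segments
            String scheme = uri.getScheme();
            String host = uri.getHost();
            if (scheme == null || host == null) return null;
            scheme = scheme.toLowerCase(Locale.ROOT);
            if (!scheme.equals("http") && !scheme.equals("https")) return null;
            host = host.toLowerCase(Locale.ROOT);

            int port = uri.getPort();
            boolean defaultPort = port == -1
                    || (scheme.equals("http") && port == 80)
                    || (scheme.equals("https") && port == 443);

            String path = uri.getRawPath();
            if (path == null || path.isEmpty()) path = "/";

            StringBuilder sb = new StringBuilder(scheme).append("://").append(host);
            if (!defaultPort) sb.append(':').append(port);
            sb.append(path);
            if (uri.getRawQuery() != null) sb.append('?').append(uri.getRawQuery());
            return sb.toString(); // the fragment is intentionally dropped
        } catch (URISyntaxException e) {
            return null;
        }
    }

    /** Returns true only the first time a given (normalized) URL is offered. */
    boolean markIfNew(String raw) {
        String normalized = normalize(raw);
        return normalized != null && seen.add(normalized);
    }
}
```

A crawler thread would call `markIfNew(url)` before enqueueing a link, so that two spellings of the same address are fetched only once.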

🧠 Indexer

  • Parses HTML content and stores data with positional and tag‑based metadata
  • Custom schema that supports fast incremental updates
  • Supports retrieval by word, stemmed forms, and tags (title, headers, body)
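
To make the idea of positional and tag-based metadata concrete, the sketch below keeps an in-memory inverted index in which every posting records the document, the tag the word appeared in, and its position. `TinyIndex`, `Posting`, and `Tag` are hypothetical names for this example; the actual project persists its index in MongoDB with its own schema, which is not reproduced here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical in-memory inverted index with tag and position metadata. */
final class TinyIndex {
    enum Tag { TITLE, HEADER, BODY }

    /** One occurrence of a word: which document, which tag, which position in that tag's text. */
    static final class Posting {
        final String docId;
        final Tag tag;
        final int position;

        Posting(String docId, Tag tag, int position) {
            this.docId = docId;
            this.tag = tag;
            this.position = position;
        }
    }

    private final Map<String, List<Posting>> postings = new HashMap<>();

    /** Tokenizes the text of one tag and records a posting per token (positions restart per call). */
    void add(String docId, Tag tag, String text) {
        int position = 0;
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (token.isEmpty()) continue; // split can yield a leading empty token
            postings.computeIfAbsent(token, k -> new ArrayList<>())
                    .add(new Posting(docId, tag, position++));
        }
    }

    /** All occurrences of a word, in insertion order. */
    List<Posting> lookup(String word) {
        return postings.getOrDefault(word.toLowerCase(Locale.ROOT), Collections.emptyList());
    }

    /** Occurrences of a word restricted to one tag, e.g. only titles. */
    List<Posting> lookup(String word, Tag tag) {
        List<Posting> out = new ArrayList<>();
        for (Posting p : lookup(word)) {
            if (p.tag == tag) out.add(p);
        }
        return out;
    }
}
```

Storing the tag with each posting is what lets a ranker weight a title hit differently from a body hit, and the positions are what make phrase matching possible.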

🔎 Query Processor & Phrase Search

  • Stemming support (e.g., a query for “travel” also matches “traveling” and “traveler”)

  • Phrase search using quotes (e.g., "machine learning")

  • Boolean logic (up to two operations per query):

    • "football player" OR "tennis player"
    • "machine learning" AND "AI"
    • "deep learning" NOT "CNN"
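
The phrase and Boolean queries above can be sketched with a positional index: a phrase matches a document when its words occur at consecutive positions, and AND, OR, and NOT become set operations on the matching document IDs. The class `PhraseSearch` and its method names are assumptions made for this example, and stemming is left out to keep it short.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical phrase matcher over a positional index (no stemming). */
final class PhraseSearch {
    // term -> (docId -> positions of the term in that document)
    private final Map<String, Map<String, List<Integer>>> index = new HashMap<>();

    private static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    void addDocument(String docId, String text) {
        List<String> tokens = tokenize(text);
        for (int pos = 0; pos < tokens.size(); pos++) {
            index.computeIfAbsent(tokens.get(pos), k -> new HashMap<>())
                 .computeIfAbsent(docId, k -> new ArrayList<>())
                 .add(pos);
        }
    }

    /** Documents in which the words of the phrase appear consecutively and in order. */
    Set<String> phrase(String phrase) {
        List<String> words = tokenize(phrase);
        Set<String> hits = new TreeSet<>();
        if (words.isEmpty()) return hits;
        Map<String, List<Integer>> first = index.getOrDefault(words.get(0), Collections.emptyMap());
        for (Map.Entry<String, List<Integer>> e : first.entrySet()) {
            for (int start : e.getValue()) {
                if (matchesAt(e.getKey(), words, start)) {
                    hits.add(e.getKey());
                    break;
                }
            }
        }
        return hits;
    }

    // Checks that word i of the phrase sits at position start + i in the document.
    private boolean matchesAt(String docId, List<String> words, int start) {
        for (int i = 1; i < words.size(); i++) {
            List<Integer> positions =
                    index.getOrDefault(words.get(i), Collections.emptyMap()).get(docId);
            if (positions == null || !positions.contains(start + i)) return false;
        }
        return true;
    }

    static Set<String> and(Set<String> a, Set<String> b) {
        Set<String> r = new TreeSet<>(a); r.retainAll(b); return r;
    }

    static Set<String> or(Set<String> a, Set<String> b) {
        Set<String> r = new TreeSet<>(a); r.addAll(b); return r;
    }

    static Set<String> not(Set<String> a, Set<String> b) {
        Set<String> r = new TreeSet<>(a); r.removeAll(b); return r;
    }
}
```

With this sketch, `"deep learning" NOT "CNN"` corresponds to `PhraseSearch.not(s.phrase("deep learning"), s.phrase("CNN"))`.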

📊 Ranker

  • TF‑IDF for relevance scoring
  • PageRank for measuring page importance
  • Hybrid ranking: relevance × popularity
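
A rough sketch of how the three pieces could fit together is shown below. It uses one common TF-IDF variant (term frequency normalized by document length, natural-log IDF) and a textbook power-iteration PageRank; the project's exact formulas, damping factor, and weighting are not stated here, so `HybridRanker` and its parameters are assumptions for illustration only.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical scoring helpers: TF-IDF relevance, PageRank popularity, and their product. */
final class HybridRanker {

    /** TF-IDF as (termCount / docLength) * ln(totalDocs / docsWithTerm). */
    static double tfIdf(int termCount, int docLength, int totalDocs, int docsWithTerm) {
        if (termCount == 0 || docLength == 0 || docsWithTerm == 0) return 0.0;
        double tf = (double) termCount / docLength;
        double idf = Math.log((double) totalDocs / docsWithTerm);
        return tf * idf;
    }

    /**
     * Power-iteration PageRank. links.get(i) lists the pages that page i links to.
     * Rank held by pages without outgoing links is spread evenly over all pages.
     */
    static double[] pageRank(List<List<Integer>> links, double damping, int iterations) {
        int n = links.size();
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n);
        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            double dangling = 0.0;
            for (int i = 0; i < n; i++) {
                List<Integer> out = links.get(i);
                if (out.isEmpty()) {
                    dangling += rank[i];
                    continue;
                }
                double share = rank[i] / out.size();
                for (int target : out) next[target] += share;
            }
            for (int i = 0; i < n; i++) {
                next[i] = (1 - damping) / n + damping * (next[i] + dangling / n);
            }
            rank = next;
        }
        return rank;
    }

    /** Hybrid score as described above: relevance multiplied by popularity. */
    static double hybrid(double relevance, double popularity) {
        return relevance * popularity;
    }
}
```

Multiplying the two scores means a page needs both a textual match and some link popularity to rank highly; a weighted sum would be an alternative design with different trade-offs.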

⚡ Performance & Caching

  • Searches our 6 000‑document index in < 0.2 s per query
  • Result caching for instant responses on repeated queries
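
Result caching for repeated queries can be sketched with a small least-recently-used cache built on `LinkedHashMap`. The class `QueryCache`, its capacity parameter, and the LRU eviction policy are assumptions for this example; the project's actual cache may be organized differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical LRU cache keyed by the trimmed query string. */
final class QueryCache<V> {
    private final Map<String, V> cache;

    QueryCache(final int capacity) {
        // accessOrder = true keeps the most recently used entry at the end of the map.
        this.cache = new LinkedHashMap<String, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
                return size() > capacity; // evict the least recently used entry
            }
        };
    }

    /**
     * Returns the cached result for the query, or runs the search and stores its result.
     * Synchronized for simplicity; this also serializes concurrent searches.
     */
    synchronized V get(String query, Function<String, V> search) {
        String key = query.trim();
        V cached = cache.get(key);
        if (cached != null) return cached;
        V fresh = search.apply(key);
        cache.put(key, fresh);
        return fresh;
    }
}
```

A caller would wrap the real search as `cache.get(userQuery, q -> runSearch(q))`, where `runSearch` stands in for whatever method executes the query.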

💻 Web Interface

  • React‑powered frontend with real‑time auto‑suggestions

  • Displays:

    • Page title
    • URL
    • Snippets with highlighted terms
  • Shows search time and paginates results
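
One way to produce a snippet with highlighted terms is to cut a window of text around the first matching term and wrap every match in that window in a tag. The sketch below assumes this is done on the Java side and that `<b>` is the highlight markup; both are assumptions, since the highlighting could equally be done in the React frontend, and `Snippets` is a hypothetical name.

```java
import java.util.Collection;
import java.util.StringJoiner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical snippet builder: a text window around the first hit, with matches in <b> tags. */
final class Snippets {

    static String snippet(String text, Collection<String> terms, int radius) {
        if (terms.isEmpty()) {
            return text.substring(0, Math.min(text.length(), 2 * radius));
        }
        // Build a case-insensitive alternation of the (quoted) query terms.
        StringJoiner alternatives = new StringJoiner("|");
        for (String term : terms) alternatives.add(Pattern.quote(term));
        Pattern pattern = Pattern.compile("\\b(" + alternatives + ")\\b", Pattern.CASE_INSENSITIVE);

        // Center the window on the first match, or on the start of the text if there is none.
        Matcher matcher = pattern.matcher(text);
        int center = matcher.find() ? matcher.start() : 0;
        int from = Math.max(0, center - radius);
        int to = Math.min(text.length(), center + radius);

        // Highlight every match inside the window; the window may cut words at its edges.
        return pattern.matcher(text.substring(from, to)).replaceAll("<b>$1</b>");
    }
}
```

If the snippet is rendered as HTML, the surrounding text should be escaped before the tags are inserted; that step is omitted here.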


Technologies Used

  • Java: Core logic (crawler, indexer, query processor, ranker)
  • MongoDB: Index persistence & suggestion store
  • React: Frontend UI (create‑react‑app)
  • HTML/CSS/JS: Styling & interaction
  • Git & GitHub: Version control
  • Agile: Iterative development process

Setup & Run

1. Clone the repo

git clone https://github.com/galelo04/MAMA-Search.git
cd MAMA-Search

2. Install dependencies

  • Frontend:

    cd frontend
    npm install
  • Backend:

    mvn install

3. Run the application

🚀 Frontend

cd frontend
npm run dev

Your React app will spin up at http://localhost:3000

⚙️ Backend API

Run ServerAPI.java, located at src/main/java/ServerAPI.java, in one of two ways:

  • IDE: Open the project and run ServerAPI as a Java application

  • Command Line (with Maven):

    mvn exec:java -Dexec.mainClass="ServerAPI"

The API will start on http://localhost:8080 by default.


Enjoy blazing‑fast search — and happy coding! 🚀

