Sci-Trace: Autonomous Scientific Lineage Mapper


Search finds keywords. Sci-Trace finds foundations.

Sci-Trace is an autonomous research assistant that lives in the cloud and is reachable at any moment through Discord or Slack. Beyond general scientific dialogue, it can take a scientific concept and trace its intellectual ancestry, recursively navigating the citation graph to surface the foundational papers a modern work is built on. What would otherwise take hours of manual literature review takes minutes.

The system pairs OpenClaw, an autonomous AI agent, with a persistent Host server, both running on an AWS EC2 instance. OpenClaw handles general research queries directly and, when it detects the intent to trace a concept's lineage, invokes a specialized tool to trigger the research process. This tool sends a request to the Host server to spawn a LangGraph agent (the Python Kernel) that fetches papers via the Semantic Scholar API, uses LLM reasoning to evaluate methodological significance at each step, and recursively walks the citation graph until it identifies a foundational root. Results and real-time progress are automatically streamed back to the originating Discord or Slack channel.

Key features:

  • Natural language interaction via Discord and Slack, with autonomous intent detection
  • Recursive citation graph traversal using the Semantic Scholar API
  • LLM-powered evaluation of methodological significance at each step (chain-of-thought, parallel batching)
  • Outputs a citation DAG image and a narrative lineage summary per trace
  • Slash command and natural language entry points; both converge on the same specialized LangGraph agent tool
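The LLM-powered evaluation with parallel batching might be sketched as below. This is a minimal sketch: `evaluate_candidate`, its trivial citation-count stand-in, and the concurrency cap are illustrative assumptions, not the project's actual code.

```python
import asyncio

MAX_PARALLEL = 5  # illustrative cap, matching the demo configuration below


async def evaluate_candidate(paper: dict) -> bool:
    """Placeholder for an LLM call judging methodological significance.

    The real kernel sends a chain-of-thought prompt and parses a structured
    verdict; this stand-in just thresholds on citation count.
    """
    await asyncio.sleep(0)  # simulate an awaitable API call
    return paper.get("citationCount", 0) > 100


async def evaluate_batch(papers: list) -> list:
    """Evaluate candidates concurrently, at most MAX_PARALLEL at a time."""
    sem = asyncio.Semaphore(MAX_PARALLEL)

    async def bounded(paper):
        async with sem:
            return paper, await evaluate_candidate(paper)

    results = await asyncio.gather(*(bounded(p) for p in papers))
    return [paper for paper, significant in results if significant]


papers = [
    {"title": "A", "citationCount": 500},
    {"title": "B", "citationCount": 3},
]
significant = asyncio.run(evaluate_batch(papers))
# → only paper "A" survives the significance filter
```

The semaphore is one common way to bound fan-out; the real kernel's batching strategy may differ.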

Simplified request flow:

User (Discord / Slack)
        │
        ▼
   OpenClaw Agent  ──── intent analysis ────► direct response
        │
        │ lineage trace detected
        ▼
   Host Server  (Node.js, persistent)
        │
        │ spawns
        ▼
   Python Kernel  (transient)
        │
        ├── Semantic Scholar API  (paper fetch)
        ├── LLM eval  (methodological significance, per candidate)
        ├── LangGraph state machine  (recursive graph traversal)
        ├── Narrative synthesis
        └── DAG image rendering
        │
        ▼
   Host Server  ──► Discord / Slack
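The paper-fetch step above talks to the public Semantic Scholar Graph API. One hop of the citation walk can be sketched as a request builder; the `references_request` helper and its default field selection are illustrative assumptions, though the `/graph/v1/paper/{id}/references` endpoint itself is the documented public API.

```python
SS_BASE = "https://api.semanticscholar.org/graph/v1"


def references_request(paper_id, fields=("title", "year", "citationCount"), limit=100):
    """Build the URL and query params for one hop of the citation walk."""
    url = f"{SS_BASE}/paper/{paper_id}/references"
    params = {"fields": ",".join(fields), "limit": limit}
    return url, params


url, params = references_request("arXiv:1706.03762")
# Fetch with e.g.:
#   requests.get(url, params=params, headers={"x-api-key": API_KEY})
```

Passing the API key via the `x-api-key` header raises the rate limit for authenticated callers.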

For a full technical deep-dive, see the Sci-Trace DeepWiki.

Demos

Traces in these demos were capped at 5 levels of depth and 5 parallel API queries at a time. Both are configurable.
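Those two caps could be modeled as a small configuration object. This is a sketch only; `TraceConfig` and its field names are assumptions, not the project's actual settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceConfig:
    max_depth: int = 5     # how many citation levels to descend
    max_parallel: int = 5  # concurrent Semantic Scholar / LLM calls


cfg = TraceConfig()  # or TraceConfig(max_depth=10) for deeper traces
```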

Agentic Discovery

Autonomous intent analysis and scholarly reasoning via natural language mentions.

demo-agent.mp4
Deterministic Mapping

Instantaneous research generation via structured /trace commands.

demo-trace.mp4

Sci-Trace automates the research tracing lifecycle: recursive graph traversal, LLM-powered methodological validation, and high-fidelity visual synthesis.

System Architecture: The Host-OpenClaw-Kernel Pattern

To ensure stability and responsiveness, Sci-Trace uses a decoupled, multi-layered architecture:

  • The Body (Host): A persistent Node.js daemon that manages UI abstraction for Discord and Slack, session state, and the orchestration of background research tasks.
  • The Persona (OpenClaw): A conversational agent acting as a Senior Research Fellow (formal, scholarly, and witty) that plans and reasons over user requests and triggers research tasks.
  • The Brain (Kernel): A transient Python process powered by LangGraph and Pydantic AI. It handles the heavy-duty logic of fetching data from the Semantic Scholar API and reasoning over citation significance.



Three-layer architecture: The Host (Node.js body) routes slash commands directly, OpenClaw (Agent) autonomously interprets natural language and decides whether to trigger traces or respond directly, and the Kernel (Python brain) executes research tasks while querying external LLM and paper APIs.

Full Request Lifecycle

The following sequence illustrates the autonomous handoff between the persistent chat interfaces and the ephemeral research kernel.



Requests flow through two paths: slash commands route directly to the Host bridge, while natural language messages flow through OpenClaw for intent analysis. Both paths converge at the research kernel, which reports progress via tagged stdout and returns artifacts.
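The tagged-stdout protocol might look like the following parser on the Host side. It is shown in Python for illustration; the actual Host is Node.js, and the `[PROGRESS]`/`[RESULT]` tag names are assumptions, not the project's real wire format.

```python
import json


def parse_kernel_line(line):
    """Classify one line of kernel stdout by its leading tag (tags assumed)."""
    line = line.strip()
    if line.startswith("[PROGRESS]"):
        # Free-form progress text, relayed to the originating channel
        return ("progress", line[len("[PROGRESS]"):].strip())
    if line.startswith("[RESULT]"):
        # Final artifact payload, serialized as JSON
        return ("result", json.loads(line[len("[RESULT]"):]))
    return ("log", line)  # anything untagged goes to the log file


kind, payload = parse_kernel_line('[RESULT] {"root": "Attention Is All You Need"}')
# → kind == "result", payload["root"] == "Attention Is All You Need"
```

Tagging stdout this way lets the persistent Host stream progress live while the transient kernel stays a plain subprocess.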

Kernel Logic: LangGraph State Machine

The research kernel operates as a cyclic state machine, allowing it to recursively traverse the citation graph until it identifies a foundational root.



LangGraph state machine: Recursively searches for papers, filters references, evaluates candidates via Pydantic AI for methodological significance, and continues until a foundational root is identified. Finally, it synthesizes a narrative summary and renders the visual citation graph.
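Stripped of the LangGraph machinery, the cycle reduces to a recursive search-filter-evaluate loop, sketched here in plain Python. All names and the citation-count ranking are illustrative assumptions; the real kernel expresses each step as a LangGraph node over shared state.

```python
def trace_lineage(paper, fetch_refs, is_significant, depth=0, max_depth=5):
    """Walk the citation graph until no significant ancestor remains.

    fetch_refs(paper)      -> list of papers it cites (Semantic Scholar hop)
    is_significant(paper)  -> bool (the LLM verdict in the real kernel)
    """
    if depth >= max_depth:
        return paper  # depth cap reached: treat the current node as the root
    candidates = [p for p in fetch_refs(paper) if is_significant(p)]
    if not candidates:
        return paper  # foundational root: nothing significant cited below it
    # Follow the strongest candidate (the real kernel uses LLM-ranked choice)
    best = max(candidates, key=lambda p: p.get("citationCount", 0))
    return trace_lineage(best, fetch_refs, is_significant, depth + 1, max_depth)


# Tiny stub citation graph for demonstration
graph = {
    "Modern Work": [{"title": "Transformer", "citationCount": 90000}],
    "Transformer": [],
}
root = trace_lineage(
    {"title": "Modern Work"},
    fetch_refs=lambda p: graph.get(p["title"], []),
    is_significant=lambda p: True,
)
# → root["title"] == "Transformer"
```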

Setup & Installation

1. Prerequisites

  • Node.js 20+ / Python 3.11+
  • uv (Python package manager)
  • AWS Account (for infrastructure)

2. Environment Configuration

Create a .env file in the root directory:

# --- Host (Discord & Slack) ---
DISCORD_TOKEN=...
DISCORD_CLIENT_ID=...
SLACK_BOT_TOKEN=...
SLACK_SIGNING_SECRET=...

# --- Kernel (LLM & Data) ---
OPENROUTER_API_KEY=...
SEMANTIC_SCHOLAR_API_KEY=...

3. Installation

make install

4. Running the Trace

Once the bot is running (npm start), use the slash command:

/trace topic: "Attention Is All You Need"

Or mention the bot:

@Research Assistant where did BERT come from?


Cloud Infrastructure & Deployment

Sci-Trace is designed for high availability and autonomous operation in the cloud. It includes a complete Infrastructure as Code (IaC) suite for automated provisioning on AWS.

Provisioning (Terraform)

Terraform configurations are located in infra/. They provision:

  • Provider: AWS
  • Instance: t3.medium running Ubuntu 22.04 LTS
  • Bootstrap: user_data.sh installs Node.js 20, Python 3.11, uv, and PM2 on first boot

Deployment

./deploy.sh <EC2_PUBLIC_IP> <PEM_KEY_PATH>

Uses rsync to synchronize the codebase (excluding local environments) and performs remote setup for both the Kernel and the Host.

Process Management (PM2)

The Host daemon is managed by PM2, configured via ecosystem.config.js. Logs are written to host/logs/app.log. The process restarts automatically on crash or server reboot.

About

A conversational, OpenClaw-based autonomous research assistant with a specialized LangGraph subagent tool that traces any concept's intellectual ancestry through Semantic Scholar literature research.
