Sci-Trace: Autonomous Scientific Lineage Mapper


Search finds keywords. Sci-Trace finds foundations.

Sci-Trace is an autonomous research assistant that lives in the cloud and is reachable at any moment through Discord or Slack. Beyond general scientific dialogue, it can take a scientific concept and trace its intellectual ancestry, recursively navigating the citation graph to surface the foundational papers a modern work is built on. What would otherwise take hours of manual literature review takes minutes.

The system pairs OpenClaw, an autonomous AI agent, with a persistent Host server, both running on an AWS EC2 instance. OpenClaw handles general research queries directly and, when it detects the intent to trace a concept's lineage, invokes a specialized tool to trigger the research process. This tool sends a request to the Host server to spawn a LangGraph agent (the Python Kernel) that fetches papers via the Semantic Scholar API, uses LLM reasoning to evaluate methodological significance at each step, and recursively walks the citation graph until it identifies a foundational root. Results and real-time progress are automatically streamed back to the originating Discord or Slack channel.

Key features:

  • Natural language interaction via Discord and Slack, with autonomous intent detection
  • Recursive citation graph traversal using the Semantic Scholar API
  • LLM-powered evaluation of methodological significance at each step (chain-of-thought, parallel batching)
  • Outputs a citation DAG image and a narrative lineage summary per trace
  • Slash command and natural language entry points; both converge on the same specialized LangGraph agent tool
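The LLM-powered evaluation with parallel batching might be sketched as below. This is a minimal sketch: `evaluate_candidate`, its trivial citation-count stand-in, and the concurrency cap are illustrative assumptions, not the project's actual code.

```python
import asyncio

MAX_PARALLEL = 5  # illustrative cap, matching the demo configuration below


async def evaluate_candidate(paper: dict) -> bool:
    """Placeholder for an LLM call judging methodological significance.

    The real kernel sends a chain-of-thought prompt and parses a structured
    verdict; this stand-in just thresholds on citation count.
    """
    await asyncio.sleep(0)  # simulate an awaitable API call
    return paper.get("citationCount", 0) > 100


async def evaluate_batch(papers: list) -> list:
    """Evaluate candidates concurrently, at most MAX_PARALLEL at a time."""
    sem = asyncio.Semaphore(MAX_PARALLEL)

    async def bounded(paper):
        async with sem:
            return paper, await evaluate_candidate(paper)

    results = await asyncio.gather(*(bounded(p) for p in papers))
    return [paper for paper, significant in results if significant]


papers = [
    {"title": "A", "citationCount": 500},
    {"title": "B", "citationCount": 3},
]
significant = asyncio.run(evaluate_batch(papers))
# → only paper "A" survives the significance filter
```

The semaphore is one common way to bound fan-out; the real kernel's batching strategy may differ.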

Simplified request flow:

User (Discord / Slack)
        │
        ▼
   OpenClaw Agent  ──── intent analysis ────► direct response
        │
        │ lineage trace detected
        ▼
   Host Server  (Node.js, persistent)
        │
        │ spawns
        ▼
   Python Kernel  (transient)
        │
        ├── Semantic Scholar API  (paper fetch)
        ├── LLM eval  (methodological significance, per candidate)
        ├── LangGraph state machine  (recursive graph traversal)
        ├── Narrative synthesis
        └── DAG image rendering
        │
        ▼
   Host Server  ──► Discord / Slack
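The paper-fetch step above talks to the public Semantic Scholar Graph API. One hop of the citation walk can be sketched as a request builder; the `references_request` helper and its default field selection are illustrative assumptions, though the `/graph/v1/paper/{id}/references` endpoint itself is the documented public API.

```python
SS_BASE = "https://api.semanticscholar.org/graph/v1"


def references_request(paper_id, fields=("title", "year", "citationCount"), limit=100):
    """Build the URL and query params for one hop of the citation walk."""
    url = f"{SS_BASE}/paper/{paper_id}/references"
    params = {"fields": ",".join(fields), "limit": limit}
    return url, params


url, params = references_request("arXiv:1706.03762")
# Fetch with e.g.:
#   requests.get(url, params=params, headers={"x-api-key": API_KEY})
```

Passing the API key via the `x-api-key` header raises the rate limit for authenticated callers.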

For a full technical deep-dive, see the Sci-Trace DeepWiki.

Demos

Traces in these demos were capped at 5 levels of depth and 5 parallel API queries at a time. Both are configurable.
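Those two caps could be modeled as a small configuration object. This is a sketch only; `TraceConfig` and its field names are assumptions, not the project's actual settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceConfig:
    max_depth: int = 5     # how many citation levels to descend
    max_parallel: int = 5  # concurrent Semantic Scholar / LLM calls


cfg = TraceConfig()  # or TraceConfig(max_depth=10) for deeper traces
```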

Agentic Discovery

Autonomous intent analysis and scholarly reasoning via natural language mentions.

demo-agent.mp4
Deterministic Mapping

Instantaneous research generation via structured /trace commands.

demo-trace.mp4

Sci-Trace automates the research tracing lifecycle: recursive graph traversal, LLM-powered methodological validation, and high-fidelity visual synthesis.

System Architecture: The Host-OpenClaw-Kernel Pattern

To ensure stability and responsiveness, Sci-Trace uses a decoupled, multi-layered architecture:

  • The Body (Host): A persistent Node.js daemon that manages UI abstraction for Discord and Slack, session state, and the orchestration of background research tasks.
  • The Persona (OpenClaw): A conversational agent acting as a Senior Research Fellow (formal, scholarly, and witty) that plans and reasons over user requests and triggers research tasks.
  • The Brain (Kernel): A transient Python process powered by LangGraph and Pydantic AI. It handles the heavy-duty logic of fetching data from the Semantic Scholar API and reasoning over citation significance.



Three-layer architecture: The Host (Node.js body) routes slash commands directly, OpenClaw (Agent) autonomously interprets natural language and decides whether to trigger traces or respond directly, and the Kernel (Python brain) executes research tasks while querying external LLM and paper APIs.

Full Request Lifecycle

The following sequence illustrates the autonomous handoff between the persistent chat interfaces and the ephemeral research kernel.



Requests flow through two paths: slash commands route directly to the Host bridge, while natural language messages flow through OpenClaw for intent analysis. Both paths converge at the research kernel, which reports progress via tagged stdout and returns artifacts.
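The tagged-stdout protocol might look like the following parser on the Host side. It is shown in Python for illustration; the actual Host is Node.js, and the `[PROGRESS]`/`[RESULT]` tag names are assumptions, not the project's real wire format.

```python
import json


def parse_kernel_line(line):
    """Classify one line of kernel stdout by its leading tag (tags assumed)."""
    line = line.strip()
    if line.startswith("[PROGRESS]"):
        # Free-form progress text, relayed to the originating channel
        return ("progress", line[len("[PROGRESS]"):].strip())
    if line.startswith("[RESULT]"):
        # Final artifact payload, serialized as JSON
        return ("result", json.loads(line[len("[RESULT]"):]))
    return ("log", line)  # anything untagged goes to the log file


kind, payload = parse_kernel_line('[RESULT] {"root": "Attention Is All You Need"}')
# → kind == "result", payload["root"] == "Attention Is All You Need"
```

Tagging stdout this way lets the persistent Host stream progress live while the transient kernel stays a plain subprocess.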

Kernel Logic: LangGraph State Machine

The research kernel operates as a cyclic state machine, allowing it to recursively traverse the citation graph until it identifies a foundational root.



LangGraph state machine: Recursively searches for papers, filters references, evaluates candidates via Pydantic AI for methodological significance, and continues until a foundational root is identified. Finally, it synthesizes a narrative summary and renders the visual citation graph.
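Stripped of the LangGraph machinery, the cycle reduces to a recursive search-filter-evaluate loop, sketched here in plain Python. All names and the citation-count ranking are illustrative assumptions; the real kernel expresses each step as a LangGraph node over shared state.

```python
def trace_lineage(paper, fetch_refs, is_significant, depth=0, max_depth=5):
    """Walk the citation graph until no significant ancestor remains.

    fetch_refs(paper)      -> list of papers it cites (Semantic Scholar hop)
    is_significant(paper)  -> bool (the LLM verdict in the real kernel)
    """
    if depth >= max_depth:
        return paper  # depth cap reached: treat the current node as the root
    candidates = [p for p in fetch_refs(paper) if is_significant(p)]
    if not candidates:
        return paper  # foundational root: nothing significant cited below it
    # Follow the strongest candidate (the real kernel uses LLM-ranked choice)
    best = max(candidates, key=lambda p: p.get("citationCount", 0))
    return trace_lineage(best, fetch_refs, is_significant, depth + 1, max_depth)


# Tiny stub citation graph for demonstration
graph = {
    "Modern Work": [{"title": "Transformer", "citationCount": 90000}],
    "Transformer": [],
}
root = trace_lineage(
    {"title": "Modern Work"},
    fetch_refs=lambda p: graph.get(p["title"], []),
    is_significant=lambda p: True,
)
# → root["title"] == "Transformer"
```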

Setup & Installation

1. Prerequisites

  • Node.js 20+ / Python 3.11+
  • uv (Python package manager)
  • AWS Account (for infrastructure)

2. Environment Configuration

Create a .env file in the root directory:

# --- Host (Discord & Slack) ---
DISCORD_TOKEN=...
DISCORD_CLIENT_ID=...
SLACK_BOT_TOKEN=...
SLACK_SIGNING_SECRET=...

# --- Kernel (LLM & Data) ---
OPENROUTER_API_KEY=...
SEMANTIC_SCHOLAR_API_KEY=...

3. Installation

make install

4. Running the Trace

Once the bot is running (npm start), use the slash command:

/trace topic: "Attention Is All You Need"

Or mention the bot:

@Research Assistant where did BERT come from?


Cloud Infrastructure & Deployment

Sci-Trace is designed for high availability and autonomous operation in the cloud. It includes a complete Infrastructure as Code (IaC) suite for automated provisioning on AWS.

Provisioning (Terraform)

Terraform configurations are located in infra/. They provision:

  • Provider: AWS
  • Instance: t3.medium running Ubuntu 22.04 LTS
  • Bootstrap: user_data.sh installs Node.js 20, Python 3.11, uv, and PM2 on first boot

Deployment

./deploy.sh <EC2_PUBLIC_IP> <PEM_KEY_PATH>

Uses rsync to synchronize the codebase (excluding local environments) and performs remote setup for both the Kernel and the Host.

Process Management (PM2)

The Host daemon is managed by PM2, configured via ecosystem.config.js. Logs are written to host/logs/app.log. The process restarts automatically on crash or server reboot.

About

A conversational, OpenClaw-based autonomous research assistant with a specialized LangGraph subagent tool that traces any concept's intellectual ancestry through Semantic Scholar literature research.
