
HalfLife: Temporal-Aware Reranking (Even Without Timestamps)

RAG systems don't just ignore time. They don't even know what time it is.

Most real-world data (scraped docs, blogs, PDFs) lacks consistent temporal metadata. Consequently, standard vector search prioritizes well-cited, high-density results from 2018 over more relevant breakthroughs from 2024.

HalfLife fixes this as a 1-line drop-in layer.


🧨 The "Authority Trap" (Fixed)

Ask any standard RAG pipeline: "What is the best way to manage state in React today?"

  • ❌ Baseline (Vector-Only): #1 → Redux (2018). It matches the query perfectly but is 8 years out of date.
  • ✅ With HalfLife: #1 → React 19 / Zustand (2024). HalfLife automatically detects the "today" intent and re-ranks based on temporal relevance.

Run the Live Demo (No Timestamps!) ↓


🧩 Works Even Without Timestamps

Most rerankers assume you've already extracted and cleaned your metadata. HalfLife handles the messy reality:

  1. Automated Inference: Assigns chronological confidence by extracting years directly from raw unstructured text.
  2. Intent Classification: Detects if your query is Fresh (latest is best), Historical (older is best), or Static.
  3. Temporal Fusion: Re-weights relevance scores using decay functions, even if your vector store is "timestamp-blind."
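
The inference step can be sketched with a simple regex pass over the raw text. The `infer_year` function below is a hypothetical illustration of the idea, not HalfLife's actual API:

```python
import re
from collections import Counter

def infer_year(text, min_year=1990, max_year=2030):
    """Guess a chunk's year from plausible 4-digit years in its raw text."""
    years = [int(y) for y in re.findall(r"\b(?:19|20)\d{2}\b", text)]
    years = [y for y in years if min_year <= y <= max_year]
    if not years:
        return None  # chunk stays "timestamp-blind"; decay can fall back to neutral
    # Most frequent plausible year wins (ties break toward first occurrence)
    return Counter(years).most_common(1)[0][0]

print(infer_year("LoRA (2021) is revisited in this 2024 survey of 2024 methods."))  # → 2024
```

A confidence score could then be derived from how many candidate years agree, which is what "chronological confidence" suggests.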

🧪 Try on your own data (5 mins)

Validate HalfLife on your existing RAG results without changing your code:

  1. Export chunks: Save some nodes/payloads to data.json.
  2. Run the test script:
    python3 scripts/try_on_your_data.py --query "Best LLM today" --file data.json
  3. Compare: see how the baseline's #1 result differs from HalfLife's #1 result.
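
The exact schema `scripts/try_on_your_data.py` expects is not documented here; the quickstart's rerank output suggests chunks carry at least a `text` field and, when available, a `timestamp`. A minimal assumed export looks like:

```python
import json

# Assumed minimal schema: a list of chunks with "text" and (optionally)
# "timestamp" -- check scripts/try_on_your_data.py for the exact fields.
chunks = [
    {"text": "Redux remains the canonical state container for React apps.",
     "timestamp": "2018-05-01"},
    {"text": "React 19 plus Zustand covers most state needs with less boilerplate.",
     "timestamp": "2024-06-10"},
]

with open("data.json", "w") as f:
    json.dump(chunks, f, indent=2)
```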

🚀 Get Started

  1. Install via PyPI:
    pip install halflife-rag
  2. Integrate (LlamaIndex):
    from halflife.integrations.llamaindex import HalfLifePostprocessor
    
    # Plug into any LlamaIndex query engine
    query_engine = index.as_query_engine(
        node_postprocessors=[HalfLifePostprocessor(top_n=3)]
    )

📊 Real-World Performance

Evaluated on 120 real Arxiv papers (Parameter-Efficient Fine-Tuning) from 2019–2024:

| Query intent | Baseline avg result age | HalfLife avg result age | Δ |
|---|---|---|---|
| Fresh ("current state-of-the-art...") | 3.9 yr | 2.4 yr | −1.5 yr |
| Static ("explain how...") | 4.3 yr | 4.0 yr | −0.3 yr (stable) |
| Historical ("original paper...") | 5.0 yr | 5.2 yr | +0.2 yr (verified) |

For fresh queries, HalfLife surfaces results 1.5 years more recent on average. For static queries, behavior is unchanged: HalfLife is invisible when it doesn't need to act. Historical inversion is marginal on this corpus (2019–2024); deeper corpora show stronger results.

Reproduce this yourself β€” the Colab notebook fetches live Arxiv data with no setup:

Open In Colab


How It Works

HalfLife sits between your vector retriever and your LLM. It re-scores each retrieved chunk using a weighted fusion of three signals:

final_score = α · vector_score + β · temporal_score + γ · trust_score

The weights α, β, γ are set dynamically based on query intent:

| Detected intent | Example keywords | α (vector) | β (temporal) | γ (trust) |
|---|---|---|---|---|
| Fresh | "latest", "current", "today", "SOTA" | 0.3 | 0.6 | 0.1 |
| Historical | "originally", "history of", "early" | 0.4 | 0.5 | 0.1 |
| Static | "explain", "what is", "how does" | 0.8 | 0.1 | 0.1 |

For historical queries, the temporal signal is inverted: older chunks score higher.
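
Putting the intent table and the inversion rule together, the fusion step can be sketched as follows. This is an illustration of the formula above, not HalfLife's internal code; the weight values are the ones from the table:

```python
# (alpha, beta, gamma) per detected intent, taken from the table above
WEIGHTS = {
    "fresh":      (0.3, 0.6, 0.1),
    "historical": (0.4, 0.5, 0.1),
    "static":     (0.8, 0.1, 0.1),
}

def fuse(intent, vector_score, temporal_score, trust_score):
    """Weighted fusion of the three signals; all scores assumed in [0, 1]."""
    alpha, beta, gamma = WEIGHTS[intent]
    if intent == "historical":
        temporal_score = 1.0 - temporal_score  # invert: older chunks score higher
    return alpha * vector_score + beta * temporal_score + gamma * trust_score

# A fresh-intent query rewards a recent chunk even with a weaker vector match:
old = fuse("fresh", vector_score=0.95, temporal_score=0.2, trust_score=0.8)  # 0.485
new = fuse("fresh", vector_score=0.80, temporal_score=0.9, trust_score=0.5)  # 0.830
```

Under static intent the same two chunks would rank the other way round, since α dominates.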

User Query
    ↓
QueryIntentClassifier  →  sets α, β, γ
    ↓
Vector Retrieval (Qdrant / any store)
    ↓
HalfLife Reranker
    ├── Temporal decay per chunk  (exponential / piecewise / learned MLP)
    ├── Redis metadata cache      (< 1ms overhead on cached chunks)
    └── Min-Max fusion
    ↓
Re-ranked chunks  →  LLM

Quickstart

No infrastructure (2 minutes)

pip install halflife-rag
from halflife import HalfLife

hl = HalfLife()

# Drop into your existing retrieval results
results = qdrant.search(query=query)
reranked = hl.rerank(query=query, chunks=results, top_k=5)

for chunk in reranked:
    print(f"[{chunk['final_score']:.3f}] ({chunk['timestamp'][:4]}) {chunk['text'][:80]}")

With Docker (full feature set including Redis cache)

git clone https://github.com/amaydixit11/halflife.git
cd halflife
docker-compose up -d   # starts Qdrant + Redis
pip install -e .
halflife demo          # runs the adversarial demo

LangChain Integration

from langchain.retrievers import ContextualCompressionRetriever
from halflife.integrations.langchain import HalfLifeReranker

retriever = ContextualCompressionRetriever(
    base_compressor=HalfLifeReranker(top_k=5),
    base_retriever=your_existing_retriever   # unchanged
)

docs = retriever.get_relevant_documents("Latest approach to LLM fine-tuning?")
# → surfaces 2024 papers instead of 2020 papers

LlamaIndex Integration

from halflife.integrations.llamaindex import HalfLifePostprocessor

query_engine = index.as_query_engine(
    similarity_top_k=20,
    node_postprocessors=[HalfLifePostprocessor(top_n=5)]
)

response = query_engine.query("What is the latest React version?")

Decay Strategies

Three decay functions are available, selectable per document at ingestion time:

| Strategy | Formula | Best for |
|---|---|---|
| Exponential | e^(−λΔt) | News, fast-moving fields, software versions |
| Piecewise | Step function (1.0 → 0.7 → 0.3) | Documentation, compliance, versioned specs |
| Learned | MLP-predicted λ from doc features | Mixed corpora with feedback signal |

The learned decay MLP runs at ingestion time only: zero ML inference at query time.
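
The first two strategies can be sketched directly; note the age thresholds in the piecewise version are assumptions, since the README only specifies the step values 1.0 → 0.7 → 0.3:

```python
import math

def exponential_decay(age_years, lam=0.3):
    """e^(-lambda * dt): smooth decay, suited to fast-moving topics."""
    return math.exp(-lam * age_years)

def piecewise_decay(age_years, thresholds=(1.0, 3.0), levels=(1.0, 0.7, 0.3)):
    """Step decay: full weight below the first threshold, then 0.7, then 0.3.
    The year thresholds here are illustrative, not taken from HalfLife."""
    for cutoff, level in zip(thresholds, levels):
        if age_years < cutoff:
            return level
    return levels[-1]
```

A per-document choice at ingestion time means the reranker only looks up a precomputed λ (or step schedule) per chunk at query time.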


Experimental Features

  • Learned Decay MLP: pure NumPy; predicts per-chunk λ from doc type, source domain, text length, and feedback ratio. Train on your own benchmark results with halflife train.
  • Feedback Loop: adaptive λ tuning via an EMA of user "was this useful?" signals.
  • Event Bus: hard/soft invalidation when a fact is superseded (e.g. a retraction or a product discontinuation).
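
The feedback-loop idea can be sketched as an exponential moving average over binary usefulness votes that nudges λ. Both function names and the adjustment rule below are assumptions for illustration, not HalfLife's API:

```python
def ema_update(ema, useful, alpha=0.1):
    """Exponential moving average of 'was this useful?' votes (1 or 0)."""
    return (1 - alpha) * ema + alpha * (1.0 if useful else 0.0)

def adjust_lambda(lam, usefulness_ema, target=0.5, step=0.05):
    """If served results rate below target, decay faster (favor fresher chunks);
    if above target, decay slower. The rule is illustrative."""
    if usefulness_ema < target:
        return lam + step
    return max(0.0, lam - step)
```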

CLI

halflife demo                              # adversarial demo (no Docker)
halflife quickstart                        # end-to-end ingest → query → rerank
halflife benchmark --output results.json   # nDCG / MRR / temporal freshness
halflife evaluate --ablation               # compare exponential / linear / learned / baseline
halflife train --results results.json      # train the decay MLP
halflife serve --port 8000                 # start the FastAPI middleware

Roadmap

  • Phase 1: Core decay engine + Redis metadata store
  • Phase 2: Intent-aware fusion + historical inversion
  • Phase 3: Learned decay MLP + benchmark harness
  • Phase 3.5: Real-world Arxiv validation + Colab notebook
  • Phase 4: Event-driven fact supersession
  • Phase 5: Pinecone, Weaviate, Chroma integrations
  • Phase 6: Transformer-based intent classifier (replace keyword matching)

License & Contributing

MIT License. Contributions welcome, especially new decay functions, vector store integrations, and domain-specific benchmark datasets.

If you're using HalfLife in a project, open an issue and let us know; we'd like to link to it.

About

A middleware layer for retrieval systems that applies temporal decay functions over retrieved chunks using explicit half-life modeling. It fuses vector similarity with time-dependent relevance scoring to enable adaptive, chunk-level reranking in retrieval-augmented pipelines.
