This repo is a starter DS sandbox for a personalized news stack. It’s intentionally modular so you can test each module independently via a separate Streamlit app.
It includes:
- A minimal data model (Pydantic) for stories, facts, clusters, user profiles
- A C-LLM extraction schema (stubbed by default) + canonicalization + fact IDs
- A simple clustering baseline + delta computation
- A basic user knowledge store + preference updates
- A planner (P-LLM) and realizer (R-LLM) scaffolding (stubbed)
- An evaluation playground (coverage, novelty, redundancy, faithfulness placeholder)
- Streamlit apps for each module under `apps/`
⚠️ By default, LLM calls are stubbed so you can run everything with no API keys. You can later wire in your preferred LLM provider by editing `src/llm/providers.py`.
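To make the data model concrete, here is a minimal sketch of what the Pydantic models in `src/data_models.py` might look like. The field names below are illustrative assumptions, not the repo's actual schema:

```python
from typing import List, Optional
from pydantic import BaseModel

class Story(BaseModel):
    """A normalized news article (field names are assumptions)."""
    story_id: str
    title: str
    body: str
    source: str
    cluster_id: Optional[str] = None  # assigned later by the clustering step

class Fact(BaseModel):
    """An atomic, grounded claim extracted from a story."""
    fact_id: str   # stable ID produced by canonicalization
    story_id: str  # provenance: which story this fact came from
    text: str      # canonicalized fact statement

class UserProfile(BaseModel):
    """Per-user preferences and fact memory."""
    user_id: str
    topic_weights: dict = {}       # topic -> preference score
    seen_fact_ids: List[str] = []  # facts already shown to this user
```

The real models live in `src/data_models.py`; treat this only as a mental model of how stories, facts, and users relate.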
```bash
python -m venv .venv
# mac/linux
source .venv/bin/activate
# windows
# .venv\Scripts\activate
pip install -r requirements.txt
cp .env.example .env
```

Fill in any keys if you want live providers (optional). You can run fully offline.
Each module is a standalone app:
```bash
streamlit run apps/01_ingest_and_cluster.py
streamlit run apps/02_c_llm_extract.py
streamlit run apps/03_canonicalize_and_dedup.py
streamlit run apps/04_user_model.py
streamlit run apps/05_planner.py
streamlit run apps/06_realizer.py
streamlit run apps/07_evaluation.py
```

If you want a simple launcher:
```bash
python tools/run_app.py 2
```

```
apps/                # Streamlit entrypoints (one per module)
src/
  config.py          # env + paths
  data_models.py     # Pydantic models (Story, Fact, Cluster, User)
  storage/           # lightweight local storage (jsonl)
  ingest/            # ingest stubs + parsers
  clustering/        # clustering + delta logic
  extraction/        # C-LLM schema + canonicalization + fact IDs
  user/              # user profile, knowledge & preference updates
  planning/          # P-LLM plan schema + stub planner
  realization/       # R-LLM renderer schema + stub realizer
  eval/              # metrics + eval helpers
  llm/               # provider interface (stub, optional live)
tools/
  run_app.py         # quick launcher
data/                # local runtime data (created automatically)
artifacts/           # cached outputs (created automatically)
```
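The launcher in `tools/run_app.py` could be as simple as the sketch below: map a module number to its Streamlit entrypoint and shell out to `streamlit run`. This is a hypothetical reconstruction, not the repo's actual script:

```python
# Hypothetical sketch of tools/run_app.py: pick an app by number and launch it.
import subprocess
import sys
from pathlib import Path

APPS = {
    1: "01_ingest_and_cluster.py",
    2: "02_c_llm_extract.py",
    3: "03_canonicalize_and_dedup.py",
    4: "04_user_model.py",
    5: "05_planner.py",
    6: "06_realizer.py",
    7: "07_evaluation.py",
}

def main() -> None:
    # Default to app 1 when no number is given on the command line.
    n = int(sys.argv[1]) if len(sys.argv) > 1 else 1
    app = Path("apps") / APPS[n]
    subprocess.run(
        [sys.executable, "-m", "streamlit", "run", str(app)],
        check=True,
    )
```

Call `main()` under an `if __name__ == "__main__":` guard to use it as a script, e.g. `python tools/run_app.py 2` launches the extraction app.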
- Ingest: load example articles → normalize → store `Story`
- Cluster: assign `cluster_id` and maintain `ClusterState`
- Extract: run C-LLM extractor (stub) → produce grounded `Fact` objects
- Canonicalize: normalize facts → generate `fact_id` → dedup across stories
- User model: update preferences + per-cluster fact memory (seen facts)
- Plan: create a per-user `ContentPlan` (what to include/omit/emphasize)
- Realize: generate swipe cards + extended modules (stub)
- Evaluate: compute novelty/redundancy/coverage + compare variants
Edit:
- `src/llm/providers.py` (implement `LLMProvider.generate_json(...)`)
- Set env keys in `.env`

Then, in the apps, switch from the stub provider to your live provider.
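The stub-vs-live split might look like the sketch below. The interface name and signature are taken from the note above; everything else (the stub's canned payload, the `client.complete` call) is a placeholder for whatever SDK you wire in:

```python
import json

class LLMProvider:
    """Interface assumed from src/llm/providers.py (signature is a guess)."""
    def generate_json(self, prompt: str, schema: dict) -> dict:
        raise NotImplementedError

class StubProvider(LLMProvider):
    """Offline default: returns a canned response so apps run with no API keys."""
    def __init__(self, canned: dict):
        self.canned = canned

    def generate_json(self, prompt: str, schema: dict) -> dict:
        return self.canned

class LiveProvider(LLMProvider):
    """Skeleton for a real provider: call your LLM client, parse the text as JSON.
    `client.complete` is a hypothetical method; substitute your SDK's call."""
    def __init__(self, client):
        self.client = client

    def generate_json(self, prompt: str, schema: dict) -> dict:
        raw = self.client.complete(prompt)  # hypothetical client call
        return json.loads(raw)
```

Keeping both behind one interface means each Streamlit app only needs a one-line swap to go live.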
Cursor will work best if you:
- open the repo folder
- use `.env` for secrets
- run Streamlit in an integrated terminal
- keep generated data under `data/` and cached artifacts under `artifacts/`
- Add a real EventRegistry connector in `src/ingest/eventregistry.py`
- Replace the stub C-LLM with your actual extraction prompt + grounding
- Add a vector DB (FAISS / pgvector) for fast retrieval
- Implement a verifier: ensure R-LLM outputs only use selected fact IDs
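The verifier idea in the last bullet can be sketched in a few lines, assuming (hypothetically) that the R-LLM cites facts inline as `[fact_<hex>]` tags. The citation format is an assumption for illustration, not the repo's convention:

```python
import re
from typing import List, Set

def extract_cited_ids(card_text: str) -> Set[str]:
    """Pull fact-ID citations out of rendered text. Assumes an inline
    [fact_abc123] citation convention (an assumption, not the repo's format)."""
    return set(re.findall(r"\[(fact_[0-9a-f]+)\]", card_text))

def verify_card(card_text: str, selected_fact_ids: List[str]) -> bool:
    """A card passes only if every fact it cites was in the planner's selection,
    i.e. the R-LLM introduced nothing the P-LLM did not approve."""
    return extract_cited_ids(card_text) <= set(selected_fact_ids)
```

Cards that fail the check can be regenerated or dropped, giving a cheap guardrail against the realizer hallucinating ungrounded claims.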