feat(krep): rg streaming + LSA model + standalone CLI (v2.0.2.0)#4

Merged: ozhangebesoglu merged 24 commits into main from feat/krep-svd-model on Jun 1, 2026
Summary

Kishi Shell v2.0.2.0: a major restructuring of Krep AI semantic search. Three main features on a single branch (a merge of rg-streaming and svd-model).

🚀 New Features

1. ripgrep streaming prefilter (150-3000× speedup)

  • Dispatches to rg automatically when it is installed (no user setup required)
  • Streaming Popen + limit×10 early-termination + 10s hard-timeout
  • Adaptive fallback: drops to the semantic walker when rg returns 0 matches
  • --no-rg flag for debugging
  • Kishi src auth login: 1668ms → 5ms (220×)
  • Python stdlib (6.8M lines): timeout → 11ms (>5000×)
  • 1 GB single file: timeout → 6ms (>10000×)

2. Dictionary-free LSA model (PPMI + SVD)

  • krep --learn PATH builds the vocab + 3D semantic axes automatically from the corpus
  • No dependency on the manual 178-keyword dictionary; all languages are handled automatically
  • Rank-50 SVD HD vec (cosine ranking) + PCA-3 (scatter visual)
  • Axis auto-label (frequency-weighted top-5 words)
  • Verified on Loghub (OpenSSH/Apache/Linux/Mac/HDFS, 10k lines)

3. Tail-aware incremental + lazy auto-refresh

  • `krep --update-learn` processes only NEW lines (file offset tracking)
  • `--auto-refresh 1h` triggers a background subprocess at query time once the model is older than the interval
  • The current query continues with the old model; the next one sees the new model
  • Rotation/truncate detection (size shrink → read from the start)
  • Deleted files are dropped from file_state automatically

4. Standalone `krep` CLI

  • `pip install kishi-shell` now installs two binaries: `kishi` and `krep`
  • Directly from bash/zsh/fish: `krep "auth" /var/log/`

5. Optional numpy/scipy (philosophy preserved)

  • Core: `pip install kishi-shell` → 2 deps (prompt_toolkit, psutil)
  • LSA model: `pip install kishi-shell[krep]` → +numpy/scipy
  • The keyword engine keeps working without numpy

📊 Test Coverage

  • 407 / 407 tests pass
  • 95 new tests (krep_learn 63 + krep_streaming 27 + krep_perf 5)
  • Senior code-audit APPROVED (2 IMPORTANT fixes applied)
  • No memory leak (10k iterations, +0.2 MB RSS)
  • Thread-safe (16 threads × 20 parallel = 320, 0 exceptions)

🛠️ CLI Commands

```bash
krep PATTERN [PATH...] # Search
krep --learn PATH # Build LSA model
krep --update-learn PATH # Tail-only incremental
krep --auto-refresh 1h # Lazy background refresh
krep --list-models # Show cached models
krep --purge-models # Delete all models
krep --no-model PATTERN PATH # Bypass model, keyword only
krep --no-rg PATTERN PATH # Bypass ripgrep, walker only
```

Test plan

  • 407/407 pytest passing
  • Manual: `krep -r 'auth login' kishi/` (rg + LSA working)
  • Manual: `krep --learn /tmp/krep_realtest/` Loghub corpora
  • Manual: `krep --list-models` (FRESH/STALE, age, axes)
  • Manual: `echo "auth login" | krep auth` (stdin pipe)
  • Edge: 1 GB single file (6ms)
  • Edge: empty corpus, permission denied, symlink loop, binary, null byte
  • Senior code-audit independent review (subagent)

🤖 Generated with Claude Code

ozhangebesoglu and others added 24 commits May 25, 2026 23:24
…nsion and concept pruning and bump version to 2.0.1.0
Plan covers a 5-task TDD rollout that introduces a ripgrep-based
streaming prefilter with early termination, while keeping the
existing Cython-backed walker as the fallback path. No new Python
runtime dependencies; ripgrep stays optional.

Verified gains (spike measurements):
- Kishi src "auth login": 1656 ms → ~11 ms (150x)
- Python stdlib "auth login" (~20k files): 56 s → ~19 ms (~3000x)
- Fallback (no rg): existing behavior (no regression)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet <noreply@anthropic.com>
Subagent's earlier split-based implementation kept dotted tokens literal
(e.g. 'config.py' → 'config\.py'), but that diverges from Krep's semantic
OR-prefilter character. Real-corpus measurement shows:
- 'auth login', 'error timeout' (typical queries): IDENTICAL behavior.
- 'auth.token expired': findall yields 64 matches vs split's 6 — broader
  semantic coverage that the user actually wants.
- 'user@admin': findall yields 63 matches vs split's 0.

Restored re.findall(r'[\w]+') per the plan. The metacharacter safety
test now asserts the cleaner drop-not-escape behavior — \w+ naturally
strips meta-chars so the pattern is always re.compile-safe.
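
The drop-not-escape tokenization described above can be illustrated with a minimal sketch. The function name `build_or_pattern` is hypothetical and not taken from the Krep source; only the `re.findall(r'\w+')` call comes from the commit message.

```python
import re


def build_or_pattern(query):
    """Turn a free-text query into an OR pattern of its word tokens.

    Hypothetical helper: every character that is not matched by \\w is
    dropped (not escaped), so the result never contains regex
    metacharacters other than the '|' separators added here.
    """
    tokens = re.findall(r"\w+", query)
    if not tokens:
        return None  # nothing searchable in the query
    return "|".join(tokens)


# Dotted and '@'-separated queries are split into independent OR terms.
print(build_or_pattern("auth.token expired"))
print(build_or_pattern("user@admin"))
```

Because each token is matched independently, a query such as 'auth.token expired' widens into three alternatives, which is consistent with the larger match counts reported for findall versus split.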

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Popen + line-by-line streaming via stdout
- Terminates rg when limit*early_stop_factor matches reached (default 50)
- 10s hard-timeout safety net
- Non-UTF8 safe (bytes mode + decode errors='replace')
- Returns (matches, stats) tuple where matches uses (l_vec, sim, output_str)
  format compatible with krep_search's existing finalize path
- Does NOT use rg's -w (word-boundary) flag: that would skip
  'auth_token', 'authenticate' for query 'auth' — too narrow for
  Krep's semantic prefilter goal.
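
A minimal sketch of the streaming pattern listed above, assuming a generic command in place of the real rg invocation (the actual arguments Krep passes to rg are not shown in this PR). The name `stream_lines` and its return shape are illustrative only. Note that in this sketch the deadline is only checked when a line arrives, so a child that produces no output at all would need an additional guard.

```python
import subprocess
import sys
import time


def stream_lines(cmd, max_lines, hard_timeout=10.0):
    """Read up to max_lines lines from cmd's stdout, then stop the child."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL)
    deadline = time.monotonic() + hard_timeout
    lines = []
    early_stop = False
    try:
        for raw in proc.stdout:  # bytes mode: safe for non-UTF-8 input
            lines.append(raw.decode("utf-8", errors="replace").rstrip("\r\n"))
            if len(lines) >= max_lines or time.monotonic() > deadline:
                early_stop = True
                break
    finally:
        # Close the pipe first so a child blocked on a full pipe buffer
        # gets a write error instead of waiting to be killed.
        proc.stdout.close()
        if proc.poll() is None:
            proc.terminate()
        try:
            proc.wait(timeout=2)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()
    return lines, {"early_stop": early_stop}


if __name__ == "__main__":
    # Stand-in producer; a real call would pass an rg command line instead.
    producer = [sys.executable, "-c", "for i in range(100000): print(i)"]
    print(stream_lines(producer, max_lines=5))
```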

Performance on real corpora (post -w removal):
- Kishi src 'auth login':   8 ms (vec=20, match=17)
- Kishi src 'error timeout': 10 ms (vec=50, match=50, early-stop)
- Stdlib 'auth login':     23 ms (vec=66, match=50, early-stop)
- Stdlib 'database query': 23 ms (vec=53, match=50, early-stop)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Extracted shared _krep_finalize(matches, q_vec, limit) helper.
- krep_search now dispatches to _krep_rg_streaming when rg is present
  and a path argument was provided.
- Adaptive fallback: if rg returns 0 matches but the pattern was valid,
  fall through to the legacy walker. This preserves Krep's semantic
  edge for queries like 'login authorization' that should still match
  'auth token' lines via concept-vector bigram similarity.
- stdin mode and rg_spawn_failed both route to the legacy walker.
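
The dispatch rules above might be sketched as follows. The function `dispatch_search` and its callable parameters are hypothetical stand-ins for `krep_search`, `_krep_rg_streaming`, and the legacy walker; only the routing conditions are taken from the commit message.

```python
def dispatch_search(query, paths, *, has_rg, rg_search, walker_search):
    """Choose between the rg prefilter and the legacy walker.

    rg_search and walker_search are injected callables (assumptions for
    this sketch); rg_search returns a (matches, stats) tuple.
    """
    # stdin mode (no paths) or missing rg: go straight to the walker.
    if not has_rg or not paths:
        return walker_search(query, paths), "walker"

    matches, stats = rg_search(query, paths)

    # Spawn failure or zero literal hits: give the semantic walker a chance.
    if stats.get("rg_spawn_failed") or not matches:
        return walker_search(query, paths), "walker-fallback"

    return matches, "rg"
```

Returning the chosen route alongside the matches is a choice made for this sketch so the fallback path can be asserted in tests.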

Verification:
- 333 / 333 tests pass (no regression in existing 305 tests).
- test_krep_perf::test_recursive_search_under_threshold PASSES at <100 ms
  (Task 1 TDD target met).
- test_krep_search_files (legacy semantic-match test) PASSES via the
  adaptive fallback path.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- kishi_krep gains --no-rg flag (debug/test); briefly overrides
  kishi.krep._HAS_RG and restores it after the call.
- help text documents --no-rg and the rg-streaming performance bump.
- README.md & README.tr.md gain a Krep Performance section with the
  full benchmark table (Walker vs rg-streaming, real corpora).
- Version bumped in pyproject.toml, kishi/main.py banner,
  kishi/builtins.py help_text and neofetch shell line, and both READMEs.

Verified end-to-end (3-run averages, 12-core x86_64, Python 3.14,
ripgrep 15.1):
  Kishi src    'auth login':       5 ms (vs 1068 ms walker) → 206x
  Kishi src    'database query':   5 ms (vs 1071 ms walker) → 210x
  Tests dir    'auth login':       6 ms (vs 1053 ms walker) → 171x
  Stdlib       'auth login':      11 ms (walker times out)   → >5000x
  Stdlib       'database query':  14 ms (walker times out)   → >4000x

335 / 335 tests pass. No regressions in existing 305-test baseline.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Three new permanent regression tests in tests/test_krep_perf.py:

1. test_p99_under_strict_threshold — 10 cold-cache runs, p99 must be
   <50 ms. Current p99 measured ~8 ms; 5x headroom for CI variance.
2. test_no_memory_leak_100_iterations — 100 sequential calls; RSS
   delta must be <10 MB. Current delta ~0 MB.
3. test_thread_safety_smoke — 8 threads × 5 calls = 40 parallel
   invocations; zero exceptions, all rc=0. Print is monkey-patched
   to no-op because capsys is not thread-safe.
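
A latency guard of the kind described in item 1 could be computed as below. This is a sketch under assumptions: the helper names are hypothetical, and the nearest-rank percentile definition is chosen here for simplicity; the actual test may compute p99 differently. With only 10 samples, nearest-rank p99 is simply the slowest run.

```python
import math
import time


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def measure_ms(fn, runs=10):
    """Call fn `runs` times and return each wall-clock duration in ms."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        durations.append((time.perf_counter() - start) * 1000.0)
    return durations


if __name__ == "__main__":
    timings = measure_ms(lambda: sum(range(1000)))
    print(f"p99 = {percentile(timings, 99):.3f} ms")
```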

Verified locally on real corpora:
- Kishi src (~5k lines):        p50 ~5 ms,  p99 ~8 ms
- Tests dir (~3k lines):        p50 ~6 ms,  p99 ~8 ms
- Stdlib (~6.8M lines):         p50 ~10 ms, p99 ~24 ms
- 189k-file combined corpus:    avg ~15 ms
- /usr top-level (mega):        avg ~29 ms

These guards lock in the rg-streaming performance contract and catch
any future regression (lost streaming, leaked subprocess, broken
fallback) at CI time.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Independent senior code review identified 4 actionable findings:

1. [IMPORTANT] proc.stdout.close() missing before break in _krep_rg_streaming
   On early-stop, proc.terminate() sent SIGTERM but the stdout pipe stayed
   open. If the pipe buffer was full, rg blocked until proc.wait(timeout=2)
   timed out, then proc.kill() ran, then another proc.wait(timeout=1) — up
   to 3 s of dead-time on every early-stop. Fix: close stdout in `finally`
   before wait; SIGPIPE makes rg exit immediately.
   Measured impact: 100k-potential-match scenario dropped from 6.8 ms (luck)
   to a tight 6.7-7.5 ms upper bound across 5 runs.

2. [IMPORTANT] rg_spawn_failed was silent — operational blindness
   When _HAS_RG is True but Popen raises OSError (broken executable,
   ENOMEM), we fall back to the legacy walker silently. Users would
   suddenly see 1-second searches instead of 10 ms and have no clue why.
   Fix: write an amber-colored stderr warning so the cause is visible.

3. [NIT] PEP8 E402: `import subprocess` and `import time` were inline
   mid-file. Moved to the module header alongside other stdlib imports.

4. New tests added:
   - test_dispatch_rg_spawn_failed_falls_back (mocked OSError + assert
     warning in captured.err + assert fallback found the match)
   - test_streaming_hard_timeout_safety_net (hard_timeout=0.001 forces
     early termination)
   - test_streaming_terminates_cleanly_on_early_stop (asserts wall time
     <1 s — regression guard for the pipe-close bug)

341 / 341 tests pass. No behavior change for successful rg paths;
only the failure-path latency is improved.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Plan was fully implemented (commits 75ab71c..8b9c587). Added an
"Implementation Notes" section to the plan documenting 4 deliberate
deviations and 1 extra layer that emerged during execution:

1. -w (word-boundary) flag removed — Krep is semantic, needed wider
   prefilter so 'auth' matches 'auth_token'/'authenticate'.
2. Tokenization stayed on re.findall(\w+) per the plan; subagent's
   first iteration used split() and was reverted after real-corpus
   measurement showed findall's OR semantics is what Krep actually
   wants (64 vs 6 matches on 'auth.token expired' query).
3. Adaptive fallback when rg returns 0 hits — Krep's semantic edge
   (bigram bridging 'login authorization' to 'auth token') is
   preserved by falling through to the walker.
4. Senior audit fixes: proc.stdout.close() before terminate (was 3 s
   regression latency) + rg_spawn_failed stderr warning.
5. Extra CI guards: p99 < 50 ms, memory leak < 10 MB / 100 calls,
   8-thread smoke. Plan didn't require these — added for
   production-grade.

Final stress-test summary appended (10k iteration in 77 s, +0.2 MB
RSS, 0 FD leak, ±1% perf drift; 200 MB single file 7.8 ms; /usr
mega-corpus 29 ms; senior audit APPROVED with fixes applied;
341 tests passing).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
docs/benchmarks/2026-05-26-krep-perf.md captures every measurement
made during this branch's verification cycle (4 iterations across
~5 Ralph-loop turns):

- 4-quadrant matrix (rg × Cython): isolates each accelerator's
  contribution. rg alone ≈ 260x; Cython adds 22-80% on top.
- Stat-grade table (30-run): p50/p95/p99/stdev for Kishi src,
  Stdlib. All under 30 ms p99.
- Corpus-size scaling row: 3 k lines → 50 M lines, search time
  stays in [5, 29] ms because early-stop is location-driven,
  not size-driven.
- Sub-component timing on 1 GB single file: rg subprocess 4.4 ms,
  Python wrapper +1.8 ms = 6.2 ms total. Demonstrates that the
  Python overhead is constant.
- Memory: tracemalloc +656 byte/100 iter, RSS +0.2 MB/10k iter,
  zero FD leak, zero zombies.
- Concurrency: 320 parallel calls, 32-thread cache write, zero
  exceptions in either.
- 21 edge case results (binary, symlink loop, null byte,
  permission denied, 1 GB single file, etc.) all rc-valid, no crash.
- Senior audit findings + status (all addressed).
- Coverage: ~89% on new code.
- Per-direction speedup table: up to >10000x on 1 GB single file.

This file is the regression baseline. Any future krep change that
moves the numbers in the "ileride karşılaştırma" section in the
wrong direction must be reviewed before merge.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
A corpus-based 3D semantic embedding, independent of the manual 178-word
X/Y/Z_KEYWORDS dictionary. Latent Semantic Analysis (PPMI + truncated SVD).

New file: kishi/krep_learn.py (~340 lines)
  - build_model(paths): vocab scan, co-occurrence matrix, PPMI normalization,
    scipy.sparse.linalg.svds(k=3), axis auto-labeling
  - save/load model: vectors.npz (float32 array, no pickle) + metadata.json
  - find_model_for(paths): model lookup via a deterministic folder hash
  - vectorize_with_model(text, model): OOV-safe, out-of-vocabulary words are dropped
  - list_models / purge_models: maintenance
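
To make the PPMI step of build_model concrete, here is a standard-library sketch. It assumes line-level co-occurrence (two words co-occur when they share a line) and whitespace tokenization; the real windowing and tokenizer in krep_learn.py may differ, and the SVD step that follows PPMI is omitted here. The function name `ppmi_from_lines` is hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def ppmi_from_lines(lines):
    """Return {(word, context): PPMI} from line-level co-occurrence counts."""
    pair_counts = Counter()
    for line in lines:
        tokens = sorted(set(line.lower().split()))
        for a, b in combinations(tokens, 2):
            pair_counts[(a, b)] += 1  # count both directions so the
            pair_counts[(b, a)] += 1  # matrix is symmetric

    total = sum(pair_counts.values())
    row_totals = Counter()
    for (a, _), n in pair_counts.items():
        row_totals[a] += n

    ppmi = {}
    for (a, b), n in pair_counts.items():
        # PMI = log( P(a,b) / (P(a) * P(b)) ); keep only positive values.
        pmi = math.log((n * total) / (row_totals[a] * row_totals[b]))
        if pmi > 0:
            ppmi[(a, b)] = pmi
    return ppmi


if __name__ == "__main__":
    print(ppmi_from_lines(["auth login", "auth login", "disk full"]))
```

Pairs that never co-occur simply have no entry, which corresponds to a zero in the sparse matrix that a truncated SVD would then factorize.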

kishi/krep.py:
  - _resolve_model + _vectorize_dispatch helpers (with REPL cache)
  - krep_search bash: use the model if one exists, otherwise fall back to keywords
  - process_file: when a model exists, skip file-level pruning (expensive in vocab space)
    and vectorize every line with the model; the existing keyword path is UNCHANGED

kishi/builtins.py:
  - --learn PATH... : build a model from the corpus, save it to ~/.cache/kishi/krep_models/
  - --no-model     : bypass the model (debug/test)
  - --list-models  : saved models, vocab/lines/axes
  - --purge-models : delete all models

POC verification (Kishi src, 4560 lines):
  - Build: 0.1s, 1534 vocab, 34 KB model
  - Auto-labeled axes:
      axis 0: self return def import not (Python structure)
      axis 1: print model krep color_reset path (UI/krep)
      axis 2: the kishi explorer command ctrl (TUI)
  - Semantic: auth↔password=1.00, error↔fail=0.99 (no dictionary!)
  - Gain: 'plugin install' query → keyword 0 match, model 5 match
          'color message' query → keyword 0 match, model 5 match

Dependencies: numpy>=1.20, scipy>=1.7 (broad PyPI wheel matrix, official
package in the AUR, present on almost every dev machine).

312/312 tests pass, no regressions.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Your (the user's) suggestion: "after running --learn on a folder, give it
an interval, and when new log lines arrive let it automatically continue
from where it left off". This is Celery periodic-task logic, but with no daemon.

NEW FEATURES:

1. Tail-aware incremental scan (krep_learn._scan_corpus):
   - Records file_state{offset, mtime, size} for each file
   - At update time: cur_size < prev_size → rotate/truncate, read from the start
                     cur_size > prev_size  → new lines from the tail offset
                     cur_mtime == prev_mtime → skip (unchanged)
   - Deleted files are dropped from file_state automatically

2. update_model(existing, paths) → incremental SVD:
   - Adds the co-occurrence pairs of the new lines to the previous state
   - New vocab words are included automatically
   - PPMI + SVD are re-solved (compact, correct)
   - 5-second build → ~0.5-second update

3. Lazy auto-refresh (krep._resolve_model):
   - krep --learn /var/log/ --auto-refresh 1h
   - Every query checks model_age: if it is > interval,
     subprocess.Popen([python -c "...krep_learn.update_model..."])
     fire-and-forget (start_new_session=True)
   - The current query continues with the old model; the next query sees the new model
   - The _REFRESH_TRIGGERED set prevents a double spawn within the same query

4. New CLI:
   - --update-learn PATH      manual tail-only incremental update
   - --auto-refresh INTERVAL  human-friendly interval (1h, 30m, 1d, 0=off)
   - --list-models            shows age + auto-refresh + STALE/FRESH
   - The existing --learn, --no-model, --purge-models are kept

5. parse_interval / format_age helpers:
   - "1h" → 3600, "30m" → 1800, "1d" → 86400, "2w" → 1209600
   - "0", "off", "false" → 0 (disabled)
   - An invalid format raises ValueError
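
The interval mapping in item 5 can be reproduced with a short sketch. This is not the krep_learn implementation, only one way to satisfy the listed input/output pairs; the accepted unit letters (s, m, h, d, w) and the strict `<digits><unit>` grammar are assumptions.

```python
import re

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}
_DISABLED = {"0", "off", "false"}


def parse_interval(text):
    """Convert strings like '1h' or '30m' to seconds; 0 means disabled."""
    value = text.strip().lower()
    if value in _DISABLED:
        return 0
    match = re.fullmatch(r"(\d+)([smhdw])", value)
    if match is None:
        raise ValueError(f"invalid interval: {text!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


if __name__ == "__main__":
    for sample in ("1h", "30m", "1d", "2w", "off"):
        print(sample, "->", parse_interval(sample))
```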

E2E VERIFICATION:
- Initial build: 17 vocab, 40 lines, file_state with 2 files
- Log append (log1 only): +395 bytes → update_model processes ONLY log1
- V: 17 → 27 (new words: kubernetes, pod, restart, etc.)
- is_stale: 1h+just_built → False, 1h+2h_ago → True
- --list-models: shows "auto-refresh 1h · STALE" correctly

FILE FORMAT (per-model folder):
  vectors.npz      word_vecs (float32, no pickle)
  metadata.json    vocab, axis_labels, build_time, auto_refresh_seconds
  state.json       file_state, term_freq, pair_counts (for incremental updates)

load_model(with_state=False) is a lightweight load for queries;
with_state=True loads the full state for updates. _MODEL_CACHE lives for the REPL session.
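
The file_state record kept in state.json is what drives the tail-aware scan. A minimal sketch of that logic follows, with a hypothetical function name and an in-memory dict standing in for state.json. For simplicity it decides on size alone and treats an unchanged size as "nothing new" (the source also compares mtime), and it does not handle a partially written final line.

```python
import os


def read_new_lines(path, file_state):
    """Return lines appended to `path` since the last call.

    file_state maps path -> {"offset", "mtime", "size"} and is updated
    in place (a stand-in for the persisted state.json).
    """
    st = os.stat(path)
    prev = file_state.get(path)
    if prev is None or st.st_size < prev["size"]:
        offset = 0                 # first sight, or rotation/truncation
    elif st.st_size > prev["size"]:
        offset = prev["offset"]    # file grew: resume from the old tail
    else:
        return []                  # same size: treated as unchanged

    with open(path, "rb") as fh:
        fh.seek(offset)
        data = fh.read()
        end = fh.tell()

    file_state[path] = {"offset": end, "mtime": st.st_mtime, "size": end}
    return data.decode("utf-8", errors="replace").splitlines()
```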

All 312 existing tests pass, zero regressions.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Bug report: the rank-3 SVD vectors were collapsing into a single direction.
42% of the vocab had sim>0.99 (on random pairs), and unrelated pairs such as
thermal↔ftp=0.99 also scored high. Result: the krep query "authentication
failure" was matching HDFS block lines.

Fix, following the word2vec/LSA standard:
- SVD rank-50 (HD): real vocab separation for cosine ranking
- PCA-3 (for the scatter visual only): 14% variance, visual direction
- word_vecs (V, 50)    for cosine_similarity
- word_vecs_3d (V, 3)  for the 3D ASCII scatter plot

The match tuple was extended to (l_vec_hd, sim, output_str, raw_text): before
rendering, the top-K matches are re-vectorized in 3D from raw_text, so the
scatter carries the HD information while keeping its 3D visual consistency.

Real-log test (Loghub: openssh, apache, linux, mac, hdfs, 10000 lines):
  'authentication failure' → linux.log sshd auth failure  sim=0.99 ✓
  'invalid user from'      → openssh.log Invalid user      sim=0.99 ✓
  'permission denied'      → klogind Auth failed (semantic) sim=0.89 ✓
  'kernel memory'          → kernel command line           sim=0.93 ✓
  'block packetresponder'  → hdfs Served block             sim=1.00 ✓
  'thunderbolt thermal'    → IOThunderboltSwitch           sim=0.59 ✓
  'google software update' → GoogleSoftwareUpdateAgent     sim=0.82 ✓

Cosine now VARIES (between 0.51 and 1.00) instead of being flat at 1.00.
Exact match: ~0.99, semantic: 0.80-0.92, loose: 0.51-0.59.

API:
- vectorize_with_model(text, model, dim="hd")  for cosine (default)
- vectorize_with_model(text, model, dim="3d")  for the scatter visual
- _cosine_anyd(a, b)  dimension-agnostic cosine helper
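
A dimension-agnostic cosine helper of the kind named above could look like the following pure-Python sketch. The real `_cosine_anyd` presumably operates on numpy arrays; the zero-vector behavior here follows the "zero-vec=0" case listed in the test suite, and the length check is an assumption of this sketch.

```python
import math


def cosine(a, b):
    """Cosine similarity for two equal-length numeric sequences.

    Returns 0.0 when either vector has zero norm (for example a line
    made entirely of out-of-vocabulary words).
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

The same function works for the 50-dimensional ranking vectors and the 3-dimensional scatter vectors, since nothing in it depends on the length.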

vectors.npz: word_vecs + word_vecs_3d, both float32, no pickle.
load_model is backward compatible: if word_vecs_3d is missing, it falls back to word_vecs.

312/312 tests pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Bug: lazy refresh never triggered because _resolve_model skipped the
is_stale check on a cache hit (it returned early). In addition, the cache
kept holding the stale model after the background refresh completed.

Fix:
- The cache key is now (paths, metadata.json mtime): if the file changes, the
  cache entry is invalidated automatically → reload
- The is_stale check runs on every resolve, even on a cache hit
- When the background subprocess finishes, save_model updates the mtime of
  metadata.json, so the cache refreshes automatically

Full lifecycle test on real logs:
  1. krep --learn /tmp/krep_realtest/ --auto-refresh 1h
  2. Set build_time to 2h ago (simulate stale)
  3. krep "auth" → _REFRESH_TRIGGERED=1, bg subprocess spawned
  4. Wait 4 seconds
  5. Same krep "auth" → cache mtime invalid → fresh model loaded
  6. is_stale = False (after the refresh)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
tests/test_krep_learn.py covers every public + critical-private API of
the PPMI+SVD model layer:

- TestTokenize (6) — short-word filter, pure-digit skip, __dunder skip,
  Unicode aware, punctuation split
- TestWalkFiles (3) — .git/__pycache__ skip, binary ext skip, recursive
- TestReadFileFromOffset (5) — full / tail-seek / out-of-bounds (truncate)
  / binary NUL detect / empty file
- TestParseInterval (7) — h/m/d/w/s formats, disable keywords, invalid
- TestFormatAge (4) — seconds/minutes/hours/days
- TestIsStale (4) — disabled never stale, fresh ok, old stale, age delta
- TestBuildModel (7) — vocab, HD vs 3D shapes, axis labels, file_state,
  term_freq, pair_counts, auto_refresh stored, empty corpus error
- TestSaveLoadRoundtrip (7) — creates vectors.npz+metadata.json+state.json,
  lightweight vs with_state, 3D persistence, deterministic hash dir,
  missing returns None
- TestVectorizeWithModel (6) — dim="hd" vs "3d", OOV zero, empty, partial
  OOV, L2-normalized output
- TestCosine (3) — identical=1, orthogonal=0, zero-vec=0
- TestUpdateModel (5) — no-change, new lines, tail-only bytes, rotation
  detection (size shrink), deleted file removed from state
- TestListPurgeModels (4) — empty, after build, purge removes all, purge
  empty returns 0
- TestEndToEnd (2) — full pipeline build→save→load→query, unrelated
  word low similarity

Bug fixes uncovered during TDD:
1. _tokenize: '__pyx_n_u_error' filter was too narrow (digit required).
   Now ALL __dunder prefixes drop (Cython internals + Python __init__
   noise both filtered).
2. _walk_files: '/.git/' SKIP_DIR pattern missed dirpath without trailing
   slash. Added rstrip('/') + '/' padding so 'os.walk' returned paths
   match exactly.
3. update_model: file-deletion-only case (no new lines but a file gone)
   used to short-circuit with stale file_state. New `files_deleted` branch
   updates file_state without re-running SVD.

Total tests: 375 (312 baseline + 63 new krep_learn).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Hybrid packaging strategy (option C): instead of splitting krep into a
separate repo, expose it as a standalone CLI entry point within the single package.

NEW:
- kishi/krep_cli.py: wraps sys.argv → the kishi_krep builtin and returns the rc
  via sys.exit. Callable from any shell such as bash, zsh, or fish.

- pyproject.toml [project.scripts]:
  kishi = "kishi.main:main"     # existing
  krep  = "kishi.krep_cli:main" # NEW

USER EXPERIENCE:
  $ pip install kishi-shell
  $ krep "auth login" /var/log/         # ← directly from bash
  $ krep --learn /var/log/ --auto-refresh 1h
  $ krep --list-models
  $ cat app.log | krep error             # stdin pipe works too
  $ kishi                                 # ← the Kishi REPL is still there
    Kishi$ -> krep "auth" /var/log/

SMOKE TESTS:
- python -m kishi.krep_cli --help        ✓
- python -m kishi.krep_cli --list-models ✓
- python -m kishi.krep_cli auth FILE     ✓ (3D scatter render)
- echo "..." | python -m kishi.krep_cli  ✓ (stdin pipe)

NEW TESTS (3, TestKrepCliEntry):
- test_help_exits_zero        : --help raises SystemExit(0)
- test_no_pattern_exits_one   : an empty invocation raises SystemExit(1)
- test_list_models_exits_zero : --list-models works with an empty cache

FUTURE FLEXIBILITY:
If the krep ecosystem grows, turning it into a separate repo (krep-cli) PyPI
package is about 30 minutes of work: move kishi/krep_*.py, change the package
name in pyproject, and have kishi-shell add krep-cli as a dependency. The
current architecture needs no extra investment for such a migration.

378/378 tests pass (375 before + 3 CLI entry tests).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Preserve Kishi's "pure Python + 2 deps" philosophy: move numpy/scipy to
optional-dependencies. pip install kishi-shell is now a 30 KB core package;
when krep --learn is needed, run pip install kishi-shell[krep].

pyproject.toml:
  dependencies = [prompt_toolkit, psutil]  # core
  optional-dependencies.krep = [numpy>=1.20, scipy>=1.7]

kishi/builtins.py:
  --learn / --update-learn / --list-models / --purge-models print a
  guiding error when numpy is missing:
    "Install: pip install kishi-shell[krep]
     Arch:    sudo pacman -S python-numpy python-scipy"
  The keyword engine keeps working without these packages.
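
One common way to implement such a guiding error is to import the optional module lazily and translate ImportError into an install hint. The helper below is a hypothetical sketch, not the code in kishi/builtins.py; only the hint text mirrors the message quoted above.

```python
import importlib


def require_optional(module_name, extra="krep"):
    """Import an optional dependency or fail with an install hint."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise RuntimeError(
            f"{module_name} is required for this command.\n"
            f"Install: pip install kishi-shell[{extra}]"
        ) from exc


if __name__ == "__main__":
    try:
        np = require_optional("numpy")
        print("numpy available:", np.__version__)
    except RuntimeError as err:
        print(err)
```

Keeping the import inside the command handler, rather than at module top level, is what lets the keyword engine load on a machine without numpy.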

README.md + README.tr.md:
  - Install section: two options (plain + [krep])
  - Krep AI section: note about the optional extra for the LSA model

Version bump 2.0.1.0 → 2.0.2.0 (new krep CLI + LSA model + optional dep)
- pyproject.toml
- kishi/main.py banner
- kishi/builtins.py help_text + neofetch
- README + README.tr headings

378/378 tests pass, zero regressions.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Merge Krep's two main feature branches into a single branch. Both features
added their own dispatch at the top of krep_search; after the merge there
is a THREE-LAYER dispatch chain:

1. _resolve_model(paths)  →  load the SVD model if one exists (numpy optional)
2. _krep_rg_streaming     →  stream prefilter if rg is installed
                              (cosine with HD vectors if a model exists)
3. process_file walker    →  if neither is available or rg returns 0 matches

Conflict resolutions:
- pyproject.toml: v2.0.2.0 + optional-deps[krep] (from SVD)
- kishi/main.py + builtins.py: v2.0.2.0 banner
- kishi/builtins.py: --learn/--update-learn/--auto-refresh (SVD) +
  --no-rg (rg) + a merged --no-model/--no-rg implementation
- kishi/krep.py: _krep_rg_streaming takes a model parameter and vectorizes
  lines with HD vectors. The match tuple has 4 elements
  (l_vec, sim, output_str, raw_text). _krep_finalize now takes a model
  parameter and renders the scatter via PCA-3 reduction from raw_text.
- README.md / README.tr.md: v2.0.2.0 heading

tests/test_krep_streaming.py:
- The match tuple format check was updated to accept 3 or 4 elements
  (vec_len >= 3 check for HD vectors).

407 / 407 tests pass (305 baseline + 63 krep_learn + 27 krep_streaming
+ 5 krep_perf + 7 krep). Zero regressions.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
PR #4 CI failed on all 5 Python versions because the GitHub runner has no
ripgrep. When _krep_rg_streaming returned rg_spawn_failed, the matches stayed empty.

Two-layer fix:

1. .github/workflows/ci.yml:
   - Added a sudo apt-get install -y ripgrep step (the recommended path)
   - pip install -e ".[krep]" also installs numpy/scipy
     (for the krep_learn tests)

2. tests/test_krep_streaming.py:
   - autouse fixture on the TestStreamingSearch class: skip if rg is missing
   - test_streaming_hard_timeout_safety_net + test_streaming_terminates_cleanly_on_early_stop:
     rg check at the start of the function (inside the TestKrepSearchDispatch class)
   - Defensive: even if the rg install step is dropped, the tests skip rather than fail

CI should pass now; all 407 tests still pass locally.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…dirs

During the PR #4 merge, 'git add -A' accidentally swept in test artifacts
as well (empty files left behind by '>' in the redirect tests:
=13, =3.0.0, Editor, Editör, Girdi, Input, Risk, Task, Terminal, echo,
export, f, file, greetnWelcome, i, krep, merhabanSisteme, out, plugin,
weather, $MYOUT, &1, (, **For, **NOT:**, **Tip:**, 0, 1:n, 2400x, 60,
=, sess.log:17:, u001b[1 and more, 38 files in total).

In addition, the editor-specific configs .serena/ and .vscode/ were tracked.
They are user- and machine-specific and do not belong in the repo.

Fix:
1. git rm --cached removed the 38 junk files + .serena/ + .vscode/ from the index
2. They were also deleted from the local disk
3. Comprehensive permanent rules added to .gitignore:
   - .serena/, .vscode/, .idea/, .mypy_cache/, .pytest_cache/, .coverage
   - .venv*/, .cache/, .krep_models/
   - Patterns for redirect test artifacts ($*, =*, &*, etc.)
   - All test junk names, including those with Turkish characters

407/407 tests pass (zero regressions).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The PyPI sdist build (python -m build) was failing because it could not find
krep_core.pyx: pyproject.toml [tool.setuptools.packages.find] only picks up
*.py files, and without MANIFEST.in a .pyx file does not make it into the sdist.

MANIFEST.in:
- recursive-include kishi *.pyx *.py  (Cython source + Python)
- recursive-include tests *.py        (downstream verify)
- recursive-include docs *.md         (plan + benchmark)
- recursive-include assets *.png      (README images)
- include README*.md LICENSE pyproject.toml setup.py
- exclude kishi/*.c kishi/*.so        (generated during the build)
- prune .serena .vscode .mypy_cache .pytest_cache build dist
- global-exclude __pycache__ *.pyc

Result:
  dist/kishi_shell-2.0.2.0.tar.gz   7.6 MB  (sdist, .pyx + tests + assets)
  dist/kishi_shell-2.0.2.0-cp314-cp314-linux_x86_64.whl  236 KB
  twine check dist/*  →  PASSED

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@ozhangebesoglu ozhangebesoglu merged commit 28a100a into main Jun 1, 2026
5 checks passed
@ozhangebesoglu ozhangebesoglu deleted the feat/krep-svd-model branch June 1, 2026 14:57
ozhangebesoglu added a commit that referenced this pull request Jun 1, 2026
After PR #4, pushing the v2.0.2.0 tag produced a PyPI 400 error:
  Binary wheel 'kishi_shell-2.0.2.0-cp312-cp312-linux_x86_64.whl'
  has an unsupported platform tag 'linux_x86_64'.

Root cause: by default 'python -m build' produces both an sdist and a wheel.
Because the GitHub runner is Ubuntu/Linux, the wheel gets the 'linux_x86_64'
tag. PyPI only accepts portable platform tags such as manylinux* /
musllinux* / win_amd64 / macosx_*; 'linux_x86_64' is rejected because it
is specific to a single machine.

Fix:
- 'python -m build --sdist' → produce only the source .tar.gz
- When a user runs 'pip install kishi-shell', the package is compiled with
  Cython on their own machine (requires gcc + Python headers, standard on Linux).

In the future, 'cibuildwheel' can be added for manylinux2014_x86_64 +
macOS arm64 + Windows amd64 binary wheels (when the need arises).

The test job also gained ripgrep + the krep extra (the main CI fix from
PR #4 was applied to the publish workflow too; the krep tests need rg,
and the krep_learn tests need numpy/scipy).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>