Every edge extracted by weight-extract carries a confidence score derived from the raw logit magnitudes of the FFN feature that produced it. Confidence separates (France, L26-F9298, Paris) at 0.89 from (France, L3-F2041, crawl) at 0.002.
Extraction is always complete — all edges are stored regardless of confidence. Filtering by confidence happens at query time or as a post-processing step.
Each FFN feature i at layer L has two projections:
Input side (W_gate): embed @ W_gate.T — projects the embedding matrix through the gate weights. The top score for feature i is c_in: how specifically this feature responds to one trigger token vs many. High c_in = entity-selective.
Output side (W_down): embed @ W_down — projects the embedding matrix through the down weights. The top score for feature i is c_out: how strongly this feature pushes toward one answer token. High c_out = strong writer.
Raw product: c_in × c_out — a feature that fires specifically for "France" AND writes strongly toward "Paris" has a high raw product. A feature that fires vaguely AND writes weakly is noise.
Per-layer normalization: After all features in a layer are walked:
c = (c_in × c_out) / max(c_in × c_out across this layer)
This gives confidence in [0, 1] normalized within each layer.
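The two-pass walk described above can be sketched as follows. This is a minimal illustration, not the tool's actual code: `SketchEdge`, `top`, and `walk_layer` are hypothetical names, the projection scores are assumed to be precomputed as one row per feature over the vocabulary, "top score" is taken to mean the row maximum, and raw scores are assumed non-negative.

```rust
/// One edge per feature (hypothetical shape used only in this sketch).
#[derive(Debug, Clone, PartialEq)]
pub struct SketchEdge {
    pub subject: usize, // index of the top trigger token (input side)
    pub object: usize,  // index of the top answer token (output side)
    pub feature: usize, // feature index within the layer
    pub c: f64,         // per-layer normalized confidence
}

/// Index and value of the largest entry in a row, or None if the row is empty.
/// On ties the first occurrence wins.
fn top(row: &[f64]) -> Option<(usize, f64)> {
    row.iter()
        .copied()
        .enumerate()
        .fold(None, |best, (i, v)| match best {
            Some((_, b)) if b >= v => best,
            _ => Some((i, v)),
        })
}

/// `gate[i]` and `down[i]` hold feature i's projection scores over the vocabulary.
pub fn walk_layer(gate: &[Vec<f64>], down: &[Vec<f64>]) -> Vec<SketchEdge> {
    // Pass 1: top tokens and raw product c_in * c_out per feature.
    let raw: Vec<(usize, usize, usize, f64)> = gate
        .iter()
        .zip(down)
        .enumerate()
        .filter_map(|(f, (g, d))| {
            let (s, c_in) = top(g)?;
            let (o, c_out) = top(d)?;
            Some((f, s, o, c_in * c_out))
        })
        .collect();

    // Pass 2: divide every product by the layer maximum.
    let max = raw.iter().map(|r| r.3).fold(0.0_f64, f64::max);
    raw.into_iter()
        .map(|(feature, subject, object, p)| SketchEdge {
            subject,
            object,
            feature,
            c: if max > 0.0 { p / max } else { 0.0 },
        })
        .collect()
}
```

Because the maximum is only known after the whole layer has been walked, normalization has to be a second pass. The strongest feature in each layer always ends up with c = 1.0.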
Different layers serve different functions in the transformer:
| Layer range | Role | Signal type |
|---|---|---|
| L0–L14 | Dark accumulation | Structural, low factual confidence |
| L14–L25 | Relation differentiation | Mixed, relations emerging |
| L26 | Fact explosion | Highest factual confidence |
| L27–L33 | Refinement | Copy, format, consolidation |
A confidence of 0.8 at L26 means "strong factual edge." A confidence of 0.8 at L3 means "strong structural edge." Both are valid but serve different purposes. Per-layer normalization keeps scores comparable within their function. The layer field lets you weight across layers at query time.
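One way to weight across layers at query time is to map each layer to its role and scale the confidence by a role-specific weight. The sketch below assumes the layer ranges from the table; L14 appears in two rows there, so assigning it to relation differentiation is an arbitrary choice. The weights in `factual_weight` are invented for illustration and are not part of the tool.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LayerRole {
    DarkAccumulation,
    RelationDifferentiation,
    FactExplosion,
    Refinement,
}

/// Map a layer index to its role, following the table above.
/// Returns None for layers outside L0..=L33.
pub fn role_of(layer: u32) -> Option<LayerRole> {
    match layer {
        0..=13 => Some(LayerRole::DarkAccumulation),
        14..=25 => Some(LayerRole::RelationDifferentiation),
        26 => Some(LayerRole::FactExplosion),
        27..=33 => Some(LayerRole::Refinement),
        _ => None,
    }
}

/// Hypothetical weights for a fact-oriented query (tune for your use case).
pub fn factual_weight(role: LayerRole) -> f64 {
    match role {
        LayerRole::DarkAccumulation => 0.1,
        LayerRole::RelationDifferentiation => 0.5,
        LayerRole::FactExplosion => 1.0,
        LayerRole::Refinement => 0.7,
    }
}

/// Scale a per-layer confidence by the weight of the layer's role.
pub fn weighted_confidence(layer: u32, c: f64) -> Option<f64> {
    role_of(layer).map(|role| c * factual_weight(role))
}
```

With these example weights, a confidence of 0.8 at L26 stays at 0.8, while the same confidence at L3 drops to 0.08, so structural edges rank below factual ones in a fact-oriented query.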
Empirical results from Gemma 3-4B show that confidence and selectivity measure different things:
| Score | What it measures | Peaks at | Correlates with |
|---|---|---|---|
| `c` (confidence) | Combined signal: c_in × c_out / max | Early/mid layers (L6–L12) | Structural edges: function words, syntax |
| `selectivity` | Input specificity: c_in / max(c_in) | Late layers (L25–L33) | Factual edges: proper nouns, entities |
Early layers have features that fire broadly (low c_in) but write strongly to common tokens (high c_out). This gives high confidence but low selectivity — these are structural edges ("the", "is", "a").
Late layers have features that fire specifically for entities (high c_in) but write with moderate strength. This gives lower confidence but high selectivity — these are the factual edges you want.
For factual knowledge: filter on selectivity + late layers.
For structural analysis: filter on confidence + early layers.
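The divergence between the two scores follows directly from their definitions. The sketch below computes both for one layer from raw `(c_in, c_out)` pairs. `score_layer` is a hypothetical helper, the inputs in the note are invented, and raw scores are assumed non-negative.

```rust
/// Returns (confidence, selectivity) per feature for one layer,
/// given raw (c_in, c_out) pairs.
pub fn score_layer(raw: &[(f64, f64)]) -> Vec<(f64, f64)> {
    // Layer-wide maxima used as normalizers.
    let max_product = raw.iter().map(|&(i, o)| i * o).fold(0.0_f64, f64::max);
    let max_c_in = raw.iter().map(|&(i, _)| i).fold(0.0_f64, f64::max);

    raw.iter()
        .map(|&(c_in, c_out)| {
            // Confidence rewards the combined signal.
            let c = if max_product > 0.0 { c_in * c_out / max_product } else { 0.0 };
            // Selectivity looks at the input side only.
            let sel = if max_c_in > 0.0 { c_in / max_c_in } else { 0.0 };
            (c, sel)
        })
        .collect()
}
```

For the invented input `[(2.0, 20.0), (8.0, 4.0)]`, the broad-but-strong first feature gets c = 1.0 with selectivity 0.25, while the entity-specific second feature gets c = 0.8 with selectivity 1.0. The two scores rank the same features in opposite order.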
```json
{
  "s": "France",
  "r": "L26-F9298",
  "o": "Paris",
  "c": 0.89,
  "src": "parametric",
  "meta": {
    "layer": 26,
    "feature": 9298,
    "c_in": 8.7,
    "c_out": 12.4,
    "selectivity": 0.72
  }
}
```

| Field | Description |
|---|---|
| `c` | Normalized confidence [0, 1]: (c_in × c_out) / max per layer |
| `selectivity` | Normalized input selectivity [0, 1]: c_in / max(c_in) per layer |
| `c_in` | Raw input selectivity (gate projection magnitude) |
| `c_out` | Raw output strength (down projection magnitude) |
| `layer` | Source transformer layer |
| `feature` | Source FFN feature index |
Extraction stores everything. Filtering happens when you load or query:

```rust
// Factual edges: high selectivity at late layers
let factual: Vec<&Edge> = graph.edges()
    .iter()
    .filter(|e| {
        let meta = e.metadata.as_ref().unwrap();
        let layer = meta["layer"].as_u64().unwrap();
        let sel = meta["selectivity"].as_f64().unwrap();
        layer >= 25 && sel >= 0.15
    })
    .collect();
```

The `--stats` flag writes per-layer statistics for validation:
```bash
larql weight-extract google/gemma-3-4b-it \
  -o knowledge.larql.json \
  --stats stats.json
```

Stats file contains per-layer:
| Field | Description |
|---|---|
| `mean_confidence` | Average normalized confidence (c_in × c_out) |
| `max_confidence` | Highest confidence edge |
| `mean_selectivity` | Average normalized selectivity (c_in) |
| `max_selectivity` | Highest selectivity edge |
| `mean_c_in` | Average raw input selectivity |
| `mean_c_out` | Average raw output strength |
| `self_loop_count` | Edges where subject == object (identity reinforcement) |
| `self_loop_pct` | Self-loop percentage |
| `top_subjects` | Top 10 subjects by frequency, with avg confidence |
| `top_objects` | Top 10 objects by frequency, with avg confidence |
| `edges_found` | Total edges extracted from this layer |
| `features_scanned` | Number of FFN features walked |
Validation targets:
- Factual layers (L25+) should have the highest `mean_selectivity`
- Early layers should have high `self_loop_pct` (identity reinforcement)
- `top_subjects` at factual layers should include proper nouns
- `top_subjects` at early layers should be dominated by function words
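The two numeric targets can be checked mechanically once the stats are loaded. The sketch below is one possible reading, not the tool's own validation. `LayerStats` is a hypothetical struct holding a subset of the stats fields, "early" is assumed to mean L0 to L12, and "high `self_loop_pct`" is interpreted as "higher on average than at the factual layers". Parsing `stats.json` is left out.

```rust
/// Subset of the per-layer stats fields used by the checks below.
#[derive(Debug, Clone)]
pub struct LayerStats {
    pub layer: u32,
    pub mean_selectivity: f64,
    pub self_loop_pct: f64,
}

/// Target 1: the layer with the highest mean_selectivity is a factual layer (L25+).
pub fn selectivity_peaks_late(stats: &[LayerStats]) -> bool {
    stats
        .iter()
        .max_by(|a, b| a.mean_selectivity.total_cmp(&b.mean_selectivity))
        .map_or(false, |s| s.layer >= 25)
}

/// Mean self_loop_pct over the layers selected by `keep`, or None if none match.
fn mean_self_loop(stats: &[LayerStats], keep: impl Fn(u32) -> bool) -> Option<f64> {
    let picked: Vec<f64> = stats
        .iter()
        .filter(|s| keep(s.layer))
        .map(|s| s.self_loop_pct)
        .collect();
    if picked.is_empty() {
        None
    } else {
        Some(picked.iter().sum::<f64>() / picked.len() as f64)
    }
}

/// Target 2 (as interpreted here): early layers have a higher mean
/// self_loop_pct than factual layers. Returns false if either group is empty.
pub fn self_loops_peak_early(stats: &[LayerStats]) -> bool {
    match (
        mean_self_loop(stats, |l| l <= 12),
        mean_self_loop(stats, |l| l >= 25),
    ) {
        (Some(early), Some(late)) => early > late,
        _ => false,
    }
}
```

The `top_subjects` targets (proper nouns versus function words) need a word list or a tagger and are easier to check by eye.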
For Gemma 3-4B-IT (34 layers, 10240 features/layer):
| Metric | Approximate value |
|---|---|
| Total edges | ~8M |
| Edges at c >= 0.1 | ~500K–1M |
| Edges at c >= 0.5 | ~30K–50K |
| JSON file (complete) | ~1.5 GB |
| JSON file (c >= 0.1) | ~200 MB |
| MessagePack (complete) | ~700 MB |
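The table's approximate values imply some per-feature and per-edge figures that help when estimating output size for a filtered graph. The sketch below only restates that arithmetic. The function names are hypothetical, and file sizes are assumed to use 1 GB = 10^9 bytes and 1 MB = 10^6 bytes.

```rust
/// Model dimensions stated above for Gemma 3-4B-IT.
pub const LAYERS: u64 = 34;
pub const FEATURES_PER_LAYER: u64 = 10_240;

/// Total number of FFN features walked across all layers.
pub fn total_features() -> u64 {
    LAYERS * FEATURES_PER_LAYER
}

/// Average number of edges stored per feature.
pub fn edges_per_feature(total_edges: u64) -> f64 {
    total_edges as f64 / total_features() as f64
}

/// Average on-disk size of one edge for a given file size.
pub fn bytes_per_edge(file_bytes: u64, total_edges: u64) -> f64 {
    file_bytes as f64 / total_edges as f64
}
```

Plugging in the approximate values from the table gives 348,160 features, about 23 edges per feature, roughly 188 bytes per edge in JSON, and roughly 88 bytes per edge in MessagePack. These are only as accurate as the rounded inputs.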