System of ethical principles for LLM moral reasoning
For stakeholders and testers: this document describes the ethical rules the system applies. Use it to understand why a response is refused or accompanied by caveats, to configure domain overlays, and to design test cases (e.g. requests that must trigger a given principle).
The constitution is standard YAML. The system does not use custom YAML parsers.
| Aspect | Implementation |
|---|---|
| YAML reading | Single loader: ruamel.yaml (typ="safe"), in moralstack/constitution/loader.py — function load_yaml_file(path). |
| Validation | Pydantic only. Schema in moralstack/constitution/schema.py: Principle, Overlay, Constitution, CoreYAML, OverlayYAML. All models use extra="forbid" (unknown fields → error). |
| Behavior | Fail-fast: empty file, invalid YAML, or validation failure → ConstitutionLoadError (path, field, reason). No partial loading or silent defaults. |
To validate a constitution before deployment, instantiate ConstitutionStore(config_dir=...) and call
store.load_core() (and, if needed, store.load_overlay(domain)). On error, the exception identifies the file and
field involved.
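The fail-fast contract described above can be illustrated with a small standard-library sketch. It is not the MoralStack loader: the real system uses ruamel.yaml and Pydantic, whereas here a plain dict stands in for parsed YAML. `validate_principle`, `REQUIRED_FIELDS`, and `ALLOWED_FIELDS` are hypothetical names, the split between required and optional fields is an assumption, and the `ConstitutionLoadError` class is a stand-in that only mirrors the (path, field, reason) triple.

```python
class ConstitutionLoadError(Exception):
    """Stand-in error carrying the (path, field, reason) triple."""

    def __init__(self, path: str, field: str, reason: str) -> None:
        super().__init__(f"{path}: {field}: {reason}")
        self.path = path
        self.field = field
        self.reason = reason


# Assumption: which fields are required vs. optional is a guess for illustration.
REQUIRED_FIELDS = {"id", "level", "priority", "title", "rule"}
ALLOWED_FIELDS = REQUIRED_FIELDS | {
    "examples_allow", "examples_deny", "remediation", "keywords",
}


def validate_principle(raw: dict, path: str) -> dict:
    """Raise on the first problem found; never return partial data."""
    if not raw:
        raise ConstitutionLoadError(path, "<root>", "empty document")
    unknown = sorted(set(raw) - ALLOWED_FIELDS)
    if unknown:
        # Same effect as extra="forbid": an unknown field is an error.
        raise ConstitutionLoadError(path, unknown[0], "unknown field")
    missing = sorted(REQUIRED_FIELDS - set(raw))
    if missing:
        raise ConstitutionLoadError(path, missing[0], "missing required field")
    if raw["level"] not in ("hard", "soft"):
        raise ConstitutionLoadError(path, "level", "must be 'hard' or 'soft'")
    priority = raw["priority"]
    if isinstance(priority, bool) or not isinstance(priority, int) or not 1 <= priority <= 100:
        raise ConstitutionLoadError(path, "priority", "must be an integer in 1-100")
    return raw
```

In a pre-deploy check, the equivalent real call would be wrapped in a try/except that prints the error and exits with a non-zero status, so a broken constitution never reaches production.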
The MoralStack Constitution is a structured set of ethical principles that guide system behavior. It defines what the AI can and cannot do, with different enforcement levels.
| Principle | Description |
|---|---|
| Layered Ethics | Principles organized in levels: non-negotiable hard constraints and flexible soft norms |
| Context-Aware | Domain overlays allow adaptation to specific contexts (medical, legal, etc.) |
| Conflict Resolution | Clear rules for resolving conflicts between principles |
| Transparency | Principles documented and explainable to users |
Each principle in the constitution has the following structure:

    - id: "CORE.NM.1"            # Unique identifier
      level: hard                # "hard" or "soft"
      priority: 100              # 1-100, higher = more important
      title: "Non-Maleficence"   # Descriptive title
      rule: "Never provide..."   # Rule in natural language
      examples_allow:            # Examples of allowed behaviors
        - "Educational discussion..."
      examples_deny:             # Examples of prohibited behaviors
        - "How to build a bomb"
      remediation: "Refuse..."   # Corrective action
      keywords:                  # Metadata (used for domain selection in overlays; not in principle retrieval)
        - "bomb"
        - "weapon"

Hard constraints (level: hard):
- Non-negotiable: violation = immediate refusal
- Priority: typically 85-100
- Examples: physical harm prevention, illegal activity, child protection

Soft norms (level: soft):
- Negotiable: violation = caveat or revision
- Priority: typically 30-70
- Examples: communicative tone, balanced perspectives, clarity
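The two enforcement levels can be summarized as a mapping from the levels of the violated principles to a coarse outcome. This is only an illustrative sketch: `Action` and `action_for_violation` are hypothetical names, and how the real system chooses between a caveat and a revision is not specified in this document.

```python
from enum import Enum


class Action(Enum):
    REFUSE = "refuse"
    CAVEAT_OR_REVISE = "caveat_or_revise"
    ALLOW = "allow"


def action_for_violation(violated_levels: list[str]) -> Action:
    """Map the levels of all violated principles to one outcome."""
    if "hard" in violated_levels:
        # Any hard violation is non-negotiable, regardless of soft ones.
        return Action.REFUSE
    if "soft" in violated_levels:
        # Soft violations are negotiable: add a caveat or revise the answer.
        return Action.CAVEAT_OR_REVISE
    return Action.ALLOW
```

A test case that must trigger a hard constraint should therefore expect a refusal even when soft norms are violated in the same response.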
| ID | Title | Priority | Description |
|---|---|---|---|
| CORE.NM.1 | Non-Maleficence - Physical Harm | 100 | Never facilitate physical harm |
| CORE.NM.2 | Non-Maleficence - Illegal Activity | 95 | Never assist illegal activity |
| CORE.CSAM.1 | Child Safety | 100 | Absolute protection of minors |
| CORE.PRIV.1 | Privacy Protection | 90 | Never violate individual privacy |
| CORE.DECEPTION.1 | Anti-Deception | 85 | Never create harmful deceptive content |
| CORE.MALWARE.1 | Malware Prevention | 95 | Never provide functional malicious code |
| CORE.DISINFO.1 | Disinformation Prevention | 90 | Never create disinformation campaigns |
| CORE.FINANCIAL.1 | Financial Fraud Prevention | 88 | Never assist financial fraud |
| CORE.IDENTITY.1 | Identity Integrity | 92 | Never help falsify identity |
| CORE.DUALUSE.1 | Dual-Use Information Leakage Prevention | 91 | No operational info when refusal warranted |

| ID | Title | Priority | Description |
|---|---|---|---|
| SOFT.HONEST.1 | Honesty and Accuracy | 70 | Provide accurate information |
| SOFT.BALANCED.1 | Balanced Perspective | 60 | Present balanced perspectives |
| SOFT.HELPFUL.1 | Helpfulness | 65 | Be genuinely helpful |
| SOFT.STYLE.1 | Communication Tone | 30 | Respectful and empathetic tone |
| SOFT.CLARITY.1 | Clear Communication | 40 | Clear and accessible communication |
| SOFT.VULNERABLE.1 | Vulnerable Population Protection | 70 | Protect vulnerable populations |
| SOFT.AUTONOMY.1 | User Autonomy | 60 | Respect user autonomy |
| SOFT.PROPORTIONAL.1 | Proportional Response | 50 | Responses proportional to risk |

Overlays allow customizing the constitution for specific domains.
Overlays are in config/constitution/overlays/. Supported domains:
| Domain | File | Description (summary) |
|---|---|---|
| Medical | overlays/medical.yaml | Medical disclaimers, sensitivity |
| Legal | overlays/legal.yaml | Legal disclaimers, jurisdictions |
| Financial | overlays/financial.yaml | Financial disclaimers, risks |
| Education | overlays/education.yaml | Educational context |
| Mental Health | overlays/mental_health.yaml | Sensitive support, crisis |
| Healthcare | overlays/healthcare.yaml | General healthcare context |
| Children | overlays/children.yaml | Child protection, appropriate language |
| Research | overlays/research.yaml | Academic freedom, rigor |
| Creative | overlays/creative.yaml | Creative content |
| Cybersecurity | overlays/cybersecurity.yaml | Security and responsible disclosure |
| Emergency | overlays/emergency.yaml | Emergencies, first aid |
| Enterprise | overlays/enterprise.yaml | Enterprise context |
| Journalism | overlays/journalism.yaml | Information and ethics |
| Science | overlays/science.yaml | Scientific rigor |
| Political | overlays/political.yaml | Political context |
| Relationships | overlays/relationships.yaml | Interpersonal relationships |
| Gaming | overlays/gaming.yaml | Gaming context |
| Coding | overlays/coding.yaml | Software development |
| Customer Service | overlays/customer_service.yaml | Customer service |

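As a rough mental model, applying an overlay means taking the core principles, replacing the priority of any principle named in the overlay's `priority_overrides`, and appending the overlay's `additional_principles`. The sketch below shows that idea with standard-library dataclasses. It is not the store's actual merge code: the `Principle` class here is a trimmed stand-in, `apply_overlay` is a hypothetical helper, and silently ignoring an override for an unknown id is an assumption (a fail-fast loader may well reject it instead).

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Principle:
    """Trimmed stand-in with only the fields the merge needs."""
    id: str
    level: str
    priority: int


def apply_overlay(
    core: list[Principle],
    priority_overrides: dict[str, int],
    additional_principles: list[Principle],
) -> list[Principle]:
    """Return a new list; the core principles are left unmodified."""
    merged = [
        # Keep the core priority unless the overlay names this id.
        replace(p, priority=priority_overrides.get(p.id, p.priority))
        for p in core
    ]
    return merged + list(additional_principles)


core = [
    Principle("CORE.NM.1", "hard", 100),
    Principle("SOFT.HONEST.1", "soft", 70),
    Principle("SOFT.HELPFUL.1", "soft", 65),
]
medical = apply_overlay(
    core,
    priority_overrides={"SOFT.HONEST.1": 85, "SOFT.HELPFUL.1": 75},
    additional_principles=[Principle("MED.DISCLAIMER.1", "soft", 80)],
)
```

Returning a new list rather than mutating the core keeps the base constitution reusable across domains within the same process.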

    # Example: medical.yaml
    # Keywords for domain selection (DomainPrefilter uses these in LLM prompt as domain descriptors)
    keywords:
      - medical
      - health
      - doctor
    # Domain sensitivity flag (default: false)
    sensitive: true
    # Domain exclusion: if true, requests detected as this domain get an early exit
    # (no deliberation); response is a short LLM-generated message in the user's language (default: false)
    excluded: false
    # Priority overrides for existing principles
    priority_overrides:
      SOFT.HONEST.1: 85    # Accuracy more important
      SOFT.HELPFUL.1: 75   # Higher helpfulness
    # Domain-specific additional principles
    additional_principles:
      - id: "MED.DISCLAIMER.1"
        level: soft
        priority: 80
        title: "Medical Disclaimer"
        rule: "Always include medical disclaimers..."

Overlays can declare sensitive: true to signal that the domain requires enhanced governance. The field
is optional (default false) and backward-compatible.
Runtime effects:
- Risk score floor: when an overlay with sensitive: true is active, the Controller applies a floor of 0.35 to risk_score, ensuring the request enters the deliberative path (threshold risk_thresholds.low = 0.3).
- CYCLES_EXHAUSTED fallback: if deliberation exhausts its cycles without converging and the decision is NORMAL_COMPLETE, the system forces SAFE_COMPLETE (with reason code cycles_exhausted_sensitive_fallback).
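The two runtime effects can be sketched as pure functions. The constants 0.35 and 0.3 and the decision and reason-code strings come from the text above; the function names are hypothetical, and the comparison `>=` against the low threshold is an assumption about how the Controller decides to deliberate.

```python
from typing import Optional

SENSITIVE_RISK_FLOOR = 0.35  # floor applied when a sensitive overlay is active
LOW_RISK_THRESHOLD = 0.3     # risk_thresholds.low


def effective_risk(risk_score: float, overlay_sensitive: bool) -> float:
    """Raise the score to the floor for sensitive overlays; never lower it."""
    if overlay_sensitive:
        return max(risk_score, SENSITIVE_RISK_FLOOR)
    return risk_score


def enters_deliberation(risk_score: float) -> bool:
    # Assumption: a score at or above the low threshold takes the deliberative path.
    return risk_score >= LOW_RISK_THRESHOLD


def final_decision(
    decision: str, cycles_exhausted: bool, overlay_sensitive: bool
) -> tuple[str, Optional[str]]:
    """Return (decision, reason_code); reason_code is None when nothing is forced."""
    if overlay_sensitive and cycles_exhausted and decision == "NORMAL_COMPLETE":
        return "SAFE_COMPLETE", "cycles_exhausted_sensitive_fallback"
    return decision, None
```

Because the floor (0.35) sits above the low threshold (0.3), any request under a sensitive overlay passes `enters_deliberation`, which is the property worth asserting in tests for these domains.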
Overlays with sensitive: true (constitution-driven sensitive domains):
| Domain | File |
|---|---|
| Mental Health | overlays/mental_health.yaml |
| Healthcare | overlays/healthcare.yaml |
| Medical | overlays/medical.yaml |
| Research | overlays/research.yaml |
| Cybersecurity | overlays/cybersecurity.yaml |
| Legal | overlays/legal.yaml |
| Financial | overlays/financial.yaml |
| Journalism | overlays/journalism.yaml |
| Political | overlays/political.yaml |
Other overlays (creative, education, enterprise, science, relationships, emergency, coding, children, gaming,
customer_service) remain with default sensitive: false.
Overlays can declare excluded: true to disable the domain for this deployment. The field is optional (default
false) and backward-compatible.
How it works:
- When the risk estimator detects that a request belongs to an overlay domain (via existing LLM domain detection), the controller checks whether that overlay has excluded: true.
- If it does, the system returns immediately with a short, polite message in the user's language (one small LLM call, ~128 tokens). Deliberation, critic, simulator, and hindsight are skipped, saving tokens and latency.
- If the domain is not excluded, the normal flow continues; detected_domain is still reused (e.g. in refusals) so there is no extra LLM cost when exclusion is not used.
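The early-exit behavior in the list above can be sketched as a small routing function. `Decision` and `route_request` are hypothetical names, and the short LLM-generated message is not modeled. The point is only that the deliberation callable is never invoked for an excluded domain.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Decision:
    action: str
    path: str
    domain: Optional[str] = None


def route_request(
    detected_domain: Optional[str],
    excluded_domains: set[str],
    deliberate: Callable[[], Decision],
) -> Decision:
    """Short-circuit excluded domains; otherwise run the normal flow."""
    if detected_domain is not None and detected_domain in excluded_domains:
        # Early exit: deliberation, critic, simulator, and hindsight are skipped.
        return Decision(action="REFUSE", path="DOMAIN_EXCLUDED", domain=detected_domain)
    return deliberate()
```

Passing the expensive path as a callable makes the saving explicit: when the domain is excluded, none of the downstream components run.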
How to exclude a domain:
- Edit the overlay YAML for that domain, e.g. config/constitution/overlays/political.yaml.
- Add (or set) excluded: true at the top level, e.g.:

      description: "Politics, government..."
      keywords: [...]
      sensitive: true
      excluded: true

- Restart the application. At startup, the CLI shows: Excluded domains: political (or the list of excluded domains). If none are excluded, it shows: Excluded domains: none.
- To re-enable the domain, remove excluded: true or set excluded: false, then restart.
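The startup line described in the steps above could be produced along these lines. This is a guess at the shape, not the CLI's code: `excluded_domains_banner` is a hypothetical helper, the overlays are represented as plain dicts keyed by domain name, and the comma separator for multiple domains is an assumption.

```python
def excluded_domains_banner(overlays: dict[str, dict]) -> str:
    """Build the startup line from overlay settings keyed by domain name."""
    excluded = sorted(
        name for name, cfg in overlays.items() if cfg.get("excluded", False)
    )
    return "Excluded domains: " + (", ".join(excluded) if excluded else "none")
```

Checking this line after a restart is a quick way to confirm that an edit to an overlay file was picked up.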
Runtime behavior:
- Only requests whose detected domain matches an excluded overlay take the exclusion path.
- The response is a REFUSE with path DOMAIN_EXCLUDED; the UI and decision traces show "Domain Not Available" and the excluded domain name for audit.
When multiple principles apply and conflict:
- Hard > Soft: Hard constraints always before soft norms
- Priority: At same level, higher priority wins
- Specificity: At same priority, more specific principle wins
- Determinism: On tie, alphabetical order by ID

    from moralstack.constitution.schema import Principle

    # specificity_score(p) is assumed to be defined elsewhere; it is not shown in this document.
    def resolve_conflict(principles: list[Principle]) -> list[Principle]:
        """Order principles so that the winning one comes first."""
        return sorted(
            principles,
            key=lambda p: (
                0 if p.level == "hard" else 1,  # Hard first
                -p.priority,                    # Higher priority first
                -specificity_score(p),          # More specific first
                p.id,                           # Alphabetical tie-breaker
            ),
        )

The store loads YAML only via load_yaml_file (ruamel.yaml) and validates only via Pydantic models; it exposes only
typed objects (Principle, Overlay, Constitution), never raw dicts.

    from moralstack.constitution import ConstitutionStore

    # Initialization
    store = ConstitutionStore()

    # Load base constitution
    constitution = store.get_constitution()

    # Load with domain overlay
    medical_constitution = store.get_constitution(domain="medical")

    # Relevant principle retrieval (semantic)
    relevant = store.get_relevant_principles(
        query="How to treat depression?",
        top_k=10,
        domain="mental_health",
    )

    # Conflict resolution over the retrieved principles
    resolved = store.resolve_conflict(relevant)

    # Detect relevant domains for a query
    domains = store.detect_relevant_domains(query="How to treat depression?")
    # Returns: ["core", "mental_health", ...]

    # Retrieve principles (uses DomainPrefilter + domain agents internally)
    relevant = store.get_relevant_principles(query="...", top_k=10, domain=None)

Edit config/constitution/core.yaml:

    principles:
      - id: "CORE.NEW.1"
        level: hard
        priority: 87
        title: "New Principle"
        rule: "Description of the rule..."
        # ...

Create config/constitution/overlays/new_domain.yaml:

    priority_overrides:
      EXISTING.PRINCIPLE.1: 90
    additional_principles:
      - id: "NEWDOM.SPECIFIC.1"
        # ...

- Do not modify hard constraints without thorough review
- Test overlays with representative prompt suites
- Document rationale for new principles
- Keep allow/deny examples up to date
- Select appropriate overlay for the use case
- Monitor trigger rates to identify over/under-refusal
- Collect feedback to refine principles
- Periodic review of soft principles
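
For the recommendation to monitor trigger rates, a per-principle rate can be computed from decision records along these lines. The record format is hypothetical (a dict with a `triggered` list of principle ids), since this document does not describe the log schema, and `trigger_rates` is an illustrative helper rather than part of MoralStack.

```python
from collections import Counter


def trigger_rates(records: list[dict]) -> dict[str, float]:
    """Fraction of requests in which each principle id was triggered."""
    total = len(records)
    if total == 0:
        return {}
    # set() ensures a principle counts at most once per request.
    counts = Counter(
        pid for record in records for pid in set(record.get("triggered", []))
    )
    return {pid: n / total for pid, n in counts.items()}
```

A rate that climbs sharply after a constitution or overlay change is a hint of over-refusal. A hard constraint that never fires on a red-team prompt suite is a hint of under-refusal.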

Related documentation:
- architecture_spec.md: Full technical spec (API, flows, tests, architecture)
- modules/: Module documentation (Constitution Store, Critic, Orchestrator, etc.)

Version: 1.0.0 | Date: 2026-02