# Feature Plan: Person Model Enrichment (Non-LLM First)

## Goal

Populate `Person` fields from existing message history without spending OpenAI tokens by default:

- `summary`
- `profile`
- `revealed`
- `likes`
- `dislikes`
- `sentiment`
- `timezone`
- `last_interaction`

## Problem We Are Solving

- We have high-volume message data but limited durable person intelligence.
- LLM analysis is expensive for continuous/background processing.
- We need fast, deterministic extraction first, with optional semantic ranking.

## Design Decisions

1. Config scope:
   - global defaults
   - optional group-level overrides
   - per-user overrides
2. Resolution order:
   - `user > group > global`
3. Global toggle:
   - hard kill-switch (`PERSON_ENRICHMENT_ENABLED`)
4. Per-user/group controls:
   - enable/disable enrichment
   - write mode (`proposal_required` or `direct`)
   - confidence threshold
   - max messages scanned per run
   - semantic-ranking toggle

## Proposed Data Additions

- `PersonEnrichmentSettings`:
  - scope fields (`user`, optional `group`)
  - toggle/threshold/runtime limits
- `PersonSignal`:
  - normalized extracted clue
  - source references (message ids/events)
  - confidence and detector name
- `PersonUpdateProposal`:
  - pending/approved/rejected person field updates
  - reason and provenance
- Optional `PersonFieldRevision`:
  - before/after snapshots for auditability

## Processing Flow

1. Select message window:
   - recent inbound/outbound messages per person/service
   - bounded by configurable caps
2. Fast extraction:
   - deterministic rules/regex for:
     - timezone cues
     - explicit likes/dislikes
     - self-revealed facts
     - interaction-derived sentiment hints
3. Semantic ranking (optional):
   - use Manticore-backed similarity search for classifier labels
   - rank candidate signals; do not call OpenAI in default path
4. Signal aggregation:
   - merge repeated evidence
   - decay stale evidence
   - detect contradictions
5. Apply update:
   - `proposal_required`: create `PersonUpdateProposal`
   - `direct`: write only above confidence threshold and with no conflict
6. Persist audit trail:
   - record detector/classifier source and exact message provenance

## Field-Specific Policy

- `summary/profile`: generated from stable high-confidence aggregates only.
- `revealed`: only explicit self-disclosures.
- `likes/dislikes`: require explicit statement or repeated pattern.
- `sentiment`: rolling value with recency decay; never absolute truth label.
- `timezone`: explicit declaration preferred; behavioral inference secondary.
- `last_interaction`: deterministic from most recent message timestamps.

## Rollout

1. Schema and settings models.
2. Deterministic extractor pipeline and commands.
3. Proposal queue + review flow.
4. Optional Manticore semantic ranking layer.
5. Backfill job for existing persons with safe rate limits.

## Acceptance Criteria

- Default enrichment path runs with zero OpenAI usage.
- Person updates are traceable to concrete message evidence.
- Config hierarchy behaves predictably (`user > group > global`).
- Operators can switch between proposal and direct write modes per scope.

## Out of Scope

- Cross-user shared person graph.
- Autonomous LLM-generated profile writing as default.
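
## Sketch: Config Resolution

The `user > group > global` resolution order and the `PERSON_ENRICHMENT_ENABLED` kill-switch from the Design Decisions could be expressed roughly as below. This is a minimal sketch, not the planned implementation: the `EnrichmentSettings` dataclass, the `resolve_settings` helper, the `globally_enabled` parameter, and every default value are assumptions made for illustration, and the real `PersonEnrichmentSettings` model may look quite different.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class EnrichmentSettings:
    """One config layer. None means "not set at this scope" (assumed convention)."""
    enabled: Optional[bool] = None
    write_mode: Optional[str] = None  # "proposal_required" or "direct"
    confidence_threshold: Optional[float] = None
    max_messages_per_run: Optional[int] = None
    semantic_ranking: Optional[bool] = None


# Hypothetical global defaults; the plan does not specify concrete values.
GLOBAL_DEFAULTS = EnrichmentSettings(
    enabled=True,
    write_mode="proposal_required",
    confidence_threshold=0.8,
    max_messages_per_run=500,
    semantic_ranking=False,
)


def resolve_settings(
    global_defaults: EnrichmentSettings,
    group: Optional[EnrichmentSettings] = None,
    user: Optional[EnrichmentSettings] = None,
    globally_enabled: bool = True,
) -> EnrichmentSettings:
    """Merge layers field by field so that user beats group beats global."""
    resolved = {}
    for f in fields(EnrichmentSettings):
        value = getattr(global_defaults, f.name)
        # Later layers win: group overrides global, user overrides group.
        for layer in (group, user):
            if layer is not None and getattr(layer, f.name) is not None:
                value = getattr(layer, f.name)
        resolved[f.name] = value
    # The global kill-switch is applied last so no override can re-enable enrichment.
    if not globally_enabled:
        resolved["enabled"] = False
    return EnrichmentSettings(**resolved)
```

Resolving per field rather than per layer means a user can override only the confidence threshold while still inheriting the group's write mode.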
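
## Sketch: Deterministic Extraction

The fast extraction step in the Processing Flow relies on deterministic rules and regexes. The following is a rough illustration of what such detectors might look like for explicit likes, dislikes, and a declared UTC offset. The patterns, the `Signal` dataclass, the detector names, and the confidence values are all hypothetical; real detectors would need far broader coverage and tuning against actual message data.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Signal:
    field: str        # Person field the clue applies to
    value: str        # normalized extracted clue
    detector: str     # which rule produced it (for the audit trail)
    confidence: float
    message_id: int   # provenance: the source message


# Illustrative patterns only; they match simple first-person English statements.
LIKE_RE = re.compile(
    r"\bi\s+(?:really\s+)?(?:love|like|enjoy)\s+([^.,;!?\n]+)", re.IGNORECASE
)
DISLIKE_RE = re.compile(
    r"\bi\s+(?:really\s+)?"
    r"(?:hate|dislike|can(?:'t|not)\s+stand|(?:don't|do\s+not)\s+like)"
    r"\s+([^.,;!?\n]+)",
    re.IGNORECASE,
)
TZ_RE = re.compile(
    r"\bmy\s+time\s?zone\s+is\s+(?:UTC|GMT)\s*([+-])(\d{1,2})(?::(\d{2}))?",
    re.IGNORECASE,
)


def extract_signals(message_id: int, text: str) -> List[Signal]:
    """Run every detector over one message and return the clues found."""
    signals: List[Signal] = []
    for match in LIKE_RE.finditer(text):
        value = match.group(1).strip().lower()
        signals.append(Signal("likes", value, "regex_like", 0.7, message_id))
    for match in DISLIKE_RE.finditer(text):
        value = match.group(1).strip().lower()
        signals.append(Signal("dislikes", value, "regex_dislike", 0.7, message_id))
    for match in TZ_RE.finditer(text):
        sign, hours, minutes = match.group(1), int(match.group(2)), int(match.group(3) or 0)
        # Skip offsets outside the range used by real-world time zones.
        if hours <= 14 and minutes < 60:
            value = f"UTC{sign}{hours:02d}:{minutes:02d}"
            signals.append(Signal("timezone", value, "regex_utc_offset", 0.9, message_id))
    return signals
```

Each signal carries its detector name and source message id so that a later write can be traced back to concrete evidence, as the audit-trail step requires.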
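
## Sketch: Aggregation and Write Decision

The signal aggregation and apply-update steps in the Processing Flow (merge repeated evidence, decay stale evidence, detect contradictions, then propose or write) might be combined along the following lines. The plan does not name a merge or decay formula, so this sketch assumes exponential half-life decay and a noisy-OR merge. The `Evidence` dataclass, the 30-day half-life, the choice of `timezone` as a single-valued field, and the `"skip"` outcome are likewise illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, Set, Tuple

Key = Tuple[str, str]  # (person field, normalized value)


@dataclass(frozen=True)
class Evidence:
    field: str
    value: str
    confidence: float
    observed_at: datetime  # timezone-aware timestamp of the source message


def decayed_weight(ev: Evidence, now: datetime, half_life_days: float = 30.0) -> float:
    """Halve an observation's confidence every half_life_days (assumed decay model)."""
    age_days = max((now - ev.observed_at).total_seconds() / 86400.0, 0.0)
    return ev.confidence * 0.5 ** (age_days / half_life_days)


def aggregate(evidence: Iterable[Evidence], now: datetime) -> Dict[Key, float]:
    """Noisy-OR merge: repeated evidence raises the score but never above 1."""
    miss: Dict[Key, float] = {}  # probability that every observation is wrong
    for ev in evidence:
        key = (ev.field, ev.value)
        miss[key] = miss.get(key, 1.0) * (1.0 - decayed_weight(ev, now))
    return {key: 1.0 - m for key, m in miss.items()}


def conflicting_keys(scores: Dict[Key, float]) -> Set[Key]:
    """Flag competing values for single-valued fields and like/dislike clashes."""
    by_field: Dict[str, Set[str]] = {}
    for field, value in scores:
        by_field.setdefault(field, set()).add(value)
    conflicts: Set[Key] = set()
    if len(by_field.get("timezone", set())) > 1:
        conflicts.update(("timezone", v) for v in by_field["timezone"])
    for value in by_field.get("likes", set()) & by_field.get("dislikes", set()):
        conflicts.update({("likes", value), ("dislikes", value)})
    return conflicts


def decide(score: float, in_conflict: bool, write_mode: str, threshold: float) -> str:
    """Return "propose", "write", or "skip" for one aggregated signal."""
    if write_mode == "proposal_required":
        return "propose"
    if write_mode != "direct":
        raise ValueError(f"unknown write mode: {write_mode!r}")
    # Direct mode writes only above the threshold and with no conflict.
    if in_conflict or score <= threshold:
        return "skip"  # assumption: the plan does not say what happens otherwise
    return "write"
```

Whether a direct-mode signal that fails the threshold should be dropped or downgraded to a proposal is left open by the plan; the sketch simply skips it.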