Feature Plan: Person Model Enrichment (Non-LLM First)
Goal
Populate Person fields from existing message history without spending OpenAI tokens by default:
summary, profile, revealed, likes, dislikes, sentiment, timezone, last_interaction
Problem We Are Solving
- We have high-volume message data but limited durable person intelligence.
- LLM analysis is expensive for continuous/background processing.
- We need fast, deterministic extraction first, with optional semantic ranking.
Design Decisions
- Config scope:
  - global defaults
  - optional group-level overrides
  - per-user overrides
- Resolution order: user > group > global
- Global toggle:
  - hard kill-switch (PERSON_ENRICHMENT_ENABLED)
- Per-user/group controls:
  - enable/disable enrichment
  - write mode (proposal_required or direct)
  - confidence threshold
  - max messages scanned per run
  - semantic-ranking toggle
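
The resolution order above could be implemented as a first-set-value lookup across the three scopes. This is a minimal sketch, not the project's actual code: the EnrichmentOverrides class, the resolve_settings function, the field names, and the default values are all hypothetical. Only PERSON_ENRICHMENT_ENABLED and the two write-mode names come from this plan.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class EnrichmentOverrides:
    """Settings for one scope; None means "inherit from the next scope"."""
    enabled: Optional[bool] = None
    write_mode: Optional[str] = None  # "proposal_required" or "direct"
    confidence_threshold: Optional[float] = None
    max_messages_per_run: Optional[int] = None
    semantic_ranking: Optional[bool] = None


# Hypothetical global defaults (values are illustrative assumptions).
GLOBAL_DEFAULTS = EnrichmentOverrides(
    enabled=True,
    write_mode="proposal_required",
    confidence_threshold=0.8,
    max_messages_per_run=500,
    semantic_ranking=False,
)


def resolve_settings(global_defaults, group=None, user=None,
                     person_enrichment_enabled=True):
    """Resolve each field with precedence user > group > global."""
    resolved = {}
    for f in fields(EnrichmentOverrides):
        # Walk scopes from most to least specific and take the first set value.
        candidates = [getattr(scope, f.name)
                      for scope in (user, group, global_defaults)
                      if scope is not None]
        resolved[f.name] = next((v for v in candidates if v is not None), None)
    # The global kill-switch (PERSON_ENRICHMENT_ENABLED) beats every override.
    if not person_enrichment_enabled:
        resolved["enabled"] = False
    return EnrichmentOverrides(**resolved)
```

Using None as the "inherit" marker keeps an explicit False (for example, a user who disables enrichment) distinct from an unset value.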
Proposed Data Additions
- PersonEnrichmentSettings:
  - scope fields (user, optional group)
  - toggle/threshold/runtime limits
- PersonSignal:
  - normalized extracted clue
  - source references (message ids/events)
  - confidence and detector name
- PersonUpdateProposal:
  - pending/approved/rejected person field updates
  - reason and provenance
- Optional PersonFieldRevision:
  - before/after snapshots for auditability
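
To make the shape of these additions concrete, here is one possible sketch using plain dataclasses. The four class names come from this plan; every field name, type, and default below is an assumption for illustration, and a real implementation would likely use the project's own ORM rather than dataclasses.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class ProposalStatus(str, Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class PersonEnrichmentSettings:
    user_id: int
    group_id: Optional[int] = None      # optional group scope
    enabled: bool = True
    confidence_threshold: float = 0.8   # illustrative default
    max_messages_per_run: int = 500     # illustrative default


@dataclass
class PersonSignal:
    person_id: int
    field_name: str                     # e.g. "likes" or "timezone"
    value: str                          # normalized extracted clue
    confidence: float
    detector: str                       # name of the rule that fired
    source_message_ids: list[int] = field(default_factory=list)
    observed_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))


@dataclass
class PersonUpdateProposal:
    person_id: int
    field_name: str
    proposed_value: str
    reason: str
    signals: list[PersonSignal] = field(default_factory=list)  # provenance
    status: ProposalStatus = ProposalStatus.PENDING


@dataclass
class PersonFieldRevision:
    person_id: int
    field_name: str
    before: Optional[str]               # snapshot prior to the write
    after: Optional[str]                # snapshot after the write
    proposal: Optional[PersonUpdateProposal] = None
```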
Processing Flow
- Select message window:
  - recent inbound/outbound messages per person/service
  - bounded by configurable caps
- Fast extraction:
  - deterministic rules/regex for:
    - timezone cues
    - explicit likes/dislikes
    - self-revealed facts
    - interaction-derived sentiment hints
- Semantic ranking (optional):
  - use Manticore-backed similarity search for classifier labels
  - rank candidate signals; do not call OpenAI in default path
- Signal aggregation:
  - merge repeated evidence
  - decay stale evidence
  - detect contradictions
- Apply update:
  - proposal_required: create PersonUpdateProposal
  - direct: write only above confidence threshold and with no conflict
- Persist audit trail:
  - record detector/classifier source and exact message provenance
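
The fast-extraction and aggregation steps could look roughly like the sketch below. The regex patterns, detector names, confidence values, and the noisy-OR merge rule are all assumptions chosen for illustration, not rules specified by this plan. The sketch covers only likes, dislikes, and explicit UTC/GMT offsets, and it omits decay and semantic ranking.

```python
import re

# Hypothetical rules: (field, detector name, pattern, base confidence).
RULES = [
    ("likes", "like_regex",
     re.compile(r"\bI (?:really )?(?:like|love|enjoy) ([^.,!?]+)", re.I), 0.7),
    ("dislikes", "dislike_regex",
     re.compile(r"\bI (?:really )?(?:hate|dislike|can't stand) ([^.,!?]+)", re.I), 0.7),
    ("timezone", "utc_offset_regex",
     re.compile(r"\b((?:UTC|GMT)[+-]\d{1,2}(?::\d{2})?)\b", re.I), 0.9),
]


def extract_signals(messages):
    """messages: iterable of (message_id, text). Returns a list of signal dicts."""
    signals = []
    for message_id, text in messages:
        for field_name, detector, pattern, confidence in RULES:
            for match in pattern.finditer(text):
                signals.append({
                    "field": field_name,
                    "value": match.group(1).strip().lower(),  # simple normalization
                    "confidence": confidence,
                    "detector": detector,
                    "message_id": message_id,  # provenance for the audit trail
                })
    return signals


def aggregate(signals):
    """Merge repeated evidence and flag like/dislike contradictions."""
    merged = {}
    for s in signals:
        entry = merged.setdefault(
            (s["field"], s["value"]),
            {"confidence": 0.0, "message_ids": [], "conflict": False})
        # Noisy-OR: independent repeats raise confidence without exceeding 1.
        entry["confidence"] = 1.0 - (1.0 - entry["confidence"]) * (1.0 - s["confidence"])
        entry["message_ids"].append(s["message_id"])
    opposites = {"likes": "dislikes", "dislikes": "likes"}
    for (field_name, value), entry in merged.items():
        opposite = opposites.get(field_name)
        if opposite and (opposite, value) in merged:
            entry["conflict"] = True
    return merged
```

Each aggregate keeps the message ids that produced it, so a later proposal or direct write can cite exact evidence.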
Field-Specific Policy
- summary/profile: generated from stable high-confidence aggregates only.
- revealed: only explicit self-disclosures.
- likes/dislikes: require explicit statement or repeated pattern.
- sentiment: rolling value with recency decay; never absolute truth label.
- timezone: explicit declaration preferred; behavioral inference secondary.
- last_interaction: deterministic from most recent message timestamps.
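
As an illustration of the sentiment and last_interaction policies, the sketch below computes a recency-weighted mean with exponential decay. The half-life weighting, the 14-day default, the [-1, 1] score range, and both function names are assumptions; the plan only requires a rolling value with recency decay.

```python
from datetime import datetime, timedelta, timezone


def rolling_sentiment(observations, now, half_life_days=14.0):
    """observations: iterable of (timestamp, score), score assumed in [-1, 1].

    Returns a decay-weighted mean, or None when there is no evidence.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for observed_at, score in observations:
        age_days = max((now - observed_at).total_seconds() / 86400.0, 0.0)
        # An observation loses half its weight every half_life_days.
        weight = 0.5 ** (age_days / half_life_days)
        weighted_sum += weight * score
        weight_total += weight
    return weighted_sum / weight_total if weight_total else None


def last_interaction(message_timestamps):
    """Most recent message timestamp, or None if there are no messages."""
    return max(message_timestamps, default=None)
```

Returning None for an empty history, rather than 0.0, keeps "no evidence" distinct from "neutral sentiment".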
Rollout
- Schema and settings models.
- Deterministic extractor pipeline and commands.
- Proposal queue + review flow.
- Optional Manticore semantic ranking layer.
- Backfill job for existing persons with safe rate limits.
Acceptance Criteria
- Default enrichment path runs with zero OpenAI usage.
- Person updates are traceable to concrete message evidence.
- Config hierarchy behaves predictably (user > group > global).
- Operators can switch between proposal and direct write modes per scope.
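
The write-mode criterion could be checked against a small decision function such as the one below. The function name and return labels are hypothetical. Skipping a direct write that fails the threshold or has a conflict is an assumption, since the plan does not say whether such cases fall back to a proposal. Treating "above" as strictly greater than is also an assumption.

```python
def decide_write(write_mode, confidence, threshold, has_conflict):
    """Return "create_proposal", "write", or "skip" for one aggregated update."""
    if write_mode == "proposal_required":
        # Every candidate update is queued for review.
        return "create_proposal"
    if write_mode == "direct":
        # Write only above the confidence threshold and with no conflict.
        if confidence > threshold and not has_conflict:
            return "write"
        return "skip"  # assumption: no automatic fallback to a proposal
    raise ValueError(f"unknown write mode: {write_mode!r}")
```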
Out of Scope
- Cross-user shared person graph.
- Autonomous LLM-generated profile writing as default.