GIA/artifacts/plans/17-person-enrichment-without-llm.md

Feature Plan: Person Model Enrichment (Non-LLM First)

Goal

Populate Person fields from existing message history without spending OpenAI tokens by default:

  • summary
  • profile
  • revealed
  • likes
  • dislikes
  • sentiment
  • timezone
  • last_interaction

Problem We Are Solving

  • We have high-volume message data but limited durable person intelligence.
  • LLM analysis is expensive for continuous/background processing.
  • We need fast, deterministic extraction first, with optional semantic ranking.

Design Decisions

  1. Config scope:
    • global defaults
    • optional group-level overrides
    • per-user overrides
  2. Resolution order:
    • user > group > global
  3. Global toggle:
    • hard kill-switch (PERSON_ENRICHMENT_ENABLED)
  4. Per-user/group controls:
    • enable/disable enrichment
    • write mode (proposal_required or direct)
    • confidence threshold
    • max messages scanned per run
    • semantic-ranking toggle
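
The resolution order above (user > group > global) can be sketched as per-field fallback. The `EnrichmentSettings` shape and field names below are illustrative placeholders, not the actual schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class EnrichmentSettings:
    # None means "not set at this scope"; fall through to the next scope.
    enabled: Optional[bool] = None
    write_mode: Optional[str] = None            # "proposal_required" or "direct"
    confidence_threshold: Optional[float] = None
    max_messages_per_run: Optional[int] = None
    semantic_ranking: Optional[bool] = None

def resolve(user: EnrichmentSettings,
            group: EnrichmentSettings,
            global_defaults: EnrichmentSettings) -> EnrichmentSettings:
    """Resolve each field independently: user > group > global."""
    def pick(field_name: str):
        for scope in (user, group, global_defaults):
            value = getattr(scope, field_name)
            if value is not None:
                return value
        return None
    return EnrichmentSettings(
        **{f: pick(f) for f in EnrichmentSettings.__dataclass_fields__}
    )
```

Resolving per field (rather than per whole settings object) lets a user override just `write_mode` while still inheriting the group's threshold and the global caps.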

Proposed Data Additions

  • PersonEnrichmentSettings:
    • scope fields (user, optional group)
    • toggle/threshold/runtime limits
  • PersonSignal:
    • normalized extracted clue
    • source references (message ids/events)
    • confidence and detector name
  • PersonUpdateProposal:
    • pending/approved/rejected person field updates
    • reason and provenance
  • Optional PersonFieldRevision:
    • before/after snapshots for auditability
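
As a rough sketch of the records above — plain dataclasses standing in for whatever ORM models are actually used; field names and types here are assumptions:

```python
from dataclasses import dataclass, field

@dataclass
class PersonSignal:
    person_id: int
    field_name: str                 # e.g. "timezone", "likes"
    value: str                      # normalized extracted clue
    source_message_ids: list[int]   # provenance: message ids/events
    confidence: float
    detector: str                   # name of the rule/classifier that fired

@dataclass
class PersonUpdateProposal:
    person_id: int
    field_name: str
    proposed_value: str
    status: str = "pending"         # pending | approved | rejected
    reason: str = ""
    signal_ids: list[int] = field(default_factory=list)  # supporting signals
```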

Processing Flow

  1. Select message window:
    • recent inbound/outbound messages per person/service
    • bounded by configurable caps
  2. Fast extraction:
    • deterministic rules/regex for:
      • timezone cues
      • explicit likes/dislikes
      • self-revealed facts
      • interaction-derived sentiment hints
  3. Semantic ranking (optional):
    • use Manticore-backed similarity search to match messages against classifier labels
    • rank candidate signals; do not call OpenAI in default path
  4. Signal aggregation:
    • merge repeated evidence
    • decay stale evidence
    • detect contradictions
  5. Apply update:
    • proposal_required: create PersonUpdateProposal
    • direct: write only when confidence exceeds the threshold and no conflicting signal exists
  6. Persist audit trail:
    • record detector/classifier source and exact message provenance
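
A minimal deterministic extractor in the spirit of step 2 might look like the following. The patterns and confidence values are illustrative placeholders, not the production rule set:

```python
import re

# Illustrative rules only; real detectors would cover far more phrasings
# and be validated against actual message corpora.
TIMEZONE_RE = re.compile(
    r"\b(?:my timezone is|i(?:'| a)m in)\s+(UTC[+-]\d{1,2}|[A-Za-z]+/[A-Za-z_]+)",
    re.I,
)
LIKE_RE = re.compile(r"\bi (?:really )?(?:like|love|enjoy)\s+([\w ]{2,40})", re.I)
DISLIKE_RE = re.compile(r"\bi (?:hate|dislike|can't stand)\s+([\w ]{2,40})", re.I)

def extract_signals(text: str) -> list[tuple[str, str, float]]:
    """Return (field, value, confidence) tuples for one message."""
    signals = []
    if m := TIMEZONE_RE.search(text):
        signals.append(("timezone", m.group(1), 0.9))  # explicit declaration
    for m in LIKE_RE.finditer(text):
        signals.append(("likes", m.group(1).strip().lower(), 0.7))
    for m in DISLIKE_RE.finditer(text):
        signals.append(("dislikes", m.group(1).strip().lower(), 0.7))
    return signals
```

Each tuple would then be persisted as a PersonSignal carrying the detector name and source message ids, so aggregation and auditing (steps 4 and 6) can trace every clue back to its evidence.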

Field-Specific Policy

  • summary/profile: generated from stable high-confidence aggregates only.
  • revealed: only explicit self-disclosures.
  • likes/dislikes: require explicit statement or repeated pattern.
  • sentiment: rolling value with recency decay; never an absolute truth label.
  • timezone: explicit declaration preferred; behavioral inference secondary.
  • last_interaction: deterministic from most recent message timestamps.

Rollout

  1. Schema and settings models.
  2. Deterministic extractor pipeline and commands.
  3. Proposal queue + review flow.
  4. Optional Manticore semantic ranking layer.
  5. Backfill job for existing persons with safe rate limits.

Acceptance Criteria

  • Default enrichment path runs with zero OpenAI usage.
  • Person updates are traceable to concrete message evidence.
  • Config hierarchy behaves predictably (user > group > global).
  • Operators can switch between proposal and direct write modes per scope.

Out of Scope

  • Cross-user shared person graph.
  • Autonomous LLM-generated profile writing as default.