Implement CluedIn with AI

1) Principles of an AI‑First CluedIn
2) Reference Architecture (AI Control Loop)
3) Quick Start (2 Days)
4) AI Agents — Analysis & Automation Brain
5) AI Rules — From Idea to Executable Guardrails
6) AI Enricher — Add Signals Safely
   6.1 Use cases
   6.2 Enricher Config (example)
7) AI Mapping — Source → Canonical, On Autopilot (With Brakes)
8) Copilot — Conversational Control Surface
9) Governance, Safety & Observability for AI
10) Maturity Ladder (AI Adoption Path)
11) End‑to‑End Playbooks
12) CI/CD & Testing for AI Changes
13) Operating Rhythm
14) “What Good Looks Like”
15) Copy‑Paste Library

Scope: How to run CluedIn with AI at the center—using AI Agents, AI Rules, AI Enricher, AI Mapping, and Copilot—from onboarding to daily operations, with safety, observability, and governance.

TL;DR: Automate suggestions and safe auto-fixes end‑to‑end, keep humans in the approval loop for high‑impact changes, and measure everything. Start read‑only, then progress to suggest → gated auto‑fix → auto‑promotion with rollback.

1) Principles of an AI‑First CluedIn

ELT + AI loops: Land raw data, then let AI map, profile, validate, clean, dedup, and enrich inside CluedIn.
Guardrails by default: AI reads masked views; approvals required for PII exposure or schema‑breaking actions.
Human‑on‑the‑loop: AI proposes → you approve (or promote) with clear thresholds and rollbacks.
Config as code: Store Agent playbooks, Rules, Enricher templates, and Mapping diffs in Git with PR review.
Measure everything: Acceptance rate, precision/recall of fixes, incident rate, time‑to‑adopt suggestions, rollback frequency.
Fail safe: Prefer flag/quarantine over auto‑fix; require extra proof for structural changes (mappings, exports).

2) Reference Architecture (AI Control Loop)

Ingestion → AI Mapping → Entities
           ↘           ↗
            AI Agents (profile/validate/dedup plan)
                 ↓
            AI Rules (validations, policies, fixes)
                 ↓
            AI Enricher (lookup/classify/summarize)
                 ↓
            Cleaning & Dedup (auto/queued)
                 ↓
            Exports → Consumers (BI/Apps)
                 ↓
            Copilot (chat + actions + change PRs)
                 ↺  Feedback (metrics, audit, prompts)

AI Agents: Orchestrate analysis, generate proposals, triage issues.
AI Rules: Convert proposals into executable validations/policies/transform steps.
AI Enricher: Add attributes (classifications, categories, standardizations).
AI Mapping: Propose/maintain source→entity mappings and diffs.
Copilot: Conversational control surface to run, inspect, and ship changes safely.

3) Quick Start (2 Days)

Day 1

Enable AI features; point to your provider(s).
Run AI Mapping on one source; accept minimal mapping.
Run AI Agents (read‑only) to produce DQ & dedup findings.
Generate AI Rules draft pack; keep all as flag/quarantine only.

Day 2

Turn on AI Enricher for 1–2 low‑risk enrichments (e.g., phone E.164, country codes).
Schedule Agents nightly; export findings to a review queue.
Wire Copilot to create PRs for mapping/cleaning changes; no direct prod writes.

4) AI Agents — Analysis & Automation Brain

4.1 What to use them for

Profiling: completeness, validity, uniqueness, timeliness.
Rule synthesis: propose validations, cleaning steps, dedup keys.
Risk detection: schema drift, PII leakage, broken joins.
Change notes: auto‑draft PR descriptions and runbooks.

4.2 Agent Run (pseudo‑API)

POST /api/ai/agents/run
{
  "agent": "dq-analyzer",
  "target": { "entity": "Person" },
  "mode": "analysis",
  "options": { "sample": 10000, "masked": true }
}
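
The request above can be assembled in Python before it is sent. This is a minimal sketch only: the section describes a pseudo-API, so the base URL, the `build_agent_request` helper, and the absence of authentication headers are all assumptions made for illustration.

```python
import json
import urllib.request

# Hypothetical base URL; replace with your own instance.
BASE_URL = "https://cluedin.example.com"

def build_agent_request(entity: str, sample: int = 10000, masked: bool = True) -> urllib.request.Request:
    """Build (but do not send) a POST request mirroring the pseudo-API body above."""
    payload = {
        "agent": "dq-analyzer",
        "target": {"entity": entity},
        "mode": "analysis",
        "options": {"sample": sample, "masked": masked},
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/api/ai/agents/run",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

request = build_agent_request("Person")
# Sending is left out on purpose; urllib.request.urlopen(request) would perform the call.
```

Keeping `masked` as a default of `True` matches the guardrail that AI reads masked views unless someone deliberately opts out.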

4.3 Outputs you expect

{
  "issues": [
    {"field":"email","type":"invalid_regex","rate":0.031,"examples":["a@x","b@x"]},
    {"field":"phone","type":"format_inconsistent","rate":0.22}
  ],
  "proposals": {
    "validations":[ "...yaml..." ],
    "cleaning":[ "...yaml..." ],
    "dedup_rules":[ "...yaml..." ]
  }
}
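
One way to consume this output is to rank issues by failure rate before triage. The sketch below assumes the response shape shown above; the `top_issues` helper and its 5% default threshold are hypothetical and not part of any CluedIn API.

```python
import json

# Example response following the shape shown above (illustrative values only).
RESPONSE = """
{
  "issues": [
    {"field": "email", "type": "invalid_regex", "rate": 0.031, "examples": ["a@x", "b@x"]},
    {"field": "phone", "type": "format_inconsistent", "rate": 0.22}
  ],
  "proposals": {"validations": [], "cleaning": [], "dedup_rules": []}
}
"""

def top_issues(response: dict, min_rate: float = 0.05) -> list[dict]:
    """Return issues at or above min_rate, worst first (hypothetical helper)."""
    issues = [i for i in response.get("issues", []) if i["rate"] >= min_rate]
    return sorted(issues, key=lambda i: i["rate"], reverse=True)

report = json.loads(RESPONSE)
for issue in top_issues(report):
    print(f'{issue["field"]}: {issue["type"]} ({issue["rate"]:.1%})')
```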

4.4 Promotion Policy

Auto‑create PR with the proposals → staging run → diff & metrics.
Auto‑promote only if: tests pass, metrics improve, and risk tag ≠ high.
Always retain correlation_id and prompt hash for audit.
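
The auto-promotion gate can be expressed as a small predicate. This is a sketch under stated assumptions: `StagingResult` and its field names are hypothetical, and "metrics improve" is reduced to a single validity score for brevity.

```python
from dataclasses import dataclass

@dataclass
class StagingResult:
    tests_passed: bool
    validity_before: float  # DQ metric before the change, 0..1
    validity_after: float   # same metric after the staging run
    risk: str               # "low" | "medium" | "high"

def can_auto_promote(result: StagingResult) -> bool:
    """All three policy conditions must hold; anything else goes to a human."""
    metrics_improved = result.validity_after > result.validity_before
    return result.tests_passed and metrics_improved and result.risk != "high"

print(can_auto_promote(StagingResult(True, 0.91, 0.95, "low")))
```

A strict `>` is used so that a change with no measurable benefit is not promoted automatically.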

5) AI Rules — From Idea to Executable Guardrails

5.1 What they encode

Validations (regex, domain lists, cross‑field checks).
Quarantine conditions & auto‑fix suggestions.
Policy hints (masking, row filters) for Governance to review.

5.2 Template (generated by AI, reviewed by humans)

# ai_rules/person-email.yaml
rule: email_must_match_regex
entity: Person
severity: high
when: [{ field: email, is_not_null: true }]
check: { regex: "^[^@\\s]+@[^@\\s]+\\.[^@\\s]+$" }
on_fail:
  action: flag            # start with flag; escalate later
labels: ["PII","contactability"]
tests:
  - sample_invalids: ["a@x","b@x"]
  - sample_valids: ["ada@lovelace.org"]
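
The CI step that "runs sample tests" could look roughly like the following. It is a sketch, not CluedIn's rule engine: the regex and samples are copied from the template above, while `check_email` and `run_rule_tests` are hypothetical helpers.

```python
import re

# Values copied from the rule template above.
EMAIL_REGEX = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
SAMPLE_INVALIDS = ["a@x", "b@x"]
SAMPLE_VALIDS = ["ada@lovelace.org"]

def check_email(value):
    """Return 'skip' for nulls (outside the `when` clause), else 'pass' or 'flag'."""
    if value is None:
        return "skip"
    return "pass" if EMAIL_REGEX.match(value) else "flag"

def run_rule_tests() -> bool:
    """The rule is sound only if every invalid sample flags and every valid one passes."""
    invalids_ok = all(check_email(v) == "flag" for v in SAMPLE_INVALIDS)
    valids_ok = all(check_email(v) == "pass" for v in SAMPLE_VALIDS)
    return invalids_ok and valids_ok

print(run_rule_tests())
```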

5.3 Lifecycle

Agent proposes → PR opens.
CI validates schema & runs sample tests.
Staging export diff; DQ metrics must not regress.
Human approves or requests changes.
Optional: auto‑promote subsequent, similar rules once sampled precision exceeds an agreed threshold.

6) AI Enricher — Add Signals Safely

6.1 Use cases

Normalize: phone → E.164, emails → lowercase/trim.
Classify: industry, product category, sentiment, language.
Extract: keywords, geocodes, brand names from free‑text.
Summarize: latest ticket or notes for Copilot context.

6.2 Enricher Config (example)

enricher: standardize_contacts
target: Person
schedule: "0 * * * *"
steps:
  - name: email_normalize
    type: ai_enricher
    mode: transform
    field: email
    prompt: "Normalize to lowercase and trim whitespace."
    reversible: true
  - name: phone_e164
    type: ai_enricher
    mode: transform
    field: phone
    prompt: "Convert to E.164 with default country 'AU' when missing."
    guardrails:
      deny_if_changes_pct_over: 0.6
observability:
  emit_metrics: true
  sample_before_after: 25

Guardrails

Reversible writes; keep original in shadow fields.
Rate limits and caching for any external lookups.
PII: redact in prompts or use masked views.
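
To make the shadow-field and `deny_if_changes_pct_over` ideas concrete, here is a minimal sketch. It substitutes a deterministic transform for the AI step, and the `__original` suffix and the `apply_with_guardrail` helper are assumptions for illustration rather than CluedIn behavior.

```python
def normalize_email(value: str) -> str:
    """Deterministic stand-in for the email_normalize step."""
    return value.strip().lower()

def apply_with_guardrail(records, field, transform, max_changed_fraction=0.6):
    """Apply transform to a batch; refuse the whole batch if too many values change."""
    proposed = []
    changed = 0
    for rec in records:
        old = rec.get(field)
        new = transform(old) if old is not None else None
        if new != old:
            changed += 1
        proposed.append((rec, old, new))

    if records and changed / len(records) > max_changed_fraction:
        raise ValueError(f"guardrail tripped: {changed}/{len(records)} values would change")

    out = []
    for rec, old, new in proposed:
        updated = dict(rec)  # never mutate the input records
        if new != old:
            updated[f"{field}__original"] = old  # shadow field keeps the write reversible
            updated[field] = new
        out.append(updated)
    return out
```

Rejecting the whole batch, rather than skipping individual rows, treats an unusually high change rate as a sign that the transform itself is wrong.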

7) AI Mapping — Source → Canonical, On Autopilot (With Brakes)

7.1 Capabilities

Suggest entity classification for a new source.
Propose field‑level mappings with confidence scores.
Generate diffs as sources evolve (rename/add/remove).

7.2 Example Proposal (diff)

mapping: Person
source: "crm-contacts"
proposal:
  add:
    - { field: first_name, from: "$.firstName", confidence: 0.94 }
    - { field: last_name, from: "$.lastName", confidence: 0.94 }
  change:
    - { field: email, from: "$.emailAddr", confidence: 0.88 }
  unknown:
    - { from: "$.nickname", note: "stash in attributes.* for now" }
risk: medium

7.3 Acceptance Flow

Open PR with mapping diff + staging export.
Auto‑accept low‑risk adds (nullable fields) if tests pass.
Require human approval for type changes, renames, or key semantics.
Provide one‑click rollback and keep version history.
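
The routing decision in this flow can be sketched as a single function. It assumes the diff shape used in the Mapping Diff Template (15.4) and assumes every added field is nullable, which the diff format does not record; `review_route` is a hypothetical helper.

```python
def review_route(proposal: dict, tests_passed: bool) -> str:
    """Return 'auto_accept' only for tested, low-risk, purely additive diffs."""
    changes = proposal.get("changes", {})
    structural = bool(changes.get("change") or changes.get("remove"))
    low_risk = proposal.get("risk") == "low"
    if tests_passed and low_risk and not structural and changes.get("add"):
        return "auto_accept"
    return "human_review"

additive = {
    "entity": "Person",
    "changes": {"add": [{"field": "middle_name", "from": "$.middleName"}], "change": [], "remove": []},
    "risk": "low",
}
print(review_route(additive, tests_passed=True))
```

Defaulting to `human_review` means an unrecognized or incomplete proposal is never accepted silently.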

8) Copilot — Conversational Control Surface

8.1 What Copilot should do

“Profile Person and propose top 5 fixes.”
“Draft dedup rules: deterministic first, then fuzzy.”
“Create an export customers_wide_v1 (upsert, hourly) with these fields.”
“Explain last export failure and suggest a safe rollback.”

8.2 System Prompt (sketch)

You are CluedIn Copilot. Prefer minimal, reversible changes.
Never expose raw PII in responses; use masked views or aggregates.
For structural changes (mapping/exports/policies), open a PR with
tests and staging runs. Include a rollback block.

8.3 Action Binding (examples)

/copilot create-export customers_wide_v1 … → opens PR + staging run.
/copilot dedup-plan Person → posts rule YAML and sampling plan.
/copilot dq-report Person --since 7d → posts charts + issues.

9) Governance, Safety & Observability for AI

9.1 Policy hooks

policy: ai_read_masked_by_default
target: ai:agents
actions: [read]
effect: allow_when
when: "dataset.view == 'masked'"
---
policy: ai_auto_promotion_guard
target: ai:proposals
actions: [promote]
effect: require_approval
when: "proposal.impact in ['schema','pii','survivorship']"
approvers: ["Data Governance Manager","Administrator"]
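
Read as plain logic, the two policies above reduce to the checks below. This is only an illustration of their intent; the function names are hypothetical and a real policy engine would evaluate the `when` expressions itself.

```python
# Impact categories copied from the ai_auto_promotion_guard policy above.
GUARDED_IMPACTS = {"schema", "pii", "survivorship"}

def agent_may_read(dataset_view: str) -> bool:
    """ai_read_masked_by_default: agents may read only the masked view."""
    return dataset_view == "masked"

def promotion_decision(impact: str) -> str:
    """ai_auto_promotion_guard: guarded impacts always need a named approver."""
    return "require_approval" if impact in GUARDED_IMPACTS else "allow"

print(agent_may_read("masked"), promotion_decision("schema"))
```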

9.2 Metrics to track

Suggestion acceptance rate (weekly).
Precision of auto‑fixes (sampled QA).
False‑positive rate of validations.
Rollback count & time to rollback.
DQ KPI trend post‑adoption (validity, duplicates, completeness).
Prompt drift: changes in outputs for same inputs.
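
Two of these metrics are easy to compute from stored results. The sketch below assumes you keep a mapping of input to model output for a fixed set of probe inputs; `acceptance_rate` and `drift_rate` are hypothetical helper names.

```python
import hashlib

def acceptance_rate(accepted: int, proposed: int) -> float:
    """Share of AI suggestions that reviewers accepted."""
    return accepted / proposed if proposed else 0.0

def output_fingerprint(output: str) -> str:
    """Stable hash so large outputs can be compared and stored cheaply."""
    return hashlib.sha256(output.encode("utf-8")).hexdigest()

def drift_rate(baseline: dict[str, str], current: dict[str, str]) -> float:
    """Share of shared probe inputs whose output changed between two runs."""
    shared = baseline.keys() & current.keys()
    if not shared:
        return 0.0
    changed = sum(
        1 for key in shared
        if output_fingerprint(baseline[key]) != output_fingerprint(current[key])
    )
    return changed / len(shared)
```

For example, `drift_rate({"a": "x", "b": "y"}, {"a": "x", "b": "z"})` compares two probe inputs and finds one changed output.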

9.3 Audit & Evidence

Store prompt, parameters, model/version, proposal diff, approver, correlation_id.
Snapshot before/after samples for fixes.
Export audit packets for reviews.
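
An audit record holding the fields listed above might be modeled as follows. This is a sketch: the `AuditRecord` class, the UUID-based `correlation_id`, and the SHA-256 prompt hash are assumptions, not a CluedIn schema.

```python
import hashlib
import json
import uuid
from dataclasses import asdict, dataclass, field

@dataclass
class AuditRecord:
    prompt: str
    parameters: dict
    model_version: str
    proposal_diff: str
    approver: str
    correlation_id: str = field(default_factory=lambda: str(uuid.uuid4()))

    def prompt_hash(self) -> str:
        """Hash of the exact prompt text, useful for spotting prompt changes."""
        return hashlib.sha256(self.prompt.encode("utf-8")).hexdigest()

    def to_json(self) -> str:
        """Serialize with sorted keys so stored records diff cleanly."""
        payload = asdict(self)
        payload["prompt_hash"] = self.prompt_hash()
        return json.dumps(payload, sort_keys=True)

# Placeholder values for illustration only.
record = AuditRecord(
    prompt="Normalize to lowercase and trim whitespace.",
    parameters={"sample": 25},
    model_version="example-model-1",
    proposal_diff="+ middle_name",
    approver="Data Governance Manager",
)
print(record.to_json())
```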

10) Maturity Ladder (AI Adoption Path)

Read‑Only Insights: Agents analyze; Rules generate as drafts; Enricher off.
Human‑Gated Fixes: Enricher on for low‑risk transforms; Rules flag/quarantine; Mapping diffs PR‑only.
Targeted Auto‑Fix: Auto‑approve low‑risk Rules/Enrichers with >98% precision; Copilot opens PRs for structure.
Auto‑Promotion: Time‑boxed auto‑promotion for repetitive, proven changes (e.g., mapping adds); continuous monitoring & fast rollback.
Self‑Optimizing: Agents retune thresholds based on QA feedback; A/B test competing rule sets.

11) End‑to‑End Playbooks

11.1 New Source Onboarding (AI‑led)

AI Mapping proposes entity & field map → PR.
Agent profiles early load → proposes validations/cleaning.
Rules merged with flag/quarantine only.
Enricher normalizes low‑risk fields.
Copilot scaffolds an export contract and staging run.
Promote once DQ KPIs are stable; set alerts.

11.2 Quality Lift Sprint (1 week)

Day 1: Agent report; pick top 3 issues.
Day 2–3: Build Rules + Enricher steps; run QA in staging.
Day 4: Partial rollout with alerts; sample QA.
Day 5: Measure lift; capture lessons; adjust thresholds.

11.3 Dedup Program

Agent proposes deterministic keys → queue.
Rules codified; auto‑approve high‑confidence merges.
Copilot generates survivorship spec & reviewer guide.
Weekly precision/recall sampling; refine rules.
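
A deterministic key and the weekly precision/recall check can be sketched in a few lines. The choice of normalized email as the key, and the helper names, are assumptions for illustration; real keys would come from the Agent's proposal.

```python
from collections import defaultdict

def dedup_key(record: dict):
    """Deterministic key: trimmed, lowercased email; None when missing."""
    email = (record.get("email") or "").strip().lower()
    return email or None

def candidate_groups(records: list[dict]) -> dict[str, list]:
    """Group record ids that share a key; keep only groups with 2+ members."""
    groups = defaultdict(list)
    for rec in records:
        key = dedup_key(rec)
        if key is not None:
            groups[key].append(rec["id"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}

def precision_recall(predicted_pairs: set, true_pairs: set) -> tuple[float, float]:
    """Score predicted duplicate pairs against a reviewed sample of true pairs."""
    true_positives = len(predicted_pairs & true_pairs)
    precision = true_positives / len(predicted_pairs) if predicted_pairs else 0.0
    recall = true_positives / len(true_pairs) if true_pairs else 0.0
    return precision, recall
```

Records without a key are left out of every group, so they fall through to the fuzzy stage or manual review instead of being merged by accident.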

12) CI/CD & Testing for AI Changes

Unit tests for prompts: stable I/O examples.
Golden datasets for validation drift.
Contract tests to prevent schema break.
Staging export diffs with row/null distribution comparisons.
Release notes: link to audit events and dashboards.
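
A staging export diff with row and null-distribution comparisons could be approximated like this. It is a sketch over in-memory rows; the helper names and the 2-point tolerance for null-rate increases are assumptions.

```python
def null_rates(rows: list[dict], fields: list[str]) -> dict[str, float]:
    """Fraction of rows where each field is missing or None."""
    total = len(rows)
    return {
        f: (sum(1 for r in rows if r.get(f) is None) / total if total else 0.0)
        for f in fields
    }

def diff_report(baseline, staging, fields, max_null_increase=0.02) -> dict:
    """Compare row counts and flag fields whose null rate rose beyond tolerance."""
    before = null_rates(baseline, fields)
    after = null_rates(staging, fields)
    regressions = {
        f: (before[f], after[f])
        for f in fields
        if after[f] - before[f] > max_null_increase
    }
    return {"row_delta": len(staging) - len(baseline), "regressions": regressions}
```

A CI job could fail the pull request whenever `regressions` is non-empty or `row_delta` is unexpectedly large.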

GitHub Actions sketch

name: cluedin-ai-changes
on: [pull_request]
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: ./tools/validate-yaml.sh ./ai_rules ./ai_mapping ./ai_enricher
      - run: pytest -q  # prompt/golden tests
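  stage:
    needs: validate
    runs-on: ubuntu-latest   # each job needs its own runner
    steps:
      - uses: actions/checkout@v4   # jobs start from an empty workspace
      - run: ./tools/apply.sh env/test --ai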

13) Operating Rhythm

Daily: Glance at Agent reports, triage new proposals, check failed jobs.
Weekly: Ship 2–3 AI Rules/Enricher improvements; measure acceptance/precision.
Monthly: Review auto‑promotion thresholds; audit sample prompts; renew keys/secrets.

14) “What Good Looks Like”

95% of low‑risk fixes applied automatically, with <1% rollback.
Mapping changes are additive and auto‑promoted with staged safety.
DQ KPIs trend up; duplicate rate drops steadily.
Copilot PRs are the default path for structural change.
Audit shelf is always ready: prompts, diffs, metrics, approvals.

15) Copy‑Paste Library

15.1 Agent Request (DQ)

{ "agent":"dq-analyzer","target":{"entity":"Person"},"mode":"analysis","options":{"sample":10000,"masked":true} }

15.2 Rule: Cross‑Field Consistency

rule: order_date_precedes_ship_date
entity: Order
check: { expression: "order_date <= ship_date" }
severity: medium
on_fail: { action: flag }
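
For local testing, the same check can be mirrored in Python. This is a sketch: `check_order` is a hypothetical helper, and skipping rows with a missing date is an assumption, since the rule above has no `when` clause.

```python
from datetime import date

def check_order(order: dict) -> str:
    """Mirror of order_date_precedes_ship_date: 'pass', 'flag', or 'skip'."""
    order_date = order.get("order_date")
    ship_date = order.get("ship_date")
    if order_date is None or ship_date is None:
        return "skip"  # assumption: incomplete rows are handled by a separate rule
    return "pass" if order_date <= ship_date else "flag"

print(check_order({"order_date": date(2025, 8, 1), "ship_date": date(2025, 8, 3)}))
```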

15.3 Enricher: Country Codes

- name: normalize_country_code
  type: ai_enricher
  field: country_code
  prompt: "Map common country names to ISO 3166-1 alpha-2 codes. If ambiguous, leave null."
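
A deterministic baseline is useful for testing this enricher's output. The sketch below uses a tiny hand-written lookup, not a full ISO 3166-1 table; the table contents and the `normalize_country_code` function are illustrative assumptions.

```python
# Small illustrative subset; a real table would cover all ISO 3166-1 entries.
COUNTRY_CODES = {
    "australia": "AU",
    "united states": "US",
    "usa": "US",
    "united kingdom": "GB",
    "uk": "GB",
    "germany": "DE",
}
# Names that could refer to more than one place are left null on purpose.
AMBIGUOUS = {"georgia", "congo"}

def normalize_country_code(value):
    """Return an alpha-2 code, or None when the input is unknown or ambiguous."""
    if value is None:
        return None
    key = value.strip().lower()
    if key in AMBIGUOUS:
        return None
    if len(key) == 2 and key.upper() in set(COUNTRY_CODES.values()):
        return key.upper()  # already a known alpha-2 code
    return COUNTRY_CODES.get(key)
```

Comparing the enricher's answers with this baseline on known inputs gives a quick precision estimate before any auto-fix is enabled.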

15.4 Mapping Diff Template

proposal_id: "map-2025-08-24-001"
entity: Person
changes:
  add: [{ field: "middle_name", from: "$.middleName", confidence: 0.91 }]
  change: []
  remove: []
risk: low

15.5 Copilot Command Examples

/copilot dq-report Person --since 7d
/copilot propose-mapping crm-contacts --entity Person
/copilot create-export customers_wide_v1 --upsert --hourly --fields id,email,updated_at
/copilot dedup-plan Person --deterministic-first
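
If you bind these commands yourself, parsing them is straightforward with the standard library. This is a sketch assuming only the four commands and flags listed above; the parser layout and `parse_command` helper are not part of Copilot.

```python
import argparse
import shlex

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="/copilot")
    sub = parser.add_subparsers(dest="command", required=True)

    dq = sub.add_parser("dq-report")
    dq.add_argument("entity")
    dq.add_argument("--since", default="7d")

    mapping = sub.add_parser("propose-mapping")
    mapping.add_argument("source")
    mapping.add_argument("--entity", required=True)

    export = sub.add_parser("create-export")
    export.add_argument("name")
    export.add_argument("--upsert", action="store_true")
    export.add_argument("--hourly", action="store_true")
    export.add_argument("--fields", type=lambda s: s.split(","), default=[])

    dedup = sub.add_parser("dedup-plan")
    dedup.add_argument("entity")
    dedup.add_argument("--deterministic-first", action="store_true")
    return parser

def parse_command(line: str) -> argparse.Namespace:
    """Split a chat line like a shell would, then parse everything after /copilot."""
    tokens = shlex.split(line)
    if not tokens or tokens[0] != "/copilot":
        raise ValueError("not a /copilot command")
    return build_parser().parse_args(tokens[1:])

args = parse_command("/copilot create-export customers_wide_v1 --upsert --hourly --fields id,email,updated_at")
print(args.command, args.name, args.fields)
```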

Bottom line: A 100% AI approach to CluedIn works when you combine Agent‑generated proposals, Rule & Enricher execution, Mapping diffs, and Copilot‑driven PRs—all wrapped in policy guardrails, metrics, and fast rollback. Start small, ship often, and let AI handle the toil while humans set the rules of the game.