Implement CluedIn with AI
CluedIn: The 100% AI-First Operating Model
Scope: How to run CluedIn with AI at the center—using AI Agents, AI Rules, AI Enricher, AI Mapping, and Copilot—from onboarding to daily operations, with safety, observability, and governance.
TL;DR: Automate suggestions and safe auto-fixes end‑to‑end, keep humans in the approval loop for high‑impact changes, and measure everything. Start read‑only, then progress to suggest → gated auto‑fix → auto‑promotion with rollback.
1) Principles of an AI‑First CluedIn
- ELT + AI loops: Land raw data, then let AI map, profile, validate, clean, dedup, and enrich inside CluedIn.
- Guardrails by default: AI reads masked views; approvals required for PII exposure or schema‑breaking actions.
- Human‑on‑the‑loop: AI proposes → you approve (or promote) with clear thresholds and rollbacks.
- Config as code: Store Agent playbooks, Rules, Enricher templates, and Mapping diffs in Git with PR review.
- Measure everything: Acceptance rate, precision/recall of fixes, incident rate, time‑to‑adopt suggestions, rollback frequency.
- Fail safe: Prefer flag/quarantine over auto‑fix; require extra proof for structural changes (mappings, exports).
2) Reference Architecture (AI Control Loop)
Ingestion → AI Mapping → Entities
        ↘                 ↗
   AI Agents (profile/validate/dedup plan)
                ↓
   AI Rules (validations, policies, fixes)
                ↓
   AI Enricher (lookup/classify/summarize)
                ↓
   Cleaning & Dedup (auto/queued)
                ↓
   Exports → Consumers (BI/Apps)
                ↓
   Copilot (chat + actions + change PRs)
                ↺ Feedback (metrics, audit, prompts)
- AI Agents: Orchestrate analysis, generate proposals, triage issues.
- AI Rules: Convert proposals into executable validations/policies/transform steps.
- AI Enricher: Add attributes (classifications, categories, standardizations).
- AI Mapping: Propose/maintain source→entity mappings and diffs.
- Copilot: Conversational control surface to run, inspect, and ship changes safely.
3) Quick Start (2 Days)
Day 1
- Enable AI features; point to your provider(s).
- Run AI Mapping on one source; accept minimal mapping.
- Run AI Agents (read‑only) to produce DQ & dedup findings.
- Generate AI Rules draft pack; keep all as flag/quarantine only.
Day 2
- Turn on AI Enricher for 1–2 low‑risk enrichments (e.g., phone E.164, country codes).
- Schedule Agents nightly; export findings to a review queue.
- Wire Copilot to create PRs for mapping/cleaning changes; no direct prod writes.
4) AI Agents — Analysis & Automation Brain
4.1 What to use them for
- Profiling: completeness, validity, uniqueness, timeliness.
- Rule synthesis: propose validations, cleaning steps, dedup keys.
- Risk detection: schema drift, PII leakage, broken joins.
- Change notes: auto‑draft PR descriptions and runbooks.
4.2 Agent Run (pseudo‑API)
POST /api/ai/agents/run
{
"agent": "dq-analyzer",
"target": { "entity": "Person" },
"mode": "analysis",
"options": { "sample": 10000, "masked": true }
}
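As a rough sketch of how a client might assemble the request above, the Python below builds (but does not send) the POST. The endpoint path and payload come from the pseudo-API; the base URL, the bearer-token header, and the function name are assumptions for illustration, not a documented CluedIn client.

```python
import json
import urllib.request


def build_agent_request(base_url: str, token: str, entity: str,
                        sample: int = 10000) -> urllib.request.Request:
    """Assemble the agent-run POST from section 4.2 without sending it."""
    payload = {
        "agent": "dq-analyzer",
        "target": {"entity": entity},
        "mode": "analysis",  # read-only analysis, no writes
        "options": {"sample": sample, "masked": True},  # always read masked views
    }
    return urllib.request.Request(
        url=f"{base_url}/api/ai/agents/run",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed auth scheme
        },
        method="POST",
    )


# Hypothetical host and token, for illustration only.
req = build_agent_request("https://cluedin.example.com", "TOKEN", "Person")
print(req.get_method(), req.full_url)
```

Sending it would be a call to urllib.request.urlopen(req); that step is left out so the sketch stays runnable offline.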
4.3 Outputs you expect
{
"issues": [
{"field":"email","type":"invalid_regex","rate":0.031,"examples":["a@x","b@x"]},
{"field":"phone","type":"format_inconsistent","rate":0.22}
],
"proposals": {
"validations":[ "...yaml..." ],
"cleaning":[ "...yaml..." ],
"dedup_rules":[ "...yaml..." ]
}
}
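One way to consume a report shaped like the one above is to rank issues by failure rate and split them into an immediate review list and a backlog. This is a minimal sketch: the 5% threshold and the function name are assumptions, and only the "issues" array from the sample output is used.

```python
import json

# Assumed cut-off: issues hitting more than 5% of sampled rows are reviewed first.
REVIEW_THRESHOLD = 0.05


def triage(report: dict, threshold: float = REVIEW_THRESHOLD) -> dict:
    """Rank issues by rate (highest first) and split them at the threshold."""
    ranked = sorted(report.get("issues", []), key=lambda i: i["rate"], reverse=True)
    return {
        "review_now": [i for i in ranked if i["rate"] > threshold],
        "backlog": [i for i in ranked if i["rate"] <= threshold],
    }


sample_report = json.loads("""
{"issues": [
  {"field": "email", "type": "invalid_regex", "rate": 0.031, "examples": ["a@x", "b@x"]},
  {"field": "phone", "type": "format_inconsistent", "rate": 0.22}
]}
""")

result = triage(sample_report)
print([i["field"] for i in result["review_now"]])  # ['phone']
print([i["field"] for i in result["backlog"]])     # ['email']
```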
4.4 Promotion Policy
- Auto‑create PR with the proposals → staging run → diff & metrics.
- Auto‑promote only if: tests pass, metrics improve, and risk tag ≠ high.
- Always retain correlation_id and prompt hash for audit.
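The auto-promotion condition in this policy can be written as a single predicate. The sketch below assumes one headline metric (a validity rate measured before and after the staging run); the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StagingResult:
    tests_passed: bool
    validity_before: float  # share of valid rows before the change
    validity_after: float   # share of valid rows after the staging run
    risk: str               # "low", "medium", or "high"


def can_auto_promote(result: StagingResult) -> bool:
    """Promote only if tests pass, the metric improves, and risk is not high."""
    return (
        result.tests_passed
        and result.validity_after > result.validity_before
        and result.risk != "high"
    )


print(can_auto_promote(StagingResult(True, 0.91, 0.95, "low")))   # True
print(can_auto_promote(StagingResult(True, 0.91, 0.95, "high")))  # False
```

Anything that fails the predicate stays in the PR for a human decision rather than being rejected outright.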
5) AI Rules — From Idea to Executable Guardrails
5.1 What they encode
- Validations (regex, domain lists, cross‑field checks).
- Quarantine conditions & auto‑fix suggestions.
- Policy hints (masking, row filters) for Governance to review.
5.2 Template (generated by AI, reviewed by humans)
# ai_rules/person-email.yaml
rule: email_must_match_regex
entity: Person
severity: high
when: [{ field: email, is_not_null: true }]
check: { regex: "^[^@\\s]+@[^@\\s]+\\.[^@\\s]+$" }
on_fail:
  action: flag   # start with flag; escalate later
  labels: ["PII","contactability"]
tests:
  - sample_invalids: ["a@x","b@x"]
  - sample_valids: ["ada@lovelace.org"]
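The tests section of this template is what CI can execute. As a hedged sketch, the Python below checks the rule's regex against its own samples; the rule is inlined as a dict (with the two test lists flattened into one mapping) so the example does not depend on a YAML parser, and the function name is an assumption.

```python
import re

# The rule from the template above, inlined as a dict for a self-contained example.
rule = {
    "rule": "email_must_match_regex",
    "check": {"regex": r"^[^@\s]+@[^@\s]+\.[^@\s]+$"},
    "tests": {
        "sample_invalids": ["a@x", "b@x"],
        "sample_valids": ["ada@lovelace.org"],
    },
}


def run_rule_tests(rule: dict) -> list:
    """Return a list of human-readable failures; an empty list means the rule passes."""
    pattern = re.compile(rule["check"]["regex"])
    failures = []
    for value in rule["tests"]["sample_valids"]:
        if not pattern.fullmatch(value):
            failures.append(f"expected valid, was rejected: {value!r}")
    for value in rule["tests"]["sample_invalids"]:
        if pattern.fullmatch(value):
            failures.append(f"expected invalid, was accepted: {value!r}")
    return failures


print(run_rule_tests(rule))  # []
```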
5.3 Lifecycle
- Agent proposes → PR opens.
- CI validates schema & runs sample tests.
- Staging export diff; DQ metrics must not regress.
- Human approves or requests changes.
- Optional: auto‑promote subsequent similar rules once sampled precision exceeds an agreed threshold.
6) AI Enricher — Add Signals Safely
6.1 Use cases
- Normalize: phone → E.164, emails → lowercase/trim.
- Classify: industry, product category, sentiment, language.
- Extract: keywords, geocodes, brand names from free‑text.
- Summarize: latest ticket or notes for Copilot context.
6.2 Enricher Config (example)
enricher: standardize_contacts
target: Person
schedule: "0 * * * *"
steps:
  - name: email_normalize
    type: ai_enricher
    mode: transform
    field: email
    prompt: "Normalize to lowercase and trim whitespace."
    reversible: true
  - name: phone_e164
    type: ai_enricher
    mode: transform
    field: phone
    prompt: "Convert to E.164 with default country 'AU' when missing."
    guardrails:
      deny_if_changes_pct_over: 0.6
observability:
  emit_metrics: true
  sample_before_after: 25
Guardrails
- Reversible writes; keep original in shadow fields.
- Rate limits and caching for any external lookups.
- PII: redact in prompts or use masked views.
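To make the reversible-write and change-rate guardrails concrete, here is a small sketch. A deterministic function stands in for the AI enricher step, the batch is rejected when more than 60% of non-null values would change (mirroring deny_if_changes_pct_over: 0.6), and originals are kept in a shadow field. The "_original" suffix and the function names are assumptions.

```python
def normalize_email(value: str) -> str:
    """Deterministic stand-in for the AI enricher step: trim and lowercase."""
    return value.strip().lower()


def apply_with_guardrail(rows, field, transform, max_change_fraction=0.6):
    """Return (rows, applied). Deny the whole batch if too many values change."""
    proposed = [transform(r[field]) if r.get(field) is not None else None for r in rows]
    eligible = sum(1 for r in rows if r.get(field) is not None)
    changed = sum(1 for r, new in zip(rows, proposed)
                  if new is not None and new != r[field])
    fraction = changed / eligible if eligible else 0.0
    if fraction > max_change_fraction:
        return [dict(r) for r in rows], False  # guardrail tripped: nothing written

    out = []
    for r, new in zip(rows, proposed):
        row = dict(r)
        if new is not None and new != r[field]:
            row[f"{field}_original"] = r[field]  # shadow field keeps the write reversible
            row[field] = new
        out.append(row)
    return out, True


rows = [{"email": " Ada@Lovelace.org "}, {"email": "bob@example.com"}, {"email": None}]
updated, applied = apply_with_guardrail(rows, "email", normalize_email)
print(applied, updated[0])
```

Rolling back is then a matter of copying the shadow field over the live one for the affected rows.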
7) AI Mapping — Source → Canonical, On Autopilot (With Brakes)
7.1 Capabilities
- Suggest entity classification for a new source.
- Propose field‑level mappings with confidence scores.
- Generate diffs as sources evolve (rename/add/remove).
7.2 Example Proposal (diff)
mapping: Person
source: "crm-contacts"
proposal:
  add:
    - { field: first_name, from: "$.firstName", confidence: 0.94 }
    - { field: last_name, from: "$.lastName", confidence: 0.94 }
  change:
    - { field: email, from: "$.emailAddr", confidence: 0.88 }
  unknown:
    - { from: "$.nickname", note: "stash in attributes.* for now" }
risk: medium
7.3 Acceptance Flow
- Open PR with mapping diff + staging export.
- Auto‑accept low‑risk adds (nullable fields) if tests pass.
- Require human approval for type changes, renames, or key semantics.
- Provide one‑click rollback and keep version history.
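The acceptance flow above amounts to a routing decision per mapping diff. The sketch below is one possible encoding: it assumes the diff follows the add/change/remove shape used elsewhere in this guide, and it assumes each added field carries a nullable flag, which is a hypothetical attribute introduced here for illustration.

```python
def route_mapping_diff(diff: dict, tests_passed: bool) -> str:
    """Return 'blocked', 'human_approval', or 'auto_accept' for a mapping diff."""
    if not tests_passed:
        return "blocked"  # nothing moves until the staging tests pass

    changes = diff.get("changes", {})
    structural = bool(changes.get("change") or changes.get("remove"))
    adds = changes.get("add", [])
    all_nullable = all(a.get("nullable", False) for a in adds)

    # Type changes, renames, removals, and non-low risk always go to a human.
    if structural or not all_nullable or diff.get("risk") != "low":
        return "human_approval"
    return "auto_accept"


additive = {
    "proposal_id": "map-2025-08-24-001",
    "entity": "Person",
    "changes": {
        "add": [{"field": "middle_name", "from": "$.middleName",
                 "confidence": 0.91, "nullable": True}],
        "change": [],
        "remove": [],
    },
    "risk": "low",
}
print(route_mapping_diff(additive, tests_passed=True))  # auto_accept
```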
8) Copilot — Conversational Control Surface
8.1 What Copilot should do
- “Profile Person and propose top 5 fixes.”
- “Draft dedup rules deterministic first, then fuzzy.”
- “Create an export customers_wide_v1 (upsert, hourly) with these fields.”
- “Explain last export failure and suggest a safe rollback.”
8.2 System Prompt (sketch)
You are CluedIn Copilot. Prefer minimal, reversible changes.
Never expose raw PII in responses; use masked views or aggregates.
For structural changes (mapping/exports/policies), open a PR with
tests and staging runs. Include a rollback block.
8.3 Action Binding (examples)
/copilot create-export customers_wide_v1 … → opens PR + staging run.
/copilot dedup-plan Person → posts rule YAML and sampling plan.
/copilot dq-report Person --since 7d → posts charts + issues.
9) Governance, Safety & Observability for AI
9.1 Policy hooks
policy: ai_read_masked_by_default
target: ai:agents
actions: [read]
effect: allow_when
when: "dataset.view == 'masked'"
---
policy: ai_auto_promotion_guard
target: ai:proposals
actions: [promote]
effect: require_approval
when: "proposal.impact in ['schema','pii','survivorship']"
approvers: ["Data Governance Manager","Administrator"]
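As an illustration of how these two hooks behave at decision time, the sketch below evaluates them in plain Python. The policy values are taken from the YAML above; the function names and the "allow"/"require_approval" return strings are assumptions, not a CluedIn policy engine.

```python
GUARDED_IMPACTS = {"schema", "pii", "survivorship"}
APPROVERS = {"Data Governance Manager", "Administrator"}


def agent_can_read(dataset_view: str) -> bool:
    """ai_read_masked_by_default: agents may read only masked views."""
    return dataset_view == "masked"


def promotion_decision(impact: str, approved_by_role=None) -> str:
    """ai_auto_promotion_guard: guarded impacts need a named approver role."""
    if impact not in GUARDED_IMPACTS:
        return "allow"
    if approved_by_role in APPROVERS:
        return "allow"
    return "require_approval"


print(agent_can_read("raw"))                          # False
print(promotion_decision("formatting"))               # allow
print(promotion_decision("pii"))                      # require_approval
print(promotion_decision("schema", "Administrator"))  # allow
```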
9.2 Metrics to track
- Suggestion acceptance rate (weekly).
- Precision of auto‑fixes (sampled QA).
- False‑positive rate of validations.
- Rollback count & time to rollback.
- DQ KPI trend post‑adoption (validity, duplicates, completeness).
- Prompt drift: changes in outputs for same inputs.
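Several of these metrics reduce to simple ratios. The sketch below shows one way to compute acceptance rate, sampled precision, and prompt drift, where drift is taken to be the share of fixed inputs whose output differs from a stored baseline. That definition of drift and all function names are assumptions.

```python
def acceptance_rate(accepted: int, proposed: int) -> float:
    """Share of AI suggestions that reviewers accepted."""
    return accepted / proposed if proposed else 0.0


def sampled_precision(correct_fixes: int, sampled_fixes: int) -> float:
    """Share of QA-sampled auto-fixes that were judged correct."""
    return correct_fixes / sampled_fixes if sampled_fixes else 0.0


def prompt_drift(baseline: dict, current: dict) -> float:
    """Share of baseline inputs whose output changed (or went missing)."""
    if not baseline:
        return 0.0
    changed = sum(1 for key, value in baseline.items() if current.get(key) != value)
    return changed / len(baseline)


baseline = {"Australia": "AU", "New Zealand": "NZ", "UK": "GB", "Holland": "NL"}
current = {"Australia": "AU", "New Zealand": "NZ", "UK": "UK", "Holland": "NL"}
print(acceptance_rate(45, 60))          # 0.75
print(prompt_drift(baseline, current))  # 0.25
```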
9.3 Audit & Evidence
- Store prompt, parameters, model/version, proposal diff, approver, correlation_id.
- Snapshot before/after samples for fixes.
- Export audit packets for reviews.
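A minimal sketch of an audit record covering the fields listed above, assuming SHA-256 for the prompt hash and a UUID for correlation_id when none is supplied. The record layout and function name are illustrative assumptions.

```python
import hashlib
import json
import uuid
from datetime import datetime, timezone


def build_audit_record(prompt, parameters, model, proposal_diff, approver,
                       correlation_id=None) -> dict:
    """Bundle everything a reviewer needs to reconstruct an AI-driven change."""
    return {
        "correlation_id": correlation_id or str(uuid.uuid4()),
        "prompt": prompt,
        "prompt_hash": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "parameters": parameters,
        "model": model,              # model name/version string
        "proposal_diff": proposal_diff,
        "approver": approver,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


record = build_audit_record(
    prompt="Normalize to lowercase and trim whitespace.",
    parameters={"temperature": 0},
    model="example-model-v1",        # hypothetical identifier
    proposal_diff={"add": [], "change": [], "remove": []},
    approver="Data Governance Manager",
)
print(json.dumps(record, indent=2))
```

Hashing the prompt lets two runs be compared without storing or displaying the full prompt text in every downstream system.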
10) Maturity Ladder (AI Adoption Path)
- Read‑Only Insights: Agents analyze; Rules generate as drafts; Enricher off.
- Human‑Gated Fixes: Enricher on for low‑risk transforms; Rules flag/quarantine; Mapping diffs PR‑only.
- Targeted Auto‑Fix: Auto‑approve low‑risk Rules/Enrichers with >98% precision; Copilot opens PRs for structure.
- Auto‑Promotion: Time‑boxed auto‑promotion for repetitive, proven changes (e.g., mapping adds); continuous monitoring & fast rollback.
- Self‑Optimizing: Agents retune thresholds based on QA feedback; A/B test competing rule sets.
11) End‑to‑End Playbooks
11.1 New Source Onboarding (AI‑led)
- AI Mapping proposes entity & field map → PR.
- Agent profiles early load → proposes validations/cleaning.
- Rules merged with flag/quarantine only.
- Enricher normalizes low‑risk fields.
- Copilot scaffolds an export contract and staging run.
- Promote once DQ KPIs are stable; set alerts.
11.2 Quality Lift Sprint (1 week)
- Day 1: Agent report; pick top 3 issues.
- Day 2–3: Rules + Enricher steps; staging QA.
- Day 4: Partial rollout with alerts; sample QA.
- Day 5: Measure lift; capture lessons; adjust thresholds.
11.3 Dedup Program
- Agent proposes deterministic keys → queue.
- Rules codified; auto‑approve high‑confidence merges.
- Copilot generates survivorship spec & reviewer guide.
- Weekly precision/recall sampling; refine rules.
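The deterministic-key step and the weekly precision/recall sampling can be sketched together. The example below pairs records that share a normalized key, then scores the predicted pairs against a small hand-labelled truth set. The field names, the normalization (trim and lowercase), and the function names are assumptions.

```python
from collections import defaultdict
from itertools import combinations


def deterministic_pairs(records, key_fields):
    """Pair up record ids whose normalized key fields match exactly."""
    groups = defaultdict(list)
    for rec in records:
        key = tuple((rec.get(f) or "").strip().lower() for f in key_fields)
        if all(key):  # skip records with any empty key part
            groups[key].append(rec["id"])
    pairs = set()
    for ids in groups.values():
        pairs.update(frozenset(p) for p in combinations(sorted(ids), 2))
    return pairs


def precision_recall(predicted, truth):
    """Score predicted duplicate pairs against reviewer-labelled pairs."""
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


records = [
    {"id": 1, "email": "ada@x.org"},
    {"id": 2, "email": " ADA@x.org "},
    {"id": 3, "email": "bob@x.org"},
    {"id": 4, "email": None},
]
predicted = deterministic_pairs(records, ["email"])
truth = {frozenset({1, 2}), frozenset({3, 4})}  # labelled by reviewers
print(precision_recall(predicted, truth))  # (1.0, 0.5)
```

High precision with low recall, as here, is the expected profile for deterministic keys; fuzzy rules would then target the pairs that were missed.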
12) CI/CD & Testing for AI Changes
- Unit tests for prompts: stable I/O examples.
- Golden datasets for validation drift.
- Contract tests to prevent schema break.
- Staging export diffs with row/null distribution comparisons.
- Release notes: link to audit events and dashboards.
GitHub Actions sketch
name: cluedin-ai-changes
on: [pull_request]
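jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: ./tools/validate-yaml.sh ./ai_rules ./ai_mapping ./ai_enricher
      - run: pytest -q   # prompt/golden tests
  stage:
    needs: validate
    runs-on: ubuntu-latest   # every job must declare its own runner
    steps:
      - uses: actions/checkout@v4   # each job starts from a clean workspace
      - run: ./tools/apply.sh env/test --ai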
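The staging export diff mentioned in the testing list can be approximated by comparing row counts and per-field null rates between the current and proposed exports. This is a sketch under assumptions: rows are plain dicts, and a one-percentage-point rise in null rate is treated as a regression. Both the tolerance and the function names are illustrative.

```python
def null_rates(rows, fields):
    """Fraction of rows where each field is None."""
    total = len(rows)
    return {
        f: (sum(1 for r in rows if r.get(f) is None) / total if total else 0.0)
        for f in fields
    }


def export_diff(before, after, fields, max_null_increase=0.01):
    """Report the row-count delta and fields whose null rate rose past the tolerance."""
    rates_before = null_rates(before, fields)
    rates_after = null_rates(after, fields)
    regressions = {
        f: (rates_before[f], rates_after[f])
        for f in fields
        if rates_after[f] - rates_before[f] > max_null_increase
    }
    return {"row_delta": len(after) - len(before), "regressions": regressions}


before = [{"id": i, "email": f"user{i}@example.com"} for i in range(4)]
after = [dict(r) for r in before]
after[0]["email"] = None  # the proposed change nulled one email

print(export_diff(before, after, ["id", "email"]))
# {'row_delta': 0, 'regressions': {'email': (0.0, 0.25)}}
```

A CI step could fail the PR whenever regressions is non-empty, which matches the rule that DQ metrics must not regress before promotion.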
13) Operating Rhythm
Daily: Skim Agent reports, triage new proposals, check failed jobs.
Weekly: Ship 2–3 AI Rules/Enricher improvements; measure acceptance/precision.
Monthly: Review auto‑promotion thresholds; audit sample prompts; renew keys/secrets.
14) “What Good Looks Like”
- >95% of low‑risk fixes applied automatically, with <1% rollback.
- Mapping changes are additive and auto‑promoted with staged safety.
- DQ KPIs trend up; duplicate rate drops steadily.
- Copilot PRs are the default path for structural change.
- Audit shelf is always ready: prompts, diffs, metrics, approvals.
15) Copy‑Paste Library
15.1 Agent Request (DQ)
{ "agent":"dq-analyzer","target":{"entity":"Person"},"mode":"analysis","options":{"sample":10000,"masked":true} }
15.2 Rule: Cross‑Field Consistency
rule: order_date_precedes_ship_date
entity: Order
check: { expression: "order_date <= ship_date" }
severity: medium
on_fail: { action: flag }
15.3 Enricher: Country Codes
- name: normalize_country_code
  type: ai_enricher
  field: country_code
  prompt: "Map common names to ISO_3166_1_ALPHA2. If ambiguous, leave null."
15.4 Mapping Diff Template
proposal_id: "map-2025-08-24-001"
entity: Person
changes:
  add: [{ field: "middle_name", from: "$.middleName", confidence: 0.91 }]
  change: []
  remove: []
risk: low
15.5 Copilot Command Examples
/copilot dq-report Person --since 7d
/copilot propose-mapping crm-contacts --entity Person
/copilot create-export customers_wide_v1 --upsert --hourly --fields id,email,updated_at
/copilot dedup-plan Person --deterministic-first
Bottom line: A 100% AI approach to CluedIn works when you combine Agent‑generated proposals, Rule & Enricher execution, Mapping diffs, and Copilot‑driven PRs—all wrapped in policy guardrails, metrics, and fast rollback. Start small, ship often, and let AI handle the toil while humans set the rules of the game.