Centralised Master Data Management
Centralised Master Data Management (MDM) with CluedIn — End-to-End Implementation Guide
Goal: Operate a single, governed “system of record” for master data (Customers, Suppliers, Products, etc.) with CluedIn at the core. Use Microsoft Power Apps to capture/maintain master data via forms and Power Automate to orchestrate validation, approvals, deduplication, and write-back — with full auditability, stewardship, and rollback in CluedIn.
What You’ll Build (at a glance)
- CluedIn Core
- Canonical entities & relationships
- Reference data (RDM), quality rules, matching, survivorship
- Stewardship work queues, audit history, rollback
- Exports/APIs to downstream apps
- Power Platform Layer
- Power Apps: secure forms for create/update requests (new customer, supplier update, product introduction, etc.)
- Power Automate: early validation, duplicate check, approvals (business + data stewardship), and commit to CluedIn
- Optional Dataverse: staging + business workflow metadata
- Operational Guardrails
- Least-privilege access, API tokens, environment separation
- Monitoring & alerts; runbooks for rollback and incident response
Outcomes
- A centralised MDM hub in CluedIn with golden records for in-scope domains.
- Standardised, validated, deduplicated master data pushed to consuming systems.
- Self-service create/update via Power Apps with Power Automate approvals.
- Early validation and dedupe preview before anything hits production data.
- Full lineage, audit, and rollback for every change.
Prerequisites
- CluedIn environment (non-prod + prod) with admin access.
- Defined MDM scope & ownership (data domains, data owners, stewards).
- API access to CluedIn (see Administration → API Tokens).
- Power Platform environment with permissions to create Power Apps, Power Automate flows, and (optionally) Dataverse tables.
- A dedicated VM/server for heavy CluedIn jobs (ingestion, matching, toolkit), not a laptop (prevents timeouts).
PART A — Establish the CluedIn MDM Foundation
Step A1 — Define Scope, Ownership, and SLAs
- Pick domains for the first wave (e.g., Customer, Supplier, Product).
- Assign RACI per domain (Data Owner, Steward, Approver).
- Define SLAs for requests (e.g., new supplier in ≤ 1 business day).
- Document mandatory fields, validation policies, and matching evidence (e.g., TaxID/DUNS for suppliers; GTIN/Brand+MPN for products).
Deliverables: Scope doc, RACI matrix, SLA table, mandatory-field matrix.
Step A2 — Model Canonical Entities & Relationships
- In Entity Explorer, create canonical types (e.g.,
Customer
,Supplier
,Product
,Address
,Contact
, etc.). - Define attributes, data types, and constraints; keep variant vs base split when relevant (e.g.,
Product
vsVariant
). - Model relationships (edges) you’ll need for survivorship and governance (e.g.,
Supplier -> Product
,Customer -> Address
).
Tip: Keep entity design stable; evolve via versioning, not breaking changes.
Step A3 — Reference Data Management (RDM)
- Load canonical sets (ISO country, currency, UoM, tax codes, product categories).
- Build mappings from source codes → canonical codes.
- Version and Publish approved sets.
- Add rules for Find Closest Match where semantics matter (e.g., category alignment).
Goal: 100% code mapping coverage to avoid downstream fragmentation.
Step A4 — Data Quality Rules (Normalize + Validate)
Implement Data Part Rules to standardize before any matching:
- Names/text: trim, case, punctuation, diacritics
- Emails: regex; Phones: E.164; Addresses: parsing + country ISO-2
- Identifiers: checksums/structure (VAT/DUNS/GTIN/IBAN)
- RDM canonicalization (currency, UoM, tax code)
- Tags:
Standardized
,InvalidEmail
,InvalidPhone
,UnknownCountry
, etc.
Outcome: Reliable inputs for deterministic matching.
Step A5 — Matching & Survivorship (Golden Records)
- Configure blocking & match keys per domain (e.g., Supplier uses TaxID/DUNS → high confidence).
- Set thresholds: auto-merge (≥0.97), steward review (0.90–0.97).
- Define Golden Record Rules (attribute precedence + tie-breakers).
- Add actions: completeness scoring, verification flags, tags.
Safety: Prove Unmerge and Undo on a sandbox dataset before live.
Step A6 — Stewardship & Governance
- Configure work queues (invalids, duplicates, missing mandatory fields).
- Enable History and train stewards on Undo and Unmerge.
- Set Permissions (roles, least privilege; sensitive fields restricted).
Step A7 — Exports & Downstream Interfaces
- Define export views for consuming systems (CRM, ERP, PIM, DW).
- Start read-only, validate mappings, then move to authoritative sync.
- Keep lineage columns (source IDs, timestamps, quality flags).
PART B — Power Platform Intake: Forms, Early Validation, Workflow
Two patterns are common. Pick the one that fits your org (or mix them):
Pattern 1 — Direct-to-CluedIn: Power Apps → Flow → CluedIn API (no Dataverse).
Pattern 2 — Dataverse Staging: Power Apps writes to a Dataverse table; Flow orchestrates validation + CluedIn upsert.
Step B1 — Security & Connectors
In CluedIn
- Generate a least-privilege API token (Administration → API Tokens).
- Scope it to create/update the specific domain(s) and read for dedupe preview.
- Store the token securely (Key Vault/Power Platform environment variable).
In Power Platform
- Create a Custom Connector (or use HTTP actions):
- Base URL: your CluedIn API endpoint.
- Auth: Bearer token header (or OAuth if your environment supports it).
- Define operations you’ll call often (examples):
POST /entities/{type}/validate
(preflight validation)POST /entities/{type}/staging
(stage request)POST /matching/preview
(dedupe preview)POST /workflows/{name}/trigger
(optional)GET /entities/{type}/{id}
(read back)
(Endpoint names vary by tenant/version; map to your API catalogue.)
Tip: Store the token in a Power Platform connection reference or environment variable; never hardcode in flows.
Step B2 — Power Apps: Build the Request Forms
- App Type: Canvas app (start with “New Customer/Supplier/Product” form).
- Data Model: Either write directly to CluedIn via custom connector or to a Dataverse staging table (recommended for audit + drafts).
- Form Sections:
- Identity: legal name/trading name, TaxID/DUNS (B2B), GTIN/Brand/MPN (Product), key personal info (B2C policy).
- Contact/Address: email/phone, parsed address fields (Line1/City/Region/Postcode/Country).
- Reference Data: country/currency/UoM/category (driven by RDM).
- Attachments (optional): documents for KYC/Onboarding (ID proof, contracts).
- Notes & Justification for business approval.
- Client-side early checks (Power Apps)
- Regex for email/phone; country code in a picklist bound to canonical RDM.
- Simple duplicate hint: as the user types TaxID/Domain/GTIN, call a search endpoint; surface “Possible match found” links.
- Visualize completeness with a progress bar.
UX tip: Use normalisers (uppercasing codes, trimming whitespace) in the form before submit.
Step B3 — Power Automate: Early Validation Flow
Trigger: Power Apps (OnSelect) or Dataverse (OnCreate).
Flow steps (detailed):
- Initialize context (request id, domain, requester, environment).
- Compose payload (map form fields to CluedIn schema).
- HTTP — Validation: call CluedIn preflight validation (or rules evaluate endpoint).
- On fail → return errors to the app; highlight fields; set status =
ValidationFailed
.
- On fail → return errors to the app; highlight fields; set status =
- HTTP — Dedupe Preview: call CluedIn matching preview with high-confidence keys.
- If candidate matches exist (score ≥ review threshold), return list to the app for user review.
- Optionally, halt and require steward review before continuing.
- HTTP — Reference Matching (optional): use Find Closest Match for categories/payment terms etc. Keep original value for lineage.
- Branch:
- If No risk (passes threshold + no candidates) → Auto-route to Approval.
- If Medium risk (candidates exist) → Create a Steward Task in CluedIn; wait for decision.
- If High risk (strong dupes or policy flags) → Reject with guidance.
- Approval(s):
- Use Start and wait for an approval (Power Automate).
- Approvers: Data Owner + Domain Steward (parallel or sequential).
- Reminder/timeout policy aligned to SLA.
- Store decision & comments (Dataverse or in CluedIn notes).
- Commit:
- HTTP — Upsert to Staging in CluedIn with
RequestId
andRequestedBy
. - HTTP — Trigger Rules to normalize & compute completeness.
- HTTP — Run Matching (if not already) and Promote to Golden per survivorship.
- HTTP — Upsert to Staging in CluedIn with
- Notify & Return:
- Update Power Apps with result id, completeness score, and any steward actions.
- Send Teams/Email confirmation (optional).
- Error handling:
- Catch 4xx/5xx → log to Dataverse “IntegrationLog”, notify support channel, and surface user-friendly message.
Best practices:
- Keep idempotency: pass a
requestId
; replays don’t create duplicates. - Use parallel branches for enrichment that doesn’t block approval.
Step B4 — Power Automate: Update Requests (Change Management)
Repeat B3 for update scenarios:
- Lock sensitive fields (e.g., TaxID, DUNS) behind extra approval.
- Pre-check: compare proposed changes vs current Golden; highlight breaking changes (e.g., category re-map).
- Route to Data Owner for approval and Steward for quality checks.
- Commit via CluedIn upsert with change reason and source evidence.
Step B5 — Steward Collaboration & Exception Handling
- If dedupe preview returns ambiguous matches, the flow:
- Creates a CluedIn Steward Task referencing the request data.
- Waits for the steward to Approve/Reject/Request changes.
- Continues or exits based on steward outcome.
- For policy exceptions (e.g., missing documents), branch to a conditional approval that requires additional evidence uploaded in the app.
PART C — CluedIn Workflows Post-Commit
Step C1 — Rules & Matching Execution
- On upsert: run Data Part Rules, Find Closest Match (for lookups), then Matching with thresholds.
- If auto-merge: promote to Golden and tag
Verified
. - If steward review required: record sits in Match Review queue.
Step C2 — Golden Record Survivorship & Publishing
- Apply attribute precedence (ERP > Registry > CRM, etc.).
- Compute
CompletenessScore
,VerificationFlags
, and channel-specificReachability
. - Include lineage (SourceSystem, SourceRecordID) and request metadata for traceability.
- Publish to downstream exports (CRM/ERP/PIM/DW) on schedule or ev