CluedIn for Administrators — practical handbook

0) Your First 48 Hours (Checklist)
1) Identity, SSO & Provisioning
2) Connecting Systems: Purview, Power Automate, Power Apps, AI
3) Turning Features On/Off
4) Access Control for Data
5) Managing API Tokens
5.1 Creating Tokens
5.2 Rotation & Hygiene
6) Reading Logs
6.1 Types
6.2 Practical Use
7) Reading Audit Logs
8) Controlling Access to Functionality (Roles & Users)
9) Operational Runbooks
10) Security & Compliance Quick Wins
Appendix A — SSO Attribute Mapping (examples)
Appendix B — SCIM Field Mapping (examples)
Appendix C — Power Automate Flow (HTTP) Sketch
Appendix D — Sample Policies (copy/paste)
Final Notes

Audience: Platform administrators, security engineers, data platform owners
Goal: Give administrators a clear, actionable view of how to operate CluedIn securely at scale—identity, access, integrations, features, observability, and governance.

This handbook focuses on how to run CluedIn day-to-day and what good looks like. It includes checklists, templates, and examples you can adapt to your environment.

0) Your First 48 Hours (Checklist)

Identity & Access

Configure SSO (OIDC or SAML 2.0).
(Optional) Enable SCIM user/group provisioning.
Map IdP groups → CluedIn roles (least privilege).
Enforce SSO-only sign-in and MFA at your IdP.

Access Control & Security

Review built-in roles and create custom roles as needed.
Define data access policies using classifications/labels (PII, Restricted).
Set session timeout and token lifetimes.
Create API tokens with minimal scopes; store secrets centrally.

Features & Integrations

Turn on only the features you need for day‑1; keep others off.
Connect Microsoft Purview (catalog/lineage) via scanning or push.
Connect Power Automate/Power Apps to CluedIn APIs or webhooks.
(Optional) Configure AI integrations/AI Agents with guardrails.

Observability

Decide your log retention strategy; export logs to your SIEM.
Enable audit log export and alerts on high‑risk events.
Create an Admin dashboard: pipeline health, ingestion/export SLAs.
Document a runbook and an incident response checklist.

1) Identity, SSO & Provisioning

CluedIn supports enterprise SSO via OIDC or SAML 2.0 and optional SCIM for provisioning.

1.1 OIDC (OpenID Connect) Setup

In your IdP (e.g., Microsoft Entra ID / Okta)

Register a Web application.
Set the redirect URI to your CluedIn callback (from CluedIn SSO settings).
Issue a Client ID and Client Secret.
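Add the claims you need for identity and role mapping (e.g., email, name, groups). In Microsoft Entra ID, the groups claim is added under the app registration's Token configuration rather than requested as a scope, and by default it carries group object IDs, not display names.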
Assign users/groups to the app.

In CluedIn → Admin → Authentication

{
  "type": "oidc",
  "issuer_url": "https://login.microsoftonline.com/<tenant>/v2.0",
  "client_id": "<CLIENT_ID>",
  "client_secret": "<CLIENT_SECRET>",
  "redirect_uri": "https://<your-cluedin-host>/auth/oidc/callback",
  "scopes": ["openid","profile","email"],
  "group_claim": "groups",
  "enforce_sso_only": true
}
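Linting the settings before saving them can catch placeholder secrets or non-HTTPS URLs. The Python sketch below assumes the field names from the example JSON, which are illustrative rather than a documented CluedIn schema; check_oidc_config is a hypothetical helper you could run in CI against the config kept in your repo.

```python
from urllib.parse import urlparse

# Field names mirror the illustrative JSON above; they are assumptions,
# not a documented CluedIn schema.
REQUIRED_FIELDS = ("issuer_url", "client_id", "client_secret", "redirect_uri", "scopes")


def check_oidc_config(cfg: dict) -> list[str]:
    """Return human-readable problems found in an OIDC settings dict."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not cfg.get(field):
            problems.append(f"missing {field}")
    # Template placeholders such as "<CLIENT_SECRET>" must never reach a real config.
    for field in ("client_id", "client_secret"):
        value = str(cfg.get(field, ""))
        if value.startswith("<") and value.endswith(">"):
            problems.append(f"{field} is still a placeholder")
    for field in ("issuer_url", "redirect_uri"):
        url = cfg.get(field)
        if url and urlparse(url).scheme != "https":
            problems.append(f"{field} must use https")
    if "openid" not in cfg.get("scopes", []):
        problems.append("scopes must include 'openid'")
    if cfg.get("enforce_sso_only") is not True:
        problems.append("enforce_sso_only should be true")
    return problems
```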

Tip: Prefer OIDC where possible—simpler operations, modern token formats, and easy group claims.

1.2 SAML 2.0 Setup

In your IdP

Create a SAML application.
Upload the SP metadata (from CluedIn) or configure the ACS (Assertion Consumer Service) URL manually.
Map attributes: email, firstName, lastName, and a group attribute (e.g., memberOf).
Download IdP metadata (XML).

In CluedIn → Admin → Authentication

<!-- Paste IdP Metadata -->
<EntityDescriptor xmlns="urn:oasis:names:tc:SAML:2.0:metadata" entityID="https://idp.example.com/metadata">
  <!-- ... -->
</EntityDescriptor>

{
  "type": "saml2",
  "acs_url": "https://<your-cluedin-host>/auth/saml/acs",
  "entity_id": "https://<your-cluedin-host>/auth/saml/metadata",
  "email_attribute": "email",
  "groups_attribute": "memberOf",
  "enforce_sso_only": true
}
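Before pasting IdP metadata, it is worth confirming which entity ID, SSO endpoint and signing certificate it actually carries. A minimal Python sketch using only the standard library and the standard SAML 2.0 metadata and XML-DSig namespaces; it assumes a single EntityDescriptor root (not an EntitiesDescriptor aggregate), and summarize_idp_metadata is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Standard SAML 2.0 metadata and XML-DSig namespaces.
NS = {
    "md": "urn:oasis:names:tc:SAML:2.0:metadata",
    "ds": "http://www.w3.org/2000/09/xmldsig#",
}


def summarize_idp_metadata(xml_text: str) -> dict:
    """Extract the entityID, SSO endpoints (by binding) and signing certificates."""
    root = ET.fromstring(xml_text)
    idp = root.find("md:IDPSSODescriptor", NS)
    if idp is None:
        raise ValueError("no IDPSSODescriptor found; is this IdP (not SP) metadata?")
    sso = {
        svc.get("Binding"): svc.get("Location")
        for svc in idp.findall("md:SingleSignOnService", NS)
    }
    certs = []
    for kd in idp.findall("md:KeyDescriptor", NS):
        # A KeyDescriptor without "use" applies to both signing and encryption.
        if kd.get("use") in (None, "signing"):
            for cert in kd.findall("ds:KeyInfo/ds:X509Data/ds:X509Certificate", NS):
                # Strip the whitespace and line breaks metadata files usually contain.
                certs.append("".join((cert.text or "").split()))
    return {"entity_id": root.get("entityID"), "sso": sso, "signing_certs": certs}
```

Compare the printed certificate with the one shown in your IdP console; a mismatch usually means stale metadata.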

1.3 Group-to-Role Mapping

Map IdP groups to CluedIn roles. Keep mappings in code/config for repeatability.

# cluedin-role-mapping.yaml
mappings:
  - idp_group: "cluedin-admins"
    roles: ["Administrator"]
  - idp_group: "cluedin-data-engineers"
    roles: ["Data Engineer"]
  - idp_group: "cluedin-stewards"
    roles: ["Data Steward"]
  - idp_group: "cluedin-viewers"
    roles: ["Viewer"]
defaults:
  # Users with SSO but no mapped group get these roles (or none)
  roles: ["Viewer"]
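Because the mapping lives in config, a PR reviewer can check what a given user would end up with. A Python sketch of one plausible resolution rule, assuming roles from all matched groups are unioned and defaults apply only when no group matches (CluedIn's exact semantics may differ); resolve_roles is hypothetical, and the mapping is inlined as a dict to avoid a YAML dependency.

```python
# Mirrors cluedin-role-mapping.yaml; in practice you would load the file with a YAML parser.
ROLE_MAPPING = {
    "mappings": [
        {"idp_group": "cluedin-admins", "roles": ["Administrator"]},
        {"idp_group": "cluedin-data-engineers", "roles": ["Data Engineer"]},
        {"idp_group": "cluedin-stewards", "roles": ["Data Steward"]},
        {"idp_group": "cluedin-viewers", "roles": ["Viewer"]},
    ],
    "defaults": {"roles": ["Viewer"]},
}


def resolve_roles(user_groups: list[str], mapping: dict = ROLE_MAPPING) -> list[str]:
    """Return the sorted, de-duplicated roles granted by a user's IdP groups."""
    groups = set(user_groups)
    roles = {
        role
        for entry in mapping["mappings"]
        if entry["idp_group"] in groups
        for role in entry["roles"]
    }
    if not roles:
        # No mapped group matched: fall back to the defaults (which may be empty).
        roles = set(mapping.get("defaults", {}).get("roles", []))
    return sorted(roles)
```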

Least privilege: give broad read where required, but restrict write/config permissions to a small, accountable set.

1.4 SCIM Provisioning (Optional)

Enable SCIM to automate user/group lifecycle:

Create/Update/Deactivate users in CluedIn based on IdP.
Sync group membership so role mappings stay current.

IdP → SCIM config (example)

{
  "scim_base_url": "https://<your-cluedin-host>/scim/v2",
  "bearer_token": "<SCIM_PROVISIONING_TOKEN>",
  "sync_interval_minutes": 15
}

Operational tips

Treat SCIM mappings and settings like code: change them via PR and audit them regularly.
Test deprovisioning; confirm tokens and sessions are revoked.
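To exercise deprovisioning against a test user, you can send the standard SCIM 2.0 (RFC 7644) PATCH that sets active to false. A Python sketch that only builds the request; whether CluedIn's SCIM endpoint accepts PATCH on /Users (rather than PUT) is an assumption to verify, and build_scim_deactivate is a hypothetical helper.

```python
import json
import urllib.request


def build_scim_deactivate(base_url: str, user_id: str, token: str) -> urllib.request.Request:
    """Build (but do not send) an RFC 7644 PatchOp that sets active=false on a user."""
    body = {
        "schemas": ["urn:ietf:params:scim:api:messages:2.0:PatchOp"],
        "Operations": [{"op": "replace", "path": "active", "value": False}],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/Users/{user_id}",
        data=json.dumps(body).encode("utf-8"),
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/scim+json",
        },
    )
```

Send it with urllib.request.urlopen(req), then try the user's existing session and API tokens to confirm they are rejected.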

2) Connecting Systems: Purview, Power Automate, Power Apps, AI

2.1 Microsoft Purview (Catalog & Lineage)

You have two main patterns—pick one (or both):

1) Scan exports that CluedIn writes to your lake/warehouse.

Register the storage/database in Purview.
Schedule scans so Purview catalogs the tables/files produced by CluedIn.
Pro: simple, no custom code. Con: lineage may be coarse.

2) Push lineage/metadata into Purview (Atlas-compatible APIs).

Use a job to publish processes/entities/lineage after your exports run.
Pro: explicit lineage from source → CluedIn → export. Con: you build, schedule, and maintain the jobs.

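Minimal lineage push (pseudo)

Atlas has no endpoint that writes lineage directly; lineage is derived from a Process entity whose inputs and outputs reference existing entities. The typeNames below are illustrative: cluedin_export and cluedin_entity are custom types you would register first (cluedin_export extending Process), the output type depends on where the export lands, and the path prefix before /api/atlas/v2 depends on your Purview API version.

POST https://<purview-account>/api/atlas/v2/entity
Authorization: Bearer <ACCESS_TOKEN>
Content-Type: application/json

{
  "entity": {
    "typeName": "cluedin_export",
    "attributes": {
      "name": "warehouse-contacts-v1",
      "qualifiedName": "cluedin.export.warehouse-contacts-v1",
      "inputs": [{"typeName": "cluedin_entity", "uniqueAttributes": {"qualifiedName": "cluedin.entity.Person"}}],
      "outputs": [{"typeName": "azure_sql_table", "uniqueAttributes": {"qualifiedName": "sql.mdm.contacts_v1"}}]
    }
  }
}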
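Apache Atlas, which underlies the Purview data map API, records lineage as a Process-type entity whose inputs and outputs reference other entities by typeName and qualifiedName. A post-export job could build and post such an entity; the Python sketch below only builds the request. The type names, the endpoint prefix, and the helper names (lineage_entity, ref, build_push) are assumptions for illustration.

```python
import json
import urllib.request


def ref(type_name: str, qualified_name: str) -> dict:
    """Reference an existing Atlas entity by its unique qualifiedName."""
    return {"typeName": type_name, "uniqueAttributes": {"qualifiedName": qualified_name}}


def lineage_entity(name: str, qualified_name: str, inputs: list[dict],
                   outputs: list[dict], type_name: str = "cluedin_export") -> dict:
    """Entity payload for a Process-derived type; Atlas derives lineage from inputs/outputs."""
    return {
        "entity": {
            "typeName": type_name,  # hypothetical custom type; must extend Process
            "attributes": {
                "name": name,
                "qualifiedName": qualified_name,
                "inputs": inputs,
                "outputs": outputs,
            },
        }
    }


def build_push(endpoint: str, token: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) the POST to the Atlas v2 entity endpoint."""
    return urllib.request.Request(
        url=f"{endpoint.rstrip('/')}/api/atlas/v2/entity",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
    )
```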

2.2 Power Automate (Flows) & Power Apps

Pattern A — Call CluedIn APIs from a Flow

Use the HTTP action with an OAuth2 client (preferred) or a PAT.
Trigger on events (e.g., a new record in Dataverse) → call CluedIn ingestion endpoint or AI Agent.

POST https://<your-cluedin-host>/api/ingest/crm-contacts
Authorization: Bearer <ACCESS_TOKEN>
Content-Type: application/json

{"id":"c_123","email":"a@example.com","updated_at":"2025-08-22T12:00:00Z"}
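When a Flow misbehaves, replaying the same call from a script helps separate CluedIn issues from Flow issues. A Python sketch of the request above using only the standard library; the host, the crm-contacts endpoint name, and the helper names are placeholders taken from the example, not a documented client.

```python
import json
import urllib.request


def build_ingest_request(host: str, endpoint: str, token: str, record: dict) -> urllib.request.Request:
    """Build (but do not send) the ingestion POST; host and endpoint are placeholders."""
    return urllib.request.Request(
        url=f"https://{host}/api/ingest/{endpoint}",
        data=json.dumps(record).encode("utf-8"),
        method="POST",
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
    )


def send(req: urllib.request.Request, timeout: float = 30.0) -> int:
    """Send the request and return the HTTP status; raises urllib.error.HTTPError on 4xx/5xx."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```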

Pattern B — Webhooks from CluedIn → Flow

Register a webhook in CluedIn (on export success, DQ alert, dedup queue event).
The Flow receives the payload and notifies Teams/updates a ticket/starts approvals.

Pattern C — Power Apps UI + CluedIn APIs

Build a simple app for data stewarding (approve merges, fix invalids).
Use a service principal for API access and enforce role checks server-side.

2.3 AI Integrations

Configure your AI provider (e.g., Azure OpenAI) in CluedIn AI settings.
Scope what entities and fields AI Agents can read/write.
Restrict PII access, and enable redaction/masking for prompts and logs.
Start with read-only analysis Agents; expand to suggestion/auto-fix flows after review.
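If you pre-process text before it reaches an AI provider, a redaction pass is the simplest guardrail. A deliberately minimal Python sketch, assuming regex masking of e-mail addresses and phone-like numbers is acceptable for a first iteration; real PII detection needs far broader coverage, and redact is a hypothetical helper, not a CluedIn setting.

```python
import re

# Illustrative patterns only. PHONE also matches date-like strings such as
# 2025-08-22, so expect false positives.
EMAIL = re.compile(r"\b[\w.+-]+@([\w-]+\.)+[\w-]{2,}\b")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact(text: str) -> str:
    """Replace e-mail addresses and phone-like numbers with fixed tokens."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)
```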

3) Turning Features On/Off

Navigate to Admin → Features (or Workspace Settings → Features).

Best practices

Keep non-essential features off until you need them.
Maintain separate dev/test/prod workspaces.
Use change windows and a rollback plan for feature toggles.
Document compatibility (some features require certain roles/integrations).

Example (pseudo)

{
  "features": {
    "ai_agents": true,
    "dedup_projects": true,
    "experimental_mappers": false,
    "webhooks": true,
    "custom_roles": true
  }
}
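A rollback plan for toggles is easiest to write when you diff the proposed settings against what is live. A Python sketch over dicts shaped like the example above; it assumes a feature absent from the live config is off, and diff_features and rollback_plan are hypothetical helpers.

```python
def diff_features(current: dict[str, bool], proposed: dict[str, bool]) -> dict[str, tuple]:
    """Map each feature whose state would change to (before, after); None means absent."""
    changes = {}
    for name in sorted(set(current) | set(proposed)):
        before, after = current.get(name), proposed.get(name)
        if before != after:
            changes[name] = (before, after)
    return changes


def rollback_plan(changes: dict[str, tuple]) -> dict[str, bool]:
    """Settings that restore the previous state; an absent feature is assumed to have been off."""
    return {name: (before if before is not None else False)
            for name, (before, _) in changes.items()}
```

Attach both outputs to the change request so the approver sees exactly what flips and how to undo it.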

4) Access Control for Data

CluedIn uses RBAC (roles → permissions) and can layer policy‑based controls for row/column access (ABAC-style).

4.1 Levels of Control

Workspace/project level: who can configure ingestion, mapping, cleaning, exports.
Entity/Dataset level: who can read/write specific entities or exports.
Field/Column level: mask, hash, or hide selected attributes (e.g., PII).
Row level: filter rows by attributes (region, tenant, ownership).

4.2 Example Policies

Column masking

policy: mask_pii_email
target: entity:Person.field:email
actions: [read]
effect: allow_with_mask
mask: "partial_email"  # e.g., a***@example.com
conditions:
  - role_in: ["Viewer","Analyst"]
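The behaviour of the partial_email mask is only implied by the comment in the policy. A Python sketch of that reading, assuming it keeps the first character of the local part and the whole domain; partial_email here is a hypothetical stand-in, not CluedIn's implementation.

```python
def partial_email(value: str) -> str:
    """Mask an address as a***@example.com: first local-part character plus full domain."""
    local, sep, domain = value.partition("@")
    if not sep or not local:
        return "***"  # not a well-formed address: mask it entirely
    return f"{local[0]}***@{domain}"
```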

Row-level filter

policy: restrict_region
target: entity:Order
actions: [read]
effect: allow_when
when: "record.region in user.allowed_regions"
applies_to:
  - roles: ["Analyst"]
  - groups: ["finance-emea"]
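One way to read restrict_region is sketched below in Python: if the user matches any applies_to entry, only rows whose region is in the user's allowed regions remain. The "match any entry" rule, the pass-through for untargeted users, and the names User and filter_orders are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    roles: set[str]
    groups: set[str]
    allowed_regions: set[str] = field(default_factory=set)


# From the policy's applies_to block; matching ANY entry is assumed to trigger the filter.
APPLIES_TO_ROLES = {"Analyst"}
APPLIES_TO_GROUPS = {"finance-emea"}


def filter_orders(rows: list[dict], user: User) -> list[dict]:
    """Apply restrict_region: keep rows whose region is in user.allowed_regions."""
    if not (user.roles & APPLIES_TO_ROLES or user.groups & APPLIES_TO_GROUPS):
        # Policy does not target this user; other policies decide what they see.
        return rows
    # Rows without a region never match, so they are filtered out.
    return [row for row in rows if row.get("region") in user.allowed_regions]
```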

Deny write to exports for non-owners

policy: export_write_guardrail
target: export:mdm.contacts_v1
actions: [write,configure]
effect: deny
unless:
  - role_in: ["Administrator","Data Engineer"]

Order of evaluation: explicit deny should override broad allows; test with staging identities.
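The rule above is the classic deny-overrides combining algorithm (the name comes from XACML). A Python sketch, assuming default-deny when no policy matches and treating allow_with_mask and allow_when as flavours of allow; other effects such as require_approval are out of scope here and grant nothing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    effect: str   # e.g. "allow", "allow_with_mask", "deny"
    policy: str   # name of the policy that produced the decision


def combine(decisions: list[Decision]) -> str:
    """Deny-overrides: any deny wins; else allow if any allow-type effect matched; else deny."""
    if any(d.effect == "deny" for d in decisions):
        return "deny"
    if any(d.effect.startswith("allow") for d in decisions):
        return "allow"
    return "deny"  # default-deny when nothing matched (assumption)
```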

4.3 Classifications & Tags

Apply labels (PII, Restricted, Confidential, Public) at the entity/field level and drive policies from labels:

Auto-mask PII for non‑steward roles.
Require approval for exports that include Restricted fields.

5) Managing API Tokens

CluedIn supports Personal Access Tokens (PATs) and/or OAuth clients for service‑to‑service access.

5.1 Creating Tokens

Go to Admin → API Tokens.
Create a token with the minimal scopes and a short expiry.
Tag tokens by purpose (e.g., power-automate-flow-42).

Token example (pseudo)

{
  "name": "power-automate-contact-sync",
  "scopes": ["ingest:write","export:read"],
  "expires_at": "2025-12-31T23:59:59Z"
}

Usage

curl -H "Authorization: Bearer <TOKEN>" https://<host>/api/exports/status

5.2 Rotation & Hygiene

Rotate on a 90‑day cadence (or faster).
Revoke tokens immediately on user departure (SCIM + audit).
Keep secrets in a vault; never in code or chat.
Log who/what uses tokens (user agent, IP), alert on anomalies.
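The 90-day rule is easy to check from a token inventory. A Python sketch, assuming each token record has name, created_at and expires_at as ISO-8601 UTC strings ending in Z; created_at is an assumed field (the token example above only shows expires_at), and tokens_to_rotate is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(days=90)


def _parse(ts: str) -> datetime:
    # fromisoformat() in older Pythons does not accept a trailing "Z".
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def tokens_to_rotate(tokens: list[dict], now: datetime) -> list[str]:
    """Names of tokens older than MAX_AGE or already expired, in input order."""
    due = []
    for tok in tokens:
        too_old = now - _parse(tok["created_at"]) > MAX_AGE
        expired = _parse(tok["expires_at"]) <= now
        if too_old or expired:
            due.append(tok["name"])
    return due
```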

6) Reading Logs

CluedIn exposes logs across categories; forward them to your SIEM/Observability stack for long‑term analytics.

6.1 Types

Ingestion logs: HTTP status, schema/parse issues, dead‑letter entries.
Mapping & cleaning logs: transformation steps, validation failures.
Export logs: schedule triggers, job durations, row counts, schema diffs.
System logs: auth, feature toggles, config changes.

Example log (pseudo)

{
  "ts": "2025-08-23T10:15:08Z",
  "category": "export",
  "export": "warehouse-contacts-v1",
  "status": "success",
  "duration_ms": 4213,
  "records_out": 15234,
  "correlation_id": "a6c9-...-4f",
  "actor": "system@scheduler"
}

6.2 Practical Use

Always capture correlation_id from UI/API; thread it through pipelines.
Build alerts for error rate spikes and duration regressions.
Keep production log levels at info or warning; switch to debug only during incidents, and revert afterwards.
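Error-rate and duration-regression alerts normally live in your SIEM, but the logic is simple enough to prototype locally. A Python sketch over JSON log lines shaped like the example above, assuming lines arrive in chronological order and that any status other than "success" is a failure; export_health and regression_factor are hypothetical.

```python
import json
from collections import defaultdict
from statistics import median


def export_health(lines: list[str], regression_factor: float = 2.0) -> dict[str, list[str]]:
    """Summarise export log lines (one JSON object per line) into alert reasons per export."""
    runs = defaultdict(list)
    for line in lines:
        event = json.loads(line)
        if event.get("category") == "export":
            runs[event["export"]].append(event)
    alerts = {}
    for name, events in runs.items():
        reasons = []
        failures = [e for e in events if e.get("status") != "success"]
        if failures:
            # Keep correlation_ids so responders can trace the failing runs.
            ids = ", ".join(e.get("correlation_id", "?") for e in failures)
            reasons.append(f"{len(failures)}/{len(events)} runs failed ({ids})")
        durations = [e["duration_ms"] for e in events if "duration_ms" in e]
        if len(durations) >= 3:
            baseline = median(durations[:-1])
            if durations[-1] > regression_factor * baseline:
                reasons.append(f"latest run {durations[-1]} ms vs median {baseline} ms")
        if reasons:
            alerts[name] = reasons
    return alerts
```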

7) Reading Audit Logs

Audit logs track who did what, when, and from where—critical for compliance.

Common events

SSO sign‑ins, failed logins.
Role/permission changes; token creation/revocation.
Feature toggles; workspace settings changes.
Policy updates; data export configuration changes.
Bulk actions (dedup merges, cleaning jobs with write effects).

Sample audit record (pseudo)

{
  "ts": "2025-08-23T11:02:44Z",
  "actor": {"type":"user","id":"tiw@cluedin.com"},
  "action": "role.update",
  "target": {"type":"role","id":"Data Steward"},
  "old": {"permissions":["entity.read","dq.view"]},
  "new": {"permissions":["entity.read","dq.view","dedup.review"]},
  "ip": "203.0.113.5"
}

Best practices

Export audit logs daily; retain for 1–7 years per policy.
Monitor high‑risk actions (token create, role grant, export schema change).
Automate tickets/approvals for sensitive actions.
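A daily review job can reduce exported audit records to the few lines a human should read. A Python sketch over records shaped like the sample above; only role.update appears in the sample, so the other action names in HIGH_RISK_ACTIONS are hypothetical, as is review_audit.

```python
# Only "role.update" appears in the sample record; the other names are hypothetical.
HIGH_RISK_ACTIONS = {"role.update", "token.create", "export.schema_change"}


def review_audit(records: list[dict]) -> list[str]:
    """One-line summaries of high-risk events, including any permissions added."""
    findings = []
    for rec in records:
        if rec.get("action") not in HIGH_RISK_ACTIONS:
            continue
        summary = f'{rec["ts"]} {rec["actor"]["id"]} {rec["action"]} {rec["target"]["id"]}'
        old = set(rec.get("old", {}).get("permissions", []))
        new = set(rec.get("new", {}).get("permissions", []))
        if new - old:
            summary += " +" + ",".join(sorted(new - old))
        findings.append(summary)
    return findings
```

Each finding could open a ticket or approval request, matching the last best practice above.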

8) Controlling Access to Functionality (Roles & Users)

Start with built‑in roles and add custom roles only when needed.

8.1 Example Permission Matrix (excerpt)

| Capability | Viewer | Data Steward | Data Engineer | Administrator |
|---|---:|---:|---:|---:|
| Read entities/exports | ✅ | ✅ | ✅ | ✅ |
| Approve dedup merges | ❌ | ✅ | ✅ | ✅ |
| Edit cleaning projects | ❌ | ✅ (limited) | ✅ | ✅ |
| Configure ingestion/export | ❌ | ❌ | ✅ | ✅ |
| Manage roles & policies | ❌ | ❌ | ❌ | ✅ |
| Manage feature toggles | ❌ | ❌ | ❌ | ✅ |
| Create API tokens | ❌ | ❌ | ✅ (scoped) | ✅ |

Use scoped custom roles to carve out precise abilities (e.g., “Export Maintainer” can edit exports in project sales, but nowhere else).

8.2 Custom Role Definition (template)

role: "Export Maintainer"
description: "Manage exports for Sales project only"
permissions:
  - export.read:project:sales
  - export.write:project:sales
  - policy.read
constraints:
  - deny: ["feature.toggle","role.manage"]
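To sanity-check a template like this before applying it, you can evaluate requests against it locally. A Python sketch, assuming scoped grants use the permission:project:name form shown above, unscoped grants apply everywhere, and constraint denies always win; is_allowed is a hypothetical helper, not CluedIn's evaluator.

```python
from typing import Optional


def is_allowed(role: dict, permission: str, project: Optional[str] = None) -> bool:
    """Check a permission such as "export.write" (optionally within a project) against a role."""
    for constraint in role.get("constraints", []):
        if permission in constraint.get("deny", []):
            return False  # explicit denies in constraints always win
    for grant in role.get("permissions", []):
        name, _, scope = grant.partition(":")  # "export.write:project:sales"
        if name != permission:
            continue
        if not scope:
            return True  # unscoped grant, e.g. "policy.read"
        kind, _, value = scope.partition(":")
        if kind == "project" and value == project:
            return True
    return False
```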

8.3 Change Management

Approvals for role grants beyond Viewer/Steward.
Time‑boxed elevated roles (auto‑expire admin for break‑glass).
Quarterly access reviews with audit evidence.
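Time-boxed elevation only works if something revokes lapsed grants on schedule. A Python sketch of the check such a job might run, assuming grants are tracked as records with user, role, granted_at and duration; the record shape and expired_grants are hypothetical, and the actual revocation (e.g., removing the user from the IdP group) is left to your tooling.

```python
from datetime import datetime, timedelta, timezone


def expired_grants(grants: list[dict], now: datetime) -> list[tuple[str, str]]:
    """Return (user, role) pairs whose time-boxed grant has lapsed and should be revoked."""
    return [
        (g["user"], g["role"])
        for g in grants
        if g["granted_at"] + g["duration"] <= now
    ]
```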

9) Operational Runbooks

Daily

Check pipeline health (last run status, latency, volumes).
Review alerts (ingestion failures, DQ thresholds).
Triage dedup review queues (if enabled).

Weekly

Review audit log highlights and token usage.
Patch/rotate secrets and service principals as needed.
Validate Purview lineage completeness for top datasets.

Monthly

Access reviews; role/permission drift check.
Capacity planning (storage/compute) and cost review.
Disaster recovery tabletop: restore from backup or re‑build exports.

Incident (example)

Identify: Error rate spike on export, correlation_id X.
Contain: Pause affected schedules; toggle feature if implicated.
Diagnose: Compare config (git/PR), check last mapping/cleaning changes.
Remediate: Rollback mapping or re-run cleaning; backfill export.
Review: Post‑incident notes; add alert/test; update runbook.

10) Security & Compliance Quick Wins

Enforce SSO-only and MFA at IdP.
Use least-privilege roles; prefer group-based access.
Mask PII by default; require approvals to export PII.
Short-lived tokens; rotate frequently; monitor token use.
Immutable audit logs with long retention.
Secrets in a vault, never in pipelines or notebooks.
Data residency and encryption: document and validate with your infra team.

Appendix A — SSO Attribute Mapping (examples)

attributes:
  email: "user.email || user.userprincipalname"
  first_name: "user.given_name"
  last_name: "user.family_name"
  groups: "user.groups"  # or saml:memberOf
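The "||" in the email mapping reads as a fallback chain. A Python sketch of that reading, assuming it means "first non-empty claim wins" and that user.X refers to claim X; resolve is a hypothetical helper, and the claim names are taken from the mapping above rather than from any specific IdP.

```python
from typing import Optional


def resolve(expression: str, claims: dict) -> Optional[str]:
    """Evaluate 'user.a || user.b' fallbacks: return the first non-empty claim value."""
    for path in expression.split("||"):
        key = path.strip().removeprefix("user.")
        value = claims.get(key)
        if value:
            return value
    return None
```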

Appendix B — SCIM Field Mapping (examples)

user:
  id: "id"
  userName: "mail"
  active: "accountEnabled"
  name.givenName: "givenName"
  name.familyName: "surname"
  emails[0].value: "mail"
group:
  displayName: "displayName"
  members[].value: "members[].id"

Appendix C — Power Automate Flow (HTTP) Sketch

trigger: "When a row is added in Dataverse"
steps:
  - name: Compose payload
    action: compose
    inputs:
      id: "@{triggerBody()?['contactid']}"
      email: "@{triggerBody()?['emailaddress1']}"
      updated_at: "@{utcNow()}"
  - name: POST to CluedIn
    action: http
    inputs:
      method: POST
      uri: "https://<host>/api/ingest/crm-contacts"
      headers:
        Authorization: "Bearer @{parameters('CLUE_TOKEN')}"
        Content-Type: "application/json"
      body: "@outputs('Compose_payload')"  # spaces in action names become underscores in expressions

Appendix D — Sample Policies (copy/paste)

D1. Require approval for exports with PII

policy: export_pii_guard
target: export:*   # any export
actions: [promote]
effect: require_approval
when: "export.contains_label('PII')"
approvers: ["Data Protection Officer","Administrator"]

D2. Deny field read of government_id to all but Stewards/Admins

policy: hide_government_id
target: entity:Person.field:government_id
actions: [read]
effect: deny
unless:
  - role_in: ["Data Steward","Administrator"]

D3. Limit AI access to masked views

policy: ai_masking
target: ai:agents
actions: [read]
effect: allow_when
when: "agent.mode == 'analysis' and dataset.view == 'masked'"

Final Notes

Keep configuration as code (YAML/JSON in a repo) where possible to enable PR reviews, versioning, and quick rollbacks.
Separate people permissions (roles) from data policies (labels/rules); you’ll evolve both independently.
Start with simple integrations and tighten controls as you scale.

You now have the admin view: set up identity, connect the Microsoft ecosystem, control features and data access, run with observability, and govern with audit and policy. Clone the templates, fill in your environment details, and validate them in a non-production workspace before promoting them to production.