Sync data products


In this article, you will learn how to sync Purview data products and data assets into CluedIn data sources. This feature works only if Azure Data Factory (ADF) automation is enabled and configured. For more information, see Azure Data Factory pipeline automation.

Preparation

To sync Purview data products and data assets into CluedIn data sources, complete two preparation steps:

  1. Prepare a data product in Purview – create a governance domain and a glossary term to act as a filter for the data products you want to sync, create a data product and add data assets and the glossary term to it, publish the prepared resources, and finally assign the appropriate roles to the Purview service principal in the governance domain.

  2. Configure settings in CluedIn – in the Purview section of the CluedIn settings, enable the sync data products feature and provide the glossary term that identifies the data products to sync.

Preparation in Purview

To create a governance domain

  1. In the Microsoft Purview portal, navigate to Unified Catalog > Catalog management > Governance domain.

  2. Select New governance domain.

    sync-data-products-new-governance-domain.png

  3. Enter the Name of the governance domain.

  4. Enter the Description of the governance domain.

  5. Select the Type of the governance domain.

    sync-data-products-new-governance-domain-creation.png

  6. Select Save.

The governance domain should have a dedicated glossary term that acts as a filter for the data products you want to sync.

To create a glossary term for a governance domain

  1. On the governance domain details page, in the glossary terms card, select View all.

  2. Select New term.

  3. Enter the Name and Description of the term.

    sync-data-products-new-glossary-term.png

  4. Select Create.

When the glossary term is created, note that you can identify it in either of two ways:

  1. The name of the glossary term.

  2. The ID of the glossary term, which can be found in the URL.

    sync-data-products-glossary-term-identification.png
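If you use the ID rather than the name, you have to copy it out of the URL. The following is a minimal sketch of a helper for that, assuming the term ID is a GUID (8-4-4-4-12 hex digits) and that, when several GUIDs appear in the URL, the last one in the path or fragment is the term ID. The function name `glossary_term_id_from_url` and the example URL are hypothetical; this is not a Purview API, so verify the result against the ID shown in the portal.

```python
import re
from typing import Optional
from urllib.parse import urlsplit

# Assumption: the glossary term ID is a GUID (8-4-4-4-12 hex digits).
GUID_PATTERN = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def glossary_term_id_from_url(url: str) -> Optional[str]:
    """Return the last GUID in the URL path or fragment, or None if there is none.

    The query string is skipped so that GUIDs such as a tenant ID passed as a
    query parameter are ignored. Picking the last GUID is an assumption about
    the URL layout, not documented Purview behavior.
    """
    parts = urlsplit(url)
    searchable = parts.path + "/" + parts.fragment
    matches = GUID_PATTERN.findall(searchable)
    return matches[-1] if matches else None


if __name__ == "__main__":
    # Hypothetical URL, for illustration only.
    example = (
        "https://purview.example.com/catalog/domains/"
        "11111111-2222-3333-4444-555555555555/terms/"
        "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee?tid=99999999-9999-9999-9999-999999999999"
    )
    print(glossary_term_id_from_url(example))
```

Restricting the search to the path and fragment is a deliberate choice: portal URLs often carry other identifiers in the query string, and those should not be mistaken for the term ID.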

To create data products

  1. On the governance domain details page, in the data products card, select Go to data products.

  2. Select New data product.

  3. Enter the Name and Description of the data product.

  4. In Type, select Master data and reference data.

    sync-data-products-new-data-product.png

  5. Select Next.

  6. Enter the Use cases for the data product.

    sync-data-products-new-data-product-business-details.png

  7. In Next steps, select Add data assets.

  8. Select Done.

To add data assets to the data product

  1. On the data product details page, expand the Add data assets dropdown list, and then select Find and select.

  2. Find and select the data asset that you want to add to the data product.

    sync-data-products-find-and-select.png

  3. Select Add.

    As a result, the data assets are added to the data product.

    sync-data-products-data-assets.png

To add a glossary term to the data product

  1. On the data product details page, in the Glossary terms section, select Add.

  2. Find and select the glossary term that you want to add to the data product.

    sync-data-product-select-glossary-term.png

  3. Select Add.

    As a result, the glossary term is added to the data product.

    sync-data-products-details-page.png
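Attaching the glossary term is what makes a data product eligible for syncing, because the term acts as the filter. The following minimal sketch illustrates that filtering idea, assuming a product matches when any attached term equals the configured identifier by ID or by name, compared case-insensitively. The classes and functions (`GlossaryTerm`, `DataProduct`, `products_to_sync`) are hypothetical and only model the relationship; CluedIn's actual matching logic for the term setting may differ.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class GlossaryTerm:
    id: str
    name: str


@dataclass
class DataProduct:
    name: str
    glossary_terms: List[GlossaryTerm] = field(default_factory=list)
    data_assets: List[str] = field(default_factory=list)


def has_term(product: DataProduct, term_identifier: str) -> bool:
    """True if any attached term matches the identifier by ID or by name.

    Case-insensitive comparison is an assumption of this sketch.
    """
    wanted = term_identifier.strip().casefold()
    return any(
        wanted in (term.id.casefold(), term.name.casefold())
        for term in product.glossary_terms
    )


def products_to_sync(products: List[DataProduct], term_identifier: str) -> List[DataProduct]:
    """Keep only the data products tagged with the filter term."""
    return [p for p in products if has_term(p, term_identifier)]
```

For example, a product carrying a term named "CluedIn Sync" is selected whether you pass the term's name or its ID, while a product without the term is skipped.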

To publish a governance domain

  • On the governance domain details page, select Publish.

    synd-data-products-publish.png

    After successful publishing, the status of the governance domain changes to Successful.

To publish a glossary term

  1. On the governance domain details page, in the glossary terms card, select View all.

  2. Select the glossary term that you created earlier.

  3. On the glossary term details page, select Publish.

To publish a data product

  1. On the governance domain details page, in the data products card, select Go to data products.

  2. Select the data product that you created earlier.

  3. On the data product details page, select Publish.
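All three prepared resources (the governance domain, the glossary term, and the data product) need to be published. The following minimal sketch shows a pre-sync checklist over those resources, assuming you record each one with a simple published flag. The `Resource` class and `readiness_problems` function are hypothetical bookkeeping helpers, not part of Purview or CluedIn.

```python
from dataclasses import dataclass
from typing import List

# The resource kinds that the preparation steps above ask you to publish.
REQUIRED_KINDS = ("governance domain", "glossary term", "data product")


@dataclass
class Resource:
    kind: str       # one of REQUIRED_KINDS
    name: str
    published: bool


def readiness_problems(resources: List[Resource]) -> List[str]:
    """List missing or unpublished resources; an empty list means ready."""
    problems: List[str] = []
    present = {r.kind for r in resources}
    for kind in REQUIRED_KINDS:
        if kind not in present:
            problems.append(f"missing {kind}")
    for r in resources:
        if not r.published:
            problems.append(f"unpublished {r.kind}: {r.name}")
    return problems
```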

To assign roles to the Purview service principal

  1. On the governance domain details page, go to the Roles tab.

  2. Find the Data Catalog Reader role, and then select the icon next to the role name.

  3. Find and select the Purview service principal.

    sync-data-products-data-catalog-reader.png

  4. Select Save.

  5. Find the Data Product Owners role, and then select the icon next to the role name.

  6. Find and select the Purview service principal.

    sync-data-products-data-product-owners.png

  7. Select Save.
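The Purview service principal needs both the Data Catalog Reader and the Data Product Owners roles in the governance domain. The following minimal sketch checks an exported or hand-written list of role assignments for gaps, assuming the assignments are represented as a mapping from role name to the set of assigned principals. The `missing_roles` function and that mapping shape are hypothetical; they do not correspond to a Purview API response.

```python
from typing import Dict, Set

# Roles the steps above assign to the Purview service principal.
REQUIRED_ROLES = {"Data Catalog Reader", "Data Product Owners"}


def missing_roles(role_assignments: Dict[str, Set[str]], principal: str) -> Set[str]:
    """Return the required roles that the principal does not hold.

    role_assignments maps a role name to the principals assigned to it
    (hypothetical shape, for illustration only).
    """
    return {
        role
        for role in REQUIRED_ROLES
        if principal not in role_assignments.get(role, set())
    }
```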

Preparation in CluedIn

  1. In CluedIn, go to Administration > Settings, and then scroll down to find the Purview section.

  2. Turn on the Sync Data Products DataSources toggle.

  3. In Sync Data Products Term Pattern, enter the name or the ID of the glossary term that you created in the governance domain and added to the data products you want to sync.

  4. If you want CluedIn to add an asset representing the synced data back to the list of data assets of the corresponding data product, turn on the Append Asset to Data Product toggle.

    sync-data-products.png

  5. Select Save.

    Once you save the changes, synchronization begins.
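Synchronization depends on both these settings and on Azure Data Factory (ADF) automation being enabled and configured. The following minimal sketch expresses those preconditions as a validation function, assuming the settings are collected into a plain record. The field names of `PurviewSyncSettings` and the `settings_problems` function are hypothetical and only mirror the setting labels described above; they are not CluedIn configuration keys.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PurviewSyncSettings:
    # Hypothetical field names mirroring the settings described above.
    adf_automation_enabled: bool
    sync_data_products_datasources: bool
    sync_data_products_term_pattern: str
    append_asset_to_data_product: bool = False  # optional; not required to sync


def settings_problems(settings: PurviewSyncSettings) -> List[str]:
    """Return reasons why data product sync would not run; empty means ready."""
    problems: List[str] = []
    if not settings.adf_automation_enabled:
        problems.append("ADF automation is not enabled and configured")
    if not settings.sync_data_products_datasources:
        problems.append("Sync Data Products DataSources is turned off")
    if not settings.sync_data_products_term_pattern.strip():
        problems.append("Sync Data Products Term Pattern is empty")
    return problems
```

Append Asset to Data Product is deliberately not checked, because it only controls whether assets are written back to the data product, not whether syncing happens.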

Feature overview

After you enable data product synchronization, you receive a notification when a data product is synced.

sync-data-products-notification.png

Additionally, you receive notifications about the execution of ADF automation pipelines, which create ingestion endpoints in CluedIn and ingest the data from the data assets.

sync-data-products-adf-notification.png

How to check the ingested data in CluedIn?

As a result of the pipeline run, a new data source group is created in CluedIn. The data source group corresponds to the Purview data product, and the data sources within the group correspond to the data assets within the data product.

sync-data-products-result.png
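The correspondence is one to one: each data product becomes one data source group, and each of its data assets becomes one data source in that group. The following minimal sketch models that layout, assuming names are carried over unchanged from Purview to CluedIn. The `DataSourceGroup` class and `to_cluedin_layout` function are hypothetical illustrations of the structure, not CluedIn's implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DataSourceGroup:
    name: str
    data_sources: List[str] = field(default_factory=list)


def to_cluedin_layout(products: Dict[str, List[str]]) -> List[DataSourceGroup]:
    """Map each data product to a data source group of its data assets.

    products maps a data product name to its data asset names. Keeping the
    names unchanged is an assumption of this sketch.
    """
    return [
        DataSourceGroup(name=product, data_sources=list(assets))
        for product, assets in products.items()
    ]
```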

After you create the mapping and process the data set, the data set is synced to Purview.

sync-data-products-notification-sync-to-purview.png

As a result, you can view a visual representation (lineage) of the asset as it passes through the CluedIn processing pipeline.

sync-data-products-sync-to-purview-lineage.png

Because Append Asset to Data Product is enabled in this example, a new asset is created in the data product to represent the entity type that was created in CluedIn.

sync-data-products-append-assets.png
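The append behavior can be summarized as: when the toggle is on, an asset representing the CluedIn entity type is added to the data product's list of data assets. The following minimal sketch models that rule, assuming the data product is a simple list of asset names and that appending is idempotent (an existing asset for the same entity type is not added twice). The `append_entity_type_asset` function and the asset naming scheme are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataProduct:
    name: str
    data_assets: List[str] = field(default_factory=list)


def append_entity_type_asset(product: DataProduct, entity_type: str, append_enabled: bool) -> bool:
    """Add an asset for a CluedIn entity type to the data product.

    Returns True if an asset was added. Nothing is added when the toggle is
    off or when an asset for the same entity type is already listed
    (idempotency is an assumption of this sketch).
    """
    if not append_enabled:
        return False
    asset = f"CluedIn entity type: {entity_type}"  # hypothetical asset naming
    if asset in product.data_assets:
        return False
    product.data_assets.append(asset)
    return True
```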