Configure ADF pipeline with Copy data activity

This guide provides step-by-step instructions for configuring an Azure Data Factory (ADF) Copy data activity to send data to CluedIn using a private endpoint.

Use the Copy data activity if you do not need to make any data transformations before sending data to CluedIn. When data transformations such as aggregation, filtering, or applying complex business logic are required before sending data to CluedIn, use the Data flow activity instead.

Prerequisites 

  • Configure a private link service between ADF and CluedIn as described in Configure ADF with private link.

  • Make sure you have an Azure Data Lake Storage Gen2 account that contains some CSV data.

  • Create an ingestion endpoint and authorization token in CluedIn as described in Add ingestion endpoint.
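If you need test data for the walkthrough, any small CSV file in your Azure Data Lake container will do. A minimal sketch of such a file (the file name, columns, and values below are examples, not required by CluedIn):

```csv
Id,Name,Email
1,Jane Doe,jane.doe@example.com
2,John Smith,john.smith@example.com
```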

Configuring an ADF pipeline with the Copy data activity consists of five steps:

  1. Creating a new pipeline

  2. Configuring the source

  3. Configuring the sink

  4. Debugging the pipeline

  5. Validating the result in CluedIn

Create a new pipeline

  1. On the Azure Data Factory home page, select New > Pipeline.

  2. In the Activities pane, expand the Move and transform category, and then drag the Copy data activity to the pipeline canvas.

    copy-data-activity.png

Configure source

  1. Go to the Source tab. Select + New to create a source dataset.

  2. In the New dataset pane, find and select Azure Data Lake Storage Gen2, and then select Continue.

  3. In the Select format pane, select DelimitedText, and then select Continue.

  4. In the Set properties pane, enter the name for the dataset. Then, expand the Linked service dropdown list and select + New.

  5. In the New linked service pane, provide the following details:

    • Name – enter the name for your linked service.

    • Account selection method – select From Azure subscription.

    • Azure Subscription – select the subscription of your Azure Data Lake.

    • Storage account name – select the name of your Azure Data Lake storage account.

  6. Select Test connection, and then select Create.

    After the linked service is created, you’ll be taken back to the Set properties pane.

  7. In the File path section, add the path to the appropriate folder or file in your Azure Data Lake.

  8. Select OK.

    configure-source.png
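Behind the UI, the steps above produce a dataset definition that ADF stores as JSON. A sketch of what the resulting source dataset might look like is shown below; the dataset name, linked service name, file system, folder, and file name are all hypothetical placeholders for the values you entered:

```json
{
  "name": "SourceCsvDataset",
  "properties": {
    "linkedServiceName": {
      "referenceName": "AdlsGen2LinkedService",
      "type": "LinkedServiceReference"
    },
    "type": "DelimitedText",
    "typeProperties": {
      "location": {
        "type": "AzureBlobFSLocation",
        "fileSystem": "data",
        "folderPath": "input",
        "fileName": "customers.csv"
      },
      "columnDelimiter": ",",
      "firstRowAsHeader": true
    }
  }
}
```

You can view or edit this JSON in ADF Studio by opening the dataset and selecting the code (braces) button.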

Configure sink

  1. Go to the Sink tab. Select + New to create a sink dataset.

  2. In the New dataset pane, find and select REST, and then select Continue.

  3. In the Set properties pane, enter the name for the dataset. Then, expand the Linked service dropdown list and select + New.

  4. In the New linked service pane, provide the following details:

    • Name – enter the name for your linked service.

    • Base URL – enter the URL of the ingestion endpoint in CluedIn. For more information, see Send data.

    • Authentication type – select Anonymous.

    • Auth headers – add a new header with the following details:

      • Name – enter Authorization.

      • Value – enter Bearer, add a space, and then paste the token from CluedIn. For more information, see Send data.

    configure-sink-new-linked-service.png

  5. Select Test connection, and then select Create.

    After the linked service is created, you’ll be taken back to the Set properties pane.

  6. Select OK.

  7. In the Request method field, select POST.

    configure-sink-request-method.png
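To understand what the REST sink does with this configuration, it can help to see the equivalent HTTP request. The sketch below prints a curl command that mirrors the sink's behavior: a POST to the ingestion endpoint with the Bearer token in the Authorization header and a batch of rows as a JSON array. The endpoint URL, token, and payload are placeholder values; substitute your own from CluedIn. Remove the leading `echo` to actually send the request.

```shell
#!/bin/sh
# Placeholder values -- replace with your CluedIn ingestion endpoint URL
# and the authorization token you created in CluedIn.
ENDPOINT_URL="https://app.your-cluedin-domain.com/upload/api/endpoint/your-endpoint-id"
API_TOKEN="your-api-token"

# The REST sink sends each batch of source rows as a JSON array via POST.
# This prints the equivalent curl command; remove the leading 'echo' to run it.
echo curl -X POST "$ENDPOINT_URL" \
  -H "Authorization: Bearer $API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '[{"Id":"1","Name":"Jane Doe"}]'
```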

Debug pipeline

Once the source and sink are configured, you can debug the pipeline to ensure it is working correctly.

To debug the pipeline

  1. On the toolbar, select Debug.

  2. Monitor the status of the pipeline run on the Output tab at the bottom of the window.

    debug-pipeline.png

Validate result in CluedIn

Once the ADF pipeline is triggered successfully, you should see the data flowing into CluedIn. You can view the incoming records on the Preview tab of the data set.

after-debug.png