Azure Data Lake connector
This article outlines how to configure the Azure Data Lake connector to publish data from CluedIn to Azure Data Lake Storage Gen2.
Prerequisites: Make sure you use a service principal to authenticate and access Azure Data Lake.
To configure the Azure Data Lake connector
- On the navigation pane, go to Consume > Export Targets. Then, select Add Export Target.
- On the Choose Target tab, select Azure Data Lake Connector. Then, select Next.
- On the Configure tab, enter the connection details:
  - Name – user-friendly name of the export target that will be displayed on the Export Targets page in CluedIn.
  - AccountName – name of the Azure Data Lake storage account where you want to store the data from CluedIn.
  - AccountKey – access key for authenticating requests to the Azure Data Lake storage account.
  - FileSystemName – name of a container in Azure Data Lake.
  - DirectoryName – name of a directory inside the container in Azure Data Lake.
  - Enable Stream Cache (Sync mode only) – when stream cache is enabled, CluedIn caches the golden records at intervals, and then writes the accumulated data to one file (JSON, Parquet, or CSV). When stream cache is not enabled, CluedIn streams out golden records one by one, each in a separate file. Stream cache is available only for the synchronized stream mode.
  - Output Format – file format for the exported data: JSON, Parquet, or CSV. Parquet and CSV are available only when stream cache is enabled; if stream cache is not enabled, JSON is the only option. An example of reading back exported files is sketched at the end of this article.
  - Schedule – schedule for sending the data from CluedIn to the export target. You can choose between hourly, daily, and weekly intervals.
- Test the connection to make sure it works, and then select Add. If you want to verify the same storage details outside CluedIn, see the sketch below.
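If the connection test fails, it can help to confirm the storage details independently. The following Python sketch uses the azure-storage-file-datalake package to check that the account key, container (FileSystemName), and directory (DirectoryName) are reachable. The account, key, container, and directory values below are placeholders, not values from this article.

```python
# Sketch: verify AccountName, AccountKey, FileSystemName, and DirectoryName
# outside CluedIn. Requires: pip install azure-storage-file-datalake
# All values below are placeholders for your own storage details.
from azure.storage.filedatalake import DataLakeServiceClient

account_name = "mystorageaccount"       # AccountName (placeholder)
account_key = "<storage-account-key>"   # AccountKey (placeholder)
file_system_name = "cluedin-export"     # FileSystemName, i.e. the container (placeholder)
directory_name = "golden-records"       # DirectoryName (placeholder)

# Authenticate with the account key against the Data Lake (dfs) endpoint.
service = DataLakeServiceClient(
    account_url=f"https://{account_name}.dfs.core.windows.net",
    credential=account_key,
)

# Check that the container and the directory inside it exist.
file_system = service.get_file_system_client(file_system_name)
print("Container exists:", file_system.exists())

directory = file_system.get_directory_client(directory_name)
print("Directory exists:", directory.exists())
```

If both checks return True but the test in CluedIn still fails, recheck the account key for typos or expired rotation.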
Now, you can select the Azure Data Lake connector in a stream and start exporting golden records.
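After the stream has run, you can inspect what was written to the configured directory. The sketch below, again using the azure-storage-file-datalake package, lists the exported files and loads one of them into a DataFrame, assuming stream cache is enabled and the output format is Parquet. The account, container, and directory names are the same placeholders as above, and pandas with pyarrow is assumed to be installed.

```python
# Sketch: list and read back files exported by a CluedIn stream.
# Assumes stream cache is enabled and the output format is Parquet.
# Requires: pip install azure-storage-file-datalake pandas pyarrow
import io

import pandas as pd
from azure.storage.filedatalake import DataLakeServiceClient

service = DataLakeServiceClient(
    account_url="https://mystorageaccount.dfs.core.windows.net",  # placeholder
    credential="<storage-account-key>",                           # placeholder
)
file_system = service.get_file_system_client("cluedin-export")    # placeholder

# List the files the stream has written under the configured directory.
paths = [
    p.name
    for p in file_system.get_paths(path="golden-records")         # placeholder
    if not p.is_directory
]
print("\n".join(paths))

# Download one exported file and load it as a DataFrame (Parquet example).
file_client = file_system.get_file_client(paths[0])
data = file_client.download_file().readall()
df = pd.read_parquet(io.BytesIO(data))
print(df.head())
```

If stream cache is not enabled, each golden record lands in its own JSON file instead, so you would download and parse the individual JSON files rather than a single accumulated Parquet file.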