OneLake connector
This article outlines how to configure the OneLake connector to push data from CluedIn to Microsoft’s OneLake.
Prerequisites: Make sure you use a service principal to authenticate and access OneLake.
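Before you configure the connector, it can help to confirm that the service principal can obtain a token at all. The following is a minimal sketch, not part of CluedIn: it builds a standard OAuth 2.0 client-credentials request against the Microsoft identity platform. The function name `build_token_request` is hypothetical, and the Azure Storage scope is an assumption based on OneLake accepting storage-audience tokens; the scope CluedIn itself requests is not stated in this article.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: OneLake accepts tokens issued for the Azure Storage audience.
STORAGE_SCOPE = "https://storage.azure.com/.default"


def build_token_request(tenant_id: str, client_id: str, client_secret: str) -> Request:
    """Build (but do not send) a client-credentials token request.

    tenant_id, client_id, and client_secret correspond to the TenantID,
    ClientID, and ClientSecret values of the app registration.
    """
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urlencode(
        {
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
            "scope": STORAGE_SCOPE,
        }
    ).encode("ascii")
    # Supplying a body makes this a form-encoded POST request.
    return Request(url, data=body, method="POST")
```

To run the check, pass the returned request to `urllib.request.urlopen` and read the `access_token` field from the JSON response. An error response at this point usually indicates a wrong tenant, client ID, or secret rather than a problem in CluedIn.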
To configure the OneLake connector
-
On the navigation pane, go to Consume > Export Targets. Then, select Add Export Target.
-
On the Choose Target tab, select OneLake Connector. Then, select Next.
-
On the Configure tab, enter the connection details:
-
Name – user-friendly name of the export target, displayed on the Export Targets page in CluedIn.
-
WorkspaceName – name of the workspace where you want to store the data from CluedIn.
To find the workspace, sign in to Microsoft Fabric, and then select Workspaces from the left-hand menu. In the list of workspaces, find the workspace you need and select it.
-
ItemName – name of the data item within the workspace where you want to store the data from CluedIn.
-
ItemType – type of the data item within the workspace where you want to store the data from CluedIn (for example, Lakehouse).
-
ItemFolder – directory within a data item where you want to store the data from CluedIn (for example, Files/FirstLevel/SecondLevel).
-
ClientID – unique identifier assigned to the OneLake app when it was registered in the Microsoft identity platform. You can find this value in the Overview section of the app registration.
-
ClientSecret – confidential string used by your OneLake app to authenticate itself to the Microsoft identity platform. You can find this value in the Certificates & secrets section of the app registration.
-
TenantID – unique identifier for your Microsoft Entra tenant. You can find this value in the Overview section of the app registration.
-
Enable Stream Cache (Sync mode only) – when stream cache is enabled, CluedIn caches the golden records at intervals, and then writes the accumulated data to a single file (JSON, Parquet, or CSV). When stream cache is not enabled, CluedIn streams out golden records one by one, each in a separate file. Stream cache is available only for the synchronized stream mode.
-
Output Format – file format for the exported data: JSON, Parquet, or CSV. Parquet and CSV are available only if stream cache is enabled. If stream cache is not enabled, JSON is the only option.
-
Schedule – schedule for sending the data from CluedIn to the export target. You can choose between hourly, daily, and weekly intervals.
-
Test the connection to make sure it works, and then select Add.
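The dependency between Output Format and stream cache described in the connection details can be expressed as a small pre-check. This is an illustrative sketch only; the function names are hypothetical and are not part of CluedIn.

```python
def allowed_output_formats(stream_cache_enabled: bool) -> list[str]:
    """Return the output formats selectable for a given stream cache setting."""
    # Parquet and CSV require stream cache; JSON is always available.
    if stream_cache_enabled:
        return ["JSON", "Parquet", "CSV"]
    return ["JSON"]


def validate_output_format(output_format: str, stream_cache_enabled: bool) -> None:
    """Raise ValueError if the format is not valid for the stream cache setting."""
    allowed = allowed_output_formats(stream_cache_enabled)
    if output_format not in allowed:
        raise ValueError(
            f"Output format {output_format!r} is not available; "
            f"choose one of {allowed}."
        )
```

For example, `validate_output_format("Parquet", stream_cache_enabled=False)` raises a `ValueError`, mirroring the restriction in the Configure tab.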
Now, you can select the OneLake connector in a stream and start exporting golden records.
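To locate the exported files afterwards, it helps to see how WorkspaceName, ItemName, ItemType, and ItemFolder combine into a OneLake address. The sketch below follows the general OneLake URL convention of workspace, then item name and type, then path. Whether the connector composes its target path in exactly this way is an assumption, and the function name `onelake_folder_url` is hypothetical.

```python
from urllib.parse import quote

ONELAKE_HOST = "https://onelake.dfs.fabric.microsoft.com"


def onelake_folder_url(
    workspace_name: str, item_name: str, item_type: str, item_folder: str
) -> str:
    """Compose a OneLake folder URL from the connector settings."""
    # OneLake addresses an item as "<ItemName>.<ItemType>", e.g. "Exports.Lakehouse".
    item = f"{item_name}.{item_type}"
    folder = item_folder.strip("/")
    segments = [workspace_name, item]
    if folder:
        segments.append(folder)
    # Percent-encode spaces and special characters, keeping "/" inside the folder.
    path = "/".join(quote(segment, safe="/") for segment in segments)
    return f"{ONELAKE_HOST}/{path}"
```

With a workspace named `Sales Workspace`, an item `Exports` of type `Lakehouse`, and the folder `Files/FirstLevel/SecondLevel`, the function returns `https://onelake.dfs.fabric.microsoft.com/Sales%20Workspace/Exports.Lakehouse/Files/FirstLevel/SecondLevel`.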