Clean data
Cleaning the data in CluedIn involves finding the data that needs to be cleaned, creating a clean project, and making the necessary changes to the data.
In this article, you will learn how to manually clean the data that you have ingested into CluedIn and streamed to a Microsoft SQL Server database.
If you come across errors in the vocabulary key values or discover records with missing data, you can perform manual data cleaning in CluedIn. This process allows you to ensure the accuracy and completeness of your data set. CluedIn will automatically identify the changes and update the stream with the cleaned records.
Before proceeding with the data cleaning process, ensure that you have completed the following tasks:
Ingested some data into CluedIn. For more information, see Ingest data.
Created a stream that keeps the data synchronized between CluedIn and the Microsoft SQL Server database. For more information, see Stream data.
As a first step, you need to find the data in CluedIn that needs to be cleaned.
To find data
In the search field, select the search icon. Then, select Filter.
In the Entity Types dropdown list, select the entity type by which you want to filter the records.
As a result, all records with the selected entity type are displayed on the page. By default, the search results are shown in the following columns: Name, Entity Type, and Description.
To find the specific values that you want to fix, add the corresponding columns to the list of search results:
In the upper-right corner, select Column Options.
Select Add columns > Vocabulary.
In the search field, enter the name of the vocabulary and start the search. In the search results, select the needed vocabulary keys. Then, select Add Vocabulary Columns.
The columns are added to the search results page.
If you want a certain column to be the first one, you can move it by selecting the corresponding row and dragging it to the top position.
Turn on the advanced filter mode.
Configure a filter to display the records that need to be cleaned:
Select the property type (Vocabulary).
Select the vocabulary key.
Select the operation.
Select the value that needs to be fixed.
Note: The fields for configuring a filter appear one by one. After you complete the previous field, the next field appears.
The records that match the filter criteria are displayed in the search results.
Save the search. In the upper-right corner, select the Save current search button. Then, enter the name of the search and select Save.
To find the saved search later, select the search icon, and then select Saved Searches.
You have found the data that needs to be cleaned.
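The advanced filter narrows the records by entity type and then by a vocabulary key, an operation, and a value. The following Python sketch is only a rough mental model of that selection. It is not CluedIn's implementation or API. The record layout, the entity type `/Contact`, the vocabulary key `contact.jobTitle`, and the operation names `equals` and `is_empty` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical operations; the operations offered in CluedIn may differ.
OPERATIONS: dict[str, Callable[[object, object], bool]] = {
    "equals": lambda actual, expected: actual == expected,
    "is_empty": lambda actual, _expected: actual in (None, ""),
}


@dataclass
class VocabularyFilter:
    key: str            # hypothetical vocabulary key, e.g. "contact.jobTitle"
    operation: str      # one of the OPERATIONS keys
    value: object = None

    def matches(self, record: dict) -> bool:
        # A record without the key is treated as having an empty value.
        actual = record.get("properties", {}).get(self.key)
        return OPERATIONS[self.operation](actual, self.value)


def find_records(records, entity_type, flt):
    # First narrow by entity type, then apply the vocabulary filter.
    return [r for r in records if r["entityType"] == entity_type and flt.matches(r)]


records = [
    {"name": "A", "entityType": "/Contact", "properties": {"contact.jobTitle": "Mangaer"}},
    {"name": "B", "entityType": "/Contact", "properties": {"contact.jobTitle": "Manager"}},
    {"name": "C", "entityType": "/Contact", "properties": {}},
    {"name": "D", "entityType": "/Organization", "properties": {"contact.jobTitle": "Mangaer"}},
]

typo = find_records(records, "/Contact", VocabularyFilter("contact.jobTitle", "equals", "Mangaer"))
missing = find_records(records, "/Contact", VocabularyFilter("contact.jobTitle", "is_empty"))
print([r["name"] for r in typo])     # ['A']
print([r["name"] for r in missing])  # ['C']
```

In this model, a missing key counts as an empty value, which matches the idea of finding records with missing data. Whether CluedIn's own operations treat missing values the same way is not confirmed here.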
After you have found the data that needs to be cleaned, create a clean project.
To create a clean project
In the upper-right corner, select the ellipsis button, and then select Clean.
On the Create Clean Project pane, do the following:
On the Configure tab, enter the Clean Project Name. Then, in the lower-right corner, select Next.
On the Choose Vocabulary Keys pane, select the needed vocabulary keys. Then, in the lower-right corner, select Create.
You have created the clean project.
In the upper-right corner, select Generate Results. Then, confirm that you want to generate results for the clean project.
While the results are being generated, a progress bar appears in place of the Generate Results button. After the results have been generated, the status of the clean project changes to Ready for clean.
Now, you can proceed to modify the needed data records.
After you have generated the results of the clean project, make the needed changes to your data records.
To modify the data
In the upper-right corner of the clean project, select Clean.
A new tab opens. It contains the records that need to be modified.
Point to the value that needs to be modified, and then select Edit.
Enter the correct value.
Depending on whether you want to apply this change just to one record or to multiple similar records, do one of the following:
If you want to apply the change to one record, select Apply.
If you want to apply the change to multiple similar records, select Apply to All Identical Cells.
Go back to the tab with the clean project.
Notice that the Ready to process label now appears under the clean project name.
In the upper-right corner, select Process.
In the confirmation dialog box, clear the Enable rules auto generation checkbox. Then, confirm that you want to process the data.
After the data has been processed, you receive a notification, and the Processed label appears under the clean project name.
CluedIn automatically identifies the changes and updates the data set. Because the stream is in synchronized mode, the database table is also updated automatically.
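The two apply options in the editing step differ only in scope. The following Python sketch models them on a list of rows. It assumes that Apply to All Identical Cells updates every cell in the same column whose value exactly matches the original value of the edited cell. The function names and the `jobTitle` column are hypothetical, and the sketch does not call any CluedIn API.

```python
# Conceptual model only: each dict stands in for one record on the clean tab.

def apply_one(rows, row_index, column, new_value):
    """Model of 'Apply': change the value of a single cell."""
    rows[row_index][column] = new_value


def apply_to_all_identical(rows, row_index, column, new_value):
    """Model of 'Apply to All Identical Cells' (assumed semantics): change every
    cell in the same column whose value equals the edited cell's original value."""
    old_value = rows[row_index][column]
    for row in rows:
        if row[column] == old_value:
            row[column] = new_value


rows = [{"jobTitle": "Mangaer"}, {"jobTitle": "Mangaer"}, {"jobTitle": "Engineer"}]

single = [dict(r) for r in rows]
apply_one(single, 0, "jobTitle", "Manager")

bulk = [dict(r) for r in rows]
apply_to_all_identical(bulk, 0, "jobTitle", "Manager")

print([r["jobTitle"] for r in single])  # ['Manager', 'Mangaer', 'Engineer']
print([r["jobTitle"] for r in bulk])    # ['Manager', 'Manager', 'Engineer']
```

Working on copies (`single` and `bulk`) leaves the original `rows` unchanged, so the two results can be compared side by side.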
To verify that your changes have been applied:
In CluedIn: select the search button. Then, go to Saved Searches and select the search that you saved in step 6 of Find data.
No search results are displayed on the page.
In the database: run a query that selects the records that needed to be cleaned.
The query returns no rows.
You have cleaned your data.
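For the database check, any query that uses the same criteria as the saved search will work. The following Python sketch uses an in-memory SQLite database as a stand-in for the Microsoft SQL Server table. Against SQL Server, you would run the equivalent SELECT statement in your usual query tool. The table name `Contacts`, its columns, and the sample values are hypothetical, because the actual names depend on how your stream is configured. The UPDATE statements only simulate the stream writing the cleaned values.

```python
import sqlite3

# Hypothetical stand-in for the SQL Server table that the stream keeps in sync.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Contacts (Id INTEGER PRIMARY KEY, Name TEXT, JobTitle TEXT)")
conn.executemany(
    "INSERT INTO Contacts (Name, JobTitle) VALUES (?, ?)",
    [("A", "Mangaer"), ("B", "Manager"), ("C", None)],
)

# Same criteria as the saved search: a known wrong value, or a missing value.
CHECK_SQL = """
    SELECT Name FROM Contacts
    WHERE JobTitle = ? OR JobTitle IS NULL OR JobTitle = ''
    ORDER BY Name
"""


def records_still_needing_cleaning(conn, wrong_value):
    """Return the names of rows that still match the cleaning criteria."""
    return [row[0] for row in conn.execute(CHECK_SQL, (wrong_value,))]


print(records_still_needing_cleaning(conn, "Mangaer"))  # ['A', 'C']

# Simulate the stream writing the cleaned values to the table.
conn.execute("UPDATE Contacts SET JobTitle = 'Manager' WHERE Name = 'A'")
conn.execute("UPDATE Contacts SET JobTitle = 'Engineer' WHERE Name = 'C'")

print(records_still_needing_cleaning(conn, "Mangaer"))  # []
```

The wrong value is passed as a query parameter instead of being pasted into the SQL string, so the same check can be reused for other values.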
Note: All changes to the data records in CluedIn are tracked. To view all actions associated with a record, find the record and open its History pane.
You have performed manual data cleaning in CluedIn.