Clean data

On this page

  1. Find data
  2. Create a clean project
  3. Modify data in the clean application
  4. Results & next steps

Cleaning the data in CluedIn involves finding the data that needs to be cleaned, creating a clean project, and modifying the data in the clean application.

In this guide, you will learn how to manually clean the data that you have ingested into CluedIn.

Before you start: Make sure you have completed all steps in the Ingest data guide.

Context: This guide focuses on resolving a specific issue: a misspelled job title (Acountant instead of Accountant). Here, you'll find step-by-step instructions on how to correct this error.

If you come across errors in the record properties or discover records with missing data, you can perform manual data cleaning in CluedIn. This process allows you to ensure the accuracy and completeness of your data set. CluedIn will automatically identify the changes and update the stream with the cleaned records.

Useful links: Search, Filters

Find data

Finding the data that needs to be cleaned involves defining search filters and identifying the specific properties that contain incorrect values.

To find data

  1. In the search field, select the search icon. Then, select Filter.

  2. In the Entity Types dropdown list, select the entity type to filter the records.

    find-data-1.png

    As a result, all records with the selected entity type are displayed on the page. By default, the search results are shown in the following columns: Name, Entity Type, and Description.

  3. To find the specific values that you want to fix, add the corresponding column to the list of search results:

    1. In the upper-right corner, select Column Options.

    2. Select Add columns > Vocabulary.

    3. In the search field, enter the name of the vocabulary and start the search. In the search results, select the needed vocabulary key.

      find-data-2.png

    4. Select Add Vocabulary Columns.

  4. Turn on the advanced filter mode.

    find-data-3.png

  5. Add a filter rule to display the records containing values that need to be cleaned.

    find-data-4.png

    The fields for configuring a filter rule appear one at a time: each field appears after you complete the previous one.

  6. Select Search. The records that match the filter criteria are displayed on the search results page. After finding the records, save the search. This way you can quickly verify if the values have been cleaned.
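The effect of the entity type filter and the filter rule can be pictured with a small sketch. This is only a conceptual illustration in plain Python, not CluedIn's API: the record layout, the entity type `/Employee`, the vocabulary key `employee.jobTitle`, and both function names are assumptions made up for the example.

```python
# Conceptual sketch only: hypothetical in-memory records, not CluedIn data structures.
# The entity type "/Employee" and the key "employee.jobTitle" are assumed names.
records = [
    {"name": "Ann Lee", "entityType": "/Employee", "employee.jobTitle": "Acountant"},
    {"name": "Bo Chen", "entityType": "/Employee", "employee.jobTitle": "Accountant"},
    {"name": "Cy Park", "entityType": "/Employee", "employee.jobTitle": "Acountant"},
    {"name": "Acme Ltd", "entityType": "/Organization"},
]


def matches_rule(record, entity_type, vocabulary_key, value):
    """Return True if the record has the entity type and the exact property value."""
    return (
        record.get("entityType") == entity_type
        and record.get(vocabulary_key) == value
    )


def find_records_to_clean(records, entity_type, vocabulary_key, value):
    """Keep only the records that match the entity type filter and the filter rule."""
    return [
        r for r in records
        if matches_rule(r, entity_type, vocabulary_key, value)
    ]


to_clean = find_records_to_clean(
    records, "/Employee", "employee.jobTitle", "Acountant"
)
print([r["name"] for r in to_clean])
```

Saving the search corresponds to keeping the same filter arguments so that the query can be rerun after cleaning; an empty result then indicates that no misspelled values remain.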

Create a clean project

After you have found the data that needs to be cleaned, create a clean project.

To create a clean project

  1. In the upper-right corner of the search results page, select the ellipsis button, and then select Clean.

    create-a-clean-project-1.png

  2. Enter the Project Name and then select Next.

  3. Select the checkboxes next to the properties that do not require fixing, and then select Remove Property.

    create-a-clean-project-2.png

  4. Select Create. The clean project is created.

    create-a-clean-project-3.png

  5. Select Generate Results and then confirm your choice. When the results are generated, the status of the clean project changes to Ready for clean. Now, you can proceed to correct the misspelled values.

Modify data in the clean application

  1. In the upper-right corner of the clean project, select Clean.

    The clean application opens, where you can view the values that should be modified.

    modify-data-1.png

  2. Point to the value that needs to be modified, and then select Edit.

  3. Enter the correct value, and then select Apply to all identical cells.

    modify-data-2.png

  4. In the upper-right corner, select Process. In the confirmation dialog box, select Skip stale data and clear the Enable rules auto generation checkbox. Then, confirm that you want to process the data.

    modify-data-3.png

    CluedIn automatically identifies the changes and updates the records. To verify that your changes have been applied, retrieve the saved search.
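As a rough mental model of Apply to all identical cells, the sketch below replaces every exact occurrence of one value in a column and then repeats the original check to confirm that nothing is left to fix. It is a plain-Python illustration under assumed names (`apply_to_identical_cells`, the `employee.jobTitle` column, the sample rows), not how the clean application is implemented.

```python
# Conceptual sketch only: hypothetical rows and column name, not CluedIn internals.
def apply_to_identical_cells(rows, column, old_value, new_value):
    """Return updated copies of the rows and the number of cells that changed.

    Only cells whose value is exactly equal to old_value are replaced;
    the input rows are left unmodified.
    """
    updated = []
    changed = 0
    for row in rows:
        if row.get(column) == old_value:
            row = {**row, column: new_value}  # copy the row with the corrected value
            changed += 1
        updated.append(row)
    return updated, changed


rows = [
    {"name": "Ann Lee", "employee.jobTitle": "Acountant"},
    {"name": "Bo Chen", "employee.jobTitle": "Accountant"},
    {"name": "Cy Park", "employee.jobTitle": "Acountant"},
]

cleaned, changed = apply_to_identical_cells(
    rows, "employee.jobTitle", "Acountant", "Accountant"
)

# Equivalent of retrieving the saved search: look for the misspelled value again.
remaining = [r for r in cleaned if r["employee.jobTitle"] == "Acountant"]
print(changed, len(remaining))
```

Because the match is exact, variants such as a differently capitalized misspelling would not be touched in this sketch and would need their own edit.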

All changes to the records in CluedIn are tracked. To review the actions associated with a record, search for the record and view them on the History pane. For more information, see History.

Results & next steps

You have manually cleaned the data and corrected the misspelled values. By following the same steps, you can address other errors and inconsistencies in your data.

The next item on the list of common data management tasks is deduplication. Learn how to identify and merge duplicates in the Deduplicate data guide.