Data Metrics let you score data quality and monitor how it changes over time. Data Metrics are aggregated at many different levels, and you may need to control how that aggregation is done at each level.
At the core of each metric is a way to calculate a score from a set of data values. The score could be a percentage between 0 and 100, a simple integer, or something more advanced. Most of the out-of-the-box Data Quality metrics are percentages between 0 and 100. To date, we support:
Accuracy
Completeness
Relevance
Connectivity
To add a new metric, create a new C# class that inherits from Metric.
Some metrics, like Accuracy, are calculated all the way down to the property history, because we want to measure how all permutations of a value correlate with each other. In some cases the parent levels of a metric are simply an average or sum of the higher-precision scores beneath them. In other cases the score is calculated differently at each level.
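As a rough illustration of level-by-level aggregation, the Python sketch below rolls a set of child scores up into one parent score, with the aggregation rule passed in so that each level can use a different one. The function name `roll_up` and the sample scores are hypothetical and are not part of the CluedIn API.

```python
from statistics import mean

# Hypothetical property-level scores for one entity (illustrative values only).
property_scores = {"Property 1": 0.4, "Property 2": 0.6, "Property 3": 0.9}


def roll_up(scores, aggregate=mean):
    """Combine child-level scores into a single parent-level score."""
    return aggregate(list(scores))


# One level might simply average its children...
entity_score = roll_up(property_scores.values())
# ...while another level might apply a different rule, such as the minimum.
worst_case = roll_up(property_scores.values(), aggregate=min)

print(round(entity_score, 4), worst_case)
```

Passing the aggregation function as a parameter keeps the roll-up logic identical across levels while leaving the choice of average, sum, or a custom rule to each level.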
Here is an example of how the default Accuracy implementation is structured.
Accuracy
Measurement: For all properties, how many of the values in the property history are the same or similar. Simple normalization is used to cater for more values.
How to increase Accuracy: Run the generated mesh commands to change the values in the source systems, so that CluedIn shows all source systems holding the same value for a property.
CluedIn Clean: Manually edit records or use the cleaning functions to normalise the values.
CluedIn Train: Answer questions in the form of “Is Martin’s best email msh@cluedin.com?” or “Are these photos of the same person?”
Example of calculation
Property level
Property integration type level
Property integration level
Entity integration type level
Entity integration level
Entity level
Integration type level
Integration level
Global level
Property Level
Entity
  Branch A (SharePoint, Integration 1)
    Version 1
      Property 1: Old Value
      Property 3: Value
    Version 2 (HEAD)
      Property 2: My Value
  Branch B (SharePoint, Integration 2)
    Version 1 (HEAD)
      Property 1: Latest Value
      Property 2: My Value
  Branch C (LDAP, Integration 3)
    Version 1 (HEAD)
      Property 2: My Old Value
      Property 3: Value
  Branch D (LDAP, Integration 4)
    Version 1 (HEAD)
      Property 3: Value
Branch A Merged Record
  Property 1: Old Value
  Property 2: My Value
  Property 3: Value
Branch B Merged Record
  Property 1: Latest Value
  Property 2: My Value
Branch C Merged Record
  Property 2: My Old Value
  Property 3: Value
Branch D Merged Record
  Property 3: Value
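The merged records above can be reproduced by folding the versions of a branch together. The Python sketch below assumes versions are ordered oldest to newest and that a newer version overrides an older one for the same property. The example data contains no such conflict, so that rule is an assumption, and `merge_versions` is a hypothetical helper name.

```python
def merge_versions(versions):
    """Fold a branch's versions (oldest first, HEAD last) into one record.

    Assumption: when two versions set the same property, the newer one wins.
    """
    merged = {}
    for version in versions:
        merged.update(version)
    return merged


# Branch A from the example: Version 1, then Version 2 (HEAD).
branch_a = [
    {"Property 1": "Old Value", "Property 3": "Value"},
    {"Property 2": "My Value"},
]

print(merge_versions(branch_a))
```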
Property 1
Values from each branch merged record: “Old Value”, “Latest Value”, [Missing], [Missing]
Unique Values:
| Value | Count | Count at Head |
|---|---|---|
| Old Value | 1 | 0 |
| Latest Value | 1 | 1 |
| [Missing] | 2 | |
Number of branches: 4
Property 2
Values: “My Value”, “My Value”, “My Old Value”, [Missing]
Unique Values:
| Value | Count | Count at Head |
|---|---|---|
| My Value | 2 | 2 |
| My Old Value | 1 | 1 |
| [Missing] | 1 | |
Property 3
Values: “Value”, [Missing], “Value”, “Value”
Unique Values:
| Value | Count | Count at Head |
|---|---|---|
| Value | 3 | 2 |
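The counts in these tables can be derived directly from the list of values, one per branch. This is a minimal Python sketch; `MISSING` and `tally` are hypothetical names, with `None` standing in for a branch that has no value for the property.

```python
from collections import Counter

MISSING = None  # stand-in for a branch with no value for the property


def tally(values):
    """Summarise one property's values across branches."""
    populated = [v for v in values if v is not MISSING]
    counts = Counter(populated)
    return {
        "max": max(counts.values(), default=0),   # highest count of any single value
        "populated": len(populated),              # branches that hold a value
        "missing": len(values) - len(populated),  # branches without a value
        "branches": len(values),
    }


property_2 = ["My Value", "My Value", "My Old Value", MISSING]
print(tally(property_2))
# {'max': 2, 'populated': 3, 'missing': 1, 'branches': 4}
```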
Formula:
[Max Unique Value Count] / ([Sum of value counts] + ([Missing values] / [Branch Count]))
Where:
| Term | Meaning |
|---|---|
| max | Max Unique Value Count |
| populated | Sum of value counts (count of populated records) |
| missing | Number of records with a missing value |
| branches | Branch Count |
Alternative:
Min(1, ([Max Unique Value Count] + ([Count at head] / [Branch Count])) / ([Sum of value counts] + ([Missing values] / [Branch Count])))
Property 1
Formula: 1 / (2 + (2/4)) = 0.4
Alternative: Min(1, (1 + (1/4)) / (2 + (2/4))) = 0.5
Property 2
Formula: 2 / (3 + (1/4)) = 0.6154
Alternative: Min(1, (2 + (2/4)) / (3 + (1/4))) = 0.7692
Property 3
Formula: 3 / (3 + (1/4)) = 0.9231
Alternative: Min(1, (3 + (2/4)) / (3 + (1/4))) = 1.00
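Both formulas are short enough to write out as functions. The Python sketch below is for illustration only and is not the CluedIn implementation: the function and parameter names are hypothetical, and `head_count` is assumed to be the "Count at Head" of the most frequent value.

```python
def accuracy(max_count, populated, missing, branches):
    """Primary formula: max / (populated + missing / branches)."""
    return max_count / (populated + missing / branches)


def accuracy_alternative(max_count, head_count, populated, missing, branches):
    """Alternative formula: rewards values that are present at HEAD, capped at 1."""
    return min(1.0, (max_count + head_count / branches) / (populated + missing / branches))


# Primary formula for Properties 1 to 3 (4 branches).
print(round(accuracy(1, 2, 2, 4), 4))  # 0.4
print(round(accuracy(2, 3, 1, 4), 4))  # 0.6154
print(round(accuracy(3, 3, 1, 4), 4))  # 0.9231

# Alternative formula for Properties 2 and 3.
print(round(accuracy_alternative(2, 2, 3, 1, 4), 4))  # 0.7692
print(round(accuracy_alternative(3, 2, 3, 1, 4), 4))  # 1.0
```

Dividing the missing count by the branch count means a missing value lowers the score less than a conflicting value does.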
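Other Examples:
Branches: 10
| Max | Populated | Missing | Result |
|---|---|---|---|
| 1 | 10 | 0 | 0.1000 |
| 1 | 9 | 1 | 0.1099 |
| 1 | 8 | 2 | 0.1220 |
| 1 | 7 | 3 | 0.1370 |
| 1 | 6 | 4 | 0.1563 |
| 1 | 5 | 5 | 0.1818 |
| 1 | 4 | 6 | 0.2174 |
| 1 | 3 | 7 | 0.2703 |
| 1 | 2 | 8 | 0.3571 |
| 1 | 1 | 9 | 0.5263 |
| 2 | 10 | 0 | 0.2000 |
| 2 | 9 | 1 | 0.2198 |
| 2 | 8 | 2 | 0.2439 |
| 2 | 7 | 3 | 0.2740 |
| 2 | 6 | 4 | 0.3125 |
| 2 | 5 | 5 | 0.3636 |
| 2 | 4 | 6 | 0.4348 |
| 2 | 3 | 7 | 0.5405 |
| 2 | 2 | 8 | 0.7143 |
| 3 | 10 | 0 | 0.3000 |
| 3 | 9 | 1 | 0.3297 |
| 3 | 8 | 2 | 0.3659 |
| 3 | 7 | 3 | 0.4110 |
| 3 | 6 | 4 | 0.4688 |
| 3 | 5 | 5 | 0.5455 |
| 3 | 4 | 6 | 0.6522 |
| 3 | 3 | 7 | 0.8108 |
| 4 | 10 | 0 | 0.4000 |
| 4 | 9 | 1 | 0.4396 |
| 4 | 8 | 2 | 0.4878 |
| 4 | 7 | 3 | 0.5479 |
| 4 | 6 | 4 | 0.6250 |
| 4 | 5 | 5 | 0.7273 |
| 4 | 4 | 6 | 0.8696 |
| 5 | 10 | 0 | 0.5000 |
| 5 | 9 | 1 | 0.5495 |
| 5 | 8 | 2 | 0.6098 |
| 5 | 7 | 3 | 0.6849 |
| 5 | 6 | 4 | 0.7813 |
| 5 | 5 | 5 | 0.9091 |
| 6 | 10 | 0 | 0.6000 |
| 6 | 9 | 1 | 0.6593 |
| 6 | 8 | 2 | 0.7317 |
| 6 | 7 | 3 | 0.8219 |
| 6 | 6 | 4 | 0.9375 |
| 7 | 10 | 0 | 0.7000 |
| 7 | 9 | 1 | 0.7692 |
| 7 | 8 | 2 | 0.8537 |
| 7 | 7 | 3 | 0.9589 |
| 8 | 10 | 0 | 0.8000 |
| 8 | 9 | 1 | 0.8791 |
| 8 | 8 | 2 | 0.9756 |
| 9 | 10 | 0 | 0.9000 |
| 9 | 9 | 1 | 0.9890 |
| 10 | 10 | 0 | 1.0000 |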
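The table of examples can be regenerated by looping over every valid combination for 10 branches. This is a Python sketch of the primary formula with hypothetical names; it assumes Populated is never smaller than Max and that Missing is the number of branches minus Populated.

```python
def accuracy(max_count, populated, missing, branches):
    """Primary formula: max / (populated + missing / branches)."""
    return max_count / (populated + missing / branches)


BRANCHES = 10
rows = []
for max_count in range(1, BRANCHES + 1):
    # Populated runs from all branches down to the max count itself.
    for populated in range(BRANCHES, max_count - 1, -1):
        missing = BRANCHES - populated
        score = accuracy(max_count, populated, missing, BRANCHES)
        rows.append((max_count, populated, missing, round(score, 4)))

print(f"{'Max':>3} {'Populated':>9} {'Missing':>7} {'Result':>6}")
for max_count, populated, missing, score in rows:
    print(f"{max_count:>3} {populated:>9} {missing:>7} {score:.4f}")
```

For a fixed Max, the score rises as populated records are replaced by missing ones, because each missing record adds only a tenth of a populated record to the denominator.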
Property integration type level
Filter the version branches down to each integration type:
Branch A (SharePoint) Merged Record
  Property 1: Old Value
  Property 2: My Value
  Property 3: Value
Branch B (SharePoint) Merged Record
  Property 1: Latest Value
  Property 2: My Value
Branch C (LDAP) Merged Record
  Property 2: My Old Value
  Property 3: Value
Branch D (LDAP) Merged Record
  Property 3: Value
SharePoint Property 1
Values: “Old Value”, “Latest Value”
Unique Values:
| Value | Count | Count at Head |
|---|---|---|
| Old Value | 1 | 0 |
| Latest Value | 1 | 1 |
Formula: 1 / (2 + (0/2)) = 0.5
Alternative: Min(1, (1 + (1/2)) / (2 + (0/2))) = 0.75
SharePoint Property 2
Values: “My Value”, “My Value”
Unique Values:
| Value | Count | Count at Head |
|---|---|---|
| My Value | 2 | 2 |
Formula: 2 / (2 + (0/2)) = 1
Alternative: Min(1, (2 + (2/2)) / (2 + (0/2))) = 1
SharePoint Property 3
Values: “Value”, [Missing]
Unique Values:
| Value | Count | Count at Head |
|---|---|---|
| Value | 1 | 0 |
| [Missing] | 1 | |
Formula: 1 / (1 + (1/2)) = 0.6667
Alternative: Min(1, (1 + (0/2)) / (1 + (1/2))) = 0.6667
LDAP Property 2
Values: “My Old Value”, [Missing]
Unique Values:
| Value | Count | Count at Head |
|---|---|---|
| My Old Value | 1 | 1 |
| [Missing] | 1 | |
Formula: 1 / (1 + (1/2)) = 0.6667
Alternative: Min(1, (1 + (1/2)) / (1 + (1/2))) = 1
LDAP Property 3
Values: “Value”, “Value”
Unique Values:
| Value | Count | Count at Head |
|---|---|---|
| Value | 2 | 2 |
Formula: 2 / (2 + (0/2)) = 1
Alternative: Min(1, (2 + (2/2)) / (2 + (0/2))) = 1
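The integration type level applies the same primary formula, but only across the branches that share an integration type. The Python sketch below groups the merged records by type and scores one property per group. The names are hypothetical, and skipping a type whose branches never hold the property is an assumption made to match the cases listed above.

```python
from collections import Counter, defaultdict

MISSING = None  # stand-in for a branch with no value for the property

# (integration type, merged record) for Branches A to D.
branches = [
    ("SharePoint", {"Property 1": "Old Value", "Property 2": "My Value", "Property 3": "Value"}),
    ("SharePoint", {"Property 1": "Latest Value", "Property 2": "My Value"}),
    ("LDAP", {"Property 2": "My Old Value", "Property 3": "Value"}),
    ("LDAP", {"Property 3": "Value"}),
]


def accuracy_by_type(branches, prop):
    """Score one property separately for each integration type."""
    values_by_type = defaultdict(list)
    for integration_type, record in branches:
        values_by_type[integration_type].append(record.get(prop, MISSING))

    scores = {}
    for integration_type, values in values_by_type.items():
        populated = [v for v in values if v is not MISSING]
        if not populated:
            continue  # assumption: no score when the type never holds the property
        max_count = max(Counter(populated).values())
        missing = len(values) - len(populated)
        scores[integration_type] = max_count / (len(populated) + missing / len(values))
    return scores


for prop in ("Property 1", "Property 2", "Property 3"):
    print(prop, accuracy_by_type(branches, prop))
```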
Property integration level
Evaluate all branch versions that match the integration:
Branch A (SharePoint, Integration 1)
  Version 1
    Property 1: Old Value
    Property 3: Value
  Version 2 (HEAD)
    Property 2: My Value
Integration 1
Property 1
Values: “Old Value”, [Missing]
Unique Values:
| Value | Count | Count at Head |
|---|---|---|
| Old Value | 1 | 0 |
| [Missing] | 1 | |
Formula: 1 / (1 + (1/2)) = 0.6667
Alternative: Min(1, (1 + (0/2)) / (1 + (1/2))) = 0.6667
Property 2
Values: [Missing], “My Value”
Unique Values:
| Value | Count | Count at Head |
|---|---|---|
| My Value | 1 | 1 |
| [Missing] | 1 | |
Formula: 1 / (1 + (1/2)) = 0.6667
Alternative: Min(1, (1 + (1/2)) / (1 + (1/2))) = 1
Property 3
Values: “Value”, [Missing]
Unique Values:
| Value | Count | Count at Head |
|---|---|---|
| Value | 1 | 0 |
| [Missing] | 1 | |
Formula: 1 / (1 + (1/2)) = 0.6667
Alternative: Min(1, (1 + (0/2)) / (1 + (1/2))) = 0.6667
Integration 2
Property 1
Property 2
Integration 3
Property 2
Property 3
Integration 4
Property 3
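At the integration level, the values being compared are the versions within one branch rather than the merged records of several branches. The Python sketch below scores the Integration 1 properties with both formulas. The names are hypothetical, and it assumes the last version in the list is HEAD and that "Count at Head" is 1 when the most frequent value is the one present at HEAD.

```python
from collections import Counter

MISSING = None  # stand-in for a version that does not set the property

# Versions of Branch A (Integration 1), oldest first; the last entry is HEAD.
integration_1 = [
    {"Property 1": "Old Value", "Property 3": "Value"},
    {"Property 2": "My Value"},
]


def version_scores(versions, prop):
    """Return (primary, alternative) scores for one property, or None if never set."""
    values = [version.get(prop, MISSING) for version in versions]
    populated = [v for v in values if v is not MISSING]
    if not populated:
        return None
    top_value, max_count = Counter(populated).most_common(1)[0]
    head_count = 1 if values[-1] == top_value else 0
    total = len(values)
    denominator = len(populated) + (total - len(populated)) / total
    primary = max_count / denominator
    alternative = min(1.0, (max_count + head_count / total) / denominator)
    return primary, alternative


for prop in ("Property 1", "Property 2", "Property 3"):
    primary, alternative = version_scores(integration_1, prop)
    print(prop, round(primary, 4), round(alternative, 4))
```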
Why do my records show “Not enough data to show trend”?
This happens when the score for that specific metric (for example, Completeness) has not changed since it was first calculated for the record. CluedIn only records a new value when the score changes, so there is not yet a history to plot.
Unchanged scores are skipped partly because these metrics can take up a lot of storage when run over huge amounts of data.
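The effect of recording only changed scores can be shown with a small Python sketch. The helper names are hypothetical, and the rule that a trend needs at least two recorded points is an assumption used for illustration, not a documented CluedIn threshold.

```python
def record_score(history, score):
    """Append the score only if it differs from the most recent recorded value."""
    if history and history[-1] == score:
        return False  # unchanged, so nothing is stored
    history.append(score)
    return True


def has_trend(history):
    """Assumption: a trend needs at least two recorded points."""
    return len(history) >= 2


history = []
record_score(history, 0.8)   # first calculation is stored
record_score(history, 0.8)   # same score again is skipped
print(has_trend(history))    # False: only one point recorded
record_score(history, 0.9)   # the score changed, so it is stored
print(has_trend(history))    # True
```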