Record Similarity
https://analytics.babelstreet.com/rest/v1/record-similarity
https://raw.githubusercontent.com/rosette-api/curl-examples/develop/examples/record_similarity.curl
Record Similarity compares two lists of records and returns a similarity score for each pair of records, where each record is made up of fields of the following entity types: PERSON, ORGANIZATION, ADDRESS, DATE, or IDENTIFIER/TEXT. The records do not have to contain the same fields; only fields with the same field name are compared. If one record has three fields and the other has two fields, the missing field will be ignored and the other two fields compared.
Field Weighting
When determining match scores, some fields may be more important than others. For example, you may decide the name is more important than the address in determining if the two records match. To accomplish this, the fields can be weighted, where a field's weight represents the magnitude of its impact on the final match score. If no weights are provided, the weight is distributed equally among all fields. If field weights are used, they must be used for all fields.
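The two weighting rules above (equal distribution when no weights are given, all-or-nothing when any are given) can be checked on the client before a request is sent. The following is a minimal sketch, not the service's own logic; the helper name `resolve_weights` is hypothetical, and how the service applies weights internally is not shown here.

```python
def resolve_weights(fields):
    """Return a {field_name: weight} map for a `fields` definition.

    Hypothetical client-side helper mirroring the documented rules:
    - no field has a weight -> every field gets an equal share (1 / n)
    - every field has one   -> weights are returned as given
    - only some fields do   -> rejected, because weights are all-or-nothing
    """
    if not fields:
        raise ValueError("at least one field is required")
    weighted = [name for name, spec in fields.items() if "weight" in spec]
    if not weighted:
        share = 1.0 / len(fields)
        return {name: share for name in fields}
    if len(weighted) != len(fields):
        missing = sorted(set(fields) - set(weighted))
        raise ValueError(f"fields without a weight: {missing}")
    return {name: float(spec["weight"]) for name, spec in fields.items()}


# Two unweighted fields share the weight equally.
unweighted = {"primaryName": {"type": "rni_name"}, "dob": {"type": "rni_date"}}
print(resolve_weights(unweighted))  # {'primaryName': 0.5, 'dob': 0.5}
```

Rejecting a partially weighted definition locally avoids a round trip that would otherwise end in an error.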
Request
The request is a combination of three JSON objects:
Fields: A required field whose value is the mapping information for each record. There must be a minimum of 1 field.
Properties: An optional field whose value specifies certain properties for the request.
Records: A required field that holds the "left" and "right" arrays of records to be compared. Each left record is compared to the associated right record.
A record is a collection of name, address, and/or date objects.
Field | Type | Description | Required |
---|---|---|---|
name | string | Field name defined by the user. | yes |
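type | | The type of the field. Fields must be one of the supported types; the example request in this section uses rni_name, rni_date, and rni_address. | yes |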
score_if_null | number | A value between 0 and 1 to use as the score for an individual field in record matching when it is missing from a record. | no |
weight | number | Value indicating the weighting of the field in the similarity score calculation. | no. If one field has a weight, all fields must have weights. |
Field | Type | Description | Required |
---|---|---|---|
threshold | number | Only results with a similarity score greater than the threshold are returned. The default value is 0.0, in which case all results are returned. | no |
includeExplainInfo | boolean | True if the response should contain the full details of how the records were matched. False if the response should not contain the detail section. | yes |
parameters | map | A map of string parameter names to string parameter values. | no |
parameterUniverse | string | The string name of a universe already specified in the | no. Available in Server and all non-managed plugin instances. |
You can provide either a map of parameters or the name of a parameter universe, but not both. If both are provided, an error is returned. Neither is required.
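A small pre-flight check can catch the parameters/parameterUniverse conflict before the request leaves the client. This is only a sketch; `check_properties` is a hypothetical helper, and the parameter and universe names in the usage lines are placeholders rather than real option names.

```python
def check_properties(properties):
    """Raise ValueError if both `parameters` and `parameterUniverse` are set.

    Hypothetical client-side check; the service reports the same
    condition as an error when it receives such a request.
    """
    if "parameters" in properties and "parameterUniverse" in properties:
        raise ValueError(
            "provide either 'parameters' or 'parameterUniverse', not both"
        )
    return properties


# Valid: neither option is set (both are optional).
check_properties({"threshold": 0.7, "includeExplainInfo": True})

# Invalid: both are set. The names below are placeholders for illustration.
try:
    check_properties({
        "parameters": {"someParameter": "someValue"},
        "parameterUniverse": "someUniverse",
    })
except ValueError as exc:
    print(exc)
```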
The endpoint compares a left record to a right record. The left and right arrays do not have to contain the same fields, but only matching field names are compared.
Field | Type | Description | Required |
---|---|---|---|
left | array | One list of records to be compared. Must contain at least 1 entry and at most 10 entries. The number of entries must match the number of entries in the right array. | yes |
right | array | The other list of records to be compared. Must contain at least 1 entry and at most 10 entries. The number of entries must match the number of entries in the left array. | yes |
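The size constraints in the table (1 to 10 entries per array, equal lengths) and the positional pairing of left and right records can be sketched as follows. `validate_records` and `MAX_RECORDS` are hypothetical names used only for illustration.

```python
MAX_RECORDS = 10  # documented upper bound for each array


def validate_records(records):
    """Check the documented size rules and return the (left, right) pairs.

    Hypothetical helper: each left record is paired with the right
    record at the same index, which is how the endpoint compares them.
    """
    left = records.get("left", [])
    right = records.get("right", [])
    for side, entries in (("left", left), ("right", right)):
        if not 1 <= len(entries) <= MAX_RECORDS:
            raise ValueError(
                f"'{side}' must contain 1 to {MAX_RECORDS} records, "
                f"got {len(entries)}"
            )
    if len(left) != len(right):
        raise ValueError(
            f"'left' has {len(left)} records but 'right' has {len(right)}"
        )
    return list(zip(left, right))


pairs = validate_records({
    "left": [{"primaryName": {"text": "Ethan R"}}],
    "right": [{"primaryName": {"text": "Seth R"}}],
})
print(len(pairs))  # 1
```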
The following fields define the values in the left and right array. Each array is made of a combination of name, address, and date fields where each field has a field name.
Field Type | Field | Type | Description | Required |
---|---|---|---|---|
Name | field name | string | The name of the field. It must match a value defined in the fields array. | yes |
Name | text | string | The value of the field. | required for name fields |
Name | language | string | Three-letter ISO 639-3 language code | no (but strongly recommended if source language is known) |
Name | entityType | string | The type of name being matched. The most common ones are | no. If not specified, the type |
Name | languageOfOrigin | string | Three-letter ISO 639-3 language code | no |
Name | script | string | Four-letter ISO 15924 script code | no |
Date | field name | string | The name of the field. It must match a value defined in the fields array. | yes |
Date | date | string | The value of the date field | required for date fields |
Date | format | string | The format of entered dates. Follows these rules. | no |
Address[a] | field name | string | The name of the field. It must match a value defined in the fields array. | yes |
Address[a] | address | string | The value of the address field | required for address fields |

[a] Fielded addresses are also supported, as described in the Address Similarity endpoint.
{
  "fields": {
    "primaryName": { "type": "rni_name", "weight": 0.0 },
    "dob": { "type": "rni_date", "weight": 0.2 },
    "addr": { "type": "rni_address", "weight": 0.5 }
  },
  "properties": {
    "threshold": 0.7,
    "includeExplainInfo": true
  },
  "records": {
    "left": [
      {
        "primaryName": {
          "text": "Ethan R",
          "language": "eng",
          "entityType": "PERSON",
          "languageOfOrigin": "eng",
          "script": "Latn"
        },
        "dob": { "date": "1993-04-16" }
      },
      {
        "primaryName": { "text": "Evan R" },
        "dob": "1993-04-16"
      }
    ],
    "right": [
      {
        "primaryName": { "text": "Seth R" },
        "dob": "1993-04-16"
      },
      {
        "primaryName": { "text": "Ivan R" },
        "dob": "1993-04-16",
        "addr": "123 Roadlane Ave"
      }
    ]
  }
}
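A request of this shape can be assembled with the Python standard library. The sketch below only builds the request object and does not send it. The endpoint URL comes from this page; the `X-BabelStreetAPI-Key` header name and the helper `build_request` are assumptions made for illustration, so confirm the authentication header against your account documentation.

```python
import json
import urllib.request

ENDPOINT = "https://analytics.babelstreet.com/rest/v1/record-similarity"


def build_request(body, api_key):
    """Build (but do not send) a POST request carrying `body` as JSON."""
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            # Assumed header name for the API key; verify before use.
            "X-BabelStreetAPI-Key": api_key,
        },
    )


# A reduced request: two unweighted fields and one record pair.
body = {
    "fields": {
        "primaryName": {"type": "rni_name"},
        "dob": {"type": "rni_date"},
    },
    "properties": {"threshold": 0.7, "includeExplainInfo": True},
    "records": {
        "left": [{"primaryName": {"text": "Ethan R"}, "dob": {"date": "1993-04-16"}}],
        "right": [{"primaryName": {"text": "Seth R"}, "dob": {"date": "1993-04-16"}}],
    },
}
request = build_request(body, "YOUR_API_KEY")

# To send it:
#   with urllib.request.urlopen(request) as resp:
#       result = json.load(resp)
```

Keeping construction separate from sending makes the payload easy to inspect or log before any network call is made.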
Response
Similarity scores range from 0 to 1. The higher the score, the greater the confidence that this is a relevant match. A score of 1.0 indicates that the two values are identical. The score is a relative indication of how similar the records are; it is not an absolute value.
If threshold was provided, only records with a score higher than the threshold value are returned.
If includeExplainInfo is true, the response includes information about how the fields were scored.
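Reading a response of this shape might look like the sketch below. It collects each result's overall score and, when explain information is present, the per-field final scores. `summarize` is a hypothetical helper, the optional client-side `threshold` argument is an illustration rather than part of the API, and the sample values are taken from the example response in this section.

```python
def summarize(response, threshold=0.0):
    """Return [(score, {field: finalScore})] for results above `threshold`.

    Hypothetical helper. Results without a score (for example, ones that
    only carry an error) are skipped.
    """
    rows = []
    for result in response.get("results", []):
        score = result.get("score")
        if score is None or score <= threshold:
            continue
        scored = result.get("explainInfo", {}).get("scoredFields", {})
        per_field = {name: info["finalScore"] for name, info in scored.items()}
        rows.append((score, per_field))
    return rows


sample = {
    "results": [
        {
            "score": 0.87,
            "explainInfo": {
                "scoredFields": {
                    "primaryName": {"rawScore": 0.99, "finalScore": 0.85},
                    "dob": {"rawScore": 0.8, "finalScore": 0.74},
                },
                "leftOnlyFields": ["addr"],
                "rightOnlyFields": [],
            },
        }
    ]
}
print(summarize(sample, threshold=0.7))
# [(0.87, {'primaryName': 0.85, 'dob': 0.74})]
```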
{
  "results": [
    {
      "score": 0.87,
      "left": {
        "primaryName": {
          "text": "Ethan R",
          "language": "eng",
          "entityType": "PERSON",
          "languageOfOrigin": "eng",
          "script": "Latn"
        },
        "dob": { "date": "1993-04-16" },
        "addr": { "address": "123 Roadlane Ave" }
      },
      "right": {
        "primaryName": { "text": "Seth R" },
        "dob": "1993-04-16"
      },
      "explainInfo": {
        "scoredFields": {
          "primaryName": {
            "weight": 0.5,
            "calculatedWeight": 0.7142857142857143,
            "rawScore": 0.99,
            "finalScore": 0.85,
            "details": "any details"
          },
          "dob": {
            "weight": 0.5,
            "calculatedWeight": 0.2857142857142857,
            "rawScore": 0.8,
            "finalScore": 0.74
          }
        },
        "leftOnlyFields": ["addr"],
        "rightOnlyFields": []
      },
      "error": "string"
    }
  ],
  "errorMessage": "string"
}
Supported languages
GET /record-similarity/supported-languages
Retrieve the language pairs supported by the record-similarity endpoint. The endpoint supports matching between the source and target of each pair. The language, script, and transliteration scheme are listed for each source and target.
Response
Field | Type | Description |
---|---|---|
transliterationScheme | string | |
script | string | Four-letter ISO-15924 script code |
language | string | ISO 639 language code |
licensed | boolean | Indicates if you are licensed for this language |
{
  "supportedLanguagePairs": [
    {
      "source": {
        "transliterationScheme": "string",
        "script": "string",
        "language": "string"
      },
      "target": {
        "transliterationScheme": "string",
        "script": "string",
        "language": "string"
      },
      "licensed": true
    }
  ]
}
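Once retrieved, the response can be filtered down to the pairs you are licensed for. The sketch below assumes each pair object holds `source`, `target`, and `licensed` side by side; `licensed_pairs` is a hypothetical helper, and the sample language, script, and scheme values are made up for illustration and are not real service output.

```python
def licensed_pairs(response):
    """Return [((src_language, src_script), (tgt_language, tgt_script))]
    for every pair flagged as licensed. Hypothetical helper."""
    pairs = []
    for pair in response.get("supportedLanguagePairs", []):
        if not pair.get("licensed"):
            continue
        src, tgt = pair["source"], pair["target"]
        pairs.append(
            ((src["language"], src["script"]), (tgt["language"], tgt["script"]))
        )
    return pairs


# Made-up sample data in the documented shape.
sample_languages = {
    "supportedLanguagePairs": [
        {
            "source": {"transliterationScheme": "example", "script": "Latn", "language": "eng"},
            "target": {"transliterationScheme": "example", "script": "Latn", "language": "eng"},
            "licensed": True,
        },
        {
            "source": {"transliterationScheme": "example", "script": "Arab", "language": "ara"},
            "target": {"transliterationScheme": "example", "script": "Latn", "language": "eng"},
            "licensed": False,
        },
    ]
}
print(licensed_pairs(sample_languages))
# [(('eng', 'Latn'), ('eng', 'Latn'))]
```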