
Babel Street Analytics API

Record Similarity

https://analytics.babelstreet.com/rest/v1/record-similarity 

https://raw.githubusercontent.com/rosette-api/curl-examples/develop/examples/record_similarity.curl

Record Similarity compares two lists of records and returns a similarity score for each pair of records. Each record is made up of fields of the following entity types: PERSON, ORGANIZATION, ADDRESS, DATE, or IDENTIFIER/TEXT. The records do not have to contain the same fields; only fields with the same field name are compared. For example, if one record has three fields and the other has only two of them, the two shared fields are compared and the unmatched field is ignored.
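The name-matching rule can be sketched in Python as a set operation on field names. This only illustrates the rule described above and is not the service's implementation; `split_fields` is a hypothetical helper name, and the sample records are borrowed from the examples in this section.

```python
def split_fields(left: dict, right: dict) -> tuple[list[str], list[str], list[str]]:
    """Return (compared, left_only, right_only) field names for one record pair."""
    compared = sorted(left.keys() & right.keys())     # present in both records
    left_only = sorted(left.keys() - right.keys())    # ignored: missing on the right
    right_only = sorted(right.keys() - left.keys())   # ignored: missing on the left
    return compared, left_only, right_only


left = {
    "primaryName": {"text": "Ethan R"},
    "dob": "1993-04-16",
    "addr": "123 Roadlane Ave",
}
right = {"primaryName": {"text": "Seth R"}, "dob": "1993-04-16"}

compared, left_only, right_only = split_fields(left, right)
print(compared, left_only, right_only)
```

Here only `dob` and `primaryName` would be compared, and `addr` would be reported as a left-only field.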

Field Weighting 

When determining match scores, some fields may be more important than others. For example, you may decide the name is more important than the address in determining if the two records match. To accomplish this, the fields can be weighted, where a field's weight represents the magnitude of its impact on the final match score. If no weights are provided, the weight is distributed equally among all fields. If field weights are used, they must be used for all fields.
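This page does not state how weights are combined, so the following Python sketch rests on an assumption: that the weights of the fields actually compared are rescaled to sum to 1. The `calculatedWeight` values in the response example later in this section (0.714... and 0.285...) are consistent with that reading for weights of 0.5 and 0.2. `normalize_weights` is a hypothetical helper, not part of the API.

```python
import math


def normalize_weights(weights: dict[str, float], compared: list[str]) -> dict[str, float]:
    """Rescale the weights of the compared fields so they sum to 1 (assumed behavior)."""
    total = sum(weights[name] for name in compared)
    if total <= 0:
        raise ValueError("compared fields must have a positive total weight")
    return {name: weights[name] / total for name in compared}


# Illustrative configured weights; "addr" is absent from one record, so it is not compared.
configured = {"primaryName": 0.5, "dob": 0.2, "addr": 0.5}
calculated = normalize_weights(configured, ["primaryName", "dob"])

for name, value in calculated.items():
    print(f"{name}: {value:.4f}")
```

Under this assumption, dropping an uncompared field shifts its share of the weight proportionally onto the remaining fields.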

Request

The request is a combination of JSON objects:

  • Fields: A required field whose value is the mapping information for each record. There must be a minimum of 1 field.

  • Properties: An optional field whose value specifies certain properties for the request.

  • Records: A required field that holds the "left" and "right" arrays of records to be compared. Each left record is compared to the associated right record.


A record is a collection of name, address, and/or date objects.

Table 5. Fields

| Field | Type | Description | Required |
|---|---|---|---|
| name | string | Field name defined by the user. | yes |
| type | rni_name, rni_address, or rni_date | Fields must be one of these types. | yes |
| score_if_null | number | A value between 0 and 1 to use as the score for an individual field in record matching when it is missing from a record. | no |
| weight | number | Value indicating weighting of the field in similarity score calculation. | no. If one field has a weight, all fields must have weights. |
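The rules in Table 5 can be checked on the client before a request is sent. The Python sketch below assumes the `fields` object is keyed by field name, as in the request example in this section; `validate_fields` is a hypothetical helper and does not mirror any server-side validation.

```python
ALLOWED_TYPES = {"rni_name", "rni_address", "rni_date"}


def validate_fields(fields: dict) -> None:
    """Raise ValueError if a `fields` mapping breaks the rules in Table 5."""
    if not fields:
        raise ValueError("at least one field is required")

    # Weights are all-or-nothing: either every field has one or none does.
    weighted = [name for name, spec in fields.items() if "weight" in spec]
    if weighted and len(weighted) != len(fields):
        raise ValueError("if one field has a weight, all fields must have weights")

    for name, spec in fields.items():
        if spec.get("type") not in ALLOWED_TYPES:
            raise ValueError(f"{name}: type must be one of {sorted(ALLOWED_TYPES)}")
        score = spec.get("score_if_null")
        if score is not None and not 0 <= score <= 1:
            raise ValueError(f"{name}: score_if_null must be between 0 and 1")


# Passes silently: both fields are typed and both carry a weight.
validate_fields({
    "primaryName": {"type": "rni_name", "weight": 0.5},
    "dob": {"type": "rni_date", "weight": 0.2},
})
```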



Table 6. Properties

| Field | Type | Description | Required |
|---|---|---|---|
| threshold | number | Only results with a similarity score greater than the threshold are returned. The default value is 0.0; all results are returned. | no |
| includeExplainInfo | boolean | True if the response should contain the full details of how the records were matched. False if the response should not contain the detail section. | yes |
| parameters | map | A map of string parameter names to string parameter values. | no |
| parameterUniverse | string | The string name of a universe already specified in the parameter_profiles.yaml or internal_param_profiles.yaml file. | no. Available in Server and all non-managed plugin instances. |



You can provide either a map of parameters or the name of a parameter universe, but not both. If both are provided, an error is returned. Neither is required.
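A minimal Python sketch of assembling the `properties` object while enforcing the either-or rule on the client side. `build_properties` is a hypothetical helper, and the parameter name in the usage line is a placeholder rather than a real parameter.

```python
from typing import Optional


def build_properties(
    include_explain_info: bool,
    threshold: float = 0.0,
    parameters: Optional[dict] = None,
    parameter_universe: Optional[str] = None,
) -> dict:
    """Build a `properties` object; reject parameters and parameterUniverse together."""
    if parameters is not None and parameter_universe is not None:
        raise ValueError("provide either parameters or parameterUniverse, not both")

    props = {"threshold": threshold, "includeExplainInfo": include_explain_info}
    if parameters is not None:
        # Table 6 describes a map of string names to string values.
        props["parameters"] = {str(k): str(v) for k, v in parameters.items()}
    if parameter_universe is not None:
        props["parameterUniverse"] = parameter_universe
    return props


# "someParameter" is a placeholder name used only for illustration.
props = build_properties(True, threshold=0.7, parameters={"someParameter": 1})
print(props)
```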

The endpoint compares a left record to a right record. The left and right arrays do not have to contain the same fields, but only matching field names are compared.

Table 7. Records

| Field | Type | Description | Required |
|---|---|---|---|
| left | array | One of the two arrays of records to be compared. Must contain at least 1 entry, maximum of 10 entries. The number of entries must match the number of entries in the right array. | yes |
| right | array | The other array of records to be compared. Must contain at least 1 entry, maximum of 10 entries. The number of entries must match the number of entries in the left array. | yes |
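The size constraints in Table 7 can be sketched as a client-side check in Python. `validate_records` and `paired` are hypothetical helpers; pairing records by position is an assumption drawn from the statement that each left record is compared to its associated right record.

```python
MAX_RECORDS = 10  # upper limit per array, from Table 7


def validate_records(records: dict) -> None:
    """Raise ValueError if `records` breaks the left/right size rules."""
    left, right = records.get("left"), records.get("right")
    for side, entries in (("left", left), ("right", right)):
        if not isinstance(entries, list) or not 1 <= len(entries) <= MAX_RECORDS:
            raise ValueError(f"{side} must be an array of 1 to {MAX_RECORDS} records")
    if len(left) != len(right):
        raise ValueError("left and right must contain the same number of records")


def paired(records: dict) -> list[tuple[dict, dict]]:
    """Pair records by position (assumed meaning of 'associated' record)."""
    validate_records(records)
    return list(zip(records["left"], records["right"]))


records = {
    "left": [{"primaryName": {"text": "Ethan R"}}, {"primaryName": {"text": "Evan R"}}],
    "right": [{"primaryName": {"text": "Seth R"}}, {"primaryName": {"text": "Ivan R"}}],
}
pairs = paired(records)
print(len(pairs))
```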



The following fields define the values in the left and right array. Each array is made of a combination of name, address, and date fields where each field has a field name.

Table 8. Arrays

| Field Type | Field | Type | Description | Required |
|---|---|---|---|---|
| Name | field name | string | The name of the field. It must match a value defined in the fields array. | yes |
| Name | text | string | The value of the field. | required for name fields |
| Name | language | string | Three-letter ISO 639-3 language code | no (but strongly recommended if source language is known) |
| Name | entityType | string | The type of name being matched. The most common ones are PERSON (default), LOCATION, and ORGANIZATION. Additional types of identifiers can also be matched. | no. If not specified, the type PERSON will be used. |
| Name | languageOfOrigin | string | Three-letter ISO 639-3 language code | no |
| Name | script | string | Four-letter ISO-15924 script code | no |
| Date | field name | string | The name of the field. It must match a value defined in the fields array. | yes |
| Date | date | string | The value of the date field | required for date fields |
| Date | format | string | The format of entered dates. Follows these rules. | no |
| Address[a] | field name | string | The name of the field. It must match a value defined in the fields array. | yes |
| Address[a] | address | string | The value of the address field | required for address fields |

[a] Fielded addresses are also supported as described in the Address Similarity endpoint



{
  "fields": {
    "primaryName": {
      "type": "rni_name",
      "weight": 0.5
    },
    "dob": {
      "type": "rni_date",
      "weight": 0.2
    },
    "addr": {
      "type": "rni_address",
      "weight": 0.5
    }
  },
  "properties": {                
    "threshold": 0.7,             
    "includeExplainInfo": true    
  },
  "records": {                   
    "left": [                    
      {
        "primaryName": {          
          "text": "Ethan R",         
          "language": "eng",        
          "entityType": "PERSON",    
          "languageOfOrigin":"eng",  
          "script": "Latn"           
        },
        "dob": {
          "date" : "1993-04-16"     
        }
      },
      {
        "primaryName": {
          "text": "Evan R"
        },
        "dob": "1993-04-16"
      }
    ],
    "right": [                  
      {
        "primaryName": {
          "text": "Seth R"
        },
        "dob": "1993-04-16"
      },
      {
        "primaryName": {
          "text": "Ivan R"
        },
        "dob": "1993-04-16",
        "addr": "123 Roadlane Ave"
      }
    ]
  }
}
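A request like the one above could be sent from Python with the standard library. This is a sketch under assumptions: the `X-BabelStreetAPI-Key` header name and the `YOUR_API_KEY` placeholder are not taken from this page, so confirm how your deployment authenticates. `build_request` is a hypothetical helper that only constructs the request; the final comment shows how it would be sent.

```python
import json
import urllib.request

URL = "https://analytics.babelstreet.com/rest/v1/record-similarity"


def build_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Wrap a record-similarity payload in a JSON POST request (not yet sent)."""
    headers = {
        "Content-Type": "application/json",
        "Accept": "application/json",
        "X-BabelStreetAPI-Key": api_key,  # assumed header name; verify for your account
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(URL, data=body, headers=headers, method="POST")


# A trimmed version of the request shown above, with one record pair.
payload = {
    "fields": {
        "primaryName": {"type": "rni_name", "weight": 0.5},
        "dob": {"type": "rni_date", "weight": 0.2},
    },
    "properties": {"threshold": 0.7, "includeExplainInfo": True},
    "records": {
        "left": [{"primaryName": {"text": "Ethan R", "language": "eng"}, "dob": "1993-04-16"}],
        "right": [{"primaryName": {"text": "Seth R"}, "dob": "1993-04-16"}],
    },
}

request = build_request(payload, "YOUR_API_KEY")

# To send it:
#   with urllib.request.urlopen(request) as resp:
#       result = json.load(resp)
```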

Response

Similarity scores range from 0 to 1. The higher the score, the greater the confidence that this is a relevant match. A score of 1.0 indicates that the two values are identical. The score is a relative indication of how similar the records are; it is not an absolute value.

If threshold was provided, only records which had a score higher than the threshold value are returned.

If includeExplainInfo is true, the response includes information about how the fields were scored.

{
  "results": [
    {
      "score": 0.87,
      "left": {
        "primaryName": {
          "text": "Ethan R",
          "language": "eng",
          "entityType": "PERSON",
          "languageOfOrigin": "eng",
          "script": "Latn"
        },
        "dob": {
          "date": "1993-04-16"
        },
        "addr": {
          "address": "123 Roadlane Ave"
        }
      },
      "right": {
        "primaryName": {
          "text": "Seth R"
        },
        "dob": "1993-04-16"
      },
      "explainInfo": {
        "scoredFields": {
          "primaryName": {
            "weight": 0.5,
            "calculatedWeight": 0.7142857142857143,
            "rawScore": 0.99,
            "finalScore": 0.85,
            "details": "any details"
          },
          "dob": {
            "weight": 0.2,
            "calculatedWeight": 0.2857142857142857,
            "rawScore": 0.8,
            "finalScore": 0.74
          }
        },
        "leftOnlyFields": ["addr"],
        "rightOnlyFields": []
      },
      "error": "string"
    }
  ],
  "errorMessage": "string"
}

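A response of this shape could be flattened for logging or review with a few lines of Python. `summarize_results` is a hypothetical helper; it assumes that `explainInfo` may be absent (for example when includeExplainInfo is false) and treats every key as optional. The sample data is trimmed from the response example above.

```python
def summarize_results(response: dict) -> list[dict]:
    """Reduce each result to its score, per-field final scores, and uncompared fields."""
    rows = []
    for result in response.get("results", []):
        explain = result.get("explainInfo", {})
        scored = explain.get("scoredFields", {})
        rows.append({
            "score": result.get("score"),
            "field_scores": {name: info.get("finalScore") for name, info in scored.items()},
            "left_only": explain.get("leftOnlyFields", []),
            "right_only": explain.get("rightOnlyFields", []),
        })
    return rows


sample = {
    "results": [
        {
            "score": 0.87,
            "explainInfo": {
                "scoredFields": {
                    "primaryName": {"rawScore": 0.99, "finalScore": 0.85},
                    "dob": {"rawScore": 0.8, "finalScore": 0.74},
                },
                "leftOnlyFields": ["addr"],
                "rightOnlyFields": [],
            },
        }
    ]
}

for row in summarize_results(sample):
    print(row)
```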
Supported languages

GET /record-similarity/supported-languages

Retrieve the language pairs supported by the record-similarity endpoint. The endpoint supports matching between the source and target of each pair. The language, script, and transliteration scheme are listed for each source and target.

Response

| Field | Type | Description |
|---|---|---|
| transliterationScheme | string | |
| script | string | Four-letter ISO-15924 script code |
| language | string | ISO 639 language code |
| licensed | boolean | Indicates if you are licensed for this language |

{
  "supportedLanguagePairs": [
    {
      "source": {
        "transliterationScheme": "string",
        "script": "string",
        "language": "string"
      },
      "target": {
        "transliterationScheme": "string",
        "script": "string",
        "language": "string"
      },
      "licensed": true
    }
  ]
}
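Once retrieved, the list of pairs can be filtered down to the ones your license covers. The Python sketch below assumes each entry holds `source`, `target`, and `licensed` in the same object; `licensed_pairs` is a hypothetical helper, and the language, script, and scheme values in the sample are made up for illustration.

```python
def licensed_pairs(response: dict) -> list[tuple[str, str]]:
    """Return (source language, target language) for every licensed pair."""
    return [
        (pair["source"]["language"], pair["target"]["language"])
        for pair in response.get("supportedLanguagePairs", [])
        if pair.get("licensed")
    ]


# Made-up sample data in the shape of the response above.
sample = {
    "supportedLanguagePairs": [
        {
            "source": {"transliterationScheme": "NATIVE", "script": "Latn", "language": "eng"},
            "target": {"transliterationScheme": "NATIVE", "script": "Latn", "language": "eng"},
            "licensed": True,
        },
        {
            "source": {"transliterationScheme": "NATIVE", "script": "Jpan", "language": "jpn"},
            "target": {"transliterationScheme": "NATIVE", "script": "Latn", "language": "eng"},
            "licensed": False,
        },
    ]
}

print(licensed_pairs(sample))
```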