Babel Street Analytics API

Name Similarity

https://analytics.babelstreet.com/rest/v1/name-similarity

https://raw.githubusercontent.com/rosette-api/curl-examples/develop/examples/name_similarity.curl

Name Similarity compares two entity names (Person, Location, or Organization) and returns a match score from 0 to 1. Although not required, we strongly recommend specifying the source language when known, for better accuracy. If you do not specify the source language, the name similarity algorithm will guess the language.

Important

Do not use the Analytics language identification endpoint to determine source language of names. Name similarity uses an algorithm specifically tuned to identify the language of names rather than general text.

Based on the text of the two names, Name Similarity infers values for any of the fields you do not specify, and uses these values when it calculates the match score. In cases where the language may be difficult to infer accurately, such as a Japanese or Korean name in Han script, you may improve the accuracy of the match score by specifying the language explicitly.

Name Similarity uses override tables and other tools specific to entityType when matching. You can improve the accuracy of the match score by specifying the entityType to ensure the correct overrides are applied.

Name similarity uses many parameters to calculate the similarity score for two names. The default values of the parameters are tuned to perform well on most datasets. You can modify match parameters to optimize similarity scores for your data and business case by adding parameters to the request.

Request

| Field | Description | Required |
| --- | --- | --- |
| name1 | One of the two names being matched, along with other identifying information, such as the language, entityType and script. | yes |
| name2 | The other of the two names being matched, along with other identifying information, such as the language, entityType and script. | yes |
| parameters | Values which can be modified to change scoring algorithms and rules. | no |

Table 3. Name Properties

| Field | Type | Description | Required |
| --- | --- | --- | --- |
| text | string | Name to match | yes |
| language | string | Three-letter ISO 639-3 language code | no (but strongly recommended if source language is known) |
| entityType | string | The type of name being matched. The most common ones are PERSON (default), LOCATION, and ORGANIZATION. Match also supports additional types of identifiers. | no. If not specified, the type PERSON will be used. |
| script | string | Four-letter ISO-15924 script code | no |
| gender | string | An explicitly defined gender for a name. Valid values are: MALE, FEMALE, NONBINARY. | no |
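The field rules in the table above can be sketched as a small request builder. This is a minimal illustration, not part of the API: the helper names `make_name` and `make_request_body` are hypothetical, and only the field names and the gender values are taken from the table. Optional fields that are not supplied are left out, so the service can infer them.

```python
import json

# Gender values listed in the Name Properties table.
ALLOWED_GENDERS = {"MALE", "FEMALE", "NONBINARY"}


def make_name(text, language=None, entity_type=None, script=None, gender=None):
    """Build a name object, omitting optional fields that were not supplied."""
    if not text:
        raise ValueError("text is required")
    if gender is not None and gender not in ALLOWED_GENDERS:
        raise ValueError(f"gender must be one of {sorted(ALLOWED_GENDERS)}")
    name = {"text": text}
    optional = {
        "language": language,      # three-letter ISO 639-3 code
        "entityType": entity_type,  # the service defaults to PERSON when absent
        "script": script,          # four-letter ISO 15924 code
        "gender": gender,
    }
    name.update({key: value for key, value in optional.items() if value is not None})
    return name


def make_request_body(name1, name2, parameters=None):
    """Combine two name objects and an optional parameters map into one body."""
    body = {"name1": name1, "name2": name2}
    if parameters:
        body["parameters"] = parameters
    return body


body = make_request_body(
    make_name("Влади́мир Влади́мирович Пу́тин", language="rus", entity_type="PERSON"),
    make_name("Vladimir Putin", language="eng", entity_type="PERSON"),
)
print(json.dumps(body, ensure_ascii=False, indent=2))
```

Leaving out unspecified fields rather than sending empty strings keeps the request consistent with the inference behavior described earlier.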



Parameters

Individual name tokens are scored by a number of algorithms or rules. These algorithms can be manipulated by setting configuration parameters, changing the final Match similarity score. There are over 100 configuration parameters.

You can modify the value of one or more parameters used in a request by adding the parameters object to the call. Any non-static parameter can be changed.

Parameters are passed as a map of parameter name and parameter value:

{
  "parameters": {"parameterName": value}
}

For example, the following request body overrides deletionScore:

{
  "name1": {
    "text": "Влади́мир Влади́мирович Пу́тин",
    "language": "rus",
    "entityType": "PERSON"
  },
  "name2": {
    "text": "Vladimir Putin",
    "language": "eng",
    "entityType": "PERSON"
  },
  "parameters": {
    "deletionScore": "0.2"
  }
}
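One way to send a body like the one above is with Python's standard library. The sketch below only prepares the request and leaves the network call commented out. It assumes the API key is passed in an `X-BabelStreetAPI-Key` header; that header name, the `YOUR_API_KEY` placeholder, and the helper name `build_similarity_request` do not come from this page, so check them against your account documentation.

```python
import json
import urllib.request

ENDPOINT = "https://analytics.babelstreet.com/rest/v1/name-similarity"
# Assumption: the key is sent in this header. Verify before relying on it.
API_KEY_HEADER = "X-BabelStreetAPI-Key"


def build_similarity_request(body, api_key="YOUR_API_KEY"):
    """Return a prepared POST request; nothing is sent until urlopen is called."""
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", API_KEY_HEADER: api_key},
        method="POST",
    )


body = {
    "name1": {
        "text": "Влади́мир Влади́мирович Пу́тин",
        "language": "rus",
        "entityType": "PERSON",
    },
    "name2": {"text": "Vladimir Putin", "language": "eng", "entityType": "PERSON"},
    # Parameter override, written exactly as in the example request.
    "parameters": {"deletionScore": "0.2"},
}
request = build_similarity_request(body)

# To send it (requires network access and a valid key):
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```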

Response

Similarity scores range from 0 to 1. The higher the score, the greater the confidence that this is a relevant match. A score of 1.0 indicates that the two values are identical.

The score is a relative indication of how similar the names are; it is not an absolute value. When comparing different name combinations, the scores cannot always be directly compared. For example, similar comparisons in different languages may generate different scores.

{
  "score": 0
}
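Because the score is relative, a caller typically compares it against a cutoff tuned for its own data. The following sketch shows one way to do that. The function name `is_match`, the default threshold of 0.8, and the sample response strings are all illustrative assumptions, not values returned or recommended by the service.

```python
import json


def is_match(response_text, threshold=0.8):
    """Parse a response body and return (score, matched) for a chosen cutoff."""
    score = float(json.loads(response_text)["score"])
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    return score, score >= threshold


# Made-up response bodies, used only to exercise the function.
print(is_match('{"score": 0.93}'))
print(is_match('{"score": 0.41}'))
print(is_match('{"score": 0.41}', threshold=0.4))
```

Since similar comparisons in different languages may generate different scores, a single global threshold may not suit every language pair; keeping it as an argument makes per-pair tuning straightforward.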

Supported languages

GET /name-similarity/supported-languages

Retrieve the language pairs supported by the name-similarity endpoint. The endpoint supports matching between the source and target of each pair. The language, script, and transliteration scheme are listed for each source and target.

Response

| Field | Type | Description |
| --- | --- | --- |
| transliterationScheme | string | |
| script | string | Four-letter ISO-15924 script code |
| language | string | ISO 639 language code |
| licensed | boolean | Indicates if you are licensed for this language |

{
  "supportedLanguagePairs": [
    {
      "source": {
        "transliterationScheme": "string",
        "script": "string",
        "language": "string"
      },
      "target": {
        "transliterationScheme": "string",
        "script": "string",
        "language": "string"
      },
      "licensed": true
    }
  ]
}
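A client might use this response to check which language pairs it can match before sending requests. The sketch below assumes each entry in `supportedLanguagePairs` holds a `source` object, a `target` object, and a `licensed` flag. The sample payload is invented for illustration, including the `NATIVE` transliteration scheme values, and `licensed_pairs` is a hypothetical helper name.

```python
import json

# Invented sample payload; real values come from the supported-languages call.
SAMPLE_RESPONSE = """
{
  "supportedLanguagePairs": [
    {
      "source": {"transliterationScheme": "NATIVE", "script": "Cyrl", "language": "rus"},
      "target": {"transliterationScheme": "NATIVE", "script": "Latn", "language": "eng"},
      "licensed": true
    },
    {
      "source": {"transliterationScheme": "NATIVE", "script": "Hani", "language": "zho"},
      "target": {"transliterationScheme": "NATIVE", "script": "Latn", "language": "eng"},
      "licensed": false
    }
  ]
}
"""


def licensed_pairs(response_text):
    """Return (source language, target language) tuples for licensed pairs."""
    payload = json.loads(response_text)
    return [
        (pair["source"]["language"], pair["target"]["language"])
        for pair in payload.get("supportedLanguagePairs", [])
        if pair.get("licensed")
    ]


print(licensed_pairs(SAMPLE_RESPONSE))
```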