Name Similarity

https://analytics.babelstreet.com/rest/v1/name-similarity

https://raw.githubusercontent.com/rosette-api/curl-examples/develop/examples/name_similarity.curl

Name Similarity compares two entity names (Person, Location, or Organization) and returns a match score from 0 to 1. Although not required, we strongly recommend specifying the source language when known, for better accuracy. If you do not specify the source language, the name similarity algorithm will guess the language.

Important

Do not use the Analytics language identification endpoint to determine the source language of names. Name Similarity uses an algorithm specifically tuned to identify the language of names rather than general text.

Based on the text of the two names, Name Similarity infers values for any of the fields you do not specify, and uses these values when it calculates the match score. In cases where the language may be difficult to infer accurately, such as a Japanese or Korean name in Han script, you may improve the accuracy of the match score by specifying the language field.

Name Similarity uses override tables and other tools specific to entityType when matching. You can improve the accuracy of the match score by specifying the entityType to ensure the correct overrides are applied.

Name similarity uses many parameters to calculate the similarity score for two names. The default values of the parameters are tuned to perform well on most datasets. You can modify match parameters to optimize similarity scores for your data and business case by adding parameters to the request.

Tip

Try it in the interactive documentation

Request

Field	Description	Required
`name1`	One of the two names being matched, along with other identifying information, such as the language, entityType and script.	yes
`name2`	The other of the two names being matched, along with other identifying information, such as the language, entityType and script.	yes
`parameters`	Values which can be modified to change scoring algorithms and rules.	no

Table 3. Name Properties

Field	Type	Description	Required
`text`	string	Name to match	yes
`language`	string	Three-letter ISO 639-3 language code	no (but strongly recommended if source language is known)
`entityType`	string	The type of name being matched. The most common types are `PERSON` (default), `LOCATION`, and `ORGANIZATION`. Match also supports additional types of identifiers.	no (defaults to `PERSON` if not specified)
`script`	string	Four-letter ISO-15924 script code	no
`gender`	string	An explicitly defined gender for a name. Valid values are `MALE`, `FEMALE`, and `NONBINARY`.	no
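
As a rough illustration of the name properties table, the following Python sketch assembles a name object and applies a few basic checks before it is sent. The helper name `make_name` and its validation rules are assumptions made for this example; they are not part of the API, which performs its own validation.

```python
# Hypothetical helper: builds the JSON object used for name1 or name2.
VALID_GENDERS = {"MALE", "FEMALE", "NONBINARY"}


def make_name(text, language=None, entity_type=None, script=None, gender=None):
    """Return a name object; optional fields are omitted when not given."""
    if not text:
        raise ValueError("text is required")
    name = {"text": text}
    if language is not None:
        # The table asks for a three-letter language code.
        if len(language) != 3:
            raise ValueError("language must be a three-letter code")
        name["language"] = language
    if entity_type is not None:
        # Left unchecked here because additional entity types are supported.
        name["entityType"] = entity_type
    if script is not None:
        # The table asks for a four-letter script code.
        if len(script) != 4:
            raise ValueError("script must be a four-letter code")
        name["script"] = script
    if gender is not None:
        if gender not in VALID_GENDERS:
            raise ValueError("gender must be MALE, FEMALE, or NONBINARY")
        name["gender"] = gender
    return name


example = make_name("Vladimir Putin", language="eng", entity_type="PERSON")
```

Fields left as `None` are simply not sent, which lets Name Similarity infer them from the text of the names.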

Parameters

Individual name tokens are scored by a number of algorithms or rules. These algorithms can be manipulated by setting configuration parameters, changing the final Match similarity score. There are over 100 configuration parameters.

You can modify the value of one or more parameters used in a request by adding the parameters object to the call. Any non-static parameter can be changed.

Parameters are passed as a map of parameter name and parameter value:

{"parameters": 
  {"parameterName": value}
}

{
  "name1": {
    "text": "Влади́мир Влади́мирович Пу́тин",
    "language": "rus",
    "entityType": "PERSON"
  },
  "name2": {
    "text": "Vladimir Putin",
    "language": "eng",
    "entityType": "PERSON"
  },
  "parameters": {
    "deletionScore": "0.2"
  }
}
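
The request above could be prepared from Python along the following lines. This is a minimal sketch using only the standard library: the function name `build_request` is hypothetical, and the `X-BabelStreetAPI-Key` header name and the `YOUR_API_KEY` placeholder are assumptions that should be checked against your account's authentication settings. The request is constructed but not sent.

```python
import json
import urllib.request

ENDPOINT = "https://analytics.babelstreet.com/rest/v1/name-similarity"


def build_request(name1, name2, parameters=None, api_key="YOUR_API_KEY"):
    """Prepare (but do not send) a POST request for the endpoint."""
    body = {"name1": name1, "name2": name2}
    if parameters:
        # Optional map of parameter name to parameter value.
        body["parameters"] = parameters
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumed header name for the API key.
            "X-BabelStreetAPI-Key": api_key,
        },
        method="POST",
    )


request = build_request(
    {"text": "Влади́мир Влади́мирович Пу́тин", "language": "rus", "entityType": "PERSON"},
    {"text": "Vladimir Putin", "language": "eng", "entityType": "PERSON"},
    parameters={"deletionScore": "0.2"},
)
# To send it: urllib.request.urlopen(request)
```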

Response

Similarity scores range from 0 to 1. The higher the score, the greater the confidence that this is a relevant match. A score of 1.0 indicates that the two values are identical.

The score is a relative indication of how similar the names are; it is not an absolute value. When comparing different name combinations, the scores cannot always be directly compared. For example, similar comparisons in different languages may generate different scores.

{
  "score": 0
}
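
One way to consume the response is to read the score and compare it against a cutoff chosen for your own data. The sketch below is illustrative only: the function names and the default threshold of 0.8 are assumptions, not values recommended by the API. Because scores are relative, a suitable threshold is best determined by testing on representative name pairs.

```python
import json


def read_score(response_text):
    """Extract the similarity score from a JSON response body."""
    score = json.loads(response_text)["score"]
    if not 0 <= score <= 1:
        raise ValueError("score is expected to be between 0 and 1")
    return score


def is_match(response_text, threshold=0.8):
    """Treat a pair as a match when its score meets the chosen threshold."""
    # The threshold is a hypothetical, application-specific value.
    return read_score(response_text) >= threshold
```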

Supported languages

GET /name-similarity/supported-languages

Retrieve the language pairs supported by the name-similarity endpoint. The endpoint supports matching between the source and target of each pair. The language, script, and transliteration scheme are listed for each source and target.

Response

Field	Type	Description
transliterationScheme	string
script	string	Four-letter ISO-15924 script code
language	string	ISO 639 language code
licensed	boolean	Indicates if you are licensed for this language

{
  "supportedLanguagePairs": [
    {
      "source": {
        "transliterationScheme": "string",
        "script": "string",
        "language": "string"
      },
      "target": {
        "transliterationScheme": "string",
        "script": "string",
        "language": "string"
      },
      "licensed": true
    }
  ]
}
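
A client might use this response to check whether a given language pair is available before submitting names. The following Python sketch assumes the response has already been parsed into a dictionary and that `licensed` appears once per pair, next to `source` and `target`; the function name and the sample values (`rus`, `eng`, `ara`) are hypothetical.

```python
def licensed_language_pairs(response):
    """Return the set of (source, target) language codes marked as licensed."""
    pairs = set()
    for pair in response.get("supportedLanguagePairs", []):
        # Skip pairs that the account is not licensed for.
        if pair.get("licensed"):
            pairs.add((pair["source"]["language"], pair["target"]["language"]))
    return pairs


# Sample data made up for illustration; not actual endpoint output.
sample = {
    "supportedLanguagePairs": [
        {
            "source": {"script": "Cyrl", "language": "rus"},
            "target": {"script": "Latn", "language": "eng"},
            "licensed": True,
        },
        {
            "source": {"script": "Arab", "language": "ara"},
            "target": {"script": "Latn", "language": "eng"},
            "licensed": False,
        },
    ]
}
available = licensed_language_pairs(sample)
```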
