Name Similarity
https://analytics.babelstreet.com/rest/v1/name-similarity
https://raw.githubusercontent.com/rosette-api/curl-examples/develop/examples/name_similarity.curl
Name Similarity compares two entity names (Person, Location, or Organization) and returns a match score from 0 to 1. Although not required, we strongly recommend specifying the source language when known, for better accuracy. If you do not specify the source language, the name similarity algorithm will guess the language.
Important
Do not use the Analytics language identification endpoint to determine the source language of names. Name Similarity uses an algorithm specifically tuned to identify the language of names rather than general text.
Based on the text of the two names, Name Similarity infers values for any fields you do not specify and uses these values when it calculates the match score. In cases where the language may be difficult to infer accurately, such as a Japanese or Korean name in Han script, you may improve the accuracy of the match score by specifying the language explicitly.
Name Similarity uses override tables and other tools specific to entityType when matching. You can improve the accuracy of the match score by specifying the entityType to ensure the correct overrides are applied.
Name similarity uses many parameters to calculate the similarity score for two names. The default values of the parameters are tuned to perform well on most datasets. You can modify match parameters to optimize similarity scores for your data and business case by adding parameters to the request.
Request
Field | Description | Required |
---|---|---|
name1 | One of the two names being matched, along with other identifying information, such as the language, entityType and script. | yes |
name2 | The other of the two names being matched, along with other identifying information, such as the language, entityType and script. | yes |
parameters | Values which can be modified to change scoring algorithms and rules. | no |
Field | Type | Description | Required |
---|---|---|---|
text | string | Name to match | yes |
language | string | Three-letter ISO 639-3 language code | no (but strongly recommended if source language is known) |
entityType | string | The type of name being matched. The most common ones are | no If not specified, the type |
script | string | Four-letter ISO-15924 script code | no |
gender | string | An explicitly defined gender for a name. Valid values are: | no |
Parameters
Individual name tokens are scored by a number of algorithms and rules. You can adjust these algorithms by setting configuration parameters, which changes the final similarity score. There are over 100 configuration parameters.
You can modify the value of one or more parameters used in a request by adding the parameters object to the call. Any non-static parameter can be changed.
Parameters are passed as a map of parameter name and parameter value:
{"parameters": {"parameterName": value} }
{ "name1": { "text": "Влади́мир Влади́мирович Пу́тин", "language": "rus", "entityType": "PERSON" }, "name2": { "text": "Vladimir Putin", "language": "eng", "entityType": "PERSON" }, "parameters": { "deletionScore": "0.2" } }
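As a rough illustration of how the request body above might be assembled in code, here is a minimal Python sketch using only the standard library. The helper names `build_name` and `build_request` are hypothetical, and the API key header name is an assumption that should be checked against your account documentation. The sketch only builds the request and does not send it.

```python
import json
import urllib.request

# Endpoint URL as listed at the top of this page.
NAME_SIMILARITY_URL = "https://analytics.babelstreet.com/rest/v1/name-similarity"


def build_name(text, language=None, entity_type=None, script=None):
    """Build one name object, leaving out optional fields that are not set."""
    name = {"text": text}
    if language is not None:
        name["language"] = language
    if entity_type is not None:
        name["entityType"] = entity_type
    if script is not None:
        name["script"] = script
    return name


def build_request(name1, name2, parameters=None, api_key="YOUR_API_KEY"):
    """Build (but do not send) a POST request for the name-similarity endpoint."""
    payload = {"name1": name1, "name2": name2}
    if parameters:
        payload["parameters"] = parameters
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        # Assumption: the header name used for the API key.
        "X-BabelStreetAPI-Key": api_key,
    }
    return urllib.request.Request(
        NAME_SIMILARITY_URL, data=body, headers=headers, method="POST"
    )


request = build_request(
    build_name("Влади́мир Влади́мирович Пу́тин", "rus", "PERSON"),
    build_name("Vladimir Putin", "eng", "PERSON"),
    parameters={"deletionScore": "0.2"},
)
# To send it: urllib.request.urlopen(request), then json.load() the response.
```

Leaving unset fields out of the name object, rather than sending empty values, lets the service infer them as described earlier on this page.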
Response
Similarity scores range from 0 to 1. The higher the score, the greater the confidence that this is a relevant match. A score of 1.0 indicates that the two values are identical.
The score is a relative indication of how similar the names are; it is not an absolute value. When comparing different name combinations, the scores cannot always be directly compared. For example, similar comparisons in different languages may generate different scores.
{ "score": 0 }
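One way to consume this response is sketched below in Python. The function names and the 0.8 threshold are hypothetical choices for illustration and are not values recommended by the service. Because scores are relative, a threshold like this would need to be tuned for each dataset and language pair.

```python
import json


def parse_score(response_body):
    """Extract the similarity score from a response body and sanity-check its range."""
    data = json.loads(response_body)
    score = float(data["score"])
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    return score


def is_match(score, threshold=0.8):
    """Treat scores at or above the (hypothetical) threshold as matches."""
    return score >= threshold


score = parse_score('{ "score": 0 }')
matched = is_match(score)
```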
Supported languages
GET /name-similarity/supported-languages
Retrieve the language pairs supported by the name-similarity endpoint. The endpoint supports matching between the source and target of each pair. The language, script, and transliteration scheme are listed for each source and target.
Response
Field | Type | Description |
---|---|---|
transliterationScheme | string | |
script | string | Four-letter ISO-15924 script code |
language | string | ISO 639 language code |
licensed | boolean | Indicates if you are licensed for this language |
{ "supportedLanguagePairs": [ { "source": { "transliterationScheme": "string", "script": "string", "language": "string" }, "target": { "transliterationScheme": "string", "script": "string", "language": "string" }, "licensed": true } ] }
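The following Python sketch shows one possible way to pick the licensed pairs out of such a response. It assumes each entry holds a `source` object, a `target` object, and a `licensed` flag, as the field table suggests. The function name `licensed_pairs` is hypothetical, and the sample data is made up for illustration rather than taken from real API output.

```python
import json

# Illustrative sample only: the languages and scripts are placeholders
# and the transliterationScheme values are left as "string".
SAMPLE_RESPONSE = """
{
  "supportedLanguagePairs": [
    {
      "source": {"transliterationScheme": "string", "script": "Cyrl", "language": "rus"},
      "target": {"transliterationScheme": "string", "script": "Latn", "language": "eng"},
      "licensed": true
    },
    {
      "source": {"transliterationScheme": "string", "script": "Hani", "language": "jpn"},
      "target": {"transliterationScheme": "string", "script": "Latn", "language": "eng"},
      "licensed": false
    }
  ]
}
"""


def licensed_pairs(response_body):
    """Return (source language, target language) tuples for licensed pairs only."""
    data = json.loads(response_body)
    return [
        (pair["source"]["language"], pair["target"]["language"])
        for pair in data["supportedLanguagePairs"]
        if pair.get("licensed")
    ]


pairs = licensed_pairs(SAMPLE_RESPONSE)
```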