Address Similarity
https://analytics.babelstreet.com/rest/v1/address-similarity
https://raw.githubusercontent.com/rosette-api/curl-examples/develop/examples/address_similarity.curl
Address Similarity compares two addresses and returns a match score between 0 and 1 reflecting how similar they are. Addresses are defined as a set of address fields. The endpoint compares the fields in address1 to the fields in address2, matches each specified field, and uses the field-level results to calculate the match score. The addresses being matched do not have to contain the same fields.
The matching algorithm is optimized based on the field type. Named entity fields, such as street address, city, and state, are matched using a linguistic, statistically based system that handles address variations. Numeric and alphanumeric fields, such as house number, postal code, and unit, are matched using numeric-based methods.
Advanced field matching support includes cross-field matching, for example, matching the value in the city field to the value in the state field, and field overrides, for example, matching England and UK.
Request
Addresses can be defined either as a set of address fields or as a single string. When defined as a string, the jpostal library is used to parse the address string into address fields.
Field | Description | Required |
---|---|---|
address1 | One of the two addresses being matched. | yes |
address2 | The other of the two addresses being matched. | yes |
parameters | Values which can be modified to change scoring algorithms and rules. | no |
When entered as a set of fields, the address may include any of the fields below. At least one field must be specified, but no specific fields are required.
Field name | Description | Example(s) |
---|---|---|
house | venue and building names | "Brooklyn Academy of Music", "Empire State Building" |
houseNumber | usually refers to the external (street-facing) building number | "123" |
road | street name(s) | "Harrison Avenue" |
unit | an apartment, unit, office, lot, or other secondary unit designator | "Apt. 123" |
level | expressions indicating a floor number | "3rd Floor", "Ground Floor" |
staircase | numbered/lettered staircase | "2" |
entrance | numbered/lettered entrance | "front gate" |
suburb | usually an unofficial neighborhood name | "Harlem", "South Bronx", "Crown Heights" |
cityDistrict | usually boroughs or districts within a city that serve some official purpose | "Brooklyn", "Hackney", "Bratislava IV" |
city | any human settlement including cities, towns, villages, hamlets, localities, etc. | "Boston" |
island | named islands | "Maui" |
stateDistrict | usually a second-level administrative division or county | "Saratoga" |
state | a first-level administrative division | "Massachusetts" |
countryRegion | informal subdivision of a country without any political status | "South/Latin America" |
country | sovereign nations and their dependent territories, which have a designated ISO-3166 code | "United States of America" |
worldRegion | currently only used for appending "West Indies" after the country name, a pattern frequently used in the English-speaking Caribbean | "Jamaica, West Indies" |
postCode | postal codes used for mail sorting | "02110" |
poBox | post office box: typically found in non-physical (mail-only) addresses | "28" |
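The rules above (any of the listed fields, at least one required, or a single string instead) can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: `validate_address` and `ADDRESS_FIELDS` are hypothetical names, and the field list is taken from the keys shown in the sample request body.

```python
# Field names as they appear in the sample request body.
ADDRESS_FIELDS = frozenset({
    "house", "houseNumber", "road", "unit", "level", "staircase",
    "entrance", "suburb", "cityDistrict", "city", "island",
    "stateDistrict", "state", "countryRegion", "country",
    "worldRegion", "postCode", "poBox",
})


def validate_address(address):
    """Return a cleaned address (dict or str) or raise ValueError.

    Hypothetical client-side helper; the service performs its own validation.
    """
    if isinstance(address, str):
        # Single-string form: the service parses it into fields.
        if not address.strip():
            raise ValueError("address string is empty")
        return address

    unknown = set(address) - ADDRESS_FIELDS
    if unknown:
        raise ValueError(f"unknown address fields: {sorted(unknown)}")

    # Drop empty values so that only specified fields are sent.
    cleaned = {k: v for k, v in address.items()
               if isinstance(v, str) and v.strip()}
    if not cleaned:
        raise ValueError("at least one address field must be specified")
    return cleaned
```

Rejecting unknown keys locally is a design choice that catches typos such as `postcode` for `postCode` early; whether the service itself rejects or ignores unknown keys is not stated here.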
Parameters
Individual address tokens are scored by a number of algorithms or rules. You can adjust these algorithms by setting configuration parameters, which changes the final match score. There are over 100 configuration parameters.
You can modify the value of one or more parameters used in a request by adding the parameters object to the call. Any non-static parameter can be changed.
Parameters are passed as a map of parameter name and parameter value:
{"parameters": {"parameterName": value} }
{ "address1": { "city": "string", "cityDistrict": "string", "country": "string", "countryRegion": "string", "entrance": "string", "house": "string", "houseNumber": "string", "island": "string", "level": "string", "poBox": "string", "postCode": "string", "road": "string", "staircase": "string", "state": "string", "stateDistrict": "string", "suburb": "string", "unit": "string", "worldRegion": "string" }, "address2": { "city": "string", "cityDistrict": "string", "country": "string", "countryRegion": "string", "entrance": "string", "house": "string", "houseNumber": "string", "island": "string", "level": "string", "poBox": "string", "postCode": "string", "road": "string", "staircase": "string", "state": "string", "stateDistrict": "string", "suburb": "string", "unit": "string", "worldRegion": "string" }, "parameters": { "addressReorderPenalty": "0.3" } }
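As an illustration of how such a request body might be assembled, here is a minimal sketch using only the Python standard library. Several details are assumptions rather than statements from this page: the `POST` method, the `X-BabelStreetAPI-Key` header name, and the `build_request` helper and sample addresses are all illustrative. The request is built but not sent.

```python
import json
import urllib.request

ENDPOINT = "https://analytics.babelstreet.com/rest/v1/address-similarity"


def build_request(address1, address2, parameters=None, api_key="YOUR_API_KEY"):
    """Build (but do not send) an address-similarity request.

    Each address may be a dict of address fields or a single string.
    """
    body = {"address1": address1, "address2": address2}
    if parameters:
        # Optional map of parameter name to parameter value.
        body["parameters"] = parameters
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "X-BabelStreetAPI-Key": api_key,  # assumed header name
        },
        method="POST",  # assumed HTTP method
    )


request = build_request(
    {"houseNumber": "123", "road": "Harrison Avenue",
     "city": "Boston", "state": "Massachusetts", "postCode": "02110"},
    "123 Harrison Ave, Boston MA 02110",  # single-string form
    parameters={"addressReorderPenalty": "0.3"},
)

# To send it (requires a valid key and network access):
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```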
Response
Similarity scores range from 0 to 1. The higher the score, the greater the confidence that this is a relevant match. A score of 1.0 indicates that the two values are identical.
The score is a relative indication of how similar the addresses are; it is not an absolute value. Scores from different address comparisons cannot always be compared directly. For example, similar comparisons in different languages may generate different scores.
{ "score": 0 }
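Because the score is relative, a client typically applies its own cutoff rather than treating any value as definitive. The sketch below shows one way to read the response and apply a threshold; the `is_match` helper and the default of 0.8 are hypothetical and should be tuned against your own data.

```python
import json


def is_match(response_text, threshold=0.8):
    """Return True if the response score meets the chosen threshold.

    The threshold is an illustrative value, not a documented default.
    """
    score = float(json.loads(response_text)["score"])
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    return score >= threshold
```

For example, `is_match('{"score": 0.93}')` returns `True` with the default threshold, while `is_match('{"score": 0}')` returns `False`.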
Supported Languages
The address similarity endpoint is optimized for addresses in English, Simplified Chinese, and Traditional Chinese. Non-English addresses in Latin script may also be matched; results will vary by language.
GET /address-similarity/supported-languages
Retrieve the language pairs supported by the address-similarity endpoint. The endpoint supports matching between the source and target of each pair. The language, script, and transliteration scheme are listed for each source and target.
Response
Field | Type | Description |
---|---|---|
transliterationScheme | string | |
script | string | Four-letter ISO-15924 script code |
language | string | ISO 639 language code |
licensed | boolean | Indicates if you are licensed for this language |
{ "supportedLanguagePairs": [ { "source": { "transliterationScheme": "string", "script": "string", "language": "string" }, "target": { "transliterationScheme": "string", "script": "string", "language": "string" }, "licensed": true } ] }
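A client can use this response to check which language pairs it is licensed for. The following sketch assumes that `licensed` sits at the level of each pair, alongside `source` and `target`; the `licensed_pairs` helper and the sample values (including the transliteration scheme placeholder) are hypothetical.

```python
def licensed_pairs(payload):
    """Return (source language, target language) tuples for licensed pairs."""
    pairs = []
    for pair in payload.get("supportedLanguagePairs", []):
        if pair.get("licensed"):
            pairs.append((pair["source"]["language"],
                          pair["target"]["language"]))
    return pairs


# Illustrative payload only; real values come from the endpoint.
sample = {
    "supportedLanguagePairs": [
        {
            "source": {"transliterationScheme": "EXAMPLE_SCHEME",
                       "script": "Latn", "language": "eng"},
            "target": {"transliterationScheme": "EXAMPLE_SCHEME",
                       "script": "Latn", "language": "eng"},
            "licensed": True,
        },
        {
            "source": {"transliterationScheme": "EXAMPLE_SCHEME",
                       "script": "Latn", "language": "eng"},
            "target": {"transliterationScheme": "EXAMPLE_SCHEME",
                       "script": "Hans", "language": "zho"},
            "licensed": False,
        },
    ]
}

print(licensed_pairs(sample))
```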