
Babel Street Analytics API

Address Similarity

 https://analytics.babelstreet.com/rest/v1/address-similarity

https://raw.githubusercontent.com/rosette-api/curl-examples/develop/examples/address_similarity.curl

Address Similarity compares two addresses and returns a match score between 0 and 1 that reflects how similar they are. Each address is defined as a set of address fields. The endpoint compares the fields in address1 with the corresponding fields in address2, matches each specified field, and uses the per-field results to calculate the match score. The two addresses do not have to contain the same fields.

The matching algorithm is optimized for each field type. Named entity fields, such as street name, city, and state, are matched using a linguistic, statistically based system that handles address variations. Numeric and alphanumeric fields, such as house number, postal code, and unit, are matched using numeric methods.

Advanced field matching includes cross-field matching (for example, matching the value in the city field against the value in the state field) and field overrides (for example, treating England and UK as a match).

Request

Addresses can be defined either as a set of address fields or as a single string. When defined as a string, the jpostal library is used to parse the address string into address fields.

Field | Description | Required
address1 | One of the two addresses being matched. | yes
address2 | The other of the two addresses being matched. | yes
parameters | Values that can be modified to change scoring algorithms and rules. | no
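The following Python sketch shows how the three request fields fit together, assuming a fielded address is represented as a dict keyed by the field names in Table 4 and an unfielded address as a plain string. The function name build_request_body is hypothetical; it only assembles the JSON body and does not call the API.

```python
import json
from typing import Dict, Optional, Union

# A fielded address maps field names (e.g. "city") to values;
# an unfielded address is a single string that the service parses.
Address = Union[Dict[str, str], str]


def build_request_body(
    address1: Address,
    address2: Address,
    parameters: Optional[Dict[str, str]] = None,
) -> dict:
    """Assemble an address-similarity request body (hypothetical helper)."""
    body: dict = {"address1": address1, "address2": address2}
    # "parameters" is optional; include it only when overrides are supplied.
    if parameters:
        body["parameters"] = dict(parameters)
    return body


if __name__ == "__main__":
    # Illustrative fielded addresses that do not share all the same fields.
    body = build_request_body(
        {"houseNumber": "123", "road": "Harrison Avenue", "city": "Boston"},
        {"houseNumber": "123", "road": "Harrison Ave", "postCode": "02110"},
        parameters={"addressReorderPenalty": "0.3"},
    )
    print(json.dumps(body, indent=2))
```

Passing a string instead of a dict for an address produces the unfielded form described above.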

When entered as a set of fields, the address may include any of the fields below. At least one field must be specified, but no specific fields are required.

Table 4. Supported address fields

Field name | Description | Example(s)
house | venue and building names | "Brooklyn Academy of Music", "Empire State Building"
houseNumber | usually refers to the external (street-facing) building number | "123"
road | street name(s) | "Harrison Avenue"
unit | an apartment, unit, office, lot, or other secondary unit designator | "Apt. 123"
level | expressions indicating a floor number | "3rd Floor", "Ground Floor"
staircase | numbered/lettered staircase | "2"
entrance | numbered/lettered entrance | "front gate"
suburb | usually an unofficial neighborhood name | "Harlem", "South Bronx", "Crown Heights"
cityDistrict | usually boroughs or districts within a city that serve some official purpose | "Brooklyn", "Hackney", "Bratislava IV"
city | any human settlement, including cities, towns, villages, hamlets, and localities | "Boston"
island | named islands | "Maui"
stateDistrict | usually a second-level administrative division or county | "Saratoga"
state | a first-level administrative division | "Massachusetts"
countryRegion | informal subdivision of a country without any political status | "South/Latin America"
country | sovereign nations and their dependent territories, which have a designated ISO-3166 code | "United States of America"
worldRegion | currently used only for appending "West Indies" after the country name, a pattern frequently used in the English-speaking Caribbean | "Jamaica, West Indies"
postCode | postal codes used for mail sorting | "02110"
poBox | post office box, typically found in non-physical (mail-only) addresses | "28"



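Because at least one field must be present and only the fields in Table 4 are recognized, a client can check a fielded address before sending it. This is a hypothetical client-side sketch; the service performs its own validation, and its exact error behavior is not described on this page.

```python
from typing import Dict

# Field names taken from Table 4.
SUPPORTED_FIELDS = frozenset({
    "house", "houseNumber", "road", "unit", "level", "staircase",
    "entrance", "suburb", "cityDistrict", "city", "island",
    "stateDistrict", "state", "countryRegion", "country",
    "worldRegion", "postCode", "poBox",
})


def validate_fielded_address(address: Dict[str, str]) -> None:
    """Raise ValueError if the address is empty or uses unknown field names."""
    if not address:
        raise ValueError("at least one address field must be specified")
    unknown = set(address) - SUPPORTED_FIELDS
    if unknown:
        raise ValueError(f"unsupported address fields: {sorted(unknown)}")
```

The check is case-sensitive, matching the camelCase field names shown in the request schema.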
Parameters

Individual tokens are scored by a number of algorithms or rules. You can adjust these algorithms by setting configuration parameters, which changes the final match score. There are over 100 configuration parameters.

You can modify the value of one or more parameters used in a request by adding the parameters object to the call. Any non-static parameter can be changed.

Parameters are passed as a map of parameter name and parameter value:

{"parameters": 
  {"parameterName": value}
}

A complete request body has the following structure:

{
  "address1": {
    "city": "string",
    "cityDistrict": "string",
    "country": "string",
    "countryRegion": "string",
    "entrance": "string",
    "house": "string",
    "houseNumber": "string",
    "island": "string",
    "level": "string",
    "poBox": "string",
    "postCode": "string",
    "road": "string",
    "staircase": "string",
    "state": "string",
    "stateDistrict": "string",
    "suburb": "string",
    "unit": "string",
    "worldRegion": "string"
  },
  "address2": {
    "city": "string",
    "cityDistrict": "string",
    "country": "string",
    "countryRegion": "string",
    "entrance": "string",
    "house": "string",
    "houseNumber": "string",
    "island": "string",
    "level": "string",
    "poBox": "string",
    "postCode": "string",
    "road": "string",
    "staircase": "string",
    "state": "string",
    "stateDistrict": "string",
    "suburb": "string",
    "unit": "string",
    "worldRegion": "string"
  },
  "parameters": {
    "addressReorderPenalty": "0.3"
  }
}
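A minimal sketch of sending a body like the one above with Python's standard library follows. It assumes the endpoint accepts an HTTP POST with a JSON body and that the API key is sent in an X-BabelStreetAPI-Key header; neither the method nor the header name is stated on this page, so confirm both against your account documentation. The helper names make_request and compare_addresses are hypothetical.

```python
import json
import urllib.request

ENDPOINT = "https://analytics.babelstreet.com/rest/v1/address-similarity"
# Assumption: header name for the API key; verify against your account docs.
API_KEY_HEADER = "X-BabelStreetAPI-Key"


def make_request(body: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the JSON body."""
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=data,
        headers={"Content-Type": "application/json", API_KEY_HEADER: api_key},
        method="POST",  # assumption: the endpoint is called with POST
    )


def compare_addresses(body: dict, api_key: str) -> dict:
    """Send the request and decode the JSON response (performs network I/O)."""
    with urllib.request.urlopen(make_request(body, api_key)) as response:
        return json.load(response)
```

Separating request construction from sending keeps the body and headers inspectable without a network call.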

Response

Similarity scores range from 0 to 1. The higher the score, the greater the confidence that this is a relevant match. A score of 1.0 indicates that the two values are identical.

The score is a relative indication of how similar the addresses are; it is not an absolute value. Scores from different comparisons cannot always be compared directly. For example, similar comparisons in different languages may generate different scores.

{
  "score": 0
}
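The response carries a single score field. The sketch below decodes it and checks that it falls within the documented 0 to 1 range. The match threshold is a hypothetical, application-specific choice, not a value from this page; because scores are relative, any threshold should be tuned on your own data.

```python
import json


def parse_score(response_text: str) -> float:
    """Extract the similarity score and check the documented 0-1 range."""
    score = float(json.loads(response_text)["score"])
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score outside [0, 1]: {score}")
    return score


# Hypothetical threshold; tune it for your own data and languages.
MATCH_THRESHOLD = 0.8


def is_likely_match(score: float, threshold: float = MATCH_THRESHOLD) -> bool:
    """Treat scores at or above the threshold as probable matches."""
    return score >= threshold
```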

Supported Languages

The address similarity endpoint is optimized for addresses in English, Simplified Chinese, and Traditional Chinese. Non-English addresses in Latin script may also be matched; results will vary by language.

GET /address-similarity/supported-languages 

Retrieve the language pairs supported by the address-similarity endpoint. The endpoint supports matching between the source and target of each pair. The language, script, and transliteration scheme are listed for each source and target.

Response

Field | Type | Description
transliterationScheme | string |
script | string | Four-letter ISO-15924 script code
language | string | ISO 639 language code
licensed | boolean | Indicates if you are licensed for this language

{
  "supportedLanguagePairs": [
    {
      "source": {
        "transliterationScheme": "string",
        "script": "string",
        "language": "string"
      },
      "target": {
        "transliterationScheme": "string",
        "script": "string",
        "language": "string"
      },
      "licensed": true
    }
  ]
}
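A short sketch for reading this response: it lists the source/target language pairs you are licensed for. The sample payload is made up for illustration (including the "native" transliteration scheme value) and only follows the field layout shown above; it is not real service output.

```python
import json
from typing import List, Tuple


def licensed_language_pairs(response_text: str) -> List[Tuple[str, str]]:
    """Return (source language, target language) for each licensed pair."""
    payload = json.loads(response_text)
    pairs = []
    for pair in payload.get("supportedLanguagePairs", []):
        if pair.get("licensed"):
            pairs.append((pair["source"]["language"], pair["target"]["language"]))
    return pairs


if __name__ == "__main__":
    # Made-up sample payload for illustration only.
    sample = json.dumps({
        "supportedLanguagePairs": [
            {"source": {"transliterationScheme": "native", "script": "Latn", "language": "eng"},
             "target": {"transliterationScheme": "native", "script": "Latn", "language": "eng"},
             "licensed": True},
            {"source": {"transliterationScheme": "native", "script": "Hans", "language": "zho"},
             "target": {"transliterationScheme": "native", "script": "Hans", "language": "zho"},
             "licensed": False},
        ]
    })
    print(licensed_language_pairs(sample))  # [('eng', 'eng')]
```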