
Match Identity

Welcome to Match Studio

Welcome to Match Studio, an interactive tool for evaluating and configuring Babel Street Match for record matching. Match Studio uses Babel Street Match for fuzzy retrieval and matching, while storing the records and search keys in the Elasticsearch full-text search engine.

Match Studio includes the following options:

  • Search: Perform searches or batch searches, returning matches from an index. Configure search parameters, import search data, and switch between multiple indices.

  • Compare: View the details of a pairwise match, including the algorithms used to calculate the match scores. Modify the values of match parameters and see the impact on the match score. Use these values to optimize match parameters for your data and use case.

  • Evaluate: Calculate the accuracy of Babel Street Match using your gold data and determine the best match threshold.

  • Configure: Create, save, edit, import, and export parameter configurations. A parameter configuration is a saved collection of values for the parameters that control how a match is scored. You can also manage stop words and overrides here.

  • Server: Represented by the status icon. Click Configure Servers to access the Configure Servers page, where you can add and remove external servers or change which server Match Studio is connected to.

  • Help: Represented by the question mark icon. Displays this help file and version information.

Your business determines your specific use case and priorities. Search can be optimized for your use case by managing the trade-offs between accuracy and speed, as well as precision (percentage of returned results that are relevant) and recall (percentage of relevant results returned). Optimizing for recall can increase false positives; optimizing for precision can increase false negatives (missed matches).  
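The precision and recall definitions above can be made concrete with a small calculation. The following Python sketch is a hypothetical illustration, not part of Match Studio: it assumes you have a list of scored candidate pairs, each labeled as a true or false match (for example, from gold data), and computes both measures at a chosen score threshold. The function name and the sample data are invented for this example, and recall is measured only over the scored pairs, which assumes every true match appears among them.

```python
from typing import Iterable, List, Tuple

def precision_recall(results: Iterable[Tuple[float, bool]], threshold: float) -> Tuple[float, float]:
    """Precision and recall at a score threshold (hypothetical helper).

    results: (score, is_true_match) pairs, where is_true_match is the gold label.
    A pair is "returned" when its score is at or above the threshold.
    """
    pairs: List[Tuple[float, bool]] = list(results)
    tp = sum(1 for score, truth in pairs if score >= threshold and truth)      # relevant and returned
    fp = sum(1 for score, truth in pairs if score >= threshold and not truth)  # returned but not relevant
    fn = sum(1 for score, truth in pairs if score < threshold and truth)       # relevant but missed
    # Undefined ratios (nothing returned / nothing relevant) are reported as 0.0 here.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Made-up sample data: (match score, is this a true match?)
gold = [(0.95, True), (0.90, True), (0.85, False), (0.70, True), (0.60, False), (0.40, False)]

for t in (0.65, 0.80, 0.90):
    p, r = precision_recall(gold, t)
    print(f"threshold={t:.2f} precision={p:.2f} recall={r:.2f}")
```

In this made-up data, raising the threshold from 0.65 to 0.90 raises precision from 0.75 to 1.00 and lowers recall from 1.00 to 0.67: fewer false positives, but one missed match.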

System requirements will depend on the size of your index, the required throughput, and your target accuracy levels.

Guided tours

Match Studio includes a set of guided tours that walk you through the features and functions of Match Studio. These tours are an easy way to learn how to use the product.

To start a guided tour, click the lightbulb icon in the lower left-hand corner of the product and select a tour. There are tours for Search, Compare, and Configure.


You must be on the default server to access the guided tours.

The guided tours are not available in locked mode.

Match Studio Limited (free trial)

The free trial is a limited edition of Match Studio that lets you try the tool with your own data for 5 days.

Feature | Full Product | Limited (free trial)
--- | --- | ---
Languages | Expanded language support (see Language support for the full list) | English, Arabic, Chinese, Japanese, Korean, Russian
Index support | Unrestricted | Up to 3 indices at a time (including the permanent OFAC list index), with up to 10,000 records per index
Supported file types for index creation | .csv, .tsv, .xml, .json | .csv, .tsv
Searches per day | Unlimited | 500 searches per day; each row of a batch search file counts toward this limit
Parallel batch jobs | Supported | X
Compare addresses | Supported | X
Importing, editing, and deleting parameter configurations | Supported | X
Adding, editing, and deleting stop words | Supported | X
Adding, editing, and deleting overrides | Supported | X
Adding new servers | Supported | X

Unlocked mode and locked mode

At various points throughout this guide, you may see references to unlocked mode and locked mode. When you install Match Studio, you can install it in either unlocked mode or locked mode. The differences between the modes are as follows:

  • Unlocked mode: Enables full access to Match's configurations (including parameters, stop words, and overrides) at the cost of some performance. This mode is useful for testing or in non-production environments.

  • Locked mode: Limits configurability from within Match Studio (configuration access is still available through Match directly). This prevents production environments from being modified through Match Studio and improves performance.

For more information on installing Match Studio in either mode, see the installation guide included with your product delivery.

Overview of matching

Matching refers to the process of comparing identifying information about an individual, such as their name, company, address, and/or age, between two records. With Match Studio, you can enter one or more pieces of identifying information, and Match Studio returns a list of potential matches from your loaded index. Each match has a score between 0% and 100% that indicates the strength of the match.

Name matching is the core of multi-field entity matching. Names are complex to match because of the large number of variations that occur within a language and across languages. These include, but are not limited to, typographical errors, phonetic spelling variations, transliteration differences, initials, and nicknames.

Match Studio also matches other data types, such as organization names, location names, dates, and addresses.

You can use Compare to investigate why particular fields matched and how their scores were calculated. You can also change how scores are calculated by modifying the match parameters in real time, both to understand the process and to tune it for your specific application.

Language support

Match Studio currently has two levels of language support: complete and limited. Complete support uses the full set of matching algorithms and match parameters to calculate match scores. The table below lists the languages and scripts with complete support. Match Studio also supports cross-language matching, that is, matching names written in different languages.

Important

Match Studio Limited (free trial) only supports English, Arabic, Chinese, Japanese, Korean, and Russian.

For all other languages, Match Studio has limited support:

  • Exact matches return a score of 1. This is the same for all languages.

  • A score is calculated based on string edit distance.
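For languages with limited support, the score comes from string edit distance. The exact formula Match Studio uses is not described here, so the following Python sketch assumes one common choice: Levenshtein distance (single-character insertions, deletions, and substitutions), normalized by the length of the longer string, with exact matches scoring 1. The function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions
    needed to turn a into b (classic dynamic programming, one row at a time)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when characters match)
            ))
        prev = curr
    return prev[-1]

def edit_distance_score(a: str, b: str) -> float:
    """Hypothetical normalization: 1 - distance / length of the longer string."""
    if a == b:
        return 1.0  # exact match scores 1
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))

print(levenshtein("Jansen", "Janssen"))          # 1
print(edit_distance_score("Jansen", "Janssen"))  # about 0.857
```

Dividing by the longer string keeps the score between 0 and 1, so one typo in a long name costs less than one typo in a short name.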

Types of token and name matches

Match type | Example
--- | ---
Phonetic similarity | Haylee ↔ Hailey ↔ Hayleigh
Initials | J.E. Smith ↔ James Earl Smith
Transliteration spelling differences | Abdul Rasheed ↔ Abd al-Rashid
Nicknames | William ↔ Will ↔ Bill ↔ Billy
Missing spaces or hyphens | MaryEllen ↔ Mary Ellen ↔ Mary-Ellen
Titles and honorifics | Dr. ↔ Mr. ↔ Ph.D
Truncated name components | McDonalds ↔ McDonald ↔ McD
Missing name components | Phillip Charles Carr ↔ Phillip Carr
Out-of-order name components | Diaz, Carlos Alfonzo ↔ Carlos Alfonzo Diaz
Names split inconsistently across database fields | Dick. Van Dyke ↔ Dick Van . Dyke
Same name in multiple languages | Mao Zedong ↔ Мао Цзэдун ↔ 毛泽东 ↔ 毛澤東
Semantically similar names | Eagle Pharmaceuticals, Inc. ↔ Eagle Drugs, Co.
Semantically similar names across languages | Nippon Telegraph and Telephone Corporation ↔ 日本電信電話株式会社
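Two of the match types above, missing spaces or hyphens and initials, can be illustrated with simple string handling. The following Python sketch is a hypothetical illustration of those two ideas only, not how Babel Street Match scores names: the function names are invented for this example, and it returns yes/no answers rather than scores. Real name matching also handles phonetic, transliteration, nickname, and cross-language variation, which this sketch ignores.

```python
import re
from typing import List

def normalize(name: str) -> str:
    """Hypothetical normalization: drop spaces, periods, and hyphens, then case-fold,
    so that 'MaryEllen', 'Mary Ellen', and 'Mary-Ellen' compare equal."""
    return re.sub(r"[\s.\-]", "", name).casefold()

def tokens(name: str) -> List[str]:
    """Split on whitespace and periods: 'J.E. Smith' -> ['J', 'E', 'Smith']."""
    return [t for t in re.split(r"[\s.]+", name) if t]

def initials_compatible(a: str, b: str) -> bool:
    """True if both names have the same number of tokens and each token pair is
    equal (ignoring case) or one token is a single-letter initial of the other."""
    ta, tb = tokens(a), tokens(b)
    if len(ta) != len(tb):
        return False
    for x, y in zip(ta, tb):
        x, y = x.casefold(), y.casefold()
        if x == y:
            continue
        if len(x) == 1 and y.startswith(x):  # x is an initial of y
            continue
        if len(y) == 1 and x.startswith(y):  # y is an initial of x
            continue
        return False
    return True

print(normalize("Mary-Ellen"))                                  # maryellen
print(initials_compatible("J.E. Smith", "James Earl Smith"))    # True
print(initials_compatible("J.E. Smith", "James Earl Jones"))    # False
```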