Match Studio Quick-Start Guide
Introduction to name matching
Welcome to Match Studio! This is an interactive tool for evaluating and configuring Name Match (Match) for record matching. Match Studio uses Match for fuzzy name retrieval and name matching. It includes a multitude of configurable parameters that can be used to optimize the tool for a given use case.
Matching refers to the process of comparing identifying information about an individual, such as their name, company, address, and/or age, between two records. With Match Studio, you can enter one or more pieces of identifying information and Match Studio will return a list of potential matches from your loaded index. Each match will have a score, between 0% and 100%, indicating the match strength.
Name matching is the core of multi-field entity matching. Names are complex to match because of the large number of variations that occur within a language and across languages. These include, but are not limited to, typographical errors, phonetic spelling variations, transliteration differences, initials, and nicknames.
Over the course of this tutorial, you will learn how to import an index, search for names within it, edit search and match configurations, perform evaluations, and more. This quick-start guide will focus only on name matching. However, many of the concepts we will discuss can be applied to other data types as well. For more information on matching other fields, see the Match Studio Help.
Documentation
At any point while using Match Studio, you can hover over the question mark icon on the navigation bar and select Documentation to open the user guide.

Prerequisites
You must have access to Match Studio.
You must have the following files, which are included in Match_Studio_Quick_Start_File_Package.zip:
Quick_Start_Guide.pdf: This guide.
Quick_Start_Guide_Names.csv: A list of names for use in creating the index.
Quick_Start_Guide_Gold_Data.csv: A list of name pairs for use in evaluations.
Note
If you do not already have the file package, download it from the Files section of this page.
Create index
Before we start matching, we need to create an index to match against. An index is a list of records, each entry containing one or more fields. In this guide, we will be creating an index of names and related identifiers. Later on, we will search this index for names and examine how Match Studio scores matches between index entries and queries.
Creating an index involves uploading structured data and mapping record fields to it to tell Match Studio what kind of data the index contains. While this tutorial will only cover one index, Match Studio may contain multiple indices. You can switch between indices to search different bodies of data.
The data we are about to import is a list of names in a .csv file with a header. More complete indices may include additional columns for data points such as address or age. For the purposes of this quick-start guide, we will focus only on names.
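If you want to confirm the file's header before uploading it, a small local check can help. The following is a minimal sketch, not part of Match Studio; it assumes the header uses the column names NAME and BUSINESS (the field names shown during mapping), and the sample row is hypothetical.

```python
import csv
import io


def missing_columns(csv_text, required=("NAME", "BUSINESS")):
    """Return the required column names that are absent from the CSV header row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # an empty file yields an empty header
    present = {column.strip().upper() for column in header}
    return [column for column in required if column.upper() not in present]


# Hypothetical sample content; the real file's rows may differ.
sample = "NAME,BUSINESS\nPhillip Ward,Example Corp\n"
print(missing_columns(sample))  # []
```

To check the real file, pass its text instead of the sample, for example the result of reading Quick_Start_Guide_Names.csv with Python's pathlib.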
Hover over Configure on the navigation bar and select Indices.
Select New Index.
Name the index quickstart.
Select Browse Files and then select Quick_Start_Guide_Names.csv (included in the file package).
Select Next.
Verify that the data contains a name column and a business column, then select Next.
Now, it's time to start mapping.
Mapping is the process of assigning data types, or fields, to the columns in your dataset. Each column must have a field type assigned to it.
While the included file contains both person names and organization names, we will only be focusing on person names in this tutorial. Other possible field types include location name, date, address, and any other data type that is valid in Match.
Select Name - Person (Match) as the data type for the NAME field.
Select Name - Organization (Match) as the data type for the BUSINESS field.
Select Create.
Once the import has finalized, select Close to view the index list. It should now contain your newly created index.
Search
Now that we have created the index, we can perform a search. Search returns all matching records from the index for a single query. This is useful for situations in which you want to search a single name against an existing index.
Select the search icon in the Options column for the quickstart index.
Type the name “Phillip Ward” in the NAME field. Leave the BUSINESS field blank.
Select Search.
After completing the above steps, scroll down to view a list of potential name matches from the index in descending order by match score. A match score is a percentage from 0% to 100% indicating how similar two names are, with a higher percentage indicating a closer match. (For more information, see Understanding name match scores.) The top result is Phillip Ward with a match score of 100%. The match score is highlighted in green to indicate that it is a good match. Any match with a score at or above the match threshold, a configurable value which is set to 70% by default, is highlighted in this manner.
The next highest match score is 44% for Felipe Viana. This can be attributed to the fact that while Felipe and Phillip are similar, Viana and Ward are not. The score is not high enough to be considered a match in our current configuration.
Let's explore what happens when we search for a company name as well.
Ensure that the NAME field still contains "Phillip Ward".
Type or paste "Babel St" in the BUSINESS field.
Select Search.
Now, the match score for the Phillip Ward result should drop to 87%. Let's find out why.
Select the caret icon next to the Phillip Ward result to expand it.
A match score is displayed beneath each field. This is because each field is scored separately. While the person names in this search have a match score of 100%, the business score is only 75%. Each field also has a configurable weight, but the default is to divide the weight evenly among all searched fields. To determine the final match score, Match Studio calculates the sum of all field scores multiplied by their corresponding field weights. In this case: (75% * 50%) + (100% * 50%) = 87.5%, which Match Studio displays as 87%.
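The weighted sum described above can be sketched in a few lines. This is an illustration of the arithmetic only, not Match Studio's implementation; the function name and the dictionary layout are assumptions made for the example.

```python
def combined_score(field_scores, field_weights=None):
    """Weighted sum of per-field scores, with all values given as fractions (0 to 1)."""
    if field_weights is None:
        # Default described in the guide: divide the weight evenly among searched fields.
        field_weights = {name: 1 / len(field_scores) for name in field_scores}
    return sum(field_scores[name] * field_weights[name] for name in field_scores)


# Field scores from the Phillip Ward / Babel St search in this guide.
score = combined_score({"NAME": 1.00, "BUSINESS": 0.75})
print(f"{score:.1%}")  # 87.5%
```

The exact value is 87.5%, which the results list shows as 87%. Passing explicit weights, such as 0.8 for NAME and 0.2 for BUSINESS, shows how a heavier name weight would pull the combined score toward the name score.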
Comparing results
The Compare function allows you to see more details about the match score of two values with the same field type. In this case, we will compare a search value with a match value.
Select the Compare icon next to the Phillip Ward result. Match Studio displays the Compare tab.
Scroll down to the score computation table.
The first step of calculating a name match score is tokenization, in which each name is broken up into smaller pieces called tokens. The score computation table separates each name into tokens and calculates how much each token from one name matches each token from the other name.

The tokens from the left name are listed down the first column, while the tokens from the right name are along the top row.
The shaded boxes highlight the token pairs selected during matching that produce the best score. A token pair is a token from the left name and its matching token from the right name.
Matchtype: Lists the reason for each match, also known as the match phenomenon. In this case, both token pairs are an exact match.
Raw Score: This is the score for the token pair when matched in a vacuum. It is a number between 0 and 1, with 1 indicating a perfect match. Since we are currently comparing a name against itself, each token pair has a raw score of 1.
Context Score: This score takes into account weight, token position, and token type. It is a number between 0 and 1, with 1 indicating a perfect match.
Note
A penalty is applied if the tokens are out of order. When the tokens line up on the diagonal, they are all in order.
If you scroll further down, you will see each token's weight. The weightings determine how important the token pair match is in calculating the final score. Unusual tokens get a higher weighting than common names because their match is more significant, and initials are weighted less than full names.
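To make the layout of the score computation table concrete, here is a rough sketch that tokenizes two names and fills a table with one similarity value per token pair. It is not Match's scoring: Python's difflib.SequenceMatcher stands in for the real raw score, lowercasing the tokens is an assumption, and no weights, context scores, or reorder penalties are modeled.

```python
import re
from difflib import SequenceMatcher


def tokenize(name):
    """Split a name into lowercase tokens, dropping punctuation (assumed behavior)."""
    return re.findall(r"\w+", name.lower())


def score_table(left_name, right_name):
    """Build a table with left-name tokens as rows and right-name tokens as columns."""
    left, right = tokenize(left_name), tokenize(right_name)
    # Stand-in similarity: SequenceMatcher.ratio() returns a value between 0 and 1.
    table = [[SequenceMatcher(None, l, r).ratio() for r in right] for l in left]
    return left, right, table


left, right, table = score_table("Phillip Ward", "Ward Phillip")
print("columns:", right)
for token, row in zip(left, table):
    print(token, [round(value, 2) for value in row])
```

With the right name reversed, the exact matches (values of 1.0) fall off the diagonal, which is the situation in which a reorder penalty would apply.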
Let’s see what happens when we misspell one of the tokens.
Change the right name to “Fillip Ward”.
Select Compare.
Now let’s examine how the following have changed:
Weight: The weight for Fillip increased to 74%. This can be attributed to the fact that Fillip is a less common name, meaning a match would be more significant.
Matchtype: Fillip and Phillip have “HMM_MATCH,” also known as a fuzzy match, which means the tokens are similar strings.
Tip
Hover over an entry in the matchtype column to view its description.
Match Score: Match Studio still matches Fillip with Phillip, but their match score has been decreased to 0.661.
The final score for these names is 87.4%. While not an exact match, this is still considered a match in the current configuration, as indicated by the green color.
What happens if one of the names is out of order?
Change the right name to “Ward Phillip”.
Select Compare.
Now let’s examine how the following have changed:
Weight: The weight has returned to the initial 60/40 split for both names.
Matchtype: Both token pairs have “MATCH,” indicating an exact match.
Context Score: A penalty was applied, lowering the score, because the names were not in the same order.
The final score for these names is 91.5%.
Next, let’s give Match Studio a bigger challenge by misspelling the first and last name, placing them out of order, and adding an initial.
Change the right name to “Wand J. Fillip”.
Select Compare.
Now let’s examine how the following have changed:
Weight: Fillip has a slightly higher weight at 61% because it is less common than Phillip. The initial J has a low weight of 8%, indicating it might not be an important part of the name. Note how Match Studio has ignored all punctuation when separating the name into tokens.
Matchtype: Both token pairs have “HMM_MATCH,” indicating a fuzzy match. The initial J has “DELETION,” indicating that it does not have a match.
Context Score: The first and last name token pairs have middling match scores of 0.506 and 0.413, respectively. The unmatched J has a much lower match score, but since it is only weighted at 8%, it does not bring the final score down too far.
The final score for these names is 66.9%. This is not high enough to be considered a match in our current configuration (note how the circle surrounding the score is no longer green), but it is close. What if we want Match Studio to consider these two names a match? We will look at a few ways to accomplish this next.
Configure index
You should always configure Match Studio to suit the goals of any given search. This might mean minimizing false negatives at the expense of increased false positives, or vice versa. A false negative is a record that does not match the searched name when it should. A false positive is a record that matches the searched name when it should not.
Whether two names match or not depends on the match threshold, a configurable percentage at or above which two names are considered a match. The default match threshold is 70%. With this threshold, two names with a match score of 60% would not be considered a match, but two names with a match score of 80% would be. A match threshold that is too high increases the risk of false negatives, while a match threshold that is too low increases the risk of false positives.
Select the Search tab from the navigation bar.
Select the quickstart index.
Type "Heart Kirsten" in the Name field.
Ensure that the Business field is blank.
Select Search. The best result is "Kristy Hart".
The score of 62% is below the default threshold of 70%, but it isn't hard to see why someone might want these names to match. The names are spelled slightly differently and out of order, but phonetically they are very similar. We could easily make them match by changing the match threshold.
Select Show Configurations.
Change the Match Threshold value to 62%.
Select Save and Apply. The "Kristy Hart" match score is now green, indicating a match.
The problem with decreasing the match threshold to force a match between two names is that it increases your risk of false positives without actually tuning Match Studio to better suit your search goals. Later in the tutorial, we will look at how evaluations can help us determine the ideal match threshold. For now, let's reset the match threshold to the default value.
Change the Match Threshold value back to 70%.
Select Save and Apply.
The Search page also has options to edit the display threshold and choose a match configuration. The display threshold works much like the match threshold, but instead of determining whether a name counts as a match for a searched name, it determines whether a name appears in the list of results at all. We will learn about match configurations later in the tutorial.
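The difference between the two thresholds can be sketched as follows. This is an illustration only: the function name and data layout are hypothetical, and treating the display threshold as "at or above" is an assumption (the guide states that rule only for the match threshold).

```python
def label_results(results, match_threshold=0.70, display_threshold=0.0):
    """Drop results below the display threshold, then flag the rest as match or non-match."""
    shown = [(name, score) for name, score in results if score >= display_threshold]
    shown.sort(key=lambda item: item[1], reverse=True)  # highest score first
    return [(name, score, score >= match_threshold) for name, score in shown]


# Scores taken from the searches in this guide.
phillip_search = [("Felipe Viana", 0.44), ("Phillip Ward", 1.00)]
print(label_results(phillip_search, display_threshold=0.50))
# [('Phillip Ward', 1.0, True)]

kirsten_search = [("Kristy Hart", 0.62)]
print(label_results(kirsten_search))                        # [('Kristy Hart', 0.62, False)]
print(label_results(kirsten_search, match_threshold=0.62))  # [('Kristy Hart', 0.62, True)]
```

Lowering the match threshold changes only the match flag; raising the display threshold removes low-scoring rows from the list entirely.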
Evaluate
Evaluations allow you to see how Match Studio's results for a given index change with different match configurations and thresholds. Match Studio does this by comparing its own search results to user-provided gold data. You can think of gold data as the correct answers to a test that Match Studio is administering to itself. The gold data file consists of a list of name pairs and whether or not they should match. For the purposes of this tutorial, we have provided you with a pre-made gold data file for use in your first evaluation.
Select the Evaluate tab from the navigation bar.
Select or drag Quick_Start_Guide_Gold_Data.csv (included in the file package) into the import field.
Select New Evaluation in the Options column for the uploaded gold data file.
Ensure Match-Studio-<version> Default is selected from the Match Configuration dropdown menu.
Note
This file is built into Match Studio to make it easy to run an evaluation, but you can create your own configurations, too.
Select Start Evaluation.

When the evaluation is complete, you will see a table showing the match configuration used to perform the evaluation, the date of the evaluation, and the following additional information:
Threshold: The match threshold at which the rest of the data in the table is true. When Best Threshold is selected (next to Display), the threshold at which the match configuration performs best is automatically selected.
TPs: True positives. Number of matching name pairs that were labeled a match.
TNs: True negatives. Number of name pairs that did not match and were not labeled a match.
FPs: False positives. Number of name pairs that did not match and were labeled a match.
FNs: False negatives. Number of matching name pairs that were not labeled a match.
P: Precision. A number between 0 and 1 that indicates what proportion of the matches labeled by Match Studio were correct. A precision value of 1 means there were no false positives.
R: Recall. A number between 0 and 1 that indicates what proportion of matches in the gold data were identified as matches by Match Studio. A recall value of 1 means there were no false negatives.
F1: The harmonic mean of precision and recall. A higher F1 measure indicates better overall accuracy, taking into account both false positives and false negatives.
From this evaluation, we can see that our current threshold of 70% gives us very good results for precision, recall, and F1 measure.
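The precision, recall, and F1 columns follow from the TP, FP, and FN counts using the standard definitions. A minimal sketch, using hypothetical counts rather than the results of the quick-start gold data:

```python
def evaluation_metrics(tp, fp, fn):
    """Compute precision, recall, and F1 from true positive, false positive, and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0  # share of labeled matches that are correct
    recall = tp / (tp + fn) if tp + fn else 0.0     # share of gold matches that were found
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


# Hypothetical counts for illustration only.
precision, recall, f1 = evaluation_metrics(tp=8, fp=2, fn=0)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")  # P=0.800 R=1.000 F1=0.889
```

True negatives do not appear in any of the three formulas, which is why a large number of correctly rejected pairs does not by itself raise these scores.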
Threshold report

Select Threshold Report in the Options column of the evaluation to open the threshold report. This is a graph displaying how precision, recall, and F1 measure change as the match threshold increases or decreases.
Earlier in the tutorial, we decreased the match threshold to 0.62 to force a match between the names "Kristy Hart" and "Heart Kirsten". Mouse over the graph at a match threshold of 0.62 to see the performance at that threshold. You will see that while we have perfect recall (no false negatives) at this threshold, precision has significantly decreased (more false positives), resulting in an overall lower F1 measure.
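The idea behind the threshold report can be approximated by recomputing the metrics at several candidate thresholds. The sketch below uses a small set of hypothetical scored pairs (score, whether the gold data says they match); it does not reproduce Match Studio's report or the quick-start gold data.

```python
def sweep(scored_pairs, thresholds):
    """Return (threshold, precision, recall, f1) for each candidate threshold."""
    rows = []
    for threshold in thresholds:
        # A pair is labeled a match when its score is at or above the threshold.
        tp = sum(1 for score, gold in scored_pairs if score >= threshold and gold)
        fp = sum(1 for score, gold in scored_pairs if score >= threshold and not gold)
        fn = sum(1 for score, gold in scored_pairs if score < threshold and gold)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        rows.append((threshold, precision, recall, f1))
    return rows


# Hypothetical (score, should_match) pairs for illustration only.
pairs = [(0.95, True), (0.81, True), (0.62, True),
         (0.66, False), (0.64, False), (0.30, False)]

for threshold, precision, recall, f1 in sweep(pairs, [0.62, 0.70]):
    print(f"{threshold:.2f}: P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
# 0.62: P=0.60 R=1.00 F1=0.75
# 0.70: P=1.00 R=0.67 F1=0.80
```

In this toy data, as in the report described above, the lower threshold reaches perfect recall but admits false positives, so its F1 measure is lower than at 0.70.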

When false negatives (missed matches) are troublesome or even dangerous, select a threshold that favors recall. But imagine we are using Match Studio to match medical records; in this case, we don't want to have so many extra matches (false positives) that it is impossible to find the right one (true positive). This situation could result in someone receiving incorrect medical care. With that in mind, let's take another approach to tune our model for this scenario.
Compare Page – configurations
Name order, initialisms, and misspellings are all factors that can affect the match score between two versions of a name. You can edit various match parameters to control how much factors like these affect the match score of a given pair of names. The Compare page has additional features that will allow us to test out different match parameters.
Select the Compare tab from the navigation bar.
Type "Heart Kirsten" in the Left Name field.
Type "Kristy Hart" in the Right Name field.
Select Compare. Note that they have a match score of 62.4%, below the match threshold.
Select Show Configurations.
Select Advanced Configuration.
Decrease the reorderPenalty value to 0.01. This tells Match Studio that we don't really care if tokens are out of order as long as they match.
Select Apply and Compare to update the match score. Notice how decreasing this parameter increased the score to the default match threshold of 70%.
Match configuration
Now that we have identified which parameter to edit in order to get our test names to match, we can use that information to create a new match configuration. Match configurations allow you to save specific combinations of parameter values, allowing you to easily swap or compare them on the fly.
Hover over Configure on the navigation bar and select Match Configurations.
Select New Configuration.
Name the new match configuration KH Test.
Select Create.
Lower reorderPenalty to 0.01.
Select Save.
Evaluate new match configuration
Now, let's combine what we've learned about evaluations and match configurations to evaluate the performance of our new match configuration.
Select the Evaluate tab from the navigation bar.
Select New Evaluation in the Options column for the uploaded gold data file.
Select KH Test from the Match Configuration dropdown menu.
Select Start Evaluation.
Select Specific Threshold next to Display and set it to 70%. We know that our test query, Heart Kirsten, is a match at this threshold for this configuration.
Looking at the evaluation table, you will see that precision, recall, and F1 are unchanged. However, both precision and F1 are higher at our 0.7 match threshold than they were for the default configuration at a 0.62 match threshold. By creating a new match configuration, we have managed to make Match Studio match our two test names without becoming overly vulnerable to false positives.


Apply new match configuration to search
Now that we have created a new match configuration and evaluated it to determine the best threshold, we can apply it to the search configuration to see its effects in practice.
Select the Search tab from the navigation bar.
Select the quickstart index.
Type "Heart Kirsten" in the Name search field.
Select Show Configurations.
Select KH Test from the Match Configuration dropdown menu.
Select Save and Apply.
The search now has a 70.1% match score with Kristy Hart, making them a match with our original match threshold.
Export match configuration
You can also export any match configurations you create for use with any version of Name Match or other instances of Match Studio.
Hover over Configure on the navigation bar and select Match Configurations.
Select the Export button in the Options column for the KH Test match configuration. The match configuration is downloaded as a .yaml file.
To import the match configuration file into the Match plugin, first rename the file to parameter_profiles.yaml. Then replace the existing parameter_profiles.yaml file in the plugin with the new version. After restarting, the parameters in the new file will take effect.
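The rename-and-replace step can be scripted. The following is a minimal sketch under stated assumptions: the exported file name and the plugin directory are hypothetical placeholders that depend on your installation, and keeping a .bak copy of the existing file is an extra precaution that the procedure above does not require.

```python
import shutil
from pathlib import Path


def install_profile(exported_file, plugin_config_dir):
    """Copy an exported match configuration into place as parameter_profiles.yaml."""
    target = Path(plugin_config_dir) / "parameter_profiles.yaml"
    if target.exists():
        # Keep a backup of the current file so it can be restored if needed.
        shutil.copy2(target, target.with_name(target.name + ".bak"))
    shutil.copy2(exported_file, target)  # copying under the new name performs the rename
    return target


# Hypothetical paths; substitute the locations used by your installation.
# install_profile("KH_Test.yaml", "/path/to/match/plugin/config")
```

The plugin still needs to be restarted afterward for the new parameters to take effect.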
Conclusion
Congratulations! You have just learned how to search with Match Studio, run an evaluation on your data, adjust parameters, and save a modified match configuration.
This was a simplified test project for the purpose of familiarizing you with the uses and general workflow of Match Studio. When using Match Studio for real applications, be prepared to work with larger indices, more fields, and even multiple languages.
For more information on Match Studio, see our online Match Studio documentation.