Babel Street Match
Supported platforms
You must install an SDK package that is appropriate for your platform with respect to operating system and CPU. Since the public API for Match is Java, the C++ compiler that appears in the following list is irrelevant.
OS | CPU | Compiler | $BT_BUILD[a] |
---|---|---|---|
macOS v15+ (Sequoia) | AMD64 | xcode 16 | amd64-darwin24-xcode16 |
macOS v13+ (Ventura) | AARCH64 | xcode 15 | aarch64-darwin22-xcode15 |
Linux | AMD64 | gcc 9.4 | amd64-glibc231-gcc9 |
Linux | AARCH64 | gcc 11 | aarch64-glibc234-gcc11 |
Windows | AMD64 | Visual Studio 2013 | amd64-w64-msvc120 |
Java Only[b] | n/a | n/a | jvm |
[a] [b] The Java-only SDK runs on any OS and CPU with 64-bit Java SDK 11 through 23.
The compressed SDK package file names take the form:
rni-rnt-<version>-sdk-$BT_BUILD.<ext>
where <version> is the Match version (in the format x.xx.x.cxx.x), $BT_BUILD is the value from the table above, and <ext> is zip for Windows or Java-only, and tar.gz for Unix platforms.
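As a rough illustration of the naming rule above, the following sketch assembles a package file name from a version string and a $BT_BUILD value. The class and method names are hypothetical, and the check for `-w64-` as the Windows marker is an assumption drawn from the single Windows entry in the table, not a documented convention.

```java
public class SdkPackageName {
    // Builds "rni-rnt-<version>-sdk-$BT_BUILD.<ext>" as described in the text.
    static String packageFileName(String version, String btBuild) {
        // Assumption: "jvm" is the Java-only build and "-w64-" marks a Windows build.
        // Everything else is treated as a Unix platform and gets tar.gz.
        boolean zip = btBuild.equals("jvm") || btBuild.contains("-w64-");
        String ext = zip ? "zip" : "tar.gz";
        return "rni-rnt-" + version + "-sdk-" + btBuild + "." + ext;
    }

    public static void main(String[] args) {
        System.out.println(packageFileName("7.50.0.c78.0", "amd64-glibc231-gcc9"));
        System.out.println(packageFileName("7.50.0.c78.0", "jvm"));
    }
}
```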
Note
The version number is embedded in the package file name.
Match-<version>-api-reference.zip
Match-<version>-ReleaseNotes.pdf
Match-<version>-AppDevGuide.pdf
Release 7.50.0.c78.0-java21
June 2025
Note
This version requires Java 21.
New
BT_BUILD value upgraded: The supported BT_BUILD value for Intel Mac machines has been upgraded from amd64-darwin23-xcode15 to amd64-darwin24-xcode16. (RLPNC-8202)
Updated real world ID dictionaries: The real world ID dictionaries have been updated. You may see some differences in matching when using real world IDs. (RLPNC-8170)
1 letter misspellings: We added new parameters to control the token score when there is a 1-letter difference between 2 versions of a name, or when the edit distance equals 1. To use, set alternateEditDistanceTokenScorerMechanism to true. The token score will then take the value of the parameter alternateEditDistanceTokenScorerMechanismScore, the default value of which is 0.95. (RLPNC-8227)
New date parameter: We added a new parameter, boostSwappedDigits, to control whether dates that are identical apart from two adjacent digits being swapped (e.g. '1958-05-02' and '1958-05-20') are scored higher than other dates with an edit distance of 2. This is set to true by default to match previous behavior. (RLPNC-8114)
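To make the 1-letter misspelling rule concrete, here is a minimal sketch of the idea: when two tokens are exactly one insertion, deletion, or substitution apart, a fixed score is returned instead of whatever another scorer would produce. This is not Match's implementation. The class, the method names, and the `fallbackScore` argument are hypothetical, and the comparison works on Java `char` values, so it assumes tokens without supplementary characters.

```java
public class OneEditScore {
    // True when a and b differ by exactly one insertion, deletion, or substitution.
    static boolean isEditDistanceOne(String a, String b) {
        if (Math.abs(a.length() - b.length()) > 1) return false;
        String shorter = a.length() <= b.length() ? a : b;
        String longer = a.length() <= b.length() ? b : a;
        int i = 0, j = 0;
        boolean edited = false;
        while (i < shorter.length() && j < longer.length()) {
            if (shorter.charAt(i) == longer.charAt(j)) { i++; j++; continue; }
            if (edited) return false;                      // second difference found
            edited = true;
            if (shorter.length() == longer.length()) i++;  // substitution consumes both sides
            j++;                                           // insertion consumes only the longer side
        }
        // Equal lengths need one substitution; unequal lengths always leave exactly one extra character.
        return edited || shorter.length() != longer.length();
    }

    // Returns the fixed score when the mechanism is on and the tokens are one edit apart.
    // fallbackScore stands in for the score some other token scorer would have produced.
    static double tokenScore(String a, String b, boolean mechanismEnabled,
                             double mechanismScore, double fallbackScore) {
        return (mechanismEnabled && isEditDistanceOne(a, b)) ? mechanismScore : fallbackScore;
    }

    public static void main(String[] args) {
        System.out.println(tokenScore("smith", "smyth", true, 0.95, 0.80));
    }
}
```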
Bug Fixes
We fixed a problem that caused the HMM token cache to be invalidated unnecessarily. You may see an improvement in performance. (RLPNC-8183)
We fixed an issue that caused an array out of bounds exception when scoring names with useGeneralizedEditDistanceTokenScorer enabled. This fix ensures consistent behavior for Latin-script languages and prevents errors with names containing numeric characters. (RLPNC-8090)
We fixed an issue where English to Chinese translations always had a confidence of 1.0 and the response didn't contain the from and target domains. (RLPNC-8101)
We fixed a bug where Simplified Chinese names could be improperly translated during name matching. (RLPNC-8199)
We fixed a Chinese translation error that could lead to an array index out of bounds exception. (RLPNC-8200)
We fixed a problem matching Chinese names where the match value was not equal to 1. (RLPNC-8199)
We fixed a bug where real world ID values were not properly associated with some Chinese names. You may see an improvement in matching Chinese organization names. (RLPNC-8170)
We fixed an issue that caused a null pointer exception when translating Chinese names with Latin characters. (RLPNC-8088)
We fixed an issue that caused a null pointer exception during Chinese name translation. Annotators are now correctly initialized with safe default settings. (RLPNC-8067)
We fixed an error running the samples using the seq2seq (neural) model for Japanese-English matching. (RLPNC-8081)
We fixed an issue where match scores could slightly exceed 1.0. (RLPNC-8207)
Third-party component updates
Package | Old Version | New Version |
---|---|---|
Apache Commons IO | 2.18.0 | 2.19.0 |
Guava | 33.4.0-jre | 33.4.8-jre |
Jackson | 2.18.2 | 2.19.0 |
Protocol Buffers | 4.29.3 | 4.30.2 |
SnakeYAML | 2.3 | 2.4 |
Package | Version | License |
---|---|---|
JSpecify | 1.0.0 | Apache 2.0 |
Release 7.50.0.c78.0
June 2025
Note
Match will require Java 21 starting in June 2026.
New
BT_BUILD value upgraded: The supported BT_BUILD value for Intel Mac machines has been upgraded from amd64-darwin23-xcode15 to amd64-darwin24-xcode16. (RLPNC-8202)
Updated real world ID dictionaries: The real world ID dictionaries have been updated. You may see some differences in matching when using real world IDs. (RLPNC-8170)
1 letter misspellings: We added new parameters to control the token score when there is a 1-letter difference between 2 versions of a name, or when the edit distance equals 1. To use, set alternateEditDistanceTokenScorerMechanism to true. The token score will then take the value of the parameter alternateEditDistanceTokenScorerMechanismScore, the default value of which is 0.95. (RLPNC-8227)
New date parameter: We added a new parameter, boostSwappedDigits, to control whether dates that are identical apart from two adjacent digits being swapped (e.g. '1958-05-02' and '1958-05-20') are scored higher than other dates with an edit distance of 2. This is set to true by default to match previous behavior. (RLPNC-8114)
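The condition that boostSwappedDigits refers to, two dates that are identical apart from one swap of adjacent digits, can be checked with a short helper. The sketch below only detects the condition; how Match turns it into a score is not described here, and the class and method names are hypothetical.

```java
public class SwappedDigits {
    // True when a and b have the same length and differ only by one swap of two adjacent digits.
    static boolean differByAdjacentDigitSwap(String a, String b) {
        if (a.length() != b.length()) return false;
        int first = -1;
        for (int i = 0; i < a.length(); i++) {
            if (a.charAt(i) != b.charAt(i)) { first = i; break; }
        }
        // No difference at all, or the difference is in the last position: not a swap.
        if (first < 0 || first + 1 >= a.length()) return false;
        char x = a.charAt(first);
        char y = a.charAt(first + 1);
        boolean swapped = Character.isDigit(x) && Character.isDigit(y)
                && b.charAt(first) == y && b.charAt(first + 1) == x;
        // Everything after the swapped pair must be identical.
        return swapped && a.substring(first + 2).equals(b.substring(first + 2));
    }

    public static void main(String[] args) {
        System.out.println(differByAdjacentDigitSwap("1958-05-02", "1958-05-20"));
    }
}
```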
Bug Fixes
We fixed a problem that caused the HMM token cache to be invalidated unnecessarily. You may see an improvement in performance. (RLPNC-8183)
We fixed an issue that caused an array out of bounds exception when scoring names with useGeneralizedEditDistanceTokenScorer enabled. This fix ensures consistent behavior for Latin-script languages and prevents errors with names containing numeric characters. (RLPNC-8090)
We fixed an issue where English to Chinese translations always had a confidence of 1.0 and the response didn't contain the from and target domains. (RLPNC-8101)
We fixed a bug where Simplified Chinese names could be improperly translated during name matching. (RLPNC-8199)
We fixed a Chinese translation error that could lead to an array index out of bounds exception. (RLPNC-8200)
We fixed a problem matching Chinese names where the match value was not equal to 1. (RLPNC-8199)
We fixed a bug where real world ID values were not properly associated with some Chinese names. You may see an improvement in matching Chinese organization names. (RLPNC-8170)
We fixed an issue that caused a null pointer exception when translating Chinese names with Latin characters. (RLPNC-8088)
We fixed an issue that caused a null pointer exception during Chinese name translation. Annotators are now correctly initialized with safe default settings. (RLPNC-8067)
We fixed an error running the samples using the seq2seq (neural) model for Japanese-English matching. (RLPNC-8081)
Third-party component updates
Package | Old Version | New Version |
---|---|---|
Apache Commons IO | 2.18.0 | 2.19.0 |
Guava | 33.4.0-jre | 33.4.8-jre |
Jackson | 2.18.2 | 2.19.0 |
Protocol Buffers | 4.29.3 | 4.30.2 |
SnakeYAML | 2.3 | 2.4 |
Package | Version | License |
---|---|---|
JSpecify | 1.0.0 | Apache 2.0 |
Release 7.49.0.c77.0
March 2025
New
Intel-based Mac support deprecated: Match will no longer support Intel-based Macs starting in March 2026.
Build requirements: The minimum compatibility requirements for Linux users have changed. Please update your environment. (RLPNC-7639, 7957)
OS/CPU | Old $BT_BUILD | New $BT_BUILD |
---|---|---|
Linux/AARCH64 | aarch64-glibc226-gcc73 | aarch64-glibc234-gcc11 |
Linux/AMD64 | amd64-glibc217-gcc48 | amd64-glibc231-gcc9 |
Improved Hebrew-English ORG matching: We've improved name matching for organizations for names containing affixes. (RLPNC-7944)
Example: AL-QAID IN IRAQ vs אלקאעדה עירא
Previously: 0.51
Now: 0.89
Improved Korean name matching: We've improved Korean matching for PERSONS and ORGANIZATIONS by updating the stop word list. (RLPNC-7951)
New supported platform: Match now supports machines with Apple silicon.
OS | CPU | Compiler | $BT_BUILD |
---|---|---|---|
macOS v13+ (Ventura) | AARCH64 | xcode 15 | aarch64-darwin22-xcode15 |
New parameter: We've added a parameter, maxExpansions, to control the number of phonetically similar terms considered during the first-pass fuzzy matching. Increasing this parameter can improve first-pass results, ensuring that the correct name will be sent to the second pass, but may impact performance. (RLPNC-7967)
Improved stop words: We now support stop word prefixes and stop patterns that contain the forward slash (/) character. This is especially useful for Indian and Malaysian names that include titles which are acronyms, such as A/P, A/L, S/O, and D/O. (RLPNC-7919)
Speed result changes: We've improved our speed testing framework to be more consistent. For this release, we have reported results from both the old and new frameworks for comparison purposes, but future releases will only include results from the new framework. (RLPNC-7975)
English to Chinese name translation: The English to Chinese name translation is now implemented in Java instead of C++. You may see some differences in translations. (RLPNC-8003)
New character support: Match now supports CJK Unified Ideographs Extension B, which includes rare and historical Chinese characters. This update ensures that characters from U+20000 to U+2A6DF are correctly recognized and processed, improving compatibility with Chinese, Cantonese, Korean, and Japanese data. (RLPNC-7956)
Updated real world ID dictionaries: We've updated the real world ID dictionary. You may see some differences in matches when using real world IDs, especially for Chinese organization names. (RLPNC-8043)
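The Extension B range mentioned above (U+20000 to U+2A6DF) lies outside the Basic Multilingual Plane, so each such character occupies two Java `char` values. The following sketch shows one way to spot these characters in your own data using the standard `Character.UnicodeBlock` API. It is an illustration only and does not reflect how Match processes them; the class and method names are hypothetical.

```java
public class ExtensionBCheck {
    // True when the code point belongs to CJK Unified Ideographs Extension B.
    static boolean isCjkExtensionB(int codePoint) {
        return Character.UnicodeBlock.of(codePoint)
                == Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS_EXTENSION_B;
    }

    // Counts Extension B characters, iterating by code point rather than by char.
    static long countExtensionB(String s) {
        return s.codePoints().filter(ExtensionBCheck::isCjkExtensionB).count();
    }

    public static void main(String[] args) {
        // U+6E29 is in the basic CJK block; U+20000 is the first Extension B code point.
        String name = "\u6E29" + new String(Character.toChars(0x20000));
        System.out.println(name.length());                          // number of char values
        System.out.println(name.codePointCount(0, name.length()));  // number of characters
        System.out.println(countExtensionB(name));
    }
}
```

Code that indexes such strings with `charAt` or `length()` counts surrogate halves, which is why the sketch iterates with `codePoints()`.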
Bug Fixes
We fixed a bug in Cantonese where the confidence score was always 0.0 if a token had a special character at the end. (RLPNC-7893)
We fixed an issue where left and right names in the explain info could appear in the wrong order. (RLPNC-7922)
We fixed an issue where removing stop words produced improper segmentation results for Chinese, leading to poor match scores. (RLPNC-8036)
We fixed a bug in native Chinese name translation where too many results could be returned if a name had multiple segmentations. (RLPNC-8058)
We fixed a bug in Chinese name translation where a StringIndexOutOfBoundsException occurred due to incorrect handling of token dictionary matches. (RLPNC-8009)
We fixed a bug with the handling of transliteration schemes in Cantonese to English translations. You can now set the language of origin and language of use to yue and receive the correct translation without error. (RLPNC-7865)
Third-party component updates
Package | Old Version | New Version |
---|---|---|
Apache Commons IO | 2.17.0 | 2.18.0 |
Apache Log4j | 2.24.1 | 2.24.3 |
Apache Zookeeper | 3.7.0 | 3.9.1 |
Guava | 33.3.1-jre | 33.4.0-jre |
Jackson Annotations | 2.17.2 | 2.18.2 |
Jackson Core | 2.17.2 | 2.18.2 |
Jackson Module Jaxb Annotations | 2.17.2 | 2.18.2 |
Jackson DataFormat | 2.17.2 | 2.18.2 |
Jackson Databind | 2.17.2 | 2.18.2 |
Tensorflow | 0.2.0 | 1.0.0 |
Package | Version |
---|---|
Caffeine | 3.1.8 |
Release 7.48.0.c76.0
December 2024
New
New name: Rosette Name Indexer has been renamed to Babel Street Match.
Improved L337/OCR scorer: We've improved the GeneralizedEditDistanceTokenScorer for L337 and OCR type errors. (RLPNC-7776)
New parameter added: We've added a parameter, charactersToAlwaysNormalizeToSpace, to define a set of characters to replace with a space when normalizing names and addresses. (RLPNC-7808)
Improved performance: We've improved performance of English and Spanish name matching. (RLPNC-7782)
Gender ignored by L337/OCR scorer: We no longer apply a genderConflictPenalty in the case of L337 matching.
Updated real world ID dictionaries: We've updated the real world ID dictionary. You may see some differences in matches when using real world IDs, especially for Arabic organization names containing numerals. (RLPNC-7642)
Bug Fixes
We now use the correct value for sourceLanguageOfUse when translating Cantonese names. The yue/hani/jyutping domain is now used instead of zho/hani/hypy. (RLPNC-7837)
An error is no longer thrown when one of the search terms is normalized away. (RLPNC-7870)
Third-party component updates
Package | Old Version | New Version |
---|---|---|
Apache Commons IO | 2.16.1 | 2.17.0 |
Apache Commons Lang | 3.16.0 | 3.17.0 |
Apache Log4j | 2.23.1 | 2.24.1 |
fastutil | 8.5.14 | 8.5.15 |
Guava | 33.3.0-jre | 33.3.1-jre |
Protocol Buffers | 3.25.3 | 3.25.5 |
SnakeYAML | 2.2 | 2.3 |
Woodstox | 7.0.0 | 7.1.0 |
Release 7.47.0.c75.0
September 2024
New
l337/OCR error token scorer added: We've added a new feature to handle leetspeak (l337) and OCR errors. The token scorer contains a series of rules files defining symbol substitutions, along with the penalty for the substitution.
Japanese date parsing: We now support Japanese Imperial date formats, including data matching between Japanese and Gregorian dates. (RLPNC-7620)
Examples:
平成2年3月13日
令和元年
Improved explainability: Explain info now contains alternative Latin readings (alternativeLatnData) for translated names. This field displays the alternative Latin readings of translated names from the input-info. (RLPNC-7609)
Example:
{ "leftInput": { "data": "温家宝", "normalizedData": "温家宝", "latnData": "wen jiabao", "script": "Hani", "languageOfUse": "CHINESE", "languageOfOrigin": "UNKNOWN", "entityType": "PERSON", "alternativeLatnData": [ "on kahou", "on ka-po" ] }, "rightInput": { "data": "عبد الله بن عبد الرحمن بن جبرين", "normalizedData": "عبد الله بن عبد الرحمن بن جبرين", "latnData": "abd allah bin abd alrahman bin jabrin", "script": "Arab", "languageOfUse": "ARABIC", "languageOfOrigin": "UNKNOWN", "entityType": "PERSON", "alternativeLatnData": [ "abid allah bin abd alrahman bin jabrin", "abd allah bin abid alrahman bin jabrin", "abd allah bunn abd alrahman bin jabrin", "abd allah bin abd alrahman bunn jabrin" ] }, "finalScore": 0.0 }
Improved explainability in record matching: Request-level information is now included in the information block of the response. Messages concerning the mapping or properties of a request are now part of the info field of the response. (RLPNC-7653)
Improved unsupported language scoring: We added a new parameter, editDistanceFalloff, to control scoring of names in unsupported languages. The parameter is set to 0 (disabled) by default; a value between 0 and 1 will reduce the dropoff in edit distance. When set to a value between 0 and 1, you may now find matches in scripts and languages that failed to match in previous releases. (RLPNC-7481)
Improved error handling in record matching: Record matching will now throw an exception if a mapping does not contain any valid fields. (RLPNC-7653)
Improved performance: Significant improvement of the HMM used in the high-recall phase of name matching. (RLPNC-7686, RLPNC-7679)
Improved performance: We've added a parameter, disableHMMMatching, which disables the second pass HMM matching. Enabling this parameter greatly increases performance, but at the cost of reducing accuracy. By default, this parameter is off. (RLPNC-7605, RLPNC-7604, RLPNC-7603)
Faster Arabic translation: We've added a new parameter, tokenByTokenArabLatnFolkTranslation, to enable faster translation of Arabic-script, Arabic-language names to Latin-script English names with the Folk transliteration scheme. This speeds up translation by processing the name token-by-token. (RLPNC-7719)
New parameters added: You can now boost a specified number of tokens from the left, right, or both ends. This can be useful when scoring multi-token given names or surnames. The provided value sets the number of tokens from the left and/or right to give a boost. This feature is supported by adding the parameters leftBoostTokens, rightBoostTokens, and bothEndsBoostTokens to the parameter_defs.yaml file. (RLPNC-7606)
Improved Hebrew transliteration: We've improved the FOLK transliteration scheme for Hebrew-English. (RLPNC-7623)
Improved Persian/English matching: We've improved accuracy of Persian-English matching through tuning of parameters. (RLPNC-7631)
Persian organization improvements: We've added stop words for organizations for Persian. (RLPNC-6890)
Chinese migration: Chinese name translation and matching is now implemented completely in Java, instead of C++. You may notice some small fluctuations in Chinese name matching. (RLPNC-6768)
Improved token span definitions: Token spans now attempt to include boundary characters removed from the original input during normalization. (RLPNC-7730)
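For readers unfamiliar with the Japanese Imperial date examples listed in this release, the sketch below uses the JDK's `java.time.chrono` classes to show how an era-based date corresponds to a Gregorian one. It illustrates the calendar relationship only and is not Match's date parser; the class and method names are hypothetical.

```java
import java.time.LocalDate;
import java.time.chrono.JapaneseDate;
import java.time.chrono.JapaneseEra;

public class ImperialDate {
    // Converts an era, year-of-era, month, and day into the equivalent ISO (Gregorian) date.
    static LocalDate toGregorian(JapaneseEra era, int yearOfEra, int month, int day) {
        return LocalDate.from(JapaneseDate.of(era, yearOfEra, month, day));
    }

    public static void main(String[] args) {
        // 平成2年3月13日 reads as Heisei year 2, month 3, day 13.
        System.out.println(toGregorian(JapaneseEra.HEISEI, 2, 3, 13)); // prints 1990-03-13
    }
}
```

In the second example, 元年 denotes the first year of an era, so 令和元年 is Reiwa year 1. The `JapaneseEra.REIWA` constant is only available in newer JDKs, which is why the sketch uses Heisei.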
Bug Fixes
We fixed a bug in the scoring of two invalid dates. Previously, they could return a similarity score of 1; now a score of 0 is returned. (RLPNC-7473)
We fixed a bug where the seq2seq neural model could not be loaded properly. The model now loads. (RLPNC-7754)
We fixed a bug where if you set haniFourCornerCodeMismatchPenalty globally, it didn't work for any profiles. Users can now either set haniFourCornerCodeMismatchPenalty globally (enabling it for all Hani matching) or use the language profiles (i.e. specifically for zho_zho). (RLPNC-7717)
We removed some erroneous overrides. (RLPNC-7758)
We fixed a memory leak in RNT that was causing slowdowns in Rosette Cloud and Rosette Server. (RLPNC-7703)
Third-party component updates
Package | Old Version | New Version |
---|---|---|
Apache Commons Lang | 3.14.0 | 3.16.0 |
fastutil | 8.5.13 | 8.5.14 |
Guava | 33.2.0-jre | 33.3.0-jre |
Jackson | 2.17.1 | 2.17.2 |
jna | 4.5.0 | 5.10.0 |
Package | Old Version | New Version |
---|---|---|
oshi-core | 6.4.12 | 5.8.7 |
Release 7.46.0.c74.0
June 2024
New
Japanese translation improvements: We've improved Japanese translation by updating the custom reading dictionary (RLPNC-7539).
Hebrew translation overrides: Added additional overrides for Hebrew translation. (RLPNC-7471)
Added Gender as a name property: You can now force the gender bias/penalty for a certain name instead of using statistical models. Use ExplicitGender to assign a gender to a name. Valid values are Male, Nonbinary, Female. (RLPNC-7582)
Record similarity improvements
We've added support for fielded dates. (RLPNC-7520)
We've added support for fielded addresses. (RLPNC-7519)
We've added support for specifying a scoreIfNull for fields in record similarity. (RLPNC-7516, RLPNC-7517)
We've added support for specifying either a parameter universe or a mapping of parameter names to parameter values. (RLPNC-7497)
We've improved validation of field mappings and field weights. (RLPNC-7545)
Record matching responses now return info messages when default property values are used. (RLPNC-7509)
Fields not included in the mapping or fields with unknown types no longer cause an error in record matching. Instead, this information is returned in the "info" block. A record with only non-included fields or fields with unknown types will lead to an error in the response. (RLPNC-7512, RLPNC-7514)
Record similarity now supports partial request success if some record pairs contain unmapped or unknown fields, or encounter other scoring errors. (RLPNC-7502)
We've removed hard limits on the number of records or mapping fields in record-similarity requests. (RLPNC-7601)
Non-Latin numeric characters: Numeric characters in certain languages are now normalized to their Latin-script counterparts. Supported languages currently include Thai (RLPNC-7562), Arabic, Burmese, Pashto (RLPNC-7564), Persian (including Iranian and Afghan Persian), Urdu, and Khmer (RLPNC-7565).
Pashto organization improvements: We've added stop words for organizations for Pashto. (RLPNC-6889)
Spanish organization improvements: We've added stop words for organizations for Spanish. (RLPNC-6893)
Date improvements: Added support for more date formats. (RLPNC-7147)
Solr updates
We now support Solr 8.11.3. (RLPNC-7551)
We updated Solr 9.0.0 to 9.6.0. (RLPNC-7553)
Solr7 is no longer supported and Solr7 content is no longer included in the package. (RLPNC-7591)
We've added gender support. (RLPNC-7450)
Change to licensing when an overlay directory is specified: RLP licenses must be in the BT_ROOT directory, even if an overlay root is set. (RLPNC-7599)
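The non-Latin numeric character normalization described in this release can be pictured with the JDK's own Unicode digit support. The sketch below maps any Unicode decimal digit to its ASCII counterpart. It is a general illustration, not Match's normalizer, it covers every script Java recognizes rather than only the languages listed above, and the class and method names are hypothetical.

```java
public class DigitNormalizer {
    // Replaces every Unicode decimal digit with the matching ASCII digit; other characters pass through.
    static String normalizeDigits(String input) {
        StringBuilder out = new StringBuilder(input.length());
        input.codePoints().forEach(cp -> {
            // Character.isDigit is true only for decimal digits, in any script.
            int value = Character.isDigit(cp) ? Character.digit(cp, 10) : -1;
            if (value >= 0) {
                out.append((char) ('0' + value));
            } else {
                out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // Thai digits U+0E52 U+0E55 U+0E56 U+0E57
        System.out.println(normalizeDigits("\u0E52\u0E55\u0E56\u0E57"));
        // Arabic-Indic digits U+0661 U+0669 U+0665 U+0668
        System.out.println(normalizeDigits("\u0661\u0669\u0665\u0668"));
    }
}
```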
Bug Fixes
We fixed a bug where translating certain Japanese or Korean names could lead to a memory leak. (RLPNC-7550, 7558)
We fixed a bug with cross-entity-type matching: cross-entity-type match scoring is now commutative. As a result of this, cross-entity-type matching will now ignore entity-type-specific parameters and overrides. (RLPNC-7485).
We fixed a bug where khm-khm was returned as a supported language pair for name translation. (RLPNC-7556)
We fixed a bug where native resources were not appropriately freed after generating Cantonese readings. (RLPNC-7589)
We fixed a bug that caused the left and right input fields in the explain info to swap places when a Japanese organization name is matched against a single character that is normalized away (like '*') and the language of use is not defined on that side. (RLPNC-7554)
We fixed a bug, improving performance when handling Cyrillic-script names. (RLPNC-6999)
Third-party component updates
Package | Old Version | New Version |
---|---|---|
Apache Log4j | 2.21.1 | 2.23.1 |
Dropwizard Metrics Core | 3.2.3 | 4.2.25 |
fastutil | 8.5.12 | 8.5.13 |
Jackson Annotations | 2.16.1 | 2.17.1 |
Jackson Core | 2.16.1 | 2.17.1 |
Jackson Databind | 2.16.1 | 2.17.1 |
Apache Commons IO | 2.15.1 | 2.16.1 |
Jackson Dataformat Yaml | 2.16.1 | 2.17.1 |
OSHI Core | 6.4.10 | 6.4.12 |
JavaCPP | 1.5.9 | 1.5.10 |
Metrics Integration with JMX | 4.1.5 | 4.2.25 |
Jetty | 9.4.44.v20210927 | 9.4.53.v20231009, 10.0.20 |
Protobuf Java | 3.21.7 | 3.25.3 |
Google Guava | 33.0.0-jre | 33.2.0-jre |
Apache Solr | 8.11.1 | 8.11.3 |
Apache Solr | 9.0.0 | 9.6.0 |
Apache Calcite | 1.27.0 | 1.35.0 |
Apache Avatica | 1.18.0 | 1.23.0 |
Caffeine | 2.9.2 | 2.9.3 |
Package | Version |
---|---|
Apache Solr | 7.6.0 |
RRD4J | 3.2.0 |
Restlet | 2.3.0 |
Noggit | 0.8 |
Apache Commons FileUpload | 1.3.3 |
Apache Zookeeper | 3.4.11 |
Release 7.45.0.c73.0
March 2024
New
Record matching added: We have added pairwise matching for multi-field records. Refer to the Matching Records section of the RNI-RNT Application Developer's Guide for more information on how to use it. (RLPNC-7378)
Malay support expanded: We have improved Malay name matching by expanding the stop word list. (RLPNC-7175, RLPNC-7176)
Hebrew improved: We have improved name matching and translation for Hebrew by expanding translation override lists. (RLPNC-7234)
Explain info improved:
Sub-elements are now ordered consistently and provide additional detail for any given pairwise match. (RLPNC-7293)
All date matches now return explain info about the parsed date fields, and report the “time distance” for time distance and time proximity matches. (RLPNC-7309)
All address matches now return explain info about tokenization and the final score for each address field to address field match. (RLPNC-7292)
Overlay directory size decreased: Larger, immutable data files no longer need to be copied over when utilizing an overlay directory, reducing required disk space. (RLPNC-7370)
Token override selectors improved: Previously, override selectors were additive to the default selector. Now, if a custom overrideSelector value is specified, RNI will only consider overrides in that specific selector. (RLPNC-7425)
Bug Fixes
Fixed a bug in which stop words were not being applied properly for Greek. (RLPNC-7144)
Names that get normalized to empty now return an empty list of Real World IDs. (RLPNC-7202)
Overrides are no longer considered for the gender penalty. (RLPNC-7346)
Names in Han script with unknown language and unknown language of origin now give appropriate Japanese, Chinese, and Korean readings. (RLPNC-7367)
Confidence scores for fullname overrides are now correctly calculated when at least one token has a confidence score specified. (RLPNC-7456)
Fixed token tagging in names that end with a suffix. (RLPNC-7417)
Fixed the error being returned when matching addresses. Only English and Chinese are supported for address matching; RNI will now throw an unsupported language exception when matching non-English, non-Chinese addresses if allLanguageSupport is disabled. (RLPNC-7416)
Third-party component updates
Package | Old Version | New Version |
---|---|---|
ICU4J | 70.1 | 74.2 |
Jackson Annotations | 2.15.3 | 2.16.1 |
Jackson Core | 2.15.3 | 2.16.1 |
Jackson Databind | 2.15.3 | 2.16.1 |
Jackson Dataformat YAML | 2.15.3 | 2.16.1 |
Google Guava | 32.1.3 | 33.0.0 |
ThreeTen Backport | 1.3.6 | 1.6.8 |
OSHI Core | 3.4.4 | 6.4.10 |
JavaCPP | 1.5.8 | 1.5.9 |
Apache Commons Codec | 1.10 | 1.11 |
Apache Commons Lang | 3.12.0 | 3.14.0 |
Apache Commons IO | 2.15.0 | 2.15.1 |
Release 7.44.0.c72.0
December 2023
New
Overlay directory: RNI now allows users to specify an "overlay directory" location along with the existing BT_ROOT location. See the Application Developer's Guide for more information. (RLPNC-7217, RLPNC-7218)
Important
Even if this feature is not used, users must now replace all usages of the com.basistech.utils.Pathnames class with the com.basistech.names.internal.Pathnames class. Users who specify the root location with a system property and plugin users are not impacted.
Added parameter hmmNormalizationAlternative: This parameter adjusts normalization for more accurate HMM match scores in certain languages. It is available for Russian, Hebrew, Korean, Japanese, Arabic, and Greek. When this parameter is enabled, HMM scores may be lowered. This can lower the probability of names being translated, resulting in them being transliterated instead, which may be more accurate for names in some languages. It is disabled for all languages by default. (RLPNC-7192)
Modified name translation responses: When using the RNTAnnotator class, name translation responses now include a sourceDomain field with information about the input name. The existing domain field has been deprecated and renamed targetDomain. (RLPNC-7170)
Improved JNI returns: The output objects from JNI calls now contain ISO language codes. (RLPNC-7149)
Cyrillic support expanded: Extended Cyrillic characters are now supported. This will improve performance for non-Russian, Cyrillic languages. (RLPNC-7236)
Third-party component updates
Package | Old Version | New Version |
---|---|---|
Apache Log4j | 2.20 | 2.21.1 |
Apache Log4J Core | 2.20 | 2.21.1 |
Apache Log4j SLF4J Binding | 2.20 | 2.21.1 |
Apache Commons IO | 2.11.0 | 2.15.0 |
Google Guava | 32.1.2-jre | 32.1.3-jre |
Jackson Annotations | 2.15.2 | 2.15.3 |
Jackson Core | 2.15.2 | 2.15.3 |
Jackson Databind | 2.15.2 | 2.15.3 |
Jackson Dataformat XML | 2.15.2 | 2.15.3 |
Jackson Dataformat YAML | 2.15.2 | 2.15.3 |
Liblinear-java | 2.30 | 2.44 |
SnakeYAML | 2.0 | 2.2 |
Protobuf java | 3.23.4 | 3.25.0 |
Release 7.43.0.c71.0
September 2023
Important
The former detectableLanguages parameter is now called detectableLanguagesRuleBased. If using a previous copy of the parameter_profiles.yaml file, edit the file to change all instances of detectableLanguages to detectableLanguagesRuleBased.
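If you prefer to script the rename rather than edit the file by hand, a whole-word replacement is enough. The sketch below operates on the file contents as a string; the class name is hypothetical and the sample YAML line in `main` is a placeholder, not the real file layout.

```java
import java.util.regex.Pattern;

public class RenameDetectableLanguages {
    // \b on both sides matches the old name only as a whole word, so keys that are
    // already detectableLanguagesRuleBased or detectableLanguagesModelBased stay untouched.
    private static final Pattern OLD_NAME = Pattern.compile("\\bdetectableLanguages\\b");

    static String rename(String yamlText) {
        return OLD_NAME.matcher(yamlText).replaceAll("detectableLanguagesRuleBased");
    }

    public static void main(String[] args) {
        // Placeholder content for illustration only.
        System.out.println(rename("detectableLanguages: <value>"));
    }
}
```

To apply it to a real file, the text can be read and written back with `java.nio.file.Files.readString` and `Files.writeString` (Java 11 and later). Keep a backup of the original file before overwriting it.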
Important
The functionality of response explain info has changed significantly in this version. Please review the changes before upgrading if you rely on this feature.
New
Improved JSON-formatted explain info block in response:
Sub elements are now ordered consistently and provide additional detail about the match logic for any given pairwise match, such as RealWorld ID values for ORG names. (RLPNC-7112, RLPNC-6706, RLPNC-7059)
All date matches now return explain information. (RLPNC-7039)
All address matches now return explain information. (RLPNC-6971)
Malay support added: Added support for Malay-English and Malay-Malay name matching. (RLPNC-7079)
Model-based language detection expanded: Added support for model-based language detection of Malay, Hungarian, Vietnamese, and Turkish names. Enable to detect untagged Latin script as one of these languages. This detection is disabled by default, and enabling it will cause a performance penalty when processing Latin-script names.
Added a new parameter, detectableLanguagesModelBased, to control which languages can be detected by our new model. This parameter is turned off by default. Enabling this parameter causes a performance penalty.
The former detectableLanguages parameter is now called detectableLanguagesRuleBased. Its function has not been changed. (RLPNC-7083, RLPNC-7085)
Date matching expanded: Improved date matching by expanding support for date formats that contain a comma and no space between the date and year, e.g. "August 31,1975". (RLPNC-7143)
Real World ID Omission: Users can specify organization names and QIDs to be omitted from real world ID matching via an omit file. For more information, see the Matching Organizations with Real World IDs section of the RNI-RNT Application Developer's Guide. (RLPNC-6853)
Improved matching speed: Our phonetic name matching algorithm has improved in speed, resulting in performance benefits for users. Users with mostly Latin script and non-Hebrew cases will notice the biggest impact. (RLPNC-7166)
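The date format mentioned in this release, a comma with no space before the year (as in "August 31,1975"), is easy to reproduce with the JDK if you want to pre-validate your own input. The sketch below uses an optional section in a `DateTimeFormatter` pattern so that both spellings parse. It is not Match's date parser, and the class name is hypothetical.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class CommaDateParser {
    // The square brackets make the space after the comma optional.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("MMMM d,[ ]yyyy", Locale.ENGLISH);

    static LocalDate parse(String text) {
        return LocalDate.parse(text, FORMAT);
    }

    public static void main(String[] args) {
        System.out.println(parse("August 31,1975"));  // prints 1975-08-31
        System.out.println(parse("August 31, 1975")); // prints 1975-08-31
    }
}
```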
Bug Fixes
RNI-RNT handles malformed override files more gracefully and provides more detailed exceptions. (RLPNC-7102)
Third-party component updates
Package | Old Version | New Version |
---|---|---|
fastutil | 8.5.9 | 8.5.12 |
Jackson Annotations | 2.15.0 | 2.15.2 |
Jackson Core | 2.15.0 | 2.15.2 |
Jackson Databind | 2.15.0 | 2.15.2 |
SnakeYAML | 1.33 | 2.0 |
Release 7.42.0.c70.0
June 2023
New
Improved organization name matching: The stop word list for organization entities has been expanded in the following languages (RLPNC-7025):
Turkish
Thai
Portuguese
Russian
Korean
Japanese
Italian
Hungarian
German
French
English
Greek
Arabic
Improved Chinese address matching: Users can now enable word embeddings for Chinese address matching by adding the useEmbeddings parameter to address parameter profiles like zho_eng_ADDRESS_FIELD and zho_zho_ADDRESS_FIELD. (RLPNC-5884)
Bug Fixes
Fixed a bug in which date matching could return a result greater than 1.0. (RLPNC-6982)
Fixed a bug in which very large Chinese tokens could cause the JVM to crash. (RLPNC-7000)
Fixed improper match behavior previously observable with Java 19 and 20. (RLPNC-6969)
Known Issues
When performing address matching, the setting allLanguageSupport must be set to true. If it is set to false, an unsupported language exception will be thrown, regardless of the language of the address.
Third-party component updates
Package | Old Version | New Version |
---|---|---|
Apache CXF | 3.5.1 | 3.5.5 |
Apache Log4j Core | 2.19.0 | 2.20.0 |
Apache Log4j API | 2.19.0 | 2.20.0 |
Apache Log4j SLF4J Binding | 2.19.0 | 2.20.0 |
Jackson Annotations | 2.14.0 | 2.15.0 |
Jackson Core | 2.14.0 | 2.15.0 |
Jackson Databind | 2.14.0 | 2.15.0 |
Xerces2 J | 2.9.1 | 2.12.2 |
Package | Version |
---|---|
Google Guava | 14.01 and 18.0.0 |
Logback Classic Module | 1.1.3 |
Logback Core Module | 1.1.3 |
Release 7.41.0.c69.0
March 2023
New
Turkish support added: Added support for Turkish-English and Turkish-Turkish name matching. We have also added person and organization overrides, stopwords, and language detection to improve matching in Turkish. (RLPNC-6499)
Improved person name matching: RNI-RNT now has the ability to detect given names and surnames in Latin script when the name is of English origin. When the enableAdditionalOnomastics parameter is true, the gender mismatch penalty is only applied to the detected given name, as opposed to the first name token in a query. (RLPNC-6719, RLPNC-6720)
Improved Arabic person name matching: The new TRAILING_PATRONYMIC_DELETION match phenomenon provides improved scores for matches which contain a deletion that is caused by truncation of a patronymic. The score of this deletion is controlled by the trailingPatronymicDeletionScore parameter. This only applies to Latin script names of Arabic origin when enableAdditionalOnomastics is true. (RLPNC-6756)
New date parameters: New parameters have been added for adjusting scores for dates that may contain a single digit manipulation. A digit manipulation is a transformation to a digit that can be accomplished with minimal additional lines (for example, changing a 1 to a 7). The new improveSingleDigitManipulationMatch parameter controls how much the score is increased for dates containing a single digit manipulation. The maxYearDistanceForDigitManipulation parameter sets the maximum number of years between two dates; dates farther apart than this are not affected by improveSingleDigitManipulationMatch. (RLPNC-6677)
The new thresholdToDropoffBiasMapping parameter allows you to set score dropoffs when matching dates that are a certain number of years apart. You can specify the number of years beyond which there should be a score dropoff and how large the dropoff should be. Multiple dropoff points can be set with this parameter. (RLPNC-6810)
Russian support migrated from C++ to Java: Russian translation services are now part of jvm-only. Translations from Russian to English may differ slightly. (RLPNC-6764)
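The idea of multiple dropoff points can be pictured with a small lookup table. The following is only a sketch: the class, its method names, and the rule that a dropoff applies once the distance strictly exceeds a threshold are assumptions made for illustration, not the RNI-RNT implementation or the configuration syntax of thresholdToDropoffBiasMapping.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not RNI-RNT code or its configuration format.
class DropoffSketch {
    // Maps a "years apart" threshold to an assumed dropoff amount.
    private final TreeMap<Integer, Double> dropoffs = new TreeMap<>();

    DropoffSketch addDropoff(int yearsThreshold, double dropoff) {
        dropoffs.put(yearsThreshold, dropoff);
        return this;
    }

    // Returns the dropoff of the largest threshold that yearsApart exceeds,
    // or 0.0 when no threshold has been exceeded.
    double dropoffFor(int yearsApart) {
        Map.Entry<Integer, Double> entry = dropoffs.lowerEntry(yearsApart);
        return entry == null ? 0.0 : entry.getValue();
    }

    public static void main(String[] args) {
        DropoffSketch sketch = new DropoffSketch().addDropoff(5, 0.1).addDropoff(10, 0.3);
        for (int years : new int[] {3, 7, 12}) {
            System.out.println(years + " years apart: dropoff " + sketch.dropoffFor(years));
        }
    }
}
```

How the selected dropoff is then combined with the date score is not described in these notes, so the sketch stops at the lookup.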
Bug Fixes
Fixed a nickname in the tokens_eng_eng.txt file. (RLPNC-6807)
Fixed a bug in which Solr documentation was missing from Java API documentation. (RLPNC-6769)
Release 7.40.1.c68.0
December 2022
Bug Fixes
Fixed a bug which caused an increase in query time. (RLPNC-6733)
Release 7.40.0.c68.0
December 2022
New
Solr 9 support: Solr 9 is now supported. (RLPNC-6476)
Improved Japanese organization matching: カンパニー (company) added to the Japanese organization stopwords list. (RLPNC-6545)
Improved date matching: Invalid dates are now rejected. For example, April 31, 2021 will now be rejected. (RLPNC-6610)
Improved Chinese address matching: We've expanded the list of Chinese stop words for addresses. (RLPNC-6587)
Improved Chinese organization matching: We've expanded the list of Chinese stop words for organizations. (RLPNC-6615)
Improved name matching results: When no entityType is specified, the type PERSON will be applied. Previously, the type NONE was applied. (RLPNC-6576)
New parameter for token overrides: We've added a new parameter, overrideSelector, to control which overrides will be considered during querying and matching. Override filenames can now specify a “selector” value which will be matched against this parameter. (RLPNC-6561)
Improved documentation: Tables listing all parameters and match phenomena have been added to the Application Developer's Guide. (RLPNC-6575)
New JSON-formatted explain info block: We've added a new JSON-formatted explain info block providing detail about the match logic for any given pairwise match. It is executed at run time by passing in the new boolean parameter jsonExplainInfo. When set to true, RNI returns the JSON-formatted explain info block. This function has a small performance impact, so running it on all queries is not recommended, but it is a useful diagnostic and reporting tool. (RLPNC-6574)
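To see what rejecting an invalid date such as April 31, 2021 means in practice, the following sketch uses the standard java.time API. It is not the RNI date parser; the class and method names are hypothetical, and it only shows how a calendar-aware check distinguishes valid from invalid day, month, and year combinations.

```java
import java.time.DateTimeException;
import java.time.LocalDate;

// Hypothetical illustration only; RNI's own date validation is not shown here.
class DateValiditySketch {
    // Returns true when the year/month/day combination exists in the ISO calendar.
    static boolean isValidDate(int year, int month, int day) {
        try {
            LocalDate.of(year, month, day); // throws for impossible dates
            return true;
        } catch (DateTimeException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("2021-04-31 valid? " + isValidDate(2021, 4, 31));
        System.out.println("2021-04-30 valid? " + isValidDate(2021, 4, 30));
    }
}
```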
Bug Fixes
Horizontal tabs are now removed as part of normalization in English. (RLPNC-6541)
Control characters are now removed from Arabic names before matching. (RLPNC-6543)
Fixed a case where unexpected name inputs could lead to a null pointer exception. (RLPNC-6634)
Fixed an issue where date parsing of valid inputs could result in an exception. (RLPNC-6600)
Fixed an issue where RNI could return match scores greater than 1. (RLPNC-6595)
Third-party component updates
Package | Old version | New version |
---|---|---|
Apache Log4j | 2.17.1 | 2.19.0 |
commons-compress | 1.21 | 1.22 |
fastutil | 8.5.6 | 8.5.9 |
Jackson | 2.11.1 | 2.14.0 |
JavaCPP | 1.58-alpha.20220614.013710.426 | 1.58 |
SLF4J | 1.7.33 | 1.7.36 |
SnakeYAML | 1.30 | 1.33 |
Release 7.39.0.c67.0
September 2022
New
Improved matching of Han character names: We've added a parameter, haniFourCornerCodeMismatchPenalty, to add a penalty for names with different four-corner codes. By default, this feature is disabled; the parameter is set to 0. To enable this feature, in your parameter_profiles.yaml file, set: (RLPNC-6428)
zho_zho_PERSON:
  haniFourCornerCodeMismatchPenalty: 1
Note
This is an experimental feature. As with any experimental feature, we highly recommend experimenting in your environment with your data.
Improved accuracy for Chinese and Japanese organization names: We've improved the embedding match scores for Chinese and Japanese by enhancing the Chinese and Japanese embedding dictionaries. (RLPNC-6420, RLPNC-6421)
Improved Thai, Burmese, and Khmer organization name matching: We've improved real world ID matching for Thai, Burmese, and Khmer. (RLPNC-6337)
Cantonese name segmentation: When enableYueReadings is set to true, Yue Chinese readings are now segmented. (RLPNC-6369)
Example: 葉國謙 vs. Ip Kwok-him
Previously: 0.41
Now: 0.71
Bug Fixes
Fixed a bug where multiple concurrent requests specifying dynamic parameters could cause incorrect match score results. (RLPNC-6475)
Creating a DateSpec with just a month no longer throws an ArrayIndexOutOfBoundsException. (RLPNC-6456)
Deprecation Notification
The following classes and functions are now deprecated and will be removed in an upcoming release, no earlier than September 2023.
Lookup classes and methods for name indexes. All lookup methods and classes have been replaced with equivalent query versions.
Classes:
com.basistech.rni.index.NameIndexLookupResult
Methods:
com.basistech.rni.index.INameIndex#lookup
com.basistech.rni.index.AbstractNameIndex#lookup
com.basistech.rni.index.INameIndexSession#lookup
com.basistech.rni.index.StandardNameIndex#lookupInList
Objects for creating a name and changing its data. These have been replaced with the NameBuilder class.
Constructors:
all forms of
com.basistech.rni.match.Name#Name
Methods:
com.basistech.rni.match.Name#getMaximumNumTokens
com.basistech.rni.match.Name#setMaximumNumTokens
com.basistech.rni.match.Name#setLanguage
com.basistech.rni.match.Name#setScript
com.basistech.rni.match.Name#setFieldedData
getComparator method.
Methods
com.basistech.rni.index.INameIndexFilter#getComparator
com.basistech.rni.index.internal.OracleNameIndexFilter#getComparator
com.basistech.rni.index.StandardNameIndexFilter#getComparator
Flags.
Previously, the two flags controlled by these methods could vary based on your setup. Now, they are always true, so there is no need to check them or change them.
Methods
com.basistech.rni.index.IndexStoreDataModelFlags#isStoringNamePrimary
com.basistech.rni.index.IndexStoreDataModelFlags#setStoringNamePrimary
com.basistech.rni.index.IndexStoreDataModelFlags#isStoringNameTransliterations
com.basistech.rni.index.IndexStoreDataModelFlags#setStoringNameTransliterations
Removed Constructors for Exceptions.
Constructors:
all versions of
com.basistech.rni.index.UnsupportedNameDomainException#UnsupportedNameDomainException
all versions of
com.basistech.rni.match.UnsupportedDomainPairException#UnsupportedDomainPairException()
Miscellaneous.
Methods
com.basistech.rni.index.StandardNameIndex#completeName
com.basistech.rnt.TranslationResult#getInternalExtraInfo
Constructors
com.basistech.rnt.BasicTranslatorFactory#BasicTranslatorFactory
Release 7.38.1.c67.0
August 2022
New
Neural model support: An open source package (JavaCPP) has been updated to allow the Elasticsearch plugin to use TensorFlow. (RLPNC-6336)
Complete CJK Ext A support: We now have full support of CJK Unified Ideographs Extension A. (RLPNC-6324)
Improved Spanish name matching: We have improved Spanish surname detection. (RLPNC-6294)
Improved Japanese location matching: All prefectures of Japan are now included in the override list. (RLPNC-6326)
Example: 北海道 vs. Hokkaido Prefecture
Previously: 0.7246
Now: 0.99
Bug Fixes
All date matches now return explain information. (RLPNC-6318)
The string "luiz arlos da silva bueno" is no longer in the Greek, English, and Vietnamese stop word lists. (RLPNC-6358)
The parameter reRankWeight is now ignored when reRankMode is set to replace when using Solr. (RLPNC-6279)
Yue Chinese is no longer a language option in the RNI web services client. (RLPNC-6393)
Release 7.38.0.c67.0
June 2022
New
Improved matching of Spanish names and names of Spanish origin: RNI now has a deeper understanding of Spanish surnames. For example: "JOSE JORGE RIOS TORRES" now gets a higher score when matched against "JOSE RIOS" than it does when matched against "JOSE TORRES", since "RIOS" is recognized as the primary surname. (RLPNC-6037)
The following parameters impact how Spanish names are matched:
The new Boolean parameter enableAdditionalOnomastics controls whether to assign a TokenType to allow for multiple Spanish surnames. When set to true, each token is assigned a TokenType, where the TokenType is one of: UNKNOWN, SURNAME, or SURNAME2. It is currently set to true for the spa_eng_PERSON, spa_spa_PERSON and eng:spa_eng_PERSON profiles.
The preexisting parameter surnameTokenTypeWeight now applies only to the TokenType.SURNAME tokens. Its default value was changed from 1 to 1.2.
The new parameter secondarySurnameTokenTypeWeight applies to TokenType.SURNAME2 tokens. Its default value is 0.6.
The new parameter crossSurnameMatchPenalty is applied (by simple multiplication) when a TokenType.SURNAME token is scored against a TokenType.SURNAME2 token. Its default value is 0.75.
Example: Pablo Emilio Escobar Gaviria vs. Pablo Escobar
Previously: 0.7945
Now: 0.8309
Example: Pablo Emilio Escobar Gaviria vs. Emilio Gaviria
Previously: 0.7999
Now: 0.7365
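The arithmetic behind the Spanish surname parameters can be sketched as follows. The default values (1.2, 0.6, 0.75) come from the notes above; everything else, including the class, the Kind enum, and the assumption that other tokens carry a weight of 1.0, is hypothetical and is not how RNI computes its final name score.

```java
// Hypothetical illustration only; not the RNI scoring code.
class SurnameWeightSketch {
    // Stand-in for the token types named in the release notes.
    enum Kind { UNKNOWN, SURNAME, SURNAME2 }

    static final double SURNAME_WEIGHT = 1.2;            // surnameTokenTypeWeight default
    static final double SECONDARY_SURNAME_WEIGHT = 0.6;  // secondarySurnameTokenTypeWeight default
    static final double CROSS_SURNAME_PENALTY = 0.75;    // crossSurnameMatchPenalty default

    static double weightFor(Kind kind) {
        switch (kind) {
            case SURNAME: return SURNAME_WEIGHT;
            case SURNAME2: return SECONDARY_SURNAME_WEIGHT;
            default: return 1.0; // assumption: other tokens are not reweighted
        }
    }

    // Multiplies the token score by the penalty when a primary surname
    // is scored against a secondary surname (in either order).
    static double applyCrossPenalty(double tokenScore, Kind a, Kind b) {
        boolean cross = (a == Kind.SURNAME && b == Kind.SURNAME2)
                || (a == Kind.SURNAME2 && b == Kind.SURNAME);
        return cross ? tokenScore * CROSS_SURNAME_PENALTY : tokenScore;
    }

    public static void main(String[] args) {
        System.out.println(applyCrossPenalty(0.8, Kind.SURNAME, Kind.SURNAME2));
        System.out.println(applyCrossPenalty(0.8, Kind.SURNAME, Kind.SURNAME));
    }
}
```

Under these assumptions, a primary surname matched against a secondary surname keeps 75% of its token score, which is consistent with "Emilio Gaviria" scoring lower than "Pablo Escobar" in the examples above.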
Improved matching of English organization names: We've added ordinal numbers to the override list for English organizations. (RLPNC-6225)
Example: 1st National Bank vs. First National Bank
Previously: 0.6470
Now: 0.9257
Cantonese support added: RNT can now transliterate Han into Latin characters using the Jyutping transliteration scheme for Cantonese. (RLPNC-6232)
Custom real world id support: A real world identifier associates company names, along with their associated nicknames and permutations, with an identifier. This makes it possible to match different names for an organization which have no phonetic similarity (for example, IBM vs. Big Blue). RNI is shipped with a file of real world ids. You can now create your own file of organizations with all versions of their names and real world ids. (RLPNC-6040, RLPNC-6041)
Improved Vietnamese name matching: We've expanded the Vietnamese stop word lists for PERSON and ORGANIZATION entity types. (RLPNC-5694)
Example: Chủ tịch Hồ Chí Minh vs. Hồ Chí Minh (translation: President Ho Chi Minh)
Previously: 0.83
Now: 0.99
New parameter for non-phonetic matches: We've added the parameter editDistanceScoreBias to adjust the bias for edit distance scores. Increasing the impact of edit distance scores can improve the match scores of typographical errors and other non-phonetic matches. (RLPNC-6199)
New parameter for organization names: We've added the parameter tokenizeOrganizationsWithNumbers that prevents tokenization of names with numbers within the name. When set to true (default), the number is left within the token and the name will get a higher value from the edit distance scorer. This is desirable if your data contains organization names which intersperse alphabetic and numeric characters or if your data often contains typographical errors with numerals inserted into otherwise valid tokens. When set to false, the number remains within the organization name token. (RLPNC-6200)
Support for Cantonese name transliterations: We've added Jyutping transliterations (Cantonese pronunciation of Chinese names) to the list of readings. The new parameter enableYueReadings enables Jyutping readings. It is set to false by default. To enable Jyutping readings, set enableYueReadings to true. (RLPNC-6239)
Improved Japanese-English location name matching: We've expanded the Japanese-English overrides list for location names. (RLPNC-6268)
Example: 大阪府 vs. Osaka Prefecture
Previously: 0.5853
Now: 0.99
New parameter to improve performance: We've added the parameter enableCompletedDataTermFiltering which, when set to false, will exclude part of the first pass query. This results in a large performance improvement, but may impact accuracy as some potential results may not be passed to the second pass query. The accuracy impact is small in Latin-Latin matches, but has a much larger impact in other scripts, such as Chinese. The default value is true, which is the previous behavior. (RLPNC-6133)
Java 17 support added: Java 8 and 9 support has been removed. (RLPNC-6171)
Solr 6 support deprecated: RNI-RNT no longer supports Solr 6 or earlier. (RLPNC-6213)
Bug Fixes
More consistent matching scores are now returned from RNI-RNT when using Lucene. To resolve an issue with later versions of Lucene, the internal version of Lucene has been downgraded to 7.6.0 from 8.11.1. (RLPNC-6227)
The frequencyModelTrainer now runs without errors. (RLPNC-6220)
RNI-RNT will no longer return match scores above 1.0. (RLPNC-6254)
Known Issues
When using RNI-RNT with Solr 7, the flag -XX:+IgnoreUnrecognizedVMOptions must be added to the JVM arguments.
Release 7.37.0.c66.0
March 2022
Notice
Solr 6 and earlier support is deprecated as of this release.
Java 8 and Java 9 support is deprecated as of this release.
New
Khmer support
We now support Khmer - Khmer and Khmer - English name matching. (RLPNC-5712)
Khmer stop word lists are included for person and organization types. (RLPNC-5715)
We now support Khmer - English name translation. (RLPNC-5708)
Improved language detection: We've improved language detection for languages that use Han characters (Chinese, Japanese, Korean). (RLPNC-6059)
Improved ORG matching: We've expanded the list of known organization names in our real world ID tables to improve ORG matching in Arabic (ara), Burmese (mya), Chinese (zho), French (fra), German (deu), Greek (ell), Hebrew (heb), Hungarian (hun), Italian (ita), Japanese (jpn), Korean (kor), Portuguese (por), Russian (rus), Spanish (spa), Thai (tha), and Vietnamese (vie). (RLPNC-6090)
Improved Chinese - English address matching: We expanded overrides for ethnic minority regions, particularly from Xinjiang, Tibet, and Inner Mongolia. (RLPNC-6077)
Improved time interval date matching: The timeProximityYearInterval parameter now allows any integer interval value. Previously, it would round the increments up to a 10 year interval. (RLPNC-6060)
New parameter boostWeightAtLeftEnd: We added a new parameter boostWeightAtLeftEnd to increase the weighting of the first token in a name. When setting this parameter, the boostWeightAtRightEnd parameter should not be modified. (RLPNC-6094)
Improved Chinese - English ORG matching: We added override mappings for Chinese numerals in Hanzi to Arabic numbers from zero through twenty-one. (RLPNC-6028)
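As an illustration of what a Hanzi-to-Arabic numeral mapping covers, the sketch below converts numerals such as 零 (0), 十 (10), and 二十一 (21) to integers. RNI ships these as override mappings rather than computing them; the class and method are hypothetical. The Hanzi characters are written as Unicode escapes so the source compiles regardless of file encoding.

```java
// Hypothetical illustration only; RNI uses override mappings, not this code.
class HanziNumberSketch {
    // Index in this string equals the digit value: zero, one, two, ... nine.
    private static final String DIGITS =
            "\u96f6\u4e00\u4e8c\u4e09\u56db\u4e94\u516d\u4e03\u516b\u4e5d";
    private static final char TEN = '\u5341';

    // Converts a Hanzi numeral in the range 0..99 to an int; returns -1 if not recognized.
    static int toInt(String s) {
        if (s == null || s.isEmpty()) return -1;
        int tenPos = s.indexOf(TEN);
        if (tenPos < 0) {
            // No "ten" character: only a single digit is accepted.
            return s.length() == 1 ? DIGITS.indexOf(s.charAt(0)) : -1;
        }
        if (tenPos > 1 || s.length() - tenPos > 2) return -1;
        int tens = 1; // a bare "ten" means 1 x 10
        if (tenPos == 1) {
            tens = DIGITS.indexOf(s.charAt(0));
            if (tens < 1) return -1;
        }
        int ones = 0;
        if (tenPos + 1 < s.length()) {
            ones = DIGITS.indexOf(s.charAt(tenPos + 1));
            if (ones < 1) return -1;
        }
        return tens * 10 + ones;
    }

    public static void main(String[] args) {
        System.out.println(toInt("\u4e8c\u5341\u4e00")); // the numeral for twenty-one
    }
}
```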
Bug Fixes
Pairwise match now works with all languages that have limited language support. Previously, an error was returned for unidentified languages. (RLPNC-6100)
Java-only distributions now contain the model files for Thai, Hungarian, and Greek. (RLPNC-6111)
Third-party component updates
This release includes the following third-party component changes:
Component | Old Version | New Version |
---|---|---|
Lucene | 7.6.0 | 8.11.1 |
Apache Commons IO | 2.7 | 2.11.0 |
ICU4J | 59.1 | 70.1 |
fastutil | 8.4.0 | 8.5.6 |
SLF4J | 1.7.28 | 1.7.33 |
SnakeYAML | 1.26 | 1.30 |
commons-lang | 2.6 and 3.10.0 | 3.12.0 |
Component | Version |
---|---|
slf4j-log4j | 1.7.28 |
Release 7.36.1.c65.0
January 2022
New
Improved address searching: The set of address match results returned are now consistent. (RLPNC-6057)
Neural model for Katakana: To enable the neural-based phonetic name matching model, set enableSeq2SeqTokenScorer to true in the jpn_eng profile in the parameters_profiles.yaml file. Previously, it was set in the internal_param_defs.yaml file. (RLPNC-6068)
log4j update: Updated log4j to the 2.17.1 release. (RLPNC-6071)
Bug Fixes
RNI-RNT no longer emits a NullPointerException in certain cases of address matching in which an address contains multiple languages. (RLPNC-6083)
Third-party component updates
This release includes the following third-party component changes:
Component | New Version |
---|---|
log4j | 2.17.1 |
Release 7.36.0.c65.0
November 2021
Important
If you have any customizations for address stop words or overrides from previous releases, the file names must be renamed to the new file naming convention. The file names now include three letter language codes.
New
Chinese address matching: We now support Chinese-Chinese and Chinese-English address matching. (RLPNC-5822)
Language-specific address override files: Address override files are now language-specific, and the file name must include the language codes. (RLPNC-6032)
Example: English-English state overrides
Previous file name:
BT_ROOT/rlpnc/data/addresses/ref/override/state.txt
New file name:
BT_ROOT/rlpnc/data/addresses/ref/override/eng_eng_state.txt
Example: Chinese-English city overrides
Previous file name:
BT_ROOT/rlpnc/data/addresses/ref/override/city.txt
New file name:
BT_ROOT/rlpnc/data/addresses/ref/override/zho_eng_city.txt
Language-specific address stop word files: Stop word files for address matching on text fields (house, road, city, state, country) are now language-specific, and the file name must include the language code (either eng or zho). (RLPNC-6031)
Example: English city stop pattern
Previous file name:
BT_ROOT/rlpnc/data/addresses/ref/stopwords/stopregexes_city.txt
New file name:
BT_ROOT/rlpnc/data/addresses/ref/stopwords/stopregexes_eng_ADDRESS_FIELD__CITY.txt
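If you maintain many customized override or stop word files, the renaming can be scripted. The helper below is a sketch that reproduces the naming shown in the examples above. The class and method names are hypothetical, and the stop word pattern is generalized from the single city example, so verify the resulting names for other fields against your installation before renaming anything.

```java
import java.util.Locale;

// Hypothetical helper; derives new-style names from the examples in these notes.
class AddressFileNameSketch {
    // Override files gain a language-pair prefix: state.txt -> eng_eng_state.txt
    static String overrideFileName(String oldName, String lang1, String lang2) {
        return lang1 + "_" + lang2 + "_" + oldName;
    }

    // Stop word files: stopregexes_city.txt -> stopregexes_eng_ADDRESS_FIELD__CITY.txt
    // Assumption: the field part is upper-cased for every field, as in the city example.
    static String stopwordFileName(String oldName, String lang) {
        int underscore = oldName.indexOf('_');
        int dot = oldName.lastIndexOf('.');
        if (underscore < 0 || dot < underscore) {
            throw new IllegalArgumentException("Unexpected stop word file name: " + oldName);
        }
        String prefix = oldName.substring(0, underscore);
        String field = oldName.substring(underscore + 1, dot);
        String extension = oldName.substring(dot);
        return prefix + "_" + lang + "_ADDRESS_FIELD__" + field.toUpperCase(Locale.ROOT) + extension;
    }

    public static void main(String[] args) {
        System.out.println(overrideFileName("state.txt", "eng", "eng"));
        System.out.println(stopwordFileName("stopregexes_city.txt", "eng"));
    }
}
```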
Basic support for all languages: RNI can now index and match names in any language. Languages which previously would have returned an "unsupported language" error now return a match score. The score is either 1 for a perfect match, or a value based on edit distance. Set the parameter allLanguageSupport to false for behavior that is backward compatible with previous versions. (RLPNC-5979)
New parameter to improve recall for ORG matching: You can now improve recall in RNI's first-pass when using real world IDs by increasing the value of the parameter nameRealWorldIdQueryBoost. (RLPNC-5938)
Improved ORG matching: We added real world ID tables of organization names to improve ORG matching in the following languages: Thai (tha), Greek (ell), Hebrew (heb), Burmese (mya), German (deu), French (fra), Hungarian (hun), Italian (ita), Portuguese (por), Spanish (spa), and Vietnamese (vie). (RLPNC-5986)
Example: "International Astronomical Union" vs. "האיגוד האסטרונומי הבינלאומי"
Previously: score 0.70
Now: score 0.98
Neural model for Katakana: We've added a neural-based phonetic matching model to improve Katakana-Latin name matching. To enable the model, set enableSeq2SeqTokenScorer to true in the internal_param_defs.yaml file. (RLPNC-5945)
Improved Hebrew name matching: We've added a rule-based vocalization checker for the statistical-model vocalizer to improve Hebrew-Hebrew and Hebrew-English name matching. (RLPNC-5990)
Time-distance capable date matching parameter: We have added an alternative date matching solution that aims to make the definition of closeness for dates more flexible and adjustable. This new algorithm computes the chronological distance between dates in years and uses a timeProximityYearInterval parameter to determine matching candidates and apply an appropriate score penalty. To enable this feature, set alternativeTimeProximityMatch to true. (RLPNC-5948)
Example: "01/06/1982" vs. "31/12/1980" [timeProximityYearInterval = 10 years]
Previously: score 0.47
Now: score 0.88
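The chronological distance used by the time-distance date matching above can be approximated with java.time. This sketch only computes the distance between the two example dates and checks it against an interval; the class and method names are hypothetical, the day/month/year reading of the example dates and the 365.25-day year are assumptions, and the score penalty RNI applies is not modeled.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;

// Hypothetical illustration only; RNI's scoring and penalty are not modeled.
class TimeProximitySketch {
    // Assumption: the example dates are written day/month/year.
    private static final DateTimeFormatter DMY = DateTimeFormatter.ofPattern("dd/MM/yyyy");

    static long daysApart(String a, String b) {
        LocalDate first = LocalDate.parse(a, DMY);
        LocalDate second = LocalDate.parse(b, DMY);
        return Math.abs(ChronoUnit.DAYS.between(first, second));
    }

    // Distance in years, using an average year length of 365.25 days.
    static double yearsApart(String a, String b) {
        return daysApart(a, b) / 365.25;
    }

    static boolean withinInterval(String a, String b, int intervalYears) {
        return yearsApart(a, b) <= intervalYears;
    }

    public static void main(String[] args) {
        System.out.println(daysApart("01/06/1982", "31/12/1980") + " days apart");
        System.out.println("within 10 years: " + withinInterval("01/06/1982", "31/12/1980", 10));
    }
}
```

The two example dates are 517 days apart, well inside a 10-year interval, which fits the higher score reported for the new algorithm.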
Bug Fixes
Hebrew Folk transliteration has been improved, especially for the letters vav and yod. (RLPNC-5916)
The tokenization of names written in the Burmese script has been improved. This change applies only to RNI and mainly affects names of non-Burmese origin. (RLPNC-5957)
Burmese-English transliteration has been improved by revising the Folk and MLCTS transliteration schemes. (RLPNC-5950)
Third-party component updates
This release includes the following third-party component changes:
Component | Old Version | New Version |
---|---|---|
OSHI Core | 3.4.2 | 3.4.4 |
Java Native Access Platform | 4.3.0 | 4.5.0 |
Java Native Access | 4.3.0 | 4.5.0 |
ThreeTen Backport | 1.3.3 | 1.3.6 |
JavaCPP | 1.5.3 | 1.5.4 |
Component | Version |
---|---|
Tensorflow core API | 0.2.0 |
Protobuf java | 3.12.2 |
Tensorflow NdArray | 0.2.0 |
Component | Version |
---|---|
DeepLearning4j Core | 1.0.0-beta7 |
DeepLearning4j TSNE | 1.0.0-beta7 |
Nearestneighbor Core | 1.0.0-beta7 |
DeepLearning4j Datasets | 1.0.0-beta7 |
DeepLearning4j Common | 1.0.0-beta7 |
DeepLearning4j DataVec Iterators | 1.0.0-beta7 |
DeepLearning4j Modelimport | 1.0.0-beta7 |
JavaCPP Presets Platform For HDF5 | 1.12.0-1.5.3 |
DeepLearning4j NN | 1.0.0-beta7 |
DeepLearning4j Utility Iterators | 1.0.0-beta7 |
Concurrent | 1.3.4 |
ND4J Common | 1.0.0-beta7 |
ND4J Guava | 1.0.0-beta7 |
ND4J Protobuf | 1.0.0-beta7 |
ND4J Jackson | 1.0.0-beta7 |
OSHI JSON | 3.4.2 |
Apache Commons Math | 3.5 |
Apache Commons Compress | 1.18.0 |
ND4J API | 1.0.0-beta7 |
Byte Units | 0.9.4 |
FlatBuffers Java API | 1.10.0 |
Gson | 2.8 |
Apache Commons Net | 3.1 |
Neoitertools | 1.0.0 |
DataVec API | 1.0.0-beta7 |
Apache FreeMarker | 2.3.23 |
Stream Library | 2.9.8 |
OpenCSV | 2.3 |
T Digest | 3.2 |
ND4J Native | 1.0.0-beta7 |
JavaCPP Presets For OpenBLAS | 0.3.9-1-1.5.3 |
JavaCPP Presets For MKL | 2020.1-1.5.3 |
ND4J Native API | 1.0.0-beta7 |
Release 7.36.0.c65.0
September 2021
Bug Fixes
RNI-RNT no longer crashes when performing Russian-English name matching and the Russian name contains white-space only tokens. (RLPNC-5939)
Third-party component updates
No changes to third-party components.
Release 7.35.1.c65.0
September 2021
New
ARM64 support: We now support ARM64 processors. (RLPNC-5912)
Burmese transliteration: We added a Basis Technology-created Folk transliteration scheme for Burmese name matching that is similar to how Burmese names are commonly transliterated to English. (RLPNC-5892)
Improved address matching: We've modified the field weight values to provide more accurate address match scores. Weightings were determined by evaluating US and UK address data. (RLPNC-5893)
Example: "85 Court Road Newton Ferrers, Plymouth PL8 1DE1B Devon, England UK" vs. "85 Court Road Newton Ferrers PL8 1DE UK"
Previously: Score: 0.73
Now: Score: 0.81
Improved address overrides: Address overrides are now applied to groups of related address fields, instead of just individual fields. Overrides apply when matching any two fields from the same group. (RLPNC-5899)
New date parameters: We've added a new parameter, dateOrdering, which sets the default date representation. It must be one of three valid values: YMD, DMY, or MDY. The default value is MDY. (RLPNC-5904)
Improved Hebrew transliteration: The Hebrew character ח used to be transliterated as “h” in some cases and “kh” in others (if it was followed by a geresh). It is now transliterated as “ch” when not followed by a geresh. The Hebrew character כ used to be transliterated as “h” in some cases and “k” in others (if it has a dagesh). Now, it is transliterated to “ch” in the cases when it used to be transliterated to “h”. (RLPNC-5928)
Example: נחמן
Previously: Nahman
Now: Nachman
Example: מיכל
Previously: Mihal
Now: Michal
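The effect of the dateOrdering parameter described above is easiest to see on an ambiguous date such as 03/04/2021. The sketch below is not RNI's parser: the class, enum, and method are hypothetical, and it handles only three numeric fields separated by "/", "-", or ".".

```java
import java.time.LocalDate;

// Hypothetical illustration only; RNI's date parser supports far more formats.
class DateOrderingSketch {
    enum Ordering { YMD, DMY, MDY }

    // Interprets three numeric fields according to the given ordering.
    static LocalDate parse(String text, Ordering ordering) {
        String[] parts = text.trim().split("[/\\-.]");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected three numeric fields: " + text);
        }
        int a = Integer.parseInt(parts[0]);
        int b = Integer.parseInt(parts[1]);
        int c = Integer.parseInt(parts[2]);
        switch (ordering) {
            case YMD: return LocalDate.of(a, b, c);
            case DMY: return LocalDate.of(c, b, a);
            default:  return LocalDate.of(c, a, b); // MDY
        }
    }

    public static void main(String[] args) {
        System.out.println("MDY: " + parse("03/04/2021", Ordering.MDY));
        System.out.println("DMY: " + parse("03/04/2021", Ordering.DMY));
    }
}
```

With MDY the string is read as March 4, 2021; with DMY it is read as April 3, 2021.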
Bug fixes
EntityTypes in query: Queries now filter by entity type. Note that indexed names without a specified entity type will only match query names that also don't specify an entity type. (RLPNC-5896)
Example: Create an index with one document: “RIDGEWAY JOHN” as PERSON. Query the index with “Ridgeway School” as ORGANIZATION in Solr.
Previously: Returns "RIDGEWAY JOHN"
Now: No results returned
Release 7.35.1.c65.0
August 2021
New
Improved Hebrew-English name matching:
We've improved the statistical model. (RLPNC-5842)
We changed the default transliteration scheme to FOLK from ISO259-2-1994, which improves matching scores as FOLK more closely matches how people transliterate Hebrew names. (RLPNC-5844)
Example: בִּנְיָמִין גַּנְץ vs. Benjamin Gantz
Previously: Score: 0.8738
Now: Score: 0.9709
We expanded the token overrides for person entity types. (RLPNC-5845)
Example: אלכס vs. Alexander
Previously: Score: 0.8675
Now: Score: 0.9361
We added word embeddings for Hebrew organizations. (RLPNC-5837)
Example: ארגון המזון והחקלאות vs. Food and Agriculture Organization
Previously: Score: 0.6002
Now: Score: 0.7309
Improved Hebrew-Hebrew name matching: We expanded the token overrides for person entity types. (RLPNC-5891)
Example: סולומונתאס vs. סולונאס
Previously: Score: 0.5309
Now: Score: 0.8894
Improved English-English name matching: We added the token override pair Alex/Aleksandar. (RLPNC-5871)
Example: Alex vs. Aleksandar
Previously: Score: 0.4106
Now: Score: 0.8894
Improved matching for identifiers: We improved matching and added support for three new subtypes: IDENTIFIER_DRIVERS_LICENSE, IDENTIFIER_LICENSE_PLATE, IDENTIFIER_NATIONAL_ID_NUM, along with IDENTIFIER_GENERIC. (RLPNC-5852)
Example: NH123456789DL vs. NH123456789DN (as IDENTIFIER_DRIVERS_LICENSE entity type)
Previously: Score: 0.6940
Now: Score: 0.9689
Improved Japanese Segmentation: We've expanded the segmentation dictionary to improve Japanese name segmentation. (RLPNC-5835)
Example: ミロシェヴィッチスロボダン
Previously: [ミロシェヴィッチスロボ ダン]
Now: [ミロシェヴィッチ][ スロボ ダン]
Improved address matching: We've expanded the override tables for UK, U.S., and Canadian addresses. (RLPNC-5886)
Example: houseNumber<47>road<Albert Street>city<Aberdeen>stateDistrict<Aberdeenshire>postCode<AB25 1XT> vs. houseNumber<47>road<Albert Street>city<Aberdeen>stateDistrict<ABD>postCode<AB25 1XT>
Previously: Score: 0.86
Now: Score: 0.96
New API endpoint: We added com.basistech.names.parameters.ParameterProfileUtils displayParameterUniverses to list all named parameter universes registered in the system. (RLPNC-5851)
Bug Fixes
Overrides for alphanumeric address fields (houseNumber, unit, poBox, postCode) are now being applied. (RLPNC-5863)
Example: “3710 W Martin Luther King Blvd STE #121” vs. “3710 W Martin Luther King Blvd Suite #121”
Previously: Score: 0.833
Now: Score: 0.95
Hebrew tokens containing diacritics are now identified in the override table. (RLPNC-5882)
Example: אֲבִי vs. Abigail
Previously: Score: 0.5299
Now: Score: 0.9361
Release 7.34.0.c64.1
May 2021
New
Added support for Burmese-English name translation. (RLPNC-5662)
Example: မင်း အောင် လှိုင် ⟹ Maang Aaung Lhuing
Added support for Burmese-Burmese and Burmese-English name matching. (RLPNC-5660)
Added support for Hebrew-Hebrew and Hebrew-English name matching. (RLPNC-5339)
Added support for Vietnamese-Vietnamese and Vietnamese-English name matching. (RLPNC-5687)
Improved address matching by improving handling of postal codes. (RLPNC-5639)
Example: houseNumber<123>road<Clifton St>city<Cambridge>state<MA>postCode<02140 1234> vs. houseNumber<123>road<Clifton St>city<Cambridge>state<MA>postCode<02140-1234>
Previously: Score: 0.89
Now: Score: 1.0
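A postal code pair like the one above differs only in its separator. As a rough picture of why such values can be treated as identical, the following sketch strips whitespace and hyphens before comparing. It is a hypothetical normalization written for illustration, not the logic RNI uses for postal codes.

```java
import java.util.Locale;

// Hypothetical illustration only; not RNI's postal code handling.
class PostCodeSketch {
    // Removes spaces and hyphens and upper-cases the remainder.
    static String normalize(String postCode) {
        return postCode.replaceAll("[\\s\\-]+", "").toUpperCase(Locale.ROOT);
    }

    static boolean sameAfterNormalization(String a, String b) {
        return normalize(a).equals(normalize(b));
    }

    public static void main(String[] args) {
        System.out.println(sameAfterNormalization("02140 1234", "02140-1234"));
    }
}
```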
Improved address matching by expanding override tables for UK and CA addresses. (RLPNC-5607)
Example: houseNumber<100>road<Main Ave>city<Shellbrook>state<Saskatchewan>postCode<S0J 2E0> vs houseNumber<100>road<Main Ave>city<Shellbrook>state<Sask>postCode<S0J 2E0>
Previously: Score: 0.88
Now: Score: 0.96
Improved Chinese-English name matching by allowing English translations from the list of translations of the Chinese name to be considered when matching a name pair. (RLPNC-5643)
Example: 汤姆 vs. Tom
Previously: Score: 0.37
Now: Score: 0.99
Improved English-English name matching for ORGANIZATION entity type by expanding the overrides list with numbers and their written forms from 1 to 21. (RLPNC-5644)
Example: Channel One Russia vs. Channel 1 Russia
Previously: Score: 0.54
Now: Score: 0.96
Improved name matching for ORGANIZATION entity type by adding new frequency models in English, Chinese, Arabic, Japanese and Russian. (RLPNC-5416)
Updated the English frequency model for PERSON entity type by adding birth names from 1920-2019 to the existing model. (RLPNC-5592)
Improved date matching by expanding parsing support for more date formats. (RLPNC-5585)
Example: 2000-12-99 vs. 2000-12-DD
Previously: InvalidArgumentException: Invalid value for day: 2000-12-99.
Now: Score: 0.91
Improved name deduplication by adding support for name overrides so that for example nickname “Mike” is included in the same cluster with “Michael”. (RLPNC-5600)
Upgraded the RNI Solr plugin to support Solr 8.8.1 (RLPNC-5652)
Improved Solr query performance for multi-valued RNI address and name fields by adding support for promising term filtering. Promising term filtering uses knowledge of document frequency at query time to prevent slow queries due to common terms. (RLPNC-5649)
Improved Solr query performance by disabling phrase queries. To enable phrase queries for multi-valued fields, set the useSolrPhraseQueries parameter to true. (RLPNC-5789)
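The promising term filtering mentioned above relies on document frequency to avoid expensive clauses for very common terms. The sketch below shows one simple way such a filter could work: the class, the method, and the maxFraction cutoff are all assumptions for illustration, and the actual criteria used by the RNI Solr plugin are not documented in these notes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; the plugin's real filtering rules are not shown.
class PromisingTermSketch {
    // Keeps terms that occur in at most maxFraction of all documents.
    static List<String> promisingTerms(List<String> queryTerms, Map<String, Integer> docFreq,
                                       int totalDocs, double maxFraction) {
        List<String> kept = new ArrayList<>();
        for (String term : queryTerms) {
            int df = docFreq.getOrDefault(term, 0);
            if (totalDocs > 0 && (double) df / totalDocs <= maxFraction) {
                kept.add(term);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, Integer> docFreq = new HashMap<>();
        docFreq.put("john", 900);    // very common term
        docFreq.put("ridgeway", 3);  // rare, discriminating term
        System.out.println(promisingTerms(Arrays.asList("john", "ridgeway"), docFreq, 1000, 0.5));
    }
}
```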
Bug Fixes
Fixed a bug where the Solr parameter minExactCount was not being honored in Solr queries that use RNI type fields. Being able to set this parameter can significantly improve query performance. (RLPNC-5792)
Third-Party component updates
Component | Old Version | New Version | License |
---|---|---|---|
Apache Lucene Solr | 8.5.1 | 8.8.1 | Apache |
Apache Lucene Core | 8.5.1 | 8.8.1 | Apache |
Apache Commons Lang | 3.9.1 | 3.10.0 | Apache |
Apache Zookeeper | 3.5.5 | 3.6.2 | Apache |
Jetty HTTP2 Client | 9.4.24.v20191120 | 9.4.34.v20201102 | Apache, EPL |
Jetty HTTP2 Common | 9.4.24.v20191120 | 9.4.34.v20201102 | Apache, EPL |
Jetty HTTP2 HTTP Client Transport | 9.4.24.v20191120 | 9.4.34.v20201102 | Apache, EPL |
Jetty Asynchronous HTTP Client | 9.4.24.v20191120 | 9.4.34.v20201102 | Apache, EPL |
Jetty Http Utility | 9.4.24.v20191120 | 9.4.34.v20201102 | Apache, EPL |
Jetty Utilities | 9.4.24.v20191120 | 9.4.34.v20201102 | Apache, EPL |
Jetty IO Utility | 9.4.24.v20191120 | 9.4.34.v20201102 | Apache, EPL |
Metrics Integration with JMX | 4.1.2 | 4.1.5 | Apache |
Component | Version |
---|---|
Apache Commons FileUpload | 1.3.3 |
Restlet | 2.4.0 |
Release 7.33.1.c63.0
January 2021
New Features
Added support for address matching in the RNI Solr plugin 6.6, 7.6 and 8.5. (RLPNC-5264)
Added support for date matching in the RNI Solr plugin 7.6 and 8.5. (RLPNC-5480)
Improved Japanese-English and Japanese-Japanese name matching by expanding the Japanese stop word list for ORGANIZATION entity type. (RLPNC-5466)
Example: コダック合同会社 vs. Kodak Limited
Score Before: 0.7258
Score After: 0.98
Improved Russian-English and Russian-Russian name matching by expanding the Russian stop word list for ORGANIZATION entity type. (RLPNC-5467)
Example: Балтийский федеральный университет имени Иммануила Канта vs. Immanuel Kant Baltic Federal University
Score Before: 0.7140
Score After: 0.7740
Improved character normalization for all supported languages. (RLPNC-5514)
Improved Russian-English and Russian-Russian name matching by adding support for Russian organizations in the entity resolution engine. (RLPNC-5564)
Example: Ура́льские авиали́нии vs. Ural Airlines
Score Before: 0.6648
Score After: 0.98
Improved Korean-English and Korean-Korean name matching by adding support for Korean organizations in the entity resolution engine. (RLPNC-5564)
Example: 현대자동차 vs. Hyundai Motor Company
Score Before: 0.6921
Score After: 0.98
Improved date matching by adding support for dates in yy/mm/dd format. (RLPNC-5562)
Example: 76/01/22 vs 01/22/1976
Before: IllegalArgumentException: Invalid value for month: 76
Score After: 0.95
Release 7.33.0.c62.2
New Features
Improved the accuracy of name matching for ORGANIZATION entity type by integrating name completion with an internal entity resolution engine. Currently it has support for English, Arabic, Chinese and Japanese organizations. (RLPNC-5454)
Example: ソニー株式会社 vs. Sony Corporation
Score Before: 0.7793
Score After: 0.98
Added three new parameters, doQueryRealWorldIds, useRealWorldIds, and realWorldIdScore, which allow you to control the entity resolution engine integrated as part of the name completion process. doQueryRealWorldIds allows you to disable the query clauses that look for matching real-world IDs (they are enabled by default). useRealWorldIds can be set to false per-profile to disable matching real-world IDs for specific pairs of languages or entity types. realWorldIdScore controls the match score awarded when two names match due to having matching real-world IDs. (RLPNC-5417)
Added support for debug information when matching addresses. AddressMatchResult.getAddressFieldPairResults now returns a list of AddressFieldPairResults that describe how each pair of address fields was scored. (RLPNC-5511)
Improved segmentation of Japanese names for PERSON entity type. (RLPNC-5520)
Example: スズキタロウ
Before: スズ キタロウ (suzu kitarou)
After: スズキ タロウ (suzuki tarou)
Expanded the Spanish-Spanish token overrides for PERSON entity type. (RLPNC-5539)
Example: Francisco vs. Paco
Score Before: 0.7786
Score After: 0.8723
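The effect of useRealWorldIds and realWorldIdScore in this release can be sketched as a short-circuit in front of the ordinary scorer. This is an illustration only: the lookup table `REAL_WORLD_IDS`, its placeholder IDs, and the function `score_names` are hypothetical, and the 0.98 value simply mirrors the example scores above rather than a documented default.

```python
from typing import Callable, Optional

# Hypothetical lookup from a name to a real-world ID; the IDs are invented placeholders.
REAL_WORLD_IDS = {
    "ソニー株式会社": "ORG-0001",
    "Sony Corporation": "ORG-0001",
    "Kodak Limited": "ORG-0002",
}


def score_names(
    left: str,
    right: str,
    fallback_scorer: Callable[[str, str], float],
    use_real_world_ids: bool = True,
    real_world_id_score: float = 0.98,
) -> float:
    """Award real_world_id_score when both names resolve to the same ID."""
    if use_real_world_ids:
        left_id: Optional[str] = REAL_WORLD_IDS.get(left)
        right_id: Optional[str] = REAL_WORLD_IDS.get(right)
        if left_id is not None and left_id == right_id:
            return real_world_id_score
    # No shared ID, or the feature is disabled: use the ordinary scorer.
    return fallback_scorer(left, right)
```

Passing `use_real_world_ids=False` models a profile that opts out, so the pair falls back to whatever the regular scorer returns.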
Release 7.32.3.c62.2
New Features
Added three new parameters, nameBigramQueryBoost, nameDoubleMetaphoneQueryBoost, and nameInitialQueryBoost, which allow you to tweak the weight of their respective query clause boosts in Lucene in order to improve first-pass recall. (RLPNC-5503)
Bug Fixes
Fixed a bug where Arabic-English name translation was resulting in a NullPointerException for certain names. (RLPNC-5509)
Example: ﻤﺣﻤﺩ
Before: NullPointerException
Release 7.33.2.c63.0
January 2021
Bug Fixes
Fixed a bug when parsing dates in the format dd/mm/yyyy where dd > 12. (RLPNC-5588)
Example: 30-12-2016 vs. 12-30-2016
Before: IllegalArgumentException: Invalid value for day: 2016
Score After: 1.0
Release 7.32.2.c62.2
New Features
Upgraded rws-names web services to support Apache Tomcat 8.5.56 (RLPNC-5493)
Release 7.32.2.c62.2
New Features
Enhanced semantic matching of tokens in organization names through use of word embeddings in Spanish, Arabic and Korean. Note: This drastically increases the size of the SDK package. To reduce the size, the embeddings dictionaries in rlpnc/data/tvec/filtered-vectors can be removed as long as the corresponding language pairs in parameter_profiles.yaml have useEmbeddings set to false. (RLPNC-5449)
Example for Spanish-English ORG name matching: Astilleros y Talleres del Noroeste vs. Shipyards and Workshops of the Northwest
Score Before: 0.4558
Score After: 0.8838
Example for Korean-English ORG name matching: 아시아나 항공 vs. Asiana Airlines
Score Before: 0.7208
Score After: 0.8672
Example for Arabic-English ORG name matching: الاتحاد العالمي للحفاظ على الطبيعة والمصادر الطبيعية vs. International Union for Conservation of Nature and Natural Resources
Score Before: 0.3263
Score After: 0.7106
Upgraded the native libraries of the native Linux-compatible release of RNI. We are now using CentOS 7 to build these libraries, as CentOS 6 will reach EOL in November 2020. The new BT_BUILD value for the Linux package is amd64-glibc217-gcc48. (RLPNC-5453)
Improved organization name matching by expanding the English stop word list for organization entity type. (RLPNC-5458)
Example for English-English name matching: SUNY Canton vs. State University of New York at Canton
Score Before: 0.8340
Score After: 0.8713
Added a new parameter, nameGluedQueryBoost, which allows you to adjust the boost on the query term that looks for an exact match of the normalized name with spaces removed. The default value for nameGluedQueryBoost is 1.0. (RLPNC-5445)
Example: in a 10-million name Lucene index which includes the names "PAUL MARTINI" and "JOHN LARKIN", query for each name with artificially "re-tokenized" names such as "PAU LMAR TIN I" and "J O H N LARKIN":
Before: "PAUL MARTINI" and "JOHN LARKIN" result in a first-pass miss
After: "PAUL MARTINI" and "JOHN LARKIN" appear in the top few results when querying
Improved address matching by improving the normalization of the postal code address field. (RLPNC-4967)
Example: road<71-75 Shelton street>city<Covent garden>postCode<WC2H9JQ> vs. road<71-75, SHELTON STREET>city<LONDON>postCode<WC2H 9JQ>
Score Before: 0.5364
Score After: 0.7666
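The "normalized name with spaces removed" that nameGluedQueryBoost targets can be pictured as a whitespace-free comparison key. The sketch below is an assumption about what such a key might look like, not RNI's normalization: the function `glued_key` is hypothetical, and the choice of NFKC plus case folding is ours. The postal code example above also differs only in spacing, so the same key happens to align it.

```python
import unicodedata


def glued_key(text: str) -> str:
    """Apply NFKC, case-fold, and drop all whitespace to build a comparison key."""
    normalized = unicodedata.normalize("NFKC", text).casefold()
    # str.split() with no arguments splits on any run of whitespace.
    return "".join(normalized.split())
```

Under this key, "PAU LMAR TIN I" and "PAUL MARTINI" collapse to the same string, which is why a query term on the glued form can recover names whose token boundaries were damaged.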
Release 7.32.0.c62.2
New Features
Added support for Hebrew-English name translation. The default transliteration scheme is FOLK; the ISO259-2-1994 and ICU transliteration schemes are also supported, as is a Hebrew-English statistical model intended for translating names of foreign origin. (RLPNC-5446, RLPNC-5340, RLPNC-5342, RLPNC-5432)
Hebrew to English translation with FOLK transliteration scheme example: רוזלינד פרנקלין ⟹ Ruzlind Prenklin
Hebrew to English translation with ISO259-2-1994 transliteration scheme example: רוזלינד פרנקלין ⟹ Rẇzliynd Prnqliyn
Hebrew to English translation with ICU transliteration scheme example: רוזלינד פרנקלין ⟹ Rẇzĕliynĕd Pĕrĕnĕqĕliyn
Hebrew to English translation with statistical model example: רוזלינד פרנקלין ⟹ Rosalind Franklin
Added support for Hebrew vocalization via a dictionary lookup and statistical model. (RLPNC-5389, RLPNC-5390, RLPNC-5388)
Example: בִּנְיָמִין נְתַנְיָהוּ ⟹ בנימין נתניהו
Improved Arabic-English and Arabic-Arabic name matching by expanding the stopword list for person and organization entity types. (RLPNC-5248)
Example for PERSON entity type: محمد vs. نبي محمد
Score Before: 0.4186
Score After: 0.99
Example for ORGANIZATION entity type: البنك الأهلي التجاري vs. ال البنك الأهلي التجاري
Score Before: 0.6512
Score After: 0.99
Bug Fixes
Fixed a bug where Arabic-English translation no longer returned translations from the statistical model as it previously did, which also affected Arabic-English name matching. (RLPNC-5413)
Example: Blake Lively vs. بليك ليفلي, the Arabic name used to be transliterated as "blayk lifali", whereas now it's actually translated as "blake lively"
Score Before: 0.6812
Score After: 0.99
Release 7.31.1.c62.2
New Features
Upgraded rws-names web services to support Apache Tomcat 8.5.55 (RLPNC-5403)
Expanded the Latin gender model with French, German, Italian, Portuguese, and Spanish names, so it's able to detect the gender of more names from the mentioned origins. (RLPNC-5334)
Release 7.31.0.c62.0
New Features
Upgraded the RNI Solr plugin to support Solr 8.5.1 (RLPNC-5291)
Improved Arabic-English and Arabic-Arabic name matching by improving segmentation of Arabic names and adding an Arabic/English statistical model, gender identification, language model, edit distance scoring, as well as adding support for initials and initialisms. (RLPNC-5269, RLPNC-5244, RLPNC-5256, RLPNC-5366, RLPNC-5255, RLPNC-5254)
Improved address matching by adding support for cross-field matching of addresses, multi-token overrides, normalization process of address fields and expanded address overrides. (RLPNC-4969, RLPNC-5383, RLPNC-5386, RLPNC-5378)
Added support for parsing addresses using the jpostal[4] library. (RLPNC-4571)
Expanded the Japanese-English token overrides for organizations. (RLPNC-5384)
Bug Fixes
Made Arabic normalization robust to incorrect language identification. (RLPNC-5282)
Release 7.30.5.c62.2
February 2020
Bug Fixes
Fixed a bug in address matching where the country field in the address was not being matched correctly to values in the index. Address matching queries using a country code now work correctly. (SUPPO-1459)
Example: Create an index with a primary name, birth date, and 2 addresses. The only field in the address is country. The country values are "Spain" and "Mexico". Search for an address with the value of "Spain".
Previously: No documents are returned.
Now: One document is returned, with a match score of 1.0.
Release 7.30.4.c62.2
February 2020
New Features
For Japanese, added the ability to remove stop words from within parentheses from the name match. (RLPNC-5296)
Previously: (株) would not be removed even if 株 is a stop word.
Now: (株) is removed if 株 is a stop word.
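The parenthesized stop word behavior described above can be sketched with a regular expression. This is a hypothetical illustration, not the RNI implementation: `STOP_WORDS`, `PAREN_GROUP`, and `strip_parenthesized_stop_words` are invented names, the stop word list contains only the 株 example from this note, and handling fullwidth parentheses is an assumption.

```python
import re

# Hypothetical stop word list; 株 comes from the example above.
STOP_WORDS = {"株"}

# Matches one parenthesized group, with ASCII or fullwidth parentheses.
PAREN_GROUP = re.compile(r"[(（]([^()（）]*)[)）]")


def strip_parenthesized_stop_words(name: str) -> str:
    """Remove "(X)" from the name whenever X is a stop word; keep other groups."""

    def replace(match):
        inner = match.group(1).strip()
        # Drop the whole group only when its content is a stop word.
        return "" if inner in STOP_WORDS else match.group(0)

    return PAREN_GROUP.sub(replace, name).strip()
```

Groups whose content is not a stop word are left untouched, so only the organizational marker is removed from the name before matching.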
Release 7.30.3.c62.2
January 2020
Bug Fixes
The match score when matching two Arabic names no longer depends on the order of the names. (RLPNC-5258)
Previously:
{"name1": {"text":"مضاوي مشاري سعود ال فرحان ال سعود", "language":"ara", "entityType": "PERSON"}, "name2": {"text": "الاميرة مضاوي بنت مشاري بن سعو", "language":"ara", "entityType": "PERSON"}} {"score":0.56636584}
{"name1": {"text":"الاميرة مضاوي بنت مشاري بن سعو", "language":"ara", "entityType": "PERSON"}, "name2": {"text": "مضاوي مشاري سعود ال فرحان ال سعود", "language":"ara", "entityType": "PERSON"}} {"score":0.81120528}
Now: Both searches will return the same score
Release 7.30.2.c62.2
January 2020
New Features
For Chinese, added the ability to remove stop words from within parentheses from the name match. (RLPNC-5096)
Previously: (株) would not be removed even if 株 is a stop word.
Now: (株) is removed if 株 is a stop word.
Added a parameter, enableDynamicConfigurationEndpoints, to control the dynamic configuration endpoints in the RNI Elasticsearch plugin. They are disabled by default. To turn on the endpoints, set the parameter to true in the any: profile of the parameter_profiles.yaml file. This may slow your system down considerably. (RLPNC-5225)
Release 7.30.1.c62.0
December 2019
New Features
When matching a name to a list of names, RNI now returns the search name and the name from the index that matched. We've added two fields to the MatchResult section of the response. (RLPNC-5215)
leftName: the search name from the query
rightName: the name returned from the index
Bug Fixes
Removed unnecessary call to LanguageOfOriginIdentifier during a name's deserialization. There are no user-visible changes from this fix. (RLPNC-5228)
Release 7.30.0.c62.0
November 2019
New Features
We've added a neural-based phonetic matching model to improve Katakana-Latin name matching. To enable the model, set enableSeq2SeqTokenScorer to true for the jpn_eng profile. (RLPNC-5148)
Address matching can now split apart or join tokens as necessary to determine a match score. For example, previously it would get a lower score trying to match Old Colony Avenue to OldColony Avenue. Now it can recognize they are the same name. (RLPNC-5005)
Improved address matching by adding more address overrides specifying abbreviations from English-speaking countries. For example, with an override, the token crossing in an address will match cross, court will match crt. (RLPNC-5201)
Improved Japanese-English matching for organizations by adding more Japanese organizations to the override list. (RLPNC-5199)
Added the ability to turn off translation of Katakana names. When the parameter katakanaTransliterationsOnly is set to true, Japanese names written in Katakana will only be transliterated. The parameter is off by default. This will improve the speed of matching Japanese names, but may reduce accuracy. (RLPNC-5194)
Bug Fixes
Fixed a bug where the static final constants ScoredTokenData.EXACT_MATCH and ScoredTokenData.SUB_EXACT_MATCH were created with forceOverride set to true. The fix now allows entries in the override files to force the token score even in the odd case where a token scorer decides that the match is exact. (RLPNC-5209)
Release 7.29.3.c61.0
November 2019
Bug Fixes
Fixed a bug where translating names from Arabic to English could cause a crash. (RLPNC-5214)
Release 7.29.2.c61.0
October 2019
New Features
Added a new parameter, addressUnpairedFieldScore, which allows you to adjust the score of unpaired fields during address matching. (RLPNC-5196)
Added new token overrides to the English-English organization token override file. (RLPNC-5200)
Release 7.29.1.c61.0
September 2019
New Features
Updated support for Mac OS X platform from version 10.7+ to version 10.9+. Documents have been updated to reflect the changes. (RLPNC-5161)
Added new token overrides to the English-English token override file and a new fullname override to the Japanese-English fullname override file. (RLPNC-5166)
Release 7.29.0.c61.0
August 2019
New Features
Added a new parameter, numericTokenFrequencyRank which allows you to adjust the weight of numeric tokens in names of different entity types where digits don't get normalized away. By default it is disabled (set to 0).
Added Spanish as a category in our text categorization model of the language of origin for Latin script names.
Added two new parameters, hmmScoreBias and hmmScoreLimit. hmmScoreBias is applied to the final token score returned by the statistical model. hmmScoreLimit is applied after hmmScoreBias: the biased score is multiplied by this parameter, which should be a number between 0 and 1, in effect limiting the score to be below this value. Also, the existing parameter hmmScorerBias was renamed to hmmNormBias and moved to rlpnc/data/etc/internal_parameter_defs.yaml.
Added a new parameter, exactLatnMatchScore which controls the score returned for exact token matches in Latin script. It defaults to 1.0 because that's the desired behavior for names in native Latin script languages, but for matching Chinese against Chinese, it's set to 0.937 since it yields better results.
Added two new parameters, kanjiMismatchPenalty and notExactMatchPenalty. kanjiMismatchPenalty allows you to adjust the final score when two names are Japanese names written in Kanji characters that produce identical readings. notExactMatchPenalty allows you to adjust the score when two names seem to match exactly except for some difference in normalization; this penalty is applied to prevent them from scoring an exact 1.0.
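The order in which hmmScoreBias and hmmScoreLimit are applied in this release can be sketched as follows. The release note does not say how the bias is combined with the score, so this sketch assumes an additive bias clamped to the range 0 to 1; the function name `adjust_hmm_score` is hypothetical, and only the "bias first, then multiply by the limit" order comes from the note.

```python
def adjust_hmm_score(raw_score: float, hmm_score_bias: float, hmm_score_limit: float) -> float:
    """Apply the bias first, then scale by the limit."""
    if not 0.0 <= hmm_score_limit <= 1.0:
        raise ValueError("hmm_score_limit should be between 0 and 1")
    # Assumption: the bias is additive and the biased score is clamped to [0, 1].
    biased = min(1.0, max(0.0, raw_score + hmm_score_bias))
    # Multiplying by the limit keeps the result at or below the limit.
    return biased * hmm_score_limit
```

Because the biased score never exceeds 1.0 in this sketch, the product can never exceed the limit, which matches the note's description of the limit as a ceiling.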
Releases 7.28 and earlier
New Features and Bug Fixes
New Features and Bug Fixes in 7.28.1.c61.0
New Features
Added new parameters, allowNullValue and ignoreBadData, in rlpnc/data/etc/parameter_defs.yaml. Both parameters are used by the RNI Elasticsearch plugin and are set to false by default.
New Features and Bug Fixes in 7.28.0.c61.0
New Features
The reorderPenalty parameter now controls an exponentially decaying penalty instead of a linear one, in order to improve matching of longer names. (RLPNC-3661)
Normalized some Katakana "small" characters into their full sized counterparts to improve Japanese name matching and translation. (RLPNC-5028)
Normalized Extension A Chinese characters into their variants to improve Chinese name matching and translation. (RLPNC-5045)
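The Katakana normalization in this release can be illustrated with a simple character mapping. The release note does not list which small characters are normalized, so the mapping below is an assumption, and `SMALL_TO_FULL` and `normalize_small_katakana` are hypothetical names.

```python
# Assumed mapping of small Katakana to their full-size counterparts; the release
# note does not list which characters RNI actually normalizes.
SMALL_TO_FULL = str.maketrans("ァィゥェォッャュョヮ", "アイウエオツヤユヨワ")


def normalize_small_katakana(text: str) -> str:
    """Replace each small Katakana character with its full-size form."""
    return text.translate(SMALL_TO_FULL)
```

With this mapping, the two spellings キャノン and キヤノン normalize to the same string, so they can be compared as equal.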
New Features and Bug Fixes in 7.27.1.c60.0
Bug Fixes
Fixed shading of Lucene 7 dependencies.
New Features and Bug Fixes in 7.27.0.c60.0
New Features
Added a frequency language model for LOCATION entity type and retrained the language model for PERSON entity type, improving Hungarian name matching. (RLPNC-5003, RLPNC-5004)
Upgraded RNI to use Lucene 7.6.0. (RLPNC-5000, RLPNC-5020)
Upgraded the RNI Solr plugin to support Solr 7.6.0. (RLPNC-5000)
New Features and Bug Fixes in 7.26.1.c60.0
New Features
Continued support for English address matching. (RLPNC-4352)
New Features and Bug Fixes in 7.26.0.c60.0
New Features
Added English address matching. (RLPNC-4329, RLPNC-4342, RLPNC-4351, RLPNC-4354, RLPNC-4812, RLPNC-4931, RLPNC-4941, RLPNC-4942, RLPNC-4943, RLPNC-4966, RLPNC-4968, RLPNC-4970, RLPNC-4973)
Improved Hungarian name matching, including support for multi-letter initials. (RLPNC-4958, RLPNC-4959, RLPNC-4989)
Continued improvement for English/Japanese and English/Chinese organization name matching. (RLPNC-4937, RLPNC-4938, RLPNC-4939, RLPNC-4940, RLPNC-4952, RLPNC-4962, RLPNC-4971)
New Features and Bug Fixes in 7.25.1.c60.0-solr-7
New Features
Upgraded RNI to use Lucene 7.4. (RLPNC-4892)
Upgraded the RNI Solr plugin to support Solr 7.4. (RLPNC-4893)
New Features and Bug Fixes in 7.25.0.c60.0
New Features
Improved matching of Chinese organization names. (RLPNC-4925)
New Features and Bug Fixes in 7.24.2.c59.3
New Features
Continued support for Hungarian/English and Hungarian/Hungarian name matching. (RLPNC-4906)
New Features and Bug Fixes in 7.24.1.c59.3
New Features
Continued support for Hungarian/English and Hungarian/Hungarian name matching. (RLPNC-4744, RLPNC-4754, RLPNC-4821, RLPNC-4879, RLPNC-4886)
Bug Fixes
Fixed a bug in name building where the entity type and the language of origin were set in the wrong order, even though the language of origin depends on the entity type. (RLPNC-4858)
Fixed a bug involving duplicate readings produced when transliterating Chinese names. (RLPNC-4796, RLPNC-4838)
Fixed a bug which produced a confidence score of 0 for name translations. (RLPNC-4871)
New Features and Bug Fixes in 7.24.0.c59.3
New Features
Began adding support for Hungarian/English and Hungarian/Hungarian name matching. (RLPNC-4728, RLPNC-4734, RLPNC-4735, RLPNC-4736, RLPNC-4737, RLPNC-4738, RLPNC-4739, RLPNC-4740, RLPNC-4741, RLPNC-4742, RLPNC-4743, RLPNC-4745, RLPNC-4751, RLPNC-4757, RLPNC-4789, RLPNC-4818, RLPNC-4828, RLPNC-4840)
Added a detector that differentiates between Katakana of foreign origin vs. Katakana of Japanese origin, which improves accuracy for Japanese-English name matching. (RLPNC-4811)
Bug Fixes
Fixed the Greek stop words file to get better matching. (RLPNC-4826)
Fixed the error which occurred if a stopprefix file was added for a language with no stopregex file. (RLPNC-4761)
Parameter Additions and Changes
Added a new parameter, includeExtraKatakanaPersonReadings, in rlpnc/data/etc/internal_param_defs.yaml. If true, it will include the foreign person name readings even when the language-of-origin is unknown. By default it is set to false. (RLPNC-4841)
New Features and Bug Fixes in 7.23.3.c59.2
New Features
Improved segmentation of Persian, Urdu, Pushto, and Dari names written in Arabic script. (RLPNC-4782)
New Features and Bug Fixes in 7.23.2.c59.2
New Features
No new features other than changing the versioning convention of the artifacts to include Basis' parent pom version. (RLPNC-4721)
New Features and Bug Fixes in 7.23.1
New Features
Added co, corp, inc, corporation, and incorporated to stop word list for organizations in English. (RLPNC-4725)
Bug Fixes
Fixed issue in which the same file was being used multiple times at runtime. (RLPNC-4748)
New Features and Bug Fixes in 7.23.0
New Features
Continued adding support for Greek by reviewing the match score adjustments. (RLPNC-4629)
Added a new parameter (see below) that controls the use of different segmentation schemes for Japanese. (RLPNC-4677)
Bug Fixes
Improved segmentation of Arabic names written in Arabic script. (RLPNC-4700)
Parameter Additions and Changes
Added a new parameter, useOldAndNewNameSegmentationForJapanese, that allows multiple segmentation schemes to be applied to Japanese names. (RLPNC-4677)
Adjusted the finalBias parameter for Greek/Greek and Greek/English name pairs to ensure scores for these matches are in line with those of other languages supported by RNI. (RLPNC-4629)
Adjusted parameters for matching names of Arabic origin to English names. (RLPNC-4700)
New Features and Bug Fixes in 7.22.0
New Features
Added the capability to identify language of origin for Latin-script names. Currently this feature will categorize names as Arabic, Chinese, English, Japanese, and Korean. (RLPNC-4546)
Continued adding support for Greek by setting the default transliteration scheme to ISO-843, adding a Greek/English statistical model, gender identification, name overrides, language model, edit distance scoring, as well as adding support for initials and initialisms. (RLPNC-4622, RLPNC-4676, RLPNC-4628, RLPNC-4626, RLPNC-4651, RLPNC-4630, RLPNC-4631, RLPNC-4689, RLPNC-4674)
Improved the accuracy of language identification for Hani-script person names. (RLPNC-4532)
Bug Fixes
Improved segmentation of long Japanese katakana names. (RLPNC-4680)
Parameter Additions and Changes
Adjusted the reorderPenalty in the jpn_jpn_ORGANIZATION profile to 0.070 to improve performance for this profile. (RLPNC-4659)
Enabled useEditDistanceTokenScorer for Greek/Greek and Greek/English. (RLPNC-4665)
New Features and Bug Fixes in 7.21.1
New Features
Improved the behavior of Western Farsi matching by tokenizing sooner in the process. (RLPNC-4576)
New Features and Bug Fixes in 7.21.0
New Features
Added a new parameter (see below) that allows you to tune a penalty applied to pairwise match scores when the two names are of different lengths. (RLPNC-4554)
Added a new parameter (see below) that allows you to adjust the match score when two identical names with unknown fields are matched against each other. (RLPNC-4582)
Improved gemination rules of ISO11940-2 to improve Thai translation accuracy. (RLPNC-4579)
Relocated jar files distributed previously under rlp/lib/BT_BUILD to now be in a more central, platform-neutral location of rlpnc/lib/jvm. (RLPNC-4583)
Began adding support for Greek/English and Greek/Greek name matching. Note: Greek support is currently minimal and will be expanded in a future release. (RLPNC-4600, RLPNC-4611, RLPNC-4615, RLPNC-4621)
Bug Fixes
Fixed a bug involving the presence of digits in non-person names. Non-person names that contain digits will now have better match performance. (RLPNC-4590)
Parameter Additions and Changes
Added a new parameter, nameLengthMismatchPenalty, in rlpnc/data/etc/parameter_defs.yaml. The penalty is off (0) by default and only overridden for the zho_zho profile to be 0.55. Lower values will penalize the score less, and higher values will apply a more drastic penalty. As part of the work of adding this parameter, we also adjusted the zho_zho deletionScore from 0.250 to 0.314.
Adjusted the expensiveScorerJoinedTokenLimit in the jpn_jpn_VEHICLE profile to 5 to improve performance for this profile.
Added a new parameter, sameNameUnknownFieldMatchInterpolator, in rlpnc/data/etc/parameter_defs.yaml. This parameter affects a rare case in which two identical names with unknown fields are matched against each other. Usually, RNI would score identical names as 1.0. However, unknown tokens have their own penalty that applies, defined by unknownVsUnknownScore. This new parameter interpolates between the would-be score and 1.0. The default value is 1, which means that 1.0 will be returned in these cases. Turning the parameter to 0 will fall back to the would-be score, and values in-between will interpolate between these two numbers.
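The interpolation described for sameNameUnknownFieldMatchInterpolator can be written out as a one-line formula. This sketch assumes the interpolation is linear, which the note does not state explicitly, and the function name `interpolate_same_name_score` is hypothetical.

```python
def interpolate_same_name_score(would_be_score: float, interpolator: float = 1.0) -> float:
    """Blend the would-be score toward 1.0 by the interpolator (assumed linear)."""
    if not 0.0 <= interpolator <= 1.0:
        raise ValueError("interpolator should be between 0 and 1")
    # interpolator == 1 returns 1.0; interpolator == 0 returns the would-be score.
    return would_be_score + interpolator * (1.0 - would_be_score)
```

For a would-be score of 0.8, the default of 1 yields 1.0, a value of 0 yields 0.8, and 0.5 lands halfway at roughly 0.9.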
New Features and Bug Fixes in 7.20.0
New Features
Added a new Thai/English statistical model for matching to improve Thai/English name match performance. (RLPNC-4429)
Added a new Thai name segmentation dictionary, improving segmentation and match performance. (RLPNC-4421)
Improved Thai transliteration, benefiting translation and match performance. (RLPNC-4547)
Enhanced the Thai stop word list, providing better stop word removal from Thai names during matching. (RLPNC-4461)
Tuned the finalBias value for Thai/Thai and Thai/English name pairs to ensure scores for these matches are in line with those of other languages supported by RNI. (RLPNC-4445)
Greatly improved Arabic/Arabic match performance by adding an edit distance metric. (RLPNC-4508)
Improved match performance for name pairs in which the names are identical when spaces are removed. (RLPNC-4495)
Added a few new entries to English/English token overrides. (RLPNC-4529)
Modified the names of RNI's internal Lucene fields so that they are simpler and standardized. (RLPNC-2506)
Added the ability to disable support for individual languages. See internal_param_defs.yaml in rlpnc/data/etc for more information on how to use this feature. (RLPNC-4558)
Removed support for Solr 5. (RLPNC-4531)
New Features and Bug Fixes in 7.19.0
New Features
Added preview support for Thai in name matching and name translation. (RLPNC-4444, RLPNC-4420, RLPNC-4419, RLPNC-4417, RLPNC-4490, RLPNC-4424, RLPNC-4423, RLPNC-4493, RLPNC-4479, RLPNC-4418)
Upgraded RNI to use Lucene 6.6. (RLPNC-4292)
Upgraded the RNI Solr plugin to support Solr 6.6. Removed support for Solr 4. (RLPNC-4450, RLPNC-4456)
Changed default behavior of Chinese names during Chinese / English matching so that they are assumed to be of Chinese origin unless otherwise specified. (RLPNC-4375)
Improved Russian / English name matching in that Russian names now include multiple translations. (RLPNC-4496)
Greatly improved Chinese / Japanese organization name language detection. (RLPNC-4477)
Improved match performance when engEngFastMode is enabled. (RLPNC-4488)
Improved name matching to account for more substring matches. (RLPNC-4498)
Bug Fixes
Fixed a thread contention issue that could slow down large numbers of threads.
New Features and Bug Fixes in 7.18.0
New Features
Upgraded the native libraries of the native Linux-compatible release of RNI. We are now using CentOS 6 to build these libraries, as CentOS 5 has reached EOL. The new BT_BUILD value for the Linux package is amd64-glibc212-gcc44. (RLPNC-4278)
Added a new config parameter, engEngFastMode which improves speed for English-English matching by turning off HMM and simplifying queries. For more information, check the documentation in internal_param_defs.yaml. (RLPNC-4357)
Included simple serialize and deserialize methods on Name and DateSpec. (RLPNC-4344)
Added two new config parameters, doQueryFuzzy and doQueryPhrase which affect components of RNI's internal Lucene queries. (RLPNC-4312)
Improved the automatic inference of BT_BUILD values. (RLPNC-4368)
Added static and deprecated attributes for config parameters. A "static" parameter is one that always has the value loaded in the default parameter profile; setting a static parameter to a different value in other profiles has no effect whatsoever. A "deprecated" parameter is one that we are proposing to eliminate; binding its value to anything other than the default results in a warning. (RLPNC-4193)
Improved the efficiency of when the HMM is used in the case of English-English name pairs. (RLPNC-2988)
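The static and deprecated attributes introduced in this release can be sketched as rules applied when a profile value is resolved. This is an illustration under assumptions, not RNI's configuration loader: `ParamDef`, `resolve`, and the parameter names used with them are hypothetical.

```python
import warnings
from dataclasses import dataclass


@dataclass(frozen=True)
class ParamDef:
    """Hypothetical parameter definition carrying the two attributes."""
    default: object
    static: bool = False
    deprecated: bool = False


def resolve(name: str, definitions: dict, profile_overrides: dict):
    """Return the effective value of a parameter for one profile."""
    definition = definitions[name]
    if name not in profile_overrides:
        return definition.default
    value = profile_overrides[name]
    if definition.static:
        # Static parameters always keep the value from the default profile.
        return definition.default
    if definition.deprecated and value != definition.default:
        # Binding a deprecated parameter to a non-default value only warns.
        warnings.warn(f"parameter '{name}' is deprecated", DeprecationWarning)
    return value
```

In this sketch an override of a static parameter is silently ignored, while an override of a deprecated parameter still takes effect but raises a warning, mirroring the two behaviors described above.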
Bug Fixes
Fixed the 'ignoreBadData' flag in RNICLI to function as intended. Also improved exception handling in RNICLI when the data of the input contains languages unsupported by RNI. (RLPNC-4319)
New Features and Bug Fixes in 7.17.1
New Features
Reduced the size of the package by reducing the size of the word embeddings data files. (RLPNC-4245)
New Features and Bug Fixes in 7.17.0
New Features
Enhanced semantic matching of tokens in Organization names through use of word embeddings.
Note: This drastically increases the size of the SDK package. To reduce the size, the embeddings dictionaries in rlpnc/data/tvec/multilingual can be removed as long as the corresponding language pairs in parameter_profiles.yaml have 'useEmbedded' set to false. (RLPNC-4173, RLPNC-4201, RLPNC-4219, RLPNC-4244)
Added the ability for specific token overrides to always override the score between tokens even if a different method of matching generates a higher score. This can be used to prevent specific token pairs from matching. (RLPNC-3951)
Enhanced the segmentation of Japanese Organization names through decomposing compounds. (RLPNC-2910)
Improved the accuracy of fuzzy phonetic matching between Japanese and English. (RLPNC-2444)
Implemented support for multi-token "token" overrides. (RLPNC-4080)
Added token override dictionary for matching Organization names between Japanese and English. (RLPNC-4172)
Ensured compatibility with RLP version 7.15. (RLPNC-4190)
Upgraded Solr plugin to support Solr 6.2. (RLPNC-4179)
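The forced token overrides added in this release (RLPNC-3951) can be sketched as an exception to a "highest score wins" rule. The sketch assumes that, without forcing, the best score among all scorers and overrides is used; `Override`, `token_score`, and the sample tokens are hypothetical and do not reflect the format of RNI's override files.

```python
from typing import Callable, Dict, FrozenSet, NamedTuple, Sequence


class Override(NamedTuple):
    """Hypothetical override entry for one unordered token pair."""
    score: float
    force: bool = False


def token_score(
    left: str,
    right: str,
    overrides: Dict[FrozenSet[str], Override],
    scorers: Sequence[Callable[[str, str], float]],
) -> float:
    """Return the forced override if present, otherwise the highest available score."""
    scores = [scorer(left, right) for scorer in scorers]
    entry = overrides.get(frozenset((left, right)))
    if entry is not None:
        if entry.force:
            # A forced override wins even when another scorer is higher.
            return entry.score
        scores.append(entry.score)
    return max(scores, default=0.0)
```

A forced entry with a low score is how a specific token pair can be prevented from matching, even when a phonetic or edit distance scorer would otherwise rate the pair highly.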
Bug Fixes
Fixed an issue introduced in 7.15.0 that caused a significant slowdown in English-English querying and matching. (RLPNC-4217)
Fixed an issue where NameBuilder.hintLanguage() would return the incorrect value. (RLPNC-4189)
Digits are no longer stripped from non-English Organization names. (RLPNC-4211)
Restored missing Katakana segmentation data. (RLPNC-4210)
New Features and Bug Fixes in 7.16.0
New Features
Added support to RNI for Japanese-Chinese, Japanese-Korean, and Korean-Chinese name matching. (RLPNC-3900, RLPNC-4096, RLPNC-4095)
Added a new query parameter, namesToCheckAllowance, which sets the general proportion of names to pass to the high-precision filter. This is used at query time to determine the number of names to check based on the commonality of the query name in the index, allowing for more efficient querying. As a result, generally a higher setting of maximumNamesToCheck can be used. This involved making a breaking change to the INameIndexFilter interface. This behavior was also added to the Solr plugin with a parameter called reRankDocsAllowance. (RLPNC-4059, RLPNC-4150)
Added a new query parameter, scoreToCheckRestriction, that acts as a more efficient replacement for minimumScoreToCheck and improves query speed. The minimumScoreToCheck parameter has been deprecated. This behavior was also added to the Solr plugin with a parameter called scoreToRerankRestriction. (RLPNC-3665, RLPNC-4166)
Enhanced our Japanese name segmentation logic and expanded our inventory of Japanese segmentation data, improving the accuracy of both Japanese name translation and name matching. (RLPNC-4117, RLPNC-4118, RLPNC-4144, RLPNC-4145)
Enhanced the Lucene query logic in our first-pass filter to improve accuracy on sparse fuzzy queries. (RLPNC-4081)
Improved the speed of some non-English name matching by pruning unlikely translation alternatives. This is controlled by a new config parameter, alternativePairsToCheck. (RLPNC-4138)
Added a new config parameter, queryAlternativeOriginLanguages, which controls the set of query languages where transliterations of alternate origins are made part of the query. If matching Chinese, Japanese, and Korean names this can be adjusted to improve accuracy at the cost of speed. (RLPNC-4158)
Changed CachedScorer to accept both completed and uncompleted names. Users should no longer have to complete a Name object. In any case, obtainCompletedName and checkComplete methods have been added to Name and the StandardNameIndex.completeName method has been deprecated. (RLPNC-4149)
Modified the method StandardNameIndex.generateHighRecallKeys to no longer mutate the given Name object. (RLPNC-4147)
Bug Fixes
Fixed an issue that could result in a NullPointerException when matching a name consisting solely of a fullwidth semicolon and likely other rare forms of punctuation. (RLPNC-4134)
Setting maximumNamesToConsider to Unlimited (e.g., -99) will no longer result in a write lock exception when querying with multiple threads. (RLPNC-3589)
Special characters like '#' are now normalized out of Arabic names to prevent unwanted effects on match scores. (RLPNC-3961)
Completing a name no longer alters the original language-of-origin. This was occasionally causing Names to have different results when completed multiple times. Instead, Names have a derivedLanguageOfOrigin() method. (RLPNC-4132)
New Features and Bug Fixes in 7.15.1
Bug Fixes
Fixed an issue that was causing the Solr 6 plugin to use the slower query that supports multivalued fields when not necessary. (RLPNC-4116)
Fixed a concurrency bug involving parameter profiles used by the HMM. Also fixed in 7.14.1. (RLPNC-4128)
New Features and Bug Fixes in 7.15.0
New Features
Removed support for Java 1.7. Java 1.8 or higher is required. (RLPNC-4077)
Converted many thrown checked exceptions to runtime exceptions. This includes the removal of two exception classes, RNILicenseException and RNTLicenseException. This will likely require code changes. (RLPNC-3982)
Increased our inventory of segmentation and reading data of Japanese Kanji, increasing the accuracy of both Japanese name translation and name matching. (RLPNC-3985, RLPNC-4032, RLPNC-4084)
Improved RNI's handling of Japanese punctuation to better match how it normalizes English punctuation. (RLPNC-4030)
Added a small inventory of common organization and company words to RNI's Japanese-English overrides. (RLPNC-4064)
Added the ability to specify a reRankFilter in the Solrj utility code. (RLPNC-4057)
The rws-names web services have been deprecated in favor of the Rosette API and the Elasticsearch/Solr plugins. (RLPNC-4076)
Upgraded internally to Lucene 6.0. This may require changes to settings of minimumScoreToCheck. (RLPNC-4077)
Added plugin support for Solr 6.0. Support for Solr 4.x has been deprecated. (RLPNC-4078)
Improved public support and documentation for RNI match parameter tuning. (RLPNC-4068)
Added support for custom language model training for RNI. (RLPNC-4028)
Modified Rosette API RNT service to throw an exception when the language of the input name cannot be translated into the specified target language. (RLPNC-4086)
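The move from checked to runtime exceptions in 7.15.0 mainly affects callers that declared or caught the removed exception classes. The following is a generic sketch of the conversion pattern only; the class, methods, and file name are hypothetical and are not part of the RNI or RNT API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical illustration: none of these names come from the RNI-RNT API.
final class UncheckedWrappingSketch {

    // A loader in the "old" style: callers must declare or catch IOException.
    static String loadChecked(String path) throws IOException {
        if (path.isEmpty()) {
            throw new IOException("no path given");
        }
        return "loaded:" + path;
    }

    // The same operation in the "new" style: the checked exception is
    // rethrown as a runtime exception, so callers need no throws clause.
    static String loadUnchecked(String path) {
        try {
            return loadChecked(path);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(loadUnchecked("license.xml"));
        try {
            loadUnchecked("");
        } catch (UncheckedIOException e) {
            // Catching is now optional; the original cause is still available.
            System.out.println("runtime failure: " + e.getCause().getMessage());
        }
    }
}
```

Code that previously caught a checked exception class that no longer exists will not compile until the catch clause is removed or changed to a runtime exception type.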
Bug Fixes
New Features and Bug Fixes in 7.14.1
Bug Fixes
Fixed a concurrency bug involving parameter profiles used by the HMM. (RLPNC-4128)
New Features and Bug Fixes in 7.14.0
New Features
Added plugin support for Solr 5.5. (RLPNC-3929)
Added a single parameter to turn off all case-dependent logic in RNI. (RLPNC-3963)
Improved the error message when attempting to match names in an unsupported language with the Java-only platform. (RLPNC-3997)
Modified the frequency weighting in fielded names. There is now a minimum total weight for each field regardless of how frequent its tokens are. (RLPNC-3984)
Improved support for macro scripts (e.g., Jpan and Kore) in RuleSetTranslator. (RLPNC-3981)
Added method for determining successful installation and configuration of the RNI plugin in an Elasticsearch installation. (RLPNC-3890)
Removed dependency on btrlp.jar and btutil.jar. (RLPNC-3976)
Added support for user supplied low weight tokens. (RLPNC-3950)
Improved accuracy of RNI's frequency weighting by adding character level language model features. (RLPNC-3946, RLPNC-3969)
Improved RNI support for matching unknown field markers. (RLPNC-3949, RLPNC-3968)
Modified closeable objects to implement Java's AutoCloseable. (RLPNC-3944)
Added support for specifying the use of different RNI language models in parameter profiles. (RLPNC-3817)
Added the ability to configure multiple parameter universes that can be activated dynamically at match time and when querying via the Elasticsearch plugin. (RLPNC-3814, RLPNC-3932)
Improved the accuracy of our Hani script language guesser, especially for Japanese. (RLPNC-3831)
Added plugin support for Elasticsearch 2.2.1. (RLPNC-3862, RLPNC-3959, RLPNC-4000)
Removed Chinese and Korean readings of Japanese Kanji from consideration for matching if the name's language of origin is specified as Japanese. (RLPNC-3874)
Greatly improved the speed of our first-pass filter by optimizing our Lucene query. This also applied to both our Solr and Elasticsearch plugins. (RLPNC-3805, RLPNC-3875, RLPNC-3931)
Moved RNI's tunable matching parameters to a config file rlpnc/data/etc/parameter_defs.yaml. (RLPNC-3692, RLPNC-3938)
Upgraded to Lucene 5.2.1. (RLPNC-3708)
Improved RNI's ability to add missing spaces by adding some fuzziness, for example, when matching RobertJohnson Smith and Robbert Smith. (RLPNC-3826)
Added the ability to specify match parameters to the pairwise match demo via the URL. For example:
http://localhost:9022/rnipm/?name1=Robert%20Smith&name2=Bob%20Smith
(RLPNC-3838, RLPNC-3839)
Excluded two-letter tokens, like Al and Jo, from consideration for the gender conflict penalty, and adjusted some heuristics that often caused it to be applied incorrectly. (RLPNC-3825, RLPNC-3808)
Boosted scores of subsumptions, where there are deletions in one name and there are no deletions in the other name and each name has more than one non-deleted token. (RLPNC-3769)
Revised the implementation of Persian IC to match the specification. I.e., eliminated variations from the specification that we had introduced in response to customer requests. (RLPNC-3815)
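The pairwise match demo URL shown in the 7.14.0 notes above passes the two names as percent-encoded query parameters. A minimal sketch of building such a URL in Java follows; the class and method names are hypothetical, and only the name1 and name2 parameters that appear in the example URL are used.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds a demo URL like the one shown in the release notes.
final class PairwiseDemoUrlSketch {

    // URLEncoder targets form encoding and emits '+' for spaces, so convert
    // those to %20. A literal '+' in the input is already encoded as %2B.
    static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8).replace("+", "%20");
    }

    static String buildUrl(String base, String name1, String name2) {
        return base + "?name1=" + encode(name1) + "&name2=" + encode(name2);
    }

    public static void main(String[] args) {
        System.out.println(buildUrl("http://localhost:9022/rnipm/", "Robert Smith", "Bob Smith"));
    }
}
```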
Bug Fixes
Fixed a bug when supplying RNI a license via the API before performing Chinese-Chinese name matching. (RLPNC-3844)
Fixed an issue where the RNI Elasticsearch plugin was not escaping some special characters (e.g., '|', '\'). (RLPNC-3852)
Corrected an issue where stop prefixes would not be loaded if the stop regex file was empty. (RLPNC-3853)
Added the ability for the RNI Elasticsearch plugin to verify a document is of the correct type before attempting to rescore it using RNI. The plugin will also report some potential errors with the rescore name such as unescaped curly braces. (RLPNC-3889, RLPNC-3974)
Fixed Chinese matching to treat whitespace the same as middle dot to prevent further segmentation. (RLPNC-3913)
Fixed an error that could occur when translating certain names from English to Russian. (RLPNC-3927)
Added support for some Japanese ideographs used in company names that were being normalized away. (RLPNC-3943)
New Features and Bug Fixes in 7.13.0
New Features
Added an experimental Rosette Name Indexer Elasticsearch[1] plugin for building fuzzy name retrieval and matching applications for persons, locations, and organizations. This Elasticsearch plugin is distributed in a separate package, is Java only, and supports English-English name matching. (RLPNC-3747)
Removed support for Java 1.6. Java 1.7 or higher is required. (RLPNC-3506)
Refactored RNI queries to perform approximately 50% faster. (RLPNC-3740, RLPNC-3714, RLPNC-3700, RLPNC-3699, RLPNC-3660, RLPNC-3596, RLPNC-3490, RLPNC-3734)
Dramatically improved the speed of Korean-English and English-Korean name matching. (RLPNC-3658)
Enhanced RNI matching between names with and without data fields. (RLPNC-3644)
Added a mechanism that uses text files to normalize token variants, ensuring high RNI match scores where variants are involved. Included a file (equivalenceclasses_eng_PERSON.txt) to normalize variant spellings and abbreviations of Muhammad. (RLPNC-3798, RLPNC-3802)
To improve accuracy of RNI match scoring, revised the use of gender penalties so they do not apply to full name overrides, token overrides, or exact token matches. Also removed entries mismatching gender from the English tokens override file: tokens_eng_eng.txt. (RLPNC-3767, RLPNC-3788, RLPNC-3773)
Added names/nicknames to the English tokens override file: tokens_eng_eng.txt. (RLPNC-3787)
Enhanced RNI support for supplying missing spaces when matching names. This strategy improves the score, for example, when matching JohnFitzgerald Kennedy and John Kennedy. (RLPNC-3781)
Enhanced the RNI token scorer to match similar tokens with minor typographical differences, such as Jong and Jxng. This feature adds a SpanMatch.Reason: STRING_SIMILARITY. (RLPNC-3772)
Added a digital signature to the Windows MSI for installing the RWS-Names web service. (RLPNC-3641)
Added a .bat file (Windows) for running RNICLI. (RLPNC-3795)
Added a Java sample that illustrates the different match phenomena that RNI supports: MatchPhenomenaSample.java.
For our Solr 4.9+ plugin, added an RNIReRankQParser, which also provides support for setting a minimum score to check and for replacing the Solr document score with the RNI match score. (RLPNC-3650, RLPNC-3651)
Added Solr 4.9+ plugin support for processing fielded names. (RLPNC-3592)
Added convenience utilities for RNI solrj users. See Javadoc for com.basistech.rni.solr.index. (RLPNC-3615)
For efficiency and to avoid problems sometimes encountered with fielded names, moved titles to be stripped from names from the stopregexes file (rlpnc/data/rnm/ref/override/stopregexes_eng_PERSON.txt) to stopprefixes (rlpnc/data/rnm/ref/override/stopprefixes_eng_PERSON.txt). Added support for specifying which field a stopregex should be applied to when handling a fielded name. (RLPNC-3800)
To enhance security when using RWS-Names, upgraded Tomcat to version 8.0.21. (RLPNC-3819)
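The 7.13.0 notes above mention scoring tokens with minor typographical differences, such as Jong and Jxng. The release notes do not say how RNI measures this, so the sketch below uses plain Levenshtein edit distance as a stand-in technique to show the general idea. The class and the similarity formula are assumptions for illustration, not RNI's scorer.

```java
// Stand-in illustration using Levenshtein distance; not RNI's actual algorithm.
final class EditDistanceSketch {

    // Classic dynamic-programming edit distance using two rolling rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumed normalization: 1.0 for identical strings, lower as edits increase.
    static double similarity(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        return longest == 0 ? 1.0 : 1.0 - (double) levenshtein(a, b) / longest;
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("Jong", "Jxng"));
        System.out.println(similarity("Jong", "Jxng"));
    }
}
```

Under this assumed formula, a single substituted character in a four-letter token leaves three of four positions intact, which is why such pairs can still be treated as similar.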
Bug Fixes
Corrected RNI failure to apply stop patterns and stop word prefixes when matching fielded names. (RLPNC-3766)
Fixed the shell script (rnicli.sh) for running RNICLI. (RLPNC-3792)
Corrected a bug introduced in 7.12 with NameIndexQuery#setTestPrimaryData(true). This method no longer returns names where isPrimary() is false for both the queryName and the candidateName. (RLPNC-3682)
Fixed bug loading binary RNI dictionaries when RNI-RNT is installed in a directory with rnt in its name. (RLPNC-3793)
Revised the RNI Pairwise Matching Demo to report that it cannot generate a score if the names constitute an unsupported language pair and to correctly handle fielded names with empty fields. (RLPNC-3702, RLPNC-3640)
Fixed a problem launching RWS-Names on Mac OS X with the launch.sh script by upgrading Tanuki to 3.5.26. (RLPNC-3819)
Enhanced the classpath so the command-line scripts (rnicli and rntcli) run without logging warnings. (RLPNC-3804)
Adjusted the specification of com.basistech.rni.solr.NameField and associated subfields to enable the correct display of this field in the Solr schema browser. (RLPNC-3806)
The bt_rni_Name_Store subfield was incorrectly appearing in Solr plugin query results. This subfield is no longer indexed as part of the document and stored, hence no longer appears in query results. (RLPNC-3807)
New Features and Bug Fixes in 7.12.0
This is the first release of RNI-RNT Java Only. It supports indexing, querying, and matching names in English, French, German, Italian, Portuguese, and Spanish. At this time, translation (RNT) and support for Arabic, Western Farsi, Dari, Pushto, Urdu, Korean, Chinese, Japanese, and Russian are supported only in the native releases[2]. Over time, we plan to incorporate support for these features and languages in the Java edition.
New Features
Deprecated support for Java 1.6. Java 1.7 is required for RNI interaction with Solr 4.8 and above. (RLPNC-3534)
Added support for Mac OS X v10.7 Darwin 11 (amd64-darwin11-xcode4) and 32-bit and 64-bit Windows for Visual Studio 2012 (ia32-w32-msvc110, amd64-w64-msvc110).
Removed support for platforms that do not support Java 1.7: Mac OS X v10.5 Darwin 9 (universal-darwin9-gcc40) and Red Hat Enterprise Linux (ia32-glibc23-gcc32, ia32-glibc23-gcc34, amd64-glibc23-gcc34). (RLPNC-3601)
Added a sample RNI Index, Presidents, which contains the names of the presidents of the United States. (RLPNC-3562)
Added RNI support for normalizing ß to ss for queries and matches in English. For example, Russland now matches Rußland with a score of 0.99. (RLPNC-3560)
Added translators for translating from Korean to English that provide FOLK transliterations that more closely resemble the conventional English spelling of Korean names. (RLPNC-3521)
Changed the RNI default minimumScoreToCheck from 0 to 0.05, which considerably improves speed without a significant decrease in accuracy. If you observe an undesirable loss in recall, you can change this threshold back to 0. (RLPNC-3513)
Moved some English stop words, such as "general" and "mr", to a non-generic stop words file that only applies to PERSON entities, which improves handling of such non-PERSON names as "General Electric" and "MR electric units". (RLPNC-3508)
RNM token matching adjusted to give lower scores when initialisms fail to match (such as "NCTA" and "NICTA"), thereby reducing the number of false positives. (RLPNC-3495)
Dropped support for Solr 3.x. Support for Solr 4.x applies to all versions of 4.x. (RLPNC-3484)
Added an enhanced Solr plugin that supports Solr 4.9 and above and allows RNI to be integrated seamlessly into Solr applications. Our existing Solr 4.x plugin is deprecated. Replaced RNIinSolr4xSample with RNISolrjSample, which uses the new plugin and simple org.apache.solr.client.solrj calls to demonstrate how to index and query documents with multiple and multivalued name fields. (RLPNC-3465, RLPNC-3556, RLPNC-3603)
Added an OFAC index with persons and organizations (and an XML source file for the index) to illustrate the type of index with which the enhanced Solr plugin can be used. (RLPNC-3610)
Changed name of the RNI requestHandler to '/RNI' in solrconfig.xml. Solr 4.x expects a '/' in front of the requestHandler name. (RLPNC-3533)
To facilitate the use of RNI and our Solr plugin in the SolrCloud with a possibly heterogeneous collection of documents, removed the requirement that bt_rni_NAME_UID be designated as the uniqueKey and that the shard.qt parameter (for the custom request handler) be set to /RNI. (RLPNC-3534)
Revised the RNI Pairwise Matching Demo to provide a more usable explanation of the score that is returned and to support fielded names, with match information for each field. (RLPNC-3459, RLPNC-3590)
Improved role of initialisms in RNI first-pass queries to only match based on initials if both the index name and query name have an initial. (RLPNC-1779)
Configured RNI to better handle Cyrillic initialisms, and Cyrillic initials for Russian names are indexed for the first-pass RNI query, which improves recall. (RLPNC-3325, RLPNC-3442)
Added Russian gender data to improve Russian-Russian and Russian-English name matches. (RLPNC-3545)
Enabled users to optionally specify language of origin in the RNT Demo (Interactive). (RLPNC-3427)
Enhanced the RNT Demo to process names with characters from multiple supported scripts. The demo uses a macro-script identifier and translator, such as Jpan for a name that contains Kanji, Hiragana, and Katakana characters. (RLPNC-3590)
RNT now honors whitespace in Japanese input when translating names to English. (RLPNC-3423)
Moved Chinese LOCATION and ORGANIZATION stop words to Chinese to English token overrides, and removed the Chinese LOCATION and ORGANIZATION files. As a result, RNI queries still take advantage of matches, and RNI is not sidetracked when one of the tokens appears in a name (such as Taiwan). (RLPNC-3406)
Expanded use of the Arabic to English LOCATION dictionary to be used by RNTAssistant. (RLPNC-3384)
Added support for reverse transliteration from English / Latin / language of origin Chinese / Chinese Telegraph Code to Chinese / Hani / Native. (RLPNC-2638)
Added the com.basistech.names.parameters package to support the runtime configuration of RNI parameters. (RLPNC-3177)
Added a NameCommonService RESTful service with support for guessing the language of a string. This functionality is useful in both RNI and RNT. For convenience, guesslang supports JSON and plain-string input: e.g., {"text": "John"} or "John". The previous support for this functionality in NameTranslation is deprecated. (RLPNC-3370, RLPNC-3565)
Added support to the RESTful web service to process XML input and produce XML output. (RLPNC-3470)
Enabled matching of names consisting solely of digits, such as 07 in the OFAC_PERSONS RNI index. (RLPNC-3584)
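The ß to ss normalization listed among the 7.12.0 features above can be pictured as a simple folding step applied to both names before comparison. This is only a sketch of the idea with a hypothetical class; it does not reproduce RNI's normalizer or its match scores.

```java
import java.util.Locale;

// Hypothetical illustration of folding the German sharp s before comparing names.
final class EszettNormalizationSketch {

    // Lowercase with a fixed locale, then expand U+00DF (sharp s) to "ss".
    static String normalize(String name) {
        return name.toLowerCase(Locale.ROOT).replace("\u00DF", "ss");
    }

    public static void main(String[] args) {
        String a = normalize("Russland");
        String b = normalize("Ru\u00DFland");
        System.out.println(a.equals(b));
    }
}
```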
Bug Fixes
Fixed RNT error sometimes stripping the space between name elements when translating from Korean to English. (RLPNC-3542)
Fixed a script guesser failure to correctly classify some Japanese strings mixing Kanji and Katakana as Jpan. (RLPNC-3516)
Characters that don't occur in the training data were removed from the English to Arabic RNT model file, eliminating garbage results. (RLPNC-3448)
Eliminated an RNT infinite loop that was leading to an out of memory exception. (RLPNC-3477)
Corrected a bug in assigning weight for scoring to a name element based on its frequency (the greater the frequency the lower the weight) in the associated language model. (RLPNC-3471)
Fixed a memory leak that occurred when running queries with the Rosette Web Services for Names. (RLPNC-3608)
Fixed a TranslationAssistant bug returning Pinyin from English when Chinese Telegraph Code was specified. (RLPNC-3606)
Fixed a TranslationAssistant error dropping the terminal token representing the definite article (al) from names in English of Arabic origin, such as Abedin Zain Ul. (RLPNC-3485)
The nameDataMinimumMatchScore setting was incorrectly influencing the number of names submitted for the second pass of an RNI query, which in turn could lead to inconsistent results. This anomaly has been eliminated. (RLPNC-3599)
Fixed synchronization of RNI web service when opening indexes during initialization. (RLPNC-3620)
Fixed a StringIndexOutOfBoundsException error that occurred when handling a fielded name with an empty field. (RLPNC-3633)
New Features and Bug Fixes in 7.11.1
New Features
Added KORDA transliterations of Korean names to our training data to improve RNI matching of Korean names in Korean to Korean names in English. (RLPNC-3444)
Added support for handling the ᆻ jamo in the Korean IC and KORDA transliteration schemes. (RLPNC-3443)
For improved performance, added prefix stop words that use string literals rather than regular expressions to strip prefixes during name matching. A longer stop pattern or prefix takes precedence over shorter patterns or prefixes that the longer pattern or prefix contains. For example, the lieutenant colonel stop word prefix is applied where applicable when colonel is also a stop word prefix. (RLPNC-3436)
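The precedence rule for stop prefixes described above can be sketched as "try the longest literal prefix first". The class below is a hypothetical illustration, not RNI's implementation. The entry "lieutenant" is an assumed shorter prefix, added so that the effect of the ordering is visible.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of longest-first literal prefix stripping.
final class StopPrefixSketch {

    // Strips at most one prefix, preferring the longest one that matches.
    static String stripPrefix(String name, List<String> prefixes) {
        List<String> ordered = new ArrayList<>(prefixes);
        // Longest prefixes first, so "lieutenant colonel" wins over "lieutenant".
        ordered.sort((a, b) -> Integer.compare(b.length(), a.length()));
        for (String prefix : ordered) {
            boolean matches = name.regionMatches(true, 0, prefix, 0, prefix.length());
            // Require a following space so that a prefix never splits a word.
            if (matches && name.length() > prefix.length()
                    && name.charAt(prefix.length()) == ' ') {
                return name.substring(prefix.length() + 1).trim();
            }
        }
        return name;
    }

    public static void main(String[] args) {
        List<String> prefixes = List.of("colonel", "lieutenant", "lieutenant colonel");
        System.out.println(stripPrefix("Lieutenant Colonel John Smith", prefixes));
        System.out.println(stripPrefix("Colonel Jane Doe", prefixes));
    }
}
```

Without the longest-first ordering, the shorter entry could match first and leave "Colonel John Smith" behind, which is the outcome the precedence rule is meant to avoid.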
Bug Fixes
With English input of Muhammed and Language of Origin of Pushto, TranslationAssistant incorrectly returned the English output alternative as StatisticallyInferred. The output alternative is now correctly reported as HumanlyAttested. (RLPNC-3464)
Fixed a bug in the RNI Solr clients and improved performance by eliminating duplicate translations. (RLPNC-3454)
Fixed errors translating Chinese names including special punctuation or delimiter characters such as "|" and ";" to English. For example, RNT can now translate 新|w2ㄙ垂O and 画面;广场. Non-Chinese characters are included unchanged in the translation. (RLPNC-33453 and RLPNC-3445)
Fixed an error constructing the correct NameDomain when running RNI queries in the NameIndex web service. (RLPNC-3446)
New Features and Bug Fixes in 7.11.0
Removed support for integrating RNI into a Solr 1.4 Application. (RLPNC-3232)
Added support for making concurrent calls to the RNI web service querying and/or updating the same RNI index. (RLPNC-3397)
Enabled translation from Chinese to English of names containing a bullet (U+2022) in place of the middle dot (U+00B7). The middle dot is used to separate the words in a non-Chinese name. (RLPNC-3395)
Fixed a crash bug that appeared when attempting to translate a single Korean jamo to English. The jamo is now transliterated as the appropriate Latin script letter(s). (RLPNC-3394)
In addition to dictionary lookups, added support to the com.basistech.rnt.assistant interactive API to provide statistically inferred English alternatives for Arabic, Korean, Russian, and Chinese input, as well as statistically inferred Arabic, Korean, and Russian alternatives for English input. When a result is statistically inferred, that information is included in the result (DataSourceType.StatisticallyInferred). (RLPNC-3283, RLPNC-3309, RLPNC-3063)
Fixed failure to log the exception when the RNI web service cannot open an index. (RLPNC-3389)
Added a DefaultTranslationPairs class to enable translations by specifying the source and target language rather than complete language domains. (RLPNC-3372)
Added a hintLanguage property to the Name object that NameBuilder can use as a suggestion when it guesses the language. In standard usage, the hint will be the language already identified for the document containing the names. If the hinted language is compatible with the script, which NameBuilder can also guess, NameBuilder returns the hinted language; otherwise, it returns the language it guessed. This feature is also available through the NameTranslation web service. (RLPNC-3369, RLPNC-3311)
The RWS-Names web service was sometimes returning UNKNOWN as the language for a name. Now it always guesses a language if the user did not supply it. (RLPNC-3316)
Made the LanguageOfOrigin results returned as part of a translation available through the NameTranslation web service ResultAnnotations object. (RLPNC-3352)
Translations from Russian to English now include language of origin (Russian or English) for each word in each result. (RLPNC-3352)
Optimized the script guessing algorithm, approximately doubling the speed. (RLPNC-3314)
Fixed the Solr plugins to include language of origin when adding names to an index. (RLPNC-3305)
Enhanced RNI scoring of name matches when the names are the same except for reordering of regions of the name (such as Carnera Baer Braddock Louis Charles Walcott and Charles Walcott, Carnera Baer Braddock Louis). (RLPNC-3304)
Normalized l with stroke (U+0142) for better RNI handling of Polish names (such as Michał, which now matches Michal). (RLPNC-3303)
Corrected exception thrown when user attempts to transliterate a particular person name (اسامة نين اع بن لادن) from Arabic to English. (RLPNC-3301)
Extended the RWS-Names web interface to support pairwise name matching and added the RNI Pairwise Matcher Demo to illustrate this feature. (RLPNC-3012)
Modified the RNI Demo to enable users to specify language of origin and entity type for a query name. (RLPNC-3295, RLPNC-3104)
Established a static method (RNIConfiguration.setLicenseXML(String licenseXML)) for passing in the RNI user license, matching the behavior for RNT (RNTEnvironment.setLicenseXML(String licenseXML)). (RLPNC-3344)
To improve accuracy, trained an RNI Russian language model and a Korean language model for determining the frequency of person name tokens. (RLPNC-3289, RLPNC-3241)
Fixed an error handling the Persian ezafe (-e) during IC transliterations. (RLPNC-3273)
Added ability to extract debug info from each of the MatchResult objects returned by a query. In NameIndexQuery, call setIncludeDebugInfo(true), then in each MatchResult, call getDebugInfo() to return a string that describes how the result was derived. (RLPNC-3266)
Added a .bat file (Windows) and shell script (Unix) for running RNICLI from the command line without using Ant. (RLPNC-3203)
Increased accuracy of Korean English matching of native names by including more Korean name data in training. (RLPNC-3341)
Added "corp" to our eng_eng_ORGANIZATION token override file so RNI returns a better score when matching an organization name like Korea National Oil Corporation with Korea National Oil Corp. (RLPNC-3340)
Adjusted the phonetic algorithm that RNI uses for generating search keys for Korean names so that 김 provides a better match for Kim. (RLPNC-3334)
Provided more complete normalization of extended Latin characters, such as đ with stroke (U+0111), to improve accuracy handling names with such characters. (RLPNC-3322)
Fixed support for specifying the Korean script Kore, an alias for Hangul + Han. (RLPNC-3318)
Fixed an RNI threading issue that was causing crashes during initialization. (RLPNC-3315)
Added support for performing the reverse transliteration of Korean names in English using the BGN scheme. (RLPNC-3281)
Enhanced ISO15924Utils.scriptForString to return a macro-script (Hrkt or Jpan) for mixed-script Japanese strings. Hrkt is a mixture of Katakana (Kana) and Hiragana (Hira) (e.g., トイザらス). Jpan is a mixture of Kana, Hira, and Kanji (Hani) (e.g., トヨタ自動車株式会社). (RLPNC-3279)
Fixed the accuracy of RNI span matches when matching a hyphenated name in English against a name in Korean. (RLPNC-3277)
NameBuilder now uses an unchecked exception in place of InvalidNameException, so users do not have to use a try/catch every time they create a name. (RLPNC-3276)
Added guessLanguage and guessScript utility methods to NameBuilder, so users can determine language and script without having to create the Name object. (RLPNC-3275)
Added an English to Russian (Latn, eng, folk to Cyrl, rus, native) translator. (RLPNC-3269)
Added an RNT option to specify which transliteration scheme should be used for BGN when transliterating Korean. If com.basistech.rnt.options.KorGeographyOption is set to NORTHKOREAN (the default), RNT uses McCune-Reischauer. If set to SOUTHKOREAN, RNT uses Revised Romanization of Korean. (RLPNC-3261)
Added support for transliterating Korean into Undiacritized BGN. (RLPNC-3253)
Improved ability of RNI to match truncated names. (RLPNC-3200)
The RNT Japanese translator now uses Korean segmentation when given a Hani name with Korean language of origin. (RLPNC-3173)
Extended the use of the RNI similarity score of 1.0, meaning exact match, to indicate the strings are equal, the languages of use and origin match, and the entity types match. (RLPNC-2636)
System.(err|out).print and printStackTrace() statements in RNI and RNT have been replaced with slf4j (logging) calls. (RLPNC-2688)
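Two 7.11.0 items above concern normalizing extended Latin characters such as l with stroke (U+0142) and d with stroke (U+0111). One detail worth illustrating is that these letters have no canonical Unicode decomposition, so stripping combining marks after NFD normalization does not affect them and they need an explicit mapping. The sketch below shows that general technique with a hypothetical class and a small assumed mapping table; it is not RNI's normalizer.

```java
import java.text.Normalizer;
import java.util.Map;

// Hypothetical illustration of folding extended Latin letters to ASCII.
final class LatinFoldingSketch {

    // Letters with a stroke do not decompose under NFD, so map them explicitly.
    private static final Map<Character, String> STROKE_LETTERS = Map.of(
            '\u0142', "l",   // l with stroke
            '\u0141', "L",   // L with stroke
            '\u0111', "d",   // d with stroke
            '\u0110', "D");  // D with stroke

    static String fold(String name) {
        // NFD splits letters such as e-acute into a base letter plus a combining mark.
        String decomposed = Normalizer.normalize(name, Normalizer.Form.NFD);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < decomposed.length(); i++) {
            char c = decomposed.charAt(i);
            if (Character.getType(c) == Character.NON_SPACING_MARK) {
                continue; // drop the combining accent
            }
            String mapped = STROKE_LETTERS.get(c);
            out.append(mapped != null ? mapped : String.valueOf(c));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(fold("Micha\u0142"));
        System.out.println(fold("Jos\u00E9"));
    }
}
```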
New Features and Bug Fixes in 7.10.0
The com.basistech.rni.match.Name constructors are deprecated. Use the com.basistech.rni.match.NameBuilder Build() method to construct and return a name object with all values set via fluent methods. (RLPNC-3234)
Removed support for Java 1.5. Java 1.6 or higher is required. (RLPNC-2489)
All translation domain pairs now include language of origin. (RLPNC-3078, RLPNC-2939)
RNI name matching for Chinese to Chinese now includes variants for both simplified and traditional Chinese. (RLPNC-3092)
Added support for translating Korean and English names from English (folk) to Korean. (RLPNC-3211)
When RNI is integrated into a Solr application, RNI will guess language and script if you do not include those settings. (RLPNC-2523)
To improve efficiency, all Name fields in an RNI document are serialized in a JSON object and stored in a single field. (RLPNC-3225)
Added a boolean flag to com.basistech.rni.index.IndexStoreDataModelFlags: usingCachingCodec. When true (the default is false), this flag instructs Lucene to use our CachingCodec, which at read time loads and stores documents in RAM. Warning: This is exceptionally RAM intensive, but may significantly improve performance. (RLPNC-3218)
RNI caches query data to enhance query performance. (RLPNC-3223)
Added Windows releases for Microsoft Visual Studio 10. (RLPNC-3163)
New Features and Bug Fixes in 7.9.1
Removed unrequired JAR files. (RLPNC-3185)
Fixed a bug that caused an ArrayIndexOutOfBoundsException when indexing names with more than 6 RNT overrides. (RLPNC-3164)
Added support to the RNT web demo for specifying the language of origin. (RLPNC-3103)
Fixed a slowdown in English queries against English names that was introduced in the 7.9.0 release. (RLPNC-3141)
Fixed a slowdown caused by a significant increase in calls for supported translation pairs that was introduced in the 7.9.0 release. (RLPNC-3176)
New Features and Bug Fixes in 7.9.0
Expanded RNI query results to indicate spans (one or more tokens) in the query name and result name that match or do not match. (RLPNC-2878)
Added a RESTful interface to the Rosette Web Services for Names (RWS-Names). The SOAP interface is still in place. The RNI and RNT Web Demos use the RESTful interface. (RLPNC-2184)
Extended RNI Solr plugin to support Solr 4.3. (RLPNC-4.3)
Added support for translating names from English to Chinese. (RLPNC-2174)
Expanded the scope of the normalization translation option to convert Chinese names in traditional script to the simplified Chinese script, and to convert Japanese Kanji variants (including old Kanji) to their standard form. (RLPNC-2914, 2846)
Updated implementation of IC transliteration scheme for Chinese to conform to May 2013 deliberative draft of the IC Chinese Standardized Transliteration System for Personal Names. (RLPNC-3068)
Improved the accuracy of Western Farsi, Dari, and Pushto translations. (RLPNC-2972, 2973, 2976, 3001, 3002)
Added support for transforming BGN to Undiacritized BGN for Arabic, Western Farsi, Dari, and Pashto. (RLPNC-2980)
Added support for translating non-Chinese person names in the Chinese language to their traditional English representation. (RLPNC-1451)
Added support for translating non-Korean person names in the Korean language to their traditional English representation. (RLPNC-2979)
Added support for transliterating Korean person names in accordance with the IC standard. (RLPNC-2720)
Added programmatic access to RNI and RNT overrides, which lets you define your own override tables (character streams) in place of the tables in the default override directories. (RLPNC-2689)
In the API documentation, see the following methods in com.basistech.rni.index:
RNIConfiguration.replaceFullnameScoreOverrideConfiguration
RNIConfiguration.replaceTokenScoreOverrideConfiguration
RNIConfiguration.replaceStopPatternsConfiguration
and the following method in com.basistech.rnt:
DictionaryService.replaceConfiguration
Deprecated the use of translation domains that combine Latn with a non-Latn-script language, such as Arabic or Chinese. In an upcoming release, language of origin will be used to clarify the nature of the translation. For example, a translation from Arabic, Arabic script, native transliteration to English, Latin script, IC, will be a translation, not a transliteration, if the language of origin is English. (RLPNC-2954)
Extended supported domain pairs to include language of origin for Arabic and Chinese. (RLPNC-3067)
Changed Rosette Web Services For Names default port to 9022. (RLPNC-3048)
Extended RNI Web demo to support language guessing. (RLPNC-3042)
Fixed RNT problem handling some Japanese characters. (RLPNC-3052)
Fixed Null Pointer Exception that occurred with some Solr RNI queries. (RLPNC-3015)
Deprecated com.basistech.rni.match.setMaximumNumTokens(int maxNumTokens). (RLPNC-2788)
New Features and Bug Fixes in 7.8.0
Improved RNI accuracy matching Japanese names, including native names, names of Chinese or Korean origin, and other non-native names with English translations. These improvements in accuracy do entail a slowdown in RNI operations with Japanese. (RLPNC-2606, RLPNC-2608, RLPNC-2422)
Extended RNI support for matching Japanese name variations such as nicknames and cognates (Katakana), and reordered name components. (RLPNC-2088)
Added RNI sample index with Japanese names (professional baseball players) in Japanese scripts and sample queries in Latin script.
Improved the accuracy of Pushto translations using the IC standard. (RLPNC-2800)
Fixed bug transliterating Arabic heh (U+0647) in Pushto names. (RLPNC-2783)
Refactored enforcement of constraints on setting maximum number of names to consider, to check, and to return for RNI queries, thereby enabling support for setting these constraints when using RNI with Solr. (RLPNC-2828)
New Features and Bug Fixes in 7.7.0
Numbers are no longer stripped from entity types other than PERSON. (RLPNC-2185)
Override files for token matches may include a third item for each entry which provides RNI with additional context for handling the override: NICKNAME or COGNATE. If no value is included, the default is NICKNAME. (RLPNC-2050)
Having determined that some classes and methods in internal packages should be available to users, and that some publicly available APIs return a class in an internal package, we have refactored the API as indicated in the following table. We have also refactored the API for handling a collection of names. (RLPNC-1768)
New API | Deprecated API | Usage
---|---|---
com.basistech.rni.index.RNILicenseException | com.basistech.rni.index.internal.RNILicenseException | Thrown for missing or invalid RNI License
com.basistech.rni.index.SupportedDomainPairs | com.basistech.rni.index.internal.SupportedDomainPairs | Provides a map of supported text domain pairs.
com.basistech.rni.match.Name com.basistech.rni.match.MatchScorer.prepNameForCachedScorer(Name n) | com.basistech.rni.match.internal.translate.CommonNameTranslators.translate(Name n) | Provides a Name with the meta-information required to instantiate and return scores with the MatchScorer.CachedScorer.
None | com.basistech.rni.index.internal.INameIndexStore com.basistech.rni.index.StandardNameIndex.getStore() | For internal use only.
None | com.basistech.rni.index.internal.NameIndexFilterComparator com.basistech.rni.index.INameIndexFilter.getComparator(NameIndexQuery query) | For internal use only. It will be removed from the INameIndexFilter interface and either removed or made non-public in the classes that implement the interface.
To improve performance using linguistic analysis to return high-precision results from a high-recall list of names that match a query name, deprecated the static com.basistech.rni.index.StandardNameIndex queryList method in favor of a non-static com.basistech.rni.index.StandardNameIndex filterCollection method. (RLPNC-2560)
Per the IC specification for transliterating Person names from Pushto, we provide special handling when the language of origin is Dari. We also provide variant spelling and regional options to control the transliteration. (RLPNC-2652, RLPNC-2677, RLPNC-2448)
Added support for defining a set of text domains that filter results returned by an RNI query. For example, if the RNI index contains names in English, Western Farsi, Arabic, and Pushto, you can set text domains to only return names in English/Latin script, and Western Farsi/Arabic script. In the HTML API documentation, see com.basistech.rni.index.NameIndexQuery.setTargetNameDomains(Set<TextDomain>) and com.basistech.rni.index.NameIndexQuery.testTargetNameDomains(boolean). (RLPNC-2718)
Removed support for the JDEC-Afghanistan Pushto transliteration scheme. (RLPNC-2734)
Support and an associated sample have been added for running RNI with Solr 4x. (RLPNC-2742)
When the Segmentation Option is turned off for Japanese, RNT now assumes the names it is processing have been segmented by the user (tokens are space delimited). Prior to this fix, RNT treated the entire name as a single token, which produced incomplete results for names with space delimited tokens. (RLPNC-2750)
Verified RLPNC support for Redhat 6.0. For 32-bit platforms, use the ia32-glibc23-gcc34 package; for 64-bit platforms, use the amd64-glibc23-gcc34 package. (RLP-3649)
Improved support for vocalizing Pushto and Western Farsi names. (RLPNC-2512, RLPNC-2232)
Improved handling of the ezafe in Western Farsi, Dari, Pushto (izafat), and Urdu names. (RLPNC-343)
Added RNI support for indexing and querying names identified as Persian (fas) or Dari (prs). Persian is the macrolanguage that includes both Western Farsi (pes) and Dari. Queries for Persian names may return names indexed as Persian or Western Farsi. (RLPNC-2711)
Added override file for translation of LOCATION names from Russian to English. The package now contains override files for translating LOCATION names from Arabic, Japanese, and Russian to English. (RLPNC-2176)
New Features and Bug Fixes in 7.6.1
Fixed a bug that prevented Solr 1.4 users from setting the isPrimary attribute for a Name. (RLPNC-2667)
Fixed the inclusion of incorrect results from name pair (fullname) override files during the high-recall phase of queries. (RLPNC-2693)
Fixed a NullPointerException or the inclusion of results of the incorrect entity type from name pair (fullname) override files. (RLPNC-2676)
New Features and Bug Fixes in 7.6.0
Starting with this release, the RLPNC release number matches the release number of the RLP with which it should be installed (the third number may vary).
Added TranslationAssistant and NameIndex web services. For Windows users, added .NET clients for each of the Rosette Web Services for Names. (RLPNC-2243, RLPNC-2272, RLPNC-2233)
To clarify translation support for Western Farsi and Dari (both members of the Persian macro-language), we have replaced the Persian ISO 639 language code ("fas") with the Western Farsi language code ("pes"). For BGN, which does not distinguish between Western Farsi and Dari, we support the use of all three language codes ("fas", "pes", and "prs") for Dari. (RLPNC-2505)
Improved the vocalization maps for the translation of Western Farsi and Dari names. (RLPNC-2257)
Enforce requirement that the maximum number of names considered in the first pass must be greater than or equal to the maximum number of names evaluated in the second pass and the maximum number of names returned by the query. The code no longer resets these settings under the covers if the user makes a setting that violates this constraint. (RLPNC-2160)
Changed the default setting for testing entity type during queries from false to true. As a result, a query only returns names that match the entity type of the query name (such as PERSON, LOCATION, ORGANIZATION, VEHICLE, or NONE). (RLPNC-2149)
Upgraded to Lucene 3.6. (RLPNC-2470)
Extended support for using RNI with Solr 3.x and Solr 4.x. Added an RNI sample that runs with Solr 3.x, and provided instructions in the documentation for using RNI with the Solr 3.5 Admin Example to post Name documents and perform queries. (RLPNC-2462)
For queries performed with Lucene or Solr 3.x, adopted DisjunctionMaxQuery to improve first-pass scores with names for which multiple alternatives (such as nicknames) are defined in token override files. For a given minimumScoreToCheck, this strategy provides more accurate recall for names submitted to the second pass. (RLPNC-2473)
Added support for indexing and querying Spanish names. (RLPNC-2478)
Restructured HighRecallKeys. The StandardNameIndex generateHighRecallKeys method is no longer available and HighRecallKeys has been refactored. Removed the GenerateAndUseHighRecallKeys Java sample and the RNICLI -generate-keys option. Our working assumption is that RNI provides HighRecallKeys to assist in Name queries using Lucene or Solr. The underlying structure is in a state of evolution. If you want to use some other infrastructure to store Names and perform queries, please discuss this issue with us. (RLPNC-2499)
Added com.basistech.rni.match.NameBuilder as the preferred mechanism for creating Name objects. NameBuilder provides a fluent interface that supports method chaining. (RLPNC-2515)
Added language of origin as a Name field. The default value is LanguageCode.UNKNOWN. RNT uses this value when translating foreign names from Arabic, Japanese, and Russian. (RLPNC-1717)
Added an RNT translation option (MinimizeOrthographyOption) to remove short vowel diacritics from Arabic, Western Farsi, Dari, Pushto, and Urdu names in Arabic script. The default setting for this option is false. (RLPNC-2320)
Removed RNT translators for Dari and the JDEC-Afghanistan transliteration scheme. (RLPNC-2558)
Extended support for applying the IC transliteration standard for Pushto to include the special rules and special cases defined in the Pashto Standardized Transliteration System for Personal Names, 01 June 2011. (RLPNC-2565)
Adjusted MatchScorer settings for each language to establish a Precision/Recall crossover near the 0.55 threshold for all language pairs. The crossover may still vary, depending on the input data. If you are interested in customizing these settings, please contact support@rosette.com. (RLPNC-2555)
RLPNC support for Java 1.5 is deprecated. RLPNC users of Java 1.5 should move to Java 1.6. In this release, Java 1.6 is required to use The Rosette Web Services for Names, and to use RNI with Solr 3.x or 4.x. (RLPNC-2486)
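The pass-size requirement noted in this release (the first-pass maximum must be greater than or equal to both the second-pass maximum and the maximum number of names returned) can be pictured with a small validation sketch. The class and field names below are hypothetical and are not part of the RNI API. How RNI itself reports a violation is not stated in these notes, so throwing IllegalArgumentException is an assumption of the sketch; the point is only that an invalid combination is rejected rather than silently adjusted.

```java
// Hypothetical illustration only; QueryLimits is not part of the RNI API.
final class QueryLimits {
    final int maxFirstPass;   // names considered in the first (high-recall) pass
    final int maxSecondPass;  // names evaluated in the second (high-precision) pass
    final int maxReturned;    // names returned by the query

    QueryLimits(int maxFirstPass, int maxSecondPass, int maxReturned) {
        // Reject an invalid combination instead of silently adjusting it.
        // Throwing IllegalArgumentException is an assumption of this sketch.
        if (maxFirstPass < maxSecondPass || maxFirstPass < maxReturned) {
            throw new IllegalArgumentException(
                "first-pass maximum (" + maxFirstPass + ") must be >= second-pass maximum ("
                + maxSecondPass + ") and >= maximum returned (" + maxReturned + ")");
        }
        this.maxFirstPass = maxFirstPass;
        this.maxSecondPass = maxSecondPass;
        this.maxReturned = maxReturned;
    }

    public static void main(String[] args) {
        QueryLimits ok = new QueryLimits(1000, 200, 50);
        System.out.println("Accepted first-pass maximum: " + ok.maxFirstPass);
        try {
            new QueryLimits(100, 200, 50); // first pass smaller than second pass
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```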
Bug Fixes in 4.3.1
Fixed a runtime error that occurred when translating Japanese Katakana names. (RLPNC-2435)
Added missing stop pattern and token override files to improve accuracy of Japanese RNI queries. (RLPNC-2477)
New Features and Bug Fixes in 4.3.0
Upgraded the keys stored with Names. Accordingly you must recreate any existing RNI indexes that you intend to use with this release.
Enhanced support for translating Japanese Kanji names and foreign names in Katakana.
Users can now compile and run the Solr sample (RNIinSolr14Sample) without providing the path to an Apache-Solr-1.4 distribution. The required .jar files have been placed in the samples lib directory. (RLPNC-2113)
If an override file for RNT translations includes source names with multiple target names, and the file does not include confidence scores, RNT sets the confidence score for each translation to 1 divided by the number of translations for that name. (RLPNC-2122)
To avoid performance degradation when processing oversized names (probably bad data), the Name object issues a warning if you exceed 10 tokens for the data in a Name. You can use the static Name.setMaximumNumTokens(int maxTokens) method to change or eliminate the limit. RNICLI and RNTCLI now include a -maxTokens parameter. (RLPNC-2124, RLPNC-2204)
Extended support for stop regular-expression patterns to apply to all supported languages, for fullname overrides to apply to all supported text domain pairs, and for token overrides to apply to English, Japanese, Chinese, and Russian. (RLPNC-1922, RLPNC-2152, RLPNC-2085)
Enabled multi-threaded RNI update operations. Multiple threads may share an INameIndexSession object. The write operation for each update is handled in a single thread, but other portions of an RNI update, such as the name completion required for adding a name, are multi-threadable. (RLPNC-1957)
To run the RNI and RNT command-line interfaces or examples, you no longer need to set the LD_LIBRARY_PATH or DYLD_LIBRARY_PATH environment variable on Linux, Solaris, and Mac OS X platforms. (RLPNC-2158)
Incorporated the NameTranslation web service into the RNI-RNT SDK. Other Rosette Web Services for Names will be added in future releases. (RLPNC-1577)
New Features and Bug Fixes in 4.2.0
This release requires RLP 7.4.
Documented RNI support for running in a Solr application. (RLPNC-1918)
Added gender as a consideration for matching English names. (RLPNC-1851)
RLPNC is no longer built for the following platforms: sparc-solaris9-cc58 and sparc-solaris9-cc58-64. (RLPNC-1950)
Added support for translating Japanese Kanji names to English and for segmenting Kanji names. (RLPNC-1947)
Added preliminary support for indexing, querying, and matching names in Russian and Japanese Kanji. (RLPNC-2003)
Refined support for English LOCATION, ORGANIZATION, and VEHICLE entity types. Accordingly, the results when processing these types may differ from the results processing PERSON entities or Name objects with no entity. For most accurate results, specify the entity type when defining a Name. (RLPNC-1941)
Added a -maxToCheck parameter to RNICLI to specify how many potential candidates the query should check with its high-precision linguistic filter. Use this parameter to adjust the speed/accuracy tradeoff. Also deprecated the -top parameter for specifying the maximum number of results the query should return. Use -max instead. (RLPNC-2032)
Optimized handling of duplicate candidate names in the RNI index when running the com.basistech.rni.index.StandardNameIndexFilter query() method. (RLPNC-1656)
Added support for defining Name objects with data fields for indexing and querying. With this facility, you can potentially enhance the accuracy of queries with English, Japanese, and Chinese PERSON names. Scores are higher when a field in the query is similar to the same field (as determined by the order in which the fields appear) in a candidate index name. Fields have no explicit semantic definition (such as family name or given name). When translating a name with fields, RNT handles the name as a single string with a space between each field. (RLPNC-1864)
Modified the samples ant script to support building and running the Solr Connector sample, provided you have access to Solr 1.4. (RLPNC-2068)
To improve error handling with interactive translations, RNT now uses com.basistech.rnt.assistant.InitialInput to throw a com.basistech.rnt.UnsupportedBasicTranslatorException if the input or output domain is not supported, or a com.basistech.rnt.InvalidNameContentException if the input string is empty or not in the correct input script. (RLPNC-1955)
In response to customer feedback, modified the content of the stop patterns file and token override file for English. (RLPNC-2075)
Bug Fix in 4.1.1
Fixed RNI entity-type-specific overrides to work as documented. (RLPNC-1935)
New Features and Bug Fixes in 4.1.0
Refined the Dari transliteration maps for vocalization to be in line with and as complete as the Pushto transliteration maps. (RLPNC-1774)
In StandardNameIndex, deprecated generateHighRecallKeys in favor of two new methods: generateHighRecallIndexKeys and generateHighRecallQueryKeys. (RLPNC-1595)
Improved accuracy of RNI queries as measured with test data. (RLPNC-1766)
Fixed bugs and improved the accuracy of handling of variable segmentation in RNI's English-to-English matching. (RLPNC-1692)
Added support for designating entity-type-specific overrides for RNI stop patterns, name pair matches, and token pair matches. (RLPNC-1780)
Optimized handling of duplicate names returned by an RNI query. (RLPNC-1656)
Extended out-of-the-box token overrides file for English-to-English matching with cognate or "cousin" name pairs, such as Pierre and Pedro. (RLPNC-1681)
New Features and Bug Fixes in 4.0.2
Fixed error generating Arabic sun letters. (RLPNC-1715)
Improved performance of RNTCLI and fixed a bug in its invocation from the Ant build.xml script. (RLPNC-1711)
Revised implementation of the IC standard for Western Farsi transliteration as detailed in the footnote in the Appendix: Supported Translation Domains.
New Features and Bug Fixes in 4.0.1
Enabled statistical inference for adding diacritization to Dari, Pushto, and Urdu native names not found in the corresponding dictionary. (RLPNC-1622)
Introduced caching of RNI name scores to enhance performance when querying English names against large English language name sets. (RLPNC-1677)
Enabled RNT to return multiple choices for multiple-token input when translating Arabic names. (RLPNC-1650)
Enabled RNTCLI to begin writing output while processing is still in progress. (RLPNC-1648)
Fixed a race condition that sometimes occurred in RNICLI or RNTCLI when processing with multiple threads. (RLPNC-1671)
Fixed an RNT bug handling the complete range of Arabic characters and processing names in Arabic script that end with a Latin character. (RLPNC-1673)
Fixed errors and a potential crash in the IC transliteration of Western Farsi. (RLPNC-1665, RLPNC-1657, RLPNC-1658)
Fixed memory leaks in Arabic name normalization, in the translation of Russian names, and in the procedure for inferring language when not specified by the user. (RLPNC-1661, RLPNC-1663, RLPNC-1660)
New Features and Bug Fixes in 4.0.0
Added constructor for creating a Name object with just a String argument (script and language are inferred). You can add the Name to an index, use it in a query, and use it in name matching. Accordingly, RNICLI can now process Name objects that are constructed solely from a String. To translate the Name, you must still supply a target transliteration scheme. (RLPNC-887)
Added statistical support for translating foreign (non-Russian) names from Russian to English. (RLPNC-1558)
Improved recall in RNI first pass key search to do a better job of presenting all potential similarity matches to the linguistics filter. (RLPNC-1459)
Modified the RNICLI utility to enable the addition of names to an existing index, as well as the creation of (and optionally adding names to) an index. (RLPNC-1574)
Enhanced ability to return a reasonable translation to English, rather than no translation, for unknown foreign names, provided the user supplies a sufficiently low translation threshold. (RLPNC-1581)
Enabled statistical inference for adding diacritization to Persian, Pushto, and Urdu native names not found in the Persian, Pushto, or Urdu dictionary. (RLPNC-1597, RLPNC-1622)
Revised the implementation of IC transliteration for Persian to conform to the IC standard for those languages. (RLPNC-64)
Enabled normalization of Arabic native names. (RLPNC-1412)
Renamed the Rosette Name Indexer packages: com.basistech.rni contains the RNICLI command-line interface. The Rosette Name Indexer API, including the utility for loading the contents of an XML gazetteer into an RNI index, is in com.basistech.rni.index and the name matching API is in com.basistech.rni.match. As a result, the com.basistech.rnm packages no longer exist. (RLPNC-37)
For consistency and clarity, replaced "lookup" with "query" and "NameLookupKey" with "PhoneticHighRecallKey" in the RNI API com.basistech.rnm.index (RLPNC-1538):
Old | New | Comment
---|---|---
INameIndex.lookup | INameIndex.query | The former is deprecated
NameIndexLookupResult | NameIndexQueryResult | The former is deprecated
StandardNameIndex.lookupInList | StandardNameIndex.queryList | The former is deprecated
StandardNameIndex.generateNameLookupKey | generatePhoneticHighRecallKey | Now a static method
Removed the com.basistech.rlp.pipeline.name package and the associated sample (NamePipelineSample). (RLPNC-1522)
Deleted com.basistech.rnt.SimpleTranslatable. Use com.basistech.rni.match.Name as the implementation of the com.basistech.rnt.ITranslatable interface. (RLPNC-14927)
New Features and Bug Fixes in 3.4.0
Removed support for the C++ API for name translation. The corresponding sample application and API documentation have also been removed. This change results in a public API that is entirely in Java. (RLPNC-1409)
Added support for using regular expressions in a "stop-words" file to exclude specified strings from indexing and queries. (RLPNC-1486)
Added support for using fullname files to specify the similarity scores to assign to specified name pairs. (RLPNC-1455)
Added support for using token files with pairs of name elements. When RNI evaluates two names, each of which contains a token from a pair in the tokens file, it enhances the similarity score for the two names. (RLPNC-1457)
Improved Arabic to Arabic name matching. (RLPNC-1447)
Refined the matching algorithm to guarantee that matches for a given pair of names are commutative. For two names (a and b), the similarity score is identical, whether a is in the index and b is in the query, or b is in the index and a is in the query. (RLPNC-1446)
Improved the handling of variable segmentation in English to English PERSON name matching, such that the match between two names that differ only by the presence or absence of a space (such as Van Dick and VanDick) receives a higher score than in prior releases. (RLPNC-1430)
Added support for handling titles and initials in Arabic to English and English to Arabic translations of PERSON names. (RLPNC-68, 92)
Added support for standardizing names of Arabic origin in Latin script, according to the transliteration standard specified in the target domain. If the name is not of Arabic origin, it is unchanged. (RLPNC-1370)
To better incorporate the interactive translation assistant into RNT, the com.basistech.xa package has been deprecated in favor of a new package: com.basistech.rnt.assistant. The TranslationAssistant class in this package replaces TransliterationAssistant in the deprecated package. (RLPNC-1377)
Deprecation of some empty constructors, associated set methods, and name normalization without a Name object. (RLPNC-1497, RLPNC-1476)
Class | Deprecated | In Favor of
---|---|---
com.basistech.rnm.Name | the empty constructor | Name(String data, LanguageCode language, ISO15924 script)
com.basistech.rnm.Transliteration | the empty constructor, setScheme, setScript, and setTransliteration | Transliteration(TransliterationScheme scheme, ISO15924 script, String transliteration)
com.basistech.rnm.index.NameIndexQuery | the empty constructor | NameIndexQuery(Name qname)
com.basistech.rnm.index.NameStringNormalizer | static String normalize(String s) | static String normalize(Name n)
In com.basistech.rnm.Name, the empty constructor is deprecated in favor of public Name(String data, LanguageCode language, ISO15924 script).
In com.basistech.rnm.Transliteration, the empty constructor, setScheme, setScript, and setTransliteration are deprecated in favor of public Transliteration(TransliterationScheme scheme, ISO15924 script, String transliteration).
In com.basistech.rnm.index.NameIndexQuery, the empty constructor is deprecated in favor of public NameIndexQuery(Name qname).
In com.basistech.rnm.index.NameStringNormalizer, normalize(String s) is deprecated in favor of public static String normalize(Name n).
Plugged a memory leak that sometimes occurred when processing Pushto or Dari names. (RLPNC-1460)
All RNI indexes should be rebuilt using this release.
New Features in 3.3.1
Modified the RNICLI utility to take a file pathname as the -name argument when performing the generate-keys option. Each line in the UTF-8 file contains a name for which RNICLI generates keys.
New Features in 3.3.0
Added support for generating and using high recall keys while storing names in something other than an RNI index, such as a relational SQL database. The GenerateAndUseKeys.java sample illustrates this process using HashMaps to store keys and names.
New Features in 3.2.1
TranslationAssistant data transfer objects are now serializable, and com.basistech.xa.TranslationAssistant includes a new select method to accommodate the processing of serialized data: public Output select(int alternativeIndex, InitialInput initial, Segmentation segmentation)
Renamed a few methods on com.basistech.xa.Segmentation (with deprecation).
Plugged some holes in the Urdu transliteration schemes. (RLPNC-1359)
Fixed a bug with RNT's command line interface. (RLPNC-1360)
New Features in 3.2.0
Upgraded from Apache Lucene 2.1.0 to 2.9.1. Accordingly, you must recreate any existing RNI indexes that you intend to use with this release.
Added RNI support for local and distributed transactions. If you use the INameIndex interface to perform updates and queries, each operation is automatically executed in its own transaction. To explicitly control local transactions, use the INameIndexSession interface; for distributed transactions, use the INameIndexTransaction interface. As a consequence of this change, the APIs for setting and getting batch mode, and flushing updates to an index have been removed, and the INameIndex open() method no longer includes a boolean argument specifying whether the index is to be opened for updates.
Class/Interface | Old | Current
---|---|---
com.basistech.rnm.index.StandardNameIndex | static INameIndex open(String indexPath, boolean openForWrite) | static INameIndex open(String indexPath)
com.basistech.rnm.index.INameIndex | void setBatchMode(boolean batchMode) | Removed
 | boolean isBatchMode() | Removed
 | void flush() | Removed
 | | INameIndexSession openSession()
com.basistech.rnm.index.INameIndexSession | | void addName(Name name)
 | | Iterable<NameIndexLookupResult> lookup(NameIndexQuery query)
 | | void deleteName(String key)
 | | void addEntity(Entity entity)
 | | Entity retrieveEntity(String entityUID)
 | | void deleteEntity(String entityUID)
 | | void optimize()
 | | void commit()
 | | void rollback()
 | | INameIndexTransaction startTransaction()
 | | void endTransaction()
com.basistech.rnm.index.INameIndexTransaction | | void commit()
 | | void prepare()
 | | void rollback()
Added two Java samples. AddNamesSample illustrates the use of a transaction to add a number of names to an RNI index. DistributedTransactionSample illustrates a distributed transaction with a two-phase commit involving two RNI indexes.
Removed support for the C++ API for name matching. The corresponding sample application and API documentation have also been removed.
Removed the com.basistech.rnm.index.adv package from the accessible API. The public API for RNI indexes is in the com.basistech.rnm.index package.
Extended the TranslationAssistant (RNT interactive mode) Java API to handle Dari and Pushto names in addition to Arabic names. Refactored the API to simplify access to TranslationAssistant functionality. The infrastructure is now in place to support variable segmentation of the input string, but for this release, TranslationAssistant does not yet support overlapping segments.
Added support for retrieving a group of names (even all names in an RNI index) that share some common characteristic other than name similarity.
Changed the means of specifying multi-threaded behavior for the RNI command-line interface.
Fixed a thread-safety problem performing RNT translations in multiple threads. (RLPNC-2551)
Fixed the occasional omission of leading portion of a Persian, Pushto, or Urdu name in Arabic script during RNT translation. (RLPNC-295)
Significantly sped up the operation of creating or opening an RNI index. (RLPNC-1288)
Fixed a failure in some cases to generate output with long foreign names during RNT translation from Arabic to English. (RLPNC-1297)
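The two-phase commit mentioned for distributed transactions in this release can be sketched generically. The example below is not the RNI API and does not reproduce DistributedTransactionSample: the Participant interface is a hypothetical stand-in that borrows only the method names prepare, commit, and rollback, and FakeIndex is an in-memory placeholder for an index. It shows the general pattern under those assumptions: every participant must prepare successfully before any participant commits, and a single failure rolls everything back.

```java
import java.util.List;

// Generic two-phase commit sketch; all types here are hypothetical.
class TwoPhaseCommitSketch {
    /** Hypothetical stand-in for a transactional resource such as an index. */
    interface Participant {
        void prepare() throws Exception; // phase 1: vote on whether commit is possible
        void commit();                   // phase 2: make the changes durable
        void rollback();                 // abandon the changes
    }

    /** Returns true if every participant committed, false if all were rolled back. */
    static boolean commitAll(List<Participant> participants) {
        // Phase 1: ask every participant to prepare.
        for (Participant p : participants) {
            try {
                p.prepare();
            } catch (Exception e) {
                // Any failure aborts the whole transaction.
                for (Participant q : participants) {
                    q.rollback();
                }
                return false;
            }
        }
        // Phase 2: all participants voted yes, so commit each one.
        for (Participant p : participants) {
            p.commit();
        }
        return true;
    }

    /** In-memory participant that records its final state. */
    static final class FakeIndex implements Participant {
        final boolean failOnPrepare;
        String state = "open";

        FakeIndex(boolean failOnPrepare) {
            this.failOnPrepare = failOnPrepare;
        }

        public void prepare() throws Exception {
            if (failOnPrepare) {
                throw new Exception("prepare failed");
            }
            state = "prepared";
        }

        public void commit() { state = "committed"; }

        public void rollback() { state = "rolled back"; }
    }

    public static void main(String[] args) {
        FakeIndex first = new FakeIndex(false);
        FakeIndex second = new FakeIndex(true); // simulate a failure in phase 1
        boolean committed = commitAll(List.of(first, second));
        System.out.println(committed + ": " + first.state + ", " + second.state);
    }
}
```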
New Features in 3.1.0
Significant speed improvements in RNI queries. (RLPNC-1081)
Greater separation between scores for good and bad matches when matching names in Chinese Han characters to names in Latin script. (RLPNC-871)
Updated the dictionary of names that appear in English. (RLPNC-113)
RNI queries now return name data results with scores greater than or equal to the value set with setNameDataMinimumMatchScore. Prior to this adjustment, results had to be greater than the minimum match score to be returned. The minimum match score must be greater than 0.
The RNI command-line interface includes an optional parameter for minimum match score and an optional parameter for maximum number of names to return.
The RNI and RNT command-line interfaces support the concurrent processing of multiple files in separate threads.
Added support for segmenting (determining the boundary between) Korean surnames and given names. (RLPNC-338, RLPNC-1069)
Updated documentation to use ISO639-3 three-letter language codes, rather than ISO639-1 two-letter language codes. The ISO639-3 codes enable finer language distinctions, such as between Western Farsi ('pes') and Dari ('prs'), both members of the Persian macrolanguage, for which the ISO639-3 code is 'fas' and the ISO639-1 code is 'fa'.
Fixed the capitalization of Latin-script translations of Japanese names. (RLPNC-1217)
Added preliminary RNI and name-matching support for excluding titles from names during English-language queries and name matches.
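The inclusive minimum-score rule noted in this release (a result is returned when its score is greater than or equal to the minimum, and the minimum itself must be greater than 0) is easy to get wrong by one comparison operator. The sketch below illustrates the comparison only; MinimumScoreFilter and atOrAbove are hypothetical names and the scores are made up, so this is not how RNI is implemented or invoked.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of the RNI API.
class MinimumScoreFilter {
    // Keeps scores that are greater than or equal to the minimum (inclusive bound).
    static List<Double> atOrAbove(List<Double> scores, double minimumMatchScore) {
        if (minimumMatchScore <= 0.0) {
            throw new IllegalArgumentException("minimum match score must be greater than 0");
        }
        List<Double> kept = new ArrayList<>();
        for (double score : scores) {
            if (score >= minimumMatchScore) { // ">=", not ">"
                kept.add(score);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // A score exactly equal to the minimum (0.55) is kept.
        System.out.println(atOrAbove(List.of(0.90, 0.55, 0.54), 0.55)); // prints [0.9, 0.55]
    }
}
```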
New Features in 3.0.1
Activated support for handling nicknames in RNI queries and name matches.
Tuned support for handling initials in RNI queries and name matches.
Added support for the Windows 64-bit platform and the Linux IA32 glibc23 gcc40 platform.
New Features and Bug Fixes in 3.0.0
Expanded support for performing RNI queries and name matches with names in the English language. In addition to handling matches that involve phonetic/orthographic differences and missing name components, queries and name matches are designed to handle names with initials, nicknames, and word order variations.
Added preliminary RNI and name-matching support for handling names in the Japanese language rendered with the Hiragana and Katakana scripts. At the current level of support, queries and name matches involving Hiragana and Katakana are most effective when the names have been segmented (the words that make up each name are space delimited).
New transliteration schemes for the translation of Arabic names in Arabic script to Latin script: Extended IC (handles all characters in Arabic script, including characters used in Persian, Urdu, Pushto, and Dari) and undiacritized BGN (removes diacritics or non-ASCII characters from BGN transliterations).
New transliteration schemes for the translation of Korean names in Hangul and Hanja to Latin script: the Revised Romanization of Korean (MOCT) and the Korean Romanization for Data Applications (KORDA).
Added support for translating Russian names in Cyrillic to Latin script in accordance with the ISO 9:1995 transliteration standard.
For RNI queries, changed the default setting of testNameData to true, so that users are not required to make this setting for performing name-match queries.
Added a sample RNI Index, OFAC_PERSONS, which contains the names in Latin script of individuals in the Office of Foreign Assets Control watch list. (RLPNC-994)
More robust handling of input names that contain non-alphabetic characters. (RLPNC-968)
Removed the restriction on the number of names an RNI query can return. (RLPNC-909)
Improved the performance of RNI queries for Arabic names. (RLPNC-924)
Changed the default setting for NameIndexQuery.testNameData from false to true in order to simplify the process of issuing standard RNI queries. (RLPNC-960)
Return 0 rather than throw an exception when querying with a Chinese Hani name that contains data in another script. (RLPNC-970)
Modified the statistical model for translating Arabic names, leading to minor differences in the results returned. (RLPNC-974)
New Features and Bug Fixes in 2.2.0
Addition of TranslationAssistant (RNT in interactive mode), a Java API for building interactive applications to transliterate Arabic names.
Clarification of the pairing of language/script combinations for RNI index queries and name matching. Language is the language of use in which a name appears, not the language of origin. So, for example, the language of a name of Arabic origin in Latin script is English, not Arabic. When adding names to an RNI index, performing index queries, or matching names, the script must be a native script for the language. You may use LanguageCode.UNKNOWN for the language. A name in any legal language/script pairing may match a name in any legal pairing for that language or English (or unknown) in Latin script, and vice versa. For the details, see the appendix titled Supported Text Domains for Rosette Name Indexer and Name Matching. (RLPNC-819)
Fixed memory leak triggered by repeated opening and closing of an RNI index. (RLPNC-917)
New Features and Bug Fixes in 2.1.3
Fixed problem generating match score > 1.0 for some Latin to Latin name pairs. (RLPNC-681)
New Features and Bug Fixes in 2.1.2
Removed support for the Windows 32-bit Visual Studio 7.1 platform. For Windows, use the Windows 32-bit Visual Studio 8.0 release.
RNI-RNT now throws an UnsupportedNameDomainException when an application attempts to index or look up a name with a script/language combination that is not in a supported text domain. (RLPNC-533)
Fixed a multithreading problem that appeared sporadically in RNT and RNI. (RLPNC-559)
Improved Latin to Latin name matching to score partial matches for individual words in each name. (RLPNC-355)
Provided higher recall in RNI queries at the cost of speed. In a future release, we plan to provide users with control over the tradeoff between speed and accuracy.
New Features and Bug Fixes in 2.1.1
Improved handling of Korean names when an eumjeol in the name contains 2 (not 3) jamo.
Improved performance of name index (RNI) queries involving names in Latin script.
Upgraded IndexQuerySample.java and the C++ sample applications to extract input data from a file.
Improved RNI handling of names with whitespace and/or hyphens.
New Features and Bug Fixes in 2.1.0
Enhanced support for translating personal names from Arabic to English.
Support for transliterating Japanese names in Hiragana and Katakana.
Options added to RNT for adjusting the performance tradeoff between speed and precision and for turning off the use of statistical methods to establish information that was not found in a dictionary.
New static method for returning a map of all the text domain pairs supported by RNI and name matching: com.basistech.rnm.index.StandardNameIndex.getMapOfSupportedTextDomainPairs(). The key for each map entry is a query text domain. The value is a list of reference text domains supported for that query text domain.
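The map shape described for the supported text domain pairs (each key is a query text domain, each value is the list of reference text domains supported for it) lends itself to a simple membership check. The sketch below models text domains as plain strings, which is an assumption made to keep it self-contained; the class name, method name, and the domain values are hypothetical and illustrative, not the actual return type or contents of getMapOfSupportedTextDomainPairs().

```java
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; text domains are modeled as plain strings.
class SupportedPairsLookup {
    // True when referenceDomain is listed for queryDomain in the map.
    static boolean isSupported(Map<String, List<String>> pairs,
                               String queryDomain, String referenceDomain) {
        return pairs.getOrDefault(queryDomain, List.of()).contains(referenceDomain);
    }

    public static void main(String[] args) {
        // Made-up sample data in the same key/value shape.
        Map<String, List<String>> pairs = Map.of(
            "ara/Arab", List.of("ara/Arab", "eng/Latn"),
            "eng/Latn", List.of("eng/Latn", "ara/Arab"));

        System.out.println(isSupported(pairs, "ara/Arab", "eng/Latn"));
        System.out.println(isSupported(pairs, "ara/Arab", "jpn/Kana"));
    }
}
```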
New Features in 2.0.0
RNI-RNT 2.0.0 upgrades the Rosette Name Translator (RNT) 1.1.0 and introduces the Rosette Name Indexer (RNI). The Rosette Name Matcher (RNM), with its XML data model, is deprecated. RNI supports the scoring of name matches between a candidate name and target name.
Release 2.0.0 introduces the following features:
RNI name indexes.
Utility for loading the contents of XML gazetteers into RNI indexes.
Translation of non-native personal names in Arabic documents to their standard English form in Latin script.
The Folk transliteration scheme to generate variant names in Latin script from Arabic, Persian, Pushto and Urdu names in Arabic script.
Move from Java 1.4 to Java 1.5.
Renamed the Java com.basistech.NameMatching.* packages to com.basistech.rnm.*. For RNI functionality, see com.basistech.rnm.index.*.
New sample applications to illustrate usage of the RNI-RNT APIs.