Language Identifier

Release Notes

Release 7.23.17.c78.0

June 2025

This release is for compatibility with other Rosette SDKs. There are no new features or bug fixes.

Third-party component updates

Table 72. Updated

Package

Old Version

New Version

Apache Commons IO

2.18.0

2.19.0

Guava

33.4.0-jre

33.4.8-jre

Guava InternalFutureFailureAccess and InternalFutures

1.0.2

1.0.3

Jackson

2.18.2

2.19.0

SnakeYAML

2.3

2.4



Table 73. Added

Package

Version

License

JSpecify

1.0.0

Apache 2.0



Release 7.23.16.c77.0

March 2025

This release is for compatibility with other Rosette SDKs. There are no new features or bug fixes.

Third-party component updates

Table 74. Updated

Package

Old Version

New Version

Apache Commons IO

2.17.0

2.18.0

Apache Log4j

2.24.1

2.24.3

Guava

33.3.1-jre

33.4.0-jre

Jackson

2.17.2

2.18.2

Java Architecture for XML Binding

n/a

2.2.12



Release 7.23.15.c76.0

November 2024

New

  • Java 21 support: Java 21 is now supported. Java 11 and 17 are still supported. (RLIJE-576)

Third-party component updates

Table 75. Updated

Package

Old Version

New Version

Apache Commons IO

2.16.1

2.17.0

Apache Commons Lang

3.16.0

3.17.0

Apache Log4j

2.23.1

2.24.1

fastutil

8.5.14

8.5.15

Guava

33.3.0-jre

33.3.1-jre

SnakeYAML

2.2

2.3

Woodstox

7.0.0

7.1.0



Release 7.23.14.c75.0

September 2024

This release is for compatibility with other Rosette SDKs. There are no new features or bug fixes.

Third-Party Component Updates

Table 76. Updated

Package

Old Version

New Version

Apache Commons CLI

1.7.0

1.9.0

Apache Commons Lang

3.14.0

3.16.0

fastutil

8.5.13

8.5.14

Guava

33.2.0-jre

33.3.0-jre

Jackson

2.17.1

2.17.2

Woodstox

6.6.2

7.0.0



Release 7.23.13.c74.0

June 2024

This release is for compatibility with other Rosette SDKs. There are no new features or bug fixes.

Third-Party Component Updates

Table 77. Updated

Package

Old Version

New Version

Apache Commons CLI

1.6.0

1.7.0

Apache Commons IO

2.15.1

2.16.1

Apache Log4j

2.21.1

2.23.1

args4j

2.33

2.37

Guava

33.0.0-jre

33.2.0-jre

Jackson

2.16.1

2.17.1

Woodstox

4.4.1

6.6.2



Release 7.23.12.c73.0

March 2024

This release is for compatibility with other Rosette SDKs. There are no new features or bug fixes.

Third-Party Component Updates

Table 78. Updated

Package

Old Version

New Version

Apache Commons CLI

1.2.0

1.6.0

Apache Commons IO

2.15.0

2.15.1

Apache Commons Lang

3.12.0

3.14.0

fastutil

8.5.12

8.5.13

Guava

32.1.3-jre

33.0.0-jre

Guava InternalFutureFailureAccess and InternalFutures

1.0.1

1.0.2

ICU4J

70.1

74.2

Jackson Annotations

2.15.3

2.16.1

Jackson Core

2.15.3

2.16.1

Jackson Databind

2.15.3

2.16.1

Jackson Dataformat XML

2.15.3

2.16.1

Jackson Dataformat YAML

2.15.3

2.16.1

Jackson Old JAXB Annotations

2.15.3

2.16.1



Release 7.23.11.c72.0

December 2023

This release is for compatibility with other Rosette SDKs. There are no new features or bug fixes.

Third-Party Component Updates

Table 79. Updated

Package

Old Version

New Version

Jackson Annotations

2.15.2

2.15.3

Jackson Core

2.15.2

2.15.3

Jackson Databind

2.15.2

2.15.3

Jackson Dataformat XML

2.15.2

2.15.3

Jackson Dataformat YAML

2.15.2

2.15.3

Jackson Modules: Base

2.15.2

2.15.3

Guava: Google Core Libraries for Java

32.1.2-jre

32.1.3-jre

Apache Commons IO

2.11.0

2.15.0

Apache Log4j API

2.20.0

2.21.1

Apache Log4j Core

2.20.0

2.21.1

Apache Log4j SLF4J Binding

2.20.0

2.21.1

Stax2 API

4.2.1

4.2.2

SnakeYAML

2.0

2.2

LIBLINEAR

2.30

2.44



Release 7.23.10.c71.0

September 2023

This release is for compatibility with other Rosette SDKs. There are no new features or bug fixes.

Third-Party Component Updates

Table 80. Updated

Package

Old Version

New Version

Jackson Annotations

2.15.0

2.15.2

Jackson Core

2.15.0

2.15.2

Jackson Databind

2.15.0

2.15.2

Jackson Dataformat XML

2.15.0

2.15.2

Jackson Dataformat YAML

2.15.0

2.15.2

Jackson Module: Old JAXB Annotations

2.15.0

2.15.2

Guava: Google Core Libraries for Java

31.1-jre

32.1.2-jre



Release 7.23.9.c70.0

June 2023

This release is for compatibility with other Rosette SDKs. There are no new features or bug fixes.

Third-Party Component Updates

Table 81. Updated

Package

Old Version

New Version

Apache Log4j

2.19.0

2.20.0

fastutil

8.5.9

8.5.12

Jackson Annotations

2.14.0

2.15.0

Jackson Core

2.14.0

2.15.0

Jackson Databind

2.14.0

2.15.0

Jackson Dataformat XML

2.14.0

2.15.0

Jackson dataformats: Text

2.14.0

2.15.0

Jackson modules: Base

2.14.0

2.15.0

SnakeYAML

1.33

2.0



Release 7.23.8.c69.0

March 2023

This release is for compatibility with other Rosette SDKs. There are no new features or bug fixes.

Third-Party Component Updates

Table 82. Updated

Package

Old Version

New Version

Google Guava

26.0-jre

31.1-jre



Release 7.23.7.c68.0

December 2022

Bug Fixes

  • The command line utility RLICmd now functions correctly. In the previous release, it returned an error. (RLIJE-563)

Third-party component updates

This release includes the following third-party component changes:

Table 83. Upgraded

Package

Old version

New version

Apache Log4j

2.17.1

2.19.0

fastutil

8.5.6

8.5.9

Jackson

2.11.1

2.14.0

SLF4J

1.7.33

1.7.36

SnakeYAML

1.30

1.33



Release 7.23.6.c67.0

June 2022

New

  • Java 17 support added: Java 8 and 9 support has been removed. Java 11 and Java 17 are supported. (RLIJE-557)

Release 7.23.5.c66.0

February 2022

This release is for compatibility with other Rosette SDKs. There are no new features or bug fixes.

Notice

Java 8 and Java 9 support is deprecated as of this release.

Third-party component updates

This release includes the following third-party component changes:

Table 84. Upgraded

Package

Old Version

New Version

Apache Commons IO

2.7

2.11.0

Apache Commons Lang

2.6

3.12.0

Apache Log4j

1.2.17

2.17.1

ICU4J

59.1

70.1

fastutil

8.4.0

8.5.6

SLF4J

1.7.28

1.7.33

SnakeYAML

1.26

1.30



Release 7.23.4.c65.0

July 2021

Bug fixes

  • Confidence scores for language regions are now always within the range [0,1]. (RLIJE-552)

Release 7.23.3.c64.1

May 2021

This release is for compatibility with other Rosette SDKs. There are no new features or bug fixes.

Release 7.23.2.c63.0

March 2021

New

  • New RLICmd option: We've added the option -lineInputDelimiter to specify a delimiter to use with -lineInput other than the default newline. It has no effect when -lineInput is not true. (RLIJE-530)
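
As an illustration of what -lineInput with a custom -lineInputDelimiter means for the input, the following Java sketch splits one input string into records that would each be identified separately. The class and method names are hypothetical and not part of RLI-JE, and skipping empty records is an assumption rather than documented RLICmd behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration only; RLICmd's internals are not shown in these notes.
final class LineInputSketch {
    private LineInputSketch() {}

    // Splits the input on a literal delimiter so that each record can be
    // identified on its own. Empty records are skipped (an assumption).
    static List<String> splitRecords(String input, String delimiter) {
        List<String> records = new ArrayList<>();
        // Pattern.quote treats the delimiter as plain text, not as a regex.
        for (String piece : input.split(Pattern.quote(delimiter))) {
            if (!piece.isEmpty()) {
                records.add(piece);
            }
        }
        return records;
    }

    public static void main(String[] args) {
        String input = "Hello world||Bonjour le monde||Hola mundo";
        for (String record : splitRecords(input, "||")) {
            System.out.println("record: " + record);
        }
    }
}
```

With the default newline delimiter, the same helper would be called as splitRecords(input, "\n").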

Bug Fixes

  • We fixed a bug where changing the value of -maxResults could change the results returned in addition to the number of results. This only occurred when there was a small off-script region in the input text. (RLIJE-534)

Release 7.23.1.c63.0

January 2021

New

  • We added -input-json (-ij) as an option to RLICmd to specify that the input is an ADM format file. (RLIJE-533)

  • When specifying -output-json as an option in RLICmd, the resulting ADM now has a data field containing the input data. If the encoding of the data is not recognized by the JVM, the data field will not be populated. (RLIJE-532)

Bug Fixes

  • We fixed a bug where RLICmd crashed when a script file was supplied (-scriptFile). (RLIJE-532)

Third-party component updates

This release includes the following third-party component changes:

Package

Old Version

New Version

Apache Commons IO

2.6

2.7

fastutil

8.3.0

8.4.0

Jackson Annotations

2.9.8

2.11.1

Jackson Core

2.9.8

2.11.1

Jackson Databind

2.9.8

2.11.1

Jackson Dataformat XML

2.9.8

2.11.1

Jackson dataformats: Text

2.9.8

2.11.1

Jackson modules: Base

2.9.8

2.11.1

LIBLINEAR

1.95

2.30

SnakeYAML

1.25

1.26

Stax2 API

4.2

4.2.1

Release 7.23.0.c62.2

September 2020

Bug Fixes

  • RLI-JE now correctly identifies the primary language of short documents which contain small fragments of a language in another script. Previously, the language of the fragments might be erroneously detected as the primary language. The lengths of the document's script regions are now taken into account when identifying the primary language. (RLIJE-523)
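
One way to picture taking script-region lengths into account is to total the characters attributed to each language and pick the largest total. The sketch below is only that picture: the Region type, the language codes, and the summing rule are assumptions made for illustration, not RLI-JE's actual algorithm.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not the RLI-JE implementation.
final class PrimaryLanguageSketch {
    private PrimaryLanguageSketch() {}

    // A detected region: its language code and its length in characters.
    static final class Region {
        final String language;
        final int length;

        Region(String language, int length) {
            this.language = language;
            this.length = length;
        }
    }

    // Returns the language covering the most characters, or null if there
    // are no regions. Ties go to the language seen first.
    static String primaryLanguage(List<Region> regions) {
        Map<String, Integer> totals = new LinkedHashMap<>();
        for (Region r : regions) {
            totals.merge(r.language, r.length, Integer::sum);
        }
        String best = null;
        int bestLength = -1;
        for (Map.Entry<String, Integer> e : totals.entrySet()) {
            if (e.getValue() > bestLength) {
                best = e.getKey();
                bestLength = e.getValue();
            }
        }
        return best;
    }
}
```

Under this rule, a short document with 60 characters of English and a 3-character Japanese fragment is attributed to English, which matches the behavior the fix describes.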

API Changes

  • LanguageIdentifierBuilder#buildLanguageRegionAnnotator(Annotator, Annotator) has been formally deprecated. It was already marked as being for internal use only. (RLIJE-523)

Release 7.22.2.c62.2

January 2020

This release is for compatibility with other Rosette SDKs. There are no new features or bug fixes.

Release 7.22.1.c62.0

December 2019

New Features

  • We have added -lineInput as an option to RLICmd. With the option enabled, RLICmd will process each line of an input file separately, and each line will get its own set of results. (RLIJE-215)

Bug Fixes

  • We fixed a bug where the documentation erroneously claimed that RLI-JE supported jpn-Kana in UTF-16BE, UTF-16LE, and UTF-8. (RLIJE-524)

Release 7.22.0.c61.0

August 2019

New Features

  • Added support for Albanian, Bulgarian, Catalan, Croatian, Estonian, Icelandic, Kurdish (Arabic script), Kurdish (Latin script), Latvian, Lithuanian, Macedonian, Polish, Serbian (Cyrillic script), Serbian (Latin script), Slovak, Slovenian, Somali, Tagalog, Ukrainian, Urdu (Arabic script), Uzbek (Cyrillic script), Uzbek (Latin script), and Vietnamese to the short string algorithm.

  • Rosette Language Identifier now returns Malaysian (zsm) instead of Malay (msa).

  • If shortStringThreshold is set, the LanguageRegionAnnotator will utilize short-string detection on sufficiently short script regions.

Release 7.21 and earlier

New Features

Bug Fixes

Third-Party Components

Known Problems

New Features

Release 7.21.4

  • Running on Java 11 is now supported. (RLIJE-511)

Release 7.21.0

  • Added the option koreanDialects to specify that RLI-JE should return North Korean (qkp) or South Korean (qkr) instead of Korean (kor). By default, this option is turned off. (RLIJE-480)

Release 7.20.0

  • Added support for Indonesian and Malaysian to the short string algorithm. (RLIJE-448)

Release 7.19.0

  • Rescaled confidence scores for the default (not short string) algorithm such that high-confidence results get scores around 0.9 instead of around 0.03. (RLIJE-447)

  • Added the option minNonScriptioContinuaRegionLength to control when short regions of scripts like Latin are merged into regions of scripts like Han. The default is to never merge the regions. To return to the behavior of previous versions, set this option to 10. (RLIJE-454)

Release 7.18.2

  • Exposed more features through OSGi (internal use only). (RLIJE-423)

Release 7.18.1

  • Exposed more features through OSGi (internal use only). (RLIJE-423)

  • Added the -output-json option to RLICmd to write the output ADM as JSON.

Release 7.18.0

  • RLI-JE now returns results for raw input in encodings not supported by the JVM. In previous versions, it threw an exception. (RLIJE-288)

Release 7.17.0

  • Sped up the language region detection algorithm. It now completes in O(n) time. (RLIJE-397)

Release 7.16.0

  • The API now supports the use of Path, which makes it file-system-agnostic. (RLIJE-331)

  • Upgraded the version of OSGi (internal use only). (RLIJE-380)

  • Added -dontBreakRegionOnScriptBoundary to RLICmd. (RLIJE-324)

Release 7.15.2

  • Added debug and trace log messages to the multilingual detection code. (RLIJE-388)

Release 7.15.0

  • Language weight adjustments can now be used to boost any given language, not just demote it. (RLIJE-111)

  • The weight adjustment API now works for the short string algorithm. (RLIJE-204)

  • Refactored RLICmd's command line options. (RLIJE-230)

  • The short string and legacy algorithms now consistently handle cases when a language cannot be determined. Both now return language Unknown. NoMatchException, NotEnoughDataException, and LanguageIdentificationException are deprecated. (RLIJE-272, RLIJE-304)

  • Achieved a modest accuracy gain by changing the internals of the matching algorithm to be more tolerant of noise. (RLIJE-277)

  • Added an option to return only one result per language. See LanguageIdentifierBuilder#uniqueLanguages(boolean). (RLIJE-298)

  • The languageHint and encodingHint methods of LanguageIdentifierBuilder are deprecated. Use the weight adjustment API instead. (RLIJE-320)
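
The effect of returning only one result per language (the uniqueLanguages option listed above) can be pictured as a filter over a ranked result list. In this sketch the Result type and its fields are hypothetical, and keeping the highest-ranked entry for each language is an assumption about the behavior rather than something these notes state.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration; these types are not part of the RLI-JE API.
final class UniqueLanguagesSketch {
    private UniqueLanguagesSketch() {}

    static final class Result {
        final String language;
        final String encoding;
        final double confidence;

        Result(String language, String encoding, double confidence) {
            this.language = language;
            this.encoding = encoding;
            this.confidence = confidence;
        }
    }

    // Keeps the first (best-ranked) result for each language and drops
    // later results that repeat a language already seen.
    static List<Result> onePerLanguage(List<Result> ranked) {
        Set<String> seen = new HashSet<>();
        List<Result> unique = new ArrayList<>();
        for (Result r : ranked) {
            if (seen.add(r.language)) {
                unique.add(r);
            }
        }
        return unique;
    }
}
```

The input list is assumed to be sorted best-first, so the filter preserves the original ranking among the results it keeps.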

Release 7.14.0

  • Profiles for transliterated languages (e.g. Arabic in Latin script) are disabled by default. To enable them, see LanguageIdentifierBuilder#languageWeightAdjustment(LanguageCode, ISO15924, int). (RLIJE-225)

Release 7.13.3

  • Added an option to read models used by the short string algorithm from a JAR file. See LanguageIdentifierBuilder#useModelsInJar(boolean). (RLIJE-276)

Release 7.13.1

  • RLICmd can continue running if it fails to find a file in a provided list of files to analyze. (RLIJE-216)

  • RLICmd can use multiple threads to analyze documents in parallel. (RLIJE-221)

  • icu4j and args4j have been shaded into the rli-je-shaded jar. (RLIJE-237)

  • Basis' common-lib jar has been shaded into the rli-je-shaded jar. (RLIJE-248)

  • RLI-JE now depends on adm-model instead of adm-shaded. With this change, RLI-JE no longer depends on Apache Commons Betwixt or Javassist. (RLIJE-252)

  • The short string detection algorithm is 20% faster than 7.13.0. (RLIJE-254)

Release 7.13.0

  • Added alternative analysis for improved accuracy when detecting the language of short strings. (RLIJE-96)

  • Renamed the license directory from license to licenses to be consistent with other Basis products. (RLIJE-124)

  • Moved the command line utility, RLICmd, from tools/bin to bin. (RLIJE-200)

  • Refactored the identification of Chinese to separate language and script. Chinese language (zho) is now detected when script is Han, Simplified (Hans) and when script is Han, Traditional (Hant). RLI-JE used to identify these variants as Simplified Chinese (zhs) and Traditional Chinese (zht). (RLIJE-152)

Release 7.12.0

  • Added support for identifying language regions in a document that contains blocks of text in multiple languages. (RLIJE-88)

  • Deprecated LanguageIdentifierFactory and LanguageIdentifier in favor of LanguageIdentifierBuilder, which you can use to set options and create Annotator objects that detect language, encoding, and writing script, as well as language regions in multilingual input. This implementation employs the new data model package (com.basistech.rosette.dm), which is used in a variety of Rosette products. (RLIJE-93)

Release 7.11.0

  • Added a factory class (LanguageIdentifierFactory) for creating instances of the LanguageIdentifier. (RLIJE-15)

  • Moved the command line utility from LanguageIdentifier to RLICmd. (RLIJE-32)

  • When returning UTF-16 encoding, RLI-JE now identifies whether the encoding is Little Endian (UTF-16LE) or Big Endian (UTF-16BE). (RLIJE-51)

  • Extended support for specifying the license. LanguageIdentifierFactory includes constructors for getting the license from a file, an input stream, or an XML string.

  • Added support, through the LanguageIdentifier setLanguageWeightAdjustment methods, for reducing the weight associated with a specific language to a percentage of its original weight. This assists in the detection of other languages in documents that mix multiple languages.

  • Adjusted the default weights assigned to Pushto/Latn and Urdu/Latn to return more accurate results. Lowered the default weight for Serbian/Latn to 0, so that Croatian is returned. Pushto/Arab, Urdu/Arab, and Serbian/Cyrillic are not affected by these adjustments. (RLIJE-37)

  • Placed new .jar files in the lib directory. mahout-collections-1.0.jar is no longer used by RLI-JE.
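
For the UTF-16LE and UTF-16BE distinction mentioned above, one well-known signal is the byte order mark (BOM) at the start of the data. These notes do not say how RLI-JE decides, and it must also handle input without a BOM, so the following is only a sketch of the BOM check, with hypothetical class and method names.

```java
// Hypothetical illustration; not the RLI-JE detection logic.
final class Utf16ByteOrderSketch {
    private Utf16ByteOrderSketch() {}

    // Returns "UTF-16BE" or "UTF-16LE" when the data starts with the
    // corresponding byte order mark, and "unknown" otherwise.
    // Note: FF FE 00 00 is also the UTF-32LE BOM; this sketch ignores that case.
    static String fromBom(byte[] data) {
        if (data.length >= 2) {
            int b0 = data[0] & 0xFF;
            int b1 = data[1] & 0xFF;
            if (b0 == 0xFE && b1 == 0xFF) {
                return "UTF-16BE";
            }
            if (b0 == 0xFF && b1 == 0xFE) {
                return "UTF-16LE";
            }
        }
        return "unknown";
    }
}
```

Data without a BOM returns "unknown" here; distinguishing the two byte orders in that case needs a statistical or heuristic approach, which this sketch does not attempt.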

Release 6.5.1.1

  • Replaced trove-2.0.4.jar with mahout-collections-1.0.jar.

  • Removed utilities-no-jni-7.1.jar and replaced rlp-common-1.3.6.jar with btcommon-3.jar.

Release 6.5.1

This is the first release of RLI - Java separated from the Rosette Search Essentials SDK. It matches the support in the C++ implementation of RLI 6.5.1.

Bug Fixes

Bug fixes in 7.21.3

Bug#

Description

RLIJE-503

The shading of dependencies is inconsistent with other Basis products.

Bug fixes in 7.21.2

Bug#

Description

RLIJE-500

RLI-JE depended on Guava 18.0.0, which has a security vulnerability ( CVE-2018-10237 ). Now it depends on Guava 26.0-jre.

Bug fixes in 7.21.1

Bug#

Description

RLIJE-496

When koreanDialects is specified but rootDirectory is not specified, RLI-JE just returns Korean instead of North or South Korean.

Bug Fixes in 7.20.3

Bug#

Description

RLIJE-481

An old version of TensorFlow is included in the package.

Bug Fixes in 7.20.2

Bug#

Description

COMN-234, RLIJE-479

The shading of dependencies is inconsistent with other Basis products.

Bug Fixes in 7.20.0

Bug#

Description

RLIJE-446, RLIJE-456

The distribution package contains Maven POM files in META-INF, which causes problems when it is managed by JFrog Artifactory.

Bug Fixes in 7.19.0

Bug#

Description

RLIJE-435, RLIJE-445

Language region detection may detect multiple regions in a monolingual document if the document's language is confusable with another language. For example, an Indonesian document might be detected as having some regions in Indonesian and some in Malaysian.

Bug Fixes in 7.18.0

Bug#

Description

RLIJE-402

RLI-JE crashes on certain combinations of Han characters when assertions are enabled.

Bug Fix in 7.17.2

Bug#

Description

RLIJE-426

RLI-JE can crash when sharing a classpath with Rosette Entity Extractor (REX) Java Edition.

Bug Fix in 7.17.1

Bug#

Description

RLIJE-412

Language region detection fails on input containing an unpaired surrogate.
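
An unpaired surrogate is a UTF-16 code unit from the surrogate range that is not part of a valid high/low pair. Callers who want to check their input for this condition could use something like the following sketch. The class and method names are illustrative and not part of RLI-JE; only the java.lang.Character calls are standard.

```java
// Hypothetical helper for inspecting input; not part of the RLI-JE API.
final class SurrogateCheck {
    private SurrogateCheck() {}

    // Returns the index of the first unpaired surrogate in the text,
    // or -1 if every surrogate is part of a valid high/low pair.
    static int firstUnpairedSurrogate(CharSequence text) {
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isHighSurrogate(c)) {
                boolean paired = i + 1 < text.length()
                        && Character.isLowSurrogate(text.charAt(i + 1));
                if (!paired) {
                    return i;
                }
                i++; // skip the low half of the valid pair
            } else if (Character.isLowSurrogate(c)) {
                return i; // low surrogate with no preceding high surrogate
            }
        }
        return -1;
    }
}
```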

Bug Fix in 7.17.0

Bug#

Description

RLIJE-391

Misidentification in the presence of half-width katakana characters. The fix was the removal of the Japanese half-width UTF-8 profile. This also resulted in a small increase in accuracy for the detection of Japanese.

RLIJE-394

Script related misidentifications in the short string identifier. The fix was motivated by some Hebrew and Russian errors, but is more general than these cases.

Bug Fix in 7.16.0

Bug#

Description

RLIJE-386

Language identification with multiple threads was slow due to unnecessary synchronization. This was also backported to 7.15.1.

Bug Fixes in 7.15.0

Bug#

Description

RLIJE-269

RLICmd's -breakRegionOnScriptBoundary could not be set to false.

RLIJE-278

Error message when the short string models are missing is misleading.

Bug Fix in 7.14.1

Bug#

Description

RLIJE-302

Stack overflow when processing UTF-16 uppercase input.

Bug Fixes in 7.13.1

Bug#

Description

RLIJE-202

LanguageIdentifier could throw a NotEnoughDataException when a Cyrillic-script language was hinted and the input text was in an encoding other than UTF-8 or UTF-16. LanguageIdentifier now returns the hinted language.

RLIJE-220

LanguageIdentifierBuilder inconsistently set languageWeightAdjustments when a Language-Script pair and the same Language without a script are set to different values. Now a specific Language-Script pair will override a Language without a script.

RLIJE-236

If RLI-JE is used with Basis Technology's REX-JE product, REXCmd throws a NoSuchMethodException if RLI-JE's jars appear earlier in the classpath due to including a shaded version of the Annotated Data Model package.

RLIJE-259

LanguageIdentifier asserts that results are sorted by ngram profile distance, but in some cases of input in Han script other heuristics affect the sorted output. The assertion was removed.

RLIJE-262

RLI-JE could not return more than 5 results. This restriction was removed.
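
The precedence rule described for RLIJE-220 above, where a specific Language-Script pair overrides the same Language without a script, can be sketched as a two-level lookup. Everything in this example is hypothetical: the class, the use of plain strings for language and script codes, and the weight values are for illustration and do not reflect the LanguageIdentifierBuilder internals.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of the override rule; not the RLI-JE implementation.
final class WeightAdjustmentSketch {
    private final Map<String, Integer> byLanguage = new HashMap<>();
    private final Map<String, Integer> byLanguageAndScript = new HashMap<>();

    // Adjustment that applies to a language in any script.
    void adjust(String language, int weight) {
        byLanguage.put(language, weight);
    }

    // Adjustment that applies to one language-script pair only.
    void adjust(String language, String script, int weight) {
        byLanguageAndScript.put(language + "/" + script, weight);
    }

    // The pair-specific value wins; otherwise fall back to the
    // language-wide value, then to the supplied default.
    int lookup(String language, String script, int defaultWeight) {
        Integer specific = byLanguageAndScript.get(language + "/" + script);
        if (specific != null) {
            return specific;
        }
        return byLanguage.getOrDefault(language, defaultWeight);
    }
}
```

Because the lookup checks the pair first, the result does not depend on the order in which the two adjustments were set.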

Bug Fixes in 7.13.0

Bug#

Description

RLIJE-170

LanguageRegionAnnotator was throwing an IllegalArgumentException when asked to process a large amount of input. This problem has been fixed.

RLIJE-176

RLI-JE no longer throws an IllegalArgumentException when asked to process an empty string.

RLIJE-178

Fixed the RLICmd utility so it does not issue a warning about initializing the log4j system property.

Bug Fix in 7.12.2

Bug#

Description

RLIJE-171

Fixed the shading of RBL-JE in the rli-je-shaded jar.

Bug Fixes in 7.12.1

Bug#

Description

RLIJE-161

Shaded the third-party dependencies in the rli-je-shaded jar.

RLIJE-164

If LanguageRegionAnnotator.detectRegion is asked to detect a region larger than maxRegion, an IllegalArgumentException is thrown.

Bug Fix in 6.5.1.2

Bug#

Description

RLI-460

Fixed failure of RLI-JE to initialize DataCache in Web Application or Servlet deployments.

Third-Party Components

For a list of third-party licenses for components that are used in Basis Technology products, see ThirdPartyLicenses.txt.

Third-party component updates in 7.21.4

Component

Version

Change

annoy-java

0.2.5

Removed

Third-party component updates in 7.21.4

Component

Version

Change

annoy-java

0.2.5

New

Jackson Annotations

2.9.8

Version upgrade

Jackson Core

2.9.8

Version upgrade

Jackson Databind

2.9.8

Version upgrade

Jackson Dataformat XML

2.9.8

Version upgrade

Jackson dataformats: Text

2.9.8

Version upgrade

Jackson modules: Base

2.9.8

Version upgrade

Liblinear-java

1.95

Version upgrade

SnakeYAML

1.23

Version upgrade

Third-party component updates in 7.21.2

Component

Version

Change

Google Guava

26.0-jre

Version upgrade

Third-party component updates in 7.21.0

Component

Version

Change

Jackson Annotations

2.9.6

Version upgrade

Jackson Core

2.9.6

Version upgrade

Jackson Databind

2.9.6

Version upgrade

Jackson Dataformat XML

2.9.6

Version upgrade

Jackson dataformats: Text

2.9.6

Version upgrade

Jackson modules: Base

2.9.6

Version upgrade

Woodstox

4.0.5

Version downgrade

Third-party component updates in 7.20.1

Component

Version

Change

Colt

1.2.0

New

Google Guava

18.0

Version upgrade

Jackson Annotations

2.9.4

Version upgrade

Jackson Core

2.9.4

Version upgrade

Jackson Databind

2.9.4

Version upgrade

Jackson Dataformat XML

2.9.4

Version upgrade

Jackson Dataformat YAML

2.7.3

Removed

Jackson dataformats: Text

2.9.4

New

Jackson datatypes: collections

2.9.4

New

Jackson Module JAXB Annotations

2.7.3

Removed

Jackson modules: Base

2.9.4

New

SnakeYAML

1.18

Version upgrade

Woodstox

5.0.3

New

Third-party component updates in 7.18.0

Component

Version

Change

ICU4J

59.1

Version upgrade

Third-party component updates in 7.16.0

Component

Version

Change

Fastutil

6.6.1

Version upgrade

Third-party component updates in 7.15.0

Component

Version

Change

Jackson Annotations

2.7.3

Version upgrade

Jackson Core

2.7.3

Version upgrade

Jackson Databind

2.7.3

Version upgrade

Jackson Dataformat XML

2.7.3

Version upgrade

Jackson Dataformat YAML

2.7.3

Version upgrade

Jackson Module JAXB Annotation

2.7.3

Version upgrade

Third-party component updates in 7.14.0

Component

Version

Change

args4j

2.32

Version upgrade

Apache Commons Lang

2.6

New

Apache Commons Math

 

Removed

ICU4J

55.1

Version upgrade

Jackson Annotations

2.6.2

Version upgrade

Jackson Core

2.6.2

Version upgrade

Jackson Databind

2.6.2

Version upgrade

Jackson Dataformat XML

2.6.2

Version upgrade

Jackson Dataformat YAML

2.6.2

Version upgrade

Jackson Module JAXB Annotation

2.6.2

Version upgrade

SnakeYAML

1.15

Version upgrade

Known Problems

RLI 6.5.1 may occasionally misidentify buffers containing UTF-16 data (e.g., Java Strings). The workaround is to extract a UTF-8 byte array and pass that to detect(byte[] data).
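
The conversion step of this workaround can be sketched as follows. The Utf8Workaround class and toUtf8 helper are illustrative names, not part of RLI; only String.getBytes and StandardCharsets are standard Java. The resulting array is what would be passed to detect(byte[] data).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the RLI API: converts a Java String
// (UTF-16 internally) into the UTF-8 byte array the workaround calls for.
final class Utf8Workaround {
    private Utf8Workaround() {}

    static byte[] toUtf8(String text) {
        // Name the charset explicitly; the platform default may not be UTF-8.
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] data = toUtf8("caf\u00e9 au lait");
        // Pass `data` to detect(byte[] data) instead of the original String.
        System.out.println(data.length + " bytes of UTF-8 ready for detection");
    }
}
```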

In some cases, when the input consists of Han script, LanguageIdentifier may return results that are not sorted by confidence. This reflects heuristics that do not factor into the confidence calculation.