Model Training Suite
Release Notes
Model Training Suite Release 2.0.0
June 2025
Versions included in this release:
Adaptation Studio: 2.0.0
Events Training Server: 1.2.0
PETS: 1.5.2
JETS: 1.2.0
Entity Training Server (RTS): 2.0.14
Analytics Server: 1.33.0
COREF Server: 1.0.1
New
New user interface: We've redesigned the user interface for Adaptation Studio. The side menu bar has been replaced with a top menu bar. The menu items have been reorganized for an improved user experience.
Dutch event extraction: We now support event extraction in Dutch. (TEJ-2452)
Bug Fixes
We fixed a bug where the health check scripts made assumptions about the certificate's name. The health check scripts now support wildcard certificates. (TEJ-2522)
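For context, a wildcard certificate issued for a name such as *.example.com covers single-label subdomains of example.com. The sketch below shows the conventional matching rule: the * must be the entire left-most label and matches exactly one label. It is an illustrative Python helper only, not the health check scripts' own logic, and matches_cert_name is a hypothetical name.

```python
def matches_cert_name(cert_name: str, hostname: str) -> bool:
    """Check hostname against a certificate name, allowing a left-most '*' label.

    Illustrative only: '*' is honored only as the entire left-most label and
    matches exactly one non-empty label, so it never spans a dot.
    """
    cert_labels = cert_name.lower().rstrip(".").split(".")
    host_labels = hostname.lower().rstrip(".").split(".")
    if len(cert_labels) != len(host_labels):
        return False
    if cert_labels[0] == "*":
        # Wildcard: any non-empty first label, exact match for the rest.
        return bool(host_labels[0]) and cert_labels[1:] == host_labels[1:]
    return cert_labels == host_labels
```

For example, *.example.com matches mts.example.com but not a.b.example.com or the bare example.com.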
Known Issues
The health check scripts require curl version > 7.64.0. If the installed version of curl is < 7.64.0, the scripts show an error and do not properly check for ActiveMQ access.
When running all services, you may see the message:
docker-compose.yaml: `version` is obsolete
The version attribute is kept in docker-compose.yaml to support older versions of docker compose.
Analytics (Rosette) Server returns 500 (server error) rather than 400 (bad request) when it determines that the language of an incoming sample does not match the language of the event model's workspace. (TEJ-2453)
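Before running the health check scripts, you can confirm the installed curl version. The Python sketch below parses the first line of curl --version and compares it against the requirement as stated in these notes (> 7.64.0). The function names are illustrative and are not part of the product.

```python
import re
import subprocess
from typing import Tuple

# The release notes state the requirement as "curl > 7.64.0".
REQUIRED_ABOVE = (7, 64, 0)


def parse_curl_version(version_output: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from the first line of `curl --version`."""
    match = re.match(r"curl (\d+)\.(\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError("unrecognized `curl --version` output")
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def curl_meets_requirement(version_output: str) -> bool:
    """True if the parsed version is strictly greater than REQUIRED_ABOVE."""
    return parse_curl_version(version_output) > REQUIRED_ABOVE


def installed_curl_output() -> str:
    """Run `curl --version` and return its stdout (requires curl on PATH)."""
    result = subprocess.run(
        ["curl", "--version"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

On a host with curl installed, curl_meets_requirement(installed_curl_output()) returns the result for the local curl.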
Adaptation Studio Release 2.0.0
June 2025
New
New user interface: We've redesigned the user interface for Adaptation Studio. The side menu bar has been replaced with a top menu bar. The menu items have been reorganized for an improved user experience.
Events Training Server Release 1.2.0
June 2025
New
Language support: Supported languages are now stored in Events Training Server, not in Analytics Server.
Java 21: Java 21 is now supported.
Entity Training Server Release 2.0.14
June 2025
Note
The Entity Training Server was previously named REX Training Server.
New
Java 21: Java 21 is now supported.
Improved installation: Root extraction is now performed last to eliminate any errors during interactive installation. (TEJ-2715)
Analytics Server MTS Release 1.33.0
Release 1.33.0
March 2025
Note
The minimum supported docker compose version is 2.0.
Installation
Bug fix: We fixed a bug where, if the license file was missing, the installer kept prompting for the file without giving a reason. The installer now explains the reason for the prompt.
Entity Extraction and Linking /entities
New entity types: You can now extract social media entity types using Entity Extractor. The types extracted are HASHTAG, ATMENTION, URL, and EMAIL. To extract these types, set extractSocialMedia to true in the rex-factory-config.yaml file. (TEJ-2672)
Wikidata refreshed: We've updated the knowledge base data. The QID assigned to some extracted entities may differ from previous versions.
Bug fix: We fixed a bug in the in-document coreference server. In-document coreference chains of entity mentions are now correct. (TEJ-2655)
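The extractSocialMedia setting for the new social media entity types can be applied with a small script, and the new types can then be picked out of an /entities response. This Python sketch assumes that extractSocialMedia is a top-level key in rex-factory-config.yaml and that the response carries an "entities" list whose items have a "type" field; verify both against your installation. It edits the YAML as plain text to avoid a third-party YAML dependency.

```python
import re
from typing import Dict, List

SOCIAL_MEDIA_TYPES = {"HASHTAG", "ATMENTION", "URL", "EMAIL"}

# Assumption: extractSocialMedia is a top-level (unindented) key.
_FLAG_LINE = re.compile(r"^extractSocialMedia\s*:.*$", re.MULTILINE)


def enable_social_media(config_text: str) -> str:
    """Return config text with `extractSocialMedia: true` set (appended if absent)."""
    if _FLAG_LINE.search(config_text):
        return _FLAG_LINE.sub("extractSocialMedia: true", config_text)
    if config_text and not config_text.endswith("\n"):
        config_text += "\n"
    return config_text + "extractSocialMedia: true\n"


def social_media_entities(response: Dict) -> List[Dict]:
    """Keep only HASHTAG, ATMENTION, URL, and EMAIL entities (assumed response shape)."""
    return [e for e in response.get("entities", []) if e.get("type") in SOCIAL_MEDIA_TYPES]
```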
Event Extraction /events
Dutch support: Dutch is now supported for event extraction.
Info /info
New name: The /info endpoint now returns "Babel Street Analytics Server" instead of "Rosette".
Name Similarity /name-similarity
Improved Hebrew-English ORG matching: We've improved matching of organization names that contain affixes. (RLPNC-7944)
Example: AL-QAID IN IRAQ vs אלקאעדה עירא
Previously: 0.51
Now: 0.89
Improved Korean name matching: We've improved Korean matching for PERSONS and ORGANIZATIONS by updating the stop word list. (RLPNC-7951)
New parameter: We've added a parameter, maxExpansions, to control the number of phonetically similar terms considered during first-pass fuzzy matching. Increasing this parameter can improve first-pass results, making it more likely that the correct name is sent to the second pass, but may impact performance. (RLPNC-7967)
Improved stop words: We now support stop word prefixes and stop patterns that contain the forward slash (/) character. This is especially useful for Indian and Malaysian names that include titles which are acronyms, such as A/P, A/L, S/O, and D/O. (RLPNC-7919)
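As a sketch of how maxExpansions might be passed on a /name-similarity request, the Python code below builds a JSON body with name1, name2, and a parameters object. The parameters placement and the string-valued setting are assumptions to verify against the Analytics Server API reference; the value 50 used in the usage line is arbitrary.

```python
import json
from typing import Optional


def name_similarity_body(name1: str, name2: str,
                         max_expansions: Optional[int] = None) -> str:
    """Build a /name-similarity request body, optionally overriding maxExpansions."""
    body = {"name1": {"text": name1}, "name2": {"text": name2}}
    if max_expansions is not None:
        # Assumption: per-request parameter overrides go under "parameters"
        # and are passed as strings.
        body["parameters"] = {"maxExpansions": str(max_expansions)}
    return json.dumps(body, ensure_ascii=False)
```

For example, name_similarity_body("Mohammed Ali", "Muhammad Ali", max_expansions=50). Per the note above, raising the value trades first-pass recall against performance.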
English to Chinese name translation: The English to Chinese name translation is now implemented in Java instead of C++. You may see some differences in translations. (RLPNC-8003)
New character support: Match now supports CJK Unified Ideographs Extension B, which includes rare and historical Chinese characters. This update ensures that characters from U+20000 to U+2A6DF are correctly recognized and processed, improving compatibility with Chinese, Cantonese, Korean, and Japanese data. (RLPNC-7956)
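To check whether your data contains Extension B characters, and so may match differently after upgrading, you can scan names for code points in the U+20000 to U+2A6DF range. A minimal Python sketch follows. Python strings index by code point, so each such character counts once; in UTF-16 environments such as Java strings, each one occupies a surrogate pair of two code units, a common source of mishandling.

```python
from typing import List

EXT_B_FIRST = 0x20000
EXT_B_LAST = 0x2A6DF


def extension_b_chars(text: str) -> List[str]:
    """Return the characters of text that lie in CJK Unified Ideographs Extension B."""
    return [ch for ch in text if EXT_B_FIRST <= ord(ch) <= EXT_B_LAST]
```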
Bug fix: We fixed a bug in Cantonese where the confidence score was always 0.0 if a token had a special character at the end. (RLPNC-7893)
Bug fix: We fixed an issue where left and right names in the explain info could appear in the wrong order. (RLPNC-7922)
Bug fix: We fixed an issue where removing stop words produced improper segmentation results for Chinese, leading to poor match scores. (RLPNC-8036)
Bug fix: We fixed a bug in Chinese name translation where a StringIndexOutOfBoundsException occurred due to incorrect handling of token dictionary matches. (RLPNC-8009)
Bug fix: We fixed a bug with the handling of transliteration schemes in Cantonese to English translations. You can now set the language of origin and language of use to yue and receive the correct translation without error. (RLPNC-7865)
Name Translation /name-translation
New output value: The value processedByBabel was added to the response. This Boolean value is set to true when the input value is processed in any way for translation. (RLPNC-7961)
Example: If an input of ABC returns abc, the value is true.
English to Chinese name translation: The English to Chinese name translation is now implemented in Java instead of C++. You may see some differences in translations. (RLPNC-8003)
Bug fix: We fixed a bug with the handling of transliteration schemes in Cantonese to English translations. You can now set the language of origin and language of use to yue and receive the correct translation without error. (RLPNC-7865)
Bug fix: We fixed a bug in native Chinese name translation where too many results could be returned if a name had multiple segmentations. (RLPNC-8058)
Known issue: When translating from English to Chinese, the source and target domain information (sourceScript, sourceLanguageOfUse, targetLanguage, targetScript, targetScheme) is not returned in the results. Only the translation is returned. (RLPNC-8101)
Ping /ping
New name: The /ping endpoint now returns "Babel Street Analytics Server at your service" instead of "Rosette at your service".
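Client-side health checks that compare the /ping message text exactly will stop matching after this change. The hedged Python sketch below accepts both the old and new messages; it assumes your client has already extracted the message string from the response. Relying on the HTTP status code alone is a sturdier alternative.

```python
PING_MESSAGES = frozenset({
    "Babel Street Analytics Server at your service",  # 1.33.0 and later
    "Rosette at your service",                         # earlier releases
})


def ping_message_ok(message: str) -> bool:
    """Accept the /ping message from either naming of the server."""
    return message.strip() in PING_MESSAGES
```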
Third-party component updates
Package | Old Version | New Version |
---|---|---|
ANTLR 3 Runtime | 3.5.2 | 3.5.3 |
ANTLR 4 Runtime | 4.5 | 4.13.0, 4.13.2 |
Apache BVal :: Bundle | 3.0.1 | 3.0.2 |
Apache Commons Codec | 1.15 | 1.18.0 |
Apache Commons CSV | 1.11.0 | 1.14.0 |
Apache Commons IO | 2.17.0 | 2.18.0 |
Apache Commons Logging | 1.2 | 1.3.5 |
Apache Commons Text | 1.11.0 | 1.13.0 |
Apache Commons Validator | 1.8.0 | 1.9.0 |
Apache CXF | 4.0.5 | 4.1.1 |
Apache James :: Mime4j | 0.8.11 | 0.8.12 |
Apache Log4j | 2.20.0, 2.24.1 | 2.24.3 |
Apache PDFBox | 2.0.31 | 3.0.4 |
Apache POI | 5.2.5 | 5.4.0 |
Apache Tika | 2.9.2 | 3.1.0 |
asm | 9.7 | 9.7.1 |
AspectJ Weaver | 1.9.22 | 1.9.23 |
Auto Common Libraries | 0.8 | 1.2.1 |
AutoService | 1.0-rc4 | 1.1.1 |
AWS SDK for Java | 1.12.772 | 1.12.782 |
Bouncy Castle | 1.77, 1.78 | 1.80 |
Byte Buddy (without dependencies) | 1.14.16 | 1.15.11 |
ClassMate | 1.5.1 | 1.7.0 |
Dropwizard Metrics | 3.2.6, 4.2.25, 4.2.27 | 4.2.30 |
Eclipse Jetty | 11.0.24 | 12.0.16, 12.0.18 |
Google Guice | 4.0 | 4.2.3 |
Guava: Google Core Libraries for Java | 33.3.1-jre | 33.4.0-jre |
Hibernate ORM - hibernate-core | 6.2.25.Final | 6.6.11.Final |
Hibernate Validator Engine | 8.0.1.Final | 8.0.2.Final |
HikariCP | 5.0.1 | 5.1.0 |
Jackcess | 4.0.5 | 4.0.8 |
Jackson dataformat: CBOR | 2.15.4 | 2.18.3 |
Jackson dataformat: Smile | 2.17.2 | 2.18.2 |
Jackson datatype: Guava | 2.17.2 | 2.18.2 |
Jackson datatype: JSR310 | 2.15.4 | 2.18.3 |
Jackson Jakarta-RS: base | 2.15.4 | 2.18.3 |
Jackson Jakarta-RS: JSON | 2.17.2 | 2.18.2 |
Jackson module: Jakarta XML Bind Annotations (jakarta.xml.bind) | 2.15.4 | 2.18.3 |
Jackson module: Old JAXB Annotations (javax.xml.bind) | 2.17.2 | 2.18.2 |
Jackson-annotations | 2.17.2 | 2.18.2 |
Jackson-core | 2.17.2 | 2.18.2 |
jackson-databind | 2.17.2 | 2.18.2 |
Jackson-dataformat-XML | 2.17.2 | 2.18.2 |
Jackson-dataformat-YAML | 2.17.2 | 2.18.2 |
Jackson-JAXRS: base | 2.15.4 | 2.18.3 |
Jackson-JAXRS: JSON | 2.17.2 | 2.18.2 |
Jakarta Servlet | 6.0.0 | 6.1.0 |
Jakarta Validation API | 3.1.0 | 3.1.1 |
Jandex: Core | 3.0.5 | 3.2.0 |
JavaCPP | 1.5.10 | 1.5.11 |
JavaPoet | 1.9.0 | 1.13.0 |
jaxb-api | 2.3.0 | 2.3.1 |
JBoss Logging 3 | 3.5.3.Final | 3.6.1.Final |
JetBrains Java Annotations | 26.0.1 | 26.0.2 |
JLine | 3.26.1 | 3.26.3 |
jwarc | 0.29.0 | 0.31.1 |
Maven Artifact | 3.3.9 | 3.9.9 |
Micrometer Application Metrics | 1.11.12 | 1.14.5 |
Non-Blocking Reactive Foundation for the JVM | 3.5.17 | 3.7.4 |
Plexus Common Utilities | 3.0.24 | 4.0.2 |
Plexus Interpolation API | 1.26 | 1.27 |
Project Lombok | 1.18.34 | 1.18.36 |
Protocol Buffers [Core] | 3.25.5 | 4.29.3 |
SLF4J | 2.0.16 | 2.0.17 |
Spring Boot | 3.1.12 | 3.4.4 |
Spring Data Core | 3.1.12 | 3.4.4 |
Spring Data JPA | 3.1.12 | 3.4.4 |
Spring Framework | 6.0.21 | 6.2.5 |
Spring Security | 6.1.9 | 6.4.4 |
Spring Shell | 3.1.12 | 3.4.0 |
StringTemplate 4 | 4.3.3 | 4.3.4 |
tomcat-embed-el | 10.1.24 | 10.1.39 |
Package | Version | License |
---|---|---|
AssertJ Core | 3.27.3 | Apache-2.0 |
AutoService Processor | 1.1.1 | Apache-2.0 |
EE10 :: Servlet | 12.0.18 | EPL-2.0 |
Hibernate Commons Annotations | 7.0.3.Final | Apache-2.0 |
JavaBeans Activation Framework API jar | 1.2.0 | (CDDL-1.0 OR GPL-2.0-with-classpath-exception) |
javax.ws.rs-api | 2.1.1 | EPL-2.0 |
jsoup Java HTML Parser | 1.18.3 | MIT |
picocli | 4.7.6 | Apache-2.0 |
Package |
---|
abego TreeLayout Core |
ANTLR 4 Runtime |
AssertJ fluent assertions |
error-prone annotations |
Hibernate Commons Annotations |
io.grpc:grpc-api |
io.grpc:grpc-stub |
javax.ws.rs-api |
Jetty :: Jakarta Servlet API and Schemas for JPMS and OSGi |
TagSoup |
Package | Old Version | New Version |
---|---|---|
aiosignal | 1.3.1 | 1.3.2 |
annotated-types | 0.6.0 | 0.7.0 |
anyio | 4.3.0 | 4.8.0 |
async-timeout | 4.0.3 | 5.0.1 |
attrs | 23.2.0 | 25.3.0 |
blis | 0.7.11 | 1.2.0 |
certifi | 2024.2.2 | 2025.1.31 |
charset-normalizer | 3.3.2 | 3.4.1 |
cloudpathlib | 0.16.0 | 0.21.0 |
confection | 0.1.4 | 0.1.5 |
cymem | 2.0.8 | 2.0.11 |
datasets | 2.18.0 | 3.3.2 |
distlib | 0.3.8 | 0.3.9 |
exceptiongroup | 1.2.0 | 1.2.2 |
fastapi | 0.110.0 | 0.115.11 |
filelock | 3.13.1 | 3.17.0 |
frozenlist | 1.4.1 | 1.5.0 |
fsspec | 2024.2.0 | 2024.12.0 |
h2 | 4.1.0 | 4.2.0 |
hpack | 4.0.0 | 4.1.0 |
huggingface-hub | 0.21.4 | 0.29.3 |
Hypercorn | 0.16.0 | 0.17.3 |
hyperframe | 6.0.1 | 6.1.0 |
idna | 3.6 | 3.10 |
langcodes | 3.3.0 | 3.5.0 |
multidict | 6.0.5 | 6.1.0 |
murmurhash | 1.0.10 | 1.0.12 |
numpy | 1.26.4 | 2.0.2 |
nvidia-cublas-cu12 | 12.1.3.1 | 12.4.5.8 |
nvidia-cuda-cupti-cu12 | 12.1.105 | 12.4.127 |
nvidia-cuda-nvrtc-cu12 | 12.1.105 | 12.4.127 |
nvidia-cuda-runtime-cu12 | 12.1.105 | 12.4.127 |
nvidia-cudnn-cu12 | 8.9.2.26 | 9.1.0.70 |
nvidia-cufft-cu12 | 11.0.2.54 | 11.2.1.3 |
nvidia-curand-cu12 | 10.3.2.106 | 10.3.5.147 |
nvidia-cusolver-cu12 | 11.4.5.107 | 11.6.1.9 |
nvidia-cusparse-cu12 | 12.1.0.106 | 12.3.1.170 |
nvidia-nccl-cu12 | 2.19.3 | 2.21.5 |
nvidia-nvjitlink-cu12 | 12.4.99 | 12.4.127 |
nvidia-nvtx-cu12 | 12.1.105 | 12.4.127 |
packaging | 24.0 | 24.2 |
pandas | 2.2.1 | 2.2.3 |
pip | 22.2.2 | 24.2 |
pipenv | 2022.10.12 | 2024.1.0 |
platformdirs | 4.2.0 | 4.3.6 |
pyarrow | 15.0.2 | 19.0.1 |
pydantic | 2.6.4 | 2.10.6 |
pydantic_core | 2.16.3 | 2.27.2 |
pytz | 2024.1 | 2025.1 |
PyYAML | 6.0.1 | 6.0.2 |
regex | 2023.12.25 | 2024.11.6 |
safetensors | 0.4.2 | 0.5.3 |
scipy | 1.12.0 | 1.13.1 |
setuptools | 69.2.0 | 76.0.0 |
six | 1.16.0 | 1.17.0 |
smart-open | 6.4.0 | 7.1.0 |
spacy | 3.7.4 | 3.8.3 |
srsly | 2.4.8 | 2.5.1 |
starlette | 0.36.3 | 0.46.1 |
sympy | 1.12 | 1.13.1 |
taskgroup | 0.0.0a4 | 0.2.2 |
thinc | 8.2.3 | 8.3.4 |
tokenizers | 0.15.2 | 0.21.1 |
tomli | 2.0.1 | 2.2.1 |
tqdm | 4.66.2 | 4.67.1 |
transformers | 4.38.2 | 4.49.0 |
triton | 2.2.0 | 3.1.0 |
typer | 0.9.0 | 0.15.2 |
typing_extensions | 4.10.0 | 4.12.2 |
tzdata | 2024.1 | 2025.1 |
urllib3 | 2.2.1 | 2.3.0 |
virtualenv | 20.25.1 | 20.29.3 |
wasabi | 1.1.2 | 1.1.3 |
weasel | 0.3.4 | 0.4.1 |
xxhash | 3.4.1 | 3.5.0 |
yarl | 1.9.4 | 1.18.3 |
Package | Version | License |
---|---|---|
aiohappyeyeballs | 2.6.1 | PSF-2.0 |
aiohttp | 3.11.13 | Apache-2.0 |
click | 8.1.8 | License :: OSI Approved :: BSD License |
en_core_web_sm | 3.8.0 | MIT |
Jinja2 | 3.1.6 | License :: OSI Approved :: BSD License |
language_data | 1.3.0 | MIT |
marisa-trie | 1.2.1 | MIT |
markdown-it-py | 3.0.0 | MIT |
MarkupSafe | 3.0.2 | License :: OSI Approved :: BSD License |
mdurl | 0.1.2 | MIT |
propcache | 0.3.0 | Apache-2.0 |
Pygments | 2.19.1 | BSD-2-Clause |
requests | 2.32.3 | Apache-2.0 |
rich | 13.9.4 | MIT |
shellingham | 1.5.4 | ISC |
torch | 2.5.1 | BSD-3-Clause |
wrapt | 1.17.2 | License :: OSI Approved :: BSD License |
Package |
---|
aiohttp |
click |
en-core-web-sm |
Jinja2 |
MarkupSafe |
pyarrow-hotfix |
requests |
torch |
virtualenv-clone |
Package | Old Version | New Version |
---|---|---|
Apache Log4j | 2.23.1 | 2.24.3 |
commons-codec | 1.16.1 | 1.17.2 |
hibernate-validator | 8.0.1.Final | 8.0.2.Final |
Jackson datatype: JSR310 | 2.17.3 | 2.18.2 |
jackson-annotations | 2.17.3 | 2.18.2 |
jackson-core | 2.17.3 | 2.18.2 |
jackson-databind | 2.17.3 | 2.18.2 |
jackson-datatype-jdk8 | 2.17.3 | 2.18.2 |
jackson-module-parameter-names | 2.17.3 | 2.18.2 |
jboss-logging | 3.5.3.Final | 3.6.1.Final |
logback-classic | 1.5.12 | 1.5.16 |
logback-core | 1.5.12 | 1.5.16 |
Micrometer Application Metrics | 1.13.8, 1.14.1 | 1.14.4, 1.14.5 |
Prometheus Java Simpleclient | 1.2.1 | 1.3.6 |
snakeyaml | 2.2 | 2.3 |
Spring Boot | 3.3.6 | 3.4.3 |
Spring Framework | 6.1.15 | 6.2.3 |
tomcat-embed-core | 10.1.33 | 10.1.36 |
tomcat-embed-el | 10.1.33 | 10.1.36 |
tomcat-embed-websocket | 10.1.33 | 10.1.36 |
Model Training Suite Release 1.0.11
July 2024
Versions included in this release:
Adaptation Studio: 1.0.11
Events Training Server: 1.0.11
PETS: 1.5.1
JETS: 1.1.0
REX Training Server: 2.0.13
Rosette Server: 1.30.0
COREF Server: 1.0.10
Note
New model name restriction
Event model names and workspaces can no longer contain a colon (:).
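If you create models through scripts, a pre-check can catch names that the new restriction rejects before they reach the server. The Python sketch below raises on a colon; suggest_model_name is a hypothetical helper, and its default replacement character is an arbitrary choice for illustration, not a product convention.

```python
def check_model_name(name: str) -> None:
    """Raise ValueError if an event model or workspace name contains a colon."""
    if ":" in name:
        raise ValueError(f"name {name!r} contains ':', which is no longer allowed")


def suggest_model_name(name: str, replacement: str = "-") -> str:
    """Hypothetical helper: propose a colon-free name for renaming."""
    return name.replace(":", replacement)
```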
New
Custom Knowledge Base: You can now train a custom knowledge base for entity linking and disambiguation.
Architecture improvements: Events processing code was removed from Rosette Server and moved into Events Training Server (ETS). This was done to separate the releases of Rosette Server from Events Training Server. The major user-visible differences from this change are:
The difference between extraction mode and training mode is that some methods are not allowed in extraction mode. These methods are restricted to prevent resource-intensive activities, such as training, from running in extraction mode.
Both extraction and training modes now require a valid Rosette Server installation. A single Rosette Server installation can be shared between extraction and training.
Known Issues
The health check scripts require curl version > 7.64.0. If the installed version of curl is < 7.64.0, the scripts show an error and do not properly check for ActiveMQ access.
When running all services, you may see the message:
docker-compose.yaml: `version` is obsolete
The version attribute is kept in docker-compose.yaml to support older versions of docker compose.
Dutch event extraction is not supported by Rosette Server. (TEJ-2452)
Rosette Server returns 500 (server error) rather than 400 (bad request) when Rosette Server determines the language of an incoming sample does not match the language of the event model's workspace. (TEJ-2453)
You cannot call the /relationships endpoint if the indoc-coref-server is enabled globally by default; Rosette Server throws a NullPointerException. (TEJ-2451)
Adaptation Studio Release 1.0.11
July 2024
Note
New model name restriction
Event model names and workspaces can no longer contain a colon (:).
New
Custom Knowledge Base: You can now train a custom knowledge base for entity linking and disambiguation.
Coref server support: We've added support for the in-document coreference server for entity extraction. When creating a new event model, you can now choose to use this server for entity extraction.
Private models: Models can now be deployed with a customer id and an app id as part of the workspace name. Models without a customer id or app id are public in Rosette Cloud. (RAS-1965)
Dutch support: You can now train event models in Dutch. Note that Dutch event extraction is not yet supported in Rosette Server.
Known Issues
If you deploy a model with an app id and customer id, you cannot redeploy it with the same name without the app id and the customer id.
Events Training Server Release 1.1.0
July 2024
Note
New model name restriction
Event model names and workspaces can no longer contain a colon (:).
New
Private models: Models in ETS can now have a customer id and an app id in the workspace name. (TEJ-2295)
REX Training Server Release 2.0.13
July 2024
Bug Fixes
The update-rs-for-rts script no longer updates the wrapper.conf file in a containerized setup. (TEJ-2391)
The update-rs-for-rts script now sets the default path to /basis/rs. Previously it was set to /basis/rosette.
Rosette Server MTS Release 1.30.0
Note
You can now train event models in Dutch through Adaptation Studio. Dutch is not yet supported for event extraction in Rosette Server.
Release 1.30.0
June 2024
Server Documentation
Logging: We've added a section on Log4j and configuring log files.
Names endpoints (/name-similarity, /name-deduplication, /name-translation, /record-similarity) now return a 400 error instead of a 500 error when called with the output=rosette parameter. This parameter is only supported for document (non-names) endpoints.
Health Services /health/services
New status endpoint: We've added a configurable endpoint to report the health information of servers connected to Rosette Server. These servers include:
events-training-server (ETS): server for training event models,
indoc-coref-server (indoc coref): server providing in-document coreference,
rex-training-server (RTS): server for training entity models.
Entity Extraction and Linking /entities
Wikidata refreshed: We've updated the knowledge base data for the provided knowledge base. The QID assigned to some extracted entities may differ from previous versions. You should see large improvements in entity linking. (RWIKI-454, RWIKI-507)
We've changed how some entity types are linked to the provided knowledge base:
PERSON: Now only real humans are linked as person entities; fictional, imaginary, and mythical humans are not.
PRODUCT: Product entities now exclude most creative works.
Linking improvements: We've changed the conflict resolution algorithm to one which tries to link using the longest possible mentions. You should see better linking, especially in cases where the mention of a popular entity is embedded within the mention of interest. (RWIKI-404)
Example: I studied at the University of Chicago
Previously linked: Chicago
Now linked: University of Chicago
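The longest-mention idea can be illustrated with a greedy selection over overlapping candidate spans: take candidates longest first and drop any that overlap a span already chosen. This Python sketch is a simplification for illustration only; the actual conflict-resolution algorithm is not documented here and also weighs linking evidence that the sketch ignores.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def resolve_longest(candidates: List[Span]) -> List[Span]:
    """Greedy longest-first selection of non-overlapping candidate mentions."""
    chosen: List[Span] = []
    # Longest spans first; ties broken by earlier start offset.
    for start, end in sorted(candidates, key=lambda s: (-(s[1] - s[0]), s[0])):
        if all(end <= s or start >= e for s, e in chosen):
            chosen.append((start, end))
    return sorted(chosen)
```

With the example sentence, the candidates "Chicago" and "University of Chicago" overlap, so only the longer "University of Chicago" span is kept.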
Linking improvements: We've added a heuristic to help stop generic, unnamed entities, such as "mortgage law", from being linked. (RWIKI-475)
New endpoint for supported languages: We've added a new endpoint, entities/indoc-coref-server/supported-languages, which returns the supported languages for in-document coreference. (WS-3176)
Bug fix: We fixed a bug when running REX with multiple threads where stop word files were being loaded and not closed, causing the system to run out of file handles and memory. (TEJ-2347)
Bug fix: We fixed a bug where Chinese characters were normalized when looking up knowledge base artifacts for linking in Japanese. You should see improved entity linking in Japanese. (RWIKI-406)
Bug fix: English terms for half(s) and quarter(s) were removed from the Russian (RUS) and German (DEU) regexes for time. (TEJ-1817)
Bug fix: Aliases are no longer filtered by low normalized link probability; it is now possible to link entities where abbreviations like "MIT", "LA", "WHO", "UN" are the mention text. (RWIKI-389)
Bug fix: We fixed a NullPointerException that occurred while writing a log entry when processing empty tokens. (TEJ-2361)
Bug fix: We fixed a bug where news media, such as television programs, were typed as ORG. (RWIKI-483).
Known issue: If the indoc-coref-server is enabled, errors may be returned from the /sentiment, /topics, and /relationships endpoints if multiple entity mentions are returned for the same entity. (RQA-1345)
Event Extractor /events
Version requirement: Rosette Server 1.30.0 and later releases do not work with earlier releases of /events. The Events Training Server (ETS) must be at release 1.1.x or above.
Installation requirement: When installing the Events Training Server (ETS), RS_URL must be set in the ETS configuration.
Morphological Analysis /morphology/{morphoFeature}
CLA lexicon: Two terms have been added to the CLA lexicon: 喷码机 inkjet printer and 管理器 manager (in the context of software) (ETROG-3678)
Improved readings: Readings are now returned for numeric words in Chinese and Japanese when tokenizerType is set to SPACELESS_LEXICAL. (ETROG-3684)
Bug fix: When tokenizerType is set to SPACELESS_LEXICAL, readings for Japanese verb tokens in lemma form now cover the entire token. (ETROG-3640)
Example: Input: 食べる
Previous Reading: た
Current Reading: たべる
Bug fix: When tokenizerType is set to SPACELESS_LEXICAL, Japanese lemmatization has been corrected for numeric tokens containing both decimal points and multiplier characters.
Example: Input: 2.5亿
Previous Lemma: 2500000000
Current Lemma: 250000000
Bug fix: We fixed a bug where a token whose surface form was the empty string could be returned when fragmentBoundaryDetection was set to true (the default). (ETROG-3686)
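The corrected lemma for 2.5亿 follows from the multiplier: 亿 stands for 10^8, so 2.5 × 10^8 = 250,000,000, whereas the previous value, 2,500,000,000, was ten times too large. The Python sketch below performs the same expansion with exact decimal arithmetic. Its multiplier table covers only 万 and 亿 (plus the traditional forms 萬 and 億) and is an illustration, not the analyzer's implementation.

```python
from decimal import Decimal

# Illustrative table: 万/萬 = 10^4, 亿/億 = 10^8.
MULTIPLIERS = {"万": 10**4, "萬": 10**4, "亿": 10**8, "億": 10**8}


def expand_numeric_token(token: str) -> int:
    """Expand a token such as '2.5亿' into an integer (fractional results are truncated)."""
    if token and token[-1] in MULTIPLIERS:
        return int(Decimal(token[:-1]) * MULTIPLIERS[token[-1]])
    return int(Decimal(token))
```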
Name Similarity /name-similarity
Japanese translation improvements: We've improved Japanese translation by updating the custom reading dictionary (RLPNC-7539).
Hebrew translation overrides: Added additional overrides for Hebrew translation. (RLPNC-7471)
Non-Latin numeric characters: Numeric characters in certain languages are now normalized to their Latin-script counterparts. Supported languages currently include Thai (RLPNC-7562), Arabic, Burmese, Pashto (RLPNC-7564), Persian (including Iranian and Afghan Persian), Urdu, and Khmer (RLPNC-7565).
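Digit normalization of this kind can be sketched with Python's standard unicodedata module, which maps every Unicode decimal digit (category Nd) to its value. Note that this sketch normalizes digits from every script, which is broader than the language list above, and it is not the product's implementation.

```python
import unicodedata


def to_latin_digits(text: str) -> str:
    """Replace any Unicode decimal digit (category Nd) with its ASCII counterpart."""
    return "".join(
        str(unicodedata.decimal(ch)) if unicodedata.category(ch) == "Nd" else ch
        for ch in text
    )
```

For example, Thai ๒๕๖๗ becomes 2567 and Khmer ២០២៤ becomes 2024, while text without non-Latin digits is returned unchanged.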
Pashto organization improvements: We've added stop words for organizations for Pashto. (RLPNC-6889)
Spanish organization improvements: We've added stop words for organizations for Spanish. (RLPNC-6893)
Bug fix: We fixed a bug where translating certain Japanese or Korean names could lead to a memory leak. (RLPNC-7550, 7558)
Bug fix: We fixed a bug with cross-entity-type matching: cross-entity-type match scoring is now commutative. As a result of this, cross-entity-type matching will now ignore entity-type-specific parameters and overrides. (RLPNC-7485).
Bug fix: We fixed a bug where khm-khm was returned as a supported language pair for name translation. (RLPNC-7556)
Bug fix: We fixed a bug where native resources were not appropriately freed after generating Cantonese readings. (RLPNC-7589)
Bug fix: We fixed a bug that caused the left and right input fields in the explain info to swap places when a Japanese organization name was matched against a single character that is normalized away (such as '*') and the language of use was not defined on that side. (RLPNC-7554)
Bug fix: We fixed a performance bug in the handling of Cyrillic-script names. (RLPNC-6999)
Name Translation /name-translation
Japanese translation improvements: We've improved Japanese translation by updating the custom reading dictionary (RLPNC-7539).
Hebrew translation overrides: Added additional overrides for Hebrew translation. (RLPNC-7471)
Bug fix: We fixed a bug where translating certain Japanese or Korean names could lead to a memory leak. (RLPNC-7550, 7558)
Bug fix: We fixed a bug where khm-khm was returned as a supported language pair for name translation. (RLPNC-7556)
Bug fix: We fixed a bug where native resources were not appropriately freed after generating Cantonese readings. (RLPNC-7589)
Bug fix: We fixed a performance bug in the handling of Cyrillic-script names. (RLPNC-6999)
Record Similarity /record-similarity
Fielded dates supported: We've added support for fielded dates. (RLPNC-7520)
Fielded addresses supported: We've added support for fielded addresses. (RLPNC-7519)
Blank fields supported: We've added support for specifying a scoreIfNull for fields in record similarity. You can now specify the score to use when a field is missing from a record. (RLPNC-7516, RLPNC-7517)
Parameter support improved: We've added support for specifying either a parameter universe or a mapping of parameter names to parameter values. (RLPNC-7497)
Validation improved: We've improved validation of field mappings and field weights. (RLPNC-7545)
Info messages added: Record matching responses now return info messages when default property values are used. (RLPNC-7509)
Error reporting improved: Fields not included in the mapping or fields with unknown types no longer cause an error in record matching. Instead, this information is returned in the "info" block. A record with only non-included fields or fields with unknown types will lead to an error in the response. (RLPNC-7512, RLPNC-7514)
Partial records supported: Record similarity now supports partial request success if some record pairs contain unmapped or unknown fields, or encounter other scoring errors. (RLPNC-7502)
More records and fields supported: We've removed hard limits on the number of records or mapping fields in record-similarity requests. (RLPNC-7601)
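To make the scoreIfNull idea concrete: when a field is missing from one of the records being compared, there is no field score to compute, so a configured fallback takes its place. The Python sketch below folds such fallbacks into a simple weighted average. The weighted-average aggregation and all names here are hypothetical and are not the record-similarity scoring formula.

```python
from typing import Dict, Optional


def record_score(field_scores: Dict[str, Optional[float]],
                 weights: Dict[str, float],
                 score_if_null: Dict[str, float]) -> float:
    """Hypothetical weighted average of per-field scores.

    A score of None means the field is missing from a record; that field's
    score_if_null value is used in its place.
    """
    total_weight = sum(weights[field] for field in field_scores)
    if total_weight <= 0:
        raise ValueError("total field weight must be positive")
    acc = 0.0
    for field, score in field_scores.items():
        value = score_if_null[field] if score is None else score
        acc += weights[field] * value
    return acc / total_weight
```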
Relationship Extraction /relationships
Known issue: If the indoc-coref-server is enabled, errors may be returned from the /sentiment, /topics, and /relationships endpoints if multiple entity mentions are returned for the same entity. (RQA-1345)
Sentiment Analysis /sentiment
Known issue: If the indoc-coref-server is enabled, errors may be returned from the /sentiment, /topics, and /relationships endpoints if multiple entity mentions are returned for the same entity. (RQA-1345)
Tokenization /tokens
Bug fix: We fixed a bug where the Chinese word “星期四” would be tokenized incorrectly in certain contexts. (ETROG-3582)
Topic Extraction /topics
Known issue: If the indoc-coref-server is enabled, errors may be returned from the /sentiment, /topics, and /relationships endpoints if multiple entity mentions are returned for the same entity. (RQA-1345)
Third-Party component updates
Package | Version | License |
---|---|---|
Animal Sniffer | 1.18 | MIT |
Eclipse Glassfish | 4.04 | EPL 2.0 |
Eclipse Jetty Servlet API | 5.02 | EPL 2.0/ASL 2.0 |
Error Prone | 2.3.4 | ASL 2.0 |
Netty Project | 4.1.46.Final/4.1.49.Final/4.1.52.Final | ASL 2.0 |
PerfMark | 0.23.0 | ASL 2.0 |
Package | New Version |
---|---|
Apache Commons CLI | 1.7.0 |
Apache Commons Compress | 1.26.1 |
Apache Commons IO | 2.16.0 |
Apache XML Schema | 2.3.1 |
Apache CXF | 4.0.4 |
args4J | 2.37 |
Basis Technology Annotated Data Model | 3.0.1 |
Basis Technology Rosette API Client Library for Java | 1.30.0 |
Basis Technology Rosette Common Java API | 38.0.1 |
Basis Technology TCL Regex | 0.14.12 |
Eclipse Jetty | 11.0.20 |
Google Guava | 33.2.0-jre |
Google Protobuf | 3.25.3 |
Jackson Annotations | 2.17.1 |
Jackson Core | 2.17.1 |
Jackson Databind | 2.17.1 |
Jackson Databind Smile | 2.17.1 |
Jackson Databind XML | 2.17.1 |
Jackson Databind YAML | 2.17.1 |
Jackson Datatype Guava | 2.17.1 |
Jackson JAX-RS | 2.17.1 |
Jackson JAXB Annotations | 2.17.1 |
Jakarta Bean Validation API | 3.1.0 |
Jakarta RESTful Web Services | 4.0.0 |
Jakarta XML Binding | 4.0.2 |
Jakarta XML Web services | 4.0.1 |
Jackson Woodstox | 6.6.2 |
Java Servlet API | 6.0.0 |
Package |
---|
Apache Geronimo Specifications |
Eclipse Extended StAX API |
Jakarta Activation |
Jakarta Annotations |
Jakarta SOAP Implementation with Attachments |
Jakarta Web Services Metadata API |
Package | New Version |
---|---|
Apache Commons Codec | 1.16.1 |
Apache Log4j | 2.23.1 |
Apache Tomcat | 10.1.23/10.1.24 |
Basis Technology Annotated Data Model | 3.0.1 |
Basis Technology Rosette API Client Library for Java | 1.30.0 |
Basis Technology Rosette Common Java API | 38.0.1 |
FasterXML ClassMate | 1.7.0 |
Google Guava | 33.0.0-jre |
HdrHistogram | 2.2.1 |
Jackson Annotations | 2.17.1 |
Jackson Base Modules | 2.17.1 |
Jackson Core | 2.17.1 |
Jackson Databind | 2.17.1 |
Jackson Datatype JSR310 | 2.17.1 |
Jackson Java 8 Modules | 2.17.1 |
Logback | 1.5.6 |
Micrometer | 1.13.0/1.13.1 |
Project Lombok | 1.18.32 |
Prometheus Java Simpleclient | 1.2.1 |
SLF4J | 2.0.13 |
Spring Boot | 3.3.0 |
Spring Framework | 6.0.14 |
Package |
---|
Apache Commons IO |
Model Training Suite Release 1.0.10
May 2024
Versions included in this release:
Adaptation Studio: 1.0.10
Events Training Server: 1.0.10
PETS: 1.5.0
JETS: 1.0.10
REX Training Server: 2.0.12
Rosette Server: 1.29.0
COREF: 1.0.10
New
Gazetteers: You can now compile a text gazetteer to a binary file format enhancing search efficiency and accelerating gazetteer processing. These compiled gazetteers can be deployed to Rosette Server, following the procedure outlined in the Rosette Server User Guide. (TEJ-1942)
Disambiguation model training: MTS now supports training entity linking models for disambiguation of similarly named entities. (RAS-1895)
ActiveMQ: Added support for using ActiveMQ for sending notifications for workspace events rather than using SSE. (RAS-1910)
Bug Fixes
If a trust or key store file is passed to the enable ssl script without a file type, the script will print an error and exit. (TEJ-2133)
Known Issues
When running the health check scripts for RAS, RTS, and ETS, you may see:
curl: option --http0.9: is unknown
curl: try 'curl --help' or 'curl --manual' for more information
The health check scripts require curl version > 7.64.0. If the installed version of curl is < 7.64.0, the scripts show an error and do not properly check for ActiveMQ access.
When running all services, you may see the message:
docker-compose.yaml: `version` is obsolete
The version attribute is kept in docker-compose.yaml to support older versions of docker compose.
Adaptation Studio Release 1.0.10
May 2024
New
Deploy events model: You can now push a trained events model directly to another server without using the command line. (RAS-1228)
Sidebar added: We've improved product navigation through the addition of a sidebar. (RAS-1772)
Disambiguation model training: RAS now supports training and downloading custom knowledge bases for disambiguation of similarly named entities. These custom knowledge bases can be deployed to Rosette Server, as detailed in the Rosette Server User Guide. (RAS-1895)
Gazetteers: You can now compile a text gazetteer to a binary file format enhancing search efficiency and accelerating gazetteer processing. These compiled gazetteers can be deployed to Rosette Server, following the procedure outlined in the Rosette Server User Guide. (TEJ-1942)
ActiveMQ: Because Server-Sent Events (SSE) caused multiple cross-browser and framework issues when receiving events, SSE has been removed as the eventing mechanism from ETS and RTS to Adaptation Studio and replaced with ActiveMQ. ActiveMQ is hosted in the Adaptation Studio deployment. (RAS-1910)
This change adds prompts to the RAS, RTS, and ETS installers for the host, port, username, and password of the ActiveMQ server. The headless installers were also updated to accept these properties.
Healthcheck scripts updated: The healthcheck scripts were updated to support checking ActiveMQ accessibility using SSL and non-SSL connections. (TEJ-2275)
Mongo password: The Mongo instance deployed with Adaptation Studio is now password-protected for enhanced security. (RAS-1508)
Bug Fixes
Many bug fixes and UI improvements.
Events Training Server Release 1.0.10
May 2024
New
ActiveMQ: Added support for using ActiveMQ for sending notifications for workspace events rather than using SSE. (RAS-1910)
REX Training Server Release 2.0.12
May 2024
New
Disambiguation model training: RTS now supports training custom knowledge bases for disambiguation of similarly named entities. (TEJ-2139)
Note
Due to the addition of the knowledge base data files used for entity disambiguation and linking, the physical size of the RTS release has increased substantially.
Learning curves: When samples are added, RTS can be queried for accuracy values. The results are plotted and converted to a .png file that can be downloaded. (TEJ-2140)
Gazetteers: You can now compile a text gazetteer to a binary file format enhancing search efficiency and accelerating gazetteer processing. These compiled gazetteers can be deployed to Rosette Server, following the procedure outlined in the Rosette Server User Guide. (TEJ-1942)
ActiveMQ: Added support for using ActiveMQ for sending notifications for workspace events rather than using SSE. (RAS-1910)
Healthcheck scripts updated: The healthcheck scripts were updated to support checking ActiveMQ accessibility using SSL and non-SSL connections. (TEJ-2274)
Third-party component updates
Package | Old Version | New Version |
---|---|---|
Apache Commons CLI | 1.2 | 1.6.0 |
Apache Commons Collections | 3.2.1 | 3.2.2 |
Apache Commons Compress | 1.24.0 | 1.26.0 |
Apache Commons IO | 2.15.0 | 2.15.1 |
Apache Commons Lang | 3.12.0 | 3.14.0 |
Guava InternalFutureFailureAccess and InternalFutures | 1.0.1 | 1.0.2 |
fastutil | 8.5.12 | 8.5.13 |
Guava: Google Core Libraries for Java | 32.1.3-jre | 33.0.0-jre |
icu4j | 70.1 | 74.2 |
Jackson-annotations | 2.15.3 | 2.16.1 |
Jackson-core | 2.15.3 | 2.16.1 |
Jackson-databind | 2.15.3 | 2.16.1 |
Jackson-dataformat: Smile | 2.15.3 | 2.16.1 |
Jackson-dataformat-XML | 2.15.3 | 2.16.1 |
Jackson-dataformat-YAML | 2.15.3 | 2.16.1 |
Jackson module: Old JAXB Annotations | 2.15.3 | 2.16.1 |
Package | Version | License |
---|---|---|
ActiveMQ :: All JAR bundle | 5.18.13 | Apache 2.0 |
SWC - Sweble Engine | 1.1.1 | Apache 2.0 |
SWC - Sweble Lazy Wikitext Parser | 1.1.1 | Apache 2.0 |
Package |
---|
org.apiguardian:apiguardian-api |
JUnit Platform Commons |
JUnit Platform Engine API |
JUnit Vintage Engine |
org.opentest4j:opentest4j |
Rosette Server MTS Release 1.29.0
MTS-Specific Release Notes
Indoc coref: The interactive installer was updated to support installation of the indoc coreference server.
SSL updates: The following changes were made to the enable-rs-ssl.sh script:
You can now use a JKS file without an extension.
The script checks for file expiration dates.
When a non-JKS file is provided, the script exits with a proper error message.
Healthcheck updates: The following changes were made to the rs-healthcheck.sh script:
The script now recognizes expired certificates and, at the end of the run, returns an error listing the issues to fix.
When extracting PEM files from JKS files, the permissions are now set correctly.
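Both scripts now check certificate expiration. As an illustration of that kind of check (not the scripts' actual implementation), the Python sketch below takes a certificate's notAfter date, for example the value printed by `openssl x509 -enddate -noout` after the `notAfter=` prefix, and reports whether it has expired. It relies on the standard library's `ssl.cert_time_to_seconds`; the helper names are hypothetical.

```python
import ssl
import time
from typing import Optional

SECONDS_PER_DAY = 86400


def strip_openssl_prefix(line: str) -> str:
    """Turn 'notAfter=May  9 00:00:00 2007 GMT' (openssl -enddate output) into the bare date."""
    return line.strip().split("=", 1)[-1]


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Return the days remaining before the notAfter date (negative once expired).

    not_after uses the format accepted by ssl.cert_time_to_seconds,
    e.g. "May  9 00:00:00 2007 GMT".
    """
    expires_at = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires_at - current) / SECONDS_PER_DAY


def is_expired(not_after: str, now: Optional[float] = None) -> bool:
    """True when the certificate's notAfter date is not in the future."""
    return days_until_expiry(not_after, now) <= 0
```

Passing `now` explicitly makes the check deterministic, which is convenient when testing against fixed certificate dates.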
Release 1.29.0
April 2024
Address Similarity /address-similarity
Bug fix: Fixed the error being returned when matching addresses. Only English and Chinese are supported for address matching; RNI will now throw an unsupported language exception when matching non-English, non-Chinese addresses if allLanguageSupport is disabled. (RLPNC-7416)
Entity Extraction and Linking /entities
Indoc Coreference: We've added a server to provide in-document coreference (indoc coref). With indoc coref enabled, all entity mentions, including pronouns, titles, and other references to an entity, are returned in the ADM output. To enable indoc coref, set the option useIndocServer to true. By default, indoc coref is disabled. We recommend using a GPU for performance when indoc coref is enabled. (TEJ-2244)
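A minimal Python sketch of enabling the option from a client. It only builds a JSON request body; the body layout (text in `content`, flags in an `options` object) is an assumption, so confirm it against the Server User Guide before use. The helper name `build_entities_request` is hypothetical.

```python
import json
from typing import Any, Dict


def build_entities_request(text: str, use_indoc_server: bool = False) -> Dict[str, Any]:
    """Build a hypothetical /entities request body.

    Assumption: text goes in "content" and endpoint options in an "options" object.
    """
    body: Dict[str, Any] = {"content": text}
    if use_indoc_server:
        # Indoc coref is disabled by default, so the flag is only sent when enabling it.
        body["options"] = {"useIndocServer": True}
    return body


if __name__ == "__main__":
    print(json.dumps(build_entities_request("The CEO arrived late. She apologized.", use_indoc_server=True)))
```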
Event Extractor /events
New event extractor: The event extractor analyzes unstructured text and extracts event mentions along with the roles that add detail to each event. We've included two simple event models, travel and meet, to demonstrate how event extraction works.
Morphological Analysis /morphology/{morphoFeature}
Unicode update: Unicode 15.1 is now supported. (ETROG-3595)
Bug fix: Uppercase input text is now supported. Previously, the endpoint would return the error message "Language uen not supported". (WS-3163)
Name Similarity /name-similarity
Malay support expanded: We have improved Malay name matching by expanding the stop word list. (RLPNC-7175, RLPNC-7176)
Hebrew improved: We have improved name matching and translation for Hebrew by expanding translation override lists. (RLPNC-7234)
Explain info improved:
Sub-elements are now ordered consistently and provide additional detail for any given pairwise match. (RLPNC-7293)
All date matches now return explain info about the parsed date fields, and report the “time distance” for time distance and time proximity matches. (RLPNC-7309)
All address matches now return explain info about tokenization and the final score for each address field to address field match. (RLPNC-7292)
Bug fix: Fixed a bug in which stop words were not being applied properly for Greek. (RLPNC-7144)
Bug fix: Names in Han script with unknown language and unknown language of origin now give appropriate Japanese, Chinese, and Korean readings. (RLPNC-7367)
Bug fix: Fixed token tagging in names that end with a suffix. (RLPNC-7417)
Bug fix: Names that get normalized to empty now return an empty list of Real World IDs. (RLPNC-7202)
Bug fix: Overrides are no longer considered for the gender penalty. (RLPNC-7346)
Bug fix: Confidence scores for fullname overrides are now correctly calculated when at least one token has a confidence score specified. (RLPNC-7456)
Name Translation /name-translation
Multiple translations: We've added a parameter, maximumResults, to return multiple translations along with their confidence scores. The default is to return a single translation. (RLPNC-7350)
Hebrew improved: We have improved name matching and translation for Hebrew by expanding translation override lists. (RLPNC-7234)
Record Similarity /record-similarity
Record Similarity: We've added a new endpoint that compares two lists of records and returns a similarity score for each pair. Each record can contain one to five fields of mixed data types. (RLPNC-7372)
Semantic Similarity /semantics/{semanticsFeature}
New embeddings for French and Italian: The GEN_2 embeddings (originally released in June 2023) are now available for French and Italian. These embeddings provide more accurate results and are debiased compared to the previous embeddings. You may see differences in returned values for these languages. To use the previous embeddings, set embeddingsMode to GEN_1. (RD-2632)
Korean embeddings: North Korean and South Korean embeddings are no longer distinguished by default. The default embedding mode (GEN_2) treats them the same. If you need to distinguish between North Korean and South Korean embeddings, set embeddingsMode to GEN_1.
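To keep the previous embeddings for French, Italian, or Korean, a client pins embeddingsMode to GEN_1. The Python sketch below assumes embeddingsMode can be sent as a per-request option inside an `options` object, with three-letter language codes such as `fra`; both are assumptions, and the helper name is hypothetical.

```python
from typing import Any, Dict

# GEN_2 is the default per these notes; GEN_1 restores the previous embeddings.
EMBEDDINGS_MODES = ("GEN_1", "GEN_2")


def build_semantics_request(text: str, language: str, embeddings_mode: str = "GEN_2") -> Dict[str, Any]:
    """Build a hypothetical semantics request body that pins the embeddings generation.

    Assumption: embeddingsMode is accepted as a per-request option in an "options" object.
    """
    if embeddings_mode not in EMBEDDINGS_MODES:
        raise ValueError(f"embeddingsMode must be one of {EMBEDDINGS_MODES}, got {embeddings_mode!r}")
    body: Dict[str, Any] = {"content": text, "language": language}
    if embeddings_mode != "GEN_2":
        # Only send the option when overriding the default.
        body["options"] = {"embeddingsMode": embeddings_mode}
    return body
```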
Sentence Tagging /sentences
Bug fix: Uppercase input text is now supported. Previously, the endpoint would return the error message "Language uen not supported". (WS-3163)
Tokenization /tokens
Unicode update: Unicode 15.1 is now supported. (ETROG-3595)
Bug fix: Uppercase input text is now supported. Previously, the endpoint would return the error message "Language uen not supported". (WS-3163)
Third-Party component updates
Package | Version | License |
---|---|---|
CUDA BLAS Library (nvidia-cublas-cu12) | 12.1.3.1 | |
NVIDIA CUDA Profiling Tools Interface (CUPTI) - CUDA Toolkit (nvidia-cuda-cupti-cu12) | 12.1.105 | |
NVIDIA Runtime Compilation Library and Header (nvidia-cuda-nvrtc-cu12) | 12.1.105 | |
CUDA Runtime (nvidia-cuda-runtime-cu12) | 12.1.105 | |
NVIDIA CUDA® Deep Neural Network (nvidia-cudnn-cu12) | 8.9.2.26 | |
CUDA FFT Library (nvidia-cufft-cu12) | 11.0.2.54 | |
CUDA Random Number Generation Library (nvidia-curand-cu12) | 10.3.2.106 | |
CUDA Linear Solver Library (nvidia-cusolver-cu12) | 11.4.5.107 | |
CUDA Sparse Matrix Library (nvidia-cusparse-cu12) | 12.1.0.106 | |
NVIDIA Collective Communication Library (nvidia-nccl-cu12) | 2.18.1 | |
NVIDIA JIT Linking Library (nvidia-nvjitlink-cu12) | 12.4.99 | |
NVIDIA Tools Extension Library (nvidia-nvtx-cu12) | 12.1.105 | |
Package | Old Version | New Version |
---|---|---|
Apache Commons CLI | 1.2 | 1.6.0 |
Apache Commons IO | 2.15.0 | 2.15.1 |
Apache Commons Lang | 3.12.0 | 3.14.0 |
Basis Technology Annotated Data Model | 2.10.0 | 3.0.0 |
Basis Technology Rosette Common Java API | 37.5.3 | 38.0.0 |
Eclipse Jetty | 9.4.53.v20231009 | 9.4.54.v20240208 |
fastutil | 8.5.12 | 8.5.13 |
Guava | 32.1.3-jre | 33.0.0-jre |
Guava InternalFutureFailureAccess and InternalFutures | 1.0.1 | 1.0.2 |
ICU4J | 70.1 | 74.2 |
Jackson | 2.15.3 | 2.16.1 |
Model Training Suite Release 1.0.9
January 2024
Versions included in this release:
Adaptation Studio: 1.0.9
Events Training Server: 1.0.9
REX Training Server: 2.0.11
Rosette Server: 1.28.0
New
SSL scripts: We've updated the enable/disable SSL scripts to work with the new Rosette Server SSL configuration.
docker compose version deprecated: This release of MTS has been tested with docker compose 1.26.0 and 2.21.0. Version 1.26.0 is deprecated and will not be supported in future releases.
Bug fixes
The rs-healthcheck.sh script has been updated to correctly extract the Common Name from certificates.
Adaptation Studio Release 1.0.9
January 2024
New
Event reports: Adjudication reports are now available for event models. (RAS-1648)
Discard sample: We've added a discard button on the annotation page to remove a sample from annotation and training. (RAS-1729)
Adjudicator improvements: When adjudicating, adjudicators can now add new event annotations. (RAS-1548)
Document training/validation ratio: Previously, the default ratio between training and validation documents was always 80/20. Now you can select the appropriate default ratio at project creation or when adding documents to the project. (RAS-1763, RAS-1764)
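To make the effect of the ratio concrete, here is a Python sketch of a ratio-based split. It is not Adaptation Studio's code: the function name and the seeded shuffle are assumptions for illustration, and the Studio may assign documents differently.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_documents(docs: Sequence[T], training_ratio: float = 0.8, seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle docs reproducibly and split them into (training, validation) lists.

    training_ratio=0.8 reproduces the former fixed 80/20 split.
    """
    if not 0.0 <= training_ratio <= 1.0:
        raise ValueError("training_ratio must be between 0 and 1")
    shuffled = list(docs)
    random.Random(seed).shuffle(shuffled)  # a seeded RNG makes the split repeatable
    cut = round(len(shuffled) * training_ratio)
    return shuffled[:cut], shuffled[cut:]
```

With 10 documents, the default ratio yields 8 training and 2 validation documents; a ratio of 0.7 yields 7 and 3.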
Bug Fixes
You can no longer set a date in the future in the from field in the Search panel. (RAS-1453)
The backup_mongo.sh script now works when SSL is enabled. (RAS-1739)
We removed the Adjudicate button from the Annotate page. (RAS-1738)
An error is now returned when component names in the event schema include a dot (.). (RAS-1363)
Events Training Server Release 1.0.9
January 2024
New
Negation model: We've updated the events negation model.
SSL update: We've updated the enable-ets-ssl.sh script to support macOS.
Java 8 support: The EntityTypePatcher now supports Java 8.
Bug Fixes
The lock file is no longer kept for deleted workspaces.
We now support workspace IDs that are solely comprised of numbers.
The generate-keystores.sh script has been updated to remove a call to the log function.
REX Training Server Release 2.0.11
January 2024
New
SSL update: We've updated the enable-rts-ssl.sh script to support macOS.
Bug fixes
The rs-healthcheck.sh script has been updated to correctly extract the Common Name from certificates.
The generate-keystores.sh script has been updated to remove a call to the log function.
Model Training Suite Release 1.0.8.1
November 2023
Versions included in this release:
Adaptation Studio: 1.0.8.1
Events Training Server: 1.0.8
REX Training Server: 2.0.10
Rosette Server: 1.27.0
New
Install improvements: The Rosette Server install now prompts the user to update Rosette Server for RTS and ETS support as part of the installation. If you don't update when installing, you can still run the scripts to provide RTS and ETS support.
Known issues
The generate keystores script emits a warning:
./generate-keystores.sh: line 26: log: command not found
This warning can be ignored. It does not affect the functioning of the script.
Adaptation Studio Release 1.0.8.1
November 2023
New
Installation update: The line platform: linux/amd64 has been removed from the docker-compose file because of an incompatibility with docker-compose version 1.26.
Template removed: The NER-Rosette (ADM) template, used for importing externally annotated samples, is no longer provided with the release.
Model Training Suite Release 1.0.8
October 2023
Versions included in this release:
Adaptation Studio: 1.0.8
Events Training Server: 1.0.8
REX Training Server: 2.0.10
Rosette Server: 1.27.0
New
SSL update: There are now headless installers to enable and disable SSL.
New flag: All headless installers now have a --dry-run flag to validate the .properties file.
Adaptation Studio Release 1.0.8
October 2023
New
Evaluation reports: We've added accuracy reports for event detection and event extraction, improving the explainability of the model. (RAS-1626)
Event Negation: The Test document page now includes options to extract negation clues with the event. (RAS-1670)
Search bar added: We've added a search bar in the All Projects page. You can now search for projects by name, type, language, and update date range. (RAS-1387)
Improved tentative resolution
New dialog: We've improved the dialogs for resolving tentative extractors and key phrases. There are now separate dialogs for extractors and key phrases and it is clearer how the model is being modified when the tentatives are resolved. (RAS-1669)
All samples updated: When a tentative is resolved in a sample, all samples containing that phrase are also updated. (RAS-1622)
IAA for events: There is now an IAA report for event projects. (RAS-1684)
Improved user management - adjudicators: When a user is added to a project as an adjudicator, all documents are assigned to them by default. (RAS-1708)
SSL update: There are now headless installers to enable and disable SSL.
New flag: All headless installers now have a --dry-run flag to validate the .properties file.
Bug Fixes
Only a single extractor for money (IDENTIFIER:MONEY) is available. Previously, both MONEY and IDENTIFIER:MONEY were available. (RAS-1654)
An event type can no longer be deleted without user confirmation. (RAS-1595)
Events Training Server Release 1.0.8
October 2023
New
Event negation: ETS now includes a polarity classifier that extracts negation clues when you pass a polarity_filter request option to the /analyze endpoint. The valid values for the option are ignore (default), only_positive, only_negative, and both. When the option is provided, the polarity classifier is run and the negation clues are added as new fields to the event mention. This feature is only available in English. (TEJ-1951)
Event metrics: The /validate endpoint now computes statistics by event type. Additional improvements were made to support enhanced reporting of event metrics. (EDA-281)
SSL update: There are now headless installers to enable and disable SSL.
New flag: All headless installers now have a --dry-run flag to validate the .properties file.
New script: We've added a script that adds aliases for the TEMPORAL:TIME and IDENTIFIER:MONEY entity types, for compatibility with old projects that use TIME and MONEY. This script (add-legacy-entity-types-for-ets.sh) is used with the headless Analytics Server installer.
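The polarity_filter option described under Event negation takes one of four values. The Python sketch below validates the value before building an /analyze request body. The body layout (text in `content`, options in an `options` object) and the helper name are assumptions; only the option name and its values come from these notes. Negation clues are available only for English workspaces.

```python
from typing import Any, Dict

# Valid polarity_filter values listed in these notes; "ignore" is the default.
POLARITY_FILTER_VALUES = ("ignore", "only_positive", "only_negative", "both")


def build_analyze_request(text: str, polarity_filter: str = "ignore") -> Dict[str, Any]:
    """Build a hypothetical ETS /analyze request body with a polarity_filter option.

    Assumption: the text is sent as "content" and request options in an "options" object.
    """
    if polarity_filter not in POLARITY_FILTER_VALUES:
        raise ValueError(f"polarity_filter must be one of {POLARITY_FILTER_VALUES}")
    body: Dict[str, Any] = {"content": text}
    if polarity_filter != "ignore":
        # The notes say the classifier runs whenever the option is provided,
        # so the default value is simply left out of the request.
        body["options"] = {"polarity_filter": polarity_filter}
    return body
```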
Bug Fixes
We fixed a bug in calculating the macro and weighted scores. The scores displayed on the Project dashboard may now be lower. To update the displayed scores, add a new adjudication sample into the project.
Errors in the swagger documentation have been corrected.
The correct response message is now sent when documents are deleted from projects.
We fixed a bug where some incoming requests were not properly validated.
We fixed a bug where trailing slashes (/) in some requests were not properly handled.
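For readers comparing old and new dashboard numbers, this Python sketch shows the standard definitions of macro and weighted averaging over per-event-type scores. It illustrates the general technique only; these notes do not describe ETS's exact formula or which per-type metric it averages.

```python
from typing import Dict, Tuple


def macro_and_weighted(scores: Dict[str, float], support: Dict[str, int]) -> Tuple[float, float]:
    """Return (macro, weighted) averages of per-event-type scores.

    macro:    unweighted mean over event types
    weighted: mean weighted by each type's support (number of gold instances)
    """
    if not scores:
        raise ValueError("no per-type scores to average")
    macro = sum(scores.values()) / len(scores)
    total = sum(support[label] for label in scores)
    weighted = sum(scores[label] * support[label] for label in scores) / total
    return macro, weighted
```

For example, with scores of 1.0 for meet (3 instances) and 0.5 for travel (1 instance), the macro average is 0.75 and the weighted average is 0.875, so the two figures can differ noticeably when event types are unevenly represented.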
Known issues
Incorrect status returned: If the events endpoint returns a 422 (no workers) or a 400 (negation called on a non-English workspace), a 500 will be returned from Analytics Server. Analytics Server converts most 4xx status codes to a 500.
REX Training Server Release 2.0.10
October 2023
New
SSL update: There are now headless installers to enable and disable SSL.
New flag: All headless installers now have a --dry-run flag to validate the .properties file.
Model Training Suite Release 1.0.7
July 2023
Versions included in this release:
Adaptation Studio: 1.0.7
Events Training Server: 1.0.7
REX Training Server: 2.0.9
Rosette Server: 1.26.0
Adaptation Studio Release 1.0.7
July 2023
New
The task to resolve tentative roles has been moved from the Project Schema page to the Adjudication page. Tentative roles now mark an annotation as needing adjudication. (RAS-1511)
When you resolve a tentative role, a pop-up window now appears where you can select an extractor type or reject the tentative role. (RAS-1513)
Adjudication is not complete until all tentative roles have been resolved. (RAS-1514)
We've added a new filter option to the annotation page to view all samples that need adjudication. (RAS-1516)
When you add a document to the project from the Test Document page, it is now assigned to all annotators. (RAS-1506)
The reset_admin.sh script now supports python3. (RAS-1510)
We've made it easier to assign all documents for annotation or adjudication to a user. There is now a checkbox next to the user's name. Check the box to select all documents. (RAS-1562)
Bug Fixes
We fixed a bug where all projects were not loading on the All Projects page. (RAS-1484)
Accuracy scores are now updated when annotated documents are deleted. (RAS-1439)
Roles which appear in multiple events in a single sample are now properly displayed in all events. (RAS-1541)
The samples counter is now displayed properly for users who only have adjudication tasks assigned. (RAS-1536)
You can now annotate multiple events in a single sample. (RAS-1556)
Events Training Server Release 1.0.7
July 2023
This release is for compatibility with Analytics Server. There are no new features or bug fixes.
Known Issues
When a workspace is deleted, a file named <workspace>.lock is left behind. This file does not pose any risk. This will be fixed in a future release. (EDA-280)
REX Training Server Release 2.0.9
July 2023
Bug Fixes
We fixed a bug where a workspace was missing the models directory. Workspaces are now validated and the models directory is verified. (TEJ-1984)
Model Training Suite Release 1.0.6.2
June 2023
Versions included in this release:
Adaptation Studio: 1.0.6
Events Training Server: 1.0.6
REX Training Server: 2.0.0
Rosette Server: 1.25.1
New
Helm Charts: Model Training Suite can now be deployed via Helm charts. This release includes the file model-training-suite-helm-1.0.6.2.zip. (WS-2742)
Bug Fixes
We fixed a bug in the update-rs-for-ets.sh script when installing interactively. The script no longer loops without completing. (RQA-893)
Model Training Suite Release 1.0.6.1
June 2023
Versions included in this release:
Adaptation Studio: 1.0.6
Events Training Server: 1.0.6
REX Training Server: 2.0.0
Rosette Server: 1.25.1
Bug Fixes
The update-rs-for-ets.sh script and the associated .properties file have been updated to support the headless installer. (WS-2804)
The update-rs-for-rts.sh script and the associated .properties file have been updated to support the headless installer. (WS-2804)
This script only needs to be run when using an on-premises version of Rosette Server or a containerized Rosette Server that was not shipped with Model Training Suite.
Model Training Suite Release 1.0.6
May 2023
Versions included in this release:
Adaptation Studio: 1.0.6
Events Training Server: 1.0.6
REX Training Server: 2.0.0
Rosette Server: 1.25.1
New
Multiple annotators and adjudication are now supported for events.
Adaptation Studio Release 1.0.6
May 2023
New
The view annotations page has been improved for events projects. (RAS-1370)
You can now add comments to events annotations. (RAS-1393)
Multiple annotators can be assigned to each sample for events annotations. (RAS-1412)
Adjudication is now supported for event annotations. (RAS-1476)
You can annotate link IDs for entities, enabling training of custom knowledge base linking.
We added the force admin option and a --dry-run option to the headless installer. Now ./install-ras-headless.sh --dry-run validates the properties file, outputs the values that will be used, and exits.
Bug Fixes
The password is no longer visible in the client-side debug console. (RAS-1406)
Invalid dates are no longer allowed on the view annotations page. (RAS-1403)
Events Training Server Release 1.0.6
May 2023
Bug Fixes
Entity type Temporal:Time is now mapped to Time for backwards compatibility with older models. (WS-2757)
Entity type Identifier:Money is now mapped to Money for backwards compatibility with older models. (WS-2757)
Known Issues
A security vulnerability (CVE-2023-24329) has been identified but is not addressed in this release. The correct fix is still being identified. However, it is only an issue if URLs are not sanitized and a URL with leading whitespace is passed around. This is not an issue for this product.
REX Training Server Release 2.0.0
May 2023
This release is for compatibility with Rosette Server. There are no new features or bug fixes.
Model Training Suite Release 1.0.5.1
March 2023
Versions included in this release:
Adaptation Studio: 1.0.5.1
Events Training Server: 1.0.5.1
REX Training Server: 1.0.5.1
Rosette Server: 1.24.1
New
Headless installers: Headless installers are now available for all components.
Known Issues
Adjudication is not supported for events. We recommend only having a single annotator for each sample.
You cannot upload ETS model files that were created with older versions of MTS and contain invalid or unknown entity types. If your event model contains invalid or unknown entity types, export the project from Adaptation Studio and then import the project back into the Studio. A new events project will be created which corrects the entity types. Invalid entity types include TIME and MONEY, which were part of some sample schemas.
Adaptation Studio Release 1.0.5.1
March 2023
New
Headless installer: Adaptation Studio can now be installed without human interaction. Instead of user prompts, the installer parameters are taken from the properties file.
Events Training Server Release 1.0.5.1
March 2023
New
Headless installer: Events Training Server can now be installed without human interaction. Instead of user prompts, the installer parameters are taken from the properties file.
REX Training Server Release 1.0.5.1
March 2023
New
Headless installer: REX Training Server can now be installed without human interaction. Instead of user prompts, the installer parameters are taken from the properties file.
Model Training Suite Release 1.0.5
January 2023
Versions included in this release:
Adaptation Studio: 1.0.5
Events Training Server: 1.0.5
REX Training Server: 1.0.5
Rosette Server: 1.24.1
New
Docker support: Scripts have been updated to determine if docker compose is installed and, if so, to use it rather than the deprecated docker-compose command.
Rosette Server install: The Rosette Server conf directory is now exposed outside the docker container so that settings can be customized using instructions from the MTS System Administrator Guide and the Server User Guide.
SSL: Scripts to enable and disable SSL now check the expiration dates of the given keystores and certificates. In addition, the scripts no longer require the user to concatenate the key and cert together; instead, they ask for --cert and --key.
Installation verification: Added a new script, verify-rs-configuration-for-ets.sh, in ETS that verifies that the Rosette Server installation is set up correctly by validating the rex-factory-config.cfg file.
Resource management: We've exposed 3 properties in /rts/rts-docker/.env to control RTS resource consumption:
RTS_CONCURRENT_TRAIN_THREADS=2: The number of training threads that can be running in the system at any one time.
RTS_CONCURRENT_SERIALIZE_THREADS=1: The number of model serialization threads that can be running in the system at any one time.
RTS_CONCURRENT_WORDCLASS_THREADS=2: The number of wordclass generation threads that can be running in the system at any one time.
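The detection logic in the updated scripts is not reproduced in these notes. The following Python sketch shows one way to express the same preference described under Docker support: try the `docker compose` plugin first and fall back to the standalone `docker-compose` binary. The function names are hypothetical, and the probe is injectable so the choice can be tested without Docker installed.

```python
import shutil
import subprocess
from typing import Callable, List, Optional


def command_succeeds(cmd: List[str]) -> bool:
    """Return True if cmd exists on PATH and exits with status 0."""
    if shutil.which(cmd[0]) is None:
        return False
    try:
        result = subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL, check=False)
    except OSError:
        return False
    return result.returncode == 0


def choose_compose_command(probe: Callable[[List[str]], bool] = command_succeeds) -> Optional[List[str]]:
    """Prefer 'docker compose'; fall back to the deprecated 'docker-compose'; else None."""
    if probe(["docker", "compose", "version"]):
        return ["docker", "compose"]
    if probe(["docker-compose", "version"]):
        return ["docker-compose"]
    return None
```

The returned list can be used as a prefix for later invocations, for example `choose_compose_command() + ["up", "-d"]` when a compose command was found.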
Bug Fixes
rs-healthcheck now looks in the correct directory for the .env file.
Known Issues
Adjudication is not supported for events. We recommend only having a single annotator for each sample.
You cannot upload ETS model files that were created with older versions of MTS and contain invalid or unknown entity types. If your event model contains invalid or unknown entity types, export the project from Adaptation Studio and then import the project back into the Studio. A new events project will be created which corrects the entity types. Invalid entity types include TIME and MONEY, which were part of some sample schemas.
Adaptation Studio Release 1.0.5
January 2023
New
Browser support: Firefox is now supported. (RAS-460)
New annotation filters:
You can now filter by annotation date. (RAS-1243)
You can now filter NER projects by labels and adjudication status (RAS-1183)
Version information: The Help About window now contains version information for all installed components. (RAS-1259)
Event schema modification: You can now modify the Required flag for roles in existing project schema. (RAS-1284)
Schema categories: When creating schemas, you can now assign them to a category. Categories organize schemas and can make it easier to locate a particular schema. (RAS-1169, RAS-1175)
Link annotation: NER models now support annotating links. (RAS-1202)
Bug Fixes
Exported versions are now sorted correctly when sorted by date. (RAS-1201)
Undo now correctly reverts states when annotating events. (RAS-1190)
Exported schemas now contain the version number. (RAS-817)
Errors are now caught when importing annotated ADM files. (RAS-1266)
Errors are now caught when importing schemas. (RAS-1226)
Samples are now correctly added for validation. Previously, they were always added to training, even when validation was selected. (RAS-1289)
A user can now be added to a project as an adjudicator only; they don't automatically get added as an annotator. (RAS-1263)
Known Issues
Events Training Server Release 1.0.5
January 2023
New
Improved semantic extractors: Word embeddings are now more precise. (EDA-189)
Improved schema validation: All name collisions are now checked. (EDA-207)
Improved error messages:
Only one error message is returned if a wrong extractor kind is given. (EDA-224)
A human-readable message is now provided in a consistent location. (EDA-204)
Security updates:
Dependencies updated to include security fixes.
This fixes a security hole: previously, an attacker could craft and upload a malicious model file that would cause ETS to execute arbitrary code from that file. (EDA-192)
Bug Fixes
Role dataspans are now correct when the document contains emojis. (EDA-231)
Model retrieval with concurrent mutation no longer causes an error (EDA-205)
Worker info links are now correct in the eureka dashboard. (EDA-230)
REX Training Server Release 1.0.5
January 2023
New
New endpoint: To enable linking annotations in RAS, we added a new endpoint for knowledge base search. Given a search string, the server will retrieve a list of Wikidata QIDs that have matching entries in our alias dictionaries and provide a response containing their Wikidata titles and descriptions. (TEJ-1851)
Model Training Suite Release 1.0.4
October 2022
Versions included in this release:
Adaptation Studio: 1.0.4
Events Training Server: 1.0.4
REX Training Server: 1.0.4
Rosette Server: 1.23.0
New
Tagalog support: Tagalog is now supported for named entity recognition (NER) models.
Adaptation Studio Release 1.0.4
October 2022
New
Duplicate documents: When a document is loaded that is identical to a previously loaded document, an error will now be displayed. (ANST-1082)
Schema validation: Schemas are now validated to verify that all referenced entity types are deployed in the attached deployment of Analytics Server. If an entity type does not exist, an error is returned. (WS-2566)
UI improvements: Many improvements have been made to the user interface, including improvements for working with large documents.
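How Adaptation Studio detects identical documents is not documented here. A common approach is to compare content hashes, sketched below in Python; the class and exception names are hypothetical.

```python
import hashlib
from typing import Dict


class DuplicateDocumentError(ValueError):
    """Raised when a document's text matches one that was already loaded."""


class DocumentRegistry:
    """Reject documents whose text is identical to an earlier one."""

    def __init__(self) -> None:
        self._ids_by_digest: Dict[str, str] = {}

    def add(self, doc_id: str, text: str) -> None:
        # Hash the UTF-8 bytes so documents can be compared without keeping full copies.
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        existing = self._ids_by_digest.get(digest)
        if existing is not None:
            raise DuplicateDocumentError(f"{doc_id!r} is identical to previously loaded {existing!r}")
        self._ids_by_digest[digest] = doc_id
```

Hashing only catches exact duplicates; documents that differ by a single character are treated as distinct.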
Bug Fixes
The DNN extractor is no longer used in calls to the /entities endpoint; the statistical model is always used. (ANST-1139)
Documents deleted from events projects are no longer used in training. (ANST-1027)
Comments are now downloaded correctly in ADM files. Previously, the ADM files wouldn't download if the annotations contained comments. (ANST-1103)
Known Issues
Multi-token semantic extractors are not supported. (EDA-165)
Events Training Server Release 1.0.4
October 2022
New
Schema validation: Schemas are now validated to verify that all referenced entity types are deployed in the attached deployment of Analytics Server. If an entity type does not exist, an error is returned. (WS-2566)
Improved accuracy using POS: Part of speech information is now used to help differentiate between event type roles. (EDA-178)
Improved accuracy - negative events: We've improved identification of samples which do not contain an event. (EDA-163)
Improved performance and decreased memory usage: We've improved performance, especially in validation. (EDA-143, EDA-184)
Bug Fixes
Documents deleted from events projects are no longer used in training. (ANST-1027)
Key candidates are no longer removed by overlapping candidate mentions. (EDA-169)
Semantic extractors now also do an exact comparison. (EDA-166)
Concurrent queries are now handled properly to ensure data consistency. (EDA-197)
REX Training Server Release 1.0.4
October 2022
New
Java 17 support: Java 17 is now supported.
New properties: The following properties are now exposed and configurable in the ./rts-docker/.env file:
The maximum number of training threads at any one time: RTS_CONCURRENT_TRAIN_THREADS=2
The maximum number of threads serializing models at any one time: RTS_CONCURRENT_SERIALIZE_THREADS=1
The maximum number of threads creating wordclasses at any one time: RTS_CONCURRENT_WORDCLASS_THREADS=2
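To check which thread limits a deployment will use, a small Python sketch can read the file. It assumes plain KEY=VALUE lines and treats the values listed above as the defaults when a key is absent; that fallback is an assumption, since RTS's behavior for a missing key is not described here. The function name is hypothetical.

```python
from typing import Dict

# Values shown in these release notes for ./rts-docker/.env.
RTS_THREAD_SETTINGS = {
    "RTS_CONCURRENT_TRAIN_THREADS": 2,
    "RTS_CONCURRENT_SERIALIZE_THREADS": 1,
    "RTS_CONCURRENT_WORDCLASS_THREADS": 2,
}


def read_rts_thread_limits(env_text: str) -> Dict[str, int]:
    """Return the three RTS thread limits, overriding the listed values with any in env_text.

    Handles simple KEY=VALUE lines and skips blanks and # comments; not a full .env parser.
    """
    limits = dict(RTS_THREAD_SETTINGS)
    for raw_line in env_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if key in limits:
            limits[key] = int(value.strip())
    return limits
```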
Model Training Suite Release 1.0.3.1
September 2022
Versions included in this release:
Adaptation Studio: 1.0.4.2
Events Training Server: 1.0.3.1
REX Training Server: 1.0.3.1
Rosette Server: 1.22.0
New
Security updates: Components re-released to remove known high or critical vulnerabilities.
Model Training Suite Release 1.0.3
July 2022
Versions included in this release:
Adaptation Studio: 1.0.3
Events Training Server: 1.0.3
REX Training Server: 1.0.3
Rosette Server: 1.22.0
New
The documentation set has been updated.
Adaptation Studio Release 1.0.3
July 2022
New
Initial password: The initial admin password is now entered twice to verify the correct password has been entered. (ANST-920)
Version information: The Help menu now contains an About option which lists the version of the Studio. (ANST-963)
New Project menu: The New Project menu now has two options: Create and Import.
Adjudication reports: The Adjudication reports have been removed from events models. (ANST-1019)
Case sensitive parameter set for events: When using entity extractors for events models the text is always sent as case insensitive. This improves the identification of role candidates. (ANST-1039)
Scripts added: The enable-browser-ras-ssl.sh and disable-browser-ras-ssl.sh scripts have been added to enable and disable SSL on the browser-facing proxy interface of RAS. (ANST-1026)
Bug Fixes
The Help menu is no longer disabled for Annotators and Adjudicators. (ANST-1042)
The View Document button on the Manage page is working properly. (ANST-1051)
The first document added to the system can now be used for validation. Previously, it would upload if it was for training, but not if it was for validation. (ANST-989)
Correct word class status is now displayed. Previously it displayed Not Ready, even when the word classes were available. (ANST-990)
Only supported languages are available when creating a schema template. (ANST-935)
NER models can now be successfully exported. (ANST-978)
The reset_admin password script now works when SSL is enabled. (ANST-939)
ETS, if enabled, is now checked by the ras_healthcheck script. (RAS-983)
Known Issues
Adjudication is not supported for events models.
Events Training Server 1.0.3
July 2022
New
Improved document processing: Long documents are now processed by sentence boundaries. (EDA-147)
Improved performance: All event analyses, including identifying candidate roles, event extraction, and validation, are faster.
Improved tokenization: Tokens are now taken from the /morphology endpoint. This has improved tokenization, especially in languages such as Korean. Samples may be tokenized differently than in previous releases. (EDA-129)
Java 17 supported: Java 17 is now supported. Java 8 is no longer supported.
Security updates:
Dependencies updated to include security fixes.
Client authentication was enabled. (WS-2058)
Bug Fixes
You no longer lose roles from the event when you lower the semantic threshold. (EDA-127)
All lemmas are now used for morphological extractors. (EDA-118)
REX Training Server Release 1.0.3
July 2022
Bug Fixes
New workspace names can no longer corrupt the workspaces directory. (RQA-387)
A corrupt data file has been repaired enabling Japanese model training.
You can now train entity models in Chinese using the language code zho.
Model Training Suite Release 1.0.1
April 2022
Versions included in this release:
Adaptation Studio: 1.0.1
Events Training Server: 1.0.2
REX Training Server: 1.0.2
Rosette Server: 1.21.0
New
The System Administrator Guide has been updated.
Adaptation Studio Release 1.0.1
April 2022
Bug Fixes
Log messages that could expose sensitive user data have been removed.
The DNN flag is no longer sent when calling the Rosette Server /entities endpoint.
Events Training Server 1.0.2
April 2022
New
API Updates: The list of supported languages reflects whether ETS is in training or extraction mode. Training-only API calls are not allowed in extraction mode. If a training-only API is invoked in extraction mode, a standard error message is returned. (RQA-284)
Bug Fixes
Swagger's Try Me feature now works when running over HTTPS. (WS-2473)
All role mentions in Chinese, Japanese, and Korean are now considered. Previously, if there were two mentions next to each other, one would not be considered.
Minor event extraction improvements have been made for Chinese, Japanese, and Korean.
Model Training Suite Release 1.0.0
April 2022
Versions included in this release:
Adaptation Studio: 1.0.0
Events Training Server: (Java/Python Servers) 1.0.0
REX Training Server: 1.0.2
Analytics Server: 1.21.0
New
Documentation update
The System Administrator Guide has been updated. The Rosette Server install instructions are in the Rosette Server User Guide shipped with the Rosette Server package.
The Adaptation Studio User Guide has been updated to reflect enhancements in the product.
The Developing Models guide has been updated to include guidance on events schemas.
Release notes: The release notes have been restructured. Each server now has its own section.
Rosette Server installation: You can now use an existing, stand-alone installation of Rosette Server, instead of the Docker container. Rosette Server is shipped separately.
Adaptation Studio Release 1.0.0
April 2022
New
Project Management Reports: We've added the following reports to help track the progress of the project:
Annotation progress report
Adjudication progress report
Inter-annotator agreement
IAA history
Import project: You can now import a previously exported project in the Studio. Previously, you had to run a command line script to import a project.
Document Level Annotations: This is a new setting that can be enabled only during initial project configuration. When enabled, annotation is performed on one full document at a time, as opposed to one sample at a time.
Annotation Comments: Annotators can now add a comment to each annotation they make. Adjudicators and managers can see these comments via View Annotations.
Annotator Assignment: A new table of annotators and documents allows managers to edit annotator assignments on a project level. Previously, annotator assignment could only be edited on a document level.
Adjudicator Assignment: A new table of adjudicators and documents allows managers to edit adjudicator assignments on a project level. Previously, adjudicator assignment could only be edited on a document level.
Sample Context Text: When annotating, Adaptation Studio now displays the full document surrounding each sample in light gray.
Adjudication UI: When adjudicating, Adaptation Studio now displays a table of agreeing and disagreeing annotations.
View Annotations UI: We've updated the layout of the samples list by separating the sample text, annotations, and further information/actions.
Filters Panel: We've added filters for annotations containing comments and assigned annotator(s).
Semantic extractor match threshold: You can now set the value for the semantic extractor match threshold when creating a new events project. This value can also be changed after the project is created.
User management: Admin Settings has been renamed User Management.
Export ADM improvements: You can now export ADMs for all uploaded samples, even if they have no annotations.
Bug Fixes
Tentative extractors no longer show up as tentative once resolved.
Undo on the Adjudicate page returns to the last saved state instead of clearing all annotations.
Export ADM no longer overwrites records with the same name.
The key phrase is now deleted when an event type is deleted.
Morphological extractors now correctly extract values that were previously not extracted.
Multi-token candidates are now captured correctly.
Exported NER model info now contains the correct precision numbers.
When clearing annotations, all selected annotations are now cleared.
Known Issues
Adjudication is not supported for events models.
Events Training Server Release 1.0.0
April 2022
Bug Fixes
Model file upload now checks that the custom profile is installed.
Endpoints now check for an installed language license.
Multiple events with the same key phrase are no longer extracted.
Events lacking required roles are no longer extracted.
Release 0.9.5.4
March 2022
This release contains all components of Rosette Model Training Suite.
Versions included in this release:
Rosette Server: 1.21.0
Events Training Server: 0.0.27.0/0.8.12 (Java/Python Servers)
REX Training Server: 1.0.2
Adaptation Studio: 0.9.5.4
Installation Instructions
This release should be installed in an empty directory. Complete installation instructions are in the System Administrator Guide.
To transfer existing projects and models from a previous release, you will need both the old and the new release installation directories.
Adaptation Studio project data is stored in the ras/mongo_data_db directory. To transfer existing projects, copy the ras/mongo_data_db directory from the old installation directory to the new ras/mongo_data_db directory.
REX Training Server (RTS) data is stored in the rts/workspaces directory. To transfer existing RTS data, copy the rts/workspaces directory from the old installation directory to the new rts/workspaces directory.
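The transfer steps above amount to copying two directories from one installation tree to the other. The following is a minimal sketch, with these assumptions:
- Both installation directories are reachable from the same machine.
- The services in both installations have been stopped first, so the MongoDB files are not written during the copy. The release notes do not state this requirement; it is assumed here as a precaution.
- The function name transfer_data is hypothetical.

```python
import shutil
from pathlib import Path

# Data directories named in these release notes, relative to an installation directory.
DATA_DIRS = ("ras/mongo_data_db", "rts/workspaces")


def transfer_data(old_install: Path, new_install: Path) -> list[Path]:
    """Copy each data directory from the old installation into the new one.

    Files that already exist in the destination are overwritten
    (dirs_exist_ok=True requires Python 3.8+). Returns the destinations written.
    """
    copied = []
    for rel in DATA_DIRS:
        src = old_install / rel
        if not src.is_dir():
            continue  # this component has no data to transfer
        dst = new_install / rel
        shutil.copytree(src, dst, dirs_exist_ok=True)
        copied.append(dst)
    return copied
```

For example, transfer_data(Path("/opt/mts-old"), Path("/opt/mts-new")) would copy both directories, where the two paths are placeholders for your own installation directories.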
New
Sample events project: The release now includes a sample events project. Use the import_project.sh script to load the sample project. Section 4.3 in the System Administrator Guide has complete instructions for the import_project.sh command.
Security Updates
REX Training Server security update: Log4j has been removed from RTS.
ETS security update: spring boot, spring cloud, spring doc, and xstream have been upgraded to remove vulnerabilities.
Documentation Updates
Required endpoints: Added section 2.3.1 to the System Administrator Guide detailing required Rosette Server endpoints for events and entities.
Enabling ETS log files: Added section 2.9.4 describing how to enable ETS log files.
Bug Fixes
The ETS swagger port is now correct in the [Try Me] calls.
The timeout for server-sent events from ETS to RAS was extended to prevent timeouts in long-lived sessions.
The update-rs-configuration.sh script now honors the HTTP scheme of the eventTrainingServerUrl setting in the event-extractor-factory-config.yaml configuration file.
Exact extractors and tentative extractors are now configured correctly when given multiple tokens (words). This includes tentative extractors configured during annotation of documents. Previously created extractors should be recreated if they include multiple tokens.
Duplicate events are no longer extracted from the same key phrase when there are multiple matching extractors.
Accuracy has been improved when evaluating text with uncommon words.
Known Issues
Entity model download: Named entity extraction (REX) model export does not work in an air-gapped installation where SSL certificates are not installed. To download in an air-gapped environment, the SSL certificates must be installed.
Release 0.9.5.3
February 2022
This release contains all components of Rosette Model Training Suite. Only Adaptation Studio has been updated from the 0.9.5.2 release.
Versions included in this release:
Rosette Server: 1.20.4
Events Training Server: 0.0.26.1/0.8.8 (Java/Python Servers)
REX Training Server: 1.0.1
Adaptation Studio: 0.9.5.3
Installation Instructions
This release should be installed in an empty directory. Complete installation instructions are in the System Administrator Guide.
To transfer existing projects and models from a previous release, you will need both the old and the new release installation directories.
Adaptation Studio project data is stored in the ras/mongo_data_db directory. To transfer existing projects, copy the ras/mongo_data_db directory from the old installation directory to the new ras/mongo_data_db directory.
REX Training Server (RTS) data is stored in the rts/workspaces directory. To transfer existing RTS data, copy the rts/workspaces directory from the old installation directory to the new rts/workspaces directory.
New
The Adaptation Studio User Guide has been updated to include a note about disabling popup blocking on Chrome to allow multiple files to download.
Named entity extraction (NER) model training is now supported for Hebrew.
Bug Fixes
The model information file (rts-model.<modelname>-LE.bin.export-info) is now downloaded when the model is exported.
Known Issues
When adjudicating entity annotations, the Skip for now option skips only a single item.
Release 0.9.5.2
February 2022
This release is a complete reinstall to upgrade from 0.9.5.0.
New
Versions included in this release:
Rosette Server: 1.20.4
Events Training Server: 0.0.26.1/0.8.8 (Java/Python Servers)
REX Training Server: 1.0.1
Adaptation Studio: 0.9.5.2
Log4j updates: Updated log4j to version 2.17.1 to implement fixes for the vulnerabilities identified in CVE-2021-44832.
Enhanced logging for Events Training Server:
Application properties now use the standard logger instead of standard out.
2022-01-06 14:04:19.387 ...[omitted]... : *********** ETS Properties ***********
2022-01-06 14:04:19.387 ...[omitted]... : version: 0.0.26.1
2022-01-06 14:04:19.387 ...[omitted]... : build: 2022-01-05 19:18:45
2022-01-06 14:04:19.387 ...[omitted]... : ets.mode: training
2022-01-06 14:04:19.387 ...[omitted]... : ets.ssl.enable-outgoing-ssl: false
2022-01-06 14:04:19.387 ...[omitted]... : ets.ssl.key-store: /ets/certs/mbp-thubb-2915.basistech.net.jks
2022-01-06 14:04:19.387 ...[omitted]... : ets.ssl.key-store file found
2022-01-06 14:04:19.387 ...[omitted]... : ets.ssl.key-store-password: *******
2022-01-06 14:04:19.387 ...[omitted]... : ets.trust-store: /etc/certs/basisca/basistruststore.jks
2022-01-06 14:04:19.387 ...[omitted]... : ets.trust-store file found
2022-01-06 14:04:19.387 ...[omitted]... : ets.trust-store-password: *******
2022-01-06 14:04:19.387 ...[omitted]... : ets.rsUrl (Rosette Server URL): http://memento.basistech.net:8181/rest/v1
2022-01-06 14:04:19.388 ...[omitted]... : ets.pets.connectionTimeoutMS: 60000
2022-01-06 14:04:19.388 ...[omitted]... : ets.pets.readTimeoutMS: 60000
2022-01-06 14:04:19.388 ...[omitted]... : ets.pets.writeBufferSizeKB: 200
2022-01-06 14:04:19.388 ...[omitted]... : ets.pets.minimumVersion: v0.8.7
2022-01-06 14:04:19.388 ...[omitted]... : *********** Done ETS Properties **********************
We've reduced the logging verbosity around worker management.
Configuration information is now sent to the logs at startup.
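Because ETS now writes its startup properties through the standard logger, they can be recovered from a log file after the fact. The following is a minimal sketch, with these assumptions:
- Every log line separates its prefix from the message with " : ", as in the excerpt above. The prefix itself is elided there, so nothing else about it is assumed.
- Only "key: value" messages between the two banner lines are collected.
- The function name parse_ets_properties is hypothetical.

```python
import re

# Everything after the first whitespace-colon-whitespace is treated as the message.
MESSAGE_RE = re.compile(r"\s:\s(?P<msg>.*)$")
START_BANNER = "ETS Properties"
END_BANNER = "Done ETS Properties"


def parse_ets_properties(log_lines):
    """Collect 'key: value' messages logged between the ETS Properties banners."""
    props = {}
    inside = False
    for line in log_lines:
        match = MESSAGE_RE.search(line)
        if not match:
            continue
        msg = match.group("msg").strip()
        if END_BANNER in msg:  # check the end banner first; it contains START_BANNER
            break
        if START_BANNER in msg:
            inside = True
            continue
        # Messages such as "ets.ssl.key-store file found" have no ": " and are skipped.
        if inside and ": " in msg:
            key, _, value = msg.partition(": ")
            props[key.strip()] = value.strip()
    return props
```

Applied to the excerpt above, this would yield entries such as ets.mode mapped to training. It would also yield ets.rsUrl (Rosette Server URL) mapped to the full Rosette Server URL.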
Improved performance:
Event extraction runtime speed and memory usage have been significantly improved for documents containing multiple events.
The number of network calls between components due to eureka registrations has been reduced.
Improved upgrades: The version information for the Events Training Server has been internalized, allowing most upgrades to consist of re-releasing container images only.
Events guidance updates: The documentation now provides guidance for events modeling.
For event extraction, you may only need a few hundred training samples. For performance reasons, we recommend a maximum of 1000 training samples. We also recommend a mix of positive and negative samples. A negative sample is one where the key phrase is not an example of an event you'd like to extract. The exact number will depend on how ambiguous the key phrase is; a more ambiguous key phrase will require more negative examples. At the most, 10% of the samples should be negative examples.
Input documents for event extraction should be no larger than 4K characters.
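The events guidance above can be expressed as a few checks to run before training or extraction. The following is a minimal sketch, with these assumptions:
- "4K characters" means 4,000 characters.
- Each sample carries a flag marking it as negative.
- The Sample type and the function names are hypothetical.

```python
from dataclasses import dataclass

# Limits from the events guidance. Reading "4K characters" as 4,000
# characters is an assumption; adjust MAX_DOCUMENT_CHARS if needed.
MAX_TRAINING_SAMPLES = 1000
MAX_NEGATIVE_FRACTION = 0.10
MAX_DOCUMENT_CHARS = 4000


@dataclass
class Sample:
    text: str
    is_negative: bool  # True when the key phrase is NOT an event to extract


def check_training_set(samples: list[Sample]) -> list[str]:
    """Return warnings for a training set; an empty list means it fits the guidance."""
    warnings = []
    total = len(samples)
    if total > MAX_TRAINING_SAMPLES:
        warnings.append(f"{total} samples exceeds the recommended maximum of {MAX_TRAINING_SAMPLES}")
    negatives = sum(1 for s in samples if s.is_negative)
    if total and negatives == 0:
        warnings.append("no negative samples; a mix of positive and negative samples is recommended")
    elif total and negatives / total > MAX_NEGATIVE_FRACTION:
        warnings.append(f"{negatives} of {total} samples are negative (more than 10%)")
    return warnings


def fits_extraction_limit(document: str) -> bool:
    """True if an input document is within the recommended size for event extraction."""
    return len(document) <= MAX_DOCUMENT_CHARS
```

The warnings are advisory only. As the guidance notes, the right number of negative samples depends on how ambiguous the key phrase is.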
Bug Fixes
Chinese, Japanese, and Korean are now working correctly for event extraction.
The unused sanity_check.sh script has been removed from the release. It was no longer relevant.
Spurious errors no longer appear in Adaptation Studio logs.
Events Training Server no longer throws an occasional NullPointerException during worker registration.
The debug log for Events Training Server now contains object descriptions instead of object addresses.
Release 0.9.5.1
January 2022
New
Versions included in this release:
Rosette Server: 1.20.3
Events Training Server: 0.0.26.1/0.8.8 (Java/Python Servers)
REX Training Server: 1.0.1
Adaptation Studio: 0.9.5.1
Enhanced logging for Events Training Server:
Application properties now use the standard logger instead of standard out.
2022-01-06 14:04:19.387 ...[omitted]... : *********** ETS Properties ***********
2022-01-06 14:04:19.387 ...[omitted]... : version: 0.0.26.1
2022-01-06 14:04:19.387 ...[omitted]... : build: 2022-01-05 19:18:45
2022-01-06 14:04:19.387 ...[omitted]... : ets.mode: training
2022-01-06 14:04:19.387 ...[omitted]... : ets.ssl.enable-outgoing-ssl: false
2022-01-06 14:04:19.387 ...[omitted]... : ets.ssl.key-store: /ets/certs/mbp-thubb-2915.basistech.net.jks
2022-01-06 14:04:19.387 ...[omitted]... : ets.ssl.key-store file found
2022-01-06 14:04:19.387 ...[omitted]... : ets.ssl.key-store-password: *******
2022-01-06 14:04:19.387 ...[omitted]... : ets.trust-store: /etc/certs/basisca/basistruststore.jks
2022-01-06 14:04:19.387 ...[omitted]... : ets.trust-store file found
2022-01-06 14:04:19.387 ...[omitted]... : ets.trust-store-password: *******
2022-01-06 14:04:19.387 ...[omitted]... : ets.rsUrl (Rosette Server URL): http://memento.basistech.net:8181/rest/v1
2022-01-06 14:04:19.388 ...[omitted]... : ets.pets.connectionTimeoutMS: 60000
2022-01-06 14:04:19.388 ...[omitted]... : ets.pets.readTimeoutMS: 60000
2022-01-06 14:04:19.388 ...[omitted]... : ets.pets.writeBufferSizeKB: 200
2022-01-06 14:04:19.388 ...[omitted]... : ets.pets.minimumVersion: v0.8.7
2022-01-06 14:04:19.388 ...[omitted]... : *********** Done ETS Properties **********************
We've reduced the logging verbosity around worker management.
Configuration information is now sent to the logs at startup.
Improved performance:
Event extraction runtime speed and memory usage have been significantly improved for documents containing multiple events.
The number of network calls between components due to eureka registrations has been reduced.
Improved upgrades: The version information for the Events Training Server has been internalized, allowing most upgrades to consist of re-releasing container images only.
Events guidance updates: The documentation now provides guidance for events modeling.
For event extraction, you may only need a few hundred training samples. For performance reasons, we recommend a maximum of 1000 training samples. We also recommend a mix of positive and negative samples. A negative sample is one where the key phrase is not an example of an event you'd like to extract. The exact number will depend on how ambiguous the key phrase is; a more ambiguous key phrase will require more negative examples. At the most, 10% of the samples should be negative examples.
Input documents for event extraction should be no larger than 4K characters.
Bug Fixes
Chinese, Japanese, and Korean are now working correctly for event extraction.
The unused sanity_check.sh script has been removed from the release. It was no longer relevant.
Spurious errors no longer appear in Adaptation Studio logs.
Events Training Server no longer throws an occasional NullPointerException during worker registration.
The debug log for Events Training Server now contains object descriptions instead of object addresses.
Upgrade Instructions
Use the following instructions to upgrade from release 0.9.5 to 0.9.5.1.
Load the new Rosette Server image. On each machine running Rosette Server, run:
docker load < rosette-server-enterprise-cp-1.20.3.tar.gz
Edit the Rosette Server configuration. On each machine running Rosette Server, perform the following steps:
Edit the file <install_dir>/rs/rs-docker/.env, updating the value of ROSETTE_SERVER_IMAGE:
# Rosette Server information
ROSETTE_SERVER_IMAGE=rosette/server-enterprise-cp:1.20.3
Edit the file <install_dir>/rs/config/com.basistech.ws.transport.embedded.cfg, setting the value of workerThreadCount to 4. This will improve system performance:
# workerThreadCount is the number of threads that are created to do the actual work
# in the embedded local transport, for worker residing in the same machine. Default is 2.
# It is probably best to not go above 2-3x the number of physical cores on the host machine.
workerThreadCount=4
Restart the server. From the <install_dir>/rs/rs-docker directory:
docker-compose down
docker-compose up
Load the new ETS images. On each machine running ETS, run:
docker load < python-events-training-server-v0.8.8.tar.gz
docker load < events-training-server-0.0.26.1.tar.gz
Edit the ETS configuration. On each machine running ETS, perform the following steps:
Edit the file <install_dir>/ets/ets-docker/.env, updating the values of ETS_IMAGE and PETS_IMAGE:
ETS_IMAGE=events-training-server:0.0.26.1
PETS_IMAGE=python-events-training-server:v0.8.8
Edit the file <install_dir>/ets/config/application.yml to remove the version and build properties from the info.app section. The version information is now contained in the container itself.
info:
  app:
    name: "Rosette Events Training Server"
    description: "Rosette Event Extraction and Training Server"
    version: "x.y.x"   ← remove this line
    build: "1234"      ← remove this line
To always restart the proxy image, edit the file <install_dir>/ets/ets-docker/docker-compose.yml, adding the line restart: always to the proxy service:
proxy:
  restart: always
Note: you must match the indentation of the yml file and use spaces, not tabs.
Restart the server. From the <install_dir>/ets/ets-docker directory:
docker-compose down
docker-compose up
Verify the new ETS images. In the <install_dir>/ets/ets-docker directory, run:
docker-compose images
The output should be similar to the following (note the updated tags):
Container       Repository                      Tag        Image Id       Size
------------------------------------------------------------------------------------
ets-server_1    events-training-server          0.0.26.1   e02dda278f04   485 MB
pets-worker_1   python-events-training-server   v0.8.8     3fc2c8685380   1.786 GB
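The verification step above can be automated by comparing the tags in the docker-compose images output with the ETS_IMAGE and PETS_IMAGE values set earlier. The following is a minimal sketch, with these assumptions:
- The output has whitespace-separated columns in the order shown in the sample output (Container, Repository, Tag, Image Id, Size).
- The function names are hypothetical.

```python
from typing import Dict, Optional

# Tags expected after the 0.9.5.1 upgrade, from the ETS_IMAGE and PETS_IMAGE values.
EXPECTED_TAGS = {
    "events-training-server": "0.0.26.1",
    "python-events-training-server": "v0.8.8",
}


def parse_compose_images(output: str) -> Dict[str, str]:
    """Map repository -> tag from tabular `docker-compose images` output.

    Assumes whitespace-separated columns ordered Container, Repository, Tag,
    Image Id, Size, as in the sample output.
    """
    tags = {}
    for line in output.splitlines():
        fields = line.split()
        # Skip blank lines, the dashed separator, and the header row.
        if len(fields) < 3 or fields[0].lower() == "container":
            continue
        tags[fields[1]] = fields[2]
    return tags


def tag_mismatches(output: str) -> Dict[str, Optional[str]]:
    """Return {repository: actual tag or None} for every image not at the expected tag."""
    found = parse_compose_images(output)
    return {
        repo: found.get(repo)
        for repo, expected in EXPECTED_TAGS.items()
        if found.get(repo) != expected
    }
```

To use it, pass the captured output of the command, for example subprocess.run(["docker-compose", "images"], capture_output=True, text=True).stdout. An empty result means both images are at the expected tags.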
Release 0.9.5
December 2021
New
SSL support: SSL is supported for all servers.
Inference renamed to extraction: Inference mode for events has been renamed to extraction mode for clarity, based on feedback from users.
Improved install: The install scripts have been improved, including the addition of installation log files.
Improved model upload script: The script to upload an events model for extraction (ets-upload-model.sh) now checks if the model already exists on the server. If it does, the user can choose whether to replace the model.
New export model info file: An information file is downloaded along with the model when a model is exported. This is for both event and entity models.
Superuser password: We improved the dialogue around setting the RAS superuser password. You are prompted to reset it on the first login.
RAS Password security: Setting of passwords is now more secure.
Single ETS install file: The same ETS install file (install-ets.sh) supports both the training and extraction modes of ETS. The script prompts the user for the mode (training or extraction).
IAA: Inter-Annotator Agreement (IAA) is supported for event annotation.
Help: The RAS help file has been updated.
Versions included in this release:
Rosette Server: 1.20.0
REX Training Server: 1.0.1
Events Training Server: 0.0.25.5
Adaptation Studio: 0.9.5
Bug Fixes
Samples of 6 words or fewer now return role mentions from entity extractors.
Tentative role extractors are now created as exact extractors. Previously, they were created as morphological extractors. Tentative extractors created for key phrases are still morphological.
Exact extractors now support values with multiple tokens.
Required roles are enforced when extracting event mentions.
Clone project is now supported for events projects.
When using the plan option to query multiple event models, only the models listed in the plan are queried.
Calls to the /events endpoint on Rosette Server now extract events that use entity extractors from custom profiles.
The event server healthcheck endpoint (/ets/health) reports correct status.
ETS no longer runs out of memory with large extraction documents.
Numerous enhancements and bug fixes have been completed.
Known Issues
Adjudication is not supported for event annotation.
Event models from previous releases cannot be opened in this release. Contact Support for assistance if you have old schema or events models you would like to convert.
The headless install capability of ETS, RTS, RS, and RAS is not yet supported.
The RAS healthcheck script reports a false error message of unknown container on network.
When a sample contains multiple event mentions, the order of the mentions in the sample may impact which samples are extracted.
Release 0.9.4
November 2021
New
Multiple event models in a single call: The /events endpoint in Rosette Server supports event extraction from a single model, multiple specified models, or all loaded models. This is documented in section 7.3 of the System Administrator Guide.
Custom profile schema support: When creating an events schema template, a custom profile can be selected. This allows a project schema based on the template to use custom entity extractors.
Events metrics: The Project dashboard now displays event metrics (precision, recall, and F1) for events projects.
Documentation enhanced and reorganized: The Model Training Suite documentation set now consists of 3 documents:
Adaptation Studio User Guide
A guide for the managers and annotators using Adaptation Studio describing how to use the tool to create and maintain projects, annotate and train entity and event extraction models, and create event schemas.
Developing Models
A guide for the system architects and model administrators to aid in defining the modeling strategy and understanding the theory of model training. It includes an explanation of event modeling and how to design an event schema in preparation for training event extraction models, as well as guidelines for gathering and preparing data for model training.
System Administrator Guide
A guide for installing and maintaining both the training and production environments of the Rosette Model Training Suite. Included are instructions for moving trained models from the training environment into the production environment, as well as the documentation for the API calls for entity and event extraction. This includes the content previously in the Deploying models guide.
Versions included in this release:
Rosette Server: 1.19.5
REX Training Server: 1.0.1
Events Training Server: 0.0.25.4
Adaptation Studio: 0.9.4
Bug Fixes
The exported events model is now named <workspaceid>-LE.ets-model.
Numerous enhancements and bug fixes have been completed.
Known Issues
Event models from previous releases cannot be opened in this release. Contact Support for assistance if you have old schema or events models you would like to convert.
Samples of 6 words or fewer will not return role mentions from entity extractors.
In this release, tentative role extractors are created as morphological extractors. In the next release, they will be exact extractors. Tentative extractors created for key phrases will remain morphological.
In this release, exact extractors do not support multiple tokens. Multiple tokens will be supported in the next release.
Event mentions may be extracted when a required role is missing.
Clone project is not currently supported for events projects. To copy an events project, export and import the project.
When using the plan option to query multiple event models, all event models are queried, not just the models listed in the plan.
When a sample contains multiple event mentions, the order of the mentions in the sample may impact which samples are extracted.
Calls to the /events endpoint on Rosette Server will not extract any events that use entity extractors when using a custom profile.
SSL is not supported in this release.
The headless install capability of ETS, RTS, RS, and RAS is not yet supported.
Inter-Annotator Agreement (IAA) is not supported for event annotation.
Adjudication is not supported for event annotation.
The help file has not been updated.
When extracting entity mentions from a trained model on the production server, the events server healthcheck endpoint (/ets/health) will return a status of DOWN. This does not occur if a single Rosette Server is used for both training and production.
Release 0.9.3
September 2021
New
Naming: Rosette Model Training Suite refers to the complete set of annotation, training, and extraction tools. This includes the following products:
Adaptation Studio
REX Training Server
Events Training Server
Rosette Server
Event annotation: Adaptation Studio now supports annotating event mentions.
Events endpoints: The /events and /events/info endpoints have been added to Rosette Server to support extraction of event mentions.
Events Training Server: An events training server has been added to support training event extraction models.
Events Inference Server: An events inference server has been added to support extracting event mentions in Rosette Server.
Update Rosette Server: When installing events, the Rosette Server installation must be updated using the rs-configuration-update.sh script. The script must be run on both the training and inference instances of Rosette Server.
REX Training Server Modifications and Improvements
New URL: The base URL is now rts instead of model.
Renamed endpoints: All endpoints that started with /rex (training, annotating, etc.) are now under /workspaces/{workspaceId}. Endpoints that took a workspace ID as part of their POST request or as a request parameter now use the route to specify a workspace.
/rts/workspaces/{workspaceId}/train-model replaced /model/rex/train-model
/rts/workspaces/{workspaceId}/generate-wordclasses replaced /model/rex/generate-wordclasses
/rts/workspaces/{workspaceId}/annotate replaced /model/rex/annotate. The language request parameter is no longer necessary.
/rts/workspaces/{workspaceId} replaced /model/rex/{workspaceID}/status
Serialization improvements: The REX Training Server now starts serializing once annotation has paused.
Training sessions: Only 2 training sessions will occur simultaneously.
Info endpoint: The GET /rts/info/server endpoint now returns the various configuration properties along with the version.
Improved status: The status endpoint for workspaces now includes additional information.
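Client code written against the old routes can be migrated with a small lookup based on the renamed-endpoint list. The following is a minimal sketch covering only the four routes listed; other /model/rex routes are rejected rather than guessed. It assumes that, for the three routes that took the workspace ID in the request body or parameters, the caller supplies the ID separately. The function name migrate_route is hypothetical.

```python
import re
from typing import Optional

# Old routes whose workspace ID was passed in the request; it now goes in the path.
_ID_IN_REQUEST = {
    "/model/rex/train-model": "/rts/workspaces/{ws}/train-model",
    "/model/rex/generate-wordclasses": "/rts/workspaces/{ws}/generate-wordclasses",
    "/model/rex/annotate": "/rts/workspaces/{ws}/annotate",  # language parameter no longer needed
}
# The old status route already carried the workspace ID in the path.
_STATUS_RE = re.compile(r"^/model/rex/(?P<ws>[^/]+)/status$")


def migrate_route(old_path: str, workspace_id: Optional[str] = None) -> str:
    """Translate an old /model/rex route to its /rts/workspaces equivalent."""
    status = _STATUS_RE.match(old_path)
    if status:
        return f"/rts/workspaces/{status.group('ws')}"
    if old_path in _ID_IN_REQUEST:
        if workspace_id is None:
            raise ValueError(f"{old_path} needs a workspace ID in the new API")
        return _ID_IN_REQUEST[old_path].format(ws=workspace_id)
    raise KeyError(f"no known replacement for {old_path}")
```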
Documentation enhanced and reorganized: The Model Training Suite documentation set now consists of 4 documents:
Adaptation Studio User Guide
A guide for the managers, adjudicators, and annotators using Adaptation Studio describing how to use the tool to create and maintain projects, annotate and train entity and event extraction models, and create event schemas.
Developing Models
A guide for the system architects and model administrators to aid in defining the modeling strategy and understanding the theory of model training. It includes an explanation of event modeling and how to design an event schema in preparation for training event extraction models, as well as guidelines for gathering and preparing data for model training.
System Administrator Guide
A guide for installing and maintaining both the training and production environments of the Model Training Suite. Included are instructions for moving trained models from the training environment into the production environment, as well as the documentation for the API calls for entity and event extraction.
Deploying Models
A guide for Rosette model administrators, discussing how to deploy the models generated by the training suite and how to configure Analytics Server for production use. This includes installing a version of Events Training Server for extracting event mentions using previously trained models.
Versions included in this release:
Rosette Server: 1.19.4
REX Training Server: 1.0.1
Events Training Server: 0.0.24
Adaptation Studio: 0.9.3
Bug Fixes
Numerous enhancements and bug fixes have been completed.
Known Issues
Inter-Annotator Agreement (IAA) is not supported for event annotation.
Adjudication is not supported for event annotation.
SSL is not supported in this release.
Event extraction against multiple event models in a single call in Rosette Server is not supported.
All events endpoints and features are supported in English only.
The help file has not been updated.
The User Guide updates for events and new features are still in progress.
The headless install capability of ETS, RTS, RS, and RAS is not yet supported.
Release 0.9.2
November 2020
New
SSL enabled: SSL can now be enabled between all servers. Scripts are provided to enable and disable SSL.
Upgrade script removed: The upgrade-rs-0.8-0.9.sh script has been removed from the installation package. This script only supported upgrading the Rosette Server release when installing 0.9.0.
Rosette Server version: The Rosette Server version is 1.17.5.
Bug Fixes
Rosette Server will no longer generate a null pointer exception when the indocCoref processor is disabled and a simple response is requested.
Known Issues
The healthcheck scripts cannot check connectivity in SSL-enabled environments since they lack access to the cacert, certificates, and keys. The healthcheck scripts can be run after installation but before enabling SSL.
When SSL is initially enabled on Rosette Server, the wrapper.log will print an exception. This exception can be safely ignored and is due to a transport rule containing an http route in an SSL enabled environment.
Release 0.9.1
October 2020
Note
When you reinstall the Adaptation Studio package, all components are reinstalled. Any customizations you make to configuration files should be saved and reapplied after installation.
New
Word classes status: The Manage page now includes the training status of word classes. You can export a model before word classes are available.
Filter improvements: You can now filter on labels in the View Annotations page.
Adjudication counts modified: The Adjudicated field on the Project Dashboard now shows the total number of samples adjudicated, whether manually or automatically.
Case-insensitive models: You can now build case-insensitive models.
Help file: The Help file has been updated to match the User Guide.
Restart policy added: The restart policy was added to all services. The default is restart: "no".
Installation improvements: We've changed the following files in the Adaptation Studio installation.
Renamed services by adding the ras_ prefix (ras_server, ras_proxy) to disambiguate them from other services running on the machine.
Removed the ./config directory from the deployment; it is no longer used.
Improved the docker-compose.yml file by removing incorrect comments and unnecessary volume declarations.
Installing SSL now comments out the unused port rather than deleting the declaration.
Rosette Server version: The Rosette Server version is 1.17.4.
Bug Fixes
Export Model now always downloads the model. Previously the model was not always downloaded as expected.
Installation script no longer creates unused nginx files.
The Rosette Server Entity Extractor no longer modifies the normalized field for mentions that are extracted by a custom processor.
Known Issues
To force a model to retrain (for example, after word classes are completed), you must annotate a couple of samples. Models are trained automatically as you annotate samples.
Usage Note
The ad-suggestions profile deployed with Adaptation Studio is designed for sole use by the Adaptation Studio server. Calling the Rosette Server /entities endpoint using the typical calling conventions and specifying the ad-suggestions profile will produce unpredictable results. This is because the ad-suggestions profile expects hidden parameters to be passed in the call and an RTS model to have been trained beforehand. Additionally, the response differs from a typical response from the /entities endpoint.
The ad-suggestions profile can be modified like any other profile to customize the behavior of Adaptation Studio. You'll need to use a separate "testing" profile to test the changes.
Create a new custom profile for testing.
Modify the profile with the customizations (such as gazetteers) you want to implement.
Test the changes with the regular /entities endpoint, specifying the profile.
Once testing is complete, apply the customizations to the ad-suggestions profile.
Refer to Section 10.2 of the Adaptation Studio User Guide for information on how to create a custom profile.
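The testing step above can be tried from a small script. The sketch below builds (but does not send) a POST request to /entities that names the testing profile. The host, port 8181, the /rest/v1/entities path, the "profileId" request field, and the profile name ad-suggestions-test are assumptions for illustration; confirm them against your Rosette Server documentation before use.

```python
import json
import urllib.request

# Assumed values: the URL, port, path, and "profileId" field are not confirmed by these notes.
ROSETTE_URL = "http://localhost:8181/rest/v1/entities"
TEST_PROFILE = "ad-suggestions-test"  # hypothetical name for the separate testing profile


def build_entities_request(text: str, profile_id: str, url: str = ROSETTE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST to /entities that selects a custom profile."""
    body = json.dumps({"content": text, "profileId": profile_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_entities_request("Bill Gates founded Microsoft.", TEST_PROFILE)
    # Uncomment to send once a Rosette Server is reachable:
    # with urllib.request.urlopen(req) as resp:
    #     print(json.load(resp))
    print(req.get_method(), req.full_url)
```

Keeping the profile name as a parameter makes it easy to point the same test at the testing profile first and at ad-suggestions only after the changes are verified.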
Release 0.9
October 2020
Upgrade Script
Rosette Server does not require a full install when upgrading from 0.8 to 0.9. Only the roots for the additional languages need to be added; there is no change to the rest of the installation.
To upgrade Rosette Server from 0.8 to 0.9:
Stop Rosette Server.
Unzip the file rs-installation-0.9.zip.
From the directory rs-installation-0.9, run the upgrade script: ./upgrade-rs-0.8-0.9.sh
Start Rosette Server.
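A minimal Python sketch of the unzip and run steps, using the file and directory names given above. Stopping and starting Rosette Server are left to the operator because these notes do not give those commands. Python's zipfile module does not restore Unix permission bits on extraction, so the sketch marks the script executable before it can be run; the function names are hypothetical.

```python
import subprocess
import zipfile
from pathlib import Path

# Names taken from the 0.9 upgrade instructions.
ZIP_NAME = "rs-installation-0.9.zip"
INSTALL_DIR = "rs-installation-0.9"
UPGRADE_SCRIPT = "upgrade-rs-0.8-0.9.sh"


def unpack_upgrade(zip_path: Path, dest: Path) -> Path:
    """Extract the 0.9 package and return the path of the upgrade script.

    Stop Rosette Server before calling this, and start it again after the
    script has run.
    """
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    script = dest / INSTALL_DIR / UPGRADE_SCRIPT
    if not script.is_file():
        raise FileNotFoundError(f"{UPGRADE_SCRIPT} not found under {dest / INSTALL_DIR}")
    # zipfile drops permission bits, so make the script executable again.
    script.chmod(script.stat().st_mode | 0o755)
    return script


def run_upgrade(script: Path) -> None:
    """Run the upgrade script from inside rs-installation-0.9, as the steps require."""
    subprocess.run(["./" + script.name], cwd=script.parent, check=True)
```

Typical use is unpack_upgrade(Path(ZIP_NAME), Path(".")) followed by run_upgrade on the returned path, bracketed by stopping and starting the server.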
New
New languages: Adaptation Studio now supports Arabic, Chinese, Korean, and Russian in addition to English and Japanese.
Use Basis Training data: You now have the option of using the Basis training data to augment the stock model or to build a model from scratch with just the annotations provided by the Studio. Note that you cannot add new entity types when using the Basis training data.
UI support for case-sensitive and case-insensitive models: When creating a new project, you now select whether the trained model will be case sensitive or case insensitive. Note: you cannot build case-insensitive models in this release.
User management improvements: Users can now modify their personal information, including password.
Cache management: Dormant models are now automatically ejected from REX Training Server memory.
Healthcheck scripts: Scripts are now provided to check the health of each of the servers.
Status improvements: The Manage page now displays system status, model status, and project status.
Online help available: The User Guide is now available from the Adaptation Studio page.
New labels: Adding and modifying labels has been improved.
Reconcile added: The Finalize task on the project menu has been renamed to Reconcile to more accurately reflect the task performed.
Install improvements: Containers are now zipped, so the install is slightly smaller and the containers load about 20% faster.
Improved error handling: Error handling has been improved in the Rosette Server and REX Training Server installers.
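These notes do not say how RTS decides that a model is dormant. One common approach is idle-time eviction: record when each model was last used and drop those idle longer than a threshold. The sketch below illustrates that idea only; the class name, the threshold, and the injectable clock are hypothetical and are not RTS's implementation.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


class IdleModelCache:
    """Keep models in memory and eject those not used for max_idle seconds.

    Illustrative only: the threshold and this API are hypothetical.
    """

    def __init__(self, max_idle: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.max_idle = max_idle
        self.clock = clock
        self._models: Dict[str, Tuple[object, float]] = {}  # name -> (model, last use time)

    def put(self, name: str, model: object) -> None:
        self._models[name] = (model, self.clock())

    def get(self, name: str) -> Optional[object]:
        entry = self._models.get(name)
        if entry is None:
            return None
        model, _ = entry
        self._models[name] = (model, self.clock())  # refresh the last-use time
        return model

    def evict_dormant(self) -> List[str]:
        """Remove every model idle longer than max_idle and return their names."""
        now = self.clock()
        dormant = [n for n, (_, last) in self._models.items() if now - last > self.max_idle]
        for name in dormant:
            del self._models[name]
        return dormant
```

Injecting the clock keeps the eviction rule easy to test without waiting in real time.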
Bug Fixes
Unused labels can now be deleted, even if they were previously in use.
All errors that occur while loading documents are now displayed.
RTS_URL is now updated in the nginx.conf file after install. The value in the .env file is now used.
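The fix above amounts to reading RTS_URL from .env and writing it into nginx.conf. A minimal sketch of that substitution follows. The ${RTS_URL} placeholder and the simple KEY=VALUE parsing are assumptions; the actual installer may edit nginx.conf differently.

```python
from pathlib import Path
from typing import Dict

# Hypothetical placeholder; the real nginx.conf layout is not described in these notes.
PLACEHOLDER = "${RTS_URL}"


def read_env(path: Path) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines from a .env file, skipping comments and blanks."""
    values: Dict[str, str] = {}
    for line in path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def apply_rts_url(env_path: Path, template: str) -> str:
    """Return nginx.conf text with RTS_URL substituted from the .env file."""
    rts_url = read_env(env_path).get("RTS_URL")
    if not rts_url:
        raise KeyError(f"RTS_URL is not set in {env_path}")
    return template.replace(PLACEHOLDER, rts_url)
```

Failing loudly when RTS_URL is missing avoids writing an nginx.conf that silently points at the wrong server.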
Known Issues
Case sensitive and case insensitive models: You can only build case sensitive models in this release.
Help file: The section Adjudicate in the online Help does not match the section in the User Guide. The User Guide is the latest version.
REX Training Server log file messages: The log file for the REX Training Server may contain multiple train-model request failed messages. These are not actual failures. RTS generates this message when it ignores a new training request from a project that already has a training in progress. RTS is working as expected.
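The behavior behind these log messages can be pictured as a per-project guard: a training request is accepted only when no training is already active for that project, and overlapping requests are dropped. The sketch below models that rule under this assumption; it is not RTS source code, and the class and method names are hypothetical.

```python
import threading
from typing import Set


class TrainingGuard:
    """Allow at most one active training per project; ignore overlapping requests.

    A rejected request corresponds to the point where RTS would log
    "train-model request failed".
    """

    def __init__(self) -> None:
        self._active: Set[str] = set()
        self._lock = threading.Lock()

    def try_start(self, project_id: str) -> bool:
        with self._lock:
            if project_id in self._active:
                return False  # training already running: request ignored, not an error
            self._active.add(project_id)
            return True

    def finish(self, project_id: str) -> None:
        with self._lock:
            self._active.discard(project_id)
```

Because each project is tracked separately, a busy project never blocks training for another project.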
Release 0.8
September 2020
New Features
Japanese is now supported.
The newly-trained models can now be used along with the standard REX extractions to select samples and make label predictions.
Adaptation Studio supports extracting entities by mixing the newly trained model with all other REX extractions, including the standard REX statistical model, gazetteers, and regexes. Which extractions you use depends on your requirements, is configured through the ad-suggestions custom profile, and may require experimentation.
Adaptation Studio now supports model persistence across restarts. If the REX Training Server (RTS) crashes at any point in the annotation process, it recovers by reloading the model from disk into memory.
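Persistence of this kind usually means writing each trained model to disk and reading it back when it is not in memory. The sketch below shows that pattern with an atomic write (a temp file followed by os.replace). The directory layout, the .model suffix, and the bytes model format are hypothetical; RTS's actual storage is not described in these notes.

```python
import os
import tempfile
from pathlib import Path
from typing import Dict, Optional


class PersistentModelStore:
    """Write each trained model to disk so it can be reloaded after a crash.

    A sketch only: file names and model format are hypothetical.
    """

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)
        self._memory: Dict[str, bytes] = {}

    def save(self, project_id: str, model: bytes) -> None:
        # Write to a temp file, then rename, so a crashed process never leaves a half-written model file.
        fd, tmp = tempfile.mkstemp(dir=str(self.root))
        with os.fdopen(fd, "wb") as f:
            f.write(model)
        os.replace(tmp, self.root / f"{project_id}.model")
        self._memory[project_id] = model

    def load(self, project_id: str) -> Optional[bytes]:
        """Return the model from memory, falling back to the copy on disk."""
        if project_id in self._memory:
            return self._memory[project_id]
        path = self.root / f"{project_id}.model"
        if not path.is_file():
            return None
        model = path.read_bytes()
        self._memory[project_id] = model
        return model
```

Creating a second store over the same directory simulates a restart: memory is empty, but the model is recovered from disk.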
Known Issues
The training data for the new model comes only from the annotations provided by the user and does not currently use the existing REX model training data.
We recommend waiting fifteen minutes after the last annotation before downloading the model to ensure all annotations have been incorporated into the training of the model. If you download the model too soon, it may be an earlier version that doesn't include the latest annotations.
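The fifteen-minute recommendation can be expressed as a simple check before exporting. The sketch assumes you know the time of the last annotation; these notes do not say where Adaptation Studio exposes that value, so the function inputs are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# The fifteen-minute wait recommended in the 0.8 release notes.
SETTLE_TIME = timedelta(minutes=15)


def safe_to_export(last_annotation: datetime, now: Optional[datetime] = None) -> bool:
    """Return True once SETTLE_TIME has passed since the last annotation."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_annotation >= SETTLE_TIME


def export_ready_at(last_annotation: datetime) -> datetime:
    """Earliest time at which downloading the model is recommended."""
    return last_annotation + SETTLE_TIME
```

Both timestamps should be timezone-aware (UTC here); subtracting a naive datetime from an aware one raises TypeError.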
Release 0.7
August 2020
Supported languages: English.
This release of Adaptation Studio trains new models for REX as annotations are completed, but the newly-trained models are not used by Active Learning to select samples or make label predictions.
The Export Model option in the Manage page downloads the newly-trained model. Copy this file into Rosette Server to deploy the new model.
The NER-Rosette template is included in this package.