Tokenizer
https://analytics.babelstreet.com/rest/v1/tokens
https://raw.githubusercontent.com/rosette-api/curl-examples/develop/examples/tokens.curl
Tokens identifies and separates each word into one or more tokens through advanced statistical modeling. A token is an atomic element such as a word, number, possessive affix, or punctuation.
The resulting token output minimizes index size, enhances search accuracy, and increases relevancy.
We offer two algorithms for Chinese and Japanese tokenization and morphological analysis. Prior to August 2018, the default algorithm was a perceptron. To return to that algorithm, set modelType
to perceptron
.
Query Parameters
Name | Value | Description |
---|---|---|
output | rosette | Returns the response in ADM format. |
Note
All input parameters, including the text being analyzed and any relevant options, are defined in the request body.
Request
Option | Type | Description | Default |
---|---|---|---|
| string | Model type to use for Thai analyses. Valid values For Korean input without spaces, use |
|
| string | For Hebrew only, determines the disambiguator used. Valid values are |
|
| Boolean | Indicates whether the analyzers should disambiguate the results. |
|
| Boolean | Indicates the input will be queries, likely incomplete sentences. If true, analyzers may change their behavior. |
|
| string | Selects which part of speech tag sets to return. Valid values are |
|
{ "content": "string", "language": "string", "options": { "modelType": "string", "disambiguatorType": "string", "disambiguate": boolean, "query": boolean, "partOfSpeechTagSet": "string" } }
Response
{ "tokens": [ "string" ] }
Supported languages
GET /tokens/supported-languages
Returns the list of supported languages and scripts for the endpoint, along with whether you have a license for the language.
Response
Field | Type | Description |
---|---|---|
| string | ISO 639 language code |
| string | Four-letter ISO-15924 script code |
| boolean | Indicates if you are licensed for this language |
{ "supportedLanguages": [ { "language": "string", "script": "string", "licensed": boolean } ] }