Morphology

https://analytics.babelstreet.com/rest/v1/morphology/{morphoFeature}

https://raw.githubusercontent.com/rosette-api/curl-examples/develop/examples/morphology_complete.curl

The morphological analysis endpoint provides language-specific tools for returning the part of speech, lemma (dictionary form), compound components, and Han readings of each token in the input.

Append a morphoFeature to the morphology/ endpoint URL to specify which feature you want returned, or append complete to return all features.

morphoFeature	Description
complete	Returns all results for all features available for the language of the input text.
lemmas	Returns the lemmas or dictionary forms.
parts-of-speech	Returns a part-of-speech (POS) tag for each token. Each language has its own set of POS tags.
compound-components	Decomposes Chinese, Danish, Dutch, German, Hungarian, Japanese, Korean, Norwegian, and Swedish compounds, returning the lemmas of each of the components. This can improve recall for search engines.
han-readings	For Chinese tokens in Han script, pinyin transcriptions are returned as the Han reading. For Japanese tokens in Han script (kanji), hiragana transcriptions are returned as the Han reading.
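
As a sketch of how the table maps onto URLs, the helper below appends one of the five morphoFeature values to the base URL shown at the top of this page. The helper name morphology_url is hypothetical, and the code only builds the URL. It makes no HTTP call.

```python
# Minimal sketch: build the URL for one morphoFeature.
# BASE_URL and the feature names come from this page; morphology_url is hypothetical.
BASE_URL = "https://analytics.babelstreet.com/rest/v1/morphology"

MORPHO_FEATURES = (
    "complete",
    "lemmas",
    "parts-of-speech",
    "compound-components",
    "han-readings",
)


def morphology_url(feature: str) -> str:
    """Return the endpoint URL for a morphoFeature, rejecting unknown names."""
    if feature not in MORPHO_FEATURES:
        raise ValueError(f"unknown morphoFeature: {feature!r}")
    return f"{BASE_URL}/{feature}"


print(morphology_url("lemmas"))
# https://analytics.babelstreet.com/rest/v1/morphology/lemmas
```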

Do you know the language of your input?

If you know the language of your input, include its three-letter language code in the language field of your request. This speeds up the response.

Otherwise, the endpoint will identify the language automatically.
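
A minimal sketch of both cases follows. It adds the language field only when the language is known and otherwise leaves it out so the endpoint detects it. The helper build_body is hypothetical, and "eng" is the ISO 639-3 code for English.

```python
import json
from typing import Optional


def build_body(content: str, language: Optional[str] = None) -> dict:
    """Build a morphology request body (hypothetical helper)."""
    body = {"content": content}
    if language is not None:
        # Known language: send its three-letter code to speed up the response.
        body["language"] = language
    # Otherwise "language" is left out and the endpoint identifies it.
    return body


print(json.dumps(build_body("The bird saw the worm.", "eng")))
# {"content": "The bird saw the worm.", "language": "eng"}
print(json.dumps(build_body("The bird saw the worm.")))
# {"content": "The bird saw the worm."}
```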

Tip

Try it in the interactive documentation

Complete

https://analytics.babelstreet.com/rest/v1/morphology/complete

Call complete to have Rosette return the lemmas, compound components, Han readings, and part-of-speech tags for the input text in a single request.

The table above shows the features supported for each input language.

Lemmas

https://raw.githubusercontent.com/rosette-api/curl-examples/develop/examples/morphology_lemmas.curl

https://analytics.babelstreet.com/rest/v1/morphology/lemmas

A lemma is the dictionary form of a word. Morphology determines the lemma of each word in the input based on its usage and context.

For example, “saw” can be used as a noun or as a past-tense verb.

In the sentence “The carpenter picked up the saw from the workbench,” the lemma “saw”, a noun, is returned.

However, in the sentence “The bird saw the worm in the shade of the tree,” the lemma “see”, the dictionary form of the verb, is returned.
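
The sketch below finds the tokens whose lemma differs from the surface form. It assumes that tokens and lemmas in the response are parallel arrays of equal length. The sample response is hypothetical: only the "saw" to "see" pair comes from this page, and the other lemmas are assumed.

```python
from typing import List, Tuple


def changed_lemmas(response: dict) -> List[Tuple[str, str]]:
    """Return (token, lemma) pairs whose lemma differs from the token, ignoring case."""
    # Assumes "tokens" and "lemmas" are parallel arrays of the same length.
    return [
        (token, lemma)
        for token, lemma in zip(response["tokens"], response["lemmas"])
        if token.lower() != lemma.lower()
    ]


# Hypothetical response: only "saw" -> "see" comes from this page.
sample = {
    "tokens": ["The", "bird", "saw", "the", "worm"],
    "lemmas": ["the", "bird", "see", "the", "worm"],
}
print(changed_lemmas(sample))  # [('saw', 'see')]
```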

Parts of Speech

https://raw.githubusercontent.com/rosette-api/curl-examples/develop/examples/morphology_parts-of-speech.curl

https://analytics.babelstreet.com/rest/v1/morphology/parts-of-speech

The morphology endpoint grammatically analyzes text to determine the role of each word within the input. The Parts of Speech feature returns a part-of-speech (POS) tag for each word, based on how the word is used in context.

For example, “spoke” can be classified as a noun or a verb: compare “The wheel spoke creaked” (noun) with “She spoke the truth” (verb).
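
A short sketch of consuming parts-of-speech output follows: it selects the tokens that carry a given tag. It assumes tokens and posTags are parallel arrays. The tag strings in the sample ("PRON", "VERB", and so on) are assumed for illustration and are not copied from an actual response.

```python
from typing import List


def tokens_with_tag(response: dict, tag: str) -> List[str]:
    """Return the tokens whose part-of-speech tag equals tag."""
    # Assumes "tokens" and "posTags" are parallel arrays.
    return [tok for tok, pos in zip(response["tokens"], response["posTags"]) if pos == tag]


# Hypothetical response; the tag strings are assumed.
sample = {
    "tokens": ["She", "spoke", "the", "truth"],
    "posTags": ["PRON", "VERB", "DET", "NOUN"],
}
print(tokens_with_tag(sample, "VERB"))  # ['spoke']
```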

Compound Components

https://analytics.babelstreet.com/rest/v1/morphology/compound-components

With the decompounding feature, compound words are broken into sub-components and returned as individual elements. This is useful for increasing search relevancy in languages such as German and Korean.

For example, the German compound word “Rechtsschutzversicherungsgesellschaften” means “legal expenses insurance companies”.

Morphology returns the compound components “Recht”, “Schutz”, “Versicherung”, and “Gesellschaft”.
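
To illustrate the search-recall use case, this sketch indexes each token together with its components. It assumes compoundComponents holds one list per token, which is empty or null for non-compounds. That shape is an assumption to check against a real response, because the response schema below only shows an array of strings.

```python
from typing import List


def index_terms(response: dict) -> List[str]:
    """Expand each token with its compound components, e.g. for a search index."""
    terms = []
    for token, parts in zip(response["tokens"], response["compoundComponents"]):
        terms.append(token)
        # Assumed shape: a list of components per token, empty or None if not a compound.
        terms.extend(parts or [])
    return terms


sample = {
    "tokens": ["Rechtsschutzversicherungsgesellschaften"],
    "compoundComponents": [["Recht", "Schutz", "Versicherung", "Gesellschaft"]],
}
print(index_terms(sample))
# ['Rechtsschutzversicherungsgesellschaften', 'Recht', 'Schutz', 'Versicherung', 'Gesellschaft']
```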

Han Readings

https://analytics.babelstreet.com/rest/v1/morphology/han-readings

The Han readings feature provides pronunciation information for Han script in both Chinese and Japanese input text. The Han readings returned depend on the algorithm selected.

For Chinese tokens in Han script, by default pinyin transcriptions are returned using diacritics. Multiple possible readings may be returned for each word. If you call morphology with “美国大选中的”, it returns these Han readings: [[“měiguó”], [“dàxuǎn”], [“zhōng”, “zhòng”], [“de”, “di”, “dī”, “dí”, “dì”]]. Note that the last two tokens each contain multiple possible readings.

For Japanese tokens in Han script (kanji), by default hiragana transcriptions are returned. If you call morphology with “医療番組”, it returns these Han readings: “いりょう”, “ばんぐみ”.
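
Using the Chinese readings listed above, this sketch finds the tokens with more than one candidate reading and, as a naive fallback, keeps the first listed reading per token. This page does not say the candidates are ranked, so treat "first" as an arbitrary choice. Both helper names are hypothetical.

```python
from typing import List, Optional

# Han readings for "美国大选中的", copied from the example above.
readings = [["měiguó"], ["dàxuǎn"], ["zhōng", "zhòng"], ["de", "di", "dī", "dí", "dì"]]


def ambiguous_positions(han_readings: List[Optional[List[str]]]) -> List[int]:
    """Return the indexes of tokens that have more than one possible reading."""
    return [i for i, options in enumerate(han_readings) if options and len(options) > 1]


def first_readings(han_readings: List[Optional[List[str]]]) -> List[Optional[str]]:
    """Naively keep the first listed reading per token (None when there is none)."""
    return [options[0] if options else None for options in han_readings]


print(ambiguous_positions(readings))  # [2, 3]
print(first_readings(readings))  # ['měiguó', 'dàxuǎn', 'zhōng', 'de']
```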

To use the perceptron algorithm instead of the default algorithm, set the modelType option to perceptron.
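
A request body that selects the perceptron model might be built as follows. The helper name is hypothetical. "zho" is the ISO 639-3 code for Chinese, but confirm the exact code against the supported-languages list.

```python
import json


def perceptron_request(content: str, language: str) -> dict:
    """Hypothetical helper: request body that selects the perceptron model."""
    return {
        "content": content,
        "language": language,
        "options": {"modelType": "perceptron"},
    }


# "zho" is assumed to be the code this endpoint expects for Chinese.
body = perceptron_request("美国大选中的", "zho")
print(json.dumps(body, ensure_ascii=False))
# {"content": "美国大选中的", "language": "zho", "options": {"modelType": "perceptron"}}
```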

For Chinese tokens, the perceptron algorithm returns one pinyin transcription per token, with tones marked by digits. If you call morphology with “美国大选中的”, it returns these Han readings: “Mei3-guo2”, “da4-xuan3”, “zhong1”, “de0”.

For Japanese tokens, the perceptron algorithm returns katakana transcriptions. If you call morphology with “医療番組”, it returns these Han readings: “イリョウ”, “バングミ”.
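
The digit-toned readings can be split into syllable and tone pairs as sketched below. The format is inferred from the examples above: letters followed by one tone digit, 0 for the neutral tone as in "de0", and hyphens between syllables. Treat it as an assumption rather than a specification.

```python
import re
from typing import List, Tuple

# Assumed format: letters followed by one tone digit (0 = neutral tone, as in "de0");
# syllables within a word are joined by hyphens.
SYLLABLE = re.compile(r"([A-Za-zü]+)([0-5])")


def split_tones(reading: str) -> List[Tuple[str, int]]:
    """Split a digit-toned reading such as 'Mei3-guo2' into (syllable, tone) pairs."""
    return [(m.group(1), int(m.group(2))) for m in SYLLABLE.finditer(reading)]


print(split_tones("Mei3-guo2"))  # [('Mei', 3), ('guo', 2)]
print(split_tones("de0"))  # [('de', 0)]
```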

Query Parameters

Name	Value	Description
output	rosette	Returns the response in ADM format.
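
The output parameter goes in the query string rather than the request body. A minimal sketch using the standard library's urlencode follows (the with_query helper is hypothetical):

```python
from urllib.parse import urlencode


def with_query(url: str, **params: str) -> str:
    """Append query parameters such as output=rosette to an endpoint URL."""
    return f"{url}?{urlencode(params)}" if params else url


print(with_query("https://analytics.babelstreet.com/rest/v1/morphology/complete", output="rosette"))
# https://analytics.babelstreet.com/rest/v1/morphology/complete?output=rosette
```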

Note

All input parameters, including the text being analyzed and any relevant options, are defined in the request body.

Request

Name	Type	Description
`content`	string	Text to process
`contentUri`	string	URI to accessible content
`language`	string	ISO 639 language code

Notice

content and contentUri are mutually exclusive; only one can be specified per call.
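
A client-side check for this rule could look like the sketch below. It also rejects a body with neither field, on the assumption that every request needs one input source. The function name is hypothetical.

```python
def validate_input(body: dict) -> None:
    """Raise ValueError unless exactly one of content / contentUri is present."""
    has_content = "content" in body
    has_uri = "contentUri" in body
    if has_content and has_uri:
        raise ValueError("content and contentUri are mutually exclusive")
    if not (has_content or has_uri):
        # Assumption: a request needs one input source.
        raise ValueError("specify content or contentUri")


validate_input({"content": "She spoke the truth"})  # passes silently
```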

Option	Type	Description	Default
`modelType`	string	Model type to use for Thai analyses. Valid values are `default`, `perceptron`, and `DNN`. For Korean input without spaces, use `DNN`.	`default`
`disambiguatorType`	string	For Hebrew only, determines the disambiguator used. Valid values are `perceptron`, `DNN`, `dictionary`.	`perceptron`
`disambiguate`	Boolean	Indicates whether the analyzers should disambiguate the results.	`true`
`query`	Boolean	Indicates that the input consists of queries, which are likely to be incomplete sentences. If true, analyzers may change their behavior.	`false`
`partOfSpeechTagSet`	string	Selects which part-of-speech tag set to return. Valid values are `basis` and `upt16`.	`upt16`

{
  "content": "string",
  "language": "string",
  "options": {
    "modelType": "string",
    "disambiguatorType": "string",
    "disambiguate": boolean,
    "query": boolean,
    "partOfSpeechTagSet": "string"
  }
}
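
Before sending a request, a client could check its options against the values listed in the table above, as sketched here. The allowed values are copied from the table. Exact, case-sensitive matching and the rejection of undocumented option names are assumptions of this sketch.

```python
from typing import Dict, List

# Allowed values copied from the options table above.
ALLOWED = {
    "modelType": {"default", "perceptron", "DNN"},
    "disambiguatorType": {"perceptron", "DNN", "dictionary"},
    "partOfSpeechTagSet": {"basis", "upt16"},
}
BOOLEAN_OPTIONS = {"disambiguate", "query"}


def check_options(options: Dict[str, object]) -> List[str]:
    """Return a list of problems found in an options object (empty if none)."""
    problems = []
    for name, value in options.items():
        if name in ALLOWED:
            if value not in ALLOWED[name]:
                problems.append(f"{name}: unsupported value {value!r}")
        elif name in BOOLEAN_OPTIONS:
            if not isinstance(value, bool):
                problems.append(f"{name}: expected a Boolean, got {value!r}")
        else:
            problems.append(f"{name}: not a documented option")
    return problems


print(check_options({"modelType": "perceptron", "disambiguate": False}))  # []
```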

Response

{
  "tokens": [
    "string"
  ],
  "posTags": [
    "string"
  ],
  "lemmas": [
    "string"
  ],
  "compoundComponents": [
    "string"
  ],
  "hanReadings": [
    "string"
  ]
}
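
This sketch regroups the response into one record per token. It assumes that each feature array is aligned index-by-index with tokens and that a field is absent when its feature was not requested. Neither point is stated on this page. The sample values are hypothetical.

```python
from typing import Dict, List

PER_TOKEN_FIELDS = ("posTags", "lemmas", "compoundComponents", "hanReadings")


def token_rows(response: Dict) -> List[Dict]:
    """Combine the parallel response arrays into one record per token."""
    rows = []
    for i, token in enumerate(response["tokens"]):
        row = {"token": token}
        for field in PER_TOKEN_FIELDS:
            values = response.get(field)
            if values is not None:  # assumed absent when the feature was not requested
                row[field] = values[i]
        rows.append(row)
    return rows


# Hypothetical lemmas/parts-of-speech data; the tag strings are assumed.
sample = {"tokens": ["She", "spoke"], "posTags": ["PRON", "VERB"], "lemmas": ["she", "speak"]}
for row in token_rows(sample):
    print(row)
# {'token': 'She', 'posTags': 'PRON', 'lemmas': 'she'}
# {'token': 'spoke', 'posTags': 'VERB', 'lemmas': 'speak'}
```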

Supported languages

GET /morphology/supported-languages

Returns the list of supported languages and scripts for the endpoint, along with whether you have a license for the language.

Response

Field	Type	Description
`language`	string	ISO 639 language code
`script`	string	Four-letter ISO 15924 script code
`licensed`	boolean	Indicates if you are licensed for this language

{
  "supportedLanguages": [
    {
      "language": "string",
      "script": "string",
      "licensed": boolean
    }
  ]
}
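
A minimal sketch of reading this response follows. It keeps the language and script pairs you are licensed for, sorted for stable output. The sample entries are hypothetical and are not taken from the actual supported-languages list.

```python
from typing import List, Tuple


def licensed_languages(response: dict) -> List[Tuple[str, str]]:
    """Return sorted (language, script) pairs whose licensed flag is true."""
    return sorted(
        (entry["language"], entry["script"])
        for entry in response["supportedLanguages"]
        if entry["licensed"]
    )


# Hypothetical response for illustration only.
sample = {
    "supportedLanguages": [
        {"language": "eng", "script": "Latn", "licensed": True},
        {"language": "jpn", "script": "Jpan", "licensed": False},
        {"language": "deu", "script": "Latn", "licensed": True},
    ]
}
print(licensed_languages(sample))  # [('deu', 'Latn'), ('eng', 'Latn')]
```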
