
Babel Street Analytics API

Categorizer

https://analytics.babelstreet.com/rest/v1/categories

cURL example: https://raw.githubusercontent.com/rosette-api/curl-examples/develop/examples/categories.curl

Rosette Categorizer analyzes the contents of an English-language string and returns the contextual categories the string represents. The possible categories are the Tier 1 contextual categories defined by the IAB Tech Lab Content Taxonomy, listed in Table 11.

Both single-label and multi-label categorization are supported. By default, the categorizer is multi-label: it returns every relevant category label whose raw score is above a threshold (set with the scoreThreshold option). If no category exceeds the threshold, an empty set is returned.
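The selection rules just described can be sketched client-side as follows. This is a minimal illustration, assuming you already have a raw score for each label; the function name select_categories is hypothetical and not part of the API, which applies these rules server-side. The -0.25 default comes from the scoreThreshold option documented under Request, and the single-label branch assumes the label with the highest raw score also has the highest confidence.

```python
from typing import Dict, List, Tuple

# Default raw-score threshold documented for the scoreThreshold option.
DEFAULT_SCORE_THRESHOLD = -0.25


def select_categories(
    raw_scores: Dict[str, float],
    score_threshold: float = DEFAULT_SCORE_THRESHOLD,
    single_label: bool = False,
) -> List[Tuple[str, float]]:
    """Hypothetical client-side illustration of the documented selection rules.

    raw_scores maps each category label to its unbounded raw score and is
    assumed to be non-empty (the API scores all 21 labels).
    """
    if single_label:
        # singleLabel: exactly one label is always returned, the top scorer,
        # whether or not it clears the threshold.
        best = max(raw_scores, key=raw_scores.get)
        return [(best, raw_scores[best])]
    # Multi-label (default): every label whose raw score is strictly above
    # the threshold, highest first; an empty list if none qualify.
    selected = [(label, s) for label, s in raw_scores.items() if s > score_threshold]
    return sorted(selected, key=lambda pair: pair[1], reverse=True)
```

For example, with hypothetical raw scores {"Sports": 1.2, "Travel": -0.1, "Pets": -2.0}, the default call returns Sports and Travel, while single_label=True returns only Sports. If every score is at or below the threshold, the multi-label call returns an empty list, matching the empty set described above.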

In addition to a raw score, which is unbounded (it can take any value from negative infinity to infinity), each category label is returned with a confidence score between 0 and 1. The confidence score reflects the likelihood that a given category label is accurate relative to all other possible category labels. For every document, all 21 possible category labels are assigned a confidence score, and these scores sum to 1. As a result, if several labels are relevant to a document, each of their confidence scores can be relatively low.
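The documentation does not say how confidence scores are derived from raw scores. Softmax is one standard way to turn unbounded scores into values between 0 and 1 that sum to 1, so the sketch below uses it purely to illustrate why confidences shrink when several labels are relevant; treat it as an assumption, not as Categorizer's actual method. The function name is hypothetical.

```python
import math
from typing import Dict


def softmax_confidences(raw_scores: Dict[str, float]) -> Dict[str, float]:
    """Map unbounded raw scores to confidences in (0, 1) that sum to 1.

    Softmax is an illustrative assumption here, not the documented method.
    """
    # Subtract the largest score before exponentiating to avoid overflow;
    # this does not change the resulting proportions.
    top = max(raw_scores.values())
    exps = {label: math.exp(s - top) for label, s in raw_scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}
```

With hypothetical scores {"Sports": 3.0, "Health & Fitness": 3.0, "Travel": -3.0}, Sports and Health & Fitness each receive a confidence just under 0.5: both are strongly indicated, but they must share a total of 1.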

Before analyzing, Categorizer filters out stop words and punctuation, such as “the”, “?”, “a”, and “it”, to increase the accuracy of the analysis.
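A rough sketch of this kind of preprocessing follows. The stop list holds only the three stop words named above, and whitespace tokenization with ASCII punctuation stripping is a simplifying assumption; Categorizer's actual tokenizer and stop list are not documented here, and filter_tokens is a hypothetical name.

```python
import string
from typing import List

# Hypothetical stop list: only the stop words named in the text above.
EXAMPLE_STOP_WORDS = frozenset({"the", "a", "it"})


def filter_tokens(text: str) -> List[str]:
    """Drop punctuation and stop words before analysis (illustrative only)."""
    tokens = []
    for raw in text.split():
        # Strip leading/trailing ASCII punctuation such as "?" from each token.
        token = raw.strip(string.punctuation)
        # Keep the token unless it was pure punctuation or is a stop word.
        if token and token.lower() not in EXAMPLE_STOP_WORDS:
            tokens.append(token)
    return tokens
```

For example, filter_tokens("Is it the best time to visit a beach?") keeps Is, best, time, to, visit, and beach.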

Table 11. Categories

Arts & Entertainment
Travel
Business
Automotive
Education
Careers
Food & Drink
Family & Parenting
Hobbies & Interests
Health & Fitness
Law, Gov’t & Politics
Home & Garden
Personal Finance
Pets
Real Estate
Religion & Spirituality
Science
Sports
Society
Technology & Computing
Style & Fashion

Note

You can train your own categorization models using our Categorizer Field Training Kit. Contact support for more information.

Query Parameters

Name    Value    Description
output  rosette  Returns the response in Annotated Data Model (ADM) format.
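For example, ADM output can be requested by adding output=rosette to the endpoint URL. A minimal sketch using only the Python standard library; categories_url is a hypothetical helper name.

```python
from urllib.parse import urlencode

CATEGORIES_URL = "https://analytics.babelstreet.com/rest/v1/categories"


def categories_url(adm_output: bool = False) -> str:
    """Return the endpoint URL, adding output=rosette to request ADM output."""
    if adm_output:
        return CATEGORIES_URL + "?" + urlencode({"output": "rosette"})
    return CATEGORIES_URL
```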

Note

All input parameters, including the text being analyzed and any relevant options, are defined in the request body.

Request

To return only a single category label per document, set {"options": {"singleLabel": true}}; only the category with the highest confidence score is returned. With this option, a category is always returned, along with its raw score and confidence score.

Option          Type     Required  Default  Description
scoreThreshold  number   no        -0.25    Categories with a raw score above this value are returned.
singleLabel     boolean  no        false    When true, returns only the highest-scoring category.

{
  "content": "string",
  "language": "string",
  "options": {
    "scoreThreshold": -0.25,
    "singleLabel": false
  }
}
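A minimal Python sketch that builds this request with the standard library. It assumes the request is sent as a POST (the parameters travel in the request body) and that the API key goes in an X-RosetteAPI-Key header; that header name is an assumption carried over from Rosette conventions, so check the linked cURL example before relying on it. The function names are hypothetical.

```python
import json
import urllib.request

CATEGORIES_URL = "https://analytics.babelstreet.com/rest/v1/categories"


def build_categories_request(
    api_key: str,
    content: str,
    language: str,
    score_threshold: float = -0.25,
    single_label: bool = False,
) -> urllib.request.Request:
    """Build a POST request whose JSON body follows the template above."""
    body = {
        "content": content,
        "language": language,
        "options": {
            "scoreThreshold": score_threshold,
            # Serialized as a JSON boolean (true/false), not a string.
            "singleLabel": single_label,
        },
    }
    return urllib.request.Request(
        CATEGORIES_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            # Assumed header name (Rosette convention); verify before use.
            "X-RosetteAPI-Key": api_key,
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, send(build_categories_request(api_key, text, "eng", single_label=True)) would request a single label for text. Here "eng" is assumed to be the code for English; the code uen, described under Supported languages, ignores case.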

Response

For each category whose raw score exceeds scoreThreshold, Categorizer returns the category label, a confidence score, and the raw score. The confidence scores of all categories, including those not returned, add up to 1.

{
  "categories": [
    {
      "label": "string",
      "confidence": number,
      "score": number
    }
  ]
}
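A sketch of parsing this response into typed records, highest confidence first. The Category class and parse_categories function are hypothetical names; the field names label, confidence, and score come from the template above.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Category:
    label: str
    confidence: float  # between 0 and 1
    score: float       # raw score, unbounded


def parse_categories(payload: str) -> List[Category]:
    """Parse a categories response, highest confidence first."""
    data = json.loads(payload)
    categories = [
        Category(item["label"], float(item["confidence"]), float(item["score"]))
        for item in data.get("categories", [])
    ]
    # Only categories above scoreThreshold are listed, so the confidences
    # here sum to at most 1 (all 21 labels together sum to 1).
    categories.sort(key=lambda c: c.confidence, reverse=True)
    return categories
```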

Supported languages

GET /categories/supported-languages 

Returns the list of supported languages and scripts for the endpoint, along with whether you have a license for the language.

Categorizer currently supports only English input. To process English text without regard to case, specify the language code uen.

Response

Field     Type     Description
language  string   ISO 639 language code
script    string   Four-letter ISO 15924 script code
licensed  boolean  Whether you are licensed for this language

{
  "supportedLanguages": [
    {
      "language": "string",
      "script": "string",
      "licensed": boolean
    }
  ]
}
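A sketch of calling this endpoint and keeping only the licensed entries. The full URL is assumed to be the categories endpoint followed by /supported-languages, and the X-RosetteAPI-Key header name is an assumption based on Rosette conventions; both helper names are hypothetical.

```python
import json
import urllib.request
from typing import Dict, List

SUPPORTED_LANGUAGES_URL = (
    "https://analytics.babelstreet.com/rest/v1/categories/supported-languages"
)


def licensed_languages(payload: str) -> List[Dict[str, str]]:
    """Return the language/script pairs whose licensed field is true."""
    data = json.loads(payload)
    return [
        {"language": entry["language"], "script": entry["script"]}
        for entry in data.get("supportedLanguages", [])
        if entry.get("licensed") is True
    ]


def fetch_supported_languages(api_key: str) -> str:
    """GET the supported-languages list and return the raw JSON text."""
    request = urllib.request.Request(
        SUPPORTED_LANGUAGES_URL,
        # Assumed header name (Rosette convention); verify before use.
        headers={"X-RosetteAPI-Key": api_key, "Accept": "application/json"},
        method="GET",
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

For instance, licensed_languages(fetch_supported_languages(api_key)) returns the language and script of each entry you are licensed for.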