Categorizer
https://analytics.babelstreet.com/rest/v1/categories
https://raw.githubusercontent.com/rosette-api/curl-examples/develop/examples/categories.curl
Rosette Categorizer analyzes the contents of a string in English and returns the contextual category represented by the string. The possible categories are the Tier 1 contextual categories defined by the IAB Tech Lab Content Taxonomy.
Both single-label and multi-label categorization are supported. By default, the categorizer is set to multi-label and returns all relevant category labels with a raw score above a threshold. If no categories exceed the threshold, an empty set is returned.
Each category label is returned with a raw score, which can be any real number, and a confidence score between 0 and 1. The confidence scores reflect the likelihood that a given category label is accurate relative to all other possible category labels. For every document, all twenty-one possible category labels are assigned a confidence score, and these scores sum to 1. If multiple labels are relevant to a given document, the confidence score of each category can therefore be relatively low.
Before analyzing, Categorizer filters out stop words and punctuation, such as “the”, “?”, “a”, and “it”, to increase the accuracy of the analysis.
Arts & Entertainment | Travel |
Business | Automotive |
Education | Careers |
Food & Drink | Family & Parenting |
Hobbies & Interests | Health & Fitness |
Law, Gov’t & Politics | Home & Garden |
Personal Finance | Pets |
Real Estate | Religion & Spirituality |
Science | Sports |
Society | Technology & Computing |
Style & Fashion |
Note
You can train your own categorization models using our Categorizer Field Training Kit. Contact support for more information.
Query Parameters
Name | Value | Description |
---|---|---|
output | rosette | Returns the response in ADM format. |
Note
All input parameters, including the text being analyzed and any relevant options, are defined in the request body.
Request
To return only a single category label per document, set {"options": {"singleLabel": true}}. Only the category with the highest confidence score is then returned. With this option, a category is always returned, along with its raw score and confidence score.
Option | Type | Description | Required | Default |
---|---|---|---|---|
scoreThreshold | number | Categories with raw score values above this number are returned. | no | -0.25 |
singleLabel | boolean | When true, returns only the highest scoring category. | no | false |
{ "content": "string", "language": "string", "options": { "scoreThreshold": -0.25, "singleLabel": false } }
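As an illustration of the request body described above, the following Python sketch builds (but does not send) a request using only the standard library. The helper name build_categories_request is hypothetical, and two details are assumptions not stated on this page: that the endpoint is called with POST, and that the API key is passed in an X-BabelStreetAPI-Key header. Check both against your account documentation before relying on them.

```python
import json
import urllib.request

CATEGORIES_URL = "https://analytics.babelstreet.com/rest/v1/categories"


def build_categories_request(text, api_key, single_label=False,
                             score_threshold=None, language=None):
    """Build a request for the categories endpoint (hypothetical helper)."""
    body = {"content": text}
    if language is not None:
        body["language"] = language

    # Only include options that differ from the documented defaults.
    options = {}
    if single_label:
        options["singleLabel"] = True  # a JSON boolean, not the string "true"
    if score_threshold is not None:
        options["scoreThreshold"] = score_threshold
    if options:
        body["options"] = options

    return urllib.request.Request(
        CATEGORIES_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            # Assumption: header name used for the API key.
            "X-BabelStreetAPI-Key": api_key,
        },
        method="POST",  # Assumption: the endpoint expects POST.
    )
```

To send the request, pass the returned object to urllib.request.urlopen and decode the JSON response. Leaving options out of the body relies on the defaults listed in the table above (multi-label, threshold of -0.25).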
Response
For each category whose raw score exceeds the scoreThreshold, the Categorizer returns the category name, a confidence score, and the raw score. The confidence scores of all categories, including those not returned, add up to 1.
{ "categories": [ { "label": "string", "confidence": number, "score": number } ] }
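Because confidence is distributed over all twenty-one categories, the confidences in a multi-label response usually sum to less than 1. The sketch below, with a hypothetical helper summarize_categories, ranks the returned categories by confidence and reports how much confidence belongs to categories that were not returned. The sample payload is invented for illustration, including its label strings and numbers.

```python
import json


def summarize_categories(response_text):
    """Rank returned categories and compute the confidence left unreturned."""
    payload = json.loads(response_text)
    categories = payload.get("categories", [])  # may be empty in multi-label mode

    # Highest confidence first.
    ranked = sorted(categories, key=lambda c: c["confidence"], reverse=True)

    # Confidence over all categories sums to 1, so whatever is missing
    # belongs to categories whose raw score was below the threshold.
    unreturned = 1.0 - sum(c["confidence"] for c in ranked)
    return ranked, unreturned


# Invented sample response, for illustration only.
sample = json.dumps({
    "categories": [
        {"label": "TRAVEL", "confidence": 0.1, "score": -0.1},
        {"label": "SPORTS", "confidence": 0.6, "score": 0.4},
    ]
})

ranked, unreturned = summarize_categories(sample)
```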
Supported languages
GET /categories/supported-languages
Returns the list of supported languages and scripts for the endpoint, along with whether you have a license for the language.
Categorizer currently supports only English-language input. If you are processing English text and want to ignore case, specify the language code uen.
Response
Field | Type | Description |
---|---|---|
language | string | ISO 639 language code |
script | string | Four-letter ISO-15924 script code |
licensed | boolean | Indicates if you are licensed for this language |
{ "supportedLanguages": [ { "language": "string", "script": "string", "licensed": boolean } ] }
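A client can use this response to check which language and script pairs it is licensed for before submitting text. The following is a minimal sketch; the helper name licensed_languages is hypothetical and the sample payload is invented for illustration.

```python
import json


def licensed_languages(response_text):
    """Return (language, script) pairs that the caller is licensed to use."""
    payload = json.loads(response_text)
    return [
        (entry["language"], entry["script"])
        for entry in payload.get("supportedLanguages", [])
        if entry.get("licensed")
    ]


# Invented sample response, for illustration only.
sample_languages = json.dumps({
    "supportedLanguages": [
        {"language": "eng", "script": "Latn", "licensed": True},
        {"language": "uen", "script": "Latn", "licensed": False},
    ]
})

available = licensed_languages(sample_languages)
```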