Babel Street Analytics API

Event Extractor

https://analytics.babelstreet.com/rest/v1/events

curl -s -X POST \
    -H "X-BabelStreetAPI-Key: your_api_key" \
    -H "Content-Type: application/json" \
    -H "Accept: application/json" \
    -H "Cache-Control: no-cache" \
    -d '{"content": "John traveled to London last Thursday."}' \
      "https://analytics.babelstreet.com/rest/v1/events"

An event is a dynamic situation that unfolds. Event extraction analyzes unstructured text and extracts event mentions. An event model is trained to extract specific types of events, so before you use the endpoint you must train a model to extract the event types you are interested in. Events depend on both the structure of your data and the information you want to extract. There is no standard or default model for event extraction.

An event mention consists of a key phrase and one or more role mentions.

  • A key phrase is a word or phrase in the text that evokes the given event type.

  • Roles are entity mentions, i.e., people, places, times, and other mentions that add detail to the key phrase. Each role has a name indicating the type of role.

As an example, let's consider a trip event:

Bob flew from Boston to Los Angeles.

The key phrase is flew. Other words that share the same lemma (fly), such as flying and flies, would also be identified as key phrases.

The roles are:

  • Bob, traveler

  • Boston, origin

  • Los Angeles, destination

The key phrases (flew) and roles (traveler, origin, destination) were all defined in advance, and a model was trained to extract them. The event mention would identify the role mentions: Bob, Boston, Los Angeles.

The event type for flying could have other roles defined, such as when (a date or time). Not all roles must be extracted for all event mentions. The schema, which defines the key phrases and roles, also defines which roles are required. If a role is required, the event will not be extracted unless a mention of that role is found.
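To make the relationship between key phrases, roles, and required roles concrete, here is a minimal Python sketch of the trip example. The class and function names (RoleMention, EventMention, has_required_roles) are hypothetical and purely illustrative; they are not part of the API and do not describe how the service is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class RoleMention:
    name: str  # role name from the schema, e.g. "traveler"
    text: str  # the text filling the role, e.g. "Bob"


@dataclass
class EventMention:
    key_phrase: str
    roles: list[RoleMention] = field(default_factory=list)


def has_required_roles(mention: EventMention, required: set[str]) -> bool:
    """Return True if every required role name has at least one mention."""
    present = {role.name for role in mention.roles}
    return required <= present


# The trip example: "Bob flew from Boston to Los Angeles."
trip = EventMention(
    key_phrase="flew",
    roles=[
        RoleMention("traveler", "Bob"),
        RoleMention("origin", "Boston"),
        RoleMention("destination", "Los Angeles"),
    ],
)
```

With this model, a schema that requires a when role would reject the trip mention, because the sentence contains no date or time.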

Sample event model

The ability of a model to extract events depends on how the model was trained, in particular:

  • How well the schema describes the events you want to extract. The schema defines the event key phrases, as well as the roles that describe the event. Only defined key phrases and roles will be extracted from a sample.

  • How similar the structure of the data is to the data the model was trained on. You will get better results if the data the model was trained on is similar to the input documents.

It is expected that you will train an event model for your specific use case. The events endpoint includes a sample model trained on simple sentences describing travel and meeting events for demo purposes only.

curl -X POST "https://analytics.babelstreet.com/rest/v1/events" \
-H "accept: application/json" \
-H "X-BabelStreetAPI-Key: <your_api_key>" \
-H "Content-Type: application/json" \
-d '{"content":"John flew to London"}'
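The same request can be prepared from Python. The sketch below uses only the standard library and mirrors the headers and body of the curl call; the function name build_events_request is hypothetical. It only constructs the request, and the commented lines show how it could be sent with a valid key.

```python
import json
import urllib.request

EVENTS_URL = "https://analytics.babelstreet.com/rest/v1/events"


def build_events_request(api_key: str, content: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the events endpoint."""
    body = json.dumps({"content": content}).encode("utf-8")
    return urllib.request.Request(
        EVENTS_URL,
        data=body,
        method="POST",
        headers={
            "X-BabelStreetAPI-Key": api_key,
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
    )


request = build_events_request("<your_api_key>", "John flew to London")

# To send it (requires network access and a valid API key):
# with urllib.request.urlopen(request) as response:
#     events = json.load(response)
```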

Query Parameters

Name | Value | Description
output | rosette | Returns the response in ADM format.

Note

All input parameters, including the text being analyzed and any relevant options, are defined in the request body.

Request

Name | Type | Description | Required?
content | string | Text to process | Required
language | string | Three-letter ISO 639-3 language code | Optional

Important

Input documents for event extraction should be no larger than 4K characters.

Do you know the language of your input?

If you know the language of your input, include the three-letter language code in your call. This will speed up the response time.

Otherwise, the endpoint will identify the language automatically.

While events will identify the language automatically, if the language is misidentified, the correct events model will not be used. We recommend you include the language code in your call, where possible.

If no language is provided and the endpoint is unable to detect it automatically, the endpoint may return a “Language xxx is not supported” error, where xxx indicates that the language was not determined.
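A request body that follows these recommendations could be assembled as sketched below. The helper name build_body is hypothetical, and MAX_CHARS = 4000 is an assumption: it is a conservative reading of the "4K characters" limit, which the text does not define exactly.

```python
from typing import Optional

# Assumption: "4K characters" is read conservatively as 4000 characters.
MAX_CHARS = 4000


def build_body(content: str, language: Optional[str] = None) -> dict:
    """Build the JSON body, including the language code when it is known."""
    if len(content) > MAX_CHARS:
        raise ValueError(f"content exceeds {MAX_CHARS} characters")
    body = {"content": content}
    if language is not None:
        # Only the shape of the code is checked here (three ASCII letters).
        if len(language) != 3 or not (language.isascii() and language.isalpha()):
            raise ValueError("language must be a three-letter code such as 'eng'")
        body["language"] = language.lower()
    return body


body = build_body("John traveled to London last Thursday.", "eng")
```

Omitting the language argument leaves the key out of the body, so the endpoint falls back to automatic language identification.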

Option | Type | Description | Required?
workspaceId | string | The id of a single events workspace. | Optional
plan | string | A list of languages and workspaces. Allows multiple event models to be used in a single call. | Optional
negation | string | Determines whether to evaluate the event for negation. Values: ignore, both, only_positive, only_negative. English only. | Optional

Either workspaceId or plan can be provided as an option. Both cannot be used in the same call. When using plan, the workspaceId is provided within the plan.
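The rule that workspaceId and plan are mutually exclusive can be enforced on the client before a call is made. This is a minimal sketch; build_options is a hypothetical helper, and the plan value is assumed to be an object keyed by language code, as in the request examples in this section.

```python
from typing import Optional

NEGATION_VALUES = {"ignore", "both", "only_positive", "only_negative"}


def build_options(
    workspace_id: Optional[str] = None,
    plan: Optional[dict] = None,
    negation: Optional[str] = None,
) -> dict:
    """Build the "options" object, rejecting combinations the text rules out."""
    if workspace_id is not None and plan is not None:
        raise ValueError("provide either workspaceId or plan, not both")
    options = {}
    if workspace_id is not None:
        options["workspaceId"] = workspace_id
    if plan is not None:
        options["plan"] = plan
    if negation is not None:
        if negation not in NEGATION_VALUES:
            raise ValueError(f"negation must be one of {sorted(NEGATION_VALUES)}")
        options["negation"] = negation
    return options


single = build_options(workspace_id="650c4c891c39afa1b071dae3", negation="both")
multi = build_options(plan={"eng": ["multi-1", "multi-2"]})
```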

Response

{
  "events": [
    {
      "eventType": "string",
      "mentions": [
        {
          "startOffset": 0,
          "endOffset": 0,
          "roles": [
            {
              "startOffset": 0,
              "endOffset": 0,
              "name": "string",
              "id": "string",
              "dataSpan": "string",
              "confidence": "string",
              "extractorName": "string",
              "roleType": "string"
            }
          ],
          "polarity": "string",
          "negationCues": [
            {
              "startOffset": 0,
              "endOffset": 0,
              "dataSpan": "string"
            }
          ]
        }
      ],
      "confidence": 0,
      "workspaceId": "string"
    }
  ]
}

Event negation

Note

The negation option is only available for English models.

The base event algorithm extracts events when a key phrase and any required role mentions are detected in the document. It does not recognize whether the event happened or didn't happen, also known as the polarity of the event. For example, in a travel event, the following two sentences will both be extracted by the key phrase "travel":

  • John[TRAVELER] traveled[KEYPHRASE] to London[DESTINATION].

  • Charles[TRAVELER] didn't travel[KEYPHRASE] to Paris[DESTINATION].

In the example above, "didn't" is an example of a negation cue. The existence of the cue signifies the event is negated.

You can choose to include or ignore negation when you call the events endpoint. The negation option has four values:

  • ignore: (default) Returns all events. The negation cue (didn't in the above example) isn't included in the response.

  • both: Returns all events, positive and negative, with the negation cue included in the response.

  • only_positive: Returns only positive events. An empty negation cue may be included in the response.

  • only_negative: Returns only negative events; a negation cue will be returned.

By default, if you do not pass in a negation parameter, the sentences above return the same event values.

When both, only_positive, or only_negative options are selected, the polarity is included in the response, with the negation cue, if it exists.

The following example had negation set to both in the request.

{
  "events": [
    {
      "eventType": "flight_booking_schema_new_schema.TRAVEL",
      "mentions": [
        {
          "startOffset": 0,
          "endOffset": 23,
          "roles": [
            {
              "startOffset": 0,
              "endOffset": 4,
              "name": "TRAVELER",
              "id": "T0",
              "dataSpan": "John",
              "confidence": 0.90569645,
              "extractorName": "flight_booking_schema_new_schema.per_title",
              "roleType": "flight_booking_schema_new_schema.PER_TITLE"
            },
            {
              "startOffset": 5,
              "endOffset": 13,
              "name": "key",
              "id": "E1",
              "dataSpan": "traveled"
            },
            {
              "startOffset": 17,
              "endOffset": 23,
              "name": "DESTINATION",
              "id": "Q84",
              "dataSpan": "London",
              "confidence": 0.6654963,
              "extractorName": "flight_booking_schema_new_schema.location-entity",
              "roleType": "flight_booking_schema_new_schema.location"
            }
          ],
          "polarity": "Positive",
          "negationCues": []
        }
      ],
      "confidence": 1,
      "workspaceId": "650c4c891c39afa1b071dae3"
    },
    {
      "eventType": "flight_booking_schema_new_schema.TRAVEL",
      "mentions": [
        {
          "startOffset": 25,
          "endOffset": 55,
          "roles": [
            {
              "startOffset": 25,
              "endOffset": 32,
              "name": "TRAVELER",
              "id": "T2",
              "dataSpan": "Charles",
              "confidence": 0.72164702,
              "extractorName": "flight_booking_schema_new_schema.per_title",
              "roleType": "flight_booking_schema_new_schema.PER_TITLE"
            },
            {
              "startOffset": 40,
              "endOffset": 46,
              "name": "key",
              "id": "E2",
              "dataSpan": "travel"
            },
            {
              "startOffset": 50,
              "endOffset": 55,
              "name": "DESTINATION",
              "id": "E3",
              "dataSpan": "Paris",
              "extractorName": "flight_booking_schema_new_schema.location-entity",
              "roleType": "flight_booking_schema_new_schema.location"
            }
          ],
          "polarity": "Negative",
          "negationCues": [
            {
              "startOffset": 33,
              "endOffset": 39,
              "dataSpan": "didn't"
            }
          ]
        }
      ],
      "confidence": 0.89116663,
      "workspaceId": "650c4c891c39afa1b071dae3"
    }
  ]
}
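A response like the one above can be flattened into one summary per event mention. The sketch below assumes, based only on this example, that the key phrase is returned as the role named "key"; summarize_mentions is a hypothetical helper, and the sample input is a trimmed copy of the second event in the response.

```python
def summarize_mentions(response: dict) -> list[dict]:
    """Return one summary dict per event mention in an events response."""
    summaries = []
    for event in response.get("events", []):
        for mention in event.get("mentions", []):
            key_phrase = None
            roles = {}
            for role in mention.get("roles", []):
                # Assumption: the key phrase appears as the role named "key".
                if role.get("name") == "key":
                    key_phrase = role.get("dataSpan")
                else:
                    roles[role.get("name")] = role.get("dataSpan")
            cues = [cue.get("dataSpan") for cue in mention.get("negationCues", [])]
            summaries.append({
                "eventType": event.get("eventType"),
                "keyPhrase": key_phrase,
                "roles": roles,
                "polarity": mention.get("polarity"),
                "negationCues": cues,
            })
    return summaries


# Trimmed copy of the negative event from the example response.
sample = {
    "events": [{
        "eventType": "flight_booking_schema_new_schema.TRAVEL",
        "mentions": [{
            "roles": [
                {"name": "TRAVELER", "dataSpan": "Charles"},
                {"name": "key", "dataSpan": "travel"},
                {"name": "DESTINATION", "dataSpan": "Paris"},
            ],
            "polarity": "Negative",
            "negationCues": [{"startOffset": 33, "endOffset": 39, "dataSpan": "didn't"}],
        }],
    }]
}

summary = summarize_mentions(sample)[0]
```

Because polarity and negationCues are read with defaults, the same function also works on responses produced with negation set to ignore, where those fields may be absent.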

Extracting from multiple event models

The events endpoint can support event extraction from multiple event models in a single call.

  • Each event extraction model is for a single language.

  • A model is identified by a workspaceId.

  • A plan specifies a list of event models (identified by workspaceId) to be used to extract event mentions. The models are listed by language.

Through the plan option, you can specify a list of event extraction models to be used when extracting event mentions from a document.

If no workspaceId or plan is specified, then all events models in the instance are used for extraction.

Only models matching the language of the content are called. You can set the language explicitly by passing the language code in the call; otherwise, the endpoint identifies it. The models are called serially, so the response time increases as additional models are added to the search. It is still faster, however, than making a separate call to each event model.

For each event mention extracted, the response will include the workspaceId of the model which extracted the event mention. Each entity extracted will include the customProfileId (if any) which the extracted entity came from.

Example 1. Single event model request

Only a single event extraction model is called.

{
  "content": "string",
  "language": "string",
  "options": {
    "workspaceId": "string"
  }
}

Example 2. All event models request

All event extraction models that match the language of the content string are called. Multiple event mentions may be returned, from different event models.

{
  "content": "string",
  "language": "string"
}


Example 3. Multiple event model request

Multiple event extraction models are called in a single request. Only the models where the languageCode matches the language of the content string are called. Multiple event mentions may be returned, from different event models.

{
  "content": "string",
  "language": "string",
  "options": {
    "plan": {
      "string": [
        "string"
      ]
    }
  }
}

The following example requests events extracted from the content string using the English (eng) language models multi-1, multi-2, and multi-3.

{
  "content": "I want flights from Boston to New York",
  "language": "eng",
  "options": {
    "plan": {
      "eng": ["multi-1", "multi-2", "multi-3"]
    }
  }
}
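Since each extracted event carries the workspaceId of the model that produced it, the results of a multi-model call can be grouped by model on the client. The sketch below is illustrative: group_by_workspace is a hypothetical helper, and the sample response is invented for the example (the event types FLIGHT and BOOKING are not from a real model).

```python
from collections import defaultdict


def group_by_workspace(response: dict) -> dict:
    """Group the events in a response by the workspaceId that extracted them."""
    grouped = defaultdict(list)
    for event in response.get("events", []):
        grouped[event.get("workspaceId")].append(event)
    return dict(grouped)


# Hypothetical response from a plan that called multi-1 and multi-2.
sample_response = {
    "events": [
        {"eventType": "FLIGHT", "workspaceId": "multi-1", "mentions": []},
        {"eventType": "BOOKING", "workspaceId": "multi-2", "mentions": []},
        {"eventType": "FLIGHT", "workspaceId": "multi-1", "mentions": []},
    ]
}

by_model = group_by_workspace(sample_response)
```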


Event schema

GET /events/info

GET /events/info?workspaceId={wid}

The event schema defines the event types you are extracting. It includes key phrases, roles, role types, and extractors.

For each key phrase and role, there is a role type. A role type is made up of one or more extractors. Extractors are reusable components that define the rules and techniques used to identify roles and key phrases.

The supported extractor types are:

  • Entity: A list of entity types. You can use the standard, pre-defined entity types or train a custom model to extract other entity types. The custom model must be loaded in Server to define an entity extractor with custom entity types.

  • Exact: A list of words or phrases. Exact will match any words on the list, whether they are identified as entity types or not. For example, you could have a list of common modes of transportation, including armored personnel carrier and specific types of tanks.

  • Morphological: A list of words. When a word is added to this list, it is immediately converted to and stored as its lemma. Words with the same lemmatization will match. For example, a morphological extractor for go will match going, went, goes, gone. This is the only extractor type valid for key phrases.

  • Semantic: A list of words or phrases. Any word whose meaning is similar to one of these words will match. For example, an extractor of meeting will match assembly, gathering, conclave. Word vector similarity is used to identify similar words. While a semantic extractor can be defined by a phrase, it will only identify single words as candidate roles.
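The difference between exact and morphological matching can be illustrated with a toy example. This is not how the service is implemented: the lemma table below is a hand-written stand-in for a real lemmatizer, the function names are hypothetical, and case handling in the actual extractors is not specified here.

```python
# Toy lemma table standing in for a real lemmatizer (illustration only).
TOY_LEMMAS = {"going": "go", "went": "go", "goes": "go", "gone": "go"}


def lemma(word: str) -> str:
    """Return the toy lemma of a word, or the lowercased word if unknown."""
    lowered = word.lower()
    return TOY_LEMMAS.get(lowered, lowered)


def morphological_match(word: str, extractor_words: list[str]) -> bool:
    """Match when the word shares a lemma with any word in the extractor."""
    stored_lemmas = {lemma(w) for w in extractor_words}  # stored as lemmas
    return lemma(word) in stored_lemmas


def exact_match(text: str, phrases: list[str]) -> bool:
    """Match only when the text is literally on the list."""
    return text in phrases


transport = ["armored personnel carrier", "tank"]
```

Here morphological_match("went", ["go"]) succeeds because both words reduce to the same lemma, while exact_match("tanks", transport) fails because only the literal entries on the list are matched.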

You cannot modify the schema for a trained model. You can view it through the /events/info endpoint.

  • GET /events/info: Returns the list of all models currently installed in the system, along with the schemas used to create the models.

  • GET /events/info?workspaceId={wid}: Returns the schema used to create the model, where wid is the workspace identifier for the particular events model.
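The two forms of the info call differ only in the optional workspaceId query parameter. The sketch below builds either URL; events_info_url is a hypothetical helper, and the full base URL is an assumption derived by appending /info to the events endpoint shown at the top of this page.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumption: /events/info lives under the same base path as the events endpoint.
INFO_URL = "https://analytics.babelstreet.com/rest/v1/events/info"


def events_info_url(workspace_id: Optional[str] = None) -> str:
    """Return the info URL for all models, or for one workspace if given."""
    if workspace_id is None:
        return INFO_URL
    return f"{INFO_URL}?{urlencode({'workspaceId': workspace_id})}"


all_models_url = events_info_url()
one_model_url = events_info_url("650c4c891c39afa1b071dae3")
```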

Supported languages

You can specify the language of your input with the three-letter language code. If you do not specify the language, then the endpoint automatically detects it.

  • Arabic (ara)

  • Chinese (zho)

  • English (eng)

  • German (deu)

  • Hungarian (hun)

  • Japanese (jpn)

  • Korean (kor)

  • Russian (rus)
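If you pass the language explicitly, a client-side check against this list can catch an unsupported code before the call is made. This is a minimal sketch; the mapping simply transcribes the list above, and is_supported is a hypothetical helper.

```python
# Language codes transcribed from the supported languages list above.
SUPPORTED_LANGUAGES = {
    "ara": "Arabic",
    "zho": "Chinese",
    "eng": "English",
    "deu": "German",
    "hun": "Hungarian",
    "jpn": "Japanese",
    "kor": "Korean",
    "rus": "Russian",
}


def is_supported(code: str) -> bool:
    """Return True if the three-letter code is in the supported list."""
    return code.lower() in SUPPORTED_LANGUAGES
```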