Event Extractor
https://analytics.babelstreet.com/rest/v1/events
curl -s -X POST \
  -H "X-BabelStreetAPI-Key: your_api_key" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json" \
  -H "Cache-Control: no-cache" \
  -d '{"content": "John traveled to London last Thursday."}' \
  "https://analytics.babelstreet.com/rest/v1/events"
An event is a dynamic situation that unfolds. Event extraction analyzes unstructured text and extracts event mentions. An event model is trained to extract specific types of events, so to use the endpoint you must first train a model for the event types you are interested in. Events depend on both the structure of your data and the information you want to extract, which is why there is no standard or default model for event extraction.
An event mention consists of a key phrase and one or more role mentions.
A key phrase is a word or phrase in the text that evokes the given event type.
Roles are entity mentions, i.e., people, places, times, and other mentions that add detail to the key phrase. Each role has a name indicating the type of role.
As an example, let's consider a trip event:
Bob flew from Boston to Los Angeles.
The key phrase is flew. Other inflected forms of the same lemma (fly), such as flying and flies, would also be identified as key phrases.
The roles are:
Bob, traveler
Boston, origin
Los Angeles, destination
The key phrases (flew) and roles (traveler, origin, destination) were all defined in advance, and a model was trained to extract them. The event mention would identify the role mentions: Bob, Boston, Los Angeles.
The event type for flying could have other roles defined, such as when (a date or time). Not all roles must be extracted for every event mention: the schema, which defines the key phrases and roles, also defines which roles are required. If a role is required, the event will not be extracted unless a mention of that role is found.
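The rule about required roles can be illustrated with a small sketch. This is not the service's schema format or algorithm; `EventTypeSchema` and `would_extract` are hypothetical names, and the choice of which roles are required is assumed purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class EventTypeSchema:
    """Hypothetical, simplified stand-in for one event type in a schema."""
    name: str
    key_lemmas: set[str]
    required_roles: set[str]
    optional_roles: set[str] = field(default_factory=set)


def would_extract(schema: EventTypeSchema, key_lemma: str, roles: dict[str, str]) -> bool:
    """Return True if the key phrase matches and every required role has a mention."""
    if key_lemma not in schema.key_lemmas:
        return False
    return schema.required_roles.issubset(roles)


trip = EventTypeSchema(
    name="trip",
    key_lemmas={"fly"},
    required_roles={"traveler", "destination"},  # assumed for illustration
    optional_roles={"origin", "when"},
)

# "Bob flew from Boston to Los Angeles." (key phrase "flew", lemma "fly")
full = {"traveler": "Bob", "origin": "Boston", "destination": "Los Angeles"}
# A sentence that only mentions an origin lacks both required roles.
partial = {"origin": "Boston"}

print(would_extract(trip, "fly", full))
print(would_extract(trip, "fly", partial))
```

The optional `when` role is absent from both examples, which does not block extraction; only a missing required role does.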
Sample event model
How well a model extracts events depends on how it was trained, and in particular on two factors:
How well the schema describes the events you want to extract. The schema defines the event key phrases, as well as the roles that describe the event. Only defined key phrases and roles will be extracted from a sample.
How similar the structure of the input documents is to the data the model was trained on. You will get better results when the training data resembles the documents you submit.
It is expected that you will train an event model for your specific use case. The events endpoint includes a sample model trained on simple sentences describing travel and meeting events for demo purposes only.
curl -X POST "https://analytics.babelstreet.com/rest/v1/events" \
  -H "accept: application/json" \
  -H "X-BabelStreetAPI-Key: <your_api_key>" \
  -H "Content-Type: application/json" \
  -d '{"content":"John flew to London"}'
Query Parameters
Name | Value | Description |
---|---|---|
output | rosette | Returns the response in ADM format. |
Note
All input parameters, including the text being analyzed and any relevant options, are defined in the request body.
Request
Name | Type | Description | Required? |
---|---|---|---|
content | string | Text to process | Required |
language | string | Three-letter ISO 639-3 language code | Optional |
Important
Input documents for event extraction should be no larger than 4K characters.
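For longer documents, one option is to split the text on the client side before calling the endpoint. The sketch below is an assumption-laden example, not part of the API: `split_document` is a hypothetical helper, and the limit of 4000 is a conservative reading of "4K characters". Each chunk is returned with its starting offset so that offsets in a response can be mapped back to the original document.

```python
MAX_CHARS = 4000  # assumption: conservative reading of "4K characters"


def split_document(text: str, limit: int = MAX_CHARS) -> list[tuple[int, str]]:
    """Return (start_offset, chunk) pairs, each chunk at most `limit` characters."""
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + limit, len(text))
        if end < len(text):
            # Prefer to break just after the last space inside the window.
            cut = text.rfind(" ", start, end)
            if cut > start:
                end = cut + 1
        chunks.append((start, text[start:end]))
        start = end
    return chunks


for offset, chunk in split_document("John flew to London. " * 500):
    print(offset, len(chunk))
```

Splitting on whitespace can still separate a key phrase from its roles, so splitting at sentence or paragraph boundaries is likely a better fit for real documents.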
Do you know the language of your input?
If you know the language of your input, include the three-letter language code in your call. This will speed up the response time.
Otherwise, the endpoint will identify the language automatically.
While events will identify the language automatically, if the language is misidentified, the correct events model will not be used. We recommend you include the language code in your call, where possible.
If no language is provided and events is unable to auto-detect it, the endpoint may return a “Language xxx is not supported” error, where xxx indicates the language was not determined.
Option | Type | Description | Required? |
---|---|---|---|
workspaceId | string | The id of a single events workspace. | Optional |
plan | string | A list of languages and workspaces. Allows multiple event models to be used in a single call. | Optional |
negation | string | Determines whether to evaluate the event for negation. English only. | Optional |
Either workspaceId or plan can be provided as an option, but both cannot be used in the same call. When using plan, the workspaceId is provided within the plan.
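A request body that respects these rules might be assembled as follows. This is a minimal sketch: `build_events_request` is a hypothetical helper, and it assumes that `workspaceId`, `plan`, and `negation` are nested under an `options` object, as in the request bodies shown elsewhere on this page.

```python
import json
from typing import Optional


def build_events_request(
    content: str,
    language: Optional[str] = None,
    workspace_id: Optional[str] = None,
    plan: Optional[dict] = None,
    negation: Optional[str] = None,
) -> dict:
    """Assemble a request body, rejecting workspaceId and plan used together."""
    if workspace_id is not None and plan is not None:
        raise ValueError("Provide either workspaceId or plan, not both.")
    body: dict = {"content": content}
    if language is not None:
        body["language"] = language
    options: dict = {}
    if workspace_id is not None:
        options["workspaceId"] = workspace_id
    if plan is not None:
        options["plan"] = plan
    if negation is not None:
        options["negation"] = negation  # value is passed through unchecked
    if options:
        body["options"] = options
    return body


body = build_events_request("John flew to London", language="eng", negation="both")
print(json.dumps(body, indent=2))
```

The resulting dictionary can be serialized with `json.dumps` and sent as the `-d` payload of the curl calls shown above.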
Response
{
  "events": [
    {
      "eventType": "string",
      "mentions": [
        {
          "startOffset": 0,
          "endOffset": 0,
          "roles": [
            {
              "startOffset": 0,
              "endOffset": 0,
              "name": "string",
              "id": "string",
              "dataSpan": "string",
              "confidence": "string",
              "extractorName": "string",
              "roleType": "string"
            }
          ],
          "polarity": "string",
          "negationCues": [
            {
              "startOffset": 0,
              "endOffset": 0,
              "dataSpan": "string"
            }
          ]
        }
      ],
      "confidence": 0,
      "workspaceId": "string"
    }
  ]
}
Event negation
Note
The negation option is only available for English models.
The base event algorithm extracts events when a key phrase and any required role mentions are detected in the document. It does not recognize whether the event happened or didn't happen, also known as the polarity of the event. For example, in a travel event, the following two sentences will both be extracted by the key phrase "travel":
John[TRAVELER] traveled[KEYPHRASE] to London[DESTINATION].
Charles[TRAVELER] didn't travel[KEYPHRASE] to Paris[DESTINATION].
In the example above, "didn't" is an example of a negation cue. The existence of the cue signifies the event is negated.
You can choose to include or ignore negation when you call the events endpoint. The negation option has four values:
Ignore: (default) Returns all events; the negation cue (didn't in the above example) isn't included in the response.
Both: Returns all events, positive and negative, with the negation cue included in the response.
Only_positive: Returns only positive events. An empty negation cue may be included in the response.
Only_negative: Returns only negative events; a negation cue will be returned.
By default, if you do not pass in a negation parameter, the sentences above return the same event values.
When the both, only_positive, or only_negative option is selected, the polarity is included in the response, along with the negation cue, if one exists.
The following example had negation set to both in the request.
{
  "events": [
    {
      "eventType": "flight_booking_schema_new_schema.TRAVEL",
      "mentions": [
        {
          "startOffset": 0,
          "endOffset": 23,
          "roles": [
            {
              "startOffset": 0,
              "endOffset": 4,
              "name": "TRAVELER",
              "id": "T0",
              "dataSpan": "John",
              "confidence": 0.90569645,
              "extractorName": "flight_booking_schema_new_schema.per_title",
              "roleType": "flight_booking_schema_new_schema.PER_TITLE"
            },
            {
              "startOffset": 5,
              "endOffset": 13,
              "name": "key",
              "id": "E1",
              "dataSpan": "traveled"
            },
            {
              "startOffset": 17,
              "endOffset": 23,
              "name": "DESTINATION",
              "id": "Q84",
              "dataSpan": "London",
              "confidence": 0.6654963,
              "extractorName": "flight_booking_schema_new_schema.location-entity",
              "roleType": "flight_booking_schema_new_schema.location"
            }
          ],
          "polarity": "Positive",
          "negationCues": []
        }
      ],
      "confidence": 1,
      "workspaceId": "650c4c891c39afa1b071dae3"
    },
    {
      "eventType": "flight_booking_schema_new_schema.TRAVEL",
      "mentions": [
        {
          "startOffset": 25,
          "endOffset": 55,
          "roles": [
            {
              "startOffset": 25,
              "endOffset": 32,
              "name": "TRAVELER",
              "id": "T2",
              "dataSpan": "Charles",
              "confidence": 0.72164702,
              "extractorName": "flight_booking_schema_new_schema.per_title",
              "roleType": "flight_booking_schema_new_schema.PER_TITLE"
            },
            {
              "startOffset": 40,
              "endOffset": 46,
              "name": "key",
              "id": "E2",
              "dataSpan": "travel"
            },
            {
              "startOffset": 50,
              "endOffset": 55,
              "name": "DESTINATION",
              "id": "E3",
              "dataSpan": "Paris",
              "extractorName": "flight_booking_schema_new_schema.location-entity",
              "roleType": "flight_booking_schema_new_schema.location"
            }
          ],
          "polarity": "Negative",
          "negationCues": [
            {
              "startOffset": 33,
              "endOffset": 39,
              "dataSpan": "didn't"
            }
          ]
        }
      ],
      "confidence": 0.89116663,
      "workspaceId": "650c4c891c39afa1b071dae3"
    }
  ]
}
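A client might flatten a response of this shape into one row per event mention. The sketch below is one possible approach: `summarize_events` is a hypothetical helper, the sample data is a trimmed copy of the response above, and treating the role named `key` as the key phrase is an assumption based on that example.

```python
def summarize_events(response: dict) -> list[dict]:
    """Flatten an events response into one summary row per event mention."""
    rows = []
    for event in response.get("events", []):
        for mention in event.get("mentions", []):
            roles = {r["name"]: r["dataSpan"] for r in mention.get("roles", [])}
            rows.append({
                "eventType": event["eventType"],
                "workspaceId": event.get("workspaceId"),
                # Assumption: the key phrase is reported as a role named "key".
                "keyPhrase": roles.pop("key", None),
                "roles": roles,
                "polarity": mention.get("polarity"),
                "negationCues": [c["dataSpan"] for c in mention.get("negationCues", [])],
            })
    return rows


# Trimmed copy of the example response (offsets and confidences omitted).
sample = {
    "events": [
        {
            "eventType": "flight_booking_schema_new_schema.TRAVEL",
            "workspaceId": "650c4c891c39afa1b071dae3",
            "mentions": [{
                "roles": [
                    {"name": "TRAVELER", "dataSpan": "Charles"},
                    {"name": "key", "dataSpan": "travel"},
                    {"name": "DESTINATION", "dataSpan": "Paris"},
                ],
                "polarity": "Negative",
                "negationCues": [{"dataSpan": "didn't"}],
            }],
        }
    ]
}

rows = summarize_events(sample)
negated = [row for row in rows if row["polarity"] == "Negative"]
print(negated)
```

Filtering on `polarity` on the client side is only meaningful when the request asked for polarity; with the default setting, the field may be absent and the helper returns `None` for it.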
Extracting from multiple event models
The events endpoint can support event extraction from multiple event models in a single call.
Each event extraction model is for a single language.
A model is identified by a workspaceId.
A plan specifies a list of event models (identified by workspaceId) to be used to extract event mentions. The models are listed by language.
Through the plan option, the user can specify a list of event extraction models to be used when extracting event mentions from a document.
If no workspaceId or plan is specified, then all events models in the instance are used for extraction.
Only models matching the language of the content are called. The language can be set explicitly by passing the language code in the call; otherwise, events will identify the language. Each model is called serially, so the response time increases as additional models are added to the search. It is still faster, however, than making multiple individual calls to each event model.
For each event mention extracted, the response will include the workspaceId of the model which extracted the event mention. Each entity extracted will include the customProfileId (if any) which the extracted entity came from.
When a workspaceId is specified, only a single event extraction model is called.
{
  "content": "string",
  "language": "string",
  "options": {
    "workspaceId": "string"
  }
}
When neither workspaceId nor plan is specified, all event extraction models that match the language of the content string are called. Multiple event mentions may be returned, from different event models.
{
  "content": "string",
  "language": "string"
}
When a plan is specified, multiple event extraction models are called in a single request. Only the models where the languageCode matches the language of the content string are called. Multiple event mentions may be returned, from different event models.
{
  "content": "string",
  "language": "string",
  "options": {
    "plan": {
      "string": [
        "string"
      ]
    }
  }
}
The following example requests events extracted from the content string using the English (eng) language models multi-1, multi-2, and multi-3.
{
  "content": "I want flights from Boston to New York",
  "language": "eng",
  "options": {
    "plan": {
      "eng": ["multi-1", "multi-2", "multi-3"]
    }
  }
}
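The three cases described in this section can be summarized as a small selection function. This is only a mental model of the documented behavior, not the server's implementation: `select_models` and the `installed` mapping (workspaceId to language code, including the `trips-deu` model) are hypothetical.

```python
from typing import Optional


def select_models(installed: dict[str, str], language: str,
                  options: Optional[dict] = None) -> list[str]:
    """Return the workspaceIds that would be called for a request, in order."""
    options = options or {}
    if "workspaceId" in options and "plan" in options:
        raise ValueError("workspaceId and plan cannot be used in the same call.")
    if "workspaceId" in options:
        # Case 1: a single model is called.
        return [options["workspaceId"]]
    if "plan" in options:
        # Case 2: only the plan entries for the content language are called.
        return list(options["plan"].get(language, []))
    # Case 3: every installed model for the content language is called.
    return [wid for wid, lang in installed.items() if lang == language]


installed = {"multi-1": "eng", "multi-2": "eng", "multi-3": "eng", "trips-deu": "deu"}
plan = {"eng": ["multi-1", "multi-2"], "deu": ["trips-deu"]}

print(select_models(installed, "eng"))
print(select_models(installed, "eng", {"plan": plan}))
print(select_models(installed, "eng", {"workspaceId": "multi-3"}))
```

Because the models are called serially, the length of the returned list is a rough indicator of how response time will scale for a given request.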
Event schema
GET /events/info
GET /events/info?workspaceId={wid}
The event schema defines the event types you are extracting. It includes key phrases, roles, role types, and extractors.
For each key phrase and role, there is a role type. A role type is made up of one or more extractors. Extractors are reusable components which define the rules and techniques to identify roles and key phrases.
The supported extractor types are:
Entity: A list of entity types. You can use the standard, pre-defined entity types or train a custom model to extract other entity types. The custom model must be loaded in Server to define an entity extractor with custom entity types.
Exact: A list of words or phrases. Exact will match any words on the list, whether they are identified as entity types or not. For example, you could have a list of common modes of transportation, including armored personnel carrier and specific types of tanks.
Morphological: A list of words. When a word is added to this list, it is immediately converted to and stored as its lemma. Words with the same lemmatization will match. For example, a morphological extractor for go will match going, went, goes, gone. This is the only extractor type valid for key phrases.
Semantic: A list of words or phrases. Any word whose meaning is similar to one of these words will match. For example, an extractor of meeting will match assembly, gathering, conclave. Word vector similarity is used to identify similar words. While a semantic extractor can be defined by a phrase, it will only identify single words as candidate roles.
You cannot modify the schema for a trained model. You can view it through the /events/info endpoint.
GET /events/info: Returns the list of all models currently installed in the system along with the schemas used to create the models.
GET /events/info?workspaceId={wid}: Returns the schema used to create the model, where wid is the workspace identifier for the particular events model.
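The two forms of the info request differ only in the optional query parameter. A small sketch of building the URLs, assuming the same base URL as the events endpoint above; `events_info_url` is a hypothetical helper and no request is actually sent:

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://analytics.babelstreet.com/rest/v1"


def events_info_url(workspace_id: Optional[str] = None) -> str:
    """Build the /events/info URL, optionally scoped to one workspace."""
    url = f"{BASE_URL}/events/info"
    if workspace_id is not None:
        url += "?" + urlencode({"workspaceId": workspace_id})
    return url


print(events_info_url())
print(events_info_url("650c4c891c39afa1b071dae3"))
```

The resulting URL would be requested with GET and the same X-BabelStreetAPI-Key header used for the events calls.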
Supported languages
You can specify the language of your input with the three-letter language code. If you do not specify the language, then the endpoint automatically detects it.
Arabic (ara)
Chinese (zho)
English (eng)
German (deu)
Hungarian (hun)
Japanese (jpn)
Korean (kor)
Russian (rus)
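A client can check a language code against this list before sending a request, which avoids a round trip for unsupported codes. The sketch below is illustrative only; `language_field` is a hypothetical helper, and lowercasing the code is an assumption about how codes should be normalized.

```python
from typing import Optional

SUPPORTED_LANGUAGES = {
    "ara": "Arabic",
    "zho": "Chinese",
    "eng": "English",
    "deu": "German",
    "hun": "Hungarian",
    "jpn": "Japanese",
    "kor": "Korean",
    "rus": "Russian",
}


def language_field(code: Optional[str]) -> dict:
    """Return the `language` request fragment, or {} to let the endpoint auto-detect."""
    if code is None:
        return {}
    normalized = code.strip().lower()
    if normalized not in SUPPORTED_LANGUAGES:
        raise ValueError(f"Language {normalized} is not in the supported list.")
    return {"language": normalized}


request_body = {"content": "John flew to London", **language_field("eng")}
print(request_body)
```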