Entity Extractor

Introduction

Overview

Entities are the key actors in your text data: the organizations, people, locations, products, and dates mentioned in documents. Babel Street Analytics uncovers these entities, delivering structure, clarity, and insight to your data with adaptability, easy deployment, and consistent accuracy and performance across a broad range of languages and text genres.

Entity Extractor ingests text and identifies people, locations, and organizations, in addition to many other entity types including product, date/time, URL, and email. These entities can be used to add structured metadata to a document or in downstream natural language processing (NLP) tasks, such as extracting themes and ideas, sentiment analysis, and relationship extraction.

Entity Extractor is deployed in Analytics Server as the /entities endpoint.

Entity Extraction: Entity Extractor comes with multiple entity extraction processors along with a linker processor to link entities to a knowledge base. In case of conflicting entities, a redactor decides which entity extraction result “wins.” Entity Extractor has extensive customization features, including adding new entity patterns to the pattern-matching processor and new entity lists to the exact match processor. You can add a custom processor to systematically process Entity Extractor results. Numerous configuration settings let you fit Entity Extractor to your specific use case.

Entity Linking: Entity Extractor has an entity linking processor that identifies the real-world entities behind the mentions extracted from the text and disambiguates between different entities with the same name. Entity linking can determine not only that "Tim Cook" is a person, but also which "Tim Cook" is meant. For example, is he the CEO of Apple or a political science scholar? The entity linking processor looks at the context of each extracted entity to link entities against Wikidata. Entity Extractor supports linking to other public knowledge bases as well as your organization's custom knowledge bases.

Adaptation & Customization: Entity Extractor gives you a good start, but as with any natural language processor, you will need to configure and adapt Entity Extractor to your specific task for best results. Model Training Suite allows you to train a model on your domain-specific data or to add new entity types to the statistical model.

The statistical model is context-sensitive, meaning it identifies entities based on the context they appear in and thus can find names of people even if a name has been misspelled. It can also be trained on data that is more representative of your business case. Contact your sales representative or analyticssupport@babelstreet.com for more information on Model Training Suite.

Architecture

Basic Entity Extraction with Entity Extractor:

Using Babel Street Base Linguistics, Entity Extractor processes plain text input into sentences and tokens.
Entities are extracted by running the tokens through the statistical processor or DNN, regexes, and gazetteers. If the linker is enabled, the tokens are also run through the linker processor to link entities to a knowledge base.
Reject regexes and gazetteers may remove entities from the output. Some adjacent entities may be combined by the joiner into a single result. The final entities are selected by running the extractor results through the redactor.
The final extracted entities are returned as output.

Processors for entity extraction

Entity Extractor uses multiple complementary methods to identify entity mentions in the input text: statistical models, pattern matching, and exact matching. With Entity Extractor version 7.32, we added a deep neural network model which is currently in beta. Pattern-matching and exact matching processors can run in parallel with the statistical or the deep neural network processors, but the statistical and deep neural network processors cannot be used simultaneously.

Statistical Processor: The statistical processor uses contextual features of the input to identify entities. Using computational linguistics, it has been trained on a body of annotated news stories to extract a variety of entities in a number of languages.
Pattern Matching Processor (regular expressions): Regular expressions (regexes) are a good way to identify language-specific entities and generic entities that appear in a variety of languages. You can modify the standard regexes that we supply, and add your own regexes.
Exact Matching Processor (gazetteers): Gazetteers (entity lists) return exact matches to a predefined list. The Entity Extractor distribution includes gazetteers for each language and a number of entity types, and a cross-language gazetteer for corporation names (as the name of the corporation does not generally change when it enters international markets). You can modify the standard gazetteers that we supply and add your own gazetteers to extract new entities or entity types.
Deep Neural Network Processor: This processor uses a model trained using a deep neural network. It is slower than the statistical processor, but has shown an error reduction of about 10% for English and Arabic and 30% for Korean, as measured by F-Score, for extracting person, location, organization, and titles. The model is trained on the same data as the statistical model. The model is based on an LSTM neural network and is backed by the TensorFlow library.
Note
The deep neural network processor is currently available in English, Arabic, Hebrew, and Korean.
Name Classifier Processor: This processor predicts entity types for text that lacks the syntactic context of complete sentences. It can extract entities from structured text, such as list items and tables, which typically contain text fragments instead of full sentences.

Redaction: When two processors return the same or overlapping entities, the redactor chooses an entity based on the length of the competing entity strings. You can also configure the redactor to choose which same-length mention to return based on entity type and/or processor.
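The length-based selection described above can be illustrated with a minimal Python sketch. This is not the product's implementation: the `Mention` tuple, the `redact` function, the assumption that the longest mention wins, and the tie-break order by processor are all hypothetical stand-ins for behavior that the real redactor controls through configuration.

```python
from typing import NamedTuple


class Mention(NamedTuple):
    start: int        # offset where the mention begins
    end: int          # offset just past the end of the mention
    entity_type: str
    processor: str    # hypothetical label for the processor that found it


def overlaps(a: Mention, b: Mention) -> bool:
    """True when the two spans share at least one character."""
    return a.start < b.end and b.start < a.end


def redact(mentions, processor_priority=("acceptGazetteer", "acceptRegex", "statistical")):
    """Keep the longest mention among overlapping candidates.

    Assumption: longer mentions win, and same-length ties are broken by
    the position of the processor in `processor_priority`.
    """
    def rank(m):
        if m.processor in processor_priority:
            priority = processor_priority.index(m.processor)
        else:
            priority = len(processor_priority)
        return (-(m.end - m.start), priority, m.start)

    kept = []
    for candidate in sorted(mentions, key=rank):
        if not any(overlaps(candidate, k) for k in kept):
            kept.append(candidate)
    return sorted(kept, key=lambda m: m.start)
```

Sorting candidates by rank before the overlap check keeps the logic short: once a preferred mention is kept, anything overlapping it is dropped.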

Processors to customize results

These processors run on the extracted entities to further customize the results.

Joining: You can use a configuration file and the API to establish rules for joining adjacent entities into one (such as joining titles with personal names).

Rejections: You can define regexes and gazetteers to reject entities that otherwise may be returned.

Indoc Coref: In a single document, Entity Extractor chains together mentions that refer to the same entity (i.e., in-document coreference).
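As a rough illustration of joining and rejection, the following Python sketch merges a TITLE that is directly followed by a PERSON and drops entities that appear in a reject list or match a reject pattern. The function names, the `(start, end, type)` tuple layout, and the single join rule are assumptions made for this example; the actual rules are defined through Entity Extractor configuration files and the API.

```python
import re


def join_title_person(text, entities):
    """Merge a TITLE immediately followed by a PERSON into one PERSON span.

    `entities` is a list of (start, end, type) tuples sorted by start offset.
    """
    joined = []
    for start, end, etype in entities:
        if joined:
            prev_start, prev_end, prev_type = joined[-1]
            gap = text[prev_end:start]
            # Adjacent here means separated by whitespace only.
            if prev_type == "TITLE" and etype == "PERSON" and gap.strip() == "":
                joined[-1] = (prev_start, end, "PERSON")
                continue
        joined.append((start, end, etype))
    return joined


def reject(text, entities, reject_list=(), reject_pattern=None):
    """Drop entities whose text is in a reject list or fully matches a reject regex."""
    kept = []
    for start, end, etype in entities:
        mention = text[start:end]
        if mention in reject_list:
            continue
        if reject_pattern and re.fullmatch(reject_pattern, mention):
            continue
        kept.append((start, end, etype))
    return kept
```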

Optional functionality

Linker Processor: This processor extracts and links entity mentions to a knowledge base of known entities, each with a unique ID. This processor is disabled by default. Entity Extractor is shipped with a prepackaged default knowledge base linking entity mentions to a Wikidata QID. You can replace the default entity knowledge base with a custom knowledge base.

Notice

Currently, the linker performs its own entity extraction and does NOT use entities found by the default entity extraction processors (statistical, pattern-matching, exact-matching). Therefore, the linker processor’s entities will not necessarily match those from the default entity extraction processors.

Pronominal Resolver: Entity Extractor tries to resolve pronouns to their antecedent entities. This processor is disabled by default. The pronominal resolver is only available for English.

Standard entity types

Entity Extractor is pre-trained to extract the following entity types.

LOCATION
- A city, state, country, region, or other location that contains both a population and a government.
- A geographic place such as a body of water, mountain, park, or address.
- A structure such as a building or monument.
ORGANIZATION
- A corporation, institution, government agency, or other group of people defined by an established organizational structure.
PERSON
- A human identified by name, nickname, or alias.
TITLE
- Appellation associated with a person by virtue of occupation, office, birth, or as an honorific.
NATIONALITY
- Reference to a country or region of origin, such as American or Swiss.
RELIGION
- Reference to an organized religion or theology as well as its followers.
IDENTIFIER:CREDIT_CARDNUM
IDENTIFIER:DISTANCE*
IDENTIFIER:EMAIL
IDENTIFIER:LATITUDE_LONGITUDE*
IDENTIFIER:MONEY
IDENTIFIER:CURRENCY_AMT and IDENTIFIER:CURRENCY_TYPE
- If CURRENCY is enabled, MONEY extractions will be replaced with CURRENCY_AMT and CURRENCY_TYPE whenever possible (both AMT and TYPE can be extracted). If the extracted value cannot be split, MONEY may be extracted instead.
- To enable CURRENCY, set regexCurrencySplit to true. By default, it is set to false.
IDENTIFIER:PERSONAL_ID_NUM
IDENTIFIER:PHONE_NUMBER
IDENTIFIER:URL
IDENTIFIER:UTM*
- Geographical coordinates, expressed with the Universal Transverse Mercator System.
TEMPORAL:DATE
TEMPORAL:TIME

Entity types marked with a * are not returned by default. Activate them by instructing Entity Extractor to load the supplemental regexes in each language’s supplemental directory.

When the call includes {"options": {"includeDBpediaTypes": true}}, Entity Extractor supports additional top-level entity types and over 700 additional types drawn from the DBpedia ontology. Entity linking must be enabled to return DBpedia entity types.

Adding new entity types

There are several ways to train Entity Extractor to extract entity types beyond the standard set.

Create new gazetteers (i.e., entity lists).
Create new regexes for entities that fit a pattern, such as telephone numbers.
Train a new statistical model to extract different entity types using the Model Training Suite. Contact your sales representative or analyticssupport@babelstreet.com for more information on model training.

Language support

The following tables describe the entity types returned by the different processors for each supported language.

Key to processor used to identify each entity type:

S = statistical processor
G = exact matching processor (gazetteer)
R = pattern matching processor (regex)
L = entity linking available
D = deep neural network processor

Table 15. Statistical, Exact Match (Gazetteer) Extracted Entities, and Linked Entities

Language (ISO code)	Entity Type
	LOC	ORG	PER	PROD	TTL	NAT	REL
Arabic `ara`	S/G/D/L	S/G/D/L	S/D/L	L	S	G	G
Chinese, Script-insensitive `zho`	S/G/L	S/G/L	S/L	L	S	G	G
Chinese, Simplified `zhs`	S/G/L	S/G/L	S/L	L	S	G	G
Chinese, Traditional `zht`	S/G/L	S/G/L	S/L	L	S	G	G
Dutch `nld`	S/L	S/G/L	S/L	L	G
English `eng`	S/G/L/D	S/R/G/L/D	S/L/D	S/L	S	G	G
French `fra`	S/L	S/G/L	S/L	L	S
German `deu`	S/L	S/G/L	S/L	L	S
Hebrew `heb`	S/L/D	S/G/L/D	S/L/D	L
Hungarian `hun`	S/G/L	S/G/L	S/G/L	S/L	S
Indonesian `ind`	S/G/L	S/G/L	S/L	L
Italian `ita`	S/L	S/G/L	S/L	L	S
Japanese `jpn`	S/L	S/G/L	S/L	L	S	G	G
Korean `kor`	S/D/L	S/G/D/L	S/D/L	L	S	G	G
Malay, Standard `zsm`	S/G/L	S/G/L	S/L	L
Pashto `pus`	S/L	S/G/L	S/L	L	S
Persian `fas`	S/L	S/G/L	S/L	L	G	G	G
Portuguese `por`	S/L	S/G/L	S/L	L	S
Russian `rus`	S/L	S/G/L	S/L	L	S	G	G
Spanish `spa`	S/L	S/G/L	S/L	L	S
Swedish `swe`	S/L	S/G/L	S/L	L	S	S/G	S/G
Tagalog `tgl`	S/G/L	S/G/L	S/L	L
Urdu `urd`	S/L	S/G/L	S/L	L	G
Vietnamese `vie`	S/L	S/L	S/L	L	G	G	G

The following entity types are not returned by default:

Table 16. Rule-based Extracted Entities

Language (ISO Code)	Entity Type
Language (ISO Code)	CC#	Dist	EM	LATLNG	MONEY/CURRENCY	PERS ID	TEL#	URL	UTM	DATE	TIME
Arabic `ara`	R	R	R	R	R	R	R	R	R	R	R
Chinese, Script-insensitive `zho`	R	R	R	R	R	R	R	R	R	R	R
Chinese, Simplified `zhs`	R	R	R	R	R	R	R	R	R	R	R
Chinese, Traditional `zht`	R	R	R	R	R	R	R	R	R	R	R
Dutch `nld`	R	R	R	R	R	R	R	R	R	R	R
English `eng`	R	R	R	R	R	R	R	R	R	R	R
French `fra`	R	R	R	R	R	R	R	R	R	R	R
German `deu`	R	R	R	R	R	R	R	R	R	R	R
Hebrew `heb`	R	R	R	R	R	R	R	R	R	R	R
Hungarian `hun`	R	R	R	R	R	R	R	R	R	R	R
Indonesian `ind`	R	R	R		R	R	R	R	R	R	R
Italian `ita`	R	R	R	R	R	R	R	R	R	R	R
Japanese `jpn`	R	R	R	R	R	R	R	R	R	R	R
Korean `kor`	R	R	R	R	R	R	R	R	R	R	R
Malay, Standard `zsm`	R	R	R		R	R	R	R	R	R	R
Pashto `pus`	R	R	R	R	R	R	R	R	R	R	R
Persian `fas`	R	R	R	R	R	R	R	R	R	R	R
Portuguese `por`	R	R	R	R	R	R	R	R	R	R	R
Russian `rus`	R	R	R	R	R	R	R	R	R	R	R
Spanish `spa`	R	R	R	R	R	R	R	R	R	R	R
Swedish `swe`	R	R	R	R	R	R	R	R	R	R	R
Tagalog `tgl`	R	R	R		R	R	R	R	R	R	R
Urdu `urd`	R		R		R	R	R	R	R
Vietnamese `vie`	R	R	R		R	R	R	R		R	R

Getting started

Install Analytics Server as described in the Analytics Server User Guide.

Using the SDK in an OSGi Bundle

Note

These steps are only necessary when Entity Extractor is configured to use linking.

The linker uses Java's ServiceLoader to dynamically discover and load connectors. This functionality does not work well when Entity Extractor is embedded in an OSGi service bundle without additional configuration, and Entity Extractor may fail on initialization. To use Entity Extractor inside an OSGi bundle, we recommend Apache's SPI Fly, a reference implementation of the OSGi ServiceLoader Mediator specification. Follow these steps to configure your OSGi project for Entity Extractor:

Visit https://aries.apache.org/modules/spi-fly.html and follow the instructions to include SPI Fly's Dynamic Weaving OSGi bundle and associated dependencies in your project by either using Maven to manage its dependency or by manually downloading and including them in your project.

Add the following lines to the MANIFEST.MF file of the OSGi bundle that embeds the SDK:

Require-Capability: osgi.serviceloader; 
filter:="(osgi.serviceloader=com.basistech.rosette.flinx.api.service.KnowledgeBaseVariantFactory)";
cardinality:=multiple,osgi.extender; 
filter:="(osgi.extender=osgi.serviceloader.processor)",osgi.extender;
filter:="(osgi.extender=osgi.serviceloader.registrar)"
Provide-Capability: osgi.serviceloader;
osgi.serviceloader=com.basistech.rosette.flinx.api.service.KnowledgeBaseVariantFactory
SPI-Consumer: *

Configuring the Entity Extractor

The entity extraction endpoint (https://localhost:8181/rest/v1/entities) comes fully configured to extract entities. This guide explains how to modify the configuration of the extractor for your use case.
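For orientation, here is a minimal Python sketch of a call to this endpoint using only the standard library. It assumes a JSON body with a `content` field and an optional `options` object, as used in the request examples in this guide. The `Content-Type` header, the helper names, and the absence of any authentication are assumptions, and the response is returned unparsed beyond JSON decoding because its structure is not described here.

```python
import json
import urllib.request

ENTITIES_URL = "https://localhost:8181/rest/v1/entities"


def build_entities_request(text, options=None, url=ENTITIES_URL):
    """Build (but do not send) a POST request for the entities endpoint."""
    body = {"content": text}
    if options:
        body["options"] = options
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )


def extract_entities(text, options=None):
    """Send the request and return the decoded JSON response.

    Requires a running Analytics Server at ENTITIES_URL.
    """
    with urllib.request.urlopen(build_entities_request(text, options)) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes it easy to inspect the payload before pointing the code at a live server.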

General configuration options

Parameter	Description	Default
`rootDirectory`	The Entity Extractor root directory, which contains language models and necessary configuration files.	`${rex-root}`
`rblRootDirectory`	The directory containing the RBL root for Entity Extractor to use.	`${rex-root}/rbl-je`
`snapToTokenBoundaries`	Regular expressions and gazetteers may be configured to match tokens partially independent from token boundaries. If true, reported offsets correspond to token boundaries.	`true`
`maxEntityTokens`	The maximum number of tokens allowed in an entity returned by Statistical Entity Extractor. Entity Redactor discards entities from Statistical Entity Extractor with more than this number of tokens.	`8`
`customProcessors`	Custom processors to add to annotators. See Creating Custom Processors for more details on custom processors.	`null`
`customProcessorClasses`	Register a custom processor class.	`null`
`excludedEntityTypes`	Entity types to be excluded from extraction.	`null`
`dataOverlayDirectory`	An overlay directory is a directory shaped like the `data` directory. Entity Extractor will look for files in both the overlay directory and the root directory, using files from both locations. However, if a file exists in both places (as identified by its path relative to the overlay or root data directory), Entity Extractor prefers the version in the overlay directory. If Entity Extractor finds a zero-length file in the overlay directory, it ignores both that file and any corresponding file in the root data directory.	`null`
`calculateSalience`	If true, entity chain salience values are calculated. Can be overridden by specifying `calculateSalience` in the API call.	`false`
`retainSocialMediaSymbols`	The option to retain social media symbols ('@' and '#') in normalized output	`false`
`keepEntitiesInInput`	The option to keep existing annotated text entities.	`false`
`structuredRegionProcessingType`	Configures how structured regions will be processed. It has three values: `none`, `nerModel`, and `nameClassifier`.	`none`
`regexCurrencySplit`	Determines if money values should be extracted as `MONEY` or `CURRENCY_AMT` and `CURRENCY_TYPE`. If true, Entity Extractor tries to extract `CURRENCY` instead of `MONEY`.	`false`

Handling structured regions

The Entity Extractor statistical model is trained to extract entities from unstructured text, where the model uses the syntactic context in sentences to help identify entities and entity types. But not all data is unstructured. Often input documents contain some sections of structured text, such as tables and lists, along with the unstructured text. Structured text usually does not contain full sentences and is often missing the syntactic context that Entity Extractor expects. This can lead to noisy results and false positives.

In addition to sentences and tokens, the Babel Street Base Linguistics processor identifies structured and unstructured regions. For structured regions, Entity Extractor disables the statistical processor. The text in structured regions is still processed by the rule-based processors (gazetteers and regexes) and the linker. Additionally, for some languages, another extractor, the name classifier, can extract entities from structured regions of text.

By default, structured regions are processed the same as unstructured regions.

To change how structured text is processed, set structuredRegionProcessingType in the rex-factory-config.yaml file. You can also set the value in the call as an option. It has three values:

none: (default) Disables the statistical/DNN models from processing structured regions. When set to none, Entity Extractor does not attempt to extract entities from structured regions using the statistical processor or DNN models. The rule-based extractors (gazetteers, regex) and the linker are used to process structured regions.
nerModel: Processes the entire document as unstructured text. Structured regions are processed the same as unstructured regions.
nameClassifier: Disables the statistical/DNN models from processing structured regions and enables the name classifier on the structured regions.

You can enable the Apache Tika processor to extract lists and tables for contentUri (HTML) input by setting the enableStructuredRegion option to true as the default in the rex-factory-config.yaml file or in the call as an option:

"options": {"enableStructuredRegion": true}

Some structured regions may contain enough syntactic context for the statistical/DNN models to accurately extract entities. You can set a minimum number of tokens required in a structured region to override the structured region processor setting. If the number of tokens in the region exceeds this minimum, the region will be processed with the statistical/DNN models. The default value is 0. With this default, all structured regions are processed as defined by the structuredRegionProcessingType.

To set the minimum number of tokens, change the value of RegionProcessingSentenceTokensMin in the rosette-factory-config.yaml file.
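The interaction between structuredRegionProcessingType and the minimum token count can be summarized in a short Python sketch. It is a reading of the rules above, not product code: the function name and return labels are hypothetical, and it assumes that a minimum of 0 turns the override off, since with that default every region is handled according to structuredRegionProcessingType.

```python
VALID_TYPES = ("none", "nerModel", "nameClassifier")


def processor_for_structured_region(token_count, processing_type="none", min_tokens=0):
    """Return which model handles a structured region.

    The rule-based processors and the linker run in every case, so only
    the model-based choice is reported here.
    """
    if processing_type not in VALID_TYPES:
        raise ValueError(f"unknown structuredRegionProcessingType: {processing_type}")
    # Assumption: min_tokens == 0 disables the override entirely.
    if min_tokens > 0 and token_count > min_tokens:
        return "statistical/DNN"
    if processing_type == "nerModel":
        return "statistical/DNN"
    if processing_type == "nameClassifier":
        return "nameClassifier"
    return "rule-based only"
```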

Fragment boundary detector

Note

Disabling the fragment boundary detector will classify the entire text as unstructured. This has a similar effect to setting structuredRegionProcessingType to nerModel.

Entity Extractor detects entities within sentences. By default, Entity Extractor uses a fragment boundary detector to identify structured regions, adding sentence boundaries at tabs, newlines, and multiple whitespace characters (such as 3 or more spaces) in text fragments, such as lists and tables. This enables the detection of multiple entities in text fragments that do not form standard sentences. Consider the following text:

George Washington
John Adams
Thomas Jefferson

Without the fragment boundary detector, the statistical model identifies the preceding text as a single PERSON entity. With the fragment boundary detector, the statistical model identifies three separate PERSON entities.
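The boundary rule can be approximated in a few lines of Python. This sketch only splits text at tabs, newlines, and runs of three or more spaces. It does not reproduce the detector itself, and the pattern and function name are assumptions chosen for illustration.

```python
import re

# Tabs, newlines, or runs of three or more spaces mark a fragment boundary.
FRAGMENT_BOUNDARY = re.compile(r"[\t\r\n]+| {3,}")


def split_fragments(text):
    """Split text into fragments at the boundaries described above."""
    return [part.strip() for part in FRAGMENT_BOUNDARY.split(text) if part.strip()]
```

Applied to the three-line example above, the function yields one fragment per name, which is what allows each name to be considered as a separate candidate.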

Turn off the fragment boundary detector in the rex-factory-config.yaml file.

#Regular expressions and gazetteers may be configured to match tokens 
#partially independent from token boundaries. If true, reported offsets 
#correspond to token boundaries.
snapToTokenBoundaries: false

Note

While the fragment boundary detector improves Entity Extractor's performance on tables, lists, and other non-prose content, Entity Extractor is, by design, tuned for prose and may not return high accuracy results on content with significant non-prose elements.

Overlay data directory

If your project has a set of unique data files that you would like to keep separate from other data files, you can put them in their own directory, also known as an overlay directory. This is an additional data directory, which takes priority over the default Entity Extractor data directory.

The overlay directory must have the same directory tree as the provided data directory. If an overlay directory is set, Entity Extractor searches both it and the default data directory.

If a file exists in both places, the version in the overlay directory is used.
If there is an empty file in the overlay directory, Entity Extractor will ignore the corresponding file in the default data directory.
If there is no file in the overlay directory, Entity Extractor will use the file in the default directory.
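These lookup rules can be expressed as a small Python function. It is a sketch of the documented precedence only; the function name and signature are hypothetical and say nothing about how Entity Extractor loads files internally.

```python
from pathlib import Path


def resolve_data_file(relative_path, root_dir, overlay_dir=None):
    """Return the file to load, or None when no file should be used."""
    if overlay_dir is not None:
        candidate = Path(overlay_dir) / relative_path
        if candidate.is_file():
            if candidate.stat().st_size == 0:
                # An empty overlay file suppresses the default file as well.
                return None
            return candidate
    fallback = Path(root_dir) / relative_path
    return fallback if fallback.is_file() else None
```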

To specify the overlay directory:

Create an overlay directory:
```
<install-directory>/my-data
```
Add the overlay directory to the rex-factory-config.yaml file:
```
dataOverlayDirectory:
  <install-directory>/my-data
```

Example 21. Turn Off a Specific Language Gazetteer

Create an overlay directory.
Add an empty file (gaz-LE.bin) to the overlay directory:
```
my-data/gazetteer/eng/accept/gaz-LE.bin
```
Add the overlay directory to the rex-factory-config.yaml file:
```
dataOverlayDirectory:
  <install-directory>/my-data
```

The default English gazetteer will not be used in calls.

Example 22. Use a Custom German Reject Gazetteer

In the above example, add a reject gazetteer file:

my-data/gazetteer/deu/reject/reject-names.txt

Entity salience

Entity Extractor can return a salience score for each extracted entity. Salience indicates whether the entity is important to the overall scope of the document. Returned salience scores are binary, either 0 (not salient) or 1 (salient). The decision is made according to several parameters, such as frequency, distance from document start, etc. Salience is not calculated by default.

To include salience in the results for a single call, add the option to the request:

"options": {"calculateSalience": true}

Or to get the salience by default, set the calculateSalience parameter to true in the rex-factory-config.yaml file.

#An option to calculate entity-chain salience values.
calculateSalience: true

Retrieving Base Linguistics configuration

Entity Extractor internally uses Babel Street Base Linguistics to analyze the text before processing it. If the user application already uses Base Linguistics for other purposes, it's possible to save processing time and have Entity Extractor annotate pre-tokenized documents by passing a tokenized AnnotatedText instance, instead of a string, to the annotate function of Entity Extractor's annotator. However, if the user's instance of Base Linguistics and Entity Extractor's internal instance of Base Linguistics are configured differently, Entity Extractor's results might be affected.

To solve the problem, EntityExtractor provides a getBaseLinguisticsParameters function that returns the set of Base Linguistics options Entity Extractor uses internally, given a language. This function should be called after the EntityExtractor has been otherwise configured. It returns an EnumSet of keys to the values Entity Extractor configures them to.

Tip

Entity Extractor provides a sample (rex-je-<version>/samples/RBLParametersSample.java) which demonstrates how to retrieve RBL parameters from Entity Extractor and use RBL directly to process documents before running Entity Extractor.

Modifying entity extraction processors

Entity Extractor provides multiple processors for extracting entities. You can optimize Entity Extractor for your entity extraction tasks by configuring the processors. Examples of the modifications you can make include:

Removing one or more processors
Adding gazetteers or gazetteer entries for selecting or rejecting entities
Adding regex files or individual regex entries
Adding custom processors
Customizing the statistical model with Model Training Suite.

Each processor has its own set of parameters to customize its behavior.

Selecting processors

By default, Entity Extractor uses all the processors. You can choose to use a subset of the processors. For example, you can decide to return only entities extracted by statistical analysis.

Entity Extractor includes the following processors:

statistical: Entity extractor processor using a statistically-trained model
deepNeuralNetwork: Entity extractor processor using a model trained using a deep neural network
acceptGazetteer: Rule-based entity extractor based on gazetteers
acceptRegex: Rule-based entity extractor based on regular expressions
kbLinker: Entity extractor based on a knowledge base of known entities
redactor: Chooses an entity when multiple processors extract the same or overlapping entities
joiner: Joins adjacent entities into a single entity
rejectGazetteer: Rule-based entity rejector based on gazetteers
rejectRegex: Rule-based entity rejector based on regular expressions
indocCoref: Chains together mentions that refer to the same entity (in-document coreference)
pronominalResolver: Resolves pronouns to their antecedent entities
baseLinguistics: Extracts hashtags, URLs, at-mentions, and emails

The order of execution of the processors is determined internally and cannot be changed. Some processors are prerequisites for other processors. Entity Extractor will throw an exception if the processor list is missing a required processor.

Edit the rex-factory-config.yaml file, modifying the list of active processors for an entity extraction run.

Example 23. Return Statistical Entities Only

#List the set of active processors for an entity extraction run.
#All processors are active by default. This method provides a way 
#to turn off selected processors. The order of the processors cannot be changed. 
#Note that turning off redactor can cause overlapping and unsorted 
#entities to be returned.
#Default processors:
#acceptGazetteer,
#acceptRegex,
#rejectGazetteer,
#rejectRegex,
#statistical,
#indocCoref,
#redactor,
#joiner
#
processors: 
   statistical

Note

The redactor chooses among the entities when processors extract the same or overlapping entities. Turning off the redactor will return all entities found by all processors. This can cause overlapping and unsorted entities to be returned.

Statistical processor

The statistical processor uses models based on computational linguistics and human-annotated training documents. You can add other statistical models to improve extraction for your use case.

You can train a new statistical model to extract different entity types or to improve the results of the statistical model using the Model Training Suite. Contact your sales representative or analyticssupport@babelstreet.com for more information on model training.

Extractions from the statistical model can return a confidence score for each entity. Confidence scores correlate well with precision and may be used for thresholding and removal of false positives. Confidence is calculated by default if linking is enabled. Otherwise, use the calculateConfidence parameter to enable confidence scores. To set a threshold value, use the confidenceThreshold parameter.
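The effect of a confidence threshold can be pictured with the Python sketch below. It is illustrative only: entities are modeled as plain dictionaries with a hypothetical `confidence` key, and entities without a confidence value are kept on the assumption that they come from processors that do not score their results.

```python
def apply_confidence_threshold(entities, threshold=-1.0):
    """Keep entities whose confidence is at or above the threshold.

    Entities with no confidence value are passed through unchanged.
    """
    return [
        entity
        for entity in entities
        if entity.get("confidence") is None or entity["confidence"] >= threshold
    ]
```

With the default threshold of -1.0, nothing with a non-negative confidence is removed; raising the threshold trades recall for precision.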

Table 17. Statistical Processor Parameters

Parameter	Description	Default
`calculateConfidence`	If true, entity confidence values are calculated. Can be overridden by specifying `calculateConfidence` in the API call.	`false`
`confidenceThreshold`	The confidence value threshold below which entities extracted by the statistical processor are ignored.	`-1.0`
`statisticalModels`	Additional files used to produce statistical entities for the given language. You may pass multiple statistical models. The parameter should be formatted in trios of values specifying language, case-sensitivity, and the model file, separated by commas. Case-sensitivity can be `automatic`, `caseInsensitive` or `caseSensitive`. For example, setting two models for case-sensitive English and Japanese might look like: `eng,caseSensitive,english-model.bin,jpn,automatic,japanese-model.bin`	`null`
`caseSensitivity`	The capitalization (aka 'case') used in the input texts. Processing standard documents requires caseSensitive, which is the default. Documents with all-caps, no-caps or headline capitalization may yield higher accuracy if processed with the caseInsensitive value. Can be `automatic`, `caseSensitive` or `caseInsensitive`	`caseSensitive`
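To make the trio format of `statisticalModels` concrete, the following Python sketch splits a value into (language, case-sensitivity, model file) groups and checks the case-sensitivity keyword. The parser is a hypothetical helper for validating a configuration value before use; it is not part of Entity Extractor.

```python
CASE_SENSITIVITY = {"automatic", "caseInsensitive", "caseSensitive"}


def parse_statistical_models(value):
    """Split a comma-separated statisticalModels value into trios."""
    parts = [part.strip() for part in value.split(",")]
    if len(parts) % 3 != 0:
        raise ValueError("expected language,caseSensitivity,modelFile trios")
    models = []
    for i in range(0, len(parts), 3):
        language, case, model_file = parts[i:i + 3]
        if case not in CASE_SENSITIVITY:
            raise ValueError(f"unknown case-sensitivity value: {case}")
        models.append((language, case, model_file))
    return models
```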

Adding a custom statistical model

Custom trained entity extraction models can be added to Entity Extractor, replacing or supplementing the standard model shipped with the product. Use Model Training Suite (MTS) to train the new models.

You must choose whether to extract entities using both the new and the default statistical models together, which we call model mixing, or using the new statistical model exclusively.

With model mixing, Entity Extractor runs both the new and the default models in parallel and uses the redactor module to adjudicate the overlapping results.

Note

You can customize the redactor to favor output from the new statistical model(s).

The trained models are moved from MTS to the production instance of Entity Extractor through the following steps:

Export the entity extraction model from MTS.
Rename the model.
Tip
Model Naming Convention
The prefix must be model. and the suffix must be -LE.bin. Any alphanumeric ASCII characters are allowed in between.
Example valid model names:
- model.fruit-LE.bin
- model.customer4-LE.bin
Copy the model into the default data directory in the Entity Extractor root folder.
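Before copying a renamed model, the naming convention can be checked with a short Python helper. The pattern below is one reading of the rule (prefix `model.`, suffix `-LE.bin`, ASCII letters and digits in between) and assumes at least one character is required in the middle; the helper name is hypothetical.

```python
import re

# model.<ASCII letters or digits>-LE.bin
MODEL_NAME = re.compile(r"model\.[A-Za-z0-9]+-LE\.bin")


def is_valid_model_name(name):
    """Return True when the file name follows the model naming convention."""
    return MODEL_NAME.fullmatch(name) is not None
```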

Deep neural network processor

Entity Extractor has a deep neural network (DNN) model that can be used in place of the statistical model for selected languages. By default, the statistical model is used rather than the DNN model. You can customize which model is used.

The deep neural network processor uses TensorFlow 2.3.1 (Java version 0.2.0). Ubuntu Linux 14.04+, Windows 7+, and MacOS 10.11+ are fully supported, but you should be able to run the processor successfully on other modern Linux flavors as well. To use the processor on platforms which are not otherwise supported, or to improve the speed on supported platforms, you can replace the TensorFlow library shipped with the product with one that’s built from source.

To make use of GPUs, you should download tensorflow-core-platform-gpu and add it to the top of your classpath.

To select which model will be used, set the modelType option in your calls. The default value for modelType is statistical. To enable the deep neural network model, provide DNN for the modelType. Example:

{"content": "your_text_here", "options": {"modelType": "DNN"}}

Currently, Entity Extractor has DNN models for the following languages:

Arabic (ara)
English (eng)
Hebrew (heb)
Korean (kor)

Important

The deep neural network model and the statistical model cannot be used together. When selected, the DNN replaces the statistical model.
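A client that handles several languages might request the DNN model only where one exists. The Python sketch below builds a request body on that basis. The helper name and the `prefer_dnn` flag are hypothetical, and the language code is used only on the client side to make the decision; it is not added to the request.

```python
import json

# Languages listed above as having a DNN model.
DNN_LANGUAGES = {"ara", "eng", "heb", "kor"}


def build_request_body(text, language, prefer_dnn=True):
    """Return a JSON body that selects the DNN model when one is available."""
    body = {"content": text}
    if prefer_dnn and language in DNN_LANGUAGES:
        body["options"] = {"modelType": "DNN"}
    return json.dumps(body)
```

For other languages the option is omitted, so the default statistical model applies.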

Name classifier

Entity Extractor has a name classifier which can be used in place of the statistical model for structured regions. The name classifier is a machine learning model that tries to predict an entity type for an input string. It processes the entire structured region (the input string) as a single entity, predicting a label (PERSON, LOCATION, ORGANIZATION, or NONE) for the string. It works best on table cells or list items where the entire entry is a single entity. If a structured region contains more text than the entity mention itself, the name classifier will usually label it as NONE.

To enable the name classifier for structured regions, set structuredRegionProcessingType to nameClassifier in the rex-factory-config.yaml file.

Currently, Entity Extractor supports the name classifier processor for the following languages:

Arabic
English
French
German
Hebrew
Japanese

Each language has its own configuration file, data/name_classifier/<lang>/<lang>_config.yaml, where <lang> is the three-letter language code. The labelScoreThresholds field sets, for each entity type, the score threshold that determines how readily the classifier labels a phrase with that type. Lowering a threshold labels more phrases, which finds more true positives but may also produce more false positives.

To disable an entity type completely, remove or comment out the corresponding entry from the <lang>_config.yaml file. Example:

# labelScoreThresholds
# Set the model score thresholds for each entity type.
# To turn off an entity from the model, comment it out.
# The accuracy of the current ORG model is too low and so it is better to turn it off for now.
labelScoreThresholds:
  PER: 1.2
  LOC: 3.2
#  ORG: 5.2

Note

Currently, the ORG entity type is excluded for all languages. LOC is enabled for English and Japanese only.
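The effect of the thresholds can be illustrated with a short Python sketch. It rests on assumptions that this section does not confirm: that the model produces one score per label, and that a label is accepted when its score is greater than or equal to its threshold. The function pick_label and the sample scores are hypothetical.

```python
# Thresholds mirroring the example file above; ORG is commented out there,
# so it is simply absent here.
label_score_thresholds = {"PER": 1.2, "LOC": 3.2}


def pick_label(scores, thresholds):
    """Return the best-scoring enabled label that clears its threshold, else NONE."""
    best_label, best_score = "NONE", None
    for label, score in scores.items():
        threshold = thresholds.get(label)
        if threshold is None:
            continue  # entity type disabled: no threshold entry
        if score >= threshold and (best_score is None or score > best_score):
            best_label, best_score = label, score
    return best_label


# ORG has the highest score but is disabled, so PER is chosen.
print(pick_label({"PER": 2.0, "LOC": 1.0, "ORG": 9.0}, label_score_thresholds))
# Nothing clears an enabled threshold, so the result is NONE.
print(pick_label({"PER": 1.0, "ORG": 9.0}, label_score_thresholds))
```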

Accept gazetteer

A gazetteer is a list of exact matches in a predefined closed class. For example, you can use a gazetteer to match all the countries in the world, as there is a precise and unambiguous list of countries. An entry would count as ambiguous if it has multiple possible meanings, such as "Apple", which could be either an ORGANIZATION or a fruit. The gazetteers are very fast at extracting entities. If you are searching for specific words or phrases in your data, a custom gazetteer is a good way to find them quickly.

Entity Extractor is shipped with default gazetteer files which you can modify. Gazetteer files are located in a subdirectory of the data directory, defined by language using the three-letter ISO 639-3 language code. A directory which applies to all languages uses xxx for the language code. For example:

<install-directory>/roots/rex-<version>/data/gazetteer/eng/reject/
<install-directory>/roots/rex-<version>/data/gazetteer/xxx/accept/

By default, the data files are located in the <install-directory>/roots/rex-<version> directory. If you want your custom files to be in a separate location, use an Overlay data directory.

Table 18. Accept Gazetteer Parameters

Parameter	Description	Default
`allowPartialGazetteerMatches`	The option to allow partial gazetteer matches. For the purposes of this setting, a partial match is one that does not line up with token boundaries as determined by the internal tokenizer. This only applies to accept gazetteers.	`false`
`acceptGazetteers`	Additional gazetteer files used to produce entities for the given language.	`null`

Creating a custom gazetteer

You can create your own custom gazetteers. To create a custom gazetteer, put the new file in the appropriate location in the data/gazetteer tree.

language-specific: data/gazetteer/<lang>/accept
all languages: data/gazetteer/xxx/accept

A gazetteer file:

Is a .txt file encoded in UTF-8.

Each comment line is prefixed with #.
The first non-comment line is TYPE[:SUBTYPE], where TYPE is required and SUBTYPE is optional. The type is applied to the entire gazetteer and defines the entity type name for output. TYPE and SUBTYPE may be predefined or user-defined.

Gazetteer entries and potential matches are space normalized to treat any whitespace between words as a single space. This enables the gazetteer to match entities with differences in whitespace.

Tip

To improve performance, text gazetteers can be compiled to a binary gazetteer using build-binary-gazetteer in the ./scripts directory or with Model Training Suite. The binary gazetteer file name must end with -LE.bin.

Example 24. Gazetteers to Track Infectious Diseases

To track common infectious diseases, create a gazetteer like this:

# File: infectious-diseases-gazetteer.txt
#
DISEASE:INFECTIOUS
tuberculosis
e. coli
malaria
influenza

A single gazetteer may not be enough; you can create as many gazetteers as you need. To search for the scientific names of the infectious disease, you can create a file like this:

# File: latin-infectious-gazetteer.txt
#
DISEASE:INFECTIOUS
Mycobacterium tuberculosis
Escherichia coli
Plasmodium malariae
Orthomyxoviridae

To track certain diseases by their causes:

# File: infectious-bacterial-gazetteer.txt
#
DISEASE:BACTERIAL
Escherichia coli
E. coli
Staphylococcus aureus
Streptococcus pneumoniae
Salmonella

Or to track the drugs used to treat them:

# File: antimicrobial-drugs-gazetteer.txt
#
DRUG:ANTIMICROBIAL
methicillin
vancomycin
macrolide
fluoroquinolone
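Before installing a custom gazetteer, it can help to check that the file follows the format described above. The following Python sketch is an illustration only: parse_gazetteer is a hypothetical helper, and skipping blank lines is an assumption. It reads the TYPE[:SUBTYPE] line, ignores comment lines, and collapses whitespace in the same spirit as the space normalization described earlier.

```python
def parse_gazetteer(text):
    """Parse gazetteer text into (type, subtype, entries)."""
    entity_type = subtype = None
    entries = []
    for raw in text.splitlines():
        line = " ".join(raw.split())  # collapse any whitespace run to one space
        if not line or line.startswith("#"):
            continue  # skip blank lines (assumption) and comment lines
        if entity_type is None:
            # First non-comment line: TYPE[:SUBTYPE]
            entity_type, _, sub = line.partition(":")
            subtype = sub or None
            continue
        entries.append(line)
    if entity_type is None:
        raise ValueError("gazetteer has no TYPE[:SUBTYPE] line")
    return entity_type, subtype, entries


sample = """# File: infectious-diseases-gazetteer.txt
#
DISEASE:INFECTIOUS
tuberculosis
e.   coli
malaria
"""
print(parse_gazetteer(sample))
```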

Tip

By default, the data files are located in the <install-directory>/roots/rex-<version> directory. To install custom gazetteer files in a separate directory, use an Overlay data directory.

Partial gazetteer matches

By default, gazetteer matches must match token boundaries in the input text. You can enable partial matches that do not start and/or do not end on token boundaries. You can also set individual regexes to return partial matches by including allow-partial-matches="yes" in a regex.

Partial matches require in-document coreference to be disabled. As a result, the mentions will not be grouped into entities.

#An option for document entity resolution (also known as entity chaining).
indocType: NULL

Tip

We do not recommend that you enable partial matches. It adds processing time and may match more than you expect. An entry such as "red" in a COLOR gazetteer will match "Frederick" in the input text.
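The difference between the two matching modes can be sketched in Python. This is not the product's matcher: regular-expression word boundaries stand in for the internal tokenizer's token boundaries, which is an assumption, and find_matches is a hypothetical helper.

```python
import re


def find_matches(entry, text, allow_partial=False):
    """Return (start, end) spans where a gazetteer entry matches the text."""
    pattern = re.escape(entry)
    if not allow_partial:
        # Word boundaries approximate token boundaries in this sketch.
        pattern = r"\b" + pattern + r"\b"
    return [m.span() for m in re.finditer(pattern, text)]


text = "Frederick bought a red car."
print(find_matches("red", text))                      # [(19, 22)]
print(find_matches("red", text, allow_partial=True))  # [(1, 4), (19, 22)]
```

With partial matching allowed, the entry also matches inside "Frederick", which is the over-matching the tip above warns about.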

Chinese gazetteers

Entity Extractor can analyze both simplified and traditional Chinese language documents. The following three language codes are all used for Chinese:

Chinese (zho)
Simplified Chinese (zhs)
Traditional Chinese (zht)

zho is the Chinese language code; it applies to both simplified and traditional Chinese. Gazetteers using zho as the language code apply to documents with a language code of zhs or zht. Users should include both simplified and traditional Chinese words in the zho gazetteer, so that it will work for all Chinese language codes.

Example 25. Adding a Simplified and Traditional Word for "lion"

{"language": "zho",
"configuration": {
"entities": { "ANIMAL": [ "狮子", "獅子" ] }
}
}

Adding dynamic gazetteers

You can use the API to dynamically add gazetteer entries to the /entities endpoint. The REST endpoint is:

http://localhost:8181/rest/v1/entities/configuration/gazetteer/add

Parameters:

language: The three-letter language code of the new values. For example, to add an English value, the language would be eng. To add the value to all languages, the language code is xxx. The language must be supported by the /entities endpoint.
entity type: The type of the entity. For example, PERSON, LOCATION, ORGANIZATION, or TITLE. The entity type must already exist in the system.
values: One or more values to be added to the gazetteer.
profileId (Optional): Custom profile id

Example 26. Dynamically adding a gazetteer entry as a string

In this example, we're adding the companies New Corp and Best Business to the entities gazetteer for all languages (xxx).

curl --request POST \
--url http://localhost:8181/rest/v1/entities/configuration/gazetteer/add \
--header 'accept: application/json' \
--header 'content-type: application/json' \
--data '{"language": "xxx", "configuration":{"entities":{ "COMPANY": ["New Corp", "Best Business"]}}}'

Example 27. Dynamically adding a gazetteer entry to a custom profile

In this example, we're adding the same data as above, to the profile named group1.

curl --request POST \
--url http://localhost:8181/rest/v1/entities/configuration/gazetteer/add \
--header 'accept: application/json' \
--header 'content-type: application/json' \
--data '{"language": "xxx", "configuration":{"entities":{ "COMPANY": ["New Corp", "Best Business"]}}, "profileId": "group1"}'

Example 28. Dynamically adding a gazetteer entry as a file

In this example, the new values are in a file called new_companies.json:

{"language": "xxx", "configuration": {"entities":{ "COMPANY": ["New Corp", "Best Business"] } } }

The cURL command to add the file values:

curl --request POST \
--url http://localhost:8181/rest/v1/entities/configuration/gazetteer/add \
--header 'accept: application/json' \
--header 'content-type: application/json' \
--data '@new_companies.json'

Caution

Dynamic gazetteer entries are held completely in memory and state is not saved on disk. When Analytics Server is brought down, the contents are lost. To save the new entries, add the new values to the related gazetteer file before restarting Analytics Server.
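The cURL examples above can also be expressed in Python. The sketch below uses only the standard library and builds the request without sending it. The URL and body layout are taken from the examples above; the helper name build_gazetteer_request is hypothetical, and the host and port should be adjusted for your installation.

```python
import json
import urllib.request

# URL taken from the cURL examples above.
GAZETTEER_ADD_URL = (
    "http://localhost:8181/rest/v1/entities/configuration/gazetteer/add"
)


def build_gazetteer_request(language, entity_type, values, profile_id=None):
    """Build (but do not send) a POST request that adds gazetteer values."""
    body = {
        "language": language,
        "configuration": {"entities": {entity_type: list(values)}},
    }
    if profile_id is not None:
        body["profileId"] = profile_id  # optional custom profile
    return urllib.request.Request(
        GAZETTEER_ADD_URL,
        data=json.dumps(body, ensure_ascii=False).encode("utf-8"),
        headers={"accept": "application/json", "content-type": "application/json"},
        method="POST",
    )


request = build_gazetteer_request(
    "xxx", "COMPANY", ["New Corp", "Best Business"], profile_id="group1"
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To send it to a running server: urllib.request.urlopen(request)
```

Because dynamic entries are lost on restart, a script like this is best paired with a step that also records the values in the related gazetteer file.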

Accept regex

Regular expressions (regexes) are used for finding entities which follow a strict pattern with a rigid form and infinite combinations, such as URLs and credit card numbers. In the default Entity Extractor installation the regex files are:

language-specific: data/regex/<lang>/accept/regexes.xml where <lang> is the ISO 639-3 language code
cross-language: data/regex/xxx/accept/regexes.xml

You can modify these files to add new patterns to extract the same entity type.

Table 19. Accept Regex Parameters

Parameter	Description	Default
`acceptRegularExpressionSets`	Additional files used to produce regex entities.	`null`
`supplementalRegularExpressionPaths`	The option to add supplemental regex files, usually for entity types that are excluded by default. The supplemental regex files are located at `data/regex/<lang>/accept/supplemental` and are not used unless specified.	`null`
`regexCurrencySplit`	When set to true, Entity Extractor will attempt to split entities extracted with the regex engine of type IDENTIFIER:MONEY into two entities: IDENTIFIER:CURRENCY_AMT and IDENTIFIER:CURRENCY_TYPE. These types represent the amount of the currency (50,000) and the currency type ($), respectively.	`false`

To extract new entity types that have predictable patterns, add a new XML regex file in either the language-specific (<lang>) or generic (xxx) location. Entity Extractor uses the Tcl regex format for defining the regex patterns.

Entity Extractor modifies the regex matcher so that \n in a regular expression matches line feeds (\n), carriage returns (\r), or a combination of both (\r\n). Regardless of what it matches, offsets and lengths in the result will match the input document.
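The newline behavior can be emulated in Python to see what a pattern will match. This is only an approximation: Python's re module is not the Tcl regex engine the product uses, widen_newlines is a hypothetical helper, and the simple string replacement does not handle an escaped backslash followed by n.

```python
import re

# One alternation covering \r\n, \r, and \n (\r\n first so it is one match).
ANY_NEWLINE = r"(?:\r\n|\r|\n)"


def widen_newlines(pattern):
    """Replace each backslash-n escape in a pattern with ANY_NEWLINE."""
    return pattern.replace(r"\n", ANY_NEWLINE)


pattern = widen_newlines(r"Total:\n\d+")
for text in ("Total:\n42", "Total:\r42", "Total:\r\n42"):
    match = re.search(pattern, text)
    # The span refers to the unmodified input, so offsets stay valid.
    print(repr(text), match.span())
```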

By default, the data files are located in the <install-directory>/roots/rex-<version> directory. If you want your custom files to be in a separate location, use an Overlay data directory.

Creating a new regex

Each regex is defined in a regexp, which may contain a lang attribute and may refer to define elements.

The lang attribute designates the language for the regex. If the regex applies to text in any language, there is no lang attribute. For example, all the regexes in data/regex/eng should include lang="eng". The regexes in data/regex/xxx do not include the lang attribute, since they apply to text in any language.

A define element contains a regex and a name attribute. By naming the regex, you can include the regex in multiple regexp files.

Example 29. Defining a Regular Expression: time_ampm

Define the regular expression in a define statement:

<define lang="eng" name="time_ampm">(?:[pa]\.?\s?m\.?)</define>

Use the regular expression in a regexp statement:

<regexp lang="eng" type="TEMPORAL:TIME">...${time_ampm}...</regexp>

When Entity Extractor evaluates the regexp statement, it follows these steps:

When ${time_ampm} appears in a regexp lang="eng" element, Entity Extractor looks for a define name="time_ampm" lang="eng" statement.
If it does not find the element, Entity Extractor looks for a define name="time_ampm" element without the lang attribute.
If it does not find such an element, an error occurs.
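The lookup order in the steps above can be sketched in Python. The dictionary of defines, the "digits" entry, and the helper names are hypothetical; only the order (language-specific define first, then a define without lang, otherwise an error) comes from the steps above. The sketch also assumes define names consist of letters, digits, and underscores.

```python
import re

# Hypothetical store of <define> elements keyed by (name, lang);
# lang is None for a define without a lang attribute.
defines = {
    ("time_ampm", "eng"): r"(?:[pa]\.?\s?m\.?)",
    ("digits", None): r"\d+",
}


def resolve_define(name, lang):
    """Look up a define: language-specific first, then language-neutral."""
    for key in ((name, lang), (name, None)):
        if key in defines:
            return defines[key]
    raise KeyError(f"no define named {name!r} for lang {lang!r}")


def expand(regexp_body, lang):
    """Substitute every ${name} reference in a regexp body."""
    return re.sub(
        r"\$\{(\w+)\}",
        lambda m: resolve_define(m.group(1), lang),
        regexp_body,
    )


print(expand(r"${digits}:${digits}\s?${time_ampm}", "eng"))
```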

If you include an id attribute setting, that value is returned as the "subsource" of an entity returned by this regexp.

Supplemental regexes

Entity Extractor is shipped with supplemental regexes which are not activated by default. The supplemental regexes are located in the data/regex/<lang>/accept/supplemental directory.

# The option to add supplemental regex files, usually for entity types that are excluded by
# default. The supplemental regex files are located at data/regex/<lang>/accept/supplemental and
# are not used unless specified.

supplementalRegularExpressionPaths: 
  - data/regex/eng/accept/supplemental/geo-regexes.xml

Table 20. Supplemental Regexes by Language

Language (ISO Code)	Currency	Date	Distance	Geo	License-Plate	Numbers	Org	Time
Arabic `ara`	X	X	X	X	X	X		X
German `deu`	X	X	X	X		X		X
English `eng`	X	X	X	X		X	X	X
Farsi `fas`	X	X	X	X		X		X
French `fra`	X	X	X	X		X		X
Hebrew `heb`	X	X	X	X		X		X
Hungarian `hun`	X	X	X	X		X		X
Indonesian `ind`	X	X	X	X				X
Italian `ita`	X	X	X	X		X		X
Japanese `jpn`	X	X	X	X		X		X
Korean `kor`	X	X	X	X		X		X
Dutch `nld`	X	X	X	X		X		X
Portuguese `por`	X	X	X	X		X		X
Pashto `pus`	X	X	X	X		X		X
Russian `rus`	X	X	X	X		X		X
Spanish `spa`	X	X	X	X		X		X
Swedish `swe`	X	X	X	X		X		X
Tagalog `tgl`	X	X	X	X				X
Upper-case English `uen`	X	X	X	X		X		X
Vietnamese `vie`	X	X	X	X				X
Simplified Chinese `zhs`	X	X	X	X		X		X
Traditional Chinese `zht`	X	X	X	X		X		X
Malay, Standard `zsm`	X	X	X	X				X

Joiner

The joiner combines adjacent entities into a single entity, based on the joiner rules. Entity Extractor then returns the single entity.

The configuration file for joining adjacent entities is in data/etc.

Table 21. Joiner Parameters

Parameter	Description	Default
`joinerRuleFiles`	File containing additional joiner rules.	`null`
`runJoinerPostRedactor`	Run the joiner after the redactor, instead of before.	`false`

The file neredact-config.xml specifies the rules for joining adjacent entities. Adjacent TITLE entities are joined into a single TITLE entity. The joiner elements for joining TITLE and PERSON entities into a PERSON entity are commented out by default.

<neredactconfig>
  <joiners>
    <joiner left='TITLE' right='TITLE' joined='TITLE'/>
<!-- Not joined by default
    <joiner language='eng' left='TITLE' right='PERSON' joined='PERSON'/>
    <joiner language='jpn' left='PERSON' right='TITLE' joined='PERSON'/>
-->
  </joiners>
</neredactconfig>

Rules can optionally specify a language, in which case they will apply only to entities of that specific language. If a language is not specified, the rule will apply for any language.

Entities are considered adjacent if they are separated by no more than 5 whitespace characters.

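A rule can also use the adjacency-regex attribute to specify the text allowed between the two entities. For example, to join "Barack Obama" and "President" in "Barack Obama, President", the joiner rule is: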

<joiner left='PERSON' adjacency-regex=',\s+' right='TITLE' joined='PERSON'/>

The joiner runs before the redactor, as of release 7.46.2. To run the joiner after the redactor, set the parameter runJoinerPostRedactor to true in the rex-factory-config.yaml file.
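The joiner rules can be modeled with a short Python sketch. It is not the product's implementation, and it makes two assumptions: a rule without adjacency-regex falls back to the "no more than 5 whitespace characters" test, and an adjacency-regex must match the entire text between the two mentions. The Mention class and try_join function are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Mention:
    start: int
    end: int
    type: str


def try_join(text, left, right, rule):
    """Return the joined Mention if the rule applies, else None."""
    if (left.type, right.type) != (rule["left"], rule["right"]):
        return None
    gap = text[left.end:right.start]
    # Default adjacency: at most 5 whitespace characters between mentions.
    adjacency = rule.get("adjacency-regex", r"\s{0,5}")
    if re.fullmatch(adjacency, gap) is None:
        return None
    return Mention(left.start, right.end, rule["joined"])


text = "Barack Obama, President"
person = Mention(0, 12, "PERSON")
title = Mention(14, 23, "TITLE")
rule = {"left": "PERSON", "right": "TITLE", "joined": "PERSON",
        "adjacency-regex": r",\s+"}
print(try_join(text, person, title, rule))
```

Without the adjacency-regex, the same pair is not joined in this model, because the comma is not whitespace.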

Redactor

The redactor determines which entity to choose when multiple mentions for the same entity are extracted. The redactor first chooses longer entity mentions over shorter ones. If the mentions are the same length, the redactor uses weightings to select an entity mention.

Different processors can extract overlapping entities. For example, a gazetteer extracts "Newton", Massachusetts as a LOCATION, and the statistical processor extracts "Isaac Newton" as a PERSON. When two processors return the same or overlapping entities, the redactor chooses an entity based on the length of the competing entity strings. By default, a conflict between overlapping entities is resolved in favor of the longer candidate, "Isaac Newton".

Tip

The correct entity mention is almost always the longer mention. There can be examples, such as the example of "Newton" above, where the shorter mention is the correct mention. While it might seem that turning off the option to prefer length is the easiest fix, it usually just fixes a specific instance while reducing overall accuracy. We strongly recommend keeping the default redactorPreferLength as true.

The redactor can be configured to set weights by:

entity type
processor

Configuring the redactor

The configuration file for setting redactor weights is in data/etc.

Set weight by entity type

Each of the ne_type elements in ne-types.xml defines weightings for a specified entity type. For example, to assign weights for IDENTIFIER entities:

    <ne_type>
        <name>IDENTIFIER</name>
        <subtypes>
            <name>EMAIL</name>
            <name>URL</name>
            <name>DOMAIN_NAME</name>
            <name>IP_ADDRESS</name>
            <name>PHONE_NUMBER</name>
            <name>FAX_NUMBER</name>
            <name>PERSONAL_ID_NUM</name>
            <name>CREDIT_CARD_NUM</name>
            <name>MONEY</name>
            <name>PERCENT</name>
            <name>NUMBER</name>
        </subtypes>
        <weight name="statistical" value="9" />
        <weight name="gazetteer" value="10" />
        <weight name="regex" value="10" />
    </ne_type>

This assigns weights for the IDENTIFIER entities. They are also weighted by processor.

Set weights by processor

The processor weights are relative values; they do not have to add up to any specific value. For example, to favor gazetteer entries over regexes, and favor both over values returned by statistical analysis, you could set the weights as follows:

        <weight name="statistical" value="1" />
        <weight name="gazetteer" value="10" />
        <weight name="regex" value="5" />

Some processors offer subsources to identify specific instances. The kb-linker processor returns a subsource indicating the knowledge base the extraction originated in. To set a weight for a specific subsource, set the name property to PROCESSOR:SUBSOURCE. For example, to favor your custom knowledge base (MyKB) over other extractions but keep other linker extractions low, you could set the weights as follows:

        <weight name="kb-linker:MyKB" value="20" />
        <weight name="kb-linker" value="1" />

When you define new entity types for gazetteers and regexes, you should add those entity types to ne-types.xml if you want to control how the redactor resolves conflicts. Types that do not appear in this file receive weights of 10 for all three processors.

For an entity type with subtypes, the settings apply to all the subtypes.
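The redactor's decision order (length first, then weights) can be modeled in Python. This is a simplified sketch, not the product's algorithm: the candidate dictionaries, the weights table, and the tie-breaking choice (keep the first candidate) are assumptions. The default weight of 10 for unlisted types comes from the text above.

```python
DEFAULT_WEIGHT = 10  # weight for types that do not appear in ne-types.xml

# Hypothetical weights keyed by entity type, then by processor name.
weights = {
    "IDENTIFIER": {"statistical": 9, "gazetteer": 10, "regex": 10},
}


def weight_of(candidate):
    """Look up the weight for a candidate's entity type and source processor."""
    by_processor = weights.get(candidate["type"], {})
    return by_processor.get(candidate["source"], DEFAULT_WEIGHT)


def redact(a, b, prefer_length=True):
    """Pick one of two overlapping candidates: longer first, then heavier."""
    len_a, len_b = a["end"] - a["start"], b["end"] - b["start"]
    if prefer_length and len_a != len_b:
        return a if len_a > len_b else b
    return a if weight_of(a) >= weight_of(b) else b


newton = {"start": 6, "end": 12, "type": "LOCATION", "source": "gazetteer"}
isaac_newton = {"start": 0, "end": 12, "type": "PERSON", "source": "statistical"}
print(redact(newton, isaac_newton)["type"])  # the longer PERSON mention wins
```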

Reject gazetteer

Instead of adding entities to extract when matched, you can define a list of entities to reject when matched. These are reject gazetteers.

The format of a reject gazetteer is identical to the format of an accept gazetteer except the wildcard (*) is allowed in the entity type. As with accept gazetteers, they are arranged by language.

language-specific: data/gazetteer/<lang>/reject
all languages: data/gazetteer/xxx/reject

For example, put a gazetteer that rejects German entities in data/gazetteer/deu/reject. Put a gazetteer that rejects entities in multiple languages in data/gazetteer/xxx/reject.

Table 22. Reject Gazetteer Parameters

Parameter	Description	Default
`rejectGazetteers`	Additional gazetteer files used to reject entities for the given language.	`null`

Example 30. Reject Gazetteer

The following .txt file in data/gazetteer/eng/reject rejects the PERSON entity named "George Watson" when processing English documents.

PERSON
George Watson

A wildcard entity type matches any type. With the following file, the value "George Watson" is rejected for all entity types, not just PERSON.

*
George Watson
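The effect of the two reject files above can be sketched in Python. The apply_reject helper and the sample mentions are hypothetical, and the sketch ignores subtypes. It only illustrates how a specific type and the wildcard (*) differ.

```python
def apply_reject(mentions, reject_type, reject_entries):
    """Drop mentions whose text is listed and whose type matches (or is '*')."""
    listed = {" ".join(entry.split()) for entry in reject_entries}
    kept = []
    for text, entity_type in mentions:
        normalized = " ".join(text.split())  # same whitespace normalization
        type_matches = reject_type == "*" or reject_type == entity_type
        if type_matches and normalized in listed:
            continue  # rejected
        kept.append((text, entity_type))
    return kept


mentions = [
    ("George Watson", "PERSON"),
    ("George Watson", "ORGANIZATION"),
    ("Boston", "LOCATION"),
]
print(apply_reject(mentions, "PERSON", ["George Watson"]))
print(apply_reject(mentions, "*", ["George Watson"]))
```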

Reject regex

A typical regex is used to identify entities of a specified entity type. You can also define a regex to reject entities; that is, whenever the pattern is identified, the match is rejected as an entity of the defined type. Reject regexes follow the same format as accept regexes, with the addition that the wildcard (*) is allowed for the entity type.

Place your reject regex files in the following directories:

language-specific: data/regex/<lang>/reject
all languages: data/regex/xxx/reject

Table 23. Reject Regex Parameters

Parameter	Description	Default
`rejectRegularExpressionSets`	Additional regex files used to reject entities.	`null`

For example, a file to reject German entities goes in data/regex/deu/reject. Files rejecting entities in multiple languages go in data/regex/xxx/reject.

Example 31. Regex to Reject a Location

The following .xml file in data/regex/eng/reject rejects Baltimore as a LOCATION entity when processing English documents.

<?xml version="1.0" encoding="utf-8" standalone="no"?>
<!DOCTYPE regexps PUBLIC "-//basistech.com//DTD RLP Regular Expression Config 7.1//EN"
                          "urn:basistech.com:7.1:rlpregexp.dtd">

<regexps>
  <regexp lang="eng" type="LOCATION">Baltimore</regexp>
</regexps>

Note

Lookbehind assertions are not supported.

In-document coreference

Within a document, there may be multiple references to a single entity. In-document coreference (indoc coref) chains together all mentions to an entity.

The indoc coref server is an additional server that must be installed on your system alongside Analytics Server.
By default, indoc coref is disabled.
To enable indoc coref for a call, set the option useIndocServer to true.
The response time will be slower when indoc coref is enabled. We recommend using a GPU with indoc coref enabled.
To see which languages support indoc coref, use the /entities/indoc-coref-server/supported-languages endpoint.

Pronominal resolution

If resolvePronouns is enabled (it is disabled by default), Entity Extractor will try to resolve pronouns with the corresponding antecedent entities.

Pronominal resolution is supported for English only.

Table 24. Pronominal Resolution Parameters

Parameter	Description	Default
`resolvePronouns`	When true, resolve pronouns to person entities.	`false`

Social media extractors

As of release 7.56.0.c77.0, Entity Extractor can extract the following social media entity types, improving entity extraction from social media content. The extractor utilizes the part of speech identified for the entity by Babel Street Base Linguistics to recognize social media types.

ATMENTION
EMAIL
HASHTAG
URL

To enable social media, set extractSocialMedia to true in the rex-factory-config.yaml file.

Table 25. Social Media Parameters

Parameter	Description	Default
`extractSocialMedia`	Extracts social media entity types, instead of person, location, or organization entity types.	`false`

When enabled, these entities replace person, organization, or location entity types for atmention and hashtag.

Table 26. Social Media Entity Examples

Text	Entity type extracted
	Social media enabled	Social media disabled
@microsoft	`ATMENTION`	`ORGANIZATION`
#boston	`HASHTAG`	`LOCATION`
JoeSmith@gmail.com	`EMAIL/URL`	`IDENTIFIER:EMAIL/URL`
https://docs.babelstreet.com/Extract/en/entity-extractor.html	`EMAIL/URL`	`IDENTIFIER:EMAIL/URL`

Creating Custom Processors

Entity Extractor has a plugin architecture that allows users to create custom processors that can be inserted into the Entity Extractor pipeline at two points.

At the preExtractor phase - a custom processor may insert additional text pre-processing after input, but before tokenization and sentence breaks (either provided by Base Linguistics or the user’s own tokenizer and sentence breaker).
At the preRedaction phase - a custom processor may insert corrections or modifications to output from the default extractors (statistical, regex, gazetteer), using the full information and context that the default extractors have access to (e.g., plain text data, sentence boundaries, tokens, full list of entities extracted and the source extractors which found them, boundaries, and processor types, etc.)

Pre-Extraction Custom Processors: For Additional Text Pre-Processing

Custom processors at the preExtractor phase can provide additional text pre-processing. For example, if the files contain boilerplate, footers, and navigation bar text that are not the target of the analysis, including these parts of the document in the analysis may trip up the tokenization process and thus decrease the overall quality of extraction results. A preExtractor custom processor can strip footers of emails or add metadata to the target files.

Pre-Redaction Custom Processors: For Correcting/Modifying Extractor Output

Custom processors at the preRedaction phase are run after the default processors (statistical, gazetteer, regex) and any filters (reject files for regex and gazetteer) have run, but before the redactor. A custom processor at the preRedaction phase receives all information and context of the intermediate results from the output of the default extraction processors, and can make modifications to those results before the redactor phase adjudicates conflicts between the results from statistical, gazetteer, and regex processors.

Only entities and metadata attributes fields can be updated with the pre-redaction custom processor. If the custom processor attempts to make changes in forbidden fields, specifically data (input), token, or sentence attributes, the specified changes will be ignored and a warning will be logged.

Examples of cases that are correctable with a custom processor include:

Reject a mention as an entity: Cases where Entity Extractor incorrectly extracts a mention that is not an entity can be excluded from the new list of entity results.
Correcting the entity type: If, for example, your dataset consists of personal letters, you can have high confidence that the entity following a closing such as “Love,” or “Sincerely yours,” is a PERSON, even when Entity Extractor identifies it as an ORGANIZATION.
Modifying entity boundaries: If, for example, Entity Extractor incorrectly extracts “Hi” as part of a PERSON entity, as in “Hi Joe”, a custom processor can trim the mention to just “Joe”.

The code sample in our public GitHub repository at https://github.com/rosette-api/custom-processor-sample includes a custom processor called SampleCustomProcessor.java which corrects an entity type.
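The entity type correction described above can be prototyped before writing the Java processor. The Python sketch below shows only the rule logic and is not the CustomProcessor or Annotator API: the function correct_types, the tuple layout for mentions, and the list of closings are all hypothetical.

```python
import re

# Letter closings that should be followed by a PERSON (illustrative list).
CLOSINGS = re.compile(r"(?:Love|Sincerely yours),\s*$")


def correct_types(text, mentions):
    """Relabel ORGANIZATION mentions that directly follow a letter closing."""
    corrected = []
    for start, end, entity_type in mentions:
        preceding = text[:start]  # context before the mention
        if entity_type == "ORGANIZATION" and CLOSINGS.search(preceding):
            entity_type = "PERSON"
        corrected.append((start, end, entity_type))
    return corrected


letter = "See you soon.\nLove,\nJoe"
start = letter.index("Joe")
print(correct_types(letter, [(start, start + 3, "ORGANIZATION")]))
```

This kind of rule depends on the text surrounding the mention, which is why it belongs in a pre-redaction processor rather than a reject file.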

Note

Filters (reject files) vs. Pre-Redaction Custom Processors

The reject files for regexes and gazetteers simply filter out a list of words or a pattern-matched set of words the user does not want to extract as entities. These reject functions operate without considering the context in which these words appear. By contrast, custom processors at the preRedaction phase have access to the entire context in which an extracted entity appears, and thus can implement smarter rules.

Implementing the Custom Processor

You can implement the CustomProcessor and Annotator interfaces in Java in your own JAR and register them via the extractor’s setCustomProcessors. Your custom processor is the factory of the annotator implementation and thus should be familiar with the requirements of your annotators, and provide them with the correct parameters for the language and the phase requests. The Annotator is the interface to the ADM (i.e., annotated text) and based on the custom processor it manipulates the ADM and outputs it to the next phase.

Walk-Through Example of preRedaction Phase Custom Processors

A custom preRedaction annotator receives entity mentions from all extraction processors, after the reject processors run and before the redactor and coref processors run. It can reject (remove) entity mentions, modify entity types, or adjust entity mention offsets. These modifications affect the input of the next processors in the pipeline. For example, coref would not consider chaining together PERSON and ORGANIZATION mentions into the same entity, so a mention whose entity type was changed from ORGANIZATION to PERSON by a custom processor would only be chained to other PERSON entities. After the redactor phase, the rest of the pipeline runs as usual.

The steps to create a custom processor:

Copy the custom processor Java code into the $ROSAPI_HOME/launcher/bundles directory.

Edit the $ROSAPI_HOME/launcher/config/rosapi/rex-factory-config.yaml file:

Add the custom processors to the customProcessors section:

#Custom processors to add to annotators. 
customProcessors:
    - personContextAnnotator
    - boundaryAdjustAnnotator
    - metadataAnnotator

#Register a custom processor class. 
customProcessorClasses:
    - sample.SampleCustomProcessor

Run Analytics Server.

Entity linking

Entity linking provides a mechanism for disambiguating the identity of similarly named entities mentioned in a document. For example, “Rebecca Cole” is the second African-American woman to become a doctor in the United States and also the name of an Australian professional basketball player. Linking helps establish the identity of the entity by disambiguating common names and matching a variety of names, such as nicknames and formal titles, with an entity ID.

Linker processor

To link entities to a knowledge base, Entity Extractor uses a statistical disambiguation model trained on a knowledge base. The linker processor is delivered with a model based on a default Wikidata knowledge base. If the entity exists in Wikidata, then Entity Extractor returns the Wikidata QID, such as Q1 for the Universe, in the entityId field. Once enabled, the linker can also return DBpedia types and PermIDs for linked entities, as described in the sections on those features.

If the linker is disabled (the default), a random string is returned as the entityId. The string starts with a "T" (temporary id) followed by a random number, which is unique per document.
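Downstream code often needs to tell linked entities from unlinked ones. The Python sketch below classifies an entityId by its shape, based on the description above: a QID is "Q" followed by digits, and a temporary ID is "T" followed by a number. The function name is hypothetical, and IDs from custom knowledge bases may use other formats, which this sketch reports as "other".

```python
import re


def classify_entity_id(entity_id):
    """Classify an entityId as a Wikidata QID, a temporary ID, or other."""
    if re.fullmatch(r"Q\d+", entity_id):
        return "wikidata"
    if re.fullmatch(r"T\d+", entity_id):
        return "temporary"
    return "other"


for entity_id in ("Q1", "T42", "custom-kb-17"):
    print(entity_id, classify_entity_id(entity_id))
```

Because temporary IDs are unique only within a document, they should not be used to compare entities across documents.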

In addition to the default Wikidata knowledge base, you can train a disambiguation model for a custom knowledge base using the Model Training Suite. The custom knowledge base model can replace or run in parallel with the default knowledge base.

Linker Processor Files The linker processor is packaged as part of the standard Entity Extractor distribution. The linker files are in the subdirectory data/flinx.

By default, the linker processor both extracts and links entity candidates. These functions are separate from the default Entity Extractor entity extraction performed by the statistical, pattern-matching, and exact-matching processors.

You can choose to link the candidates from the statistical, pattern-matching, and exact-matching processors instead of using the linker processor to extract candidates. Set the parameter linkMentionMode to entities to use the other processors, not the linker processor. By default, linkMentionMode is set to text, in which case the linker processor extracts the candidate entities from the text.

Important

If you use the linker processor to extract entities, the entities from the linker processor may differ from those returned by the statistical, pattern-matching, and exact-matching processors. The redactor will resolve any overlapping or conflicting entity results.

Table 27. Linker Configuration Parameters

Parameter	Description	Default
`kbs`	Custom list of Knowledge Bases for the linker, in order of priority	`null`
`linkEntities`	The option to link mentions to knowledge base entities with disambiguation model. Enabling this option also enables `calculateConfidence`.	`false`
`calculateConfidence`	If true, entity confidence values are calculated. Can be overridden by specifying `calculateConfidence` in the API call.	`false`
`useDefaultConfidence`	The option to assign default confidence value 1.0 to non-statistical entities instead of null.	`false`
`linkingConfidenceThreshold`	The confidence value threshold below which linking results by the kbLinker processor are ignored.	`-1.0`
`linkMentionMode`	If set to `entities`, the linker processor uses the statistical, pattern-matching, and exact-matching processors. When set to `text`, the linker extracts its own candidates.	`text`

Entity linking is enabled by setting the linkEntities value to true in the rex-factory-config.yaml file or by adding {"options": {"linkEntities": true}} to an API call.

By default, Entity Extractor is configured so that the linker finds candidates in the text before attempting to link them with knowledge base entries. To change this behavior and use the mentions already extracted by the statistical, pattern-matching, and exact-matching processors, set linkMentionMode to entities in rex-factory-config.yaml. You can also pass the linkMentionMode option in the API call: {"options": {"linkEntities": true, "linkMentionMode": "entities"}}. In both cases, entity linking must be enabled.

Selecting a knowledge base for linking

By default, all knowledge bases under the data/flinx/data/kb directory inside the Entity Extractor installation will automatically be used for linking. Any custom knowledge bases placed in this directory will be loaded each time Entity Extractor launches.

To control which knowledge bases are loaded, enable dynamic loading by setting the kbs parameter in the rex-factory-config.yaml file. The parameter takes a list of paths to knowledge bases.

kbs:
    - /customKBs/kb1
    - /customKBs/kb2
    - /rosette/server/roots/rex/7.44.1.c62.2/data/flinx/data/kb/basis

The list is in priority order: the match from the knowledge base that appears highest on the list is returned.

Important

Setting the list of knowledge bases completely overwrites the list of knowledge bases the linker uses. If you want the default Wikidata knowledge base to be included, it must be on the list of knowledge bases.
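
The priority order and the overwrite behavior described above can be pictured with a small Python sketch. It is only a model of the selection rule, not the linker's implementation. The function name, the /default/kb path, and the match values are hypothetical.

```python
def first_match(kbs, matches_by_kb):
    """Return (kb, match) from the highest-priority KB that has a match.

    kbs is the configured list, highest priority first. Knowledge bases
    that are not on the list are never consulted.
    """
    for kb in kbs:
        match = matches_by_kb.get(kb)
        if match is not None:
            return kb, match
    return None


# Hypothetical data: kb1 has no match, kb2 and an unlisted KB do.
kbs = ["/customKBs/kb1", "/customKBs/kb2"]
matches = {"/customKBs/kb2": "custom-42", "/default/kb": "default-7"}

print(first_match(kbs, matches))
```

Because /default/kb is not on the list in this example, its match is ignored, which mirrors the note above: a default knowledge base is used only if it is listed explicitly.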

DBpedia types for linked entities

The linker processor can associate entities with types drawn from the DBpedia ontology, which provides over 700 types at up to seven levels of granularity.

By default, providing DBpedia types is turned off. To turn it on, add {"options": {"includeDBpediaTypes": true}} to your API call.
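
The levels of granularity can be read directly from a type path. The following Python sketch assumes the slash-separated form used in the Entity types and DBpedia types table in this document and expands one path into its successive levels. The function name is hypothetical, and the exact form in which types appear in a response may differ.

```python
def type_levels(dbpedia_type):
    """Expand a slash-separated type path into its levels, broadest first."""
    parts = dbpedia_type.split("/")
    return ["/".join(parts[:i]) for i in range(1, len(parts) + 1)]


for level in type_levels("Agent/Person/Artist/Actor"):
    print(level)
```

This is useful when you want to group linked entities at a coarser level, for example treating every type under Agent/Person/Artist as one category.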

PermIDs

The linker processor can return the Refinitiv PermID for a subset of entities which are identified with a QID. By default, linking to PermIDs is turned off.

To return the PermID, add {"options": {"includePermID": true}} to your call. Entity linking must also be enabled.
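
Since PermIDs depend on entity linking, a client can check the two options together before making a call. The following Python sketch shows one way to do that. The function name is hypothetical, and the check is a client-side convenience assumed for this example, not documented server behavior.

```python
def check_perm_id_options(options):
    """Raise if includePermID is requested without linkEntities."""
    if options.get("includePermID") and not options.get("linkEntities"):
        raise ValueError("includePermID requires linkEntities to be true")
    return options


options = check_perm_id_options({"linkEntities": True, "includePermID": True})
print(options)
```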

ISO 639-3 language codes

Entity Extractor uses ISO 639-3 codes to specify the language of the input text.

Language	Code
Arabic	`ara`
Chinese, Script-insensitive	`zho`
Chinese, Simplified	`zhs`
Chinese, Traditional	`zht`
Dutch	`nld`
English	`eng`
English Uppercase	`uen`
French	`fra`
German	`deu`
Hebrew	`heb`
Hungarian	`hun`
Indonesian	`ind`
Italian	`ita`
Japanese	`jpn`
Korean	`kor`
Malay, Standard	`zsm`
Persian (Iranian Persian, Afghan Persian)	`fas`
Portuguese	`por`
Pashto	`pus`
Russian	`rus`
Spanish	`spa`
Swedish	`swe`
Tagalog	`tgl`
Urdu	`urd`
Vietnamese	`vie`
Generic, Cross-Language	`xxx`
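
When language codes come from user input or another system, it can help to validate them against the table above before calling the service. The following Python sketch copies the table into a dictionary and normalizes a code. The names SUPPORTED_LANGUAGES and require_supported are hypothetical.

```python
# Codes and names copied from the language table above.
SUPPORTED_LANGUAGES = {
    "ara": "Arabic",
    "zho": "Chinese, Script-insensitive",
    "zhs": "Chinese, Simplified",
    "zht": "Chinese, Traditional",
    "nld": "Dutch",
    "eng": "English",
    "uen": "English Uppercase",
    "fra": "French",
    "deu": "German",
    "heb": "Hebrew",
    "hun": "Hungarian",
    "ind": "Indonesian",
    "ita": "Italian",
    "jpn": "Japanese",
    "kor": "Korean",
    "zsm": "Malay, Standard",
    "fas": "Persian (Iranian Persian, Afghan Persian)",
    "por": "Portuguese",
    "pus": "Pashto",
    "rus": "Russian",
    "spa": "Spanish",
    "swe": "Swedish",
    "tgl": "Tagalog",
    "urd": "Urdu",
    "vie": "Vietnamese",
    "xxx": "Generic, Cross-Language",
}


def require_supported(code):
    """Return the normalized code, or raise if it is not in the table."""
    normalized = code.strip().lower()
    if normalized not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language code: {code!r}")
    return normalized


print(require_supported(" ENG "), SUPPORTED_LANGUAGES["eng"])
```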

Tcl regex format

The Pattern Matching Processor uses the Tcl regular expression engine to identify named entities in input text. To see the named entity types that the Pattern Matching Processor with the shipped regexes file returns, see Language Support of Named Entities. For background information about adding your own regexes, see Accept regex.

For information on Tcl syntax, see the Tcl re_syntax manual page.

Tcl license

This software is copyrighted by the Regents of the University of California, Sun Microsystems, Inc., Scriptics Corporation, ActiveState Corporation and other parties. The following terms apply to all files associated with the software unless explicitly disclaimed in individual files.

The authors hereby grant permission to use, copy, modify, distribute, and license this software and its documentation for any purpose, provided that existing copyright notices are retained in all copies and that this notice is included verbatim in any distributions. No written agreement, license, or royalty fee is required for any of the authorized uses. Modifications to this software may be copyrighted by their authors and need not follow the licensing terms described here, provided that the new terms are clearly indicated on the first page of each file where they apply.

IN NO EVENT SHALL THE AUTHORS OR DISTRIBUTORS BE LIABLE TO ANY PARTY FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OF THIS SOFTWARE, ITS DOCUMENTATION, OR ANY DERIVATIVES THEREOF, EVEN IF THE AUTHORS HAVE BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

THE AUTHORS AND DISTRIBUTORS SPECIFICALLY DISCLAIM ANY WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND NONINFRINGEMENT. THIS SOFTWARE IS PROVIDED ON AN "AS IS" BASIS, AND THE AUTHORS AND DISTRIBUTORS HAVE NO OBLIGATION TO PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.

GOVERNMENT USE: If you are acquiring this software on behalf of the U.S. government, the Government shall have only "Restricted Rights" in the software and related documentation as defined in the Federal Acquisition Regulations (FARs) in Clause 52.227.19 (c) (2). If you are acquiring the software on behalf of the Department of Defense, the software shall be classified as "Commercial Computer Software" and the Government shall have only "Restricted Rights" as defined in Clause 252.227-7013 (c) (1) of DFARs. Notwithstanding the foregoing, the authors grant the U.S. Government and others acting in its behalf permission to use and distribute the software in accordance with the terms specified in this license.

Entity types and DBpedia types

Entity Types and DBpedia Types

Entity type	DBpedia type
ACTIVITY	Activity
ACTIVITY	Activity/Game
ACTIVITY	Activity/Sales
ACTIVITY	Activity/Sport
ACTIVITY	Activity/Sport/Athletics
ACTIVITY	Activity/Sport/TeamSport
ANATOMY	AnatomicalStructure
ANATOMY	AnatomicalStructure/Artery
ANATOMY	AnatomicalStructure/BloodVessel
ANATOMY	AnatomicalStructure/Bone
ANATOMY	AnatomicalStructure/Brain
ANATOMY	AnatomicalStructure/Embryology
ANATOMY	AnatomicalStructure/Ligament
ANATOMY	AnatomicalStructure/Lymph
ANATOMY	AnatomicalStructure/Muscle
ANATOMY	AnatomicalStructure/Nerve
ANATOMY	AnatomicalStructure/Vein
DISEASE	Disease
EVENT	Event
EVENT	Event/Competition
EVENT	Event/Competition/Contest
EVENT	Event/LifeCycleEvent
EVENT	Event/LifeCycleEvent/PersonalEvent
EVENT	Event/NaturalEvent
EVENT	Event/NaturalEvent/Earthquake
EVENT	Event/NaturalEvent/SolarEclipse
EVENT	Event/NaturalEvent/StormSurge
EVENT	Event/PenaltyShootOut
EVENT	Event/SocietalEvent
EVENT	Event/SocietalEvent/AcademicConference
EVENT	Event/SocietalEvent/Attack
EVENT	Event/SocietalEvent/Convention
EVENT	Event/SocietalEvent/Election
EVENT	Event/SocietalEvent/FilmFestival
EVENT	Event/SocietalEvent/HistoricalEvent
EVENT	Event/SocietalEvent/Meeting
EVENT	Event/SocietalEvent/MilitaryConflict
EVENT	Event/SocietalEvent/MusicFestival
EVENT	Event/SocietalEvent/Rebellion
EVENT	Event/SocietalEvent/SpaceMission
EVENT	Event/SocietalEvent/SportsEvent
EVENT	Event/SocietalEvent/SportsEvent/CyclingCompetition
EVENT	Event/SocietalEvent/SportsEvent/FootballMatch
EVENT	Event/SocietalEvent/SportsEvent/GrandPrix
EVENT	Event/SocietalEvent/SportsEvent/InternationalFootballLeagueEvent
EVENT	Event/SocietalEvent/SportsEvent/MixedMartialArtsEvent
EVENT	Event/SocietalEvent/SportsEvent/NationalFootballLeagueEvent
EVENT	Event/SocietalEvent/SportsEvent/Olympics
EVENT	Event/SocietalEvent/SportsEvent/Olympics/OlympicEvent
EVENT	Event/SocietalEvent/SportsEvent/Race
EVENT	Event/SocietalEvent/SportsEvent/Race/CyclingRace
EVENT	Event/SocietalEvent/SportsEvent/Race/HorseRace
EVENT	Event/SocietalEvent/SportsEvent/Race/MotorRace
EVENT	Event/SocietalEvent/SportsEvent/Tournament
EVENT	Event/SocietalEvent/SportsEvent/Tournament/GolfTournament
EVENT	Event/SocietalEvent/SportsEvent/Tournament/SoccerTournament
EVENT	Event/SocietalEvent/SportsEvent/Tournament/TennisTournament
EVENT	Event/SocietalEvent/SportsEvent/Tournament/WomensTennisAssociationTournament
EVENT	Event/SocietalEvent/SportsEvent/WrestlingEvent
EVENT	Holiday
EVENT	SportsSeason
EVENT	SportsSeason/MotorsportSeason
EVENT	SportsSeason/SportsTeamSeason
EVENT	SportsSeason/SportsTeamSeason/BaseballSeason
EVENT	SportsSeason/SportsTeamSeason/FootballLeagueSeason
EVENT	SportsSeason/SportsTeamSeason/FootballLeagueSeason/NationalFootballLeagueSeason
EVENT	SportsSeason/SportsTeamSeason/NCAATeamSeason
EVENT	SportsSeason/SportsTeamSeason/SoccerClubSeason
EVENT	SportsSeason/SportsTeamSeason/SoccerLeagueSeason
EVENT	Statistic
EVENT	TimePeriod
EVENT	TimePeriod/CareerStation
EVENT	TimePeriod/CareerStation/MilitaryService
EVENT	TimePeriod/GeologicalPeriod
EVENT	TimePeriod/HistoricalPeriod
EVENT	TimePeriod/PeriodOfArtisticStyle
EVENT	TimePeriod/PrehistoricalPeriod
EVENT	TimePeriod/ProtohistoricalPeriod
EVENT	TimePeriod/Reign
EVENT	TimePeriod/Tenure
EVENT	TimePeriod/Year
EVENT	TimePeriod/YearInSpaceflight
EVENT	UnitOfWork/Case
EVENT	UnitOfWork/Case/LegalCase
EVENT	UnitOfWork/Case/LegalCase/SupremeCourtOfTheUnitedStatesCase
EVENT	UnitOfWork/Project
EVENT	UnitOfWork/Project/ResearchProject
EVENT	E4_Period
FOOD	Food
FOOD	Food/Beverage
FOOD	Food/Beverage/Beer
FOOD	Food/Beverage/Vodka
FOOD	Food/Beverage/Wine
FOOD	Food/Beverage/Wine/ControlledDesignationOfOriginWine
FOOD	Food/Cheese
IDENTIFIER	Identifier
IDENTIFIER	Identifier/TopLevelDomain
LANGUAGE	Language
LANGUAGE	Language/ProgrammingLanguage
LOCATION	ElectricalSubstation
LOCATION	Place
LOCATION	Place/ArchitecturalStructure
LOCATION	Place/ArchitecturalStructure/AmusementParkAttraction
LOCATION	Place/ArchitecturalStructure/AmusementParkAttraction/RollerCoaster
LOCATION	Place/ArchitecturalStructure/AmusementParkAttraction/WaterRide
LOCATION	Place/ArchitecturalStructure/Arena
LOCATION	Place/ArchitecturalStructure/Building
LOCATION	Place/ArchitecturalStructure/Building/Casino
LOCATION	Place/ArchitecturalStructure/Building/Castle
LOCATION	Place/ArchitecturalStructure/Building/Factory
LOCATION	Place/ArchitecturalStructure/Building/HistoricBuilding
LOCATION	Place/ArchitecturalStructure/Building/Hospital
LOCATION	Place/ArchitecturalStructure/Building/Hotel
LOCATION	Place/ArchitecturalStructure/Building/Museum
LOCATION	Place/ArchitecturalStructure/Building/Prison
LOCATION	Place/ArchitecturalStructure/Building/ReligiousBuilding
LOCATION	Place/ArchitecturalStructure/Building/ReligiousBuilding/Church
LOCATION	Place/ArchitecturalStructure/Building/ReligiousBuilding/Monastery
LOCATION	Place/ArchitecturalStructure/Building/ReligiousBuilding/Mosque
LOCATION	Place/ArchitecturalStructure/Building/ReligiousBuilding/Shrine
LOCATION	Place/ArchitecturalStructure/Building/ReligiousBuilding/Synagogue
LOCATION	Place/ArchitecturalStructure/Building/ReligiousBuilding/Temple
LOCATION	Place/ArchitecturalStructure/Building/Restaurant
LOCATION	Place/ArchitecturalStructure/Building/ShoppingMall
LOCATION	Place/ArchitecturalStructure/Building/Skyscraper
LOCATION	Place/ArchitecturalStructure/Building/Venue
LOCATION	Place/ArchitecturalStructure/Building/Venue/Cinema
LOCATION	Place/ArchitecturalStructure/Building/Venue/Stadium
LOCATION	Place/ArchitecturalStructure/Building/Venue/Theatre
LOCATION	Place/ArchitecturalStructure/Infrastructure
LOCATION	Place/ArchitecturalStructure/Infrastructure/Airport
LOCATION	Place/ArchitecturalStructure/Infrastructure/Dam
LOCATION	Place/ArchitecturalStructure/Infrastructure/Dike
LOCATION	Place/ArchitecturalStructure/Infrastructure/LaunchPad
LOCATION	Place/ArchitecturalStructure/Infrastructure/Lock
LOCATION	Place/ArchitecturalStructure/Infrastructure/Port
LOCATION	Place/ArchitecturalStructure/Infrastructure/PowerStation
LOCATION	Place/ArchitecturalStructure/Infrastructure/PowerStation/NuclearPowerStation
LOCATION	Place/ArchitecturalStructure/Infrastructure/RestArea
LOCATION	Place/ArchitecturalStructure/Infrastructure/RouteOfTransportation
LOCATION	Place/ArchitecturalStructure/Infrastructure/RouteOfTransportation/Bridge
LOCATION	Place/ArchitecturalStructure/Infrastructure/RouteOfTransportation/RailwayLine
LOCATION	Place/ArchitecturalStructure/Infrastructure/RouteOfTransportation/RailwayTunnel
LOCATION	Place/ArchitecturalStructure/Infrastructure/RouteOfTransportation/Road
LOCATION	Place/ArchitecturalStructure/Infrastructure/RouteOfTransportation/RoadJunction
LOCATION	Place/ArchitecturalStructure/Infrastructure/RouteOfTransportation/RoadTunnel
LOCATION	Place/ArchitecturalStructure/Infrastructure/RouteOfTransportation/WaterwayTunnel
LOCATION	Place/ArchitecturalStructure/Infrastructure/Station
LOCATION	Place/ArchitecturalStructure/Infrastructure/Station/MetroStation
LOCATION	Place/ArchitecturalStructure/Infrastructure/Station/RailwayStation
LOCATION	Place/ArchitecturalStructure/Infrastructure/Station/RouteStop
LOCATION	Place/ArchitecturalStructure/Infrastructure/Station/TramStation
LOCATION	Place/ArchitecturalStructure/MilitaryStructure
LOCATION	Place/ArchitecturalStructure/MilitaryStructure/Fort
LOCATION	Place/ArchitecturalStructure/Mill
LOCATION	Place/ArchitecturalStructure/Mill/Treadmill
LOCATION	Place/ArchitecturalStructure/Mill/Watermill
LOCATION	Place/ArchitecturalStructure/Mill/WindMotor
LOCATION	Place/ArchitecturalStructure/Mill/Windmill
LOCATION	Place/ArchitecturalStructure/Monument
LOCATION	Place/ArchitecturalStructure/Monument/GraveMonument
LOCATION	Place/ArchitecturalStructure/Monument/Memorial
LOCATION	Place/ArchitecturalStructure/Pyramid
LOCATION	Place/ArchitecturalStructure/SportFacility
LOCATION	Place/ArchitecturalStructure/SportFacility/CricketGround
LOCATION	Place/ArchitecturalStructure/SportFacility/GolfCourse
LOCATION	Place/ArchitecturalStructure/SportFacility/RaceTrack
LOCATION	Place/ArchitecturalStructure/SportFacility/RaceTrack/Racecourse
LOCATION	Place/ArchitecturalStructure/SportFacility/SkiArea
LOCATION	Place/ArchitecturalStructure/SportFacility/SkiArea/SkiResort
LOCATION	Place/ArchitecturalStructure/Square
LOCATION	Place/ArchitecturalStructure/Tower
LOCATION	Place/ArchitecturalStructure/Tower/Lighthouse
LOCATION	Place/ArchitecturalStructure/Tower/WaterTower
LOCATION	Place/ArchitecturalStructure/Tunnel
LOCATION	Place/ArchitecturalStructure/Zoo
LOCATION	Place/CelestialBody
LOCATION	Place/CelestialBody/Asteroid
LOCATION	Place/CelestialBody/Constellation
LOCATION	Place/CelestialBody/Galaxy
LOCATION	Place/CelestialBody/Planet
LOCATION	Place/CelestialBody/Satellite
LOCATION	Place/CelestialBody/Satellite/ArtificialSatellite
LOCATION	Place/CelestialBody/Star
LOCATION	Place/CelestialBody/Star/BrownDwarf
LOCATION	Place/CelestialBody/Swarm
LOCATION	Place/CelestialBody/Swarm/Globularswarm
LOCATION	Place/CelestialBody/Swarm/Openswarm
LOCATION	Place/Cemetery
LOCATION	Place/ConcentrationCamp
LOCATION	Place/CountrySeat
LOCATION	Place/Garden
LOCATION	Place/HistoricPlace
LOCATION	Place/Mine
LOCATION	Place/Mine/CoalPit
LOCATION	Place/NaturalPlace
LOCATION	Place/NaturalPlace/Archipelago
LOCATION	Place/NaturalPlace/Beach
LOCATION	Place/NaturalPlace/BodyOfWater
LOCATION	Place/NaturalPlace/BodyOfWater/Bay
LOCATION	Place/NaturalPlace/BodyOfWater/Lake
LOCATION	Place/NaturalPlace/BodyOfWater/Ocean
LOCATION	Place/NaturalPlace/BodyOfWater/Sea
LOCATION	Place/NaturalPlace/BodyOfWater/Stream
LOCATION	Place/NaturalPlace/BodyOfWater/Stream/Canal
LOCATION	Place/NaturalPlace/BodyOfWater/Stream/River
LOCATION	Place/NaturalPlace/Cape
LOCATION	Place/NaturalPlace/Cave
LOCATION	Place/NaturalPlace/Crater
LOCATION	Place/NaturalPlace/Crater/LunarCrater
LOCATION	Place/NaturalPlace/Desert
LOCATION	Place/NaturalPlace/Forest
LOCATION	Place/NaturalPlace/Glacier
LOCATION	Place/NaturalPlace/HotSpring
LOCATION	Place/NaturalPlace/Mountain
LOCATION	Place/NaturalPlace/MountainPass
LOCATION	Place/NaturalPlace/MountainRange
LOCATION	Place/NaturalPlace/Valley
LOCATION	Place/NaturalPlace/Volcano
LOCATION	Place/Park
LOCATION	Place/PopulatedPlace
LOCATION	Place/PopulatedPlace/Agglomeration
LOCATION	Place/PopulatedPlace/Community
LOCATION	Place/PopulatedPlace/Continent
LOCATION	Place/PopulatedPlace/Country
LOCATION	Place/PopulatedPlace/Country/HistoricalCountry
LOCATION	Place/PopulatedPlace/GatedCommunity
LOCATION	Place/PopulatedPlace/Intercommunality
LOCATION	Place/PopulatedPlace/Island
LOCATION	Place/PopulatedPlace/Island/Atoll
LOCATION	Place/PopulatedPlace/Locality
LOCATION	Place/PopulatedPlace/Region
LOCATION	Place/PopulatedPlace/Region/AdministrativeRegion
LOCATION	Place/PopulatedPlace/Region/AdministrativeRegion/ClericalAdministrativeRegion
LOCATION	Place/PopulatedPlace/Region/AdministrativeRegion/ClericalAdministrativeRegion/Deanery
LOCATION	Place/PopulatedPlace/Region/AdministrativeRegion/ClericalAdministrativeRegion/Diocese
LOCATION	Place/PopulatedPlace/Region/AdministrativeRegion/ClericalAdministrativeRegion/Parish
LOCATION	Place/PopulatedPlace/Region/AdministrativeRegion/GovernmentalAdministrativeRegion
LOCATION	Place/PopulatedPlace/Region/AdministrativeRegion/GovernmentalAdministrativeRegion/Arrondissement
LOCATION	Place/PopulatedPlace/Region/AdministrativeRegion/GovernmentalAdministrativeRegion/Canton
LOCATION	Place/PopulatedPlace/Region/AdministrativeRegion/GovernmentalAdministrativeRegion/Department
LOCATION	Place/PopulatedPlace/Region/AdministrativeRegion/GovernmentalAdministrativeRegion/Department/OverseasDepartment
LOCATION	Place/PopulatedPlace/Region/AdministrativeRegion/GovernmentalAdministrativeRegion/District
LOCATION	Place/PopulatedPlace/Region/AdministrativeRegion/GovernmentalAdministrativeRegion/District/HistoricalDistrict
LOCATION	Place/PopulatedPlace/Region/AdministrativeRegion/GovernmentalAdministrativeRegion/DistrictWaterBoard
LOCATION	Place/PopulatedPlace/Region/AdministrativeRegion/GovernmentalAdministrativeRegion/MicroRegion
LOCATION	Place/PopulatedPlace/Region/AdministrativeRegion/GovernmentalAdministrativeRegion/Municipality
LOCATION	Place/PopulatedPlace/Region/AdministrativeRegion/GovernmentalAdministrativeRegion/Municipality/FormerMunicipality
LOCATION	Place/PopulatedPlace/Region/AdministrativeRegion/GovernmentalAdministrativeRegion/Prefecture
LOCATION	Place/PopulatedPlace/Region/AdministrativeRegion/GovernmentalAdministrativeRegion/Province
LOCATION	Place/PopulatedPlace/Region/AdministrativeRegion/GovernmentalAdministrativeRegion/Province/HistoricalProvince
LOCATION	Place/PopulatedPlace/Region/AdministrativeRegion/GovernmentalAdministrativeRegion/Regency
LOCATION	Place/PopulatedPlace/Region/AdministrativeRegion/GovernmentalAdministrativeRegion/SubMunicipality
LOCATION	Place/PopulatedPlace/Region/AdministrativeRegion/HistoricalAreaOfAuthority
LOCATION	Place/PopulatedPlace/Region/HistoricalRegion
LOCATION	Place/PopulatedPlace/Region/NaturalRegion
LOCATION	Place/PopulatedPlace/Settlement
LOCATION	Place/PopulatedPlace/Settlement/City
LOCATION	Place/PopulatedPlace/Settlement/City/Capital
LOCATION	Place/PopulatedPlace/Settlement/City/CapitalOfRegion
LOCATION	Place/PopulatedPlace/Settlement/CityDistrict
LOCATION	Place/PopulatedPlace/Settlement/HistoricalSettlement
LOCATION	Place/PopulatedPlace/Settlement/Town
LOCATION	Place/PopulatedPlace/Settlement/Village
LOCATION	Place/PopulatedPlace/State
LOCATION	Place/PopulatedPlace/Street
LOCATION	Place/PopulatedPlace/Territory
LOCATION	Place/PopulatedPlace/Territory/OldTerritory
LOCATION	Place/ProtectedArea
LOCATION	Place/SiteOfSpecialScientificInterest
LOCATION	Place/WineRegion
LOCATION	Place/WorldHeritageSite
MEASURE	Altitude
MEASURE	Area
MEASURE	Depth
MEASURE	GrossDomesticProduct
MEASURE	GrossDomesticProductPerCapita
MEASURE	Population
MEASURE	SportCompetitionResult
MEASURE	SportCompetitionResult/OlympicResult
MISC	Thing
MISC	Agent
MISC	Agent/Employer
MISC	Agent/Organisation/TermOfOffice
MISC	Award
MISC	Award/Decoration
MISC	Award/NobelPrize
MISC	Blazon
MISC	ChartsPlacements
MISC	Colour
MISC	Demographics
MISC	ElectionDiagram
MISC	EthnicGroup
MISC	Flag
MISC	GeneLocation
MISC	GeneLocation/HumanGeneLocation
MISC	GeneLocation/MouseGeneLocation
MISC	List
MISC	List/TrackList
MISC	Media
MISC	MedicalSpecialty
MISC	Medicine
MISC	Name
MISC	Name/GivenName
MISC	Name/Surname
MISC	PublicService
MISC	Relationship
MISC	SportCompetitionResult/SnookerWorldRanking
MISC	TopicalConcept
MISC	TopicalConcept/AcademicSubject
MISC	TopicalConcept/CardinalDirection
MISC	TopicalConcept/Fashion
MISC	TopicalConcept/Genre
MISC	TopicalConcept/Genre/ArtisticGenre
MISC	TopicalConcept/Genre/LiteraryGenre
MISC	TopicalConcept/Genre/MovieGenre
MISC	TopicalConcept/Genre/MusicGenre
MISC	TopicalConcept/Ideology
MISC	TopicalConcept/MathematicalConcept
MISC	TopicalConcept/PhilosophicalConcept
MISC	TopicalConcept/PoliticalConcept
MISC	TopicalConcept/ScientificConcept
MISC	TopicalConcept/Standard
MISC	TopicalConcept/SystemOfLaw
MISC	TopicalConcept/Tax
MISC	TopicalConcept/Taxon
MISC	TopicalConcept/TheologicalConcept
MISC	TopicalConcept/TheologicalConcept/ChristianDoctrine
MISC	TopicalConcept/Type
MISC	TopicalConcept/Type/DocumentType
MISC	TopicalConcept/Type/GovernmentType
MISC	UnitOfWork
MISC	Unknown
MISC	Unknown/WikimediaTemplate
MISC	Work
MISC	Work/Document
MISC	Work/Document/File
MISC	Work/Document/Image
MISC	Work/Document/Image/MovingImage
MISC	Work/Document/Image/StillImage
MISC	Work/Document/Sound
MISC	Work/MusicalWork/NationalAnthem
MISC	Work/MusicalWork/Song/EurovisionSongContestEntry
MISC	Work/WrittenWork
MISC	Work/WrittenWork/Annotation
MISC	Work/WrittenWork/Annotation/Reference
MISC	Work/WrittenWork/Law
MISC	Work/WrittenWork/Letter
MISC	Work/WrittenWork/Quote
MISC	Work/WrittenWork/Resume
MISC	Work/WrittenWork/StatedResolution
MISC	Work/WrittenWork/Treaty
MISC	Image
MISC	SpatialThing
MISC	_Feature
MISC	Property
MISC	Concept
MISC	OrderedCollection
MONEY	Currency
ORGANIZATION	Agent/Family
ORGANIZATION	Agent/Family/NobleFamily
ORGANIZATION	Agent/Organisation
ORGANIZATION	Agent/Organisation/Broadcaster
ORGANIZATION	Agent/Organisation/Broadcaster/BroadcastNetwork
ORGANIZATION	Agent/Organisation/Broadcaster/RadioStation
ORGANIZATION	Agent/Organisation/Broadcaster/TelevisionStation
ORGANIZATION	Agent/Organisation/Company
ORGANIZATION	Agent/Organisation/Company/Bank
ORGANIZATION	Agent/Organisation/Company/Brewery
ORGANIZATION	Agent/Organisation/Company/Caterer
ORGANIZATION	Agent/Organisation/Company/LawFirm
ORGANIZATION	Agent/Organisation/Company/PublicTransitSystem
ORGANIZATION	Agent/Organisation/Company/PublicTransitSystem/Airline
ORGANIZATION	Agent/Organisation/Company/PublicTransitSystem/BusCompany
ORGANIZATION	Agent/Organisation/Company/Publisher
ORGANIZATION	Agent/Organisation/Company/RecordLabel
ORGANIZATION	Agent/Organisation/Company/Winery
ORGANIZATION	Agent/Organisation/EducationalInstitution
ORGANIZATION	Agent/Organisation/EducationalInstitution/College
ORGANIZATION	Agent/Organisation/EducationalInstitution/Library
ORGANIZATION	Agent/Organisation/EducationalInstitution/School
ORGANIZATION	Agent/Organisation/EducationalInstitution/University
ORGANIZATION	Agent/Organisation/EmployersOrganisation
ORGANIZATION	Agent/Organisation/GeopoliticalOrganisation
ORGANIZATION	Agent/Organisation/GovernmentAgency
ORGANIZATION	Agent/Organisation/GovernmentAgency/GovernmentCabinet
ORGANIZATION	Agent/Organisation/Group
ORGANIZATION	Agent/Organisation/Group/Band
ORGANIZATION	Agent/Organisation/Group/ComedyGroup
ORGANIZATION	Agent/Organisation/InternationalOrganisation
ORGANIZATION	Agent/Organisation/Legislature
ORGANIZATION	Agent/Organisation/MilitaryUnit
ORGANIZATION	Agent/Organisation/Non-ProfitOrganisation
ORGANIZATION	Agent/Organisation/Non-ProfitOrganisation/RecordOffice
ORGANIZATION	Agent/Organisation/Parliament
ORGANIZATION	Agent/Organisation/PoliticalParty
ORGANIZATION	Agent/Organisation/ReligiousOrganisation
ORGANIZATION	Agent/Organisation/ReligiousOrganisation/ClericalOrder
ORGANIZATION	Agent/Organisation/SambaSchool
ORGANIZATION	Agent/Organisation/SportsClub
ORGANIZATION	Agent/Organisation/SportsClub/HockeyClub
ORGANIZATION	Agent/Organisation/SportsClub/RugbyClub
ORGANIZATION	Agent/Organisation/SportsClub/SoccerClub
ORGANIZATION	Agent/Organisation/SportsClub/SoccerClub/NationalSoccerClub
ORGANIZATION	Agent/Organisation/SportsLeague
ORGANIZATION	Agent/Organisation/SportsLeague/AmericanFootballLeague
ORGANIZATION	Agent/Organisation/SportsLeague/AustralianFootballLeague
ORGANIZATION	Agent/Organisation/SportsLeague/AutoRacingLeague
ORGANIZATION	Agent/Organisation/SportsLeague/BaseballLeague
ORGANIZATION	Agent/Organisation/SportsLeague/BasketballLeague
ORGANIZATION	Agent/Organisation/SportsLeague/BowlingLeague
ORGANIZATION	Agent/Organisation/SportsLeague/BoxingLeague
ORGANIZATION	Agent/Organisation/SportsLeague/CanadianFootballLeague
ORGANIZATION	Agent/Organisation/SportsLeague/CricketLeague
ORGANIZATION	Agent/Organisation/SportsLeague/CurlingLeague
ORGANIZATION	Agent/Organisation/SportsLeague/CyclingLeague
ORGANIZATION	Agent/Organisation/SportsLeague/FieldHockeyLeague
ORGANIZATION	Agent/Organisation/SportsLeague/FormulaOneRacing
ORGANIZATION	Agent/Organisation/SportsLeague/GolfLeague
ORGANIZATION	Agent/Organisation/SportsLeague/HandballLeague
ORGANIZATION	Agent/Organisation/SportsLeague/IceHockeyLeague
ORGANIZATION	Agent/Organisation/SportsLeague/InlineHockeyLeague
ORGANIZATION	Agent/Organisation/SportsLeague/LacrosseLeague
ORGANIZATION	Agent/Organisation/SportsLeague/MixedMartialArtsLeague
ORGANIZATION	Agent/Organisation/SportsLeague/MotorcycleRacingLeague
ORGANIZATION	Agent/Organisation/SportsLeague/PaintballLeague
ORGANIZATION	Agent/Organisation/SportsLeague/PoloLeague
ORGANIZATION	Agent/Organisation/SportsLeague/RadioControlledRacingLeague
ORGANIZATION	Agent/Organisation/SportsLeague/RugbyLeague
ORGANIZATION	Agent/Organisation/SportsLeague/SoccerLeague
ORGANIZATION	Agent/Organisation/SportsLeague/SoftballLeague
ORGANIZATION	Agent/Organisation/SportsLeague/SpeedwayLeague
ORGANIZATION	Agent/Organisation/SportsLeague/TennisLeague
ORGANIZATION	Agent/Organisation/SportsLeague/VideogamesLeague
ORGANIZATION	Agent/Organisation/SportsLeague/VolleyballLeague
ORGANIZATION	Agent/Organisation/SportsTeam
ORGANIZATION	Agent/Organisation/SportsTeam/AmericanFootballTeam
ORGANIZATION	Agent/Organisation/SportsTeam/AustralianFootballTeam
ORGANIZATION	Agent/Organisation/SportsTeam/BaseballTeam
ORGANIZATION	Agent/Organisation/SportsTeam/BasketballTeam
ORGANIZATION	Agent/Organisation/SportsTeam/CanadianFootballTeam
ORGANIZATION	Agent/Organisation/SportsTeam/CricketTeam
ORGANIZATION	Agent/Organisation/SportsTeam/CyclingTeam
ORGANIZATION	Agent/Organisation/SportsTeam/FormulaOneTeam
ORGANIZATION	Agent/Organisation/SportsTeam/HandballTeam
ORGANIZATION	Agent/Organisation/SportsTeam/HockeyTeam
ORGANIZATION	Agent/Organisation/SportsTeam/SpeedwayTeam
ORGANIZATION	Agent/Organisation/TradeUnion
PERSON	Agent/Deity
PERSON	Agent/FictionalCharacter
PERSON	Agent/FictionalCharacter/ComicsCharacter
PERSON	Agent/FictionalCharacter/ComicsCharacter/AnimangaCharacter
PERSON	Agent/FictionalCharacter/DisneyCharacter
PERSON	Agent/FictionalCharacter/MythologicalFigure
PERSON	Agent/FictionalCharacter/NarutoCharacter
PERSON	Agent/FictionalCharacter/SoapCharacter
PERSON	Agent/Person
PERSON	Agent/Person/Archeologist
PERSON	Agent/Person/Architect
PERSON	Agent/Person/Aristocrat
PERSON	Agent/Person/Artist
PERSON	Agent/Person/Artist/Actor
PERSON	Agent/Person/Artist/Actor/AdultActor
PERSON	Agent/Person/Artist/Actor/VoiceActor
PERSON	Agent/Person/Artist/Comedian
PERSON	Agent/Person/Artist/ComicsCreator
PERSON	Agent/Person/Artist/Dancer
PERSON	Agent/Person/Artist/FashionDesigner
PERSON	Agent/Person/Artist/Humorist
PERSON	Agent/Person/Artist/MusicalArtist
PERSON	Agent/Person/Artist/MusicalArtist/BackScene
PERSON	Agent/Person/Artist/MusicalArtist/ClassicalMusicArtist
PERSON	Agent/Person/Artist/MusicalArtist/Instrumentalist
PERSON	Agent/Person/Artist/MusicalArtist/Instrumentalist/Guitarist
PERSON	Agent/Person/Artist/MusicalArtist/MusicDirector
PERSON	Agent/Person/Artist/MusicalArtist/Singer
PERSON	Agent/Person/Artist/Painter
PERSON	Agent/Person/Artist/Photographer
PERSON	Agent/Person/Artist/Sculptor
PERSON	Agent/Person/Astronaut
PERSON	Agent/Person/Athlete
PERSON	Agent/Person/Athlete/ArcherPlayer
PERSON	Agent/Person/Athlete/AthleticsPlayer
PERSON	Agent/Person/Athlete/AustralianRulesFootballPlayer
PERSON	Agent/Person/Athlete/BadmintonPlayer
PERSON	Agent/Person/Athlete/BaseballPlayer
PERSON	Agent/Person/Athlete/BasketballPlayer
PERSON	Agent/Person/Athlete/Bodybuilder
PERSON	Agent/Person/Athlete/Boxer
PERSON	Agent/Person/Athlete/Boxer/AmateurBoxer
PERSON	Agent/Person/Athlete/BullFighter
PERSON	Agent/Person/Athlete/Canoeist
PERSON	Agent/Person/Athlete/ChessPlayer
PERSON	Agent/Person/Athlete/Cricketer
PERSON	Agent/Person/Athlete/Cyclist
PERSON	Agent/Person/Athlete/DartsPlayer
PERSON	Agent/Person/Athlete/Fencer
PERSON	Agent/Person/Athlete/GaelicGamesPlayer
PERSON	Agent/Person/Athlete/GolfPlayer
PERSON	Agent/Person/Athlete/GridironFootballPlayer
PERSON	Agent/Person/Athlete/GridironFootballPlayer/AmericanFootballPlayer
PERSON	Agent/Person/Athlete/GridironFootballPlayer/CanadianFootballPlayer
PERSON	Agent/Person/Athlete/Gymnast
PERSON	Agent/Person/Athlete/HandballPlayer
PERSON	Agent/Person/Athlete/HighDiver
PERSON	Agent/Person/Athlete/HorseRider
PERSON	Agent/Person/Athlete/Jockey
PERSON	Agent/Person/Athlete/LacrossePlayer
PERSON	Agent/Person/Athlete/MartialArtist
PERSON	Agent/Person/Athlete/MotorsportRacer
PERSON	Agent/Person/Athlete/MotorsportRacer/MotorcycleRider
PERSON	Agent/Person/Athlete/MotorsportRacer/MotorcycleRider/MotocycleRacer
PERSON	Agent/Person/Athlete/MotorsportRacer/MotorcycleRider/SpeedwayRider
PERSON	Agent/Person/Athlete/MotorsportRacer/RacingDriver
PERSON	Agent/Person/Athlete/MotorsportRacer/RacingDriver/DTMRacer
PERSON	Agent/Person/Athlete/MotorsportRacer/RacingDriver/FormulaOneRacer
PERSON	Agent/Person/Athlete/MotorsportRacer/RacingDriver/NascarDriver
PERSON	Agent/Person/Athlete/MotorsportRacer/RacingDriver/RallyDriver
PERSON	Agent/Person/Athlete/NationalCollegiateAthleticAssociationAthlete
PERSON	Agent/Person/Athlete/NetballPlayer
PERSON	Agent/Person/Athlete/PokerPlayer
PERSON	Agent/Person/Athlete/Rower
PERSON	Agent/Person/Athlete/RugbyPlayer
PERSON	Agent/Person/Athlete/SnookerPlayer
PERSON	Agent/Person/Athlete/SnookerPlayer/SnookerChamp
PERSON	Agent/Person/Athlete/SoccerPlayer
PERSON	Agent/Person/Athlete/SquashPlayer
PERSON	Agent/Person/Athlete/Surfer
PERSON	Agent/Person/Athlete/Swimmer
PERSON	Agent/Person/Athlete/TableTennisPlayer
PERSON	Agent/Person/Athlete/TeamMember
PERSON	Agent/Person/Athlete/TennisPlayer
PERSON	Agent/Person/Athlete/VolleyballPlayer
PERSON	Agent/Person/Athlete/VolleyballPlayer/BeachVolleyballPlayer
PERSON	Agent/Person/Athlete/WaterPoloPlayer
PERSON	Agent/Person/Athlete/WinterSportPlayer
PERSON	Agent/Person/Athlete/WinterSportPlayer/Biathlete
PERSON	Agent/Person/Athlete/WinterSportPlayer/BobsleighAthlete
PERSON	Agent/Person/Athlete/WinterSportPlayer/CrossCountrySkier
PERSON	Agent/Person/Athlete/WinterSportPlayer/Curler
PERSON	Agent/Person/Athlete/WinterSportPlayer/FigureSkater
PERSON	Agent/Person/Athlete/WinterSportPlayer/IceHockeyPlayer
PERSON	Agent/Person/Athlete/WinterSportPlayer/NordicCombined
PERSON	Agent/Person/Athlete/WinterSportPlayer/Skater
PERSON	Agent/Person/Athlete/WinterSportPlayer/Ski_jumper
PERSON	Agent/Person/Athlete/WinterSportPlayer/Skier
PERSON	Agent/Person/Athlete/WinterSportPlayer/SpeedSkater
PERSON	Agent/Person/Athlete/Wrestler
PERSON	Agent/Person/Athlete/Wrestler/SumoWrestler
PERSON	Agent/Person/BeautyQueen
PERSON	Agent/Person/BusinessPerson
PERSON	Agent/Person/Chef
PERSON	Agent/Person/Cleric
PERSON	Agent/Person/Cleric/Cardinal
PERSON	Agent/Person/Cleric/ChristianBishop
PERSON	Agent/Person/Cleric/ChristianBishop/Archbishop
PERSON	Agent/Person/Cleric/ChristianPatriarch
PERSON	Agent/Person/Cleric/Pope
PERSON	Agent/Person/Cleric/Priest
PERSON	Agent/Person/Cleric/Saint
PERSON	Agent/Person/Cleric/Vicar
PERSON	Agent/Person/Coach
PERSON	Agent/Person/Coach/AmericanFootballCoach
PERSON	Agent/Person/Coach/CollegeCoach
PERSON	Agent/Person/Coach/VolleyballCoach
PERSON	Agent/Person/Criminal
PERSON	Agent/Person/Criminal/Murderer
PERSON	Agent/Person/Criminal/Murderer/SerialKiller
PERSON	Agent/Person/Economist
PERSON	Agent/Person/Egyptologist
PERSON	Agent/Person/Engineer
PERSON	Agent/Person/Farmer
PERSON	Agent/Person/HorseTrainer
PERSON	Agent/Person/Journalist
PERSON	Agent/Person/Judge
PERSON	Agent/Person/Lawyer
PERSON	Agent/Person/Linguist
PERSON	Agent/Person/MemberResistanceMovement
PERSON	Agent/Person/MilitaryPerson
PERSON	Agent/Person/Model
PERSON	Agent/Person/Monarch
PERSON	Agent/Person/MovieDirector
PERSON	Agent/Person/Noble
PERSON	Agent/Person/OfficeHolder
PERSON	Agent/Person/OrganisationMember
PERSON	Agent/Person/OrganisationMember/SportsTeamMember
PERSON	Agent/Person/Philosopher
PERSON	Agent/Person/PlayboyPlaymate
PERSON	Agent/Person/Politician
PERSON	Agent/Person/Politician/Ambassador
PERSON	Agent/Person/Politician/Chancellor
PERSON	Agent/Person/Politician/Congressman
PERSON	Agent/Person/Politician/Deputy
PERSON	Agent/Person/Politician/Governor
PERSON	Agent/Person/Politician/Lieutenant
PERSON	Agent/Person/Politician/Mayor
PERSON	Agent/Person/Politician/MemberOfParliament
PERSON	Agent/Person/Politician/Minister
PERSON	Agent/Person/Politician/President
PERSON	Agent/Person/Politician/PrimeMinister
PERSON	Agent/Person/Politician/Senator
PERSON	Agent/Person/Politician/VicePresident
PERSON	Agent/Person/Politician/VicePrimeMinister
PERSON	Agent/Person/PoliticianSpouse
PERSON	Agent/Person/Presenter
PERSON	Agent/Person/Presenter/RadioHost
PERSON	Agent/Person/Presenter/TelevisionHost
PERSON	Agent/Person/Producer
PERSON	Agent/Person/Psychologist
PERSON	Agent/Person/Referee
PERSON	Agent/Person/Religious
PERSON	Agent/Person/RomanEmperor
PERSON	Agent/Person/Royalty
PERSON	Agent/Person/Royalty/BritishRoyalty
PERSON	Agent/Person/Royalty/BritishRoyalty/Baronet
PERSON	Agent/Person/Scientist
PERSON	Agent/Person/Scientist/Biologist
PERSON	Agent/Person/Scientist/Entomologist
PERSON	Agent/Person/Scientist/Medician
PERSON	Agent/Person/Scientist/Professor
PERSON	Agent/Person/SportsManager
PERSON	Agent/Person/SportsManager/SoccerManager
PERSON	Agent/Person/TelevisionDirector
PERSON	Agent/Person/TheatreDirector
PERSON	Agent/Person/Writer
PERSON	Agent/Person/Writer/Historian
PERSON	Agent/Person/Writer/MusicComposer
PERSON	Agent/Person/Writer/PlayWright
PERSON	Agent/Person/Writer/Poet
PERSON	Agent/Person/Writer/ScreenWriter
PERSON	Agent/Person/Writer/SongWriter
PERSON	PersonFunction
PERSON	PersonFunction/PoliticalFunction
PERSON	PersonFunction/Profession
PERSON	Person
PRODUCT	Activity/Game/BoardGame
PRODUCT	Activity/Game/CardGame
PRODUCT	Device
PRODUCT	Device/Battery
PRODUCT	Device/Camera
PRODUCT	Device/Camera/DigitalCamera
PRODUCT	Device/Engine
PRODUCT	Device/Engine/AutomobileEngine
PRODUCT	Device/Engine/RocketEngine
PRODUCT	Device/InformationAppliance
PRODUCT	Device/Instrument
PRODUCT	Device/Instrument/Guitar
PRODUCT	Device/Instrument/Organ
PRODUCT	Device/MobilePhone
PRODUCT	Device/Weapon
PRODUCT	Work/Artwork
PRODUCT	Work/Artwork/Painting
PRODUCT	Work/Artwork/Sculpture
PRODUCT	Work/Cartoon
PRODUCT	Work/Cartoon/Anime
PRODUCT	Work/Cartoon/HollywoodCartoon
PRODUCT	Work/CollectionOfValuables
PRODUCT	Work/CollectionOfValuables/Archive
PRODUCT	Work/Database
PRODUCT	Work/Database/BiologicalDatabase
PRODUCT	Work/Film
PRODUCT	Work/LineOfFashion
PRODUCT	Work/MusicalWork
PRODUCT	Work/MusicalWork/Album
PRODUCT	Work/MusicalWork/ArtistDiscography
PRODUCT	Work/MusicalWork/ClassicalMusicComposition
PRODUCT	Work/MusicalWork/Musical
PRODUCT	Work/MusicalWork/Opera
PRODUCT	Work/MusicalWork/Single
PRODUCT	Work/MusicalWork/Song
PRODUCT	Work/RadioProgram
PRODUCT	Work/Software
PRODUCT	Work/Software/VideoGame
PRODUCT	Work/TelevisionEpisode
PRODUCT	Work/TelevisionSeason
PRODUCT	Work/TelevisionShow
PRODUCT	Work/Website
PRODUCT	Work/WrittenWork/Article
PRODUCT	Work/WrittenWork/Book
PRODUCT	Work/WrittenWork/Book/Novel
PRODUCT	Work/WrittenWork/Book/Novel/LightNovel
PRODUCT	Work/WrittenWork/Comic
PRODUCT	Work/WrittenWork/Comic/ComicStrip
PRODUCT	Work/WrittenWork/Comic/Manga
PRODUCT	Work/WrittenWork/Comic/Manhua
PRODUCT	Work/WrittenWork/Comic/Manhwa
PRODUCT	Work/WrittenWork/Drama
PRODUCT	Work/WrittenWork/MultiVolumePublication
PRODUCT	Work/WrittenWork/PeriodicalLiterature
PRODUCT	Work/WrittenWork/PeriodicalLiterature/AcademicJournal
PRODUCT	Work/WrittenWork/PeriodicalLiterature/Magazine
PRODUCT	Work/WrittenWork/PeriodicalLiterature/Newspaper
PRODUCT	Work/WrittenWork/PeriodicalLiterature/UndergroundJournal
PRODUCT	Work/WrittenWork/Play
PRODUCT	Work/WrittenWork/Poem
SPECIES	Species
SPECIES	Species/Archaea
SPECIES	Species/Bacteria
SPECIES	Species/Eukaryote
SPECIES	Species/Eukaryote/Animal
SPECIES	Species/Eukaryote/Animal/Amphibian
SPECIES	Species/Eukaryote/Animal/Arachnid
SPECIES	Species/Eukaryote/Animal/Bird
SPECIES	Species/Eukaryote/Animal/Crustacean
SPECIES	Species/Eukaryote/Animal/Fish
SPECIES	Species/Eukaryote/Animal/Insect
SPECIES	Species/Eukaryote/Animal/Mammal
SPECIES	Species/Eukaryote/Animal/Mammal/Cat
SPECIES	Species/Eukaryote/Animal/Mammal/Dog
SPECIES	Species/Eukaryote/Animal/Mammal/Horse
SPECIES	Species/Eukaryote/Animal/Mollusca
SPECIES	Species/Eukaryote/Animal/Reptile
SPECIES	Species/Eukaryote/Fungus
SPECIES	Species/Eukaryote/Plant
SPECIES	Species/Eukaryote/Plant/ClubMoss
SPECIES	Species/Eukaryote/Plant/Conifer
SPECIES	Species/Eukaryote/Plant/CultivatedVariety
SPECIES	Species/Eukaryote/Plant/Cycad
SPECIES	Species/Eukaryote/Plant/Fern
SPECIES	Species/Eukaryote/Plant/FloweringPlant
SPECIES	Species/Eukaryote/Plant/FloweringPlant/Grape
SPECIES	Species/Eukaryote/Plant/Ginkgo
SPECIES	Species/Eukaryote/Plant/Gnetophytes
SPECIES	Species/Eukaryote/Plant/GreenAlga
SPECIES	Species/Eukaryote/Plant/Moss
SUBSTANCE	Biomolecule
SUBSTANCE	Biomolecule/Enzyme
SUBSTANCE	Biomolecule/Gene
SUBSTANCE	Biomolecule/Gene/HumanGene
SUBSTANCE	Biomolecule/Gene/MouseGene
SUBSTANCE	Biomolecule/Hormone
SUBSTANCE	Biomolecule/Lipid
SUBSTANCE	Biomolecule/Polysaccharide
SUBSTANCE	Biomolecule/Protein
SUBSTANCE	ChemicalSubstance
SUBSTANCE	ChemicalSubstance/ChemicalCompound
SUBSTANCE	ChemicalSubstance/ChemicalElement
SUBSTANCE	ChemicalSubstance/Drug
SUBSTANCE	ChemicalSubstance/Drug/CombinationDrug
SUBSTANCE	ChemicalSubstance/Drug/MonoclonalAntibody
SUBSTANCE	ChemicalSubstance/Drug/Vaccine
SUBSTANCE	ChemicalSubstance/Mineral
TITLE	Diploma
TRANSPORT	MeanOfTransportation
TRANSPORT	MeanOfTransportation/Aircraft
TRANSPORT	MeanOfTransportation/Aircraft/MilitaryAircraft
TRANSPORT	MeanOfTransportation/Automobile
TRANSPORT	MeanOfTransportation/Locomotive
TRANSPORT	MeanOfTransportation/MilitaryVehicle
TRANSPORT	MeanOfTransportation/Motorcycle
TRANSPORT	MeanOfTransportation/On-SiteTransportation
TRANSPORT	MeanOfTransportation/On-SiteTransportation/ConveyorSystem
TRANSPORT	MeanOfTransportation/On-SiteTransportation/Escalator
TRANSPORT	MeanOfTransportation/On-SiteTransportation/MovingWalkway
TRANSPORT	MeanOfTransportation/Rocket
TRANSPORT	MeanOfTransportation/Ship
TRANSPORT	MeanOfTransportation/SpaceShuttle
TRANSPORT	MeanOfTransportation/SpaceStation
TRANSPORT	MeanOfTransportation/Spacecraft
TRANSPORT	MeanOfTransportation/Train
TRANSPORT	MeanOfTransportation/TrainCarriage
TRANSPORT	MeanOfTransportation/Tram
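Each row in the table above pairs an entity type (first column) with a slash-separated type path (second column), separated by a tab. The following is a minimal sketch of how such rows could be loaded and queried. It assumes each type path appears only once; the function names (`parse_mapping`, `ancestors`, `group_by_entity_type`) and the `SAMPLE_ROWS` constant are hypothetical helpers for illustration and are not part of Entity Extractor.

```python
from collections import defaultdict

# A few rows copied from the table above: entity type, a tab, then the type path.
SAMPLE_ROWS = """\
PERSON\tAgent/Person/Politician/Senator
PERSON\tAgent/Person/Writer/Poet
PRODUCT\tWork/WrittenWork/Book/Novel
SPECIES\tSpecies/Eukaryote/Animal/Mammal/Dog
TRANSPORT\tMeanOfTransportation/Aircraft/MilitaryAircraft
"""


def parse_mapping(text):
    """Return {type_path: entity_type} for each tab-separated row.

    Assumption: a type path is listed under exactly one entity type.
    """
    mapping = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        entity_type, type_path = line.split("\t", 1)
        mapping[type_path.strip()] = entity_type.strip()
    return mapping


def ancestors(type_path):
    """List the path and each of its parent paths, most specific first."""
    parts = type_path.split("/")
    return ["/".join(parts[:i]) for i in range(len(parts), 0, -1)]


def group_by_entity_type(mapping):
    """Return {entity_type: [type_path, ...]} preserving row order."""
    grouped = defaultdict(list)
    for type_path, entity_type in mapping.items():
        grouped[entity_type].append(type_path)
    return dict(grouped)
```

For example, `ancestors("Work/WrittenWork/Book/Novel")` walks up the hierarchy from `Work/WrittenWork/Book/Novel` to `Work`, which can be useful when a fine-grained type needs to be rolled up to a broader one.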

Extract and Link Information