October 26, 2006

Definitions of terms used in Information Extraction

Attribute
a property of an entity such as its name, alias, descriptor, or type

Annotation
markup of a text span in a specific format that indicates one or more features of the text within the span
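
As an illustration of the Annotation entry, the sketch below stores each annotation as a character span plus a feature and renders it as inline tags. The `Annotation` class, the `render` helper, and the sample sentence are hypothetical; the `ENAMEX`/`TYPE` tag shape is assumed here in the style of MUC named-entity markup and is not specified by this glossary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    start: int  # offset of the first character in the span
    end: int    # offset one past the last character in the span
    tag: str    # markup element name (assumed: ENAMEX)
    kind: str   # feature of the span, e.g. PERSON


def render(text, annotations):
    """Return text with each annotated span wrapped in inline tags.

    Assumes the spans do not overlap.
    """
    out = []
    pos = 0
    for a in sorted(annotations, key=lambda a: a.start):
        out.append(text[pos:a.start])
        out.append(f'<{a.tag} TYPE="{a.kind}">{text[a.start:a.end]}</{a.tag}>')
        pos = a.end
    out.append(text[pos:])
    return "".join(out)


text = "John Smith joined SAIC."
annotations = [
    Annotation(0, 10, "ENAMEX", "PERSON"),
    Annotation(18, 22, "ENAMEX", "ORGANIZATION"),
]
marked = render(text, annotations)
print(marked)
```

Keeping offsets separate from the text (stand-off style) lets the same span data be rendered in whatever format a task definition requires.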

Benchmark
assessment of performance according to standard measures

Data
textual input for an information extraction system

Dataset
a set of newswire texts chosen according to pre-specified conditions and meant to represent a rich text stream

Database
data in tabular format stored with the assistance of a relational database management system

Developer
a researcher who implements a system

Dry Run
an end-to-end practice run of an evaluation

Entity
an object of interest such as a person or organization

Evaluation
assessment of performance according to agreed-upon measures

Event
an activity or occurrence of interest such as a terrorist act or an airline crash

Fact
a relationship that holds between two or more entities

Formal Test Material
a blind dataset, task definitions, test procedure, answer keys, and scoring software

Formal Run
the "official" evaluation

Information Extraction
the extraction of pertinent information from large volumes of text

Information Extraction Systems
automated systems that extract pertinent information from large volumes of text

Information Extraction Technologies
techniques used to automatically extract specified information from text

Metrics
pre-defined measures of performance calculable by comparison of system output with human-generated answer keys
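
The Metrics entry does not name specific measures. As a minimal sketch, assuming precision, recall, and F-measure as the measures, the comparison of system output with an answer key could look like the following. The `score` function and the sample items are hypothetical and exist only for illustration.

```python
def score(system, key):
    """Compare system output with an answer key.

    Both arguments are collections of (text, type) items.
    Returns (precision, recall, f1).
    """
    system, key = set(system), set(key)
    correct = len(system & key)  # items the system got exactly right
    precision = correct / len(system) if system else 0.0
    recall = correct / len(key) if key else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical human-generated answer key and system output.
key = {
    ("John Smith", "PERSON"),
    ("SAIC", "ORGANIZATION"),
    ("New York", "LOCATION"),
}
system = {
    ("John Smith", "PERSON"),
    ("SAIC", "LOCATION"),  # right text, wrong type: counted as an error
}

precision, recall, f1 = score(system, key)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Exact matching is the simplest possible criterion; a real evaluation might also give partial credit, which this sketch does not attempt.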

MUC
Message Understanding Conference, held at the end of the evaluation and attended only by participants and invited potential customers

Named Entity
a named object of interest such as a person, organization, or location

SAIC
Science Applications International Corporation

Scoring Software
fully automated software for the comparison of system performance against answer keys that tallies and reports metrics and error types for developers and evaluators
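
To make "tallies and reports metrics and error types" concrete, here is a rough sketch of the tallying step. The error categories (`correct`, `incorrect`, `missing`, `spurious`) and the `tally` function are assumptions chosen for illustration; the glossary does not say which error types the actual scoring software reports.

```python
from collections import Counter


def tally(system, key):
    """Count outcomes when comparing system output with an answer key.

    Both arguments map a text span (start, end) to an assigned type.
    """
    counts = Counter(correct=0, incorrect=0, missing=0, spurious=0)
    for span, key_type in key.items():
        if span not in system:
            counts["missing"] += 1    # in the key, absent from the output
        elif system[span] == key_type:
            counts["correct"] += 1    # same span, same type
        else:
            counts["incorrect"] += 1  # same span, different type
    for span in system:
        if span not in key:
            counts["spurious"] += 1   # in the output, absent from the key
    return counts


# Hypothetical answer key and system output.
key = {(0, 10): "PERSON", (18, 22): "ORGANIZATION", (30, 38): "LOCATION"}
system = {(0, 10): "PERSON", (18, 22): "LOCATION", (40, 45): "PERSON"}

report = tally(system, key)
for category, count in report.items():
    print(f"{category}: {count}")
```

A per-category breakdown like this is what lets developers see whether a system is mostly missing items or mostly mislabeling them, which a single summary number hides.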

Search Engine
software that ranks the documents in a collection by relevance to a user query

Sources of News
edited electronic feeds from established news organizations such as the Wall Street Journal and the New York Times News Service

Statistical Algorithm
algorithm to determine the statistical significance of evaluation results

Systems Integration
building a system from off-the-shelf components to accomplish a job previously not automated

Systems Integrator
builder of a system from off-the-shelf components

Task Definition
a document that defines the format and criteria for annotating or extracting text and placing it into a database or template. For example, task definitions give general guidelines and examples for the extraction of named entities, attributes, facts, and events from texts.

Text
electronically encoded alphabetic material from some human language

Training
process by which a system learns about a dataset



Source: http://www-nlpir.nist.gov/related_projects/muc/info/definitions.html
