forked from dbpedia-spotlight/dbpedia-spotlight
-
Notifications
You must be signed in to change notification settings - Fork 0
Glossary
sandroacoelho edited this page Jul 26, 2013
·
3 revisions
- Context: the context refers to the "the parts of something written or spoken that immediately precede and follow a word or passage and clarify its meaning." source
- OntologyClass: an ontology class represents a set of resources sharing similar characteristics. Resources can be of several types: Person, Organisation, Location, FloweringPlant, etc. All of these classes are organized in a domain model (i.e. schema, ontology). The "type" or the "ontology class" of a resource comes from this ontology.
Phrase Recognition: See Spotting.
- Resource: a resource is any entity or concept in our target knowledge base (e.g. DBpedia). We take this name from RDF (Resource Description Framework), as a generic name for things, concepts, ideas "that can be identified on the Web, even when they cannot be directly retrieved on the Web."
- Spotting: We call Spotting or Phrase Recognition the task of selecting, from some textual document given as input, phrases that should be annotated by the system. This is closely related to Keyphrase Extraction and Named Entity Recognition, for instance. In Keyphrase Extraction, the system tries to guess the "important" phrases, according to some definition of importance. Meanwhile, in Named Entity Recognition, the system focuses on specific entity types (commonly Person, Location and Organization), and the notion of importance is usually irrelevant. We describe some of these and several other strategies for phrase recognition below.
- SurfaceForm: a surface form is the phrase used to refer to a resource in text. For example: "Barack Obama", "President Obama" and "Obama" are all surface forms for the resource `dbpedia:Obama`.
- Token: each individual element extracted after tokenizing the text more. Tokens are the individual words in the context, or slightly modified versions of these words (e.g. running -> run)
- Topic: a topic is a broad categorization of knowledge into areas of interest. For example, text can belong to Business, Politics, Sports or Arts topics.
Project
- Introduction
- Glossary
- User's manual
- Web application
- Installation
- Internationalization
- Licenses
- Researcher
- How to cite
- Support and Feedback
- Troubleshooting
- Team
- Acknowledgements
Statistical backend
Lucene backend
- Introduction
- Downloads
- Architecture
- Internationalization
- Web service parameters / API
- Splitting occurrences into topics
Developers