IR-based implementation (Lucene)

The original DBpedia Spotlight implementation uses Apache Lucene for disambiguation and LingPipe for spotting. Pre-built indexes and spotter models are available for English.

Downloads

DBpedia Spotlight looks for ~3.5M things of ~320 types in text and tries to disambiguate them to their globally unique identifiers in DBpedia. It uses the entire Wikipedia to learn how to annotate DBpedia resources. The full dataset cannot be distributed alongside the code; it can be downloaded in various sizes from the download page. A tiny dataset is included in the distribution for demonstration purposes only. After you have downloaded the files, modify the configuration in server.properties so that it points to the correct paths for those files. More info here.
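A wrong path in server.properties is an easy mistake to make after moving the downloaded files. The sketch below is one way to check the file before starting the server; it is not part of DBpedia Spotlight. The helper names `read_properties` and `missing_paths` are hypothetical, the parser only understands plain `key = value` lines (no line continuations or escapes), and the property keys you pass in must be taken from your own server.properties, since none are assumed here.

```python
import os


def read_properties(path):
    """Parse plain `key = value` lines from a .properties file into a dict.

    Assumption: no line continuations or escape sequences are used.
    Blank lines and comment lines starting with '#' or '!' are skipped.
    """
    props = {}
    with open(path, encoding="utf-8") as handle:
        for raw in handle:
            line = raw.strip()
            if not line or line.startswith(("#", "!")):
                continue
            key, sep, value = line.partition("=")
            if sep:  # ignore lines that contain no '=' at all
                props[key.strip()] = value.strip()
    return props


def missing_paths(props, keys):
    """Return the keys whose value is absent or is not an existing path."""
    return [key for key in keys if not os.path.exists(props.get(key, ""))]
```

For example, `missing_paths(read_properties("server.properties"), keys)` returns the entries from `keys` whose configured file or directory is not present on disk, where `keys` lists the path-valued property names found in your copy of the configuration. An empty list means every listed path resolves.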

The corpus used to evaluate DBpedia Spotlight in this work is described here.
