IR based implementation (Lucene)
The original DBpedia Spotlight implementation uses Apache Lucene for disambiguation and LingPipe for spotting. Pre-built indexes and spotter models are available for English.
DBpedia Spotlight looks for ~3.5M things of ~320 types in text and tries to disambiguate them to their globally unique identifiers in DBpedia. It uses the whole of Wikipedia to learn how to annotate DBpedia resources. The full dataset cannot be distributed alongside the code, but it can be downloaded in various sizes from the download page. A tiny dataset is included in the distribution for demonstration purposes only. After you have downloaded the files, modify the configuration in server.properties so that it points to the correct paths for the files. More info here.
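Since a wrong path in server.properties is an easy mistake to make after downloading the files, a small check before starting the server may help. The following is a minimal sketch, not part of DBpedia Spotlight: it parses simple `key=value` lines (it does not handle the full Java properties syntax, such as `:` separators or line continuations) and reports which of the keys you name are unset or point to a path that does not exist. The function names and the `example.*` keys are hypothetical; substitute the keys that actually appear in your server.properties.

```python
from pathlib import Path


def parse_properties(text):
    """Parse simple Java-style key=value lines, skipping blanks and comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def missing_paths(props, path_keys):
    """Return the keys from path_keys that are unset or whose path does not exist."""
    missing = []
    for key in path_keys:
        value = props.get(key)
        if not value or not Path(value).exists():
            missing.append(key)
    return missing


if __name__ == "__main__":
    # Hypothetical key names, for illustration only.
    keys_to_check = ["example.index.dir", "example.spotter.file"]
    config = Path("server.properties")
    if config.exists():
        props = parse_properties(config.read_text(encoding="utf-8"))
        for key in missing_paths(props, keys_to_check):
            print(f"Missing or invalid path for: {key}")
    else:
        print("server.properties not found in the current directory")
```

Run it from the directory that contains server.properties; any key it prints needs to be corrected before the server can load the downloaded files.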
The corpus used to evaluate DBpedia Spotlight in this work is described here.