User's manual
DBpedia Spotlight is a tool for annotating mentions of DBpedia concepts in plain text.
We offer three basic functions: Annotate, Disambiguate and Candidates (Best K). They can be accessed from a Scala/Java API, a REST Web Service, and a Web user interface (HTML/Javascript). For the Scala/Java API, a number of configuration parameters can be used to tune the annotation and disambiguation functions. The classes DefaultAnnotator, DefaultDisambiguator and DefaultParagraphDisambiguator offer the configuration that we found to provide the best results. The configuration interface offers ways to control the output quality of the annotation and disambiguation tasks.
The DBpedia Spotlight architecture is composed of the following modules:
- Web application, a demonstration client (HTML/Javascript interface) that allows users to enter/paste text into a Web browser and visualize the resulting annotated text.
- Web Service, a RESTful/SOAP Web API that exposes the functionality of annotating and/or disambiguating entities in text.
- Annotation Java/Scala API, exposing the underlying logic that performs the annotation/disambiguation.
- Indexing Java/Scala API, executing the data processing necessary to enable the annotation/disambiguation algorithms used.
- Evaluation module, where we test disambiguators, log results and use those to train our system to perform better.
External dependencies:
- DBpedia Extraction Framework (only for the index module), extracting the necessary data from the Wikipedia dumps.
- Lucene 2.9.3, providing the low level indexing framework used by DBpedia Spotlight.
- LingPipe 4.0.0, providing the string matching implementation used for the Spotter module.
- Java 1.6+
- Scala 2.9+
- Spotlight JAR
- Spotlight Library JARs
- Lucene disambiguation index
- Spotter dictionary
- Enough RAM to give the Spotter a sufficiently large heap (approx. 8 GB)
- Maven 3 for the automagic installation of dependencies.
If you want to use DBpedia Spotlight in your Java/Scala code, take a look at core/SpotlightFactory to see how you can create your objects, and then look at rest/Candidates.java to see how you can wire them together.
The Web Application is located at http://dbpedia-spotlight.github.io/demo/.
The Web Service is explained in detail on the Web Service page.
You can request different types of output by setting the Accept [request header](http://en.wikipedia.org/wiki/List_of_HTTP_header_fields "request header"). For example, to request JSON output, add Accept:application/json to the request headers.
One example using cURL:
curl "http://spotlight.dbpedia.org/rest/annotate?text=President%20Michelle%20Obama%20called%20Thursday%20on%20Congress%20to%20extend%20a%20tax%20break%20for%20students%20included%20in%20last%20year%27s%20economic%20stimulus%20package,%20arguing%20that%20the%20policy%20provides%20more%20generous%20assistance.&confidence=0.2&support=20" -H "Accept:application/json"
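The same kind of request can be prepared from a script. The following is a minimal Python sketch that only builds the GET request and does not send it. The helper name `build_annotate_request` is hypothetical. The endpoint and the `text`, `confidence` and `support` parameters are taken from the cURL example above. Note that `urlencode` writes spaces as `+` rather than `%20`; both are accepted forms in a query string.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint taken from the cURL example above.
ANNOTATE_URL = "http://spotlight.dbpedia.org/rest/annotate"


def build_annotate_request(text, confidence=0.2, support=20,
                           accept="application/json"):
    """Build (but do not send) a GET request for the annotate endpoint.

    Hypothetical helper: it only assembles the URL and the Accept header.
    """
    query = urlencode({"text": text,
                       "confidence": confidence,
                       "support": support})
    return Request(ANNOTATE_URL + "?" + query, headers={"Accept": accept})


request = build_annotate_request("President Obama called Wednesday on Congress")
print(request.full_url)
print(request.get_header("Accept"))
# To actually send it: urllib.request.urlopen(request).read()
```

Changing the `accept` argument to one of the supported content types selects a different output format.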
The content types we currently support are:
text/html
application/xhtml+xml
text/xml
application/json
The application/xhtml+xml output comes with embedded RDFa that you can give to the RDFa Distiller to get RDF triples in Turtle, RDF+XML, etc. as output.
If your input text is long, you may prefer using POST instead of GET.
curl -i -X POST \
  -H "Accept:application/json" \
  -H "content-type:application/x-www-form-urlencoded" \
  -d "disambiguator=Document&confidence=-1&support=-1&text=President%20Obama%20called%20Wednesday%20on%20Congress%20to%20extend%20a%20tax%20break%20for%20students%20included%20in%20last%20year%27s%20economic%20stimulus%20package" \
  http://spotlight.dbpedia.org/dev/rest/annotate/
Please note that you must use the content type application/x-www-form-urlencoded for POST requests.
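As an illustration, here is a minimal Python sketch that assembles an equivalent POST request without sending it. The helper name `build_annotate_post` is hypothetical. The endpoint, parameter names and default values are copied from the cURL command above.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint taken from the POST example above.
ANNOTATE_URL = "http://spotlight.dbpedia.org/dev/rest/annotate/"


def build_annotate_post(text, confidence=-1, support=-1,
                        disambiguator="Document"):
    """Build (but do not send) a form-encoded POST request.

    Hypothetical helper: the parameters travel in the request body,
    so the length of the text is not limited by the URL.
    """
    body = urlencode({"disambiguator": disambiguator,
                      "confidence": confidence,
                      "support": support,
                      "text": text}).encode("ascii")
    headers = {"Accept": "application/json",
               "Content-Type": "application/x-www-form-urlencoded"}
    return Request(ANNOTATE_URL, data=body, headers=headers, method="POST")


request = build_annotate_post("President Obama")
print(request.get_method())
print(request.data.decode("ascii"))
# To actually send it: urllib.request.urlopen(request).read()
```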
The following four examples each consist of a query URL and its result.
http://spotlight.dbpedia.org/rest/annotate?text=President%20Obama%20called%20Wednesday%20on%20Congress%20to%20extend%20a%20tax%20break%20for%20students%20included%20in%20last%20year%27s%20economic%20stimulus%20package,%20arguing%20that%20the%20policy%20provides%20more%20generous%20assistance.&confidence=0.2&support=20
returns the XML
<Annotation text="President Obama called Wednesday on Congress to extend a tax break
for students included in last year's economic stimulus package, arguing that the policy
provides more generous assistance."
confidence="0.2" support="20">
<Resources>
<Resource URI="http://dbpedia.org/resource/Barack_Obama"
support="5761" types="Person,Politician,President" surfaceForm="President Obama" offset="0"
similarityScore="0.31504717469215393" percentageOfSecondRank="-1.0"/>
<Resource URI="http://dbpedia.org/resource/United_States_Congress"
support="8569" types="Organisation,Legislature" surfaceForm="Congress" offset="36"
similarityScore="0.2348192036151886" percentageOfSecondRank="0.8635579006818564"/>
<Resource URI="http://dbpedia.org/resource/Tax_break"
support="32" types="" surfaceForm="tax break" offset="57"
similarityScore="0.35041093826293945" percentageOfSecondRank="-1.0"/>
<Resource URI="http://dbpedia.org/resource/Student"
support="1701" types="" surfaceForm="students" offset="71"
similarityScore="0.32534149289131165" percentageOfSecondRank="-1.0"/>
<Resource URI="http://dbpedia.org/resource/Policy"
support="557" types="" surfaceForm="policy" offset="148"
similarityScore="0.3228176236152649" percentageOfSecondRank="-1.0"/>
</Resources>
</Annotation>
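A response of this shape can be read with a standard XML parser. The sketch below uses Python's `xml.etree.ElementTree` on an abridged copy of the response above. The function name `parse_annotations` and the dictionary keys are hypothetical. It also assumes that `offset` is a character offset into the `text` attribute, which is consistent with the sample but not stated explicitly here.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the response shown above (two of the five resources).
SAMPLE = """<Annotation text="President Obama called Wednesday on Congress to extend a tax break"
 confidence="0.2" support="20">
<Resources>
<Resource URI="http://dbpedia.org/resource/Barack_Obama"
 support="5761" types="Person,Politician,President" surfaceForm="President Obama" offset="0"
 similarityScore="0.31504717469215393" percentageOfSecondRank="-1.0"/>
<Resource URI="http://dbpedia.org/resource/United_States_Congress"
 support="8569" types="Organisation,Legislature" surfaceForm="Congress" offset="36"
 similarityScore="0.2348192036151886" percentageOfSecondRank="0.8635579006818564"/>
</Resources>
</Annotation>"""


def parse_annotations(xml_text):
    """Return one dict per <Resource> element (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    results = []
    for res in root.iter("Resource"):
        results.append({
            "uri": res.get("URI"),
            "surface_form": res.get("surfaceForm"),
            "offset": int(res.get("offset")),
            "support": int(res.get("support")),
            # An empty types attribute becomes an empty list.
            "types": [t for t in res.get("types", "").split(",") if t],
            "similarity_score": float(res.get("similarityScore")),
        })
    return results


text = ET.fromstring(SAMPLE).get("text")
annotations = parse_annotations(SAMPLE)
for a in annotations:
    end = a["offset"] + len(a["surface_form"])
    # Check that the offset really points at the surface form in the text.
    print(a["surface_form"], "->", a["uri"], text[a["offset"]:end] == a["surface_form"])
```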
http://spotlight.dbpedia.org/rest/annotate?text=President%20Obama%20called%20Wednesday%20on%20Congress%20to%20extend%20a%20tax%20break%20for%20students%20included%20in%20last%20year%27s%20economic%20stimulus%20package,%20arguing%20that%20the%20policy%20provides%20more%20generous%20assistance.&confidence=0.2&support=20&types=Person,Organisation
returns the XML
<Annotation text="President Obama called Wednesday on Congress to extend a tax break
for students included in last year's economic stimulus package, arguing that the policy
provides more generous assistance."
confidence="0.2" support="20" types="Person,Organisation">
<Resources>
<Resource URI="http://dbpedia.org/resource/Barack_Obama"
support="5761" types="Person,Politician,President" surfaceForm="President Obama" offset="0"
similarityScore="0.31504717469215393" percentageOfSecondRank="-1.0"/>
<Resource URI="http://dbpedia.org/resource/United_States_Congress"
support="8569" types="Organisation,Legislature" surfaceForm="Congress" offset="36"
similarityScore="0.2348192036151886" percentageOfSecondRank="0.8635579006818564"/>
</Resources>
</Annotation>
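Comparing this result with the unfiltered one suggests that a resource is kept when at least one of its types appears in the `types` parameter. The Python sketch below reproduces that observed behaviour on the client side. It is an illustration only, not the service's implementation, and the function name `matches_types` is hypothetical.

```python
def matches_types(resource_types, wanted):
    """True if the comma-separated resource_types share an entry with wanted.

    Assumption: "any match" semantics, inferred from the sample output above.
    """
    have = {t for t in resource_types.split(",") if t}
    return bool(have & set(wanted))


# (resource name, types attribute) pairs from the unfiltered response.
resources = [
    ("Barack_Obama", "Person,Politician,President"),
    ("United_States_Congress", "Organisation,Legislature"),
    ("Tax_break", ""),
    ("Student", ""),
    ("Policy", ""),
]

wanted = ["Person", "Organisation"]
types_param = ",".join(wanted)  # value sent as the types query parameter
kept = [name for name, types in resources if matches_types(types, wanted)]

print(types_param)
print(kept)
```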
http://spotlight.dbpedia.org/rest/annotate?text=President%20Obama%20called%20Wednesday%20on%20Congress%20to%20extend%20a%20tax%20break%20for%20students%20included%20in%20last%20year%27s%20economic%20stimulus%20package,%20arguing%20that%20the%20policy%20provides%20more%20generous%20assistance.&confidence=0.2&support=20&sparql=SELECT+DISTINCT+%3Fx%0D%0AWHERE+%7B%0D%0A%3Fx+a+%3Chttp%3A%2F%2Fdbpedia.org%2Fontology%2FOfficeHolder%3E+.%0D%0A%3Fx+%3Frelated+%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2FChicago%3E+.%0D%0A%7D
returns the XML
<Annotation text="President Obama called Wednesday on Congress to extend a tax break
for students included in last year's economic stimulus package, arguing that the policy
provides more generous assistance."
confidence="0.2" support="20"
sparql="SELECT DISTINCT ?x WHERE { ?x a &lt;http://dbpedia.org/ontology/OfficeHolder&gt; .
?x ?related &lt;http://dbpedia.org/resource/Chicago&gt; }"
policy="whitelist">
<Resources>
<Resource URI="http://dbpedia.org/resource/Barack_Obama"
support="5761" types="Person,Politician,President" surfaceForm="President Obama" offset="0"
similarityScore="0.2730408310890198" percentageOfSecondRank="-1.0"/>
</Resources>
</Annotation>
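The `sparql` parameter in the query URL above is the SPARQL query in percent-encoded form. A minimal Python sketch of that encoding step, using the standard `urllib.parse.quote_plus`, follows. The variable names are illustrative. The query text, including its CRLF line breaks, is decoded from the URL above.

```python
from urllib.parse import quote_plus

# The SPARQL query from the example, with the CRLF line breaks that
# appear as %0D%0A in the encoded URL.
query = (
    "SELECT DISTINCT ?x\r\n"
    "WHERE {\r\n"
    "?x a <http://dbpedia.org/ontology/OfficeHolder> .\r\n"
    "?x ?related <http://dbpedia.org/resource/Chicago> .\r\n"
    "}"
)

# quote_plus encodes spaces as '+' and reserved characters as %XX.
encoded = quote_plus(query)
print("sparql=" + encoded)
```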
The Candidates function takes the same parameters as the first example, but the request is sent to http://spotlight.dbpedia.org/rest/candidates
http://spotlight.dbpedia.org/rest/candidates?text=President%20Obama%20called%20Wednesday%20on%20Congress%20to%20extend%20a%20tax%20break%20for%20students%20included%20in%20last%20year%27s%20economic%20stimulus%20package,%20arguing%20that%20the%20policy%20provides%20more%20generous%20assistance.&confidence=0.2&support=20
returns XML of the following form (the sample below was produced for a different input text)
<annotation text="President Obama on Monday will call for a new minimum tax rate for individuals making more than $1 million a year to ensure that they pay at least the same percentage of their earnings as other taxpayers, according to administration officials. ">
<surfaceForm name="individuals" offset="67">
<resource label="Individual" uri="Individual" contextualScore="0.26683980226516724" percentageOfSecondRank="-1.0" support="312" priorScore="0.0" finalScore="0.26683980226516724"/>
<resource label="The Individuals (New Jersey band)" uri="The_Individuals_%28New_Jersey_band%29" contextualScore="0.011762913316488266" percentageOfSecondRank="-1.0" support="17" priorScore="0.0" finalScore="0.011762913316488266"/>
<resource label="The Individuals (Chicago band)" uri="The_Individuals_%28Chicago_band%29" contextualScore="0.0" percentageOfSecondRank="-1.0" support="0" priorScore="0.0" finalScore="0.0"/>
</surfaceForm>
<surfaceForm name="officials" offset="233">
<resource label="Official" uri="Official" contextualScore="0.1324356347322464" percentageOfSecondRank="-1.0" support="196" priorScore="0.0" finalScore="0.1324356347322464"/>
<resource label="Rugby league match officials" uri="Rugby_league_match_officials" contextualScore="0.04376954212784767" percentageOfSecondRank="-1.0" support="9" priorScore="0.0" finalScore="0.04376954212784767"/>
</surfaceForm>
<surfaceForm name="President Obama" offset="0">
<resource label="Presidency of Barack Obama" uri="Presidency_of_Barack_Obama" contextualScore="0.5634340643882751" percentageOfSecondRank="-1.0" support="134" priorScore="0.0" finalScore="0.5634340643882751"/>
</surfaceForm>
<surfaceForm name="1 million" offset="97">
<resource label="Million" uri="Million" contextualScore="0.527919590473175" percentageOfSecondRank="-1.0" support="492" priorScore="0.0" finalScore="0.527919590473175"/>
</surfaceForm>
<surfaceForm name="percentage" offset="156">
<resource label="Percentage" uri="Percentage" contextualScore="0.6362485885620117" percentageOfSecondRank="-1.0" support="165" priorScore="0.0" finalScore="0.6362485885620117"/>
</surfaceForm>
<surfaceForm name="earnings" offset="176">
<resource label="Income" uri="Income" contextualScore="0.5776156187057495" percentageOfSecondRank="-1.0" support="648" priorScore="0.0" finalScore="0.5776156187057495"/>
</surfaceForm>
<surfaceForm name="taxpayers" offset="194">
<resource label="Tax" uri="Tax" contextualScore="0.7484055757522583" percentageOfSecondRank="-1.0" support="1540" priorScore="0.0" finalScore="0.7484055757522583"/>
<resource label="TaxPayers' Alliance" uri="TaxPayers%27_Alliance" contextualScore="0.12765906751155853" percentageOfSecondRank="-1.0" support="15" priorScore="0.0" finalScore="0.12765906751155853"/>
<resource label="The Taxpayer (Luxembourg)" uri="The_Taxpayer_%28Luxembourg%29" contextualScore="0.024930020794272423" percentageOfSecondRank="-1.0" support="3" priorScore="0.0" finalScore="0.024930020794272423"/>
<resource label="The Taxpayers" uri="The_Taxpayers" contextualScore="0.0" percentageOfSecondRank="-1.0" support="0" priorScore="0.0" finalScore="0.0"/>
</surfaceForm>
</annotation>
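Each `surfaceForm` element lists its candidate resources with their scores, so a client can rank them itself. The Python sketch below picks the candidate with the highest `finalScore` for each surface form, using an abridged copy of the response above. The function name `best_candidates` is hypothetical, and choosing by `finalScore` is one possible ranking rule, not necessarily the one used by the service.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the candidates response shown above.
SAMPLE = """<annotation text="President Obama on Monday will call for a new minimum tax rate">
<surfaceForm name="individuals" offset="67">
<resource label="Individual" uri="Individual" support="312" finalScore="0.26683980226516724"/>
<resource label="The Individuals (New Jersey band)" uri="The_Individuals_%28New_Jersey_band%29" support="17" finalScore="0.011762913316488266"/>
</surfaceForm>
<surfaceForm name="taxpayers" offset="194">
<resource label="Tax" uri="Tax" support="1540" finalScore="0.7484055757522583"/>
<resource label="TaxPayers' Alliance" uri="TaxPayers%27_Alliance" support="15" finalScore="0.12765906751155853"/>
</surfaceForm>
</annotation>"""


def best_candidates(xml_text):
    """Map each surface form to its (uri, finalScore) with the top score."""
    root = ET.fromstring(xml_text)
    best = {}
    for surface_form in root.iter("surfaceForm"):
        candidates = surface_form.findall("resource")
        if not candidates:
            continue  # no candidates were returned for this surface form
        top = max(candidates, key=lambda r: float(r.get("finalScore")))
        best[surface_form.get("name")] = (top.get("uri"),
                                          float(top.get("finalScore")))
    return best


best = best_candidates(SAMPLE)
for name, (uri, score) in best.items():
    print(name, "->", uri, score)
```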