sandroacoelho edited this page Jul 26, 2013 · 6 revisions

DBpedia Spotlight is a tool for annotating mentions of DBpedia concepts in plain text.

We offer three basic functions: Annotate, Disambiguate and Candidates (Best K). They can be accessed through a Scala/Java API, a REST Web Service, and a Web user interface (HTML/JavaScript). For the Scala/Java API, a number of configuration parameters control how the annotation and disambiguation functions behave. The classes DefaultAnnotator, DefaultDisambiguator and DefaultParagraphDisambiguator offer the configuration that we found to provide the best results. The configuration interface offers ways to control the quality of the output of the annotation and disambiguation tasks.

Architecture

The DBpedia Spotlight architecture is composed of the following modules:

  • Web application, a demonstration client (HTML/Javascript interface) that allows users to enter/paste text into a Web browser and visualize the resulting annotated text.
  • Web Service, a RESTful Web API that exposes the functionality of annotating and/or disambiguating entities in text.
  • Annotation Java/Scala API, exposing the underlying logic that performs the annotation/disambiguation.
  • Indexing Java/Scala API, executing the data processing necessary to enable the annotation/disambiguation algorithms used.
  • Evaluation module, where we test disambiguators, log results and use those to train our system to perform better.

External dependencies:

  • DBpedia Extraction Framework, (only for the index module) extracting the necessary data from the Wikipedia dumps.
  • Lucene 2.9.3, providing the low level indexing framework used by DBpedia Spotlight.
  • LingPipe 4.0.0, providing the string matching implementation used for the Spotter module.

System Requirements

  • Java 1.6+
  • Scala 2.9+
  • Spotlight JAR
  • Spotlight Library JARs
  • Lucene disambiguation index
  • Spotter dictionary
  • Enough RAM to give the JVM a heap large enough for the Spotter (approx. 8 GB)
  • Maven 3 for the automagic installation of dependencies.

Programmatic usage

If you want to use DBpedia Spotlight in your Java/Scala code, take a look at core/SpotlightFactory to see how you can create your objects, and then look at rest/Candidates.java to see how you can wire them together.

Online Usage

Web Application

The Web Application is located at http://dbpedia-spotlight.github.io/demo/.

Web Service

The Web Service is explained in detail on the Web Service page.

Content Negotiation

You can request different types of output by setting the Accept [request header](http://en.wikipedia.org/wiki/List_of_HTTP_header_fields "request header"). For example, to request JSON output, add Accept:application/json to the request headers.

One example using cURL:

curl "http://spotlight.dbpedia.org/rest/annotate?text=President%20Michelle%20Obama%20called%20Thursday%20on%20Congress%20to%20extend%20a%20tax%20break%20for%20students%20included%20in%20last%20year%27s%20economic%20stimulus%20package,%20arguing%20that%20the%20policy%20provides%20more%20generous%20assistance.&confidence=0.2&support=20"\
 -H "Accept:application/json"
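The same request can be prepared from a script. The following is a minimal Python sketch, not part of DBpedia Spotlight itself: the helper name build_annotate_request is hypothetical, the endpoint is copied from the cURL example above, and the request is only built, not sent. Note that urlencode writes spaces as + rather than %20; both forms are equivalent in a query string.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint copied from the cURL example; its availability is not guaranteed.
ANNOTATE_URL = "http://spotlight.dbpedia.org/rest/annotate"


def build_annotate_request(text, confidence=0.2, support=20,
                           accept="application/json"):
    """Build (but do not send) a GET request for the annotate endpoint.

    Hypothetical helper for illustration; the parameter names mirror the
    query string used in the cURL example.
    """
    query = urlencode({"text": text, "confidence": confidence, "support": support})
    # The Accept header selects the output format (content negotiation).
    return Request(ANNOTATE_URL + "?" + query, headers={"Accept": accept})


request = build_annotate_request("President Obama called Wednesday on Congress.")
print(request.full_url)
print(request.get_header("Accept"))
# To actually send it: urllib.request.urlopen(request).read()
```

Changing the accept argument to text/xml or application/xhtml+xml would request one of the other supported content types.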

The content types we currently support are:

  • text/html
  • application/xhtml+xml
  • text/xml
  • application/json

The application/xhtml+xml output comes with embedded RDFa, which you can pass to the RDFa Distiller to obtain RDF triples in Turtle, RDF/XML, etc.

If your input text is long, you may prefer using POST instead of GET.

curl -i -X POST \
    -H "Accept:application/json" \
    -H "content-type:application/x-www-form-urlencoded" \
    -d "disambiguator=Document&confidence=-1&support=-1&text=President%20Obama%20called%20Wednesday%20on%20Congress%20to%20extend%20a%20tax%20break%20for%20students%20included%20in%20last%20year%27s%20economic%20stimulus%20package" \
       http://spotlight.dbpedia.org/dev/rest/annotate/

Please note that you must use content-type application/x-www-form-urlencoded for POST requests.
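As a rough illustration of the POST variant, the Python sketch below builds an equivalent form-encoded request. The helper name build_annotate_post is an assumption made for this example; the URL and parameter values are taken from the cURL command above, and the request is built but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Development endpoint copied from the cURL example; it may not be reachable.
ANNOTATE_DEV_URL = "http://spotlight.dbpedia.org/dev/rest/annotate/"


def build_annotate_post(text, disambiguator="Document", confidence=-1, support=-1):
    """Build (but do not send) a form-encoded POST request (hypothetical helper)."""
    body = urlencode({
        "disambiguator": disambiguator,
        "confidence": confidence,
        "support": support,
        "text": text,
    }).encode("ascii")
    headers = {
        "Accept": "application/json",
        # POST requests must be sent as application/x-www-form-urlencoded.
        "Content-Type": "application/x-www-form-urlencoded",
    }
    return Request(ANNOTATE_DEV_URL, data=body, headers=headers, method="POST")


post_request = build_annotate_post("President Obama called Wednesday on Congress")
print(post_request.get_method())
print(post_request.data.decode("ascii"))
```

Because the text travels in the request body rather than in the URL, this form avoids URL length limits for long inputs.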

The following four examples each consist of a query URL and the result.

Example 1: without type restriction

http://spotlight.dbpedia.org/rest/annotate?text=President%20Obama%20called%20Wednesday%20on%20Congress%20to%20extend%20a%20tax%20break%20for%20students%20included%20in%20last%20year%27s%20economic%20stimulus%20package,%20arguing%20that%20the%20policy%20provides%20more%20generous%20assistance.&confidence=0.2&support=20

returns the XML

  <Annotation text="President Obama called Wednesday on Congress to extend a tax break
  for students included in last year's economic stimulus package, arguing that the policy
  provides more generous assistance."
  confidence="0.2" support="20">
    <Resources>
      <Resource URI="http://dbpedia.org/resource/Barack_Obama"
        support="5761" types="Person,Politician,President" surfaceForm="President Obama" offset="0"
        similarityScore="0.31504717469215393" percentageOfSecondRank="-1.0"/>
      <Resource URI="http://dbpedia.org/resource/United_States_Congress"
        support="8569" types="Organisation,Legislature" surfaceForm="Congress" offset="36" 
        similarityScore="0.2348192036151886" percentageOfSecondRank="0.8635579006818564"/>
      <Resource URI="http://dbpedia.org/resource/Tax_break"
        support="32" types="" surfaceForm="tax break" offset="57"
        similarityScore="0.35041093826293945" percentageOfSecondRank="-1.0"/>
      <Resource URI="http://dbpedia.org/resource/Student"
        support="1701" types="" surfaceForm="students" offset="71"
        similarityScore="0.32534149289131165" percentageOfSecondRank="-1.0"/>
      <Resource URI="http://dbpedia.org/resource/Policy"
        support="557" types="" surfaceForm="policy" offset="148"
        similarityScore="0.3228176236152649" percentageOfSecondRank="-1.0"/>
    </Resources>
  </Annotation>
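A response of this shape can be read with any XML parser. The sketch below uses Python's standard xml.etree.ElementTree on an abridged copy of the response above (two of the five resources); the function name parse_resources and the dictionary keys are choices made for this example only. It also shows that offset is a character position into the text attribute.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the Example 1 response; the text attribute is kept on one
# line so that the character offsets still line up.
SAMPLE_RESPONSE = """<Annotation text="President Obama called Wednesday on Congress to extend a tax break for students included in last year's economic stimulus package, arguing that the policy provides more generous assistance." confidence="0.2" support="20">
  <Resources>
    <Resource URI="http://dbpedia.org/resource/Barack_Obama"
      support="5761" types="Person,Politician,President" surfaceForm="President Obama" offset="0"
      similarityScore="0.31504717469215393" percentageOfSecondRank="-1.0"/>
    <Resource URI="http://dbpedia.org/resource/Tax_break"
      support="32" types="" surfaceForm="tax break" offset="57"
      similarityScore="0.35041093826293945" percentageOfSecondRank="-1.0"/>
  </Resources>
</Annotation>"""


def parse_resources(xml_text):
    """Return the annotated text and a list of dicts, one per Resource element."""
    root = ET.fromstring(xml_text)
    resources = []
    for node in root.iter("Resource"):
        resources.append({
            "uri": node.get("URI"),
            "surface_form": node.get("surfaceForm"),
            "offset": int(node.get("offset")),
            "support": int(node.get("support")),
            # An empty types attribute becomes an empty list.
            "types": [t for t in node.get("types", "").split(",") if t],
            "similarity": float(node.get("similarityScore")),
        })
    return root.get("text"), resources


text, resources = parse_resources(SAMPLE_RESPONSE)
for r in resources:
    start = r["offset"]
    mention = text[start:start + len(r["surface_form"])]
    print(mention, "->", r["uri"])
```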

Example 2: with type restriction

http://spotlight.dbpedia.org/rest/annotate?text=President%20Obama%20called%20Wednesday%20on%20Congress%20to%20extend%20a%20tax%20break%20for%20students%20included%20in%20last%20year%27s%20economic%20stimulus%20package,%20arguing%20that%20the%20policy%20provides%20more%20generous%20assistance.&confidence=0.2&support=20&types=Person,Organisation

returns the XML

  <Annotation text="President Obama called Wednesday on Congress to extend a tax break
  for students included in last year's economic stimulus package, arguing that the policy
  provides more generous assistance."
  confidence="0.2" support="20" types="Person,Organisation">
    <Resources>
      <Resource URI="http://dbpedia.org/resource/Barack_Obama"
        support="5761" types="Person,Politician,President" surfaceForm="President Obama" offset="0" 
        similarityScore="0.31504717469215393" percentageOfSecondRank="-1.0"/>
      <Resource URI="http://dbpedia.org/resource/United_States_Congress"
        support="8569" types="Organisation,Legislature" surfaceForm="Congress" offset="36" 
        similarityScore="0.2348192036151886" percentageOfSecondRank="0.8635579006818564"/>
    </Resources>
  </Annotation>
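The effect of the types parameter can be approximated on the client side. The sketch below assumes that a resource is kept when at least one of its types appears in the requested list; this matching rule is inferred from the two example responses, not taken from the service's implementation, and filter_by_types is a hypothetical helper.

```python
# (URI, types) pairs copied from the Example 1 response.
EXAMPLE_1_RESOURCES = [
    ("http://dbpedia.org/resource/Barack_Obama", "Person,Politician,President"),
    ("http://dbpedia.org/resource/United_States_Congress", "Organisation,Legislature"),
    ("http://dbpedia.org/resource/Tax_break", ""),
    ("http://dbpedia.org/resource/Student", ""),
    ("http://dbpedia.org/resource/Policy", ""),
]


def filter_by_types(resources, wanted):
    """Keep URIs whose types share at least one entry with `wanted`.

    Assumption: "match any requested type" semantics, inferred from the
    example outputs rather than from the service's source code.
    """
    wanted_set = {t.strip() for t in wanted.split(",") if t.strip()}
    kept = []
    for uri, types in resources:
        resource_types = {t for t in types.split(",") if t}
        if resource_types & wanted_set:
            kept.append(uri)
    return kept


filtered = filter_by_types(EXAMPLE_1_RESOURCES, "Person,Organisation")
print(filtered)
```

Under that assumption, filtering the Example 1 resources with Person,Organisation leaves the same two resources that the Example 2 response contains.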

Example 3: with SPARQL restriction

http://spotlight.dbpedia.org/rest/annotate?text=President%20Obama%20called%20Wednesday%20on%20Congress%20to%20extend%20a%20tax%20break%20for%20students%20included%20in%20last%20year%27s%20economic%20stimulus%20package,%20arguing%20that%20the%20policy%20provides%20more%20generous%20assistance.&confidence=0.2&support=20&sparql=SELECT+DISTINCT+%3Fx%0D%0AWHERE+%7B%0D%0A%3Fx+a+%3Chttp%3A%2F%2Fdbpedia.org%2Fontology%2FOfficeHolder%3E+.%0D%0A%3Fx+%3Frelated+%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2FChicago%3E+.%0D%0A%7D

returns the XML

  <Annotation text="President Obama called Wednesday on Congress to extend a tax break
  for students included in last year's economic stimulus package, arguing that the policy
  provides more generous assistance."
  confidence="0.2" support="20" 
  sparql="SELECT DISTINCT ?x WHERE { ?x a &lt;http://dbpedia.org/ontology/OfficeHolder&gt; .
  ?x ?related &lt;http://dbpedia.org/resource/Chicago&gt;  }"
  policy="whitelist"> 
    <Resources> 
      <Resource URI="http://dbpedia.org/resource/Barack_Obama" 
        support="5761" types="Person,Politician,President" surfaceForm="President Obama" offset="0" 
        similarityScore="0.2730408310890198" percentageOfSecondRank="-1.0"/> 
    </Resources> 
  </Annotation> 
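The sparql parameter in the query URL is an ordinary percent-encoded string. As a small sketch, Python's urllib.parse.quote_plus reproduces the encoded value used above when the query is written with CRLF line breaks; the variable names are illustrative only.

```python
from urllib.parse import quote_plus, unquote_plus

# The SPARQL query from Example 3, with the CRLF line breaks that appear
# as %0D%0A in the query URL.
SPARQL_QUERY = (
    "SELECT DISTINCT ?x\r\n"
    "WHERE {\r\n"
    "?x a <http://dbpedia.org/ontology/OfficeHolder> .\r\n"
    "?x ?related <http://dbpedia.org/resource/Chicago> .\r\n"
    "}"
)

# quote_plus escapes reserved characters and turns spaces into '+'.
encoded = quote_plus(SPARQL_QUERY)
print("sparql=" + encoded)

# Decoding gives back the original query text.
assert unquote_plus(encoded) == SPARQL_QUERY
```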

Example 4: Candidates Interface

The parameters are the same as in Example 1, but you will send your request to http://spotlight.dbpedia.org/rest/candidates

http://spotlight.dbpedia.org/rest/candidates?text=President%20Obama%20called%20Wednesday%20on%20Congress%20to%20extend%20a%20tax%20break%20for%20students%20included%20in%20last%20year%27s%20economic%20stimulus%20package,%20arguing%20that%20the%20policy%20provides%20more%20generous%20assistance.&confidence=0.2&support=20

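returns XML of the following form (note that the sample output below was generated for a different input sentence than the query above)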

  <annotation text="President Obama on Monday will call for a new minimum tax rate for individuals making more than $1 million a year to ensure that they pay at least the same percentage of their earnings as other taxpayers, according to administration officials. ">
  <surfaceForm name="individuals" offset="67">
    <resource label="Individual" uri="Individual" contextualScore="0.26683980226516724" percentageOfSecondRank="-1.0" support="312" priorScore="0.0" finalScore="0.26683980226516724"/>
    <resource label="The Individuals (New Jersey band)" uri="The_Individuals_%28New_Jersey_band%29" contextualScore="0.011762913316488266" percentageOfSecondRank="-1.0" support="17" priorScore="0.0" finalScore="0.011762913316488266"/>
    <resource label="The Individuals (Chicago band)" uri="The_Individuals_%28Chicago_band%29" contextualScore="0.0" percentageOfSecondRank="-1.0" support="0" priorScore="0.0" finalScore="0.0"/>
  </surfaceForm>
  <surfaceForm name="officials" offset="233">
    <resource label="Official" uri="Official" contextualScore="0.1324356347322464" percentageOfSecondRank="-1.0" support="196" priorScore="0.0" finalScore="0.1324356347322464"/>
    <resource label="Rugby league match officials" uri="Rugby_league_match_officials" contextualScore="0.04376954212784767" percentageOfSecondRank="-1.0" support="9" priorScore="0.0" finalScore="0.04376954212784767"/>
  </surfaceForm>
  <surfaceForm name="President Obama" offset="0">
    <resource label="Presidency of Barack Obama" uri="Presidency_of_Barack_Obama" contextualScore="0.5634340643882751" percentageOfSecondRank="-1.0" support="134" priorScore="0.0" finalScore="0.5634340643882751"/>
  </surfaceForm>
  <surfaceForm name="1 million" offset="97">
    <resource label="Million" uri="Million" contextualScore="0.527919590473175" percentageOfSecondRank="-1.0" support="492" priorScore="0.0" finalScore="0.527919590473175"/>
  </surfaceForm>
  <surfaceForm name="percentage" offset="156">
    <resource label="Percentage" uri="Percentage" contextualScore="0.6362485885620117" percentageOfSecondRank="-1.0" support="165" priorScore="0.0" finalScore="0.6362485885620117"/>
  </surfaceForm>
  <surfaceForm name="earnings" offset="176">
    <resource label="Income" uri="Income" contextualScore="0.5776156187057495" percentageOfSecondRank="-1.0" support="648" priorScore="0.0" finalScore="0.5776156187057495"/>
  </surfaceForm>
  <surfaceForm name="taxpayers" offset="194">
    <resource label="Tax" uri="Tax" contextualScore="0.7484055757522583" percentageOfSecondRank="-1.0" support="1540" priorScore="0.0" finalScore="0.7484055757522583"/>
    <resource label="TaxPayers&apos; Alliance" uri="TaxPayers%27_Alliance" contextualScore="0.12765906751155853" percentageOfSecondRank="-1.0" support="15" priorScore="0.0" finalScore="0.12765906751155853"/>
    <resource label="The Taxpayer (Luxembourg)" uri="The_Taxpayer_%28Luxembourg%29" contextualScore="0.024930020794272423" percentageOfSecondRank="-1.0" support="3" priorScore="0.0" finalScore="0.024930020794272423"/>
    <resource label="The Taxpayers" uri="The_Taxpayers" contextualScore="0.0" percentageOfSecondRank="-1.0" support="0" priorScore="0.0" finalScore="0.0"/>
  </surfaceForm>
  </annotation>
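Since the candidates interface returns several ranked resources per surface form, a client typically has to choose among them. The sketch below picks the candidate with the highest finalScore for each surface form, using an abridged copy of the output above; the function name best_candidates and the choice of finalScore as the ranking key are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the Example 4 output (two of the seven surface forms).
SAMPLE_CANDIDATES = """<annotation text="President Obama on Monday will call for a new minimum tax rate for individuals making more than $1 million a year to ensure that they pay at least the same percentage of their earnings as other taxpayers, according to administration officials. ">
  <surfaceForm name="individuals" offset="67">
    <resource label="Individual" uri="Individual" contextualScore="0.26683980226516724" percentageOfSecondRank="-1.0" support="312" priorScore="0.0" finalScore="0.26683980226516724"/>
    <resource label="The Individuals (New Jersey band)" uri="The_Individuals_%28New_Jersey_band%29" contextualScore="0.011762913316488266" percentageOfSecondRank="-1.0" support="17" priorScore="0.0" finalScore="0.011762913316488266"/>
    <resource label="The Individuals (Chicago band)" uri="The_Individuals_%28Chicago_band%29" contextualScore="0.0" percentageOfSecondRank="-1.0" support="0" priorScore="0.0" finalScore="0.0"/>
  </surfaceForm>
  <surfaceForm name="President Obama" offset="0">
    <resource label="Presidency of Barack Obama" uri="Presidency_of_Barack_Obama" contextualScore="0.5634340643882751" percentageOfSecondRank="-1.0" support="134" priorScore="0.0" finalScore="0.5634340643882751"/>
  </surfaceForm>
</annotation>"""


def best_candidates(xml_text):
    """Map each surface form name to its (uri, finalScore) with the top score."""
    root = ET.fromstring(xml_text)
    best = {}
    for surface_form in root.iter("surfaceForm"):
        candidates = surface_form.findall("resource")
        if not candidates:
            continue  # no candidate resources listed for this surface form
        top = max(candidates, key=lambda c: float(c.get("finalScore")))
        best[surface_form.get("name")] = (top.get("uri"), float(top.get("finalScore")))
    return best


best = best_candidates(SAMPLE_CANDIDATES)
for name, (uri, score) in best.items():
    print(name, "->", uri, score)
```

A client could apply its own threshold to the score, or rank by contextualScore instead, depending on what the application needs.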