Build from Source with Maven
Requirements
- Java 1.6+
- Scala 2.9+
- Maven (see note on versions below)
- Git
- RAM of appropriate size for the spotter lexicon you need
The index module requires Java 1.7.
Install prerequisites:
NOTE: The latest version (0.6.5 and newer) builds only with Maven 3.
sudo apt-get install git maven3
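Since newer releases build only with Maven 3, a quick check of the installed major version can save a failed build. The following is a minimal sketch, assuming `mvn -version` prints a line of the form `Apache Maven 3.x.y ...`; the function name `maven_major` is made up for this example.

```shell
# Print the major version found in `mvn -version` style text.
# Assumes a line of the form "Apache Maven 3.0.4 (...)".
maven_major() {
  printf '%s\n' "$1" | sed -n 's/^Apache Maven \([0-9][0-9]*\)\..*/\1/p' | head -n 1
}

# Example use (requires mvn on the PATH):
#   [ "$(maven_major "$(mvn -version 2>/dev/null)")" -ge 3 ] || echo "Maven 3 is required"
```

The function takes the version text as an argument so that it can be tried on saved output as well as on a live `mvn -version` call.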
If you also want to run the demo on your server, install Apache:
sudo apt-get install apache2
Check out all the code using the command:
git clone https://github.com/dbpedia-spotlight/dbpedia-spotlight.git
Run install through Maven
cd dbpedia-spotlight-*
mvn install
This mvn install from the parent pom.xml is important because it runs install-file for some jars distributed alongside the source code.
After installing the software, you also need the disambiguation index and the spotter lexicon in order to run the Web service on your machine. Get the necessary files from http://spotlight.dbpedia.org/download/, change the conf/server.properties file to point to them, and run mvn scala:run '-DaddArgs=../conf/server.properties' from the rest directory.
Depending on the files you choose (small, medium, large), you will need different amounts of RAM. With the largest dictionary, you will need close to 16GB of RAM. The heap size can be configured in the pom.xml inside the rest directory.
mvn scala:run '-DaddArgs=../conf/server.properties'
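Before picking a lexicon size, it can help to confirm how much memory the machine actually has. The sketch below is Linux-only and assumes the usual `/proc/meminfo` layout with a `MemTotal:` line in kB; the function names `mem_total_gib` and `has_enough_ram` are hypothetical helpers, not part of DBpedia Spotlight.

```shell
# Print total physical memory in whole GiB (rounded down).
# Reads a meminfo-style file; defaults to /proc/meminfo on Linux.
mem_total_gib() {
  meminfo="${1:-/proc/meminfo}"
  awk '/^MemTotal:/ { printf "%d\n", int($2 / 1048576) }' "$meminfo"
}

# Succeed when the machine reports at least the requested number of GiB.
# Arguments: required GiB [meminfo file]
has_enough_ram() {
  [ "$(mem_total_gib "$2")" -ge "$1" ]
}

# Example use:
#   has_enough_ram 16 || echo "Consider a smaller lexicon or a lower -Xmx value"
```

Note that the kernel reports slightly less than the installed amount, so a machine sold as 16GB will typically show up as 15 GiB here.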
Get DBpedia Extraction
hg clone http://dbpedia.hg.sourceforge.net/hgweb/dbpedia/extraction_framework
cd extraction_framework
mvn install
Edit the file dbpedia-spotlight-latest/pom.xml
and leave only the modules core, index, rest and demo.
Run install through Maven
cd dbpedia-spotlight-*
mvn install
Follow the instructions in dbpedia-spotlight-*/bin/index.sh. See also the Data Generation Manual and the Internationalization Manual to learn more about the steps for creating your own datasets for DBpedia Spotlight.
Some frequently observed errors are collected below.
If you experience problems with missing dependencies while running mvn install on the project, check your installed version of Maven.
Error:
org.apache.maven.reactor.MavenExecutionException: Could not find the model file '/usr/local/spotlight/trunk/jung'. for project unknown
Solution: The only required modules for running the web service are core, rest, and demo (if you want the HTML interface as well). If you do not need to index, you can remove every other module from the parent pom.xml. The only required modules for running indexing are core and index; in that case, you can remove the other modules from the parent pom.xml.
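To see which modules the parent pom.xml currently declares before and after trimming it, something like the following may be enough. It is a rough sketch that assumes each `<module>` element sits on its own line; it does not parse XML properly, so a module inside an XML comment would still be listed. The function name `list_modules` is invented for this example.

```shell
# Print the module names declared in a Maven pom.xml, one per line.
# Assumes one <module>...</module> element per line.
list_modules() {
  sed -n 's:.*<module>\([^<]*\)</module>.*:\1:p' "$1"
}

# Example use, from the parent directory:
#   list_modules pom.xml
```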
Error:
Memory error, heap space
You may need to update your pom.xml with adequate heap space for the dictionary file you are using.
<properties>
  <heapspace.Xmx.server>-Xmx16g</heapspace.Xmx.server>
</properties>
The memory requirements are directly tied to your target lexicon, as our most rudimentary implementation loads the entire lexicon into memory in order to speed up spotting.
You can build a dictionary of People, Locations and Organizations with about 200 MB of RAM. See the one that I included in the distribution, for example: http://dbp-spotlight.svn.sourceforge.net/viewvc/dbp-spotlight/tags/release-0.5/dist/src/deb/control/data/usr/share/dbpedia-spotlight/spotter.dict?view=log
You can also download the dictionary built from URIs that occurred more than 75 times in Wikipedia: http://spotlight.dbpedia.org/download/release-0.4/surface_forms-Wikipedia-TitRedDis.uriThresh75.tsv.spotterDictionary.gz
This should load with a lot less RAM (maybe 5x less) than the one we use in production, and it will still spot the most important things.
See: http://sourceforge.net/mailarchive/message.php?msg_id=28255247
For some dependencies that either did not have a Maven repository or that we had to patch, we distribute the jars alongside our code and install them via install-file in the parent pom.xml. Make sure you run mvn install from the parent directory (e.g. /home/user/workspace/dbpedia-spotlight-*/).
Error:
(Failed to execute goal on project core: Could not resolve dependencies for project org.dbpedia.spotlight:core:jar:0.5) dependencies are missing for:
org.semanticweb.yars:nx-parser:jar:1.1
com.aliasi:lingpipe:jar:4.0.0
edu.umd:cloud9:jar:SNAPSHOT
weka:weka:jar:3.7.3
Solution:
cd /home/user/workspace/dbpedia-spotlight-*/
mvn install
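After running mvn install from the parent directory, one way to confirm that the bundled jars really landed in the local repository is to look for them on disk. The sketch below relies on the standard Maven local repository layout (groupId with dots turned into directories, then artifactId, version, and `artifactId-version.jar`) under the default `~/.m2/repository`; the helper names `m2_jar_path` and `report_missing_jars` are hypothetical.

```shell
# Build the path a jar would have in a Maven local repository.
# Arguments: groupId artifactId version [repository root]
m2_jar_path() {
  group_dir=$(printf '%s' "$1" | tr '.' '/')
  repo="${4:-$HOME/.m2/repository}"
  printf '%s/%s/%s/%s/%s-%s.jar\n' "$repo" "$group_dir" "$2" "$3" "$2" "$3"
}

# Print the coordinates of any listed jar that is not installed yet.
# Arguments: repository root, then groupId:artifactId:version entries.
report_missing_jars() {
  repo="$1"
  shift
  for coords in "$@"; do
    group=${coords%%:*}
    rest=${coords#*:}
    artifact=${rest%%:*}
    version=${rest#*:}
    [ -f "$(m2_jar_path "$group" "$artifact" "$version" "$repo")" ] || printf '%s\n' "$coords"
  done
}

# Example use with coordinates taken from the error message above:
#   report_missing_jars "$HOME/.m2/repository" com.aliasi:lingpipe:4.0.0 weka:weka:3.7.3
```

If the command prints nothing, the listed jars are present and the dependency error most likely has another cause.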
If you are using Maven 3 and still cannot solve the problem, refer to http://sourceforge.net/mailarchive/forum.php?thread_name=CA%2B3KvkOfTzMsdwUutx625WZK6VOJApADyKatmwQo2Gv49AbmqQ%40mail.gmail.com&forum_name=dbp-spotlight-users
Problems with Jersey dependencies: "This is due to the glassfish repository, which is hardcoded in the jersey-server-1.1.5.pom, returning a junk artifact (some HTML with an nginx message instead of a real pom).
You can work around this by adding this to the "mirrors" section of your $HOME/.m2/settings.xml:
<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0 http://maven.apache.org/xsd/settings-1.0.0.xsd">
  ...
  <mirrors>
    <mirror>
      <id>glassfish-mirror</id>
      <name>glassfish mirror</name>
      <url>http://maven.nuxeo.org/nexus/content/repositories/public-releases</url>
      <mirrorOf>glassfish-repository</mirrorOf>
    </mirror>
  </mirrors>
  ...
</settings>
and removing all "com.sun.jersey" artifacts from your local repository (rm -rf ~/.m2/repository/com/sun/jersey) " (http://answers.nuxeo.com/questions/2195/cant-build-nuxeo-source-nuxeo-webengine-jax-rs-jersey-server-error)
If this problem occurs when installing DBpedia Spotlight, try running the following in the root folder of the project:
1) mvn --non-recursive clean install
2) mvn clean install
The server needs the dictionary and other data files. You need to download all of these files and edit the paths in server.properties to point to them. If you don't know which files are needed, you can simply download the quick start jar version, look into its data folder, and compare it with its server.properties. You may also need a stopwords file.
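A quick way to catch a wrong path before starting the server is to scan server.properties for values that look like absolute paths and check that each one exists. This is only a sketch: it assumes simple `key = value` lines and treats any value starting with `/` as a path, without knowing which keys the server actually reads. The function name `check_property_paths` is made up for this example.

```shell
# Report values in a .properties file that look like absolute paths but do not exist.
# Prints one "missing: key = path" line per problem; prints nothing when all paths exist.
check_property_paths() {
  props_file="$1"
  # Skip comment lines, keep only key=value entries, split at the first "=".
  grep -v '^[[:space:]]*#' "$props_file" | grep '=' | while IFS='=' read -r key value; do
    # Trim surrounding whitespace from key and value.
    key=$(printf '%s' "$key" | sed 's/^[[:space:]]*//; s/[[:space:]]*$//')
    value=$(printf '%s' "$value" | sed 's/^[[:space:]]*//; s/[[:space:]]*$//')
    case "$value" in
      /*) [ -e "$value" ] || printf 'missing: %s = %s\n' "$key" "$value" ;;
    esac
  done
}

# Example use:
#   check_property_paths conf/server.properties
```

Relative paths are ignored here because they depend on the directory the server is started from.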