Alexandre Cançado Cardoso edited this page Oct 22, 2013 · 11 revisions

Last update: May 23, 2013 (licenses last updated: Aug 29, 2013)

NLP tools

| Library | Description | Spotlight version | Current version | License |
| --- | --- | --- | --- | --- |
| Apache OpenNLP | A machine-learning-based toolkit for processing natural language text | 1.5.2 | 1.5.3 | ASL 2.0 |
| LingPipe | A toolkit for processing text using computational linguistics | 4.1 | 4.1 | Alias-i Royalty Free License v.1 |
| Maxent OpenNLP | Maximum entropy modeling is a framework for integrating information from many heterogeneous information sources for classification | 3.0 | 3.0 | ASL 2.0 / LGPL v.2 |
| Scala AhoCorasick | An imperative implementation of the Aho-Corasick string-matching algorithm written entirely in Scala | 0.1 | 0.1 | ASL 2.0 |
| ScalaNLP | A suite of natural language processing, machine learning, and numerical computing libraries | 0.1 | 0.3-SNAPSHOT | ASL 2.0 |

Machine learning tools

| Library | Description | Spotlight version | Current version | License |
| --- | --- | --- | --- | --- |
| Weka | A collection of machine learning algorithms for data mining tasks that contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization | 3.7.3 | 3.7.8 | GPL |


| Library | Description | Spotlight version | Current version | License |
| --- | --- | --- | --- | --- |
| Grizzly | Grizzly's goal is to help developers build scalable and robust servers using NIO, as well as offering extended framework components: Web Framework (HTTP/S), WebSocket, Comet, and more | 1.9.48 | 2.3.2 | CDDL v.1.1 |
| Jersey (Client/Grizzly/Bundle/Server) | Jersey is the open source, production quality, JAX-RS (JSR 311) Reference Implementation for building RESTful Web services | 1.10 | 2.0-m12 | CDDL v.1.1 |

Search Engine / Backend tools

| Library | Description | Spotlight version | Current version | License |
| --- | --- | --- | --- | --- |
| Cloud9 | A collection of Hadoop tools that tries to make working with big data a bit easier | SNAPSHOT | 1.4.14 | ASL 2.0 |
| Fastutil | Provides type-specific maps, sets, lists and queues with a small memory footprint and fast access and insertion | 6.3 | 6.5.4 | ASL 2.0 |
| HSQLDB | HSQLDB (HyperSQL DataBase) is a SQL relational database engine written in Java. It offers a small, fast, multithreaded and transactional database engine with in-memory and disk-based tables, and supports embedded and server modes | 2.2.9 | | HSQLDB License |
| Kea-goss-weka | An implementation of a keyphrase extraction algorithm | 5.0 | 5.0 | GPL |
| JDBM | JDBM provides TreeMap, HashMap and other collections backed by disk storage | 3.0-SNAPSHOT | MapDB | ASL 2.0 |
| Lucene (Core, Analyzer, Queries, Misc, Phonetic) | Information retrieval software library | 3.6 | 4.0 | ASL 2.0 |
| Mahout collection | The Mahout Collections library is a set of container classes that address some limitations and performance problems of the standard collections in Java | 1.0 | 1.0 | ASL 2.0 |
| Trove | The Trove library provides high-speed regular and primitive collections for Java | 1.1-beta-5 | 3.0.3 | LGPL v.2 |

Parsing tools

| Library | Description | Spotlight version | Current version | License |
| --- | --- | --- | --- | --- |
| Boilerpipe | The boilerpipe library provides algorithms to detect and remove the surplus "clutter" (boilerplate, templates) around the main textual content of a web page | 1.1.0 | 1.2.0 | ASL 2.0 |
| Jackson JSON Processor | JSON processor (JSON parser + JSON generator) written in Java | 1.9.8 | 2.2 | ASL 2.0 |
| Jericho HTML Parser | Jericho HTML Parser is a Java library allowing analysis and manipulation of parts of an HTML document, including server-side tags, while reproducing verbatim any unrecognised or invalid HTML | 3.1 | 3.2 | EPL v1.0 / LGPL v.2 |
| Jettison | Jettison is a collection of Java APIs (like StAX and DOM) which read and write JSON | 1.3 | 1.3.3 | ASL 2.0 |
| JSON Lib | JSON-lib is a Java library for transforming beans, maps, collections, Java arrays and XML to JSON and back again to beans and DynaBeans | jdk15 (?) | jdk15 (?) | ASL 2.0 |
| Liftweb JSON | JSON library | 2.5-M1 | 2.5-M1 | ASL 2.0 |
| Neko HTML | Simple HTML scanner to parse HTML documents and access the information using standard XML interfaces | 0.9.5 | 1.9.18 | ASL 2.0 |
| NxParser | NxParser is a Java open source, streaming, non-validating parser for the Nx format, where x = Triples, Quads, or any other number | 1.1 | 1.2.3 | BSD-3-Clause |
| OpenCSV | Opencsv is a very simple CSV (comma-separated values) parser library for Java | 2.0 | 2.3 | ASL 2.0 |
| OpenRDF (RIO) | RDF parsers and writers for various RDF file formats | 1.0.10 | 1.0.10 | LGPL v.2 |
| XML-APIS | Xml-commons provides an Apache-hosted set of DOM, SAX, and JAXP interfaces for use in other XML-based projects | 1.0.b2 | 2.02 (has been retired) | ASL 2.0 |
| Stax | StAX is a standard XML processing API that allows you to stream XML data from and to your application | 1.0.1 | 1.2.0 | ASL 2.0 |
| Xerces | Parser for XML | 2.6.2 (Parser API) / 2.9.1 (XercesImpl) | ? | ASL 2.0 |
| XOM | XOM is a new XML object model. It is an open source (LGPL), tree-based API for processing XML with Java that strives for correctness, simplicity, and performance, in that order | 1.2.5 | 1.2.9 | LGPL v.2 |
| XStream | XStream is a simple library to serialize objects to XML and back again | 1.3.1 | 1.4.4 | XStream License |

Logging tools

| Library | Description | Spotlight version | Current version | License |
| --- | --- | --- | --- | --- |
| Log4j | Apache log4j is a Java-based logging utility | 1.2.16 | 2.x | ASL 2.0 |

Testing tools

| Library | Description | Spotlight version | Current version | License |
| --- | --- | --- | --- | --- |
| JUnit | Unit testing framework for Java | 4.8.2 | 4.11 | EPL v1.0 |
| ScalaTest | Unit testing framework for Scala | 2.0.M4 | 2.0.M5b | ASL 2.0 |

Miscellaneous tools

| Library | Description | Spotlight version | Current version | License |
| --- | --- | --- | --- | --- |
| Akka actor | Process messages asynchronously using an event-driven receive loop | 2.0.5 | 2.1.4 | ASL 2.0 |
| Apache Commons Codec | Apache Commons Codec (TM) software provides implementations of common encoders and decoders such as Base64, Hex, Phonetic and URLs | 1.6 | 1.8 | ASL 2.0 |
| Apache Commons Compress | The Apache Commons Compress library defines an API for working with ar, cpio, Unix dump, tar, zip, gzip, XZ, Pack200 and bzip2 files | 1.4.1 | 1.5 | ASL 2.0 |
| Apache Commons IO | Commons IO is a library of utilities to assist with developing IO functionality | 2.4 | 2.4 | ASL 2.0 |
| Apache Commons Lang | Reusable Java components. The standard Java libraries fail to provide enough methods for manipulation of their core classes; Apache Commons Lang provides these extra methods | 2.5 | 3.1 | ASL 2.0 |
| Apache Commons Math | Commons Math is a library of lightweight, self-contained mathematics and statistics components addressing the most common problems not available in the Java programming language or Commons Lang | 2.2 | 3.2 | ASL 2.0 |
| Guava | Collections, caching, primitives support, concurrency libraries, common annotations, string processing, I/O | r07 | r14 | ASL 2.0 |
| Http Client | Rich package implementing the client side of the most recent HTTP standards and recommendations | 4.2 | 4.2.5 (replaced by Apache HttpComponents) | ASL 2.0 |
| Kryo | Fast and efficient object graph serialization framework for Java | 2.20 | 2.21 | BSD-3-Clause |
| JSON (binder for Java) | JSON data interchange format support in Java | 20090211 | ? | JSON License |
| Paranamer | A library that allows the parameter names of non-private methods and constructors to be accessed at runtime (Reflection) | 2.3 | 2.3 | BSD-3-Clause |
| Scalaz | Provides purely functional data structures to complement those from the Scala standard library. It defines a set of foundational type classes (e.g. Functor, Monad) and corresponding instances for a large number of data structures | 6.0.4 | 7.0.0 | Scalaz License |