Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge branch 'master' of github.com:nddipiazza/tika-fork
Showing 1 changed file with 5 additions and 5 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,20 +1,20 @@ | ||
# tika-fork | ||
|
||
-Utility that allows you to run Tika as a forked JVM to minimize memory issues. | ||
+Utility that allows you to run Tika as a pool of forked JVMs to minimize memory issues. | ||
|
||
## Motivation | ||
|
||
-It is a common issue when dealing with Tika to have parses that cause your entire JVM to crash due to out-of-memory conditions. | ||
+It is a common issue when dealing with Tika to have parses that cause your entire JVM to crash due to out-of-memory conditions. There are some parameters that are intended to prevent these issues but the issues can still happen from time to time as described in https://issues.apache.org/jira/browse/TIKA-2575 | ||
|
||
There are also problems where a Tika parse will not return in sufficient time due to GC hell or some other CPU-intensive process and will cause issues. | ||
|
||
This program attempts to deal with these problems: | ||
|
||
* Launches a pool of forked JVMs that are all limited by the amount of memory they can use. | ||
-* Uses sockets (not HTTP) to send a stream your document content to the Tika parser, and to receive back a stream of metadata and a stream of the parsed content. | ||
+* Uses sockets (not HTTP) to send a stream of your document content to the Tika parser, and to receive back a stream of metadata and a stream of the parsed content. | ||
* Uses commons-pool to provide fine-grained control over the pool of forked Tika JVMs. | ||
-* Provides a very simple "abortAfterMs" parameter to the parse that will throw a TimeoutException if too much time is taken. This will result in the forked JVM to be aborted. This is useful in the situations where the JVM went into GC hell eating tons of CPU and never returning. | ||
+* Provides an "abortAfterMs" parameter to the parse method that will throw a TimeoutException if too much time is taken. This will result in the forked JVM being aborted. This is useful in situations where the JVM went into GC hell, eating tons of CPU and never returning. | ||
|
||
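The memory cap and the "abortAfterMs" behavior listed above can be sketched with the plain JDK. This is a minimal illustration and not the tika-fork API: the class `ForkedParseSketch`, the methods `forkedJvmCommand` and `callOrAbort`, and the `abortChild` callback are hypothetical names, and the socket transport and commons-pool wiring are left out.

```java
import java.nio.file.Paths;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch; names here are illustrative, not taken from tika-fork.
final class ForkedParseSketch {

    /** Builds a command line for a child JVM whose heap is capped with -Xmx. */
    static List<String> forkedJvmCommand(int maxHeapMb, String mainClass) {
        String javaBin = Paths.get(System.getProperty("java.home"), "bin", "java").toString();
        return List.of(
                javaBin,
                "-Xmx" + maxHeapMb + "m",                    // memory limit for the fork
                "-cp", System.getProperty("java.class.path"),
                mainClass);                                  // assumed child entry point
    }

    /**
     * Runs the parse call on a worker thread and waits at most abortAfterMs.
     * On timeout it cancels the call, invokes abortChild (for example
     * process::destroyForcibly) and rethrows the TimeoutException.
     */
    static <T> T callOrAbort(Callable<T> parse, long abortAfterMs, Runnable abortChild)
            throws InterruptedException, ExecutionException, TimeoutException {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<T> result = executor.submit(parse);
            try {
                return result.get(abortAfterMs, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                result.cancel(true); // interrupt the worker if it is interruptible
                abortChild.run();    // kill the child so blocked I/O is released
                throw e;
            }
        } finally {
            executor.shutdownNow();  // do not leave the worker thread behind
        }
    }
}
```

A caller would start the child with `new ProcessBuilder(ForkedParseSketch.forkedJvmCommand(256, "example.ChildMain")).start()`, where `example.ChildMain` is a placeholder class name, and pass `process::destroyForcibly` as `abortChild`. Killing the child on timeout matters because a blocking socket read does not respond to thread interruption; it only returns once the other end goes away.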
## Usage | ||
|
||
-See the [Tika Fork Process Unit Test](fork/src/test/java/org/apache/tika/fork/TikaProcessTest.java) for several detailed examples of how to use the program. | ||
+See the [Tika Fork Process Unit Test](tika-fork/src/test/java/org/apache/tika/fork/TikaProcessTest.java) for several detailed examples of how to use the program. |