From bdbf8367ecf2709390c35c178eb9c6c6cb61495b Mon Sep 17 00:00:00 2001
From: Nicholas DiPiazza
Date: Wed, 29 May 2019 08:52:46 -0500
Subject: [PATCH 1/3] Update README.md

---
 README.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index 4060154..6a34209 100644
--- a/README.md
+++ b/README.md
@@ -1,10 +1,10 @@
 # tika-fork
 
-Utility that allows you to run Tika as a forked JVM to minimize memory issues.
+Utility that allows you to run Tika as a pool of forked JVMs to minimize memory issues.
 
 ## Motiviation
 
-It is a common issue when dealing with Tika to have parses that cause your entire JVM to crash due to out-of-memory conditions.
+It is a common issue when dealing with Tika to have parses that cause your entire JVM to crash due to out-of-memory conditions.
 There are some parameters that are intended to prevent these issues but the issues can still happen from time to time as described in https://issues.apache.org/jira/browse/TIKA-2575
 There are also problems where a Tika parse will not return in sufficient time due to GC hell or some other CPU intense process and will cause issues.
 
@@ -17,4 +17,4 @@ This program attempts to deal with these problems:
 
 ## Usage
 
-See the [Tika Fork Process Unit Test](fork/src/test/java/org/apache/tika/fork/TikaProcessTest.java) for several detailed examples of how to use the program.
\ No newline at end of file
+See the [Tika Fork Process Unit Test](fork/src/test/java/org/apache/tika/fork/TikaProcessTest.java) for several detailed examples of how to use the program.

From 4c5c26a7edb77228194b62c6b47ac939fe6057a4 Mon Sep 17 00:00:00 2001
From: Nicholas DiPiazza
Date: Wed, 29 May 2019 08:58:56 -0500
Subject: [PATCH 2/3] Update README.md

---
 README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 6a34209..8d72b68 100644
--- a/README.md
+++ b/README.md
@@ -11,9 +11,9 @@ There are also problems where a Tika parse will not return in sufficient time du
 This program attempts to deal with these problems:
 
 * Launches a pool of forked JVMs that are all limited by the amount of memory they can use.
-* Uses sockets (not HTTP) to send a stream your document content to the Tika parser, and to receive back a stream of metadata and a stream of the parsed content.
+* Uses sockets (not HTTP) to send a stream of your document content to the Tika parser, and to receive back a stream of metadata and a stream of the parsed content.
 * Uses commons-pool to provide fine-grained control the pool of the forked Tika JVMs.
-* Provides a very simple "abortAfterMs" parameter to the parse that will throw a TimeoutException if too much time is taken. This will result in the forked JVM to be aborted. This is useful in the situations where the JVM went into GC hell eating tons of CPU and never returning.
+* Provides a "abortAfterMs" parameter to the parse method that will throw a TimeoutException if too much time is taken. This will result in the forked JVM to be aborted. This is useful in the situations where the JVM went into GC hell eating tons of CPU and never returning.

 ## Usage

From 2da5283bf599700982d022632eff1303f16b01f4 Mon Sep 17 00:00:00 2001
From: Nicholas DiPiazza
Date: Wed, 29 May 2019 09:21:04 -0500
Subject: [PATCH 3/3] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 8d72b68..b9a19d6 100644
--- a/README.md
+++ b/README.md
@@ -17,4 +17,4 @@ This program attempts to deal with these problems:
 
 ## Usage
 
-See the [Tika Fork Process Unit Test](fork/src/test/java/org/apache/tika/fork/TikaProcessTest.java) for several detailed examples of how to use the program.
+See the [Tika Fork Process Unit Test](tika-fork/src/test/java/org/apache/tika/fork/TikaProcessTest.java) for several detailed examples of how to use the program.