chore(release): 1.33.0

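## [1.33.0](https://github.com/rpidanny/darwin/compare/v1.32.1...v1.33.0) (2024-07-05)

### Features

* add map-reduce summary ([#80](https://github.com/rpidanny/darwin/issues/80)) ([21e1209](https://github.com/rpidanny/darwin/commit/21e1209b71c295ad4b26c5622d3586f05430141c))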
rpidanny · Jul 5, 2024 · 08d9362
1 parent 21e1209

Showing 10 changed files with 66 additions and 44 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,3 +1,9 @@
+## [1.33.0](https://github.com/rpidanny/darwin/compare/v1.32.1...v1.33.0) (2024-07-05)
+
+### Features
+
+* add map-reduce summary ([#80](https://github.com/rpidanny/darwin/issues/80)) ([21e1209](https://github.com/rpidanny/darwin/commit/21e1209b71c295ad4b26c5622d3586f05430141c))
+
 ## [1.32.1](https://github.com/rpidanny/darwin/compare/v1.32.0...v1.32.1) (2024-07-05)
 
 ### Bug Fixes

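The `chore(release)` header and the "Features" heading suggest a Conventional Commits workflow: this release contains one feature commit, and the version moves from 1.32.1 to 1.33.0, a minor bump. Under Semantic Versioning combined with Conventional Commits, a `feat` commit bumps the minor version, a `fix` bumps the patch version, and a breaking change bumps the major version (for projects at 1.0.0 or above). The sketch below illustrates that rule. `releaseTypeFor` and `bump` are hypothetical helpers, not the API of whichever tool generated this commit, and they only inspect commit headers, so `BREAKING CHANGE` footers are ignored.

```typescript
// Hypothetical helpers illustrating SemVer bumps driven by Conventional Commit headers.
type ReleaseType = 'major' | 'minor' | 'patch';

// Return the largest bump implied by the given commit headers, or null if none applies.
// Only headers are inspected: a `!` after the type/scope marks a breaking change.
function releaseTypeFor(commitHeaders: string[]): ReleaseType | null {
  let result: ReleaseType | null = null;
  for (const header of commitHeaders) {
    const match = /^(\w+)(\([^)]*\))?(!)?:/.exec(header);
    if (!match) continue;
    const [, type, , bang] = match;
    if (bang) return 'major';
    if (type === 'feat') result = 'minor';
    else if (type === 'fix' && result === null) result = 'patch';
    // Other types (chore, docs, ...) do not trigger a release on their own.
  }
  return result;
}

// Apply a bump to a stable "MAJOR.MINOR.PATCH" version (assumes >= 1.0.0;
// many tools treat 0.x versions differently).
function bump(version: string, release: ReleaseType): string {
  const [major, minor, patch] = version.split('.').map(Number);
  if (release === 'major') return `${major + 1}.0.0`;
  if (release === 'minor') return `${major}.${minor + 1}.0`;
  return `${major}.${minor}.${patch + 1}`;
}
```

For example, `releaseTypeFor(['feat: add map-reduce summary (#80)'])` returns `'minor'`, and `bump('1.32.1', 'minor')` returns `'1.33.0'`, matching this release.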
diff --git a/README.md b/README.md
@@ -31,7 +31,7 @@ $ npm install -g @rpidanny/darwin
 $ darwin COMMAND
 running command...
 $ darwin (--version)
-@rpidanny/darwin/1.32.1 linux-x64 node-v20.15.0
+@rpidanny/darwin/1.33.0 linux-x64 node-v20.15.0
 $ darwin --help [COMMAND]
 USAGE
   $ darwin COMMAND

diff --git a/docs/chat.md b/docs/chat.md
@@ -27,4 +27,4 @@ EXAMPLES
   $ darwin chat
 ```
 
-_See code: [src/commands/chat/index.ts](https://github.com/rpidanny/darwin/blob/v1.32.1/src/commands/chat/index.ts)_
+_See code: [src/commands/chat/index.ts](https://github.com/rpidanny/darwin/blob/v1.33.0/src/commands/chat/index.ts)_
diff --git a/docs/config.md b/docs/config.md
@@ -24,7 +24,7 @@ EXAMPLES
   $ darwin config get
 ```
 
-_See code: [src/commands/config/get.ts](https://github.com/rpidanny/darwin/blob/v1.32.1/src/commands/config/get.ts)_
+_See code: [src/commands/config/get.ts](https://github.com/rpidanny/darwin/blob/v1.33.0/src/commands/config/get.ts)_
 
 ## `darwin config set`
 
@@ -44,4 +44,4 @@ EXAMPLES
   $ darwin config set
 ```
 
-_See code: [src/commands/config/set.ts](https://github.com/rpidanny/darwin/blob/v1.32.1/src/commands/config/set.ts)_
+_See code: [src/commands/config/set.ts](https://github.com/rpidanny/darwin/blob/v1.33.0/src/commands/config/set.ts)_
diff --git a/docs/download.md b/docs/download.md
@@ -31,4 +31,4 @@ EXAMPLES
   $ darwin download papers "crispr cas9" --output papers/ --count 100 --log-level debug
 ```
 
-_See code: [src/commands/download/papers.ts](https://github.com/rpidanny/darwin/blob/v1.32.1/src/commands/download/papers.ts)_
+_See code: [src/commands/download/papers.ts](https://github.com/rpidanny/darwin/blob/v1.33.0/src/commands/download/papers.ts)_
diff --git a/docs/faq.md b/docs/faq.md
@@ -9,6 +9,9 @@
 - [How do I configure Darwin for paper summarization?](#how-do-i-configure-darwin-for-paper-summarization)
   - [Using OpenAI's API](#using-openais-api)
   - [Using a Local LLM](#using-a-local-llm)
+- [What are the different methods of summarization?](#what-are-the-different-methods-of-summarization)
+  - [Map Reduce Method](#map-reduce-method)
+  - [Refine Method](#refine-method)
 
 <!-- END doctoc generated TOC please keep comment here to allow auto update -->
 

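The new FAQ entries name the two summarization strategies that `--summary-method` selects between. Their FAQ text is not part of this diff, but the flag values match the `map_reduce` and `refine` chain types that LangChain uses for its summarization chains (whether darwin uses LangChain is not shown here). In the usual formulation, map-reduce summarizes each chunk of a paper independently and then summarizes the combined partial summaries, while refine walks the chunks in order and updates one running summary. The sketch below illustrates both under stated assumptions: `Summarize` is a hypothetical stand-in for an LLM call, `chunkText` is a naive fixed-size splitter, and none of these names come from darwin's source.

```typescript
// Hypothetical sketch of the two strategies; not darwin's implementation.
// `Summarize` stands in for a call to an LLM provider such as OpenAI or Ollama.
type Summarize = (prompt: string) => Promise<string>;

// Naive fixed-size splitter; real splitters usually respect token or sentence boundaries.
function chunkText(text: string, size: number): string[] {
  if (size <= 0) throw new RangeError('size must be positive');
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += size) {
    chunks.push(text.slice(i, i + size));
  }
  return chunks;
}

// Map-reduce: summarize each chunk independently (map, run concurrently),
// then summarize the concatenated partial summaries in a single call (reduce).
async function mapReduceSummary(chunks: string[], summarize: Summarize): Promise<string> {
  const partials = await Promise.all(
    chunks.map((chunk) => summarize(`Summarize this text:\n${chunk}`)),
  );
  return summarize(`Combine these summaries into one:\n${partials.join('\n')}`);
}

// Refine: summarize the first chunk, then revise the running summary
// with each later chunk, strictly in order.
async function refineSummary(chunks: string[], summarize: Summarize): Promise<string> {
  let summary = '';
  for (let i = 0; i < chunks.length; i++) {
    summary = i === 0
      ? await summarize(`Summarize this text:\n${chunks[i]}`)
      : await summarize(`Current summary:\n${summary}\n\nRefine it using this additional text:\n${chunks[i]}`);
  }
  return summary;
}
```

The trade-off shows up in the call pattern: for n chunks this map-reduce sketch makes n + 1 LLM calls, and the n map calls are independent, so they can run concurrently. Refine makes n calls that must run sequentially because each one depends on the previous summary. This sketch reduces in one call; very long inputs may need a multi-level reduce.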
diff --git a/docs/search.md b/docs/search.md
@@ -13,28 +13,30 @@ Search and export papers containing accession numbers to a CSV file.
 ```
 USAGE
   $ darwin search accession KEYWORDS [-l TRACE|DEBUG|INFO|WARN|ERROR|FATAL] [-c NUMBER] [-p NUMBER] [-o PATH] [-a
-    REGEX] [-s] [--legacy] [-h] [-S] [--llm openai|ollama] [-q STRING]
+    REGEX] [-s] [--legacy] [-h] [-S] [--summary-method refine|map_reduce] [--llm openai|ollama] [-q STRING]
 
 ARGUMENTS
   KEYWORDS  The keywords to search for. (Example: "crispr cas9")
 
 FLAGS
-  -S, --summary                       Include summaries in the output CSV (requires LLM, sets concurrency to 1)
-  -a, --accession-number-regex=REGEX  [default: PRJNA\d+] Regex to match accession numbers. Defaults to matching
-                                      BioProject accession numbers.
-  -c, --count=NUMBER                  [default: 10] Minimum number of papers to search for. Actual number may be
-                                      slightly higher with concurrency.
-  -h, --headless                      Run the browser in headless mode (no UI).
-  -o, --output=PATH                   [default: .] Destination for the CSV file. Specify folder path for auto-generated
-                                      filename or file path for direct use.
-  -p, --concurrency=NUMBER            [default: 10] The number of papers to process in parallel.
-  -q, --question=STRING               The question to ask the language model about the text content. (requires LLM, sets
-                                      concurrency to 1)
-  -s, --skip-captcha                  Skip captcha on paper URLs. Note: Google Scholar captcha still needs to be solved.
-      --legacy                        Enable legacy processing which extracts text only from the main URL. The new
-                                      method attempts to extract text from the source URLs (pdf or html) and falls back
-                                      to the main URL.
-      --llm=openai|ollama             [default: ollama] The LLM provider to use for generating summaries.
+  -S, --summary                           Include summaries in the output CSV (requires LLM, sets concurrency to 1)
+  -a, --accession-number-regex=REGEX      [default: PRJNA\d+] Regex to match accession numbers. Defaults to matching
+                                          BioProject accession numbers.
+  -c, --count=NUMBER                      [default: 10] Minimum number of papers to search for. Actual number may be
+                                          slightly higher with concurrency.
+  -h, --headless                          Run the browser in headless mode (no UI).
+  -o, --output=PATH                       [default: .] Destination for the CSV file. Specify folder path for
+                                          auto-generated filename or file path for direct use.
+  -p, --concurrency=NUMBER                [default: 10] The number of papers to process in parallel.
+  -q, --question=STRING                   The question to ask the language model about the text content. (requires LLM,
+                                          sets concurrency to 1)
+  -s, --skip-captcha                      Skip captcha on paper URLs. Note: Google Scholar captcha still needs to be
+                                          solved.
+      --legacy                            Enable legacy processing which extracts text only from the main URL. The new
+                                          method attempts to extract text from the source URLs (pdf or html) and falls
+                                          back to the main URL.
+      --llm=openai|ollama                 [default: ollama] The LLM provider to use for generating summaries.
+      --summary-method=refine|map_reduce  [default: map_reduce] Selects the method used to generate summaries.
 
 GLOBAL FLAGS
   -l, --log-level=TRACE|DEBUG|INFO|WARN|ERROR|FATAL  [default: INFO] Specify logging level.
@@ -54,9 +56,13 @@ FLAG DESCRIPTIONS
     The question to ask the language model about the text content. (requires LLM, sets concurrency to 1)
 
     Questions are answered using LLM. Ensure LLMs are configured by running `darwin config set`.
+
+  --summary-method=refine|map_reduce  Selects the method used to generate summaries.
+
+    Refer to the FAQ for details on each method.
 ```
 
-_See code: [src/commands/search/accession.ts](https://github.com/rpidanny/darwin/blob/v1.32.1/src/commands/search/accession.ts)_
+_See code: [src/commands/search/accession.ts](https://github.com/rpidanny/darwin/blob/v1.33.0/src/commands/search/accession.ts)_
 
 ## `darwin search papers KEYWORDS`
 
@@ -65,27 +71,30 @@ Searches and exports research papers based on keywords to a CSV file.
 ```
 USAGE
   $ darwin search papers KEYWORDS [-l TRACE|DEBUG|INFO|WARN|ERROR|FATAL] [-c NUMBER] [-p NUMBER] [-o PATH] [-f
-    REGEX] [-s] [--legacy] [-h] [-S] [--llm openai|ollama] [-q STRING]
+    REGEX] [-s] [--legacy] [-h] [-S] [--summary-method refine|map_reduce] [--llm openai|ollama] [-q STRING]
 
 ARGUMENTS
   KEYWORDS  The keywords to search for. (Example: "crispr cas9")
 
 FLAGS
-  -S, --summary             Include summaries in the output CSV (requires LLM, sets concurrency to 1)
-  -c, --count=NUMBER        [default: 10] Minimum number of papers to search for. Actual number may be slightly higher
-                            with concurrency.
-  -f, --filter=REGEX        Case-insensitive regex to filter papers by content. (Example:
-                            "Colidextribacter|Caproiciproducens")
-  -h, --headless            Run the browser in headless mode (no UI).
-  -o, --output=PATH         [default: .] Destination for the CSV file. Specify folder path for auto-generated filename
-                            or file path for direct use.
-  -p, --concurrency=NUMBER  [default: 10] The number of papers to process in parallel.
-  -q, --question=STRING     The question to ask the language model about the text content. (requires LLM, sets
-                            concurrency to 1)
-  -s, --skip-captcha        Skip captcha on paper URLs. Note: Google Scholar captcha still needs to be solved.
-      --legacy              Enable legacy processing which extracts text only from the main URL. The new method attempts
-                            to extract text from the source URLs (pdf or html) and falls back to the main URL.
-      --llm=openai|ollama   [default: ollama] The LLM provider to use for generating summaries.
+  -S, --summary                           Include summaries in the output CSV (requires LLM, sets concurrency to 1)
+  -c, --count=NUMBER                      [default: 10] Minimum number of papers to search for. Actual number may be
+                                          slightly higher with concurrency.
+  -f, --filter=REGEX                      Case-insensitive regex to filter papers by content. (Example:
+                                          "Colidextribacter|Caproiciproducens")
+  -h, --headless                          Run the browser in headless mode (no UI).
+  -o, --output=PATH                       [default: .] Destination for the CSV file. Specify folder path for
+                                          auto-generated filename or file path for direct use.
+  -p, --concurrency=NUMBER                [default: 10] The number of papers to process in parallel.
+  -q, --question=STRING                   The question to ask the language model about the text content. (requires LLM,
+                                          sets concurrency to 1)
+  -s, --skip-captcha                      Skip captcha on paper URLs. Note: Google Scholar captcha still needs to be
+                                          solved.
+      --legacy                            Enable legacy processing which extracts text only from the main URL. The new
+                                          method attempts to extract text from the source URLs (pdf or html) and falls
+                                          back to the main URL.
+      --llm=openai|ollama                 [default: ollama] The LLM provider to use for generating summaries.
+      --summary-method=refine|map_reduce  [default: map_reduce] Selects the method used to generate summaries.
 
 GLOBAL FLAGS
   -l, --log-level=TRACE|DEBUG|INFO|WARN|ERROR|FATAL  [default: INFO] Specify logging level.
@@ -107,6 +116,10 @@ FLAG DESCRIPTIONS
     The question to ask the language model about the text content. (requires LLM, sets concurrency to 1)
 
     Questions are answered using LLM. Ensure LLMs are configured by running `darwin config set`.
+
+  --summary-method=refine|map_reduce  Selects the method used to generate summaries.
+
+    Refer to the FAQ for details on each method.
 ```
 
-_See code: [src/commands/search/papers.ts](https://github.com/rpidanny/darwin/blob/v1.32.1/src/commands/search/papers.ts)_
+_See code: [src/commands/search/papers.ts](https://github.com/rpidanny/darwin/blob/v1.33.0/src/commands/search/papers.ts)_
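Both `darwin search accession` and `darwin search papers` now accept `--summary-method`, restricted to `refine` or `map_reduce` and defaulting to `map_reduce`. The generated help text looks like oclif's, and that framework can enforce a fixed set of options itself; that is an inference from the docs, not something this diff states. The standalone sketch below only illustrates the documented contract. `SUMMARY_METHODS`, `isSummaryMethod`, and `parseSummaryMethod` are hypothetical names, not darwin's code.

```typescript
// Hypothetical sketch of the documented --summary-method contract; not darwin's code.
const SUMMARY_METHODS = ['refine', 'map_reduce'] as const;
type SummaryMethod = (typeof SUMMARY_METHODS)[number];
const DEFAULT_SUMMARY_METHOD: SummaryMethod = 'map_reduce';

// Type guard: true only for the two documented values.
function isSummaryMethod(value: string): value is SummaryMethod {
  return (SUMMARY_METHODS as readonly string[]).indexOf(value) !== -1;
}

// Return the requested method, or the documented default when the flag is omitted.
function parseSummaryMethod(value: string | undefined): SummaryMethod {
  if (value === undefined) return DEFAULT_SUMMARY_METHOD;
  if (!isSummaryMethod(value)) {
    throw new Error(
      `--summary-method must be one of ${SUMMARY_METHODS.join('|')}, got "${value}"`,
    );
  }
  return value;
}
```

With this sketch, `parseSummaryMethod(undefined)` yields `'map_reduce'`, `parseSummaryMethod('refine')` yields `'refine'`, and any other value is rejected with an error listing the accepted choices.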
diff --git a/docs/update.md b/docs/update.md
@@ -23,4 +23,4 @@ EXAMPLES
   $ darwin update
 ```
 
-_See code: [src/commands/update/index.ts](https://github.com/rpidanny/darwin/blob/v1.32.1/src/commands/update/index.ts)_
+_See code: [src/commands/update/index.ts](https://github.com/rpidanny/darwin/blob/v1.33.0/src/commands/update/index.ts)_
diff --git a/package-lock.json b/package-lock.json
diff --git a/package.json b/package.json
@@ -1,7 +1,7 @@
 {
   "name": "@rpidanny/darwin",
   "description": "An elegant CLI wizard enhancing biotech research efficiency, with adaptable features for other domains, albeit with minor constraints.",
-  "version": "1.32.1",
+  "version": "1.33.0",
   "author": "Abhishek <[email protected]>",
   "bin": {
     "darwin": "./bin/run.js"