diff --git a/CHANGELOG.md b/CHANGELOG.md
index 3e66df7..17016f9 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,3 +1,9 @@
+## [1.33.0](https://github.com/rpidanny/darwin/compare/v1.32.1...v1.33.0) (2024-07-05)
+
+### Features
+
+* add map-reduce summary ([#80](https://github.com/rpidanny/darwin/issues/80)) ([21e1209](https://github.com/rpidanny/darwin/commit/21e1209b71c295ad4b26c5622d3586f05430141c))
+
 ## [1.32.1](https://github.com/rpidanny/darwin/compare/v1.32.0...v1.32.1) (2024-07-05)
 
 ### Bug Fixes
diff --git a/README.md b/README.md
index f9324c9..0641c88 100644
--- a/README.md
+++ b/README.md
@@ -31,7 +31,7 @@ $ npm install -g @rpidanny/darwin
 $ darwin COMMAND
 running command...
 $ darwin (--version)
-@rpidanny/darwin/1.32.1 linux-x64 node-v20.15.0
+@rpidanny/darwin/1.33.0 linux-x64 node-v20.15.0
 $ darwin --help [COMMAND]
 USAGE
   $ darwin COMMAND
diff --git a/docs/chat.md b/docs/chat.md
index 732b4be..385cb42 100644
--- a/docs/chat.md
+++ b/docs/chat.md
@@ -27,4 +27,4 @@ EXAMPLES
   $ darwin chat
 ```
 
-_See code: [src/commands/chat/index.ts](https://github.com/rpidanny/darwin/blob/v1.32.1/src/commands/chat/index.ts)_
+_See code: [src/commands/chat/index.ts](https://github.com/rpidanny/darwin/blob/v1.33.0/src/commands/chat/index.ts)_
diff --git a/docs/config.md b/docs/config.md
index 0550871..fd55cad 100644
--- a/docs/config.md
+++ b/docs/config.md
@@ -24,7 +24,7 @@ EXAMPLES
   $ darwin config get
 ```
 
-_See code: [src/commands/config/get.ts](https://github.com/rpidanny/darwin/blob/v1.32.1/src/commands/config/get.ts)_
+_See code: [src/commands/config/get.ts](https://github.com/rpidanny/darwin/blob/v1.33.0/src/commands/config/get.ts)_
 
 ## `darwin config set`
 
@@ -44,4 +44,4 @@ EXAMPLES
   $ darwin config set
 ```
 
-_See code: [src/commands/config/set.ts](https://github.com/rpidanny/darwin/blob/v1.32.1/src/commands/config/set.ts)_
+_See code: [src/commands/config/set.ts](https://github.com/rpidanny/darwin/blob/v1.33.0/src/commands/config/set.ts)_
diff --git a/docs/download.md b/docs/download.md
index d1b1fbe..fd59ec0 100644
--- a/docs/download.md
+++ b/docs/download.md
@@ -31,4 +31,4 @@ EXAMPLES
   $ darwin download papers "crispr cas9" --output papers/ --count 100 --log-level debug
 ```
 
-_See code: [src/commands/download/papers.ts](https://github.com/rpidanny/darwin/blob/v1.32.1/src/commands/download/papers.ts)_
+_See code: [src/commands/download/papers.ts](https://github.com/rpidanny/darwin/blob/v1.33.0/src/commands/download/papers.ts)_
diff --git a/docs/faq.md b/docs/faq.md
index 2226eab..c653d4a 100644
--- a/docs/faq.md
+++ b/docs/faq.md
@@ -9,6 +9,9 @@
 - [How do I configure Darwin for paper summarization?](#how-do-i-configure-darwin-for-paper-summarization)
   - [Using OpenAI's API](#using-openais-api)
   - [Using a Local LLM](#using-a-local-llm)
+- [What are the different methods of summarization?](#what-are-the-different-methods-of-summarization)
+  - [Map Reduce Method](#map-reduce-method)
+  - [Refine Method](#refine-method)
diff --git a/docs/search.md b/docs/search.md
index 00f1a2f..d83cd4b 100644
--- a/docs/search.md
+++ b/docs/search.md
@@ -13,28 +13,30 @@ Search and export papers containing accession numbers to a CSV file.
 ```
 USAGE
   $ darwin search accession KEYWORDS [-l TRACE|DEBUG|INFO|WARN|ERROR|FATAL] [-c NUMBER] [-p NUMBER] [-o PATH] [-a
-    REGEX] [-s] [--legacy] [-h] [-S] [--llm openai|ollama] [-q STRING]
+    REGEX] [-s] [--legacy] [-h] [-S] [--summary-method refine|map_reduce] [--llm openai|ollama] [-q STRING]
 
 ARGUMENTS
   KEYWORDS  The keywords to search for. (Example: "crispr cas9")
 
 FLAGS
-  -S, --summary                       Include summaries in the output CSV (requires LLM, sets concurrency to 1)
-  -a, --accession-number-regex=REGEX  [default: PRJNA\d+] Regex to match accession numbers. Defaults to matching
-                                      BioProject accession numbers.
-  -c, --count=NUMBER                  [default: 10] Minimum number of papers to search for. Actual number may be
-                                      slightly higher with concurrency.
-  -h, --headless                      Run the browser in headless mode (no UI).
-  -o, --output=PATH                   [default: .] Destination for the CSV file. Specify folder path for auto-generated
-                                      filename or file path for direct use.
-  -p, --concurrency=NUMBER            [default: 10] The number papers to process in parallel.
-  -q, --question=STRING               The question to ask the language model about the text content. (requires LLM, sets
-                                      concurrency to 1)
-  -s, --skip-captcha                  Skip captcha on paper URLs. Note: Google Scholar captcha still needs to be solved.
-      --legacy                        Enable legacy processing which extracts text only from the main URL. The new
-                                      method attempts to extract text from the source URLs (pdf or html) and falls back
-                                      to the main URL.
-      --llm=openai|ollama             [default: ollama] The LLM provider to use for generating summaries.
+  -S, --summary                           Include summaries in the output CSV (requires LLM, sets concurrency to 1)
+  -a, --accession-number-regex=REGEX      [default: PRJNA\d+] Regex to match accession numbers. Defaults to matching
+                                          BioProject accession numbers.
+  -c, --count=NUMBER                      [default: 10] Minimum number of papers to search for. Actual number may be
+                                          slightly higher with concurrency.
+  -h, --headless                          Run the browser in headless mode (no UI).
+  -o, --output=PATH                       [default: .] Destination for the CSV file. Specify folder path for
+                                          auto-generated filename or file path for direct use.
+  -p, --concurrency=NUMBER                [default: 10] The number papers to process in parallel.
+  -q, --question=STRING                   The question to ask the language model about the text content. (requires LLM,
+                                          sets concurrency to 1)
+  -s, --skip-captcha                      Skip captcha on paper URLs. Note: Google Scholar captcha still needs to be
+                                          solved.
+      --legacy                            Enable legacy processing which extracts text only from the main URL. The new
+                                          method attempts to extract text from the source URLs (pdf or html) and falls
+                                          back to the main URL.
+      --llm=openai|ollama                 [default: ollama] The LLM provider to use for generating summaries.
+      --summary-method=refine|map_reduce  [default: map_reduce] Selects the method used to generate summaries.
 
 GLOBAL FLAGS
   -l, --log-level=TRACE|DEBUG|INFO|WARN|ERROR|FATAL  [default: INFO] Specify logging level.
@@ -54,9 +56,13 @@ FLAG DESCRIPTIONS
     The question to ask the language model about the text content. (requires LLM, sets concurrency to 1)
 
     Questions are answered using LLM. Ensure LLMs are configured by running `darwin config set`.
+
+  --summary-method=refine|map_reduce  Selects the method used to generate summaries.
+
+    Refer to the FAQ for details on each method.
 ```
 
-_See code: [src/commands/search/accession.ts](https://github.com/rpidanny/darwin/blob/v1.32.1/src/commands/search/accession.ts)_
+_See code: [src/commands/search/accession.ts](https://github.com/rpidanny/darwin/blob/v1.33.0/src/commands/search/accession.ts)_
 
 ## `darwin search papers KEYWORDS`
 
@@ -65,27 +71,30 @@ Searches and exports research papers based on keywords to a CSV file.
 ```
 USAGE
   $ darwin search papers KEYWORDS [-l TRACE|DEBUG|INFO|WARN|ERROR|FATAL] [-c NUMBER] [-p NUMBER] [-o PATH] [-f
-    REGEX] [-s] [--legacy] [-h] [-S] [--llm openai|ollama] [-q STRING]
+    REGEX] [-s] [--legacy] [-h] [-S] [--summary-method refine|map_reduce] [--llm openai|ollama] [-q STRING]
 
 ARGUMENTS
   KEYWORDS  The keywords to search for. (Example: "crispr cas9")
 
 FLAGS
-  -S, --summary             Include summaries in the output CSV (requires LLM, sets concurrency to 1)
-  -c, --count=NUMBER        [default: 10] Minimum number of papers to search for. Actual number may be slightly higher
-                            with concurrency.
-  -f, --filter=REGEX        Case-insensitive regex to filter papers by content. (Example:
-                            "Colidextribacter|Caproiciproducens")
-  -h, --headless            Run the browser in headless mode (no UI).
-  -o, --output=PATH         [default: .] Destination for the CSV file. Specify folder path for auto-generated filename
-                            or file path for direct use.
-  -p, --concurrency=NUMBER  [default: 10] The number papers to process in parallel.
-  -q, --question=STRING     The question to ask the language model about the text content. (requires LLM, sets
-                            concurrency to 1)
-  -s, --skip-captcha        Skip captcha on paper URLs. Note: Google Scholar captcha still needs to be solved.
-      --legacy              Enable legacy processing which extracts text only from the main URL. The new method attempts
-                            to extract text from the source URLs (pdf or html) and falls back to the main URL.
-      --llm=openai|ollama   [default: ollama] The LLM provider to use for generating summaries.
+  -S, --summary                           Include summaries in the output CSV (requires LLM, sets concurrency to 1)
+  -c, --count=NUMBER                      [default: 10] Minimum number of papers to search for. Actual number may be
+                                          slightly higher with concurrency.
+  -f, --filter=REGEX                      Case-insensitive regex to filter papers by content. (Example:
+                                          "Colidextribacter|Caproiciproducens")
+  -h, --headless                          Run the browser in headless mode (no UI).
+  -o, --output=PATH                       [default: .] Destination for the CSV file. Specify folder path for
+                                          auto-generated filename or file path for direct use.
+  -p, --concurrency=NUMBER                [default: 10] The number papers to process in parallel.
+  -q, --question=STRING                   The question to ask the language model about the text content. (requires LLM,
+                                          sets concurrency to 1)
+  -s, --skip-captcha                      Skip captcha on paper URLs. Note: Google Scholar captcha still needs to be
+                                          solved.
+      --legacy                            Enable legacy processing which extracts text only from the main URL. The new
+                                          method attempts to extract text from the source URLs (pdf or html) and falls
+                                          back to the main URL.
+      --llm=openai|ollama                 [default: ollama] The LLM provider to use for generating summaries.
+      --summary-method=refine|map_reduce  [default: map_reduce] Selects the method used to generate summaries.
 
 GLOBAL FLAGS
   -l, --log-level=TRACE|DEBUG|INFO|WARN|ERROR|FATAL  [default: INFO] Specify logging level.
@@ -107,6 +116,10 @@ FLAG DESCRIPTIONS
     The question to ask the language model about the text content. (requires LLM, sets concurrency to 1)
 
     Questions are answered using LLM. Ensure LLMs are configured by running `darwin config set`.
+
+  --summary-method=refine|map_reduce  Selects the method used to generate summaries.
+
+    Refer to the FAQ for details on each method.
 ```
 
-_See code: [src/commands/search/papers.ts](https://github.com/rpidanny/darwin/blob/v1.32.1/src/commands/search/papers.ts)_
+_See code: [src/commands/search/papers.ts](https://github.com/rpidanny/darwin/blob/v1.33.0/src/commands/search/papers.ts)_
diff --git a/docs/update.md b/docs/update.md
index 5869f2c..57cdbfe 100644
--- a/docs/update.md
+++ b/docs/update.md
@@ -23,4 +23,4 @@ EXAMPLES
   $ darwin update
 ```
 
-_See code: [src/commands/update/index.ts](https://github.com/rpidanny/darwin/blob/v1.32.1/src/commands/update/index.ts)_
+_See code: [src/commands/update/index.ts](https://github.com/rpidanny/darwin/blob/v1.33.0/src/commands/update/index.ts)_
diff --git a/package-lock.json b/package-lock.json
index 6e5cf3d..7026bce 100644
--- a/package-lock.json
+++ b/package-lock.json
@@ -1,12 +1,12 @@
 {
   "name": "@rpidanny/darwin",
-  "version": "1.32.1",
+  "version": "1.33.0",
   "lockfileVersion": 3,
   "requires": true,
   "packages": {
     "": {
       "name": "@rpidanny/darwin",
-      "version": "1.32.1",
+      "version": "1.33.0",
       "license": "MIT",
       "dependencies": {
         "@json2csv/node": "^7.0.6",
diff --git a/package.json b/package.json
index a4452e8..95855e9 100644
--- a/package.json
+++ b/package.json
@@ -1,7 +1,7 @@
 {
   "name": "@rpidanny/darwin",
   "description": "An elegant CLI wizard enhancing biotech research efficiency, with adaptable features for other domains, albeit with minor constraints.",
-  "version": "1.32.1",
+  "version": "1.33.0",
   "author": "Abhishek ",
   "bin": {
     "darwin": "./bin/run.js"