feat: add question flag to ask questions about papers (#76)
* feat: add question flag to ask questions about papers

* test: add llm service basic test

* feat: add custom map-reduce prompts

* feat: update map-reduce prompts

* feat: include question in the filename
rpidanny authored Jul 4, 2024
1 parent 9e2607b commit 73afa49
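As a usage sketch (hypothetical invocations: the darwin binary name, keywords, and question strings are assumptions for illustration, not part of this commit), the new flag combines with the existing search commands like so:

$ darwin search papers "mycoremediation" --count 5 --question "Which fungal species were studied?"
$ darwin search accession "tuberculosis NGS" -q "What sequencing platform was used?"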
Showing 14 changed files with 323 additions and 33 deletions.
3 changes: 2 additions & 1 deletion .gitignore
@@ -16,4 +16,5 @@ coverage
*.html
*.csv

data/
+darwin-data/
10 changes: 6 additions & 4 deletions jest.config.ts
@@ -43,13 +43,15 @@ const config: JestConfigWithTsJest = {
'<rootDir>/coverage/',
'<rootDir>/src/commands/',
'\\.config\\.ts$',
'<rootDir>/src/services/chat/autonomous-agent.ts',
'<rootDir>/src/utils/ui/output.ts',
],
coverageThreshold: {
global: {
-      statements: 81,
-      branches: 90,
-      functions: 90,
-      lines: 81,
+      statements: 96,
+      branches: 91,
+      functions: 94,
+      lines: 96,
},
},
}
5 changes: 4 additions & 1 deletion src/commands/search/accession.ts
@@ -12,6 +12,7 @@ import headlessFlag from '../../inputs/flags/headless.flag.js'
import legacyFlag from '../../inputs/flags/legacy.flag.js'
import llmProviderFlag from '../../inputs/flags/llm-provider.flag.js'
import outputFlag from '../../inputs/flags/output.flag.js'
import questionFlag from '../../inputs/flags/question.flag.js'
import skipCaptchaFlag from '../../inputs/flags/skip-captcha.flag.js'
import summaryFlag from '../../inputs/flags/summary.flag.js'
import { PaperSearchService } from '../../services/search/paper-search.service.js'
@@ -47,6 +48,7 @@ export default class SearchAccession extends BaseCommand<typeof SearchAccession>
headless: headlessFlag,
summary: summaryFlag,
llm: llmProviderFlag,
question: questionFlag,
}

async init(): Promise<void> {
@@ -86,7 +88,7 @@ export default class SearchAccession extends BaseCommand<typeof SearchAccession>
}

public async run(): Promise<void> {
-    const { count, output, 'accession-number-regex': filterPattern, summary } = this.flags
+    const { count, output, 'accession-number-regex': filterPattern, summary, question } = this.flags
const { keywords } = this.args

this.logger.info(`Searching papers with Accession Numbers (${filterPattern}) for: ${keywords}`)
@@ -96,6 +98,7 @@
minItemCount: count,
filterPattern,
summarize: summary,
question,
})

this.logger.info(`Exported papers list to: ${outputPath}`)
7 changes: 6 additions & 1 deletion src/commands/search/papers.ts
@@ -11,6 +11,7 @@ import headlessFlag from '../../inputs/flags/headless.flag.js'
import legacyFlag from '../../inputs/flags/legacy.flag.js'
import llmProviderFlag from '../../inputs/flags/llm-provider.flag.js'
import outputFlag from '../../inputs/flags/output.flag.js'
import questionFlag from '../../inputs/flags/question.flag.js'
import skipCaptchaFlag from '../../inputs/flags/skip-captcha.flag.js'
import summaryFlag from '../../inputs/flags/summary.flag.js'
import { PaperSearchService } from '../../services/search/paper-search.service.js'
@@ -41,6 +42,7 @@ export default class SearchPapers extends BaseCommand<typeof SearchPapers> {
headless: headlessFlag,
summary: summaryFlag,
llm: llmProviderFlag,
question: questionFlag,
}

async init(): Promise<void> {
@@ -50,6 +52,7 @@
headless,
concurrency,
summary,
question,
llm: llmProvider,
'skip-captcha': skipCaptcha,
legacy,
@@ -60,6 +63,7 @@
headless,
concurrency,
summary,
question,
llmProvider,
skipCaptcha,
legacy,
@@ -80,7 +84,7 @@
}

public async run(): Promise<void> {
-    const { count, output, filter, summary } = this.flags
+    const { count, output, filter, summary, question } = this.flags
const { keywords } = this.args

this.logger.info(`Searching papers for: ${keywords}`)
Expand All @@ -90,6 +94,7 @@ export default class SearchPapers extends BaseCommand<typeof SearchPapers> {
minItemCount: count,
filterPattern: filter,
summarize: summary,
question,
})

this.logger.info(`Exported papers list to: ${outputFile}`)
5 changes: 3 additions & 2 deletions src/containers/search.container.ts
@@ -15,21 +15,22 @@ export function initSearchContainer(
headless: boolean
concurrency: number
summary: boolean
question?: string
llmProvider: LLMProvider
skipCaptcha: boolean
legacy: boolean
},
config: TConfig,
logger: Quill,
) {
-  const { headless, concurrency, summary, llmProvider, skipCaptcha, legacy } = opts
+  const { headless, concurrency, summary, llmProvider, skipCaptcha, legacy, question } = opts

Container.set(
Odysseus,
new Odysseus({ headless, waitOnCaptcha: true, initHtml: getInitPageContent() }),
)
Container.set(Quill, logger)
-  Container.set(PaperSearchConfig, { concurrency: summary ? 1 : concurrency })
+  Container.set(PaperSearchConfig, { concurrency: summary || question != null ? 1 : concurrency })
Container.set(PaperServiceConfig, {
skipCaptcha,
legacyProcessing: legacy,
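A note on the design choice above: whenever an LLM pass runs per paper (a summary or a question), the container pins search concurrency to 1, presumably to keep chain executions sequential against the LLM provider. A minimal TypeScript sketch of the rule (values are illustrative):

// Illustration only: mirrors the ternary in search.container.ts above
const concurrency = 8
const summary = false
const question: string | undefined = 'What organism was studied?'

// `question != null` is true for any provided string, so setting either
// summary or question collapses the effective concurrency to 1
const effectiveConcurrency = summary || question != null ? 1 : concurrency
console.log(effectiveConcurrency) // 1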
1 change: 1 addition & 0 deletions src/inputs/flags/char.ts
@@ -8,4 +8,5 @@ export enum FlagChar {
Headless = 'h',
IncludeSummary = 'S',
LogLevel = 'l',
Question = 'q',
}
9 changes: 9 additions & 0 deletions src/inputs/flags/question.flag.ts
@@ -0,0 +1,9 @@
import * as oclif from '@oclif/core'

import { FlagChar } from './char.js'

export default oclif.Flags.string({
char: FlagChar.Question,
helpValue: 'STRING',
summary: 'The question to ask the language model about the text content.',
})
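For context, a minimal sketch of how an oclif string flag like this surfaces at parse time (the Demo command below is illustrative, not part of this commit):

import * as oclif from '@oclif/core'

import questionFlag from './question.flag.js'

class Demo extends oclif.Command {
  static flags = { question: questionFlag }

  async run(): Promise<void> {
    const { flags } = await this.parse(Demo)
    // flags.question is string | undefined; undefined when -q/--question is omitted
    this.log(flags.question ?? 'no question provided')
  }
}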
71 changes: 71 additions & 0 deletions src/services/llm/llm.service.spec.ts
@@ -0,0 +1,71 @@
import { jest } from '@jest/globals'
import { BaseLanguageModel } from '@langchain/core/language_models/base'
import { mock } from 'jest-mock-extended'

import { LLMService } from './llm.service'

describe('LLMService', () => {
const mockBaseLanguageModel = mock<BaseLanguageModel>()

let llmService: LLMService

beforeEach(() => {
llmService = new LLMService(
mock<BaseLanguageModel>({
pipe: () => mockBaseLanguageModel,
}),
)
})

afterEach(() => {
jest.clearAllMocks()
jest.resetAllMocks()
})

describe('summarize', () => {
it('should call llm once for short text', async () => {
const inputText = 'input text'
mockBaseLanguageModel.invoke.mockResolvedValue('summary')

await expect(llmService.summarize(inputText)).resolves.toEqual('summary')

expect(mockBaseLanguageModel.invoke).toHaveBeenCalledTimes(1)
})

it('should call llm n times for longer text', async () => {
const inputText = 'input text'.repeat(10_000)
mockBaseLanguageModel.invoke.mockResolvedValue('summary')

await expect(llmService.summarize(inputText)).resolves.toEqual('summary')

// 1 call for each of the 2 chunks plus 1 call for the final summary
expect(mockBaseLanguageModel.invoke).toHaveBeenCalledTimes(3)
})
})

describe('ask', () => {
it('should call llm multiple times even for short text', async () => {
const inputText = 'input text'
const question = 'question'

mockBaseLanguageModel.invoke.mockResolvedValue('answer')
mockBaseLanguageModel.getNumTokens.mockResolvedValue(3)

await expect(llmService.ask(inputText, question)).resolves.toEqual('answer')

expect(mockBaseLanguageModel.invoke).toHaveBeenCalledTimes(11)
})

it('should call llm n times for longer text', async () => {
const inputText = 'input text'.repeat(10_000)
const question = 'question'

mockBaseLanguageModel.invoke.mockResolvedValue('answer')
mockBaseLanguageModel.getNumTokens.mockResolvedValue(3)

await expect(llmService.ask(inputText, question)).resolves.toEqual('answer')

expect(mockBaseLanguageModel.invoke).toHaveBeenCalledTimes(31)
})
})
})
60 changes: 58 additions & 2 deletions src/services/llm/llm.service.ts
@@ -4,6 +4,7 @@ import { Quill } from '@rpidanny/quill'
import chalk from 'chalk'
import { Presets, SingleBar } from 'cli-progress'
import {
loadQAMapReduceChain,
loadSummarizationChain,
MapReduceDocumentsChain,
RefineDocumentsChain,
@@ -12,11 +13,14 @@
import { TokenTextSplitter } from 'langchain/text_splitter'
import { Service } from 'typedi'

import { MAP_PROMPT, REDUCE_PROMPT } from './prompt-templates/map-reduce.template.js'
import { SUMMARY_PROMPT, SUMMARY_REFINE_PROMPT } from './prompt-templates/summary.template.js'

@Service()
export class LLMService {
summarizeChain!: RefineDocumentsChain | MapReduceDocumentsChain | StuffDocumentsChain
qaChain!: RefineDocumentsChain | MapReduceDocumentsChain | StuffDocumentsChain

textSplitter!: TokenTextSplitter

constructor(
@@ -34,12 +38,18 @@ export class LLMService {
questionPrompt: SUMMARY_PROMPT,
refinePrompt: SUMMARY_REFINE_PROMPT,
})

this.qaChain = loadQAMapReduceChain(llm, {
verbose: false,
combineMapPrompt: MAP_PROMPT,
combinePrompt: REDUCE_PROMPT,
})
}

public async summarize(inputText: string) {
const bar = new SingleBar(
{
-      clearOnComplete: false,
+      clearOnComplete: true,
hideCursor: true,
format: `${chalk.magenta('Summarizing')} [{bar}] {percentage}% | ETA: {eta}s | {value}/{total}`,
},
@@ -52,7 +62,7 @@
const docChunks = await this.textSplitter.splitDocuments([document])

this.logger?.info(
-      `Summarizing ${inputText.length} char (${docChunks.length} chunks) document...`,
+      `Summarizing document with ${inputText.length} chars (${docChunks.length} chunks)`,
)

bar.start(docChunks.length, 0)
Expand All @@ -79,4 +89,50 @@ export class LLMService {

return resp.output_text
}

public async ask(inputText: string, question: string): Promise<string> {
const bar = new SingleBar(
{
clearOnComplete: true,
hideCursor: true,
format: `${chalk.magenta('Querying')} [{bar}] {percentage}% | ETA: {eta}s | {value}/{total}`,
},
Presets.shades_classic,
)

const document = new Document({
pageContent: inputText,
})
const docChunks = await this.textSplitter.splitDocuments([document])

this.logger?.info(
`Querying "${question}" on document with ${inputText.length} chars (${docChunks.length} chunks)`,
)

// n map + 1 reduce
bar.start(docChunks.length + 1, 0)

let docCount = 0

const resp = await this.qaChain.invoke(
{
// eslint-disable-next-line camelcase
input_documents: docChunks,
question,
},
{
callbacks: [
{
handleLLMEnd: async () => {
bar.update(++docCount)
},
},
],
},
)

bar.stop()

return resp.text
}
}
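A minimal usage sketch for the new ask method, assuming the service is constructed directly with a LangChain chat model as its first argument (as the spec above suggests); the model choice and input text are illustrative:

import { ChatOpenAI } from '@langchain/openai'

import { LLMService } from './llm.service.js'

const llmService = new LLMService(new ChatOpenAI({ temperature: 0 }))

const paperText = 'Full text of a downloaded paper...'
const answer = await llmService.ask(paperText, 'What sequencing platform was used?')
console.log(answer)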
58 changes: 58 additions & 0 deletions src/services/llm/prompt-templates/map-reduce.template.ts
@@ -0,0 +1,58 @@
import { PromptTemplate } from '@langchain/core/prompts'

export const MAP_TEMPLATE = `
Examine the following document excerpt to identify any text that directly answers the question.
Return the relevant text verbatim. If no relevant text is found, return nothing.
Document:
\`\`\`txt
{context}
\`\`\`
QUESTION: {question}
RELEVANT TEXT:`

export const REDUCE_TEMPLATE = `
Given the extracted text from a document and a question, provide a final answer. If you don't know the answer, state that you don't know. Do not fabricate information.
EXAMPLES:
\`\`\`txt
QUESTION: Which state/country's law governs the interpretation of the contract?
=========
Content: This Agreement is governed by English law and the parties submit to the exclusive jurisdiction of the English courts in relation to any dispute (contractual or non-contractual) concerning this Agreement save that either party may apply to any court for an injunction or other relief to protect its Intellectual Property Rights.
Content: No Waiver. Failure or delay in exercising any right or remedy under this Agreement shall not constitute a waiver of such (or any other) right or remedy.\n\n11.7 Severability. The invalidity, illegality or unenforceability of any term (or part of a term) of this Agreement shall not affect the continuation in force of the remainder of the term (if any) and this Agreement.\n\n11.8 No Agency. Except as expressly stated otherwise, nothing in this Agreement shall create an agency, partnership or joint venture of any kind between the parties.\n\n11.9 No Third-Party Beneficiaries.
Content: (b) if Google believes, in good faith, that the Distributor has violated or caused Google to violate any Anti-Bribery Laws (as defined in Clause 8.5) or that such a violation is reasonably likely to occur,
=========
FINAL ANSWER: This Agreement is governed by English law.
QUESTION: What did the president say about Michael Jackson?
=========
Content: Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow Americans. \n\nLast year COVID-19 kept us apart. This year we are finally together again. \n\nTonight, we meet as Democrats Republicans and Independents. But most importantly as Americans. \n\nWith a duty to one another to the American people to the Constitution. \n\nAnd with an unwavering resolve that freedom will always triumph over tyranny. \n\nSix days ago, Russia’s Vladimir Putin sought to shake the foundations of the free world thinking he could make it bend to his menacing ways. But he badly miscalculated. \n\nHe thought he could roll into Ukraine and the world would roll over. Instead he met a wall of strength he never imagined. \n\nHe met the Ukrainian people. \n\nFrom President Zelenskyy to every Ukrainian, their fearlessness, their courage, their determination, inspires the world. \n\nGroups of citizens blocking tanks with their bodies. Everyone from students to retirees teachers turned soldiers defending their homeland.
Content: And we won’t stop. \n\nWe have lost so much to COVID-19. Time with one another. And worst of all, so much loss of life. \n\nLet’s use this moment to reset. Let’s stop looking at COVID-19 as a partisan dividing line and see it for what it is: A God-awful disease. \n\nLet’s stop seeing each other as enemies, and start seeing each other for who we really are: Fellow Americans. \n\nWe can’t change how divided we’ve been. But we can change how we move forward—on COVID-19 and other issues we must face together. \n\nI recently visited the New York City Police Department days after the funerals of Officer Wilbert Mora and his partner, Officer Jason Rivera. \n\nThey were responding to a 9-1-1 call when a man shot and killed them with a stolen gun. \n\nOfficer Mora was 27 years old. \n\nOfficer Rivera was 22. \n\nBoth Dominican Americans who’d grown up on the same streets they later chose to patrol as police officers. \n\nI spoke with their families and told them that we are forever in debt for their sacrifice, and we will carry on their mission to restore the trust and safety every community deserves.
Content: And a proud Ukrainian people, who have known 30 years of independence, have repeatedly shown that they will not tolerate anyone who tries to take their country backwards. \n\nTo all Americans, I will be honest with you, as I’ve always promised. A Russian dictator, invading a foreign country, has costs around the world. \n\nAnd I’m taking robust action to make sure the pain of our sanctions is targeted at Russia’s economy. And I will use every tool at our disposal to protect American businesses and consumers. \n\nTonight, I can announce that the United States has worked with 30 other countries to release 60 Million barrels of oil from reserves around the world. \n\nAmerica will lead that effort, releasing 30 Million barrels from our own Strategic Petroleum Reserve. And we stand ready to do more if necessary, unified with our allies. \n\nThese steps will help blunt gas prices here at home. And I know the news about what’s happening can seem alarming. \n\nBut I want you to know that we are going to be okay.
Content: More support for patients and families. \n\nTo get there, I call on Congress to fund ARPA-H, the Advanced Research Projects Agency for Health. \n\nIt’s based on DARPA—the Defense Department project that led to the Internet, GPS, and so much more. \n\nARPA-H will have a singular purpose—to drive breakthroughs in cancer, Alzheimer’s, diabetes, and more. \n\nA unity agenda for the nation. \n\nWe can do this. \n\nMy fellow Americans—tonight , we have gathered in a sacred space—the citadel of our democracy. \n\nIn this Capitol, generation after generation, Americans have debated great questions amid great strife, and have done great things. \n\nWe have fought for freedom, expanded liberty, defeated totalitarianism and terror. \n\nAnd built the strongest, freest, and most prosperous nation the world has ever known. \n\nNow is the hour. \n\nOur moment of responsibility. \n\nOur test of resolve and conscience, of history itself. \n\nIt is in this moment that our character is formed. Our purpose is found. Our future is forged. \n\nWell I know this nation.
=========
FINAL ANSWER: The president did not mention Michael Jackson.
\`\`\`
Use the extracted content below to formulate your answer.
QUESTION: {question}
CONTENT:
=========
{summaries}
=========
FINAL ANSWER:`

export const MAP_PROMPT = PromptTemplate.fromTemplate(MAP_TEMPLATE)
export const REDUCE_PROMPT = PromptTemplate.fromTemplate(REDUCE_TEMPLATE)
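As a quick illustration of how the map template resolves (the context and question values below are made up), PromptTemplate.format substitutes the {context} and {question} variables:

import { MAP_PROMPT } from './map-reduce.template.js'

const rendered = await MAP_PROMPT.format({
  context: 'We sequenced 12 rice cultivars on the Illumina NovaSeq platform.',
  question: 'What sequencing platform was used?',
})
// rendered is the full MAP_TEMPLATE text with the excerpt and question inlined
console.log(rendered)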