
Add MetaData LLM call #20

Open · wants to merge 1 commit into main
Conversation

chavan-arvind

Related to #16

Add an optional LLM call for generating tags and summary of the file.

  • app/main.py

    • Add a new endpoint /llm_tags_summary to generate tags and summary using the LLM.
    • Update the OllamaGenerateRequest class to include a new field generate_tags_summary.
    • Update the generate_llama function to handle the new generate_tags_summary field.
  • app/tasks.py

    • Add a new function generate_tags_summary to generate tags and summary using the LLM.
    • Update the ocr_task function to include an optional call to generate_tags_summary after extracting text.
  • client/cli.py

    • Add a new command llm_tags_summary for generating tags and summary.
    • Update the main function to handle the new llm_tags_summary command.
  • .env.example

    • Add a new environment variable LLM_TAGS_SUMMARY_API_URL.
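The changes above could be sketched roughly as follows. `OllamaGenerateRequest` and the `generate_tags_summary` field come from the PR description; the handler body and the stubbed LLM call are assumptions (the real code calls the Ollama API):

```python
from dataclasses import dataclass

# Request model mirroring the PR description; the default value of the
# new flag is an assumption.
@dataclass
class OllamaGenerateRequest:
    model: str
    prompt: str
    generate_tags_summary: bool = False

def generate_llama(request, llm_call):
    # llm_call stands in for the real Ollama API client.
    result = {"generated_text": llm_call(request.model, request.prompt)}
    if request.generate_tags_summary:
        # Optional second pass that produces tags and a summary
        # on top of the first result.
        result["tags_summary"] = llm_call(
            request.model,
            "Generate tags and a short summary for:\n"
            + result["generated_text"],
        )
    return result

# Stub so the sketch runs without an Ollama server.
def fake_llm(model, prompt):
    return f"[{model}] {prompt.splitlines()[0]}"

print(generate_llama(OllamaGenerateRequest("llama3.1", "Hello", True), fake_llm))
```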

@@ -116,3 +117,25 @@ async def generate_llama(request: OllamaGenerateRequest):

generated_text = response.get("response", "")
return {"generated_text": generated_text}

@app.post("/llm_tags_summary")
Contributor
Please name it llm_metadata and use this name instead of tags_summary


return extracted_text

def generate_tags_summary(prompt, model):
Contributor

Please rename it to generate_metadata instead

@@ -59,7 +59,27 @@ def ocr_task(self, pdf_bytes, strategy_name, pdf_hash, ocr_cache, prompt, model)
num_chunk += 1
extracted_text += chunk['response']

self.update_state(state='DONE', meta={'progress': 100 , 'status': 'Processing done!', 'start_time': start_time, 'elapsed_time': time.time() - start_time}) # Example progress update
# Optional call to generate tags and summary
if prompt and model:
Contributor
Please add an option generate_metadata - set to true by default; the metadata is generated only when it is set, no matter whether a prompt was given or not
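A minimal sketch of the suggested flag; everything except the name generate_metadata (the stub, the result shape) is an assumption:

```python
def ocr_result(extracted_text, metadata_fn, generate_metadata=True):
    # generate_metadata defaults to True, as suggested in the review;
    # the metadata pass runs whenever the flag is set, regardless of
    # whether a custom prompt was supplied.
    result = {"text": extracted_text}
    if generate_metadata:
        result["metadata"] = metadata_fn(extracted_text)
    return result

# Stub metadata generator standing in for the LLM call.
def fake_metadata(text):
    return {"tags": "demo", "summary": text[:20]}

print(ocr_result("Some OCR output", fake_metadata))
print(ocr_result("Some OCR output", fake_metadata, generate_metadata=False))
```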

@pkarw
Contributor

pkarw commented Nov 5, 2024

Thanks this is cool!

I requested minor changes. When they are applied, and when #10 is merged, I'll also ask you to extend this feature so the metadata can be used within storage strategies to format file names (using the tags or other fields within file names)

@pkarw
Contributor

pkarw commented Nov 5, 2024

One more thing - please use a defined prompt (it can be an env variable with a nice default) so the prompt used for metadata is configurable

It should return JSON object with metadata:

{
  "title": "",
  "filename_title": "",
  "tags": "",
  "summary": ""
}
@pkarw left a comment (Contributor)
We can't mix the tags and metadata into the general output - they should be used for the naming strategy, saved within the celery task output and in the storage, and it should be possible to get them via the web API

I can work on these extensions once you fix the other changes I suggested in the PR

Sorry for so many changes - this task was simply not yet specified 😅

Endpoint to generate tags and summary using Llama 3.1 model (and other models) via the Ollama API.
"""
print(request)
if not request.prompt:
Contributor

The metadata request is different from the main request with the prompt.

It should be executed after the main LLM call, on top of its results.

It should use the metadata prompt defined in an env variable, for configuration purposes.

It should then always return a JSON object - I proposed its structure in the other comment

# Optional call to generate tags and summary
if prompt and model:
tags_summary = generate_tags_summary(prompt, model)
extracted_text += "\n\nTags and Summary:\n" + tags_summary
Contributor

When the generate_metadata option is defined, the metadata should not be returned within the general output but stored in a separate json file named according to #10, used for file name strategies, and stored within another field in the celery result

We probably need an additional endpoint to get just the metadata stored for a specific celery task
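The separate-field storage plus a metadata-only lookup could be sketched like this; the in-memory dict stands in for the Celery result backend, and the endpoint path and field names are assumptions:

```python
# In-memory stand-in for the Celery result backend.
_results = {}

def store_task_result(task_id, extracted_text, metadata=None):
    # Metadata lives in its own field rather than being appended to the
    # text output, so it stays available for file-naming strategies
    # and storage.
    _results[task_id] = {"text": extracted_text, "metadata": metadata}

def get_task_metadata(task_id):
    # Would back an endpoint such as GET /task/{task_id}/metadata
    # (hypothetical path).
    entry = _results.get(task_id)
    return entry["metadata"] if entry else None

store_task_result("abc123", "OCR text", {"tags": "invoice", "summary": "…"})
print(get_task_metadata("abc123"))
```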
