Scrape reddit comments #10
base: main
Thanks for trying the branch out! The error is noted. I am working on it.
In the meantime, I am planning to revise the prompts and add some instructions along the lines of "prioritise comments with high votes from authors with high karma count". I wanted to know the rationale behind the `stuff` / `chromadb` branching in `summarize`:

```python
def summarize(self):
    if self.loaded_documents is not None:
        num_tokens = self._get_number_of_tokens()
        # Pick the summarization path from the size of the loaded input
        if num_tokens <= 4097:
            method = "stuff"
        else:
            method = "chromadb"
        with self.console.status(
            "Generating a summary of the loaded tweets ... ⌛ \n",
            spinner="aesthetic",
            speed=1.5,
            spinner_style="red",
        ):
            if method == "stuff":
                # Small input: summarize all loaded documents directly
                summary = summarize_tweets(self.loaded_documents)
            elif method == "chromadb":
                # Large input: ask the retrieval chain for a summary instead
                response = self.chain(
                    {
                        "question": summarization_question_template,
                    },
                )
                summary = response["answer"]
```
Good question. In this context, "chroma" and "stuff" indicate summarization methods, not storage methods. I'm using two summarization methods depending on the number of tokens (which obviously depends on the number of ingested posts/tweets). If there are few enough tweets/posts that the token count does not exceed 4097 (a limitation of OpenAI), we use a classic summarization method (the "stuff" path in the code above). Otherwise, we use Chromadb as a proxy to summarize the data. We could maybe rename the "chroma" and "stuff" variables to something more explicit.
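To make the two paths explicit, here is a minimal sketch of the branching logic using an enum instead of bare strings. It borrows the `PromptMethod` name and the `stuff` / `retrievalqa` member names that appear later in this PR; the enum values, the `choose_method` helper, and passing in a precomputed token count are assumptions for illustration, not the project's code.

```python
from enum import Enum

# Token limit quoted in the thread; the inclusive comparison mirrors `summarize`.
OPENAI_TOKEN_LIMIT = 4097


class PromptMethod(Enum):
    # Member names follow the PR; the string values are assumptions.
    stuff = "stuff"  # whole input fits in a single prompt
    retrievalqa = "retrievalqa"  # index in Chroma, summarize via retrieval QA


def choose_method(num_tokens: int, limit: int = OPENAI_TOKEN_LIMIT) -> PromptMethod:
    """Hypothetical helper: pick the summarization path from a token count."""
    if num_tokens < 0:
        raise ValueError("num_tokens must be non-negative")
    if num_tokens <= limit:
        return PromptMethod.stuff
    return PromptMethod.retrievalqa


print(choose_method(1200))   # PromptMethod.stuff
print(choose_method(12000))  # PromptMethod.retrievalqa
```

Keeping the threshold as a parameter makes it easy to adjust if a model with a larger context window is used.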
Do you need help on this? @syltruong
Hey @ahmedbesbes, sorry for being MIA on this. I still have some changes to push before marking this PR as ready for review. Should be done by EOD SGT :)
No problem, take your time :) and thank you for your help
@ahmedbesbes this is ready for review
```
{text}

I want you to provide a short summary and produce three questions that cover the discussed topics.
```
How does the LLM know whether the post has a high number of upvotes or whether the redditor has high karma?
I think it's interesting to include this information, but I'm not sure the LLM uses it (only the text is used here).
Unless I'm missing something?
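Since the prompt only receives the text substituted into `{text}`, one way to let the LLM weigh votes and karma is to write those signals into the text itself. A minimal sketch, assuming a hypothetical `RedditComment` record with `score` and `author_karma` fields (not the PR's actual schema):

```python
from dataclasses import dataclass


@dataclass
class RedditComment:
    """Hypothetical container; field names are illustrative only."""

    body: str
    score: int  # net upvotes on the comment
    author_karma: int  # total karma of the comment's author


def format_comments_for_prompt(comments: list[RedditComment]) -> str:
    """Render comments as prompt text with vote and karma signals spelled out.

    Comments are ordered by score, then author karma, highest first, so the
    most upvoted comments appear at the top of the prompt.
    """
    ranked = sorted(comments, key=lambda c: (c.score, c.author_karma), reverse=True)
    lines = [
        f"[upvotes={c.score}, author_karma={c.author_karma}] {c.body}"
        for c in ranked
    ]
    return "\n".join(lines)


comments = [
    RedditComment("First comment.", score=3, author_karma=150),
    RedditComment("Second comment.", score=42, author_karma=9800),
]
print(format_comments_for_prompt(comments))
```

With the metadata in the text, an instruction such as "prioritise comments with high votes from authors with high karma count" has something concrete to refer to.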
```python
elif method == PromptMethod.retrievalqa:
    ret = """\
Given the following documents, I want you to provide a short summary and produce three questions that cover the discussed topics.
```
Same here, let's replace `ret` with `template`.
```python
def get_summarization_template(self, method: PromptMethod) -> str:
    if method == PromptMethod.stuff:
        ret = """\
```
For the sake of being explicit, could you replace `ret` with `template`?
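For reference, a minimal sketch of the template getter after the rename, written as a free function so it runs standalone (in the PR it is a method). The `PromptMethod` member names come from the diff; the enum values, the `ValueError` fallback, and the exact template layout are assumptions, and the prompt wording is abbreviated from the excerpts above.

```python
from enum import Enum


class PromptMethod(Enum):
    # Member names follow the diff; the string values are assumptions.
    stuff = "stuff"
    retrievalqa = "retrievalqa"


def get_summarization_template(method: PromptMethod) -> str:
    """Return the summarization prompt for the given method."""
    if method == PromptMethod.stuff:
        # The loaded comments are substituted into {text} before sending.
        template = """\
{text}

I want you to provide a short summary and produce three questions that cover the discussed topics.
"""
    elif method == PromptMethod.retrievalqa:
        template = """\
Given the following documents, I want you to provide a short summary and produce three questions that cover the discussed topics.
"""
    else:
        raise ValueError(f"Unsupported prompt method: {method!r}")
    return template


print(get_summarization_template(PromptMethod.stuff).format(text="<comments here>"))
```

Naming the variable `template` and failing loudly on an unknown method keeps the function from silently returning an undefined value.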
Thank you @syltruong! Just a few comments above. I tried the app and I have some remarks:
- `twitter_agent` module
- `Document` metadata and establish a comment graph (left as an Issue)
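On the comment-graph remark: if each comment's metadata records its own id and its parent's id, the reply structure can be rebuilt as an adjacency list. A minimal sketch, assuming hypothetical `id` / `parent_id` keys in plain dicts (for example, the `metadata` dict of a LangChain-style `Document`) and a top-level parent id of `"post"`:

```python
from collections import defaultdict
from collections.abc import Iterator


def build_comment_graph(comments: list[dict]) -> dict[str, list[str]]:
    """Map each parent id to the ids of its direct replies, in input order."""
    children: dict[str, list[str]] = defaultdict(list)
    for meta in comments:
        children[meta["parent_id"]].append(meta["id"])
    return dict(children)


def walk(graph: dict[str, list[str]], node: str, depth: int = 0) -> Iterator[tuple[int, str]]:
    """Yield (depth, id) pairs for every descendant of `node`, depth-first."""
    for child in graph.get(node, []):
        yield depth, child
        yield from walk(graph, child, depth + 1)


metadata = [
    {"id": "c1", "parent_id": "post"},
    {"id": "c2", "parent_id": "c1"},
    {"id": "c3", "parent_id": "post"},
]
graph = build_comment_graph(metadata)
for depth, cid in walk(graph, "post"):
    print("  " * depth + cid)
```

Top-level comments hang off the post id, and the depth-first walk prints the thread with each reply indented under its parent.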