[Draft, Feedback Needed] Memory in AgentChat #4438
base: main
Conversation
python/packages/autogen-agentchat/src/autogen_agentchat/memory/_base_memory.py
```python
async def query(
    self,
    query: Union[str, Image, List[Union[str, Image]]],
```
`query` could also benefit from MIME types.
```python
if not isinstance(last_message, TextMessage) and not isinstance(last_message, MultiModalMessage):
    raise ValueError(
        "Memory query failed: Last message must be a text message or multimodal message.")
results: List[MemoryQueryResult] = await self._memory.query(messages[-1].content, cancellation_token=cancellation_token)
```
I guess this is saying that the default memory implementation is RAG with the last message.
I feel that using the last message for RAG is reasonable but restrictive; could we just pass the full message list and have the memory query decide?

A high-level comment: I'm not sure whether to call this memory or a datastore/database.
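One way to sketch this suggestion: a memory whose `query` receives the whole message list and decides internally what to retrieve on. All names here (`Memory`, `LastKMessagesMemory`, the keyword-overlap retrieval) are hypothetical stand-ins, not the PR's API:

```python
import asyncio
from typing import List, Protocol

# Messages reduced to plain strings for this sketch; the PR would pass
# TextMessage / MultiModalMessage objects instead.
Message = str


class Memory(Protocol):
    async def query(self, messages: List[Message]) -> List[str]:
        """Receive the whole conversation and decide what to retrieve on."""
        ...


class LastKMessagesMemory:
    """Hypothetical memory that builds its retrieval query from the last k
    messages, instead of being hard-wired to messages[-1]."""

    def __init__(self, store: List[str], k: int = 3) -> None:
        self._store = store
        self._k = k

    async def query(self, messages: List[Message]) -> List[str]:
        # Combine the last k messages into one retrieval query.
        context = " ".join(messages[-self._k:]).lower()
        # Naive keyword overlap stands in for real vector retrieval.
        return [doc for doc in self._store if any(w in doc.lower() for w in context.split())]
```

A query method with this shape leaves the "last message only" behavior as just one possible implementation (`k=1`) rather than a constraint baked into the agent.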
@husseinmozannar, I agree about passing the entire context and letting the query method decide.
```python
query: ContentItem | List[ContentItem]
```
I am flexible on naming.
I like Memory because it connotes "just in time" retrieval/recall of content relevant to a step (in a task) an agent is about to take. Memory also gives a sense of what is stored inside the memory - in this case it really should be content relevant to task completion (not just anything that can be in a database).
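A rough sketch of what a `ContentItem` carrying a MIME type (per the mimetypes comment above) might look like. The field names and the `MimeType` values are assumptions for illustration, not the PR's actual design:

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, List, Union


class MimeType(Enum):
    # Illustrative subset; a real design would cover more types.
    TEXT = "text/plain"
    MARKDOWN = "text/markdown"
    IMAGE_PNG = "image/png"


@dataclass
class ContentItem:
    """Hypothetical typed payload: content plus its MIME type and metadata."""
    content: Any
    mime_type: MimeType = MimeType.TEXT
    metadata: Dict[str, Any] = field(default_factory=dict)


def normalize_query(query: Union[str, "ContentItem", List[Union[str, "ContentItem"]]]) -> List["ContentItem"]:
    """Coerce the loosely typed query argument into a List[ContentItem]."""
    items = query if isinstance(query, list) else [query]
    return [i if isinstance(i, ContentItem) else ContentItem(content=i) for i in items]
```

Typing the payload this way would let a multimodal memory dispatch on `mime_type` instead of `isinstance` checks against `str`/`Image`.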
I agree with @husseinmozannar that this is a reasonable and very clean implementation of this idea -- if perhaps a little restrictive. I like the idea of passing the entire context (or perhaps even state!) to the query engine.

It's also worth thinking about whether we can somehow parameterize how the memory is added to the context at inference time. E.g., this implementation adds memory right after the system prompt, without any explanation or preamble. Other implementations are also reasonable. For example, you could introduce memory with something like: "As you work through the user's request, the following snippets may, or may not, be helpful:". You could decide to include memory as the second-to-last message, or the last message (rather than the second).

In AutoGen 0.2, we had the idea of context transformers. I wonder if something similar could work here.
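The context-transformer idea mentioned above could look something like the following sketch. The function names and the tuple-based message model are illustrative only, not from AutoGen 0.2 or this PR:

```python
from typing import Callable, List, Tuple

# Messages modeled as (role, content) tuples purely for illustration.
Msg = Tuple[str, str]
ContextTransformer = Callable[[List[Msg], List[str]], List[Msg]]


def inject_after_system(context: List[Msg], memories: List[str]) -> List[Msg]:
    """Mirror of the behavior described above: memory lands right after the
    system prompt, with no preamble."""
    return [context[0], ("system", "\n".join(memories))] + context[1:]


def inject_with_preamble_last(context: List[Msg], memories: List[str]) -> List[Msg]:
    """Alternative: introduce memory with a preamble, as the last message."""
    preamble = ("As you work through the user's request, the following "
                "snippets may, or may not, be helpful:\n" + "\n".join(memories))
    return context + [("system", preamble)]


def build_model_context(context: List[Msg], memories: List[str],
                        transform: ContextTransformer = inject_after_system) -> List[Msg]:
    """The agent lets the caller decide how memory enters the context."""
    return transform(context, memories)
```

Swapping placement strategies then becomes a constructor argument rather than a fork of the agent.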
I second @afourney's and @husseinmozannar's suggestions. I think the […] How about letting the memory protocol provide a […]? There is a […]
Chatted with @victordibia. The API is nice and clean, and I agree with its usefulness. It would be useful to have the following somewhere in the repo, but not in the base protocol:
Since […]

Adding my 2 cents here. I think it would be interesting to have a lower-level abstraction for storage and information types. It could be useful to think about how memory uses storage, and to have Chroma as a storage implementation that some […]
@lspinheiro, good idea. Can you propose some concrete examples?
I think I understand your comment here, i.e. that VectorEmbeddingMemory is a general enough case that we should explore some standardized implementation that enables easily switching between various standard DBs. One thing to note is that the APIs for these DBs are so different that there will still be quite a bit of code written specifically for each. That being said, perhaps we can get the base Memory protocol done in this PR and then open a new issue for designing something for […]
Agree with @victordibia here, let's focus on the memory protocol first before worrying about the implementation level stuff. Furthermore, I would argue that we should be careful not to introduce too many abstractions.
We should take a look at Semantic Kernel's vector memory abstraction and consider adopting it or duck-typing it.
Memory for AgentChat Agents
It would be useful to have some notion of memory, and the ability to attach memory to an agent.
Right now the AssistantAgent can take tools. Some use cases often benefit from being able to retrieve memory just in time and add it to the prompt before responding (RAG etc.).
This PR implements the following.

Memory Behaviour

A Memory protocol that devs can overload.
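As a rough illustration of what such an overloadable protocol might look like, here is a sketch. The method names, signatures, and the `ListMemory` implementation are guesses for illustration, not the PR's actual interface:

```python
import asyncio
from typing import Any, Dict, List, Optional, Protocol, runtime_checkable


@runtime_checkable
class Memory(Protocol):
    """Sketch of an overloadable memory protocol; the real PR's interface
    may differ in names, signatures, and return types."""

    async def add(self, content: str, metadata: Optional[Dict[str, Any]] = None) -> None:
        """Store content for later just-in-time retrieval."""
        ...

    async def query(self, query: str, k: int = 5) -> List[str]:
        """Return up to k stored items relevant to the query."""
        ...


class ListMemory:
    """Minimal in-memory implementation of the sketched protocol."""

    def __init__(self) -> None:
        self._items: List[str] = []

    async def add(self, content: str, metadata: Optional[Dict[str, Any]] = None) -> None:
        self._items.append(content)

    async def query(self, query: str, k: int = 5) -> List[str]:
        # Keyword overlap as a stand-in for embedding similarity.
        words = query.lower().split()
        return [it for it in self._items if any(w in it.lower() for w in words)][:k]
```

Because the protocol is structural (duck-typed), a Chroma-backed memory only needs to expose the same `add`/`query` surface to be attachable to an agent.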
Integrating with AssistantAgent

Perhaps a big change with this PR is how AssistantAgent is extended to use memory. In on_messages_stream, if the last message is a TextMessage or MultiModalMessage, memory is queried and the returned result is appended to model_context.

The AssistantAgent implementation above focuses on memory.query and adds the result just in time (JIT) to the agent context. It does not concern itself much with how content is added to memory, the reason being that this can be heavily use-case driven. It is expected that the developer will run memory.add outside of agent logic, via the Memory protocol.

Example Implementation
Example notebook highlighting these.
```
---------- user ----------
Tell me the most important thing about Tokyo.
---------- travel_agent ----------
One of the most important aspects of Tokyo is that it has the world's busiest railway station, Shinjuku Station. This station serves as a major hub for transportation, with millions of commuters and travelers passing through its complex network of train lines each day. It highlights Tokyo's status as a bustling metropolis with an advanced public transportation system.
[Prompt tokens: 72, Completion tokens: 66]
---------- Summary ----------
Number of messages: 2
Finish reason: Maximum number of messages 2 reached, current message count: 2
Total prompt tokens: 72
Total completion tokens: 66
Duration: 1.47 seconds
```
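The just-in-time flow that produces a transcript like the one above can be sketched roughly as follows. `FakeMemory` and `prepare_context` are illustrative stand-ins, not classes from this PR:

```python
import asyncio
from typing import List


class FakeMemory:
    """Illustrative stand-in for a Memory implementation."""

    def __init__(self, items: List[str]) -> None:
        self._items = items

    async def query(self, text: str) -> List[str]:
        words = text.lower().split()
        return [it for it in self._items if any(w in it.lower() for w in words)]


async def prepare_context(memory: FakeMemory, model_context: List[str], last_message: str) -> List[str]:
    """Sketch of the described flow: query memory with the incoming message
    and append the results to the model context before inference.
    memory.add is expected to have been called elsewhere, outside agent logic."""
    results = await memory.query(last_message)
    model_context.extend(results)
    return model_context
```

In the transcript above, a snippet about Shinjuku Station retrieved this way would be in the context before the travel_agent's completion is generated.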
Related issue number
Closes #4039, #4648
TBD
Checks
Open Questions

Should an agent take a single memory (memory = chroma_memory) or a list (memory = [chroma_memory])? E.g., should an agent have the opportunity to "dip" into several memory banks?
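If the answer is "both", the agent's constructor could normalize either form into a list. `DummyMemory` and `normalize_memory` are hypothetical names for this sketch:

```python
from typing import List, Optional, Sequence, Union


class DummyMemory:
    """Placeholder memory type, standing in for e.g. a Chroma-backed memory."""


def normalize_memory(memory: Optional[Union[DummyMemory, Sequence[DummyMemory]]]) -> List[DummyMemory]:
    """Accept memory=chroma_memory or memory=[chroma_memory, ...] uniformly,
    so an agent can 'dip' into several memory banks."""
    if memory is None:
        return []
    if isinstance(memory, DummyMemory):
        return [memory]
    return list(memory)
```

The agent would then always iterate over a list of memories, and the single-memory case is just a list of one.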