Clarification Request: Discrepancy between Controller formulation (Dot Product) and implementation (Actor Head MLP) #2

@buctzzp

First of all, thank you for this fantastic and forward-looking work! The idea of decoupling memory operations into a self-evolving skill bank is highly inspiring for the community's research on long-term agent memory mechanisms.

I am currently doing a deep dive into the codebase to fully grasp the architectural choices of the PPOController. While reviewing the implementation, I noticed a structural discrepancy between the mathematical formulation in the paper and the actual PyTorch implementation regarding the scoring mechanism.

1. The Formulation in the Paper

In Section 3.3.1, the logits for skill selection are described as a dot product between the state embedding $h_t$ and the skill embedding $u_i$:

$$z_{t,i} = h_t^\top u_i$$

This implies a standard dense retrieval paradigm where the skill representation $u_i$ is static and the match is based on geometric similarity.
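To make my reading of the paper's formula concrete, here is a minimal, dependency-free sketch of dot-product scoring. The function names, the toy vectors, and the softmax over the logits are my own assumptions for illustration; none of them are taken from the paper or the repository.

```python
import math

def dot_product_logits(h_t, skill_embeddings):
    """Score each skill i as z_i = h_t^T u_i (my reading of the paper's formula)."""
    return [sum(h * u for h, u in zip(h_t, u_i)) for u_i in skill_embeddings]

def softmax(logits):
    """Numerically stable softmax; assumed here as the way logits become a policy."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy values, purely illustrative.
h_t = [1.0, 0.0, 2.0]                      # state embedding
skills = [[1.0, 1.0, 0.0],                 # u_1
          [0.0, 1.0, 1.0],                 # u_2
          [2.0, 0.0, 1.0]]                 # u_3

logits = dot_product_logits(h_t, skills)   # [1.0, 2.0, 4.0]
probs = softmax(logits)
best = max(range(len(probs)), key=probs.__getitem__)
```

Under this formulation the score depends only on the inner product of the two vectors, so the skill embeddings can be precomputed and compared against any state without a learned interaction step.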

2. The Implementation in the Code

However, in src/controller.py (PPOController), the architecture appears to be a two-tower encoder followed by a learned interaction head, rather than a pure dot-product scorer:

  • Both the state and the operations pass through separate trainable MLPs (state_net and op_net).

  • Instead of a dot product, the state and operation representations are concatenated and passed through a third MLP (actor_head) to output the scalar logit:

    $$z_{t,i} = \text{MLP}_{\text{actor}}([h_t \parallel o_i])$$
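For comparison, this is a rough sketch of concatenation-based scoring as I understand it from the code. It is not a copy of src/controller.py: the layer sizes, the ReLU activation, the two-layer depth, and all helper names are hypothetical, and `h_t` and `o_i` are simply assumed to be the outputs of `state_net` and `op_net`. I wrote it in plain Python rather than PyTorch so that it runs without dependencies.

```python
import math
import random

def init_linear(rng, in_dim, out_dim):
    """Randomly initialised dense layer (hypothetical init, for illustration only)."""
    scale = 1.0 / math.sqrt(in_dim)
    weights = [[rng.uniform(-scale, scale) for _ in range(in_dim)]
               for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weights, bias

def linear(x, weights, bias):
    """Compute W x + b for a single input vector."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def relu(x):
    return [max(0.0, v) for v in x]

def actor_head_logit(h_t, o_i, params):
    """z_i = MLP_actor([h_t || o_i]): concatenate, then map to one scalar logit."""
    (w1, b1), (w2, b2) = params
    hidden = relu(linear(h_t + o_i, w1, b1))   # list '+' concatenates the two vectors
    return linear(hidden, w2, b2)[0]

rng = random.Random(0)
state_dim, op_dim, hidden_dim = 3, 3, 8       # hypothetical sizes
params = (init_linear(rng, state_dim + op_dim, hidden_dim),
          init_linear(rng, hidden_dim, 1))

h_t = [1.0, 0.0, 2.0]                         # assumed output of state_net
ops = [[1.0, 1.0, 0.0],                       # assumed outputs of op_net
       [0.0, 1.0, 1.0],
       [2.0, 0.0, 1.0]]

logits = [actor_head_logit(h_t, o_i, params) for o_i in ops]
```

The practical difference from the dot product is that the score here is a learned nonlinear function of the pair, so every (state, operation) pair needs its own forward pass through the head, and the result is no longer a geometric similarity between two embeddings.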

My Questions:

Design Philosophy: Was the dot product $h_t^\top u_i$ in the paper intended as a theoretical simplification of the scoring that actor_head performs in the code?

Clarifying this would be incredibly helpful for researchers (like myself) who are looking to build upon or evaluate your architectural design choices.

Thank you so much for your time and for open-sourcing this excellent project!
