First of all, thank you for this fantastic and forward-looking work! The idea of decoupling memory operations into a self-evolving skill bank is highly inspiring for the community's research on long-term agent memory mechanisms.
I am currently doing a deep dive into the codebase to fully grasp the architectural choices of the PPOController. While reviewing the implementation, I noticed a structural discrepancy between the mathematical formulation in the paper and the actual PyTorch implementation regarding the scoring mechanism.
1. The Formulation in the Paper
In Section 3.3.1, the logits for skill selection are described conceptually as a dot product between the state embedding and the skill embedding:
$$z_{t,i} = h_t^\top u_i$$
This implies a standard dense retrieval paradigm where the skill representation $u_i$ is static and the match is based on geometric similarity.
2. The Implementation in the Code
However, in src/controller.py (PPOController), the architecture appears to be a dual-encoder with a cross-encoder interaction layer:
- Both the state and the operations pass through separate trainable MLPs (state_net and op_net).
- Instead of a dot product, the state and operation representations are concatenated and passed through a third MLP (actor_head) to output the scalar logit:
$$z_{t,i} = \text{MLP}_{\text{actor}}([h_t \parallel o_i])$$
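To make the structural difference concrete, here is a minimal numpy sketch of the two scoring mechanisms side by side. This is not the actual implementation: the dimensions are illustrative, and single linear layers stand in for state_net, op_net, and actor_head.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_ops = 8, 4                      # illustrative embedding dim / op count

h_t = rng.normal(size=d)             # state embedding h_t
U = rng.normal(size=(n_ops, d))      # static skill embeddings u_i

# (a) Paper formulation: z_{t,i} = h_t^T u_i (dense-retrieval dot product).
z_dot = U @ h_t                      # shape (n_ops,)

# (b) Code formulation as I read src/controller.py: encode state and op
# separately, concatenate, then score with a third network. Single
# linear layers are hypothetical stand-ins for the actual MLPs.
W_state = rng.normal(size=(d, d))    # stand-in for state_net
W_op = rng.normal(size=(d, d))       # stand-in for op_net
w_actor = rng.normal(size=2 * d)     # stand-in for actor_head

h_enc = np.maximum(h_t @ W_state, 0.0)         # encoded state (ReLU)
z_mlp = np.array([
    np.concatenate([h_enc, np.maximum(o @ W_op, 0.0)]) @ w_actor
    for o in U
])                                   # shape (n_ops,)

print(z_dot.shape, z_mlp.shape)      # both produce one logit per operation
```

The key contrast: in (a) the interaction between state and skill is fixed to an inner product over static $u_i$, while in (b) the interaction function itself is learned, since actor_head can model non-linear state-operation dependencies.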
My Questions:
Design Philosophy: Was the dot product $h_t^\top u_i$ in the paper intended as a theoretical simplification?
Clarifying this would be incredibly helpful for researchers (like myself) who are looking to build upon or evaluate your architectural design choices.
Thank you so much for your time and for open-sourcing this excellent project!