Question about the input queries of dual decoders #20

@zisang0210

Why are the initial control point queries the sum of the shared control point query embedding and the coarse bounding box coordinate embedding, while the initial character queries are the sum of the shared character query embedding and a 1D positional encoding? If the coarse bounding box coordinate embedding is useful, why not add it to the initial character queries as well? The same question applies to the 1D positional encoding: why is it not added to the control point queries? Also, if the coarse bounding box coordinate embedding and the 1D positional encoding are both removed, does the model's performance drop? Looking forward to your reply, thanks!
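To make the two constructions concrete, here is a minimal sketch in plain Python. It is an illustration of the description above, not the repository's code: the sizes `D`, `N_CTRL`, `N_CHAR`, the helper names, the random stand-ins for learned embeddings, and the sinusoidal form of the box embedding are all assumptions, and any learned projection applied to the box embedding is left out.

```python
import math
import random

# Hypothetical sizes, chosen only for illustration.
D = 8        # embedding width
N_CTRL = 4   # control point queries per text instance
N_CHAR = 5   # character queries per text instance


def sine_encoding(value, dim):
    """Sinusoidal encoding of one scalar into `dim` values (dim must be even)."""
    out = []
    for i in range(dim // 2):
        freq = 1.0 / (10000 ** (2 * i / dim))
        out.extend([math.sin(value * freq), math.cos(value * freq)])
    return out


def box_embedding(box, dim):
    """Encode a normalized (cx, cy, w, h) box; each coordinate gets dim // 4 values."""
    emb = []
    for coord in box:
        emb.extend(sine_encoding(coord, dim // 4))
    return emb


def add(u, v):
    """Element-wise sum of two equal-length vectors."""
    return [a + b for a, b in zip(u, v)]


# Random stand-ins for the learned embeddings shared by all text instances.
rng = random.Random(0)
ctrl_embed = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(N_CTRL)]
char_embed = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(N_CHAR)]


def control_point_queries(box):
    """Shared control point embedding + embedding of this instance's coarse box."""
    box_emb = box_embedding(box, D)
    return [add(q, box_emb) for q in ctrl_embed]


def character_queries():
    """Shared character embedding + 1D positional encoding of the character index."""
    return [add(q, sine_encoding(pos, D)) for pos, q in enumerate(char_embed)]


queries_a = control_point_queries((0.5, 0.5, 0.2, 0.1))
queries_b = control_point_queries((0.3, 0.7, 0.4, 0.2))
chars = character_queries()
```

In this sketch the control point queries change with the coarse box of each instance, while the character queries depend only on the character index, which is the asymmetry the question is about.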
