Question about the input queries of dual decoders #20

Why are the initial control point queries the sum of a shared control point query embedding and a coarse bounding box coordinate embedding, while the initial character queries are the sum of a shared character query embedding and a 1D positional encoding? In other words, if the coarse bounding box coordinate embedding is useful, why not also add it to the initial character queries? The same question applies to the 1D positional encoding: if it helps, why not add it to the control point queries as well? And if the coarse bounding box coordinate embedding and the 1D positional encoding are both removed, does the model's performance drop? Looking forward to your reply, thanks!
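To make the asymmetry in the question concrete, here is a minimal sketch of the two query constructions as described above. It is not the repository's code. The function names, the width `D_MODEL`, the slot counts, the DETR-style sine/cosine encodings, and the random stand-ins for the learned shared embeddings are all hypothetical choices made for illustration.

```python
import math
import random

D_MODEL = 8          # hypothetical embedding width (divisible by 8 in this sketch)
NUM_CTRL_POINTS = 4  # hypothetical number of control points per text instance
NUM_CHARS = 5        # hypothetical maximum number of characters per text instance


def sinusoidal_encoding(position: float, dim: int) -> list[float]:
    """Transformer-style sine/cosine encoding of a scalar position into `dim` values."""
    enc = []
    for i in range(dim // 2):
        freq = 1.0 / (10000 ** (2 * i / dim))
        enc.extend([math.sin(position * freq), math.cos(position * freq)])
    return enc


def bbox_coordinate_embedding(box: tuple[float, float, float, float], dim: int) -> list[float]:
    """Assumption: encode each normalised (cx, cy, w, h) value with dim // 4
    sine/cosine features and concatenate them (a real model may add a learned MLP)."""
    out = []
    for coord in box:
        out.extend(sinusoidal_encoding(coord * 2 * math.pi, dim // 4))
    return out


def add(a: list[float], b: list[float]) -> list[float]:
    return [x + y for x, y in zip(a, b)]


def control_point_queries(shared_ctrl, box, use_box_embed=True):
    # The same box embedding is added to every control-point query of one instance,
    # so queries of different instances differ only through their boxes.
    box_embed = bbox_coordinate_embedding(box, D_MODEL) if use_box_embed else [0.0] * D_MODEL
    return [add(q, box_embed) for q in shared_ctrl]


def character_queries(shared_char, use_pos_enc=True):
    # A different 1D positional encoding is added to each character slot,
    # so otherwise identical slot vectors become distinguishable by position.
    return [
        add(q, sinusoidal_encoding(i, D_MODEL) if use_pos_enc else [0.0] * D_MODEL)
        for i, q in enumerate(shared_char)
    ]


random.seed(0)
# Random stand-ins for learned control point embeddings, shared across all text instances.
shared_ctrl = [[random.gauss(0, 1) for _ in range(D_MODEL)] for _ in range(NUM_CTRL_POINTS)]
# For illustration only: every character slot starts from the same vector.
shared_char = [[0.5] * D_MODEL for _ in range(NUM_CHARS)]

box_a = (0.2, 0.3, 0.1, 0.05)
box_b = (0.7, 0.6, 0.2, 0.08)
print(control_point_queries(shared_ctrl, box_a) != control_point_queries(shared_ctrl, box_b))  # True
print(control_point_queries(shared_ctrl, box_a, False) == control_point_queries(shared_ctrl, box_b, False))  # True
chars = character_queries(shared_char)
print(chars[0] != chars[1])  # True
```

In this sketch the box embedding is constant within one instance and varies across instances, while the 1D positional encoding varies across character slots and is the same for every instance; dropping either one removes the corresponding source of variation.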
