How to handle an input PDB file with a [UNK] token? #3

@Eikor

Dear researchers,

Thank you for your amazing work!

I have encountered a small problem when running on my custom dataset.
The PDB file is as follows:

ATOM     15  CB  SER A   4      50.214  -9.038  18.043  1.00  0.00           C  
ATOM     16  OG  SER A   4      49.838 -10.302  18.781  1.00  0.00           O  
ATOM     17  N   UNK A   5      49.409  -6.273  19.161  1.00  0.00           N  
ATOM     18  CA  UNK A   5      49.233  -5.330  20.255  1.00  0.00           C  
ATOM     19  C   UNK A   5      48.099  -4.517  19.988  1.00  0.00           C  
ATOM     20  O   UNK A   5      47.425  -4.219  20.898  1.00  0.00           O  
ATOM     21  CB  UNK A   5      50.344  -4.350  20.407  1.00  0.00           C  
ATOM     22  CG  UNK A   5      49.845  -3.031  21.133  1.00  0.00           C  
ATOM     23  N   ASP A   6      48.053  -4.242  18.702  1.00  0.00           N  
ATOM     24  CA  ASP A   6      47.037  -3.448  18.051  1.00  0.00           C  

My PDB file contains a residue of unknown type (UNK), which is encoded as residue type id 20. That id triggers an error in ResidueTypeSeqFeat.
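To show which residues are affected, here is a minimal sketch that lists every residue in a PDB file whose name is not one of the 20 standard amino acids. It reads the fixed-column ATOM records directly and does not use this repository's code; `find_nonstandard_residues` and `STANDARD_RESIDUES` are hypothetical names introduced only for illustration.

```python
# Three-letter codes of the 20 standard amino acids.
STANDARD_RESIDUES = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
}


def find_nonstandard_residues(pdb_lines):
    """Return unique (chain_id, res_seq, res_name) tuples, in file order,
    for ATOM records whose residue name is not a standard amino acid.

    Hypothetical helper; assumes the fixed-column PDB layout.
    """
    found = []
    for line in pdb_lines:
        # Skip non-ATOM records and lines too short to hold the fields.
        if not line.startswith("ATOM") or len(line) < 26:
            continue
        res_name = line[17:20].strip()  # columns 18-20: residue name
        chain_id = line[21]             # column 22: chain identifier
        res_seq = int(line[22:26])      # columns 23-26: residue number
        key = (chain_id, res_seq, res_name)
        if res_name not in STANDARD_RESIDUES and key not in found:
            found.append(key)
    return found


# A few records taken from the excerpt above.
example = [
    "ATOM     16  OG  SER A   4      49.838 -10.302  18.781  1.00  0.00           O",
    "ATOM     17  N   UNK A   5      49.409  -6.273  19.161  1.00  0.00           N",
    "ATOM     18  CA  UNK A   5      49.233  -5.330  20.255  1.00  0.00           C",
    "ATOM     23  N   ASP A   6      48.053  -4.242  18.702  1.00  0.00           N",
]
print(find_nonstandard_residues(example))  # [('A', 5, 'UNK')]
```

This only locates the unknown residues. Whether they should then be dropped, masked, or mapped to a padding token is the open question.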

How should I process the input if there are unknown residues in my PDB?
Am I supposed to set the unknown residue to a padding token?

Thanks!

Best,
JD
