How to handle an input PDB file with an [UNK] token?

Dear researchers, 

Thank you for your amazing work!

I have encountered a small problem when running on my custom dataset.
The PDB file is as follows:
```
ATOM     15  CB  SER A   4      50.214  -9.038  18.043  1.00  0.00           C  
ATOM     16  OG  SER A   4      49.838 -10.302  18.781  1.00  0.00           O  
ATOM     17  N   UNK A   5      49.409  -6.273  19.161  1.00  0.00           N  
ATOM     18  CA  UNK A   5      49.233  -5.330  20.255  1.00  0.00           C  
ATOM     19  C   UNK A   5      48.099  -4.517  19.988  1.00  0.00           C  
ATOM     20  O   UNK A   5      47.425  -4.219  20.898  1.00  0.00           O  
ATOM     21  CB  UNK A   5      50.344  -4.350  20.407  1.00  0.00           C  
ATOM     22  CG  UNK A   5      49.845  -3.031  21.133  1.00  0.00           C  
ATOM     23  N   ASP A   6      48.053  -4.242  18.702  1.00  0.00           N  
ATOM     24  CA  ASP A   6      47.037  -3.448  18.051  1.00  0.00           C  
```
There is an [UNK] residue type in my PDB file, and this residue is encoded as id 20. However, this id triggers an error in [ResidueTypeSeqFeat](https://github.com/NVIDIA-Digital-Bio/la-proteina/blob/cde5de3ead6e4d76f367da6dc5174be9913ef6ca/proteinfoundation/nn/feature_factory.py#L681).

How should I process the input if there are unknown residues in my PDB?
Am I supposed to set the unknown residue to a padding token?
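
For reference, one workaround I am considering is to drop the unknown residues from the PDB text before loading it. Below is a minimal sketch of that idea in plain Python. The helper name `drop_unknown_residues` is my own and is not part of this repository, and I am assuming standard fixed-width PDB records where the residue name sits in columns 18-20.

```python
def drop_unknown_residues(pdb_text: str, unknown_names=("UNK",)) -> str:
    """Return pdb_text without ATOM/HETATM records of unknown residues.

    Hypothetical helper, not part of the repository.
    """
    kept = []
    for line in pdb_text.splitlines():
        is_atom_record = line.startswith(("ATOM", "HETATM"))
        # Residue name occupies columns 18-20 (0-based slice 17:20)
        # in a fixed-width PDB ATOM/HETATM record.
        if is_atom_record and line[17:20].strip() in unknown_names:
            continue
        kept.append(line)
    return "\n".join(kept)


# Three records taken from the file above: SER 4, UNK 5, ASP 6.
sample = """\
ATOM     16  OG  SER A   4      49.838 -10.302  18.781  1.00  0.00           O
ATOM     17  N   UNK A   5      49.409  -6.273  19.161  1.00  0.00           N
ATOM     23  N   ASP A   6      48.053  -4.242  18.702  1.00  0.00           N"""

cleaned = drop_unknown_residues(sample)
print(cleaned)
```

This leaves a gap in the residue numbering (4 is followed directly by 6), and I am not sure whether the model tolerates such a chain break, which is why I am asking whether a padding token would be the better choice.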

Thanks!

Best,
JD
