DetermineResidueSequence

dstoeckel edited this page Mar 16, 2015 · 2 revisions

How can I get a protein's amino acid sequence in one- or three-letter code?

Use the peptide-related methods:

C++

#include <BALL/STRUCTURE/peptides.h>

#include <algorithm>
#include <iostream>
#include <iterator>
#include <list>
...

// check that the system's first molecule is a protein
if (RTTI::isKindOf<Protein>(*(S.getMolecule(0))))
{
   // cast the system's first molecule to a protein
   Protein* protein = RTTI::castTo<Protein>(*(S.getMolecule(0)));

   // get the sequence in one-letter code
   String olc_seq = Peptides::GetSequence(*protein);

   // convert to three-letter code
   std::list<String> tlc_seq = Peptides::OneLetterToThreeLetter(olc_seq);

   std::cout << "One-letter-code: " << olc_seq << std::endl;

   std::cout << "Three-letter-code: ";
   std::copy(tlc_seq.begin(), tlc_seq.end(), std::ostream_iterator<String>(std::cout, " "));
   std::cout << std::endl;
}

In many cases, a PDB file contains not only the protein itself but also crystal water, ligands, and other hetero atoms. Each of these results in a '?' in the sequence. This is not a bug but the correct answer, since the corresponding residue is not an amino acid. If this is not desired, you can instead iterate over the residues and use isAminoAcid() to check whether each residue is an amino acid:

#include <BALL/KERNEL/residueIterator.h>
#include <BALL/STRUCTURE/peptides.h>

#include <iostream>

BALL::System S;
...
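// iterate over all residues of the system
BALL::ResidueIterator resit = S.beginResidue();

for (; +resit; ++resit)
{
   // skip water, ligands, and other non-amino-acid residues
   if (resit->isAminoAcid())
   {
      std::cout << BALL::Peptides::OneLetterCode(resit->getName()) << " ";
   }
}
std::cout << std::endl;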

Python

# get the protein, e.g. from BALLView
sys = getSystems()[0]
prot = sys.getProtein(0)

# get the sequence in one letter code
seq = Peptides.GetSequence(prot)
print "One-letter-code: " + seq

# convert it to three letter code
print "Three-letter-code: " + Peptides.OneLetterToThreeLetter(seq)

To print only the residues that are amino acids:

for res in residues(prot):
  if res.isAminoAcid():
    print Peptides.OneLetterCode(res.getName()) 
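
To make the '?' behaviour concrete, here is a small standalone sketch in plain Python. It does not use BALL: the lookup table and the helper names `one_letter_code` and `sequence` are made up for this illustration, and BALL's own handling may differ in detail. It only shows the idea that a residue name without an amino acid code maps to '?', and that filtering such residues yields the plain protein sequence.

```python
# Standalone illustration, not BALL code. The table and both helpers
# are hypothetical names chosen for this sketch.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def one_letter_code(name):
    """Return the one-letter code for a residue name, or '?' if it is unknown."""
    return THREE_TO_ONE.get(name.upper(), "?")


def sequence(names, skip_unknown=False):
    """Join the one-letter codes of all residue names into one string.

    With skip_unknown=True, residues without an amino acid code
    (water, ligands, ...) are dropped instead of showing up as '?'.
    """
    codes = [one_letter_code(n) for n in names]
    if skip_unknown:
        codes = [c for c in codes if c != "?"]
    return "".join(codes)


# Residue names as they might appear in a PDB file; HOH is crystal water.
residue_names = ["MET", "ALA", "HOH", "GLY"]

full = sequence(residue_names)                          # water shows up as '?'
filtered = sequence(residue_names, skip_unknown=True)   # water is dropped
```

Here `full` is "MA?G" and `filtered` is "MAG", which mirrors the difference between taking the sequence of everything in the file and checking isAminoAcid() on each residue first.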