This PR ports the following commit to vEcoli: CovertLab/WholeCellEcoliRelease@bdfd19a
Currently, the kinetic tRNA charging model can only be run with growth rate control off:
The average doubling time with the new model is close to the expected 44 minutes with operons off.

With operons on, the spread of doubling times increases dramatically, with many outliers above 1 hour.

*16 seeds, 32 generations (omit doubling times for first 8 generations)
Other tweaks
Converting between TUs and cistrons where necessary
Operons were added after the reference commit. As a result, some parts of the reference commit (example) require cistron data instead of TU data (example ported).
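As a rough illustration of what such a conversion can look like, the sketch below sums TU-level counts into cistron-level counts with a binary membership matrix. The matrix, the counts, and all variable names are made up for the example and are not taken from the ported code.

```python
import numpy as np

# Hypothetical membership matrix: rows are cistrons, columns are TUs.
# An entry is 1 when the TU contains that cistron.
cistron_tu_mapping = np.array([
    [1, 0, 1],  # cistron A is part of TU 0 and TU 2
    [0, 1, 1],  # cistron B is part of TU 1 and TU 2
])

# Hypothetical TU-level counts (one entry per TU).
tu_counts = np.array([5, 3, 2])

# Each cistron's count is the sum of the counts of every TU that contains it.
cistron_counts = cistron_tu_mapping @ tu_counts
print(cistron_counts)
```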
Discovering coding opal codons at runtime
The reference commit hard-coded the protein IDs and positions of opal codons that code for selenocysteine. I added some logic to determine this at runtime using the EcoCyc sequence data. At the time of this PR, my logic produced the exact same proteins and positions as the hard-coded values in the reference commit.
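A minimal sketch of the idea, not the actual logic in this PR: scan each coding sequence in frame and report any opal (TGA/UGA) codon that appears before the final codon. The function name and the toy sequence are hypothetical, and the real implementation works from the EcoCyc sequence data.

```python
def find_internal_opal_codons(cds: str) -> list[int]:
    """Return 0-based codon indices of in-frame TGA codons before the last codon."""
    cds = cds.upper().replace("U", "T")  # accept RNA or DNA alphabets
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    # Skip the last codon: a terminal TGA is an ordinary stop codon.
    return [i for i, codon in enumerate(codons[:-1]) if codon == "TGA"]


# Toy sequence (not a real gene): ATG GCT TGA AAA TGA
toy_cds = "ATGGCTTGAAAATGA"
print(find_internal_opal_codons(toy_cds))
```

Only in-frame codons are checked, so a TGA that straddles two codons is ignored.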
Adjusting TU boundaries to include TSS of first gene
Some of the adjusted genes are already noted on EcoCyc as unusual (example). My manual adjustments are just a temporary fix. We'll need to come up with a more robust long-term solution.
Making new ParCa and sim modules deterministic with PRNG seeding
There were some places in the code that used unseeded `np.random` or Cython `rand()`. This meant running the sim twice with the same inputs could yield different outputs. This lack of reproducibility made it near impossible to debug issues (especially rare ones like the reconciliation buffer issue discussed below).
Accounting for reconciliation buffer when building codon sequences
In the ParCa, codon sequences are stored in an array such that the longest polypeptide sequence has 30 codons' worth of padding at the end. In the model, the reconciliation program is allowed to read ahead of a ribosome's current position by up to 32 codons. Because the read-ahead distance exceeds the padding, in the extremely rare case that a ribosome is just on the cusp of fully translating the longest polypeptide, reconciliation may try to read beyond the dimensions of the codon sequence array, raising an error.
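The mismatch can be reproduced with a toy array. This is only a sketch under the assumption that the remedy is to pad by at least the read-ahead distance. The function names, the sentinel value, and the sequences are hypothetical. NumPy slicing truncates silently instead of raising, so the example compares window lengths.

```python
import numpy as np

PAD = -1          # hypothetical sentinel for "no codon"
READ_AHEAD = 32   # read-ahead distance described above


def build_codon_sequences(sequences, padding):
    """Pack variable-length codon sequences into one padded 2-D array."""
    max_len = max(len(seq) for seq in sequences)
    arr = np.full((len(sequences), max_len + padding), PAD, dtype=np.int16)
    for row, seq in enumerate(sequences):
        arr[row, :len(seq)] = seq
    return arr


def read_window(arr, row, position, read_ahead=READ_AHEAD):
    """Return the codons a read-ahead from `position` would touch."""
    return arr[row, position:position + read_ahead]


sequences = [[1, 2, 3], [4, 5, 6, 7, 8]]  # toy codon indices
last_codon = len(sequences[1]) - 1        # ribosome on the final codon

short = build_codon_sequences(sequences, padding=30)
safe = build_codon_sequences(sequences, padding=READ_AHEAD)

# With 30 codons of padding the window runs off the end of the array.
print(len(read_window(short, 1, last_codon)))
# With padding equal to the read-ahead, the full window is available.
print(len(read_window(safe, 1, last_codon)))
```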
Enabling multi-core tRNA charging parameter optimization
The reference commit adds a new ParCa option to optimize the kinetic tRNA charging parameters (`optimize_trna_charging_kinetics`). I modified this to use multiprocessing, with each process handling optimization for a single amino acid. Unfortunately, a handful of amino acids take an order of magnitude longer than the others. In my testing, most amino acids finish optimizing in a few minutes while a handful drag the process out for 30+ minutes.
Re-optimizing ParCa tRNA charging parameters
Because a lot has changed since the reference commit, I decided to re-run the optimization described above and commit the new parameters. We'll need to develop a standard protocol for re-optimizing parameters going forward.
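For anyone re-running it, the per-amino-acid parallelization of that optimization follows roughly the pattern below. This is a sketch only: the worker is a stand-in (a toy gradient descent), and the function names and amino acid labels are hypothetical rather than the ParCa's actual code.

```python
from multiprocessing import Pool


def optimize_amino_acid(amino_acid: str) -> tuple[str, float]:
    """Stand-in for the real per-amino-acid optimization (toy gradient descent)."""
    target = float(len(amino_acid))  # arbitrary toy optimum
    x = 0.0
    for _ in range(200):
        x -= 0.1 * 2.0 * (x - target)  # gradient step on (x - target)**2
    return amino_acid, x


def optimize_all(amino_acids, processes=2):
    """Optimize each amino acid in its own task and collect results by name."""
    with Pool(processes=processes) as pool:
        # imap_unordered yields results as tasks finish, so fast amino acids
        # are collected without waiting for the slow ones.
        return dict(pool.imap_unordered(optimize_amino_acid, amino_acids, chunksize=1))


if __name__ == "__main__":
    results = optimize_all(["ALA", "GLY", "SER"])
    print(sorted(results))
```

Since a few amino acids dominate the runtime, listing the known slow ones first may help them start early instead of queuing behind the fast ones. Total wall time is still bounded below by the slowest single amino acid.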
Next Steps