This repository has been archived by the owner on Sep 11, 2023. It is now read-only.

Merge branch 'devel' into frames_from_files_fixes
[ci skip]
marscher committed Oct 21, 2016
2 parents e6a69d9 + 4308906 commit 5d52505
Showing 14 changed files with 145 additions and 78 deletions.
8 changes: 8 additions & 0 deletions .github/ISSUE_TEMPLATE.md
@@ -0,0 +1,8 @@
Thanks for submitting an issue!

Here's a quick checklist of what to include:

- [ ] Include a detailed description of the bug or suggestion
- [ ] `pip list` or `conda list` of the environment you are using (please attach a txt file to the issue).
- [ ] PyEMMA version and operating system versions
- [ ] Minimal example if possible, a Python script, zipped input data (if not too large)
8 changes: 8 additions & 0 deletions .github/PULL_REQUEST_TEMPLATE.md
@@ -0,0 +1,8 @@
Thanks for submitting a PR, your contribution is really appreciated!

Here's a quick checklist that should be present in PRs:

- [ ] Make sure to include one or more tests for your change
- [ ] Add yourself to `AUTHORS`
- [ ] Add a new entry to the `doc/source/CHANGELOG` (choose any open position to avoid merge conflicts with other PRs).
Decide whether your change is a fix or a new feature.
27 changes: 0 additions & 27 deletions .project

This file was deleted.

9 changes: 0 additions & 9 deletions .pydevproject

This file was deleted.

24 changes: 24 additions & 0 deletions AUTHORS
@@ -0,0 +1,24 @@
Main Authors
============
Benjamin Trendelkamp-Schroer
Christoph Wehmeyer
Fabian Paul
Frank Noe
Guillermo Pérez-Hernández
Martin K. Scherer
Moritz Hoffmann
Jan-Hendrik Prinz



Contributors
============
Alexandra La Fleur
Antonia Meys
Ariel Rokem
Francesco Bonazzi
Ismael Rodriguez Espigares
John Chodera
Josh Fass
Stephan Doerr
@vargaslo
2 changes: 1 addition & 1 deletion README.rst
@@ -8,7 +8,7 @@ EMMA (Emma's Markov Model Algorithms)
:target: https://pypi.python.org/pypi/pyemma
.. image:: https://img.shields.io/pypi/dm/pyemma.svg
:target: https://pypi.python.org/pypi/pyemma
.. image:: https://anaconda.org/xavier/binstar/badges/downloads.svg
.. image:: https://anaconda.org/omnia/badges/downloads.svg
:target: https://anaconda.org/omnia/pyemma
.. image:: https://anaconda.org/omnia/pyemma/badges/installer/conda.svg
:target: https://conda.anaconda.org/omnia
37 changes: 21 additions & 16 deletions doc/source/CHANGELOG.rst
@@ -1,12 +1,17 @@
Changelog
=========

2.2.7 (10-20-16)
2.2.7 (10-21-16)
----------------

**New features**:

- coordinates: for lag < chunksize improved speed (50%) for TICA. #960
- coordinates:
- for lag < chunksize improved speed (50%) for TICA. #960
- new config variable "coordinates_check_output" to test for "NaN" and "inf" values in
iterator output for every chunk. The option is disabled by default. It gives insight
during debugging where faulty values are introduced into the pipeline. #967


**Fixes**:

@@ -142,6 +147,7 @@ Service release. Fixes some
considerable high chunk size as well.

**Fixes**:

- In parallel environments (clusters with shared filesystem) there will be no
crashes due to the config module, which tried to write files in users home
directory. Config files are optional by now.
@@ -196,19 +202,18 @@ Service release. Fixes some
(reported as Warnings).

- coordinates:
- Completly re-designed class hierachy (user-code/API unaffected).
- Added trajectory info cache to avoid re-computing lengths, dimensions and
byte offsets of data sets.
- Random access strategies supported (eg. via slices).
- FeatureReader supports random access for XTC and TRR (in conjunction with mdtraj-1.6).
- Re-design API to support scikit-learn interface (fit, transform).
- Pipeline elements (former Transformer class) now uses iterator pattern to
obtain data and therefore supports now pipeline trees.
- pipeline elements support writing their output to csv files.
- TICA/PCA uses covartools to estimate covariance matrices.
- This now saves one pass over the data set.
- Supports sparsification data on the fly.

- Completely re-designed class hierarchy (user-code/API unaffected).
- Added trajectory info cache to avoid re-computing lengths, dimensions and
byte offsets of data sets.
- Random access strategies supported (e.g. via slices).
- FeatureReader supports random access for XTC and TRR (in conjunction with mdtraj-1.6).
- Re-design API to support scikit-learn interface (fit, transform).
- Pipeline elements (former Transformer class) now use the iterator pattern to
obtain data and therefore support pipeline trees.
- pipeline elements support writing their output to csv files.
- TICA/PCA uses covartools to estimate covariance matrices:
+ This now saves one pass over the data set.
+ Supports sparsification of data on the fly.

**Fixes**:

@@ -348,7 +353,7 @@ reorganization of the code.
- coordinates package: allow metrics to be passed to cluster algorithms.
- coordinates package: cache trajectory lengths by default
(uncached led to 1 pass of reading for non indexed (XTC) formats).
This avoids re-reading e.g XTC files to determine their lengths.
This avoids re-reading e.g XTC files to determine their lengths.
- coordinates package: enable passing chunk size to readers and pipelines in API.
- coordinates package: assign_to_centers now allows all supported file formats as centers input.
- coordinates package: save_traj(s) now handles stride parameter.
16 changes: 11 additions & 5 deletions pyemma/coordinates/api.py
@@ -984,7 +984,7 @@ def pca(data=None, dim=-1, var_cutoff=0.95, stride=1, mean=None, skip=0):
return _param_stage(data, res, stride=stride)


def tica(data=None, lag=10, dim=-1, var_cutoff=0.95, kinetic_map=True, stride=1,
def tica(data=None, lag=10, dim=-1, var_cutoff=0.95, kinetic_map=True, commute_map=False, stride=1,
force_eigenvalues_le_one=False, mean=None, remove_mean=True, skip=0):
r""" Time-lagged independent component analysis (TICA).
@@ -1035,6 +1035,10 @@ def tica(data=None, lag=10, dim=-1, var_cutoff=0.95, kinetic_map=True, stride=1,
distances in the transformed data approximate kinetic distances [4]_.
This is a good choice when the data is further processed by clustering.
commute_map : bool, optional, default False
Eigenvector_i will be scaled by sqrt(timescale_i / 2). As a result, Euclidean distances in the transformed
data will approximate commute distances [5]_.
stride : int, optional, default = 1
If set to 1, all input data will be used for estimation. Note that this
could cause this calculation to be very slow for large data sets. Since
@@ -1145,17 +1149,19 @@ def tica(data=None, lag=10, dim=-1, var_cutoff=0.95, kinetic_map=True, stride=1,
Improvements in Markov State Model Construction Reveal Many Non-Native Interactions in the Folding of NTL9
J. Chem. Theory. Comput. 9, 2000-2009. doi:10.1021/ct300878a
.. [4] Noe, F. and C. Clementi. 2015.
Kinetic distance and kinetic maps from molecular dynamics simulation
(in preparation).
.. [4] Noe, F. and Clementi, C. 2015. Kinetic distance and kinetic maps from molecular dynamics simulation.
J. Chem. Theory. Comput. doi:10.1021/acs.jctc.5b00553
.. [5] Noe, F., Banisch, R., Clementi, C. 2016. Commute maps: separating slowly-mixing molecular configurations
for kinetic modeling. J. Chem. Theory. Comput. doi:10.1021/acs.jctc.6b00762
"""
from pyemma.coordinates.transform.tica import TICA
if mean is not None:
import warnings
warnings.warn("user provided mean for TICA is deprecated and its value is ignored.")

res = TICA(lag, dim=dim, var_cutoff=var_cutoff, kinetic_map=kinetic_map,
res = TICA(lag, dim=dim, var_cutoff=var_cutoff, kinetic_map=kinetic_map, commute_map=commute_map,
mean=mean, remove_mean=remove_mean, skip=skip)
return _param_stage(data, res, stride=stride)

11 changes: 11 additions & 0 deletions pyemma/coordinates/data/_base/datasource.py
@@ -671,6 +671,14 @@ def next(self):
(not self.return_traj_index and len(X) == 0) or (self.return_traj_index and len(X[1]) == 0)
):
X = self._it_next()
if config.coordinates_check_output:
array = X if not self.return_traj_index else X[1]
if not np.all(np.isfinite(array)):
# determine position
start = self.pos
msg = "Found invalid values in chunk in trajectory index {itraj} at chunk [{start}, {stop}]" \
.format(itraj=self.current_trajindex, start=start, stop=start+len(array))
raise InvalidDataInStreamException(msg)
return X

def __iter__(self):
@@ -683,3 +691,6 @@ def __exit__(self, exc_type, exc_val, exc_tb):
self.close()
return False


class InvalidDataInStreamException(Exception):
"""Data stream contained NaN or (+/-) infinity"""
21 changes: 21 additions & 0 deletions pyemma/coordinates/tests/test_coordinates_iterator.py
@@ -2,6 +2,7 @@
import numpy as np

from pyemma.coordinates.data import DataInMemory
from pyemma.util.contexts import settings
from pyemma.util.files import TemporaryDirectory
import os
from glob import glob
@@ -153,5 +154,25 @@ def test_write_to_csv_propagate_filenames(self):
for a, e in zip(actual, expected):
np.testing.assert_allclose(a, e)

def test_invalid_data_in_input_nan(self):
self.d[0][-1] = np.nan
r = DataInMemory(self.d)
it = r.iterator()
from pyemma.coordinates.data._base.datasource import InvalidDataInStreamException
with settings(coordinates_check_output=True):
with self.assertRaises(InvalidDataInStreamException):
for itraj, X in it:
pass

def test_invalid_data_in_input_inf(self):
self.d[1][-1] = np.inf
r = DataInMemory(self.d, chunksize=5)
it = r.iterator()
from pyemma.coordinates.data._base.datasource import InvalidDataInStreamException
with settings(coordinates_check_output=True):
with self.assertRaises(InvalidDataInStreamException) as cm:
for itraj, X in it:
pass

if __name__ == '__main__':
unittest.main()
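
The two tests above exercise the new `coordinates_check_output` option by corrupting one frame and expecting the iterator to raise. The same idea can be sketched without PyEMMA. This is a minimal illustration, assuming plain Python lists instead of NumPy arrays; `iter_chunks_checked` and `InvalidDataError` are hypothetical names, not part of the PyEMMA API:

```python
import math


class InvalidDataError(Exception):
    """Hypothetical stand-in for InvalidDataInStreamException."""


def iter_chunks_checked(traj, chunksize, check_output=True):
    """Yield (start, chunk) pairs; raise if a chunk holds NaN or +/-inf.

    `traj` is a list of frames, each frame a list of floats.
    """
    for start in range(0, len(traj), chunksize):
        chunk = traj[start:start + chunksize]
        if check_output and not all(
            math.isfinite(v) for frame in chunk for v in frame
        ):
            stop = start + len(chunk)
            raise InvalidDataError(
                "Found invalid values in chunk [{}, {}]".format(start, stop)
            )
        yield start, chunk


# Ten frames of two coordinates each, with the last frame corrupted.
traj = [[float(i), float(i) + 0.5] for i in range(10)]
traj[-1][0] = float("nan")

try:
    for start, chunk in iter_chunks_checked(traj, chunksize=5):
        pass
    failed_at = None
except InvalidDataError as exc:
    failed_at = str(exc)
```

Because the check runs per chunk, the error message narrows the faulty value down to a frame range, which is the debugging aid the changelog entry describes. With `check_output=False` the iterator passes the data through unchanged.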
26 changes: 20 additions & 6 deletions pyemma/coordinates/transform/tica.py
@@ -58,7 +58,7 @@ def _lazy_estimation(func, *args, **kw):
class TICA(StreamingTransformer):
r""" Time-lagged independent component analysis (TICA)"""

def __init__(self, lag, dim=-1, var_cutoff=0.95, kinetic_map=True, epsilon=1e-6,
def __init__(self, lag, dim=-1, var_cutoff=0.95, kinetic_map=True, commute_map=False, epsilon=1e-6,
mean=None, stride=1, remove_mean=True, skip=0):
r""" Time-lagged independent component analysis (TICA) [1]_, [2]_, [3]_.
@@ -77,6 +77,9 @@ def __init__(self, lag, dim=-1, var_cutoff=0.95, kinetic_map=True, epsilon=1e-6,
kinetic_map : bool, optional, default True
Eigenvectors will be scaled by eigenvalues. As a result, Euclidean distances in the transformed data
approximate kinetic distances [4]_. This is a good choice when the data is further processed by clustering.
commute_map : bool, optional, default False
Eigenvector_i will be scaled by sqrt(timescale_i / 2). As a result, Euclidean distances in the transformed
data will approximate commute distances [5]_.
epsilon : float
eigenvalue norm cutoff. Eigenvalues of C0 with norms <= epsilon will be
cut off. The remaining number of eigenvalues define the size
@@ -122,23 +125,25 @@ def __init__(self, lag, dim=-1, var_cutoff=0.95, kinetic_map=True, epsilon=1e-6,
.. [3] L. Molgedey and H. G. Schuster. 1994.
Separation of a mixture of independent signals using time delayed correlations
Phys. Rev. Lett. 72, 3634.
.. [4] Noe, F. and C. Clementi. 2015.
Kinetic distance and kinetic maps from molecular dynamics simulation
http://arxiv.org/abs/1506.06259
.. [4] Noe, F. and Clementi, C. 2015. Kinetic distance and kinetic maps from molecular dynamics simulation.
J. Chem. Theory. Comput. doi:10.1021/acs.jctc.5b00553
.. [5] Noe, F., Banisch, R., Clementi, C. 2016. Commute maps: separating slowly-mixing molecular configurations
for kinetic modeling. J. Chem. Theory. Comput. doi:10.1021/acs.jctc.6b00762
"""
default_var_cutoff = get_default_args(self.__init__)['var_cutoff']
if dim != -1 and var_cutoff != default_var_cutoff:
raise ValueError('Trying to set both the number of dimension and the subspace variance. Use either or.')

if kinetic_map and commute_map:
raise ValueError('Trying to use both kinetic_map and commute_map. Use either or.')
super(TICA, self).__init__()

if dim > -1:
var_cutoff = 1.0

# empty dummy model instance
self._model = TICAModel()
self.set_params(lag=lag, dim=dim, var_cutoff=var_cutoff, kinetic_map=kinetic_map,
self.set_params(lag=lag, dim=dim, var_cutoff=var_cutoff, kinetic_map=kinetic_map, commute_map=commute_map,
epsilon=epsilon, mean=mean, stride=stride, remove_mean=remove_mean, skip=skip)

@property
@@ -339,8 +344,17 @@ def _transform_array(self, X):
"""
X_meanfree = X - self.mean
Y = np.dot(X_meanfree, self.eigenvectors[:, 0:self.dimension()])
if self.kinetic_map and self.commute_map:
raise ValueError('Trying to use both kinetic_map and commute_map. Use either or.')
if self.kinetic_map: # scale by eigenvalues
Y *= self.eigenvalues[0:self.dimension()]
if self.commute_map: # scale by (regularized) timescales
timescales = self.timescales[0:self.dimension()]

# dampen timescales smaller than the lag time, as in section 2.5 of ref. [5]
regularized_timescales = 0.5 * timescales * np.tanh(np.pi * ((timescales - self.lag) / self.lag) + 1)

Y *= np.sqrt(regularized_timescales / 2)
return Y.astype(self.output_type())

@property
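
The scaling logic added to `_transform_array` can be followed in isolation. Below is a minimal sketch using only the standard library; `commute_map_scale` and `component_scales` are hypothetical helper names, and clamping the regularized timescale at zero is an assumption added here (the diff does not clamp) so that timescales well below the lag time do not put a negative value under the square root:

```python
import math


def commute_map_scale(timescale, lag):
    """Scaling factor for one TICA component under the commute map.

    Mirrors the expression in the diff:
        regularized = 0.5 * t * tanh(pi * ((t - lag) / lag) + 1)
        scale = sqrt(regularized / 2)
    The max(..., 0.0) clamp is an assumption of this sketch.
    """
    regularized = 0.5 * timescale * math.tanh(
        math.pi * ((timescale - lag) / lag) + 1
    )
    return math.sqrt(max(regularized, 0.0) / 2)


def component_scales(eigenvalues, timescales, lag,
                     kinetic_map=True, commute_map=False):
    """Return per-component scaling factors; the two maps are exclusive."""
    if kinetic_map and commute_map:
        raise ValueError("Use either kinetic_map or commute_map, not both.")
    if kinetic_map:
        return list(eigenvalues)  # scale by eigenvalues
    if commute_map:
        return [commute_map_scale(t, lag) for t in timescales]
    return [1.0] * len(eigenvalues)  # no scaling


lag = 10
eigenvalues = [0.99, 0.5, 0.1]
# Implied timescales follow t_i = -lag / ln|lambda_i|.
timescales = [-lag / math.log(abs(ev)) for ev in eigenvalues]
scales = component_scales(eigenvalues, timescales, lag,
                          kinetic_map=False, commute_map=True)
```

Slow components (timescale much larger than the lag) are scaled by roughly `sqrt(timescale / 4)`, while components faster than the lag are damped toward zero, so Euclidean distances in the scaled space are dominated by the slow processes.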
20 changes: 7 additions & 13 deletions pyemma/msm/api.py
@@ -1198,7 +1198,7 @@ def bayesian_hidden_markov_model(dtrajs, nstates, lag, nsamples=100, reversible=
def tpt(msmobj, A, B):
r""" A->B reactive flux from transition path theory (TPT)
The returned :class:`ReactiveFlux <msmtools.flux.ReactiveFlux>` object
The returned :class:`ReactiveFlux <pyemma.msm.models.ReactiveFlux>` object
can be used to extract various quantities of the flux, as well as to
compute A -> B transition pathways, their weights, and to coarse-grain
the flux onto sets of states.
@@ -1214,29 +1214,29 @@ def tpt(msmobj, A, B):
Returns
-------
tptobj : :class:`ReactiveFlux <pyemma.msm.reactive_flux.ReactiveFlux>` object
tptobj : :class:`ReactiveFlux <pyemma.msm.models.ReactiveFlux>` object
An object containing the reactive A->B flux network
and several additional quantities, such as the stationary probability,
committors and set definitions.
See also
--------
:class:`ReactiveFlux <pyemma.msm.reactive_flux.ReactiveFlux>`
:class:`ReactiveFlux <pyemma.msm.models.ReactiveFlux>`
Reactive Flux model
.. autoclass:: pyemma.msm.reactive_flux.ReactiveFlux
.. autoclass:: pyemma.msm.models.ReactiveFlux
:members:
:undoc-members:
.. rubric:: Methods
.. autoautosummary:: pyemma.msm.reactive_flux.ReactiveFlux
.. autoautosummary:: pyemma.msm.models.ReactiveFlux
:methods:
.. rubric:: Attributes
.. autoautosummary:: pyemma.msm.reactive_flux.ReactiveFlux
.. autoautosummary:: pyemma.msm.models.ReactiveFlux
:attributes:
References
@@ -1282,13 +1282,6 @@ def tpt(msmobj, A, B):
By default (False), T is a transition matrix.
If set to True, T is a rate matrix.
Returns
-------
tpt: msmtools.flux.ReactiveFlux object
A python object containing the reactive A->B flux network
and several additional quantities, such as stationary probability,
committors and set definitions.
Notes
-----
The central object used in transition path theory is
@@ -1330,6 +1323,7 @@ def tpt(msmobj, A, B):
raise ValueError('set A or B defines more states, than given transition matrix.')

# forward committor
#msmobj.
qplus = msmana.committor(T, A, B, forward=True)
# backward committor
if msmana.is_reversible(T, mu=mu):