Merge pull request #1 from analysiscenter/release

Initial public release
analysiscenter · Nov 22, 2017 · 75d6689
2 parents d654f72 + fd116eb
Showing 69 changed files with 16,459 additions and 3 deletions.
diff --git a/.gitattributes b/.gitattributes
@@ -0,0 +1,6 @@
+# Set the default behavior, in case people don't have core.autocrlf set.
+* text=auto
+
+# Explicitly declare text files you want to always be normalized on checkout.
+*.py text
+*.sh text
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1,6 @@
+*.pyc
+.cache/*
+__pycache__
+__pycache__/*
+*/__pycache__/*
+.ipynb_checkpoints
diff --git a/.gitmodules b/.gitmodules
@@ -0,0 +1,3 @@
+[submodule "cardio/dataset"]
+	path = cardio/dataset
+	url = https://github.com/analysiscenter/dataset.git
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -0,0 +1,38 @@
+- Before performing any operations on repositories, every user must configure their name and email address:
+```bash
+git config --global user.name "Firstname Lastnameov"
+git config --global user.email [email protected]
+```
+The email **must match** an email address listed in your GitHub account (an account may have several email addresses).
+
+- The root directory of every repository must contain a README.md file with a brief description of the project and the source code structure, installation instructions, and links to the documentation.
+
+- It is recommended to place all substantive files in subdirectories and to keep in the root only descriptive files (README.md, INSTALL.md, etc.),
+installation files (setup.py, requirements.txt, etc.), and configuration and make files.
+
+- File names must contain only Latin letters. Spaces in file names are not allowed.
+
+- Commits to the `master` branch are not allowed. It must be protected against deletion and history rewriting
+(Settings - Branches - Protected branches).
+
+- It is recommended to change source code and repository files only as part of tasks (issues).
+For each change, the assignee opens a separate branch named <iTASK-ID>-<short branch name> (for example, `i15-dataset` or `i22-HMM`).
+
+- Several branches may be created in one repository for a single task.
+
+- If you do not have a task, it makes sense to open one and explicitly register it in issues.
+
+- It is recommended to commit to working branches regularly, so that each commit contains changes that are not too large,
+yet complete and independent of everything else in the repository
+(it is better to commit 3 changed lines than 300 at once).
+
+- A commit must have a one-line comment in English (20-60 characters long)
+that describes the source code and file changes it includes.
+
+- A more detailed description of changes should be kept in the HISTORY.md file in the root directory of the repository.
+
+- Having completed the task and finished all changes, the assignee opens a pull request to merge the working branch into the production branch (for example, master).
+
+- Before merging, the working branch must not be behind the production branch (this can be checked with `git status`). To ensure this, synchronize the working branch first (`git pull`).
+
+- After merging, the working branch is deleted.
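
Reviewer note: the branch-naming and commit-message rules in CONTRIBUTING.md could be checked mechanically. The sketch below is not part of this commit. The helper names are hypothetical, the set of characters allowed in the short branch name is an assumption, and "English" is approximated as "ASCII only", which the guidelines do not state.

```python
import re

# Hypothetical pattern for "<iTASK-ID>-<short branch name>", e.g. "i15-dataset".
# Assumption: the short name may contain Latin letters, digits, "_" and "-".
BRANCH_RE = re.compile(r"^i\d+-[A-Za-z0-9_-]+$")


def is_valid_branch_name(name):
    """Return True if the branch name looks like i<TASK-ID>-<short name>."""
    return BRANCH_RE.match(name) is not None


def is_valid_commit_message(message):
    """Return True for a one-line, 20-60 character, ASCII-only message."""
    if "\n" in message:
        return False  # the guidelines ask for a single-line comment
    if not 20 <= len(message) <= 60:
        return False
    # ASCII-only is a rough stand-in for "English"; it is an assumption.
    return all(ord(ch) < 128 for ch in message)
```

Such a check could run in a local git hook or a CI step. Where to wire it in is left open here.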
diff --git a/LICENSE b/LICENSE
@@ -178,15 +178,15 @@
    APPENDIX: How to apply the Apache License to your work.
 
       To apply the Apache License to your work, attach the following
-      boilerplate notice, with the fields enclosed by brackets "[]"
+      boilerplate notice, with the fields enclosed by brackets "{}"
       replaced with your own identifying information. (Don't include
       the brackets!)  The text should be enclosed in the appropriate
       comment syntax for the file format. We also recommend that a
       file or class name and description of purpose be included on the
       same "printed page" as the copyright notice for easier
       identification within third-party archives.
 
-   Copyright [yyyy] [name of copyright owner]
+   Copyright {yyyy} {name of copyright owner}
 
    Licensed under the Apache License, Version 2.0 (the "License");
    you may not use this file except in compliance with the License.

diff --git a/MANIFEST.in b/MANIFEST.in
@@ -0,0 +1,16 @@
+include MANIFEST.in
+include LICENSE
+include README.md
+include setup.py
+
+recursive-include cardio *
+recursive-include docs *
+recursive-include tutorials *
+recursive-exclude docs/_build *
+
+global-exclude *.pyc *.pyo *.pyd
+global-exclude *.git
+global-exclude *.so
+global-exclude *~
+global-exclude \#*
+global-exclude .DS_Store
diff --git a/README.md b/README.md
@@ -1 +1,97 @@
-# cardio
+# CardIO
+
+CardIO is a library that works with electrocardiograms (ECG). With CardIO you can
+
+* load and save signals in various formats
+* resample, crop, filter and flip signals
+* detect PQ, QT, QRS segments
+* calculate heart rate and other standard ECG characteristics
+* apply complex transformations like FFT and wavelets, or any other custom functions
+* recognize heart diseases from ECG
+* efficiently work with large datasets that do not even fit into memory
+* easily arrange new custom actions into pipelines
+* do end-to-end ECG processing
+* build, train and test custom models for deep research
+
+… and do everything under a single API.
+
+For more details see [the documentation and tutorials](https://analysiscenter.github.io/cardio/).
+
+## About CardIO
+
+The library is based on [Dataset](https://github.com/analysiscenter/dataset/). We suggest reading Dataset's [documentation](https://analysiscenter.github.io/dataset/) to learn more.
+
+CardIO has three modules: [```batch```](https://analysiscenter.github.io/cardio/intro/batch.html), [```models```](https://analysiscenter.github.io/cardio/intro/models.html) and [```pipelines```](https://analysiscenter.github.io/cardio/intro/pipeline.html).
+
+Module ```batch``` contains low-level actions for ECG processing.
+Actions are included in the ```EcgBatch``` class, which also defines how
+to store ECGs. From these actions you can build new pipelines. You can also
+write a custom action and include it in ```EcgBatch```.
+
+In ```models``` we provide several models designed to address the most important problems in ECG analysis:
+* how to recognize specific features of ECG like R-peaks, P-wave, T-wave
+* how to recognize heart diseases from ECG, for example, atrial fibrillation.
+
+Module ```pipelines``` contains high-level methods that
+* train a model to detect PQ, QT, QRS segments
+* calculate heart rate
+* train a model to find probabilities of heart diseases.
+
+Under the hood these methods contain many actions that load signals, filter them and do complex calculations. With pipelines you do not need to think about this part of the work: you simply pass in ECG datasets and get results.
+
+## Basic usage
+
+Here is an example of a pipeline that loads ECG signals, preprocesses them and trains a model for 50 epochs.
+```python
+train_ppl = (
+    dtst.train
+        .pipeline
+        .init_model("dynamic", DirichletModel, name="dirichlet",
+                    config=model_config)
+        .init_variable("loss_history", init=list)
+        .load(components=["signal", "meta"], fmt="wfdb")
+        .load(components="target", fmt="csv", src=LABELS_PATH)
+        .drop_labels(["~"])
+        .replace_labels({"N": "NO", "O": "NO"})
+        .flip_signals()
+        .random_resample_signals("normal", loc=300, scale=10)
+        .random_split_signals(2048, {"A": 9, "NO": 3})
+        .binarize_labels()
+        .train_model("dirichlet", make_data=make_data, fetches="loss", save_to=V("loss_history"), mode="a")
+        .run(batch_size=100, shuffle=True, drop_last=True, n_epochs=50)
+)
+```
+
+As a result of this pipeline one obtains a trained model.
+
+## Installation
+
+> The `CardIO` module is in the beta stage. Your suggestions and improvements are very welcome.
+
+> `CardIO` supports Python 3.5 or higher.
+
+### Installation as a Python package
+
+With [pipenv](https://docs.pipenv.org/):
+
+    pipenv install git+https://github.com/analysiscenter/cardio.git#egg=cardio
+
+With [pip](https://pip.pypa.io/en/stable/):
+
+    pip3 install git+https://github.com/analysiscenter/cardio.git
+
+After that just import `cardio`:
+```python
+import cardio
+```
+
+### Installation as a project repository
+
+    git clone --recursive https://github.com/analysiscenter/cardio.git
+
+The `--recursive` flag is used to clone submodules.
+
+## Citing CardIO
+Please cite CardIO in your publications if it helps your research.
+
+    Khudorozhkov R., Illarionov E., Kuvaev A., Podvyaznikov D. CardIO library for data science research of heart signals. 2017.
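
Reviewer note: the README's "Basic usage" example chains actions and only executes them on `.run(...)`. The toy sketch below illustrates that deferred, chained style in plain Python. It is not CardIO's or Dataset's implementation. `LazyPipeline`, `add`, `flip` and `crop` are hypothetical names invented for this illustration, and the signals are plain lists rather than ECG batches.

```python
class LazyPipeline:
    """Records actions when chained and applies them only when run() is called."""

    def __init__(self, batches):
        self._batches = batches
        self._actions = []

    def add(self, func, *args, **kwargs):
        """Queue an action; nothing is executed yet."""
        self._actions.append((func, args, kwargs))
        return self  # returning self is what makes chaining possible

    def run(self):
        """Apply every queued action, in order, to every batch."""
        results = []
        for batch in self._batches:
            for func, args, kwargs in self._actions:
                batch = func(batch, *args, **kwargs)
            results.append(batch)
        return results


def flip(signal):
    """Toy action: invert the sign of every sample."""
    return [-x for x in signal]


def crop(signal, length):
    """Toy action: keep the first `length` samples."""
    return signal[:length]


signals = [[1, -2, 3, 4], [0, 5, -6, 7]]
pipeline = LazyPipeline(signals).add(flip).add(crop, length=2)
processed = pipeline.run()
print(processed)  # [[-1, 2], [0, -5]]
```

Because actions are only recorded until `run()` is called, the same pipeline definition can be applied one batch at a time. That is one way a library could handle datasets that do not fit into memory, though whether CardIO works this way is not shown in this commit.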
diff --git a/cardio/__init__.py b/cardio/__init__.py
@@ -0,0 +1,8 @@
+""" ECG package """
+import sys
+
+from .batch import *  # pylint: disable=wildcard-import
+from . import dataset  # pylint: disable=wildcard-import
+
+
+__version__ = '0.1.0'
diff --git a/cardio/batch/__init__.py b/cardio/batch/__init__.py
@@ -0,0 +1,3 @@
+""" ECG Batch """
+from .ecg_batch import EcgBatch
+from .ecg_dataset import EcgDataset