57 commits
419c5a0
Update AUTHORS
WladimirSidorenko Mar 10, 2016
76331e1
travis.yml added
WladimirSidorenko Mar 12, 2016
4cad960
Merge branch 'master' of github.com:WladimirSidorenko/DiscourseSegmenter
WladimirSidorenko Mar 12, 2016
70f90a7
travis.yml added
WladimirSidorenko Mar 12, 2016
f885b70
travis.yml corrected
WladimirSidorenko Mar 12, 2016
61f5571
travis.yml corrected (scipy put higher in the reqirements)
WladimirSidorenko Mar 12, 2016
a296d57
updated requirements in setup.py
WladimirSidorenko Mar 14, 2016
6cf9e10
updated travis.yml
WladimirSidorenko Mar 14, 2016
0e076e3
pip added to dependencies
WladimirSidorenko Mar 14, 2016
98e0b09
updated dependencies
WladimirSidorenko Mar 14, 2016
7099e55
.travis.yml updated
WladimirSidorenko Mar 14, 2016
3ff6139
.travis.yml and requirements updated
WladimirSidorenko Mar 14, 2016
fd2a3f4
requirements reordered
WladimirSidorenko Mar 14, 2016
fb8ef92
issue #3 (ValueError with edseg) fixed
WladimirSidorenko Apr 8, 2016
e7f157f
requirements updated
WladimirSidorenko Apr 9, 2016
48a1e7a
travis build updated
WladimirSidorenko Apr 9, 2016
768f3b7
travis build updated
WladimirSidorenko Apr 9, 2016
dc886c9
travis build updated
WladimirSidorenko Apr 9, 2016
0db0983
travis build updated
WladimirSidorenko Apr 9, 2016
2f61ce5
travis build updated
WladimirSidorenko Apr 9, 2016
24f1d4d
travis build updated
WladimirSidorenko Apr 9, 2016
ab26e14
travis build updated
WladimirSidorenko Apr 9, 2016
fec45b9
travis build updated
WladimirSidorenko Apr 9, 2016
e4266d0
fortran compiler added to travis
WladimirSidorenko Apr 9, 2016
4c5f0e9
travis recipes updaed
WladimirSidorenko Apr 9, 2016
e3c9beb
travis recipe updated
WladimirSidorenko Apr 9, 2016
62c15b7
one more attempt
WladimirSidorenko Apr 9, 2016
88eb22d
one more attempt to build using miniconda
WladimirSidorenko Apr 9, 2016
d907a4c
one more attempt to build using miniconda
WladimirSidorenko Apr 9, 2016
593357c
one more attempt to build using miniconda
WladimirSidorenko Apr 9, 2016
8e50eac
one more attempt to build using miniconda
WladimirSidorenko Apr 9, 2016
d7abf72
one more attempt to build using miniconda
WladimirSidorenko Apr 9, 2016
06fc866
one more attempt to build using miniconda
WladimirSidorenko Apr 9, 2016
d21c47a
one more attempt to build using miniconda
WladimirSidorenko Apr 9, 2016
b338af9
one more attempt to build using miniconda
WladimirSidorenko Apr 9, 2016
58fb0bf
one more attempt to build using miniconda
WladimirSidorenko Apr 9, 2016
d95b086
one more attempt to build using miniconda
WladimirSidorenko Apr 9, 2016
bc52c54
one more attempt to build using miniconda
WladimirSidorenko Apr 9, 2016
e94b586
minor change in setup
WladimirSidorenko Jun 6, 2016
adf83d1
slight improvements of formattig
WladimirSidorenko Jun 6, 2016
f27f8a3
PEP8 compatibility improved
WladimirSidorenko Jun 8, 2016
ce8a563
doc requirements updated
WladimirSidorenko Jun 8, 2016
1da9961
documentation updated
WladimirSidorenko Jun 8, 2016
650d16b
documentation updated (more RST files added)
WladimirSidorenko Jun 8, 2016
a50aeed
docstrings updated; PEP8 issues fixed
WladimirSidorenko Jun 10, 2016
fdea7fd
test requirements updated
WladimirSidorenko Oct 4, 2016
4f071db
travis recipe updated
WladimirSidorenko Oct 5, 2016
ac76db0
mateseg adjustments merged in
WladimirSidorenko Jan 6, 2017
db33e3f
cosmetic change
WladimirSidorenko Jan 6, 2017
35124fa
documentation updated
WladimirSidorenko Jan 6, 2017
5a66bbe
feat bug fixed
WladimirSidorenko Jan 6, 2017
0716ae8
crossval method removed
WladimirSidorenko Jan 6, 2017
f12efcd
version set to 0.2.1
WladimirSidorenko Jan 6, 2017
12e2cc8
issue #4 fixed
WladimirSidorenko Jan 25, 2017
748eaf9
project version set to 0.2.0
WladimirSidorenko Jun 30, 2017
0a50d2a
project version set to 0.2.1
WladimirSidorenko Jun 30, 2017
8e116fd
travis badge address changed to discourse-lab
WladimirSidorenko Jun 30, 2017
9 changes: 9 additions & 0 deletions .coveragerc
@@ -0,0 +1,9 @@
[run]
source = dsegmenter
omit =
    */python?.?/*
    */lib-python/?.?/*.py
    */lib_pypy/_*.py
    */site-packages/ordereddict.py
    */site-packages/nose/*
    */unittest2/*
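The `omit` entry above is a multi-line INI value holding glob patterns. As a rough illustration of how such a value can be read, here is a minimal sketch using only the standard library. The helper names `read_omit_patterns` and `is_omitted` are hypothetical, and `fnmatch` is only an approximation of the matching that coverage.py performs internally.

```python
import configparser
import fnmatch

# Contents of the .coveragerc from this change. Continuation lines of
# ``omit`` are indented so the INI parser reads them as one value.
COVERAGERC = """\
[run]
source = dsegmenter
omit =
    */python?.?/*
    */lib-python/?.?/*.py
    */lib_pypy/_*.py
    */site-packages/ordereddict.py
    */site-packages/nose/*
    */unittest2/*
"""


def read_omit_patterns(text):
    """Return the glob patterns listed under ``[run] omit`` (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    raw = parser.get("run", "omit")
    # The value spans several lines; keep only the non-empty ones.
    return [line.strip() for line in raw.splitlines() if line.strip()]


def is_omitted(path, patterns):
    """Approximate check: does ``path`` match any omit pattern?"""
    return any(fnmatch.fnmatchcase(path, pattern) for pattern in patterns)


patterns = read_omit_patterns(COVERAGERC)
```

With these patterns, interpreter and third-party paths are excluded from the report while files under `dsegmenter/` are still measured.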
1 change: 1 addition & 0 deletions .gitignore
@@ -56,3 +56,4 @@ docs/_build/
# PyBuilder
target/
MANIFEST
venv/
47 changes: 47 additions & 0 deletions .travis.yml
@@ -0,0 +1,47 @@
language: python

python:
- 2.7

git:
  depth: 3

branches:
  only:
    - master

notifications:
  email: false

# Setup anaconda
before_install:
- wget http://repo.continuum.io/miniconda/Miniconda-latest-Linux-x86_64.sh -O miniconda.sh
- chmod +x miniconda.sh
- ./miniconda.sh -b
- export PATH=/home/travis/miniconda2/bin:$PATH
- conda update --yes conda
- conda create --yes -n condaenv python=$TRAVIS_PYTHON_VERSION
- conda install --yes -n condaenv pip
- source activate condaenv
# The next couple lines fix a crash with multiprocessing on Travis
# and are not specific to using Miniconda
- sudo rm -rf /dev/shm
- sudo ln -s /run/shm /dev/shm

# Install packages
install:
- conda install --yes python=$TRAVIS_PYTHON_VERSION anaconda-client atlas numpy scipy
- conda install --yes python=$TRAVIS_PYTHON_VERSION scikit-learn
# - conda install --yes python=$TRAVIS_PYTHON_VERSION --file=requirements.txt
# Coverage packages are on my binstar channel
# - conda install --yes -c dan_blanchard python-coveralls nose-cov
- pip install -r requirements.txt
- pip install -r test-requirements.txt
- ./setup.py build install

# Run test
script:
- ./setup.py test

after_success:
- bash <(curl -s https://codecov.io/bash)
2 changes: 1 addition & 1 deletion AUTHORS
@@ -1,3 +1,3 @@
 Andreas Peldszus <peldszus AT uni DASH potsdam DOT de>
-Jean VanCoppenolle <vancoppenolle AT uni DASH potsdam DOT de>
 Wladimir Sidorenko (Uladzimir Sidarenka) <sidarenk AT uni DASH potsdam DOT de>
+Jean VanCoppenolle <vancoppenolle AT uni DASH potsdam DOT de>
2 changes: 1 addition & 1 deletion MANIFEST.in
@@ -3,7 +3,7 @@ include README.rst
 include requirements.txt
 include dsegmenter/bparseg/data/*.model
 include dsegmenter/bparseg/data/*.npy
-include dsegmenter/bparseg/data/mate.model
+include dsegmenter/mateseg/data/mate.model
 include dsegmenter/edseg/data/*.txt
 include dsegmenter/evaluation/alpha/*
 recursive-include examples *
34 changes: 21 additions & 13 deletions README.rst
@@ -2,6 +2,11 @@
Discourse Segmenter
===================

.. image:: https://travis-ci.org/discourse-lab/DiscourseSegmenter.svg?branch=master
   :alt: Build Status
   :align: right
   :target: https://travis-ci.org/discourse-lab/DiscourseSegmenter

.. image:: https://img.shields.io/badge/license-MIT-blue.svg
   :alt: MIT License
   :align: right
@@ -18,8 +23,8 @@ This python module currently comprises three discourse segmenters:

**edseg**
is a rule-based system that uses shallow discourse-oriented
parsing to determine boundaries of elementary discourse units in
text. The rules are hard-coded in the `submodule's file`_ and are
parsing to determine the boundaries of elementary discourse units.
The rules are hard-coded in the `submodule's file`_ and are
only applicable to German input.

**bparseg**
@@ -32,14 +37,11 @@ This python module currently comprises three discourse segmenters:
--help`` for further instructions on how to do that).

**mateseg**
is an ML-based segmentation module that operates on syntactic
dependency trees (output from Mate_) and decides whether a
sub-structure of the dependency graph initiates a discourse segment
or not using a pre-trained linear SVM model. Again, this model was
trained on the German PCC_ corpus.


*Since the current model is a serialized file and, therefore, likely to be incompatible with future releases of `numpy`, we will probably remove the model files from future versions of this package, including source data instead and performing training during the installation.*
is another ML-based segmentation module that operates on dependency
trees (output from MateParser_) and decides whether a sub-structure
of the dependency graph initiates a discourse segment or not using
a pre-trained linear SVM model. Again, this model was trained on
the German PCC_ corpus.


Installation
@@ -79,9 +81,15 @@ or, alternatively, also use the delivered front-end script

discourse_segmenter bparseg segment DiscourseSegmenter/examples/bpar/maz-8727.exb.bpar

or

.. code-block:: shell

discourse_segmenter mateseg segment DiscourseSegmenter/examples/conll/maz-8727.parsed.conll

Note that this script requires two mandatory arguments: the type of
the segmenter to use (`bparseg` in the above case) and the operation
to perform (which are specific to each segmenter).
the segmenter to use (`bparseg` or `mateseg` in the above cases) and the
operation to perform (which might be specific to each segmenter).


Evaluation
@@ -104,7 +112,7 @@ which requires Java 8.


.. _`Bitpar`: http://www.cis.uni-muenchen.de/~schmid/tools/BitPar/
.. _`Mate`: http://code.google.com/p/mate-tools/
.. _`MateParser`: http://code.google.com/p/mate-tools/
.. _`PCC`: http://www.lrec-conf.org/proceedings/lrec2014/pdf/579_Paper.pdf
.. _`here`: https://github.com/discourse-lab/DiscourseSegmenter/blob/master/scripts/discourse_segmenter
.. _`submodule's file`: https://github.com/discourse-lab/DiscourseSegmenter/blob/master/dsegmenter/edseg/clause_segmentation.py
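The README change above states that the front-end script takes two mandatory arguments: the segmenter type, then the operation. One way to model that shape is with nested subcommands. The following is a minimal sketch under stated assumptions, not the actual `discourse_segmenter` script: the parser layout, the `files` argument, and the restriction to the `bparseg` and `mateseg` segmenters with a single `segment` operation are illustrative choices taken from the usage examples shown in the diff.

```python
import argparse


def build_parser():
    """Build a two-level CLI: <segmenter> <operation> <files...> (hypothetical layout)."""
    parser = argparse.ArgumentParser(prog="discourse_segmenter")
    segmenters = parser.add_subparsers(dest="segmenter")
    segmenters.required = True  # first mandatory argument: segmenter type

    # Only the segmenters used in the README examples are modelled here.
    for name in ("bparseg", "mateseg"):
        seg_parser = segmenters.add_parser(name)
        operations = seg_parser.add_subparsers(dest="operation")
        operations.required = True  # second mandatory argument: operation
        segment = operations.add_parser("segment")
        segment.add_argument("files", nargs="+", help="input files to segment")
    return parser


args = build_parser().parse_args(
    ["mateseg", "segment", "examples/conll/maz-8727.parsed.conll"]
)
```

Keeping each segmenter's operations in its own subparser lets the available operations differ per segmenter, which matches the README's remark that operations might be segmenter-specific.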
4 changes: 4 additions & 0 deletions doc-requirements.txt
@@ -0,0 +1,4 @@
sphinx>=1.4.1
sphinxcontrib-napoleon>=0.4.4
sphinx-pypi-upload>=0.2.1
sphinx_rtd_theme>=0.1.9
6 changes: 3 additions & 3 deletions docs/conf.py
@@ -19,7 +19,7 @@
 # If extensions (or modules to document with autodoc) are in another directory,
 # add these directories to sys.path here. If the directory is relative to the
 # documentation root, use os.path.abspath to make it absolute, like shown here.
-sys.path.insert(0, os.path.abspath(os.path.join('..', 'dsegmenter')))
+sys.path.insert(0, os.path.abspath(os.path.join('dsegmenter')))

 # -- General configuration ------------------------------------------------

@@ -77,9 +77,9 @@
 # built documents.
 #
 # The short X.Y version.
-version = u'0.0.1.dev1'
+version = u'0.2.1'
 # The full version, including alpha/beta/rc tags.
-release = u'0.0.1.dev1'
+release = u'0.2.1'

 # The language for content autogenerated by Sphinx. Refer to documentation
 # for a list of supported languages.
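The `sys.path.insert` line changed in `docs/conf.py` relies on `os.path.abspath`, which resolves a relative path against the current working directory. A small sketch of the difference between the old and new arguments follows. The `/repo` checkout layout and the `resolve` helper are hypothetical, and which directory the documentation build actually runs from is an assumption the diff does not confirm.

```python
import os


def resolve(*parts, cwd):
    """Mimic os.path.abspath(os.path.join(*parts)) for a given working directory."""
    return os.path.normpath(os.path.join(cwd, os.path.join(*parts)))


# Hypothetical checkout layout: <root>/repo/docs/conf.py
repo_dir = os.path.join(os.sep, "repo")
docs_dir = os.path.join(repo_dir, "docs")

# Old line: os.path.join('..', 'dsegmenter'), evaluated from docs/
old_entry = resolve("..", "dsegmenter", cwd=docs_dir)
# New line: os.path.join('dsegmenter'), evaluated from docs/
new_entry_from_docs = resolve("dsegmenter", cwd=docs_dir)
# New line, evaluated from the repository root instead
new_entry_from_root = resolve("dsegmenter", cwd=repo_dir)
```

Under these assumptions the new line points at the package directory only when the build is started from the repository root, so it is worth checking which working directory the documentation build uses.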
6 changes: 6 additions & 0 deletions docs/dsegmenter.bparseg.align.rst
@@ -0,0 +1,6 @@
dsegmenter.bparseg.align
~~~~~~~~~~~~~~~~~~~~~~~~

.. automodule:: dsegmenter.bparseg.align
    :members:
    :special-members:
6 changes: 6 additions & 0 deletions docs/dsegmenter.bparseg.bparsegmenter.rst
@@ -0,0 +1,6 @@
dsegmenter.bparseg.bparsegmenter
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: dsegmenter.bparseg.bparsegmenter.BparSegmenter
    :members:
    :special-members:
6 changes: 6 additions & 0 deletions docs/dsegmenter.bparseg.constants.rst
@@ -0,0 +1,6 @@
dsegmenter.bparseg.constants
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. automodule:: dsegmenter.bparseg.constants
    :members:
    :special-members:
6 changes: 6 additions & 0 deletions docs/dsegmenter.bparseg.constituency_tree.rst
@@ -0,0 +1,6 @@
dsegmenter.bparseg.constituency_tree
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. automodule:: dsegmenter.bparseg.constituency_tree
    :members:
    :special-members:
10 changes: 8 additions & 2 deletions docs/dsegmenter.bparseg.rst
@@ -2,5 +2,11 @@ dsegmenter.bparseg
~~~~~~~~~~~~~~~~~~

.. automodule:: dsegmenter.bparseg
:members:
:undoc-members:

.. toctree::
:maxdepth: 2

dsegmenter.bparseg.align.rst
dsegmenter.bparseg.bparsegmenter.rst
dsegmenter.bparseg.constants.rst
dsegmenter.bparseg.constituency_tree.rst
6 changes: 6 additions & 0 deletions docs/dsegmenter.edseg.rst
@@ -0,0 +1,6 @@
dsegmenter.edseg
~~~~~~~~~~~~~~~~

.. automodule:: dsegmenter.edseg
    :members:
    :undoc-members:
6 changes: 6 additions & 0 deletions docs/dsegmenter.mateseg.rst
@@ -0,0 +1,6 @@
dsegmenter.mateseg
~~~~~~~~~~~~~~~~~~

.. automodule:: dsegmenter.mateseg
    :members:
    :undoc-members:
6 changes: 3 additions & 3 deletions docs/dsegmenter.rst
@@ -5,10 +5,10 @@ dsegmenter
----------

.. automodule:: dsegmenter
:members:
:undoc-members:

.. toctree::
:maxdepth: 2
:maxdepth: 3

dsegmenter.bparseg.rst
dsegmenter.edseg.rst
dsegmenter.mateseg.rst
13 changes: 7 additions & 6 deletions dsegmenter/__init__.py
@@ -3,19 +3,20 @@

##################################################################
# Documentation

"""Main meta-package containing a collection of discourse segmenters.

Attributes:
edseg (module):
common (module):
routines common to multiple subpackages
edseg (subpackage):
rule-based discourse segmenter for Mate dependency trees
treeseg (module):
treeseg (subpackage):
auxiliary segmenter routines used by syntax-driven segmenters
bparseg (module):
bparseg (subpackage):
machine-learning discourse segmenter for BitPar constituency trees
mateseg (module):
mateseg (subpackage):
machine-learning discourse segmenter for Mate dependency graphs
evaluation (module):
evaluation (subpackage):
metrics for evaluating discourse segmentation
__all__ (List[str]):
list of sub-modules exported by this package
16 changes: 10 additions & 6 deletions dsegmenter/bparseg/__init__.py
@@ -12,7 +12,7 @@
trees
bparsegmenter (module): class for segmenting syntax trees into discourse
units
__all__ (List[str]): list of sub-modules exported by this package
__all__ (list[str]): list of sub-modules exported by this package
__author__ (str): package's author
__email__ (str): email of package's author
__name__ (str): package's name
@@ -22,15 +22,19 @@

##################################################################
# Imports
from .constants import ENCODING, NO_PARSE_RE, WORD_SEP
from .bparsegmenter import BparSegmenter, read_trees, read_segments, trees2segs
from .constituency_tree import CTree
from __future__ import absolute_import

from dsegmenter.bparseg.constants import ENCODING, NO_PARSE_RE, WORD_SEP
from dsegmenter.bparseg.bparsegmenter import BparSegmenter, \
read_trees, read_tok_trees, trees2segs
from dsegmenter.bparseg.constituency_tree import CTree, OP, OP_RE, CP, CP_RE

##################################################################
# Initialization
__name__ = "bparseg"
__all__ = ["ENCODING", "NO_PARSE_RE", "WORD_SEP", "BparSegmenter", "CTree", \
"read_trees", "read_segments", "trees2segs"]
__all__ = ["ENCODING", "NO_PARSE_RE", "WORD_SEP", "BparSegmenter", "CTree",
"OP", "OP_RE", "CP", "CP_RE", "read_trees", "read_tok_trees",
"trees2segs"]
__author__ = "Uladzimir Sidarenka"
__email__ = "sidarenk at uni dash potsdam dot de"
__version__ = "0.0.1"
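The hunk above renames an imported function (`read_segments` becomes `read_tok_trees`) and updates `__all__` to match. Because `__all__` is a plain list of strings, a stale entry only fails when someone runs `from package import *`. A minimal sketch of a consistency check follows. The helper `missing_exports` and the stand-in module `stale_pkg` are hypothetical and are not part of `dsegmenter`.

```python
import types


def missing_exports(module):
    """Return names listed in ``module.__all__`` that the module does not define."""
    return [
        name
        for name in getattr(module, "__all__", [])
        if not hasattr(module, name)
    ]


# Stand-in module that simulates forgetting to update ``__all__``
# after a function was renamed.
stale_pkg = types.ModuleType("stale_pkg")
stale_pkg.read_trees = lambda: None
stale_pkg.read_tok_trees = lambda: None
stale_pkg.__all__ = ["read_trees", "read_segments"]

stale_names = missing_exports(stale_pkg)
```

A check of this kind could run in the test suite against each subpackage, so that the import list and `__all__` cannot drift apart unnoticed.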