Merge pull request #17 from camelot-dev/rules-manager

[MRG] Add rules manager
camelot-dev · Nov 12, 2018 · 9e26ea1
2 parents 0eb63a4 + 75a60eb

Showing 12 changed files with 198 additions and 157 deletions.
diff --git a/README.md b/README.md
@@ -36,7 +36,7 @@ That's it! Now you can go to http://localhost:5000 and extract data tables from
 
 - **Excalibur gives you complete control over your data**. All file storage and processing happens on your own local or remote machine.
 - Excalibur can be configured with **MySQL and Celery** for parallel and distributed workloads. By default, sqlite and multiprocessing are used for sequential workloads.
-- You can save table extraction [rules](https://excalibur-py.readthedocs.io/en/master/user/concepts.html#rule) as **presets** and apply them on different PDFs to extract tables with similar structures. (*in v0.3.0*)
+- You can save table extraction [rules](https://excalibur-py.readthedocs.io/en/master/user/concepts.html#rule) as **presets** and apply them on different PDFs to extract tables with similar structures.
 - You can extract tables from **multiple PDFs in one go** using an extraction rule by starting [jobs](https://excalibur-py.readthedocs.io/en/master/user/concepts.html#job). (*in v0.4.0*)
 
 Excalibur uses [Camelot](https://camelot-py.readthedocs.io/) under the hood. You can check out its [comparison with other PDF table extraction libraries and tools](https://github.com/socialcopsdev/camelot/wiki/Comparison-with-other-PDF-Table-Extraction-libraries-and-tools).

diff --git a/docs/index.rst b/docs/index.rst
@@ -55,7 +55,7 @@ Why Excalibur?
 
 - **Excalibur gives you complete control over your data**. All file storage and processing happens on your own local or remote machine.
 - Excalibur can be configured with **MySQL and Celery** for parallel and distributed workloads. By default, sqlite and multiprocessing are used for sequential workloads.
-- You can save table extraction :ref:`rules <concepts>` as **presets** and apply them on different PDFs to extract tables with similar structures. (*in v0.3.0*)
+- You can save table extraction :ref:`rules <concepts>` as **presets** and apply them on different PDFs to extract tables with similar structures.
 - You can extract tables from **multiple PDFs in one go** using an extraction rule by starting :ref:`jobs <concepts>`. (*in v0.4.0*)
 
 Excalibur uses `Camelot <https://camelot-py.readthedocs.io/>`_ under the hood. You can check out its `comparison with other PDF table extraction libraries and tools`_.

diff --git a/docs/user/concepts.rst b/docs/user/concepts.rst
@@ -36,14 +36,18 @@ You can check out Camelot's `read_pdf`_ documentation to see a list of all confi
 
 Inside Excalibur, a rule can be specified by selecting a flavor and its corresponding options in the rule box on the workspace (as shown on the right).
 
-From *v0.2.0*, you will be able to give each rule a name and save them as a preset for use on different PDFs to extract tables with similar structures.
+When you create an extraction rule and start an extraction job, the rule is saved as a preset that you can reuse on future PDFs with the same table structure as the one you created it on. A saved rule can be loaded on the workspace by selecting it from the "Saved Rules" dropdown.
+
+.. image:: ../_static/gifs/saved-rule.gif
+    :scale: 65%
+    :align: center
 
 Job
 ---
 
 When you create a rule and apply it on a PDF, a table extraction job is created.
 
-From *v0.2.0*, you will be able to apply a rule on multiple PDFs at once.
+From *v0.4.0*, you will be able to apply a rule on multiple PDFs at once.
 
 ----
 

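The rule and preset idea described above can be sketched in Python. This is a minimal illustration under stated assumptions, not Excalibur's actual storage code: the preset dictionary, its field names (`name`, `flavor`, `options`) and the helpers `save_preset` and `read_pdf_kwargs` are hypothetical. Only the keyword names `flavor`, `pages`, `table_areas` and `columns` come from Camelot's `read_pdf`; the Camelot call itself is left as a comment so the sketch runs without Camelot installed.

```python
import json

# Hypothetical preset: a named rule made of a Camelot flavor plus its options.
# Field names are illustrative only, not Excalibur's real schema.
preset = {
    "name": "monthly-report",
    "flavor": "stream",
    "options": {
        "table_areas": ["316,499,566,337"],
        "columns": ["72,95,209,327,442,529,566,606,683"],
    },
}


def save_preset(rule):
    """Serialize a rule to a JSON string, e.g. for a database text column."""
    return json.dumps(rule, sort_keys=True)


def read_pdf_kwargs(rule_json, pages="1"):
    """Turn a stored rule back into keyword arguments shaped like camelot.read_pdf's."""
    rule = json.loads(rule_json)
    kwargs = {"flavor": rule["flavor"], "pages": pages}
    kwargs.update(rule["options"])
    return kwargs


stored = save_preset(preset)

# The same stored rule is reused for several PDFs with a similar table layout.
for filepath in ["january.pdf", "february.pdf"]:
    kwargs = read_pdf_kwargs(stored)
    print(filepath, kwargs["flavor"], kwargs["table_areas"])
    # With Camelot installed: tables = camelot.read_pdf(filepath, **kwargs)
```

Keeping the rule as plain JSON makes it independent of any one PDF, which is what allows a single preset to be applied to many files.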
diff --git a/docs/user/howto.rst b/docs/user/howto.rst
@@ -1,71 +1,80 @@
-.. _howto:
-
-How-to Guides
-=============
-
-Excalibur's architecture is heavily inspired from Airflow, so you may get a feeling of déjà vu while reading this page of the documentation. `Airflow LICENSE`_.
-
-.. _Airflow LICENSE: https://github.com/apache/incubator-airflow/blob/master/LICENSE
-
-Setting Configuration Options
------------------------------
-
-The first time you run Excalibur, it will create a file called ``excalibur.cfg`` in your ``$EXCALIBUR_HOME`` directory (``~/excalibur`` by default). This file contains Excalibur’s configuration and you can edit it to change any of the settings.
-
-For example, the metadata database connection string can be set in ``excalibur.cfg`` like this::
-
-    [core]
-    sql_alchemy_conn = my_conn_string
-
-Using the MySQL Database Backend
---------------------------------
-
-Excalibur uses SqlAlchemy to connect to a database backend. By default, stores all metadata in a sqlite database. To use MySQL, you need to first install MySQL and then create a database and a user.
-
-Installing MySQL
-^^^^^^^^^^^^^^^^
-
-To use the MySQL database backend, you need to install Excalibur using::
-
-    $ pip install excalibur-py[mysql]
-
-You can install MySQL using your system's package manager. For Ubuntu::
-
-    $ sudo apt update
-    $ sudo apt install mysql-server libmysqlclient-dev
-
-And then set it up using::
-
-    $ mysql_secure_installation
-
-Setup
-^^^^^
-
-Now you can create the a database and a user for Excalibur::
-
-    > CREATE DATABASE excalibur CHARACTER SET utf8 COLLATE utf8_unicode_ci;
-    > grant all on excalibur.* TO 'excalibur'@'%' IDENTIFIED BY '1234';
-
-Finally, you need to change the ``sql_alchemy_conn`` in ``excalibur.cfg`` to::
-
-    [core]
-    sql_alchemy_conn = mysql://excalibur:1234@localhost:3306/excalibur
-
-And initialize the metadata database using::
-
-    $ excalibur initdb
-
-Scaling Out with Celery
------------------------
-
-``CeleryExecutor`` is one of the ways you can scale out the number of workers. For this to work, you need to setup a Celery backend (RabbitMQ, Redis, …) and change your excalibur.cfg to point the executor parameter to ``CeleryExecutor`` and provide the related Celery settings.
-
-For more information about setting up a Celery broker, refer to the exhaustive `Celery documentation on the topic`_.
-
-.. _Celery documentation on the topic: http://docs.celeryproject.org/en/latest/getting-started/brokers/index.html
-
-To kick off a worker, you need to setup Excalibur and kick off the worker subcommand::
-
-    $ excalibur worker
-
-Your worker should start picking up tasks as soon as they get fired in its direction.
+.. _howto:
+
+How-to Guides
+=============
+
+Excalibur's architecture is heavily inspired by Airflow, so you may get a feeling of déjà vu while reading this page of the documentation (see the `Airflow LICENSE`_).
+
+.. _Airflow LICENSE: https://github.com/apache/incubator-airflow/blob/master/LICENSE
+
+Setting Configuration Options
+-----------------------------
+
+The first time you run Excalibur, it will create a file called ``excalibur.cfg`` in your ``$EXCALIBUR_HOME`` directory (``~/excalibur`` by default). This file contains Excalibur’s configuration and you can edit it to change any of the settings.
+
+For example, the metadata database connection string can be set in ``excalibur.cfg`` like this::
+
+    [core]
+    sql_alchemy_conn = my_conn_string
+
+Resetting the Metadata Database
+-------------------------------
+
+.. warning:: The following command will wipe your Excalibur metadata database, removing all information about uploaded files, saved extraction rules and finished/in-progress jobs.
+
+You can reset the metadata database using::
+
+    $ excalibur resetdb
+
+Using the MySQL Database Backend
+--------------------------------
+
+Excalibur uses SQLAlchemy to connect to a database backend. By default, it stores all metadata in a SQLite database. To use MySQL, you need to first install MySQL and then create a database and a user.
+
+Installing MySQL
+^^^^^^^^^^^^^^^^
+
+To use the MySQL database backend, you need to install Excalibur using::
+
+    $ pip install excalibur-py[mysql]
+
+You can install MySQL using your system's package manager. For Ubuntu::
+
+    $ sudo apt update
+    $ sudo apt install mysql-server libmysqlclient-dev
+
+And then set it up using::
+
+    $ mysql_secure_installation
+
+Setup
+^^^^^
+
+Now you can create a database and a user for Excalibur::
+
+    > CREATE DATABASE excalibur CHARACTER SET utf8 COLLATE utf8_unicode_ci;
+    > grant all on excalibur.* TO 'excalibur'@'%' IDENTIFIED BY '1234';
+
+Finally, you need to change the ``sql_alchemy_conn`` in ``excalibur.cfg`` to::
+
+    [core]
+    sql_alchemy_conn = mysql://excalibur:1234@localhost:3306/excalibur
+
+And initialize the metadata database using::
+
+    $ excalibur initdb
+
+Scaling Out with Celery
+-----------------------
+
+``CeleryExecutor`` is one of the ways you can scale out the number of workers. For this to work, you need to set up a Celery backend (RabbitMQ, Redis, …) and change your ``excalibur.cfg`` to point the executor parameter to ``CeleryExecutor`` and provide the related Celery settings.
+
+For more information about setting up a Celery broker, refer to the exhaustive `Celery documentation on the topic`_.
+
+.. _Celery documentation on the topic: http://docs.celeryproject.org/en/latest/getting-started/brokers/index.html
+
+To kick off a worker, you need to set up Excalibur and run the ``worker`` subcommand::
+
+    $ excalibur worker
+
+Your worker should start picking up tasks as soon as they get fired in its direction.
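
The configuration steps above can be checked from Python. This sketch assumes `excalibur.cfg` is a standard INI file readable by `configparser` (the `[core]` section suggests so, but that is an assumption), and it parses an inline string instead of the real file. The helper `cfg_path` is hypothetical; it only mirrors the documented `$EXCALIBUR_HOME` default of `~/excalibur`.

```python
import configparser
import os
from urllib.parse import urlparse


def cfg_path():
    """Hypothetical helper: where excalibur.cfg is expected to live."""
    home = os.environ.get("EXCALIBUR_HOME", os.path.expanduser("~/excalibur"))
    return os.path.join(home, "excalibur.cfg")


# Inline stand-in for the real file; config.read(cfg_path()) would load it instead.
CFG_TEXT = """
[core]
sql_alchemy_conn = mysql://excalibur:1234@localhost:3306/excalibur
"""

config = configparser.ConfigParser()
config.read_string(CFG_TEXT)

# Split the SQLAlchemy-style connection URL into its parts to sanity-check it.
url = urlparse(config.get("core", "sql_alchemy_conn"))
print("config file:", cfg_path())
print("backend:", url.scheme)             # mysql
print("user:", url.username)              # excalibur
print("host:", url.hostname, url.port)    # localhost 3306
print("database:", url.path.lstrip("/"))  # excalibur
```

A mistyped port or database name shows up here before `excalibur initdb` tries to connect. When the file is read this way, `configparser` treats `%` as an interpolation character, so a password containing `%` would have to be written as `%%`.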
diff --git a/docs/user/usage.rst b/docs/user/usage.rst
@@ -53,7 +53,7 @@ Optionally, you can also select a column separator by clicking on "Add Separator
     :scale: 40%
     :align: center
 
-Finally, you can click on "Extract" to start a table extraction *job*.
+Finally, you can click on "Extract" to start a table extraction *job*. This also saves the extraction rule you created above as a preset, which you can reuse on future PDFs with the same table structure as the one you created the rule on.
 
 .. note:: The Lattice flavor for tables with lines doesn't have an "Add Separator" button. It also doesn't need a table area (though you can specify it) since it reliably detects table boundaries and column separators on its own. In most cases, you won't need to tweak any of its configuration options.
 

diff --git a/excalibur/__version__.py b/excalibur/__version__.py
@@ -1,6 +1,6 @@
 # -*- coding: utf-8 -*-
 
-VERSION = (0, 2, 1)
+VERSION = (0, 3, 0)
 
 __title__ = 'excalibur-py'
 __description__ = 'A web interface for Camelot (PDF Table Extraction for Humans).'

diff --git a/excalibur/www/app.py b/excalibur/www/app.py
@@ -1,12 +1,21 @@
+import json
+
 from flask import Flask, Blueprint
 from werkzeug.utils import find_modules, import_string
 
 from .. import configuration as conf
 from .views import views
 
 
+def to_pretty_json(value):
+    value = json.loads(value)
+    return json.dumps(value, sort_keys=True,
+                      indent=4, separators=(',', ': '))
+
+
 def create_app(config=None):
     app = Flask(__name__)
     app.config.from_object(conf)
     app.register_blueprint(views)
+    app.jinja_env.filters['pretty'] = to_pretty_json
     return app
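
The new `pretty` filter can be exercised on its own. The function below copies `to_pretty_json` from the diff, and the sample input is hypothetical. It shows that the filter expects a JSON-encoded string rather than a Python object, because it calls `json.loads` before re-serializing with sorted keys and 4-space indentation.

```python
import json


def to_pretty_json(value):
    """Same logic as the filter added in excalibur/www/app.py.

    `value` must be a JSON string: it is parsed first, then re-serialized
    with sorted keys and 4-space indentation.
    """
    value = json.loads(value)
    return json.dumps(value, sort_keys=True,
                      indent=4, separators=(',', ': '))


# Hypothetical rule options, stored as a JSON string.
raw = '{"pages": "1", "flavor": "lattice"}'
print(to_pretty_json(raw))
# {
#     "flavor": "lattice",
#     "pages": "1"
# }

# An already-decoded dict fails, because json.loads only accepts str/bytes.
try:
    to_pretty_json({"flavor": "lattice"})
except TypeError as exc:
    print("TypeError:", exc)
```

Once registered through `app.jinja_env.filters['pretty']`, a template can apply it with Jinja's pipe syntax, for example `{{ some_json_string | pretty }}` (the variable name is hypothetical); wrapping the output in a `<pre>` element keeps the indentation visible in HTML. On Python 3 the explicit `separators=(',', ': ')` matches the default used whenever `indent` is set; it mainly matters on Python 2, where the default item separator `', '` leaves trailing spaces at line ends.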