Here's the basic parser with instructions on installation and usage

shtratos · Jun 28, 2019 · commit d1e6f4d
1 parent 8c9472a

Showing 8 changed files with 454 additions and 1 deletion.
diff --git a/.gitignore b/.gitignore
@@ -102,3 +102,8 @@ venv.bak/
 
 # mypy
 .mypy_cache/
+
+.idea
+text_payslips/*.txt
+/payslips-month-columns.csv
+/payslips-month-rows.csv
diff --git a/Pipfile b/Pipfile
@@ -0,0 +1,13 @@
+[[source]]
+url = "https://pypi.org/simple"
+verify_ssl = true
+name = "pypi"
+
+[packages]
+"pdfminer.six" = "*"
+chardet = "==3.0.4"
+
+[dev-packages]
+
+[requires]
+python_version = "3.7"
diff --git a/Pipfile.lock b/Pipfile.lock
diff --git a/README.md b/README.md
@@ -1,2 +1,74 @@
 # ms-uk-payslip-parser
-Parser for payslips
+
+Simple parser for payslips issued by MS UK.
+
+Converts a series of your PDF payslips into a neat CSV table. 
+
+## Installation
+
+- Install Python 3.7+ and virtualenvwrapper (it provides the `mkvirtualenv` and `workon` commands used below)
+- Install dependencies
+```bash
+# create a virtualenv
+mkvirtualenv payslip-parser
+# switch to virtualenv
+workon payslip-parser
+# install dependencies
+pip3 install -r requirements.txt
+```
+- Or if you have `pipenv` installed:
+```bash
+pipenv install
+```
+
+## Usage
+
+1. Download your payslip PDF files from the portal and put them in a directory
+   e.g. `~/payslips`
+
+2. Get into your virtualenv:
+
+    ```bash
+    workon payslip-parser
+    ```
+
+    or if you have `pipenv`
+
+    ```bash
+    pipenv shell
+    ```
+
+3. First, convert PDF files to text:
+
+    ```bash
+    python3 to_text.py ~/payslips ./text_payslips
+    ``` 
+
+    You should now see text files with your payslip contents in the `text_payslips` directory.
+
+4. Now you can parse the text files and produce CSV tables:
+
+    ```bash
+    python3 parser.py ./text_payslips
+    ``` 
+
+   After this you will see two CSV files in this directory:
+   - `payslips-month-columns.csv` - each month's data is in a separate column
+   - `payslips-month-rows.csv` - each month's data is in a separate row
+
+   Every payslip item label has a short prefix identifying its payslip section:
+   - `.m` - metadata item
+   - `.d.p` - payments data item
+   - `.d.d` - deductions data item
+   - `.d.t` - totals data item
+   - `.d.et` - employer totals data item
+   - `.d.ytd` - year-to-date data item
+
+5. Open the CSV file in your spreadsheet editor of choice or Pandas.
+
+
+## Feedback
+
+Create an issue if you encounter a problem or have a suggestion.
+Or ping me on Teams.
+
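
The section prefixes listed in the README suggest a simple way to split the CSV output by payslip section. The following is a minimal sketch, not part of this commit. It assumes that the header row of `payslips-month-rows.csv` holds the item labels, that each later row is one month, and that every label starts with one of the documented prefixes. The helper names `section_of` and `group_by_section` and the sample labels are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Section prefixes exactly as listed in the README.
SECTION_PREFIXES = {
    ".m": "metadata",
    ".d.p": "payments",
    ".d.d": "deductions",
    ".d.t": "totals",
    ".d.et": "employer totals",
    ".d.ytd": "year-to-date",
}


def section_of(label):
    """Return the section name for an item label, or None if no prefix matches."""
    for prefix, section in SECTION_PREFIXES.items():
        if label.startswith(prefix):
            return section
    return None


def group_by_section(csv_text):
    """Group each month's values by payslip section.

    ASSUMPTION: the first CSV row contains the item labels and every
    following row holds one month's values (the "month-rows" layout).
    """
    months = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        grouped = defaultdict(dict)
        for label, value in row.items():
            section = section_of(label)
            if section is not None:
                grouped[section][label] = value
        months.append(dict(grouped))
    return months


# Hypothetical sample data; real labels depend on your payslips.
SAMPLE = ".m.Pay Date,.d.p.Salary,.d.d.Tax\n2019-05-31,1000.00,200.00\n"
first_month = group_by_section(SAMPLE)[0]
print(first_month["payments"])
```

To try this on real output, read the file with `open("payslips-month-rows.csv", newline="")` and pass its contents to `group_by_section`. If the actual labels are formatted differently, only `section_of` should need adjusting.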