Here's the basic parser with instructions on installation and usage
shtratos committed Jun 28, 2019
1 parent 8c9472a commit d1e6f4d
Showing 8 changed files with 454 additions and 1 deletion.
5 changes: 5 additions & 0 deletions .gitignore
@@ -102,3 +102,8 @@ venv.bak/

# mypy
.mypy_cache/

.idea
text_payslips/*.txt
/payslips-month-columns.csv
/payslips-month-rows.csv
13 changes: 13 additions & 0 deletions Pipfile
@@ -0,0 +1,13 @@
[[source]]
url = "https://pypi.org/simple"
verify_ssl = true
name = "pypi"

[packages]
"pdfminer.six" = "*"
chardet = "==3.0.4"

[dev-packages]

[requires]
python_version = "3.7"
83 changes: 83 additions & 0 deletions Pipfile.lock

Some generated files are not rendered by default.

74 changes: 73 additions & 1 deletion README.md
@@ -1,2 +1,74 @@
# ms-uk-payslip-parser
Parser for payslips

Simple parser for payslips issued by MS UK.

Converts a series of your PDF payslips into a neat CSV table.

## Installation

- Install Python 3.7+ and Virtualenv (the `mkvirtualenv` and `workon` commands used below come from virtualenvwrapper)
- Install dependencies:
```
# create a virtualenv
mkvirtualenv payslip-parser
# switch to virtualenv
workon payslip-parser
# install dependencies
pip3 install -r requirements.txt
```
- Or if you have `pipenv` installed:
```bash
pipenv install
```

## Usage

1. Download your payslip PDF files from the portal and put them in a directory,
e.g. `~/payslips`.

2. Get into your virtualenv:

```bash
workon payslip-parser
```

or, if you have `pipenv`:

```bash
pipenv shell
```

3. First, convert PDF files to text:

```bash
python3 to_text.py ~/payslips ./text_payslips
```

You should now see text files with your payslip contents in the `text_payslips` directory.

4. Now you can parse the text files and produce CSV tables:

```bash
python3 parser.py ./text_payslips
```

After this you will see two CSV files in this directory:
- `payslips-month-columns.csv` - each month's data is in a separate column
- `payslips-month-rows.csv` - each month's data is in a separate row

Every payslip item label has a short prefix identifying its payslip section:
- `.m` - metadata item
- `.d.p` - payments data item
- `.d.d` - deductions data item
- `.d.t` - totals data item
- `.d.et` - employer totals data item
- `.d.ytd` - year-to-date data item

5. Open either CSV file in your spreadsheet editor of choice or load it with Pandas.
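
For a quick look without a spreadsheet or Pandas, the standard library `csv` module is enough. The sketch below groups column labels by the section prefixes listed in step 4. It assumes that the month-per-row file has the item labels in its header row and that each label literally starts with its prefix. The sample labels and values are made up for illustration, and `section_of` and `group_columns` are hypothetical helpers, not part of `parser.py`.

```python
import csv
import io

# Section prefixes from the list in step 4, mapped to readable section names.
SECTIONS = {
    ".m": "metadata",
    ".d.p": "payments",
    ".d.d": "deductions",
    ".d.t": "totals",
    ".d.et": "employer totals",
    ".d.ytd": "year-to-date",
}


def section_of(label):
    """Return the section name for an item label, or None if no prefix matches."""
    # Check longer prefixes first so a short one never shadows a longer one.
    for prefix in sorted(SECTIONS, key=len, reverse=True):
        if label.startswith(prefix):
            return SECTIONS[prefix]
    return None


def group_columns(fieldnames):
    """Group CSV column labels by payslip section."""
    groups = {}
    for label in fieldnames:
        groups.setdefault(section_of(label), []).append(label)
    return groups


# Made-up sample in the month-per-row layout; real labels and values will differ.
SAMPLE = """\
.m Pay Date,.d.p Salary,.d.d Tax,.d.t Net Pay
2019-05-31,1000.00,200.00,800.00
2019-06-30,1000.00,210.00,790.00
"""

# To read the real file instead, open it with:
#   open("payslips-month-rows.csv", newline="")
reader = csv.DictReader(io.StringIO(SAMPLE))
columns_by_section = group_columns(reader.fieldnames)
rows = list(reader)

for section, labels in columns_by_section.items():
    print(section, labels)
```

Labels that match none of the prefixes end up under the `None` key, which makes an unexpected header easy to spot.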

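The two CSV layouts from step 4 can be thought of as transposes of each other. This is an assumption based on the descriptions above, not something checked against `parser.py`, and the table below is made up. A minimal sketch of the idea:

```python
def transpose(table):
    """Swap rows and columns of a rectangular list-of-lists table."""
    return [list(column) for column in zip(*table)]


# Made-up month-per-column table: one row per item, one column per month.
month_columns = [
    ["item", "2019-05", "2019-06"],
    ["Salary", "1000.00", "1000.00"],
    ["Tax", "200.00", "210.00"],
]

# Month-per-row view: one row per month, one column per item.
month_rows = transpose(month_columns)
for row in month_rows:
    print(row)
```

The month-per-row layout is usually the easier one to filter or chart, since each column then holds a single kind of value.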

## Feedback

Create an issue if you encounter a problem or have a suggestion,
or ping me on Teams.
