Commit
Here's the basic parser with instructions on installation and usage
Showing 8 changed files with 454 additions and 1 deletion.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -102,3 +102,8 @@ venv.bak/ | |
|
||
# mypy | ||
.mypy_cache/ | ||
|
||
.idea | ||
text_payslips/*.txt | ||
/payslips-month-columns.csv | ||
/payslips-month-rows.csv |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,13 @@ | ||
[[source]] | ||
url = "https://pypi.org/simple" | ||
verify_ssl = true | ||
name = "pypi" | ||
|
||
[packages] | ||
"pdfminer.six" = "*" | ||
chardet = "==3.0.4" | ||
|
||
[dev-packages] | ||
|
||
[requires] | ||
python_version = "3.7" |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,2 +1,74 @@ | ||
# ms-uk-payslip-parser | ||
Parser for payslips | ||
|
||
Simple parser for payslips issued by MS UK. | ||
|
||
Converts a series of your PDF payslips into a neat CSV table. | ||
|
||
## Installation | ||
|
||
- Install Python 3.7+ and virtualenvwrapper (it provides the `mkvirtualenv` and `workon` commands used below) | ||
- Install dependencies | ||
```bash | ||
# create a virtualenv | ||
mkvirtualenv payslip-parser | ||
# switch to virtualenv | ||
workon payslip-parser | ||
# install dependencies | ||
pip3 install -r requirements.txt | ||
``` | ||
- Or if you have `pipenv` installed: | ||
```bash | ||
pipenv install | ||
``` | ||
|
||
## Usage | ||
|
||
1. Download your payslip PDF files from the portal and put them in a directory, | ||
e.g. `~/payslips` | ||
|
||
2. Get into your virtualenv: | ||
|
||
```bash | ||
workon payslip-parser | ||
``` | ||
|
||
or, if you use `pipenv`: | ||
|
||
```bash | ||
pipenv shell | ||
``` | ||
|
||
3. Convert the PDF files to text: | ||
|
||
```bash | ||
python3 to_text.py ~/payslips ./text_payslips | ||
``` | ||
|
||
You should now see text files with your payslip contents in the `text_payslips` directory. | ||
|
||
4. Parse the text files to produce the CSV tables: | ||
|
||
```bash | ||
python3 parser.py ./text_payslips | ||
``` | ||
|
||
After this, you will see two CSV files in the current directory: | ||
- `payslips-month-columns.csv` - each month's data is in a separate column | ||
- `payslips-month-rows.csv` - each month's data is in a separate row | ||
|
||
Every payslip item label has a short prefix identifying its payslip section: | ||
- `.m` - metadata item | ||
- `.d.p` - payments data item | ||
- `.d.d` - deductions data item | ||
- `.d.t` - totals data item | ||
- `.d.et` - employer totals data item | ||
- `.d.ytd` - year-to-date data item | ||
|
||
5. Open either CSV file in your spreadsheet editor of choice or in pandas. | ||
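
If you would rather inspect the output from plain Python, the sketch below reads `payslips-month-rows.csv` with the standard `csv` module and groups its columns by the section prefixes listed above. It is only an illustration: it assumes that the CSV header row holds the prefixed item labels and that each label starts with its prefix. The helper names (`section_of`, `group_columns`, `load_month_rows`) are hypothetical and are not part of `parser.py`.

```python
import csv
from collections import defaultdict

# Section prefixes as documented above, mapped to readable names.
SECTION_PREFIXES = {
    ".m": "metadata",
    ".d.p": "payments",
    ".d.d": "deductions",
    ".d.t": "totals",
    ".d.et": "employer totals",
    ".d.ytd": "year-to-date",
}


def section_of(label):
    """Return the section name for a prefixed label, or None if no prefix matches."""
    # Try longer prefixes first so a short prefix never shadows a longer one.
    for prefix in sorted(SECTION_PREFIXES, key=len, reverse=True):
        if label.startswith(prefix):
            return SECTION_PREFIXES[prefix]
    return None


def group_columns(fieldnames):
    """Group column names by payslip section (assumes headers are prefixed labels)."""
    groups = defaultdict(list)
    for name in fieldnames:
        groups[section_of(name)].append(name)
    return dict(groups)


def load_month_rows(path="payslips-month-rows.csv"):
    """Read the month-per-row CSV into a list of dicts, one per payslip."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))
```

For example, `group_columns(load_month_rows()[0].keys())` would return a dict that maps each section name to the columns belonging to it. Columns without a recognised prefix end up under the key `None`.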
|
||
|
||
## Feedback | ||
|
||
Create an issue if you encounter a problem or have a suggestion, | ||
or ping me on Teams. | ||
|