Feature/add info multimedia (#11)
* Add additional info to a dataset required for pipelines ingestion for eg: multimedia ext #10

* Update readme and dependencies

* update version

* update beta version

* resolve pandas warning message and lint messages

* resolve pandas warning message and lint messages

* resolve lint message

* Rewording ReadMe

* Some fixes

* add more test cases for mimetype url and invalid urls

* fix flake8 pep8 message

---------

Co-authored-by: Mahmoud <[email protected]>
patkyn and sadeghim authored Aug 12, 2024
1 parent 4f8af59 commit 784de57
Showing 20 changed files with 582 additions and 202 deletions.
11 changes: 11 additions & 0 deletions .flake8
@@ -0,0 +1,11 @@
+[flake8]
+max-line-length = 120
+max-complexity = 19
+select = C,E,F,W,B,B950
+ignore = E126,E203,E501,W503,W504
+exclude =
+.git,
+__pycache__,
+*.egg-info,
+.pytest_cache,
+.mypy_cache
2 changes: 1 addition & 1 deletion .github/workflows/publish-release.yml
@@ -26,7 +26,7 @@ jobs:
 # stop the build if there are Python syntax errors or undefined names
 flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
 # exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
-flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
+flake8 . --count --exit-zero --statistics
 - name: Install poetry and project dependencies
 run: |
 python -m pip install poetry
2 changes: 1 addition & 1 deletion .github/workflows/publish-test.yml
@@ -24,7 +24,7 @@ jobs:
 # stop the build if there are Python syntax errors or undefined names
 flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
 # exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
-flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
+flake8 . --count --exit-zero --statistics
 - name: Install poetry and project dependencies
 run: |
 python -m pip install poetry
2 changes: 1 addition & 1 deletion .github/workflows/run-tests.yml
@@ -39,7 +39,7 @@ jobs:
 # stop the build if there are Python syntax errors or undefined names
 flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
 # exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
-flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
+flake8 . --count --exit-zero --statistics
 - name: Install project dependencies
 run: |
 python -m pip install poetry
31 changes: 20 additions & 11 deletions README.md
@@ -43,25 +43,32 @@ poetry build
 &nbsp;
 ### Installation

-To use locally built package in a virtual environment for eg in preingestion or galaxias:
+Install published package
+```bash
+pip install dwcahandler
+```
+
+To use locally built package in a virtual environment:
 ```bash
 pip install <folder>/dwcahandler/dist/dwcahandler-<version>.tar.gz
 ```

-However, to install published package from testpypi
+To install published package from testpypi
 ```bash
 pip install -i https://test.pypi.org/simple/ dwcahandler
 ```
 &nbsp;
 ### Examples of dwcahandler usages:

 * Create Darwin Core Archive from csv file
+* In creating a dwca with multimedia extension, provide format and type values in the Simple Multimedia extension, otherwise, dwcahandler will attempt to fill these info by guessing the mimetype from url or extracting content type of the url which will slow down the creation of dwca depending on how large the dataset is.
+
 ```python
 from dwcahandler import CsvFileType
 from dwcahandler import DwcaHandler
 from dwcahandler import Eml

-core_csv = CsvFileType(files=['/tmp/occurrence.csv'], type='occurrence', keys='occurrenceID')
+core_csv = CsvFileType(files=['/tmp/occurrence.csv'], type='occurrence', keys=['occurrenceID'])
 ext_csvs = [CsvFileType(files=['/tmp/multimedia.csv'], type='multimedia')]

 eml = Eml(dataset_name='Test Dataset',
@@ -74,6 +81,7 @@ DwcaHandler.create_dwca(core_csv=core_csv, ext_csv_list=ext_csvs, eml_content=em
 ```
 &nbsp;
 * Create Darwin Core Archive from pandas dataframe
+
 ```python
 from dwcahandler import DwcaHandler
 from dwcahandler.dwca import DataFrameType
@@ -93,6 +101,7 @@ eml = Eml(dataset_name='Test Dataset',
 rights="test rights")

 DwcaHandler.create_dwca(core_csv=core_frame, ext_csv_list=ext_frame, eml_content=eml, output_dwca_path='/tmp/dwca.zip')
+
 ```
 &nbsp;
 * Merge Darwin Core Archive
@@ -109,7 +118,7 @@ DwcaHandler.merge_dwca(dwca_file='/tmp/dwca.zip', delta_dwca_file='/tmp/delta-dw
 from dwcahandler import CsvFileType
 from dwcahandler import DwcaHandler

-delete_csv = CsvFileType(files=['/tmp/old-records.csv'], type='occurrence', keys='occurrenceID')
+delete_csv = CsvFileType(files=['/tmp/old-records.csv'], type='occurrence', keys=['occurrenceID'])

 DwcaHandler.delete_records(dwca_file='/tmp/dwca.zip',
 records_to_delete=delete_csv,
@@ -118,7 +127,7 @@ DwcaHandler.delete_records(dwca_file='/tmp/dwca.zip',
 &nbsp;
 * List darwin core terms that is supported in dwcahandler package
 ```python
-from dwca import DwcaHandler
+from dwcahandler import DwcaHandler

 df = DwcaHandler.list_dwc_terms()
 print(df)
@@ -132,7 +141,7 @@ class DerivedDwca(Dwca):
 """
 Derived class to perform other custom operations that is not included as part of the core operations
 """
-def _drop_columns(self):
+def drop_columns(self):
 """
 Drop existing column in the core content
 """
@@ -141,10 +150,10 @@ class DerivedDwca(Dwca):


 dwca = DerivedDwca(dwca_file_loc='/tmp/dwca.zip')
-dwca._extract_dwca()
-dwca._drop_columns()
-dwca._generate_eml()
-dwca._generate_meta()
-dwca._write_dwca('/tmp/newdwca.zip')
+dwca.extract_dwca()
+dwca.drop_columns()
+dwca.generate_eml()
+dwca.generate_meta()
+dwca.write_dwca('/tmp/newdwca.zip')

 ```
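
Note on the multimedia bullet added to the README: it says that when `format` and `type` are missing, dwcahandler tries to guess the mimetype from the URL, or falls back to the URL's content type, which is slow on large datasets. The sketch below illustrates only the cheap first step (guessing from the URL's file extension) using the Python standard library. It is not dwcahandler's implementation: the function name `guess_format_and_type` and the mapping from MIME top-level type to DCMI Type terms are assumptions made for illustration, and the network fallback is left out on purpose.

```python
import mimetypes
from urllib.parse import urlparse

# Assumed mapping from MIME top-level type to a DCMI Type term (illustrative only).
DCMI_TYPE_BY_MIME_PREFIX = {
    "image": "StillImage",
    "video": "MovingImage",
    "audio": "Sound",
}


def guess_format_and_type(url):
    """Guess (format, type) from a URL's file extension without any network call.

    Returns (None, None) when the URL is not a valid http(s) URL or when the
    extension is unknown.
    """
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return None, None  # invalid URL: nothing to guess from

    # Guess from the path only, so a query string does not hide the extension.
    mime, _encoding = mimetypes.guess_type(parsed.path)
    if mime is None:
        return None, None

    top_level = mime.split("/")[0]
    return mime, DCMI_TYPE_BY_MIME_PREFIX.get(top_level)


print(guess_format_and_type("https://example.org/photos/bird.jpg"))
print(guess_format_and_type("https://example.org/photos/bird"))
print(guess_format_and_type("not a url"))
```

Rows for which this step returns `(None, None)` are the ones that would need the slower content-type lookup described in the README, which is why supplying `format` and `type` up front avoids that cost.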