Skip to content

Commit

Permalink
Update examples
Browse files Browse the repository at this point in the history
  • Loading branch information
danielvdende committed May 4, 2019
1 parent 51e386f commit 6e4bb8e
Show file tree
Hide file tree
Showing 3 changed files with 44 additions and 40 deletions.
72 changes: 38 additions & 34 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,13 +2,13 @@
[![PyPI version](https://badge.fury.io/py/opulent-pandas.svg)](https://badge.fury.io/py/opulent-pandas)
# Opulent-Pandas
Opulent-Pandas is a schema validation packages aimed specifically at validating the schema of pandas dataframes.
It takes heavy inspiration from [voluptuous](), and tries to stay as close as possible to the API defined in this package. Opulent-Pandas
is different from voluptuous in that it heavily relies on [Pandas]() to perform the validation. This makes Opulent-Pandas considerably faster
It takes heavy inspiration from [voluptuous](https://github.com/alecthomas/voluptuous), and tries to stay as close as possible to the API defined in this package. Opulent-Pandas
is different from voluptuous in that it heavily relies on [Pandas](https://pandas.pydata.org/) to perform the validation. This makes Opulent-Pandas considerably faster
than voluptuous on larger datasets. It does, however, mean that the input format is also a Pandas DataFrame, rather than a dict (as is the case for voluptuous)
A performance comparison of voluptuous and Opulent-Pandas can be found [here]()
A performance comparison of voluptuous and Opulent-Pandas will be added to this readme soon!

## Example
Defining a schema in Opulent-Pandas is very similar to how you would in [voluptuous](). To make the similarities and differences clear, let's walk through the same example as is done in the voluptuous readme.
Defining a schema in Opulent-Pandas is very similar to how you would in voluptuous. To make the similarities and differences clear, let's walk through the same example as is done in the voluptuous readme.

Twitter's [user search API](https://dev.twitter.com/rest/reference/get/users/search) accepts
query URLs like:
Expand All @@ -28,25 +28,29 @@ To validate this we might use a schema like:
... })

```
Comparing with voluptuous, you'll notice a few things:
1. The validators per field are always specified as a list.
Comparing with voluptuous, you'll notice that the validators per field are always specified as a list. Other than that,
it's very similar to how you would define the schema with voluptuous

If we look at the more complex schema, as defined in the readme of voluptuous, we see very similar schemas:

```pycon
>>> from opulent_pandas.validator import Required, RangeValidator, TypeValidator, ValueLengthValidator
>>> schema = Schema({
... Required('q'): [TypeValidator(str), ValueLengthValidator(min_length=1)),
... Required('per_page'): [TypeValidator(int), RangeValidator(min=1, max=20)),
... Required('page'): [TypeValidator(int), RangeValidator(min=0)),
... Required('q'): [TypeValidator(str), ValueLengthValidator(min_length=1)],
... Required('per_page'): [TypeValidator(int), RangeValidator(min=1, max=20)],
... Required('page'): [TypeValidator(int), RangeValidator(min=0)],
... })

```

If you pass data in that does not satisfy the requirements specified in your Opulent-Pandas schema, you'll get a corresponding error message:
One difference between Opulent-Pandas and voluptuous is that Opulent-Pandas has a `validate` function that can be used
to validate a given data structure rather tha voluptuous' approach of passing the data directly to your schema as a parameter.

There are 3 required fields:
If you pass data in that does not satisfy the requirements specified in your Opulent-Pandas schema, you'll get a corresponding error message. Walking
through the examples provided in the voluptuous readme:

There are 3 required fields:
TODO: this example should also tell you which columns are missing. Seems to be a bug.
```pycon
>>> from opulent_pandas import MissingColumnError
>>> try:
Expand All @@ -62,47 +66,49 @@ True
`q` must be a string:

```pycon
>>> from opulent_pandas import InvalidTypeError
>>> try:
... schema.validate(pd.DataFrame({'q': [123], 'per_page':[10], 'page': [1]})
... raise AssertionError('InvalidTypeError not raised')
... except InvalidTypeError as e:
... exc = e
>>> str(exc) == "Invalid data type found. Required: <class 'str'>"
>>> str(exc) == "Invalid data type found for column: q. Required: <class 'str'>"
True

```

...and must be at least one character in length:

```pycon
>>> from opulent_pandas import ValueLengthError
>>> try:
... schema({'q': ''})
... raise AssertionError('MultipleInvalid not raised')
... except MultipleInvalid as e:
... schema.validate(pd.DataFrame({'q': [''], 'per_page': 5, 'page': 12}))
... raise AssertionError('ValueLengthError not raised')
... except ValueLengthError as e:
... exc = e
>>> str(exc) == "length of value must be at least 1 for dictionary value @ data['q']"
True
>>> schema({'q': '#topic'}) == {'q': '#topic', 'per_page': 5}
>>> str(exc) == "Value found with length smaller than enforced minimum length for column: q. Minimum Length: 1"
True

```

"per\_page" is a positive integer no greater than 20:

```pycon
>>> from opulent_pandas import RangeError
>>> try:
... schema({'q': '#topic', 'per_page': 900})
... raise AssertionError('MultipleInvalid not raised')
... except MultipleInvalid as e:
... exc = e
>>> str(exc) == "value must be at most 20 for dictionary value @ data['per_page']"
... schema.validate(pd.DataFrame({'q': ['#topic'], 'per_page': [900], 'page': [12]}))
... raise AssertionError('RangeError not raised')
... except RangeError as e:
... exc = e
>>> str(exc) == "Value found larger than enforced maximum for column: per_page. Required maximum: 20"
True

>>> try:
... schema({'q': '#topic', 'per_page': -10})
... raise AssertionError('MultipleInvalid not raised')
... except MultipleInvalid as e:
... exc = e
>>> str(exc) == "value must be at least 1 for dictionary value @ data['per_page']"
... schema.validate(pd.DataFrame({'q': ['#topic'], 'per_page': [-10], 'page': [12]}))
... raise AssertionError('RangeError not raised')
... except RangeError as e:
... exc = e
>>> str(exc) == "Value found larger than enforced minimum for column: per_page. Required minimum: 1"
True

```
Expand All @@ -111,13 +117,11 @@ True

```pycon
>>> try:
... schema({'q': '#topic', 'per_page': 'one'})
... raise AssertionError('MultipleInvalid not raised')
... except MultipleInvalid as e:
... schema.validate(pd.DataFrame({'q': ['#topic'], 'per_page': ['one']})
... raise AssertionError('InvalidTypeError not raised')
... except InvalidTypeError as e:
... exc = e
>>> str(exc)
"expected int for dictionary value @ data['per_page']"
>>> schema({'q': '#topic', 'page': 1}) == {'q': '#topic', 'page': 1, 'per_page': 5}
>>> str(exc) == "Invalid data type found for column: page. Required type: <class 'int'>"
True

```
6 changes: 3 additions & 3 deletions opulent_pandas/validator.py
Original file line number Diff line number Diff line change
Expand Up @@ -62,7 +62,7 @@ def __init__(self, valid_type: type):
def validate(self, df_column: pd.Series):
if not (df_column.apply(type) == self.valid_type).all():
raise InvalidTypeError(
f"Invalid data type found for column: {df_column.name}."
f"Invalid data type found for column: {df_column.name}. "
f"Required type: {self.valid_type}"
)

Expand All @@ -80,13 +80,13 @@ def validate(self, df_column: pd.Series):
if self.min:
if not (df_column >= self.min).all():
raise RangeError(
f"Value found smaller than enforced minimum for column: {df_column.name}."
f"Value found smaller than enforced minimum for column: {df_column.name}. "
f"Required minimum: {self.min}"
)
if self.max:
if not (df_column <= self.max).all():
raise RangeError(
f"Value found larger than enforced maximum for column: {df_column.name}."
f"Value found larger than enforced maximum for column: {df_column.name}. "
f"Required maximum: {self.max}"
)

Expand Down
6 changes: 3 additions & 3 deletions tests/test_schema.py
Original file line number Diff line number Diff line change
Expand Up @@ -29,7 +29,7 @@ def test_invalid_type_error(self):
schema.validate(self.data)
self.assertEqual(
exc_info.exception.msg,
"Invalid data type found for column: foo.Required type: " "<class 'str'>",
"Invalid data type found for column: foo. Required type: " "<class 'str'>",
)

def test_error_thrown_for_missing_columns(self):
Expand Down Expand Up @@ -57,7 +57,7 @@ def test_validation_exceeds_lower_range(self):
schema.validate(self.data)
self.assertEqual(
exc_info.exception.msg,
"Value found smaller than enforced minimum for column: baz."
"Value found smaller than enforced minimum for column: baz. "
"Required minimum: 8",
)

Expand All @@ -74,7 +74,7 @@ def test_validation_exceeds_upper_range(self):
schema.validate(self.data)
self.assertEqual(
exc_info.exception.msg,
"Value found larger than enforced maximum for column: baz."
"Value found larger than enforced maximum for column: baz. "
"Required maximum: 8",
)

Expand Down

0 comments on commit 6e4bb8e

Please sign in to comment.