Commit cab134f
refactored dataloaders and envs
1 parent 742fc34

10 files changed: +466 −787

docs/source/data.rst

+27 −37
@@ -33,43 +33,33 @@ dataset_functions
 What?
 +++++
 
-Chain of responsibility pattern:
-refactoring.guru/design-patterns/chain-of-responsibility/python/example
-
-RecNN is designed to work with your dataflow.
-Function that contain 'dataset' are needed to interact with environment.
-The environment is provided via env.argument.
-These functions can interact with env and set up some stuff how you like.
-They are also designed to be argument agnostic
-
-Basically you can stack them how you want.
-
-To further illustrate this, let's take a look onto code sample from FrameEnv::
-
-    class Env:
-        def __init__(self, ...,
-                     # look at this function provided here:
-                     prepare_dataset=dataset_functions.prepare_dataset,
-                     .....):
-
-            self.user_dict = None
-            self.users = None  # filtered keys of user_dict
-
-            self.prepare_dataset(df=self.ratings, key_to_id=self.key_to_id,
-                                 min_seq_size=min_seq_size, frame_size=min_seq_size, env=self)
-
-            # after this call user_dict and users should be set to their values!
-
-In reinforce example I further modify it to look like::
-
-    def prepare_dataset(**kwargs):
-        recnn.data.build_data_pipeline([recnn.data.truncate_dataset,
-                                        recnn.data.prepare_dataset],
-                                       reduce_items_to=5000, **kwargs)
-
-Notice: prepare_dataset doesn't take **reduce_items_to** argument, but it is required in truncate_dataset.
-As I previously mentioned RecNN is designed to be argument agnostic, meaning you provide some kwarg in the
-build_data_pipeline function and it is passed down the function chain. If needed, it will be used. Otherwise ignored
+RecNN is designed to work with your data flow.
+
+Set kwargs at the beginning of the prepare_dataset function.
+Kwargs you set are immutable.
+
+args_mut holds the mutable arguments; you can access the following:
+base: data.EnvBase, df: DataFrame, users: List[int],
+user_dict: Dict[int, Dict[str, np.ndarray]]
+
+Access args_mut and modify it in functions you define.
+It is best to use function chaining with build_data_pipeline.
+
+recnn.data.prepare_dataset is the function used by default in Env.__init__,
+but sometimes you want something extra. I have also predefined truncate_dataset,
+which truncates the number of items to a specified value.
+In the reinforce example I modify it to look like::
+
+    def prepare_dataset(args_mut, kwargs):
+        kwargs.set('reduce_items_to', num_items)  # set kwargs for your functions here!
+        pipeline = [recnn.data.truncate_dataset, recnn.data.prepare_dataset]
+        recnn.data.build_data_pipeline(pipeline, kwargs, args_mut)
+
+    # embeddings: https://drive.google.com/open?id=1EQ_zXBR3DKpmJR3jBgLvt-xoOvArGMsL
+    env = recnn.data.env.FrameEnv('..',
+                                  '...', frame_size, batch_size,
+                                  embed_batch=embed_batch, prepare_dataset=prepare_dataset,
+                                  num_workers=0)
 
 .. automodule:: recnn.data.dataset_functions
     :members:
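The new pipeline style also makes it straightforward to slot in your own preprocessing step. Below is a minimal sketch, assuming custom steps share the ``(args_mut, kwargs)`` signature of the ``prepare_dataset`` wrapper shown in the diff; the step name ``drop_low_ratings`` and the ``'rating'`` column are hypothetical and not part of recnn::

    import recnn

    # Hypothetical custom step: filter the raw ratings DataFrame before the
    # built-in steps run. args_mut.df is the ratings DataFrame per the docs;
    # the 'rating' column name is an assumption about the dataset in use.
    def drop_low_ratings(args_mut, kwargs):
        args_mut.df = args_mut.df[args_mut.df['rating'] >= 3.0]

    def prepare_dataset(args_mut, kwargs):
        kwargs.set('reduce_items_to', 5000)           # immutable once set
        pipeline = [drop_low_ratings,                 # custom step defined above
                    recnn.data.truncate_dataset,      # predefined: cut item count
                    recnn.data.prepare_dataset]       # default preprocessing
        recnn.data.build_data_pipeline(pipeline, kwargs, args_mut)

The custom step only touches ``args_mut``, while anything its downstream neighbours need is set once via ``kwargs`` and passed along the chain.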
