@@ -33,43 +33,33 @@ dataset_functions
What?
+++++

- Chain of responsibility pattern:
- refactoring.guru/design-patterns/chain-of-responsibility/python/example
-
- RecNN is designed to work with your dataflow.
- Function that contain 'dataset' are needed to interact with environment.
- The environment is provided via env.argument.
- These functions can interact with env and set up some stuff how you like.
- They are also designed to be argument agnostic
-
- Basically you can stack them how you want.
-
- To further illustrate this, let's take a look onto code sample from FrameEnv::
-
-     class Env:
-         def __init__(self, ...,
-                      # look at this function provided here:
-                      prepare_dataset=dataset_functions.prepare_dataset,
-                      .....):
-
-             self.user_dict = None
-             self.users = None  # filtered keys of user_dict
-
-             self.prepare_dataset(df=self.ratings, key_to_id=self.key_to_id,
-                                  min_seq_size=min_seq_size, frame_size=min_seq_size, env=self)
-
-             # after this call user_dict and users should be set to their values!
-
- In reinforce example I further modify it to look like::
-
-     def prepare_dataset(**kwargs):
-         recnn.data.build_data_pipeline([recnn.data.truncate_dataset,
-                                         recnn.data.prepare_dataset],
-                                        reduce_items_to=5000, **kwargs)
-
- Notice: prepare_dataset doesn't take **reduce_items_to ** argument, but it is required in truncate_dataset.
- As I previously mentioned RecNN is designed to be argument agnostic, meaning you provide some kwarg in the
- build_data_pipeline function and it is passed down the function chain. If needed, it will be used. Otherwise ignored
+ RecNN is designed to work with your data flow.
+
+ Set kwargs at the beginning of your prepare_dataset function.
+ The kwargs you set are immutable.
+
+ args_mut holds the mutable arguments. Through it you can access the following:
+     base: data.EnvBase, df: DataFrame, users: List[int],
+     user_dict: Dict[int, Dict[str, np.ndarray]]
+
+ Access args_mut and modify its fields in functions you define yourself.
+ It is best to chain those functions with build_data_pipeline.
+
+ recnn.data.prepare_dataset is the function used by default in Env.__init__.
+ Sometimes you want something extra, so I have also predefined truncate_dataset.
+ This function truncates the number of items to the number you specify.
+ In the reinforce example I modify it to look like::
+
+     def prepare_dataset(args_mut, kwargs):
+         kwargs.set('reduce_items_to', num_items)  # set kwargs for your functions here!
+         pipeline = [recnn.data.truncate_dataset, recnn.data.prepare_dataset]
+         recnn.data.build_data_pipeline(pipeline, kwargs, args_mut)
+
+     # embeddings: https://drive.google.com/open?id=1EQ_zXBR3DKpmJR3jBgLvt-xoOvArGMsL
+     env = recnn.data.env.FrameEnv('..',
+                                   '...', frame_size, batch_size,
+                                   embed_batch=embed_batch, prepare_dataset=prepare_dataset,
+                                   num_workers=0)

.. automodule :: recnn.data.dataset_functions
    :members:
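
Review note: the new text describes a pattern in which every pipeline step receives the same two objects, a set of options fixed up front and a mutable state that the steps edit in place. The following is a minimal, library-free sketch of that pattern, not RecNN's implementation. `Kwargs`, `ArgsMut`, `run_pipeline`, `truncate_items` and `collect_users` are hypothetical stand-ins written for this note, and the set-once behaviour is only one possible reading of "immutable".

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple


class Kwargs:
    """Hypothetical stand-in: options that can be set once, then only read."""

    def __init__(self) -> None:
        self._values: Dict[str, Any] = {}

    def set(self, name: str, value: Any) -> None:
        # Assumption: "immutable" is modelled here as "cannot be overwritten".
        if name in self._values:
            raise KeyError(f"{name!r} is already set and cannot be changed")
        self._values[name] = value

    def get(self, name: str) -> Any:
        return self._values[name]


@dataclass
class ArgsMut:
    """Hypothetical stand-in for the mutable state shared by every step."""
    ratings: List[Tuple[int, int]] = field(default_factory=list)  # (user, item)
    users: List[int] = field(default_factory=list)


Step = Callable[[ArgsMut, Kwargs], None]


def run_pipeline(steps: List[Step], kwargs: Kwargs, args_mut: ArgsMut) -> None:
    """Call each step in order; all steps see the same two objects."""
    for step in steps:
        step(args_mut, kwargs)


def truncate_items(args_mut: ArgsMut, kwargs: Kwargs) -> None:
    # Reads one option and mutates the shared state in place.
    limit = kwargs.get("reduce_items_to")
    args_mut.ratings = [(u, i) for (u, i) in args_mut.ratings if i < limit]


def collect_users(args_mut: ArgsMut, kwargs: Kwargs) -> None:
    # Uses no options at all; it only needs what earlier steps left behind.
    args_mut.users = sorted({u for (u, _) in args_mut.ratings})


def prepare(args_mut: ArgsMut, kwargs: Kwargs) -> None:
    kwargs.set("reduce_items_to", 5)  # set options first, then run the chain
    run_pipeline([truncate_items, collect_users], kwargs, args_mut)


state = ArgsMut(ratings=[(1, 0), (1, 7), (2, 3), (3, 9)])
prepare(state, Kwargs())
print(state.ratings, state.users)
```

Because the steps share one signature, they can be reordered or extended without touching the caller. Step order still matters: `collect_users` sees only the ratings that `truncate_items` kept.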