fix shape issues, improve performance utilizing GPU, make it pip installable, etc #35
Conversation
dm-jrae
left a comment
Thanks for the modernization! Some small comments that should be easy to implement :)
dnc/access.py
Outdated
```python
reset_weights = tf.expand_dims(reset_weights, 2)
weighted_resets = expand_address * reset_weights
reset_gate = tf.reduce_prod(1 - weighted_resets, [1])
# back prob of tf.reduce_prod runs in CPU
```
Could you remove the commented lines? (55 & 56)
dnc/access.py
Outdated
```python
import addressing
import util
from dnc import addressing, util
```
Nit, but we always place individual imports on a new line. Would you be able to replace these with

```python
from dnc import addressing
from dnc import util
```
dnc/addressing.py
Outdated
| """ | ||
| with tf.name_scope('link'): | ||
| batch_size = prev_link.get_shape()[0].value | ||
| # batch_size = prev_link.get_shape()[0].value |
Can you remove the commented line?
dnc/addressing.py
Outdated
```python
# Calculate the aggregated effect of all write heads
write_weights = 1 - tf.reduce_prod(1 - write_weights, [1])
# back prob of reduce_prod runs in CPU
# write_weights = 1 - tf.reduce_prod(1 - write_weights, [1])
```
Can you remove the commented line here too, please?
dnc/addressing.py
Outdated
```python
free_read_weights = free_gate * read_weights
phi = tf.reduce_prod(1 - free_read_weights, [1], name='phi')
# back prob of reduce_prod runs in CPU
# phi = tf.reduce_prod(1 - free_read_weights, [1], name='phi')
```
^ and here
```diff
@@ -0,0 +1,82 @@
# Copyright 2017 Google Inc.
```
g4 mv hasn't quite worked with util.py, please fix :-)
dnc/util.py
Outdated
```python
size = tf.cast(tf.shape(perm)[0], tf.float32)
delta = tf.cast(tf.shape(perm)[-1], tf.float32)
rg = tf.range(0, size*delta, delta, dtype=tf.float32)
rg = tf.reshape(rg, [-1, 1])
```
nit: I think tf.expand_dims(rg, 1) is a bit clearer here, as you're specifically trying to expand out a dimension.
dnc/util.py
Outdated
```python
def batch_invert_permutation(permutations):
  """Returns batched `tf.invert_permutation` for every row in `permutations`."""
  with tf.name_scope('batch_invert_permutation', values=[permutations]):
    # unpacked = tf.unstack(permutations)
```
Could you remove these commented lines?
dnc/util.py
Outdated
```python
    # return tf.stack(result)

    # fix unknown shape issue when using tf.unstack
    idxf = tf.expand_dims(tf.cast(indices, tf.float32), -1)
```
Unclear why you cast idxf to a float32, and instantiate rg as float32 only to move everything back to int32. May as well keep everything as int32s.
Sorry for the lack of explanation. Some TensorFlow issues showed that certain operations do not support running on GPU for the int64 type (see #13164 and #13163, among others). I went straight to the type-casting workaround without giving it much thought or experimentation at the time. I will discard these casts as soon as I can confirm TensorFlow v1.9 has fixed those issues.
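For background on the exchange above: the unstack-free version of `batch_invert_permutation` flattens the whole batch into one long permutation by adding a per-row offset (the `rg` range seen earlier), inverts it in a single pass, and reshapes back. A pure-Python sketch of the idea, with hypothetical names and all-integer arithmetic (the repo's version operates on tensors):

```python
def batch_invert_permutation(perms):
    """Invert each row of a batch of permutations without unstacking,
    by offsetting rows into one flat permutation (all-integer version)."""
    batch, dim = len(perms), len(perms[0])
    # Offset row i by i * dim so every entry indexes the flattened batch.
    flat = [p + i * dim for i, row in enumerate(perms) for p in row]
    # Single inversion pass, the analogue of one tf.invert_permutation call.
    inv_flat = [0] * (batch * dim)
    for pos, target in enumerate(flat):
        inv_flat[target] = pos
    # Reshape back to (batch, dim) and strip the offsets again.
    return [[inv_flat[i * dim + j] - i * dim for j in range(dim)]
            for i in range(batch)]
```

The float casts discussed above are a GPU-support workaround layered on top of this; the trick itself is purely integer.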
dnc/util.py
Outdated
```python
def batch_gather(values, indices):
  """Returns batched `tf.gather` for every row in the input."""
  with tf.name_scope('batch_gather', values=[values, indices]):
    # unpacked = zip(tf.unstack(values), tf.unstack(indices))
```
Could you remove these commented lines?
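The same flattening idea replaces per-row `tf.gather`: add a row offset to the indices, gather from the flattened values, and reshape back. A pure-Python sketch under the same assumptions (hypothetical names, not the repo's tensor code):

```python
def batch_gather(values, indices):
    """Gather values[i][indices[i][j]] for every row i without unstacking,
    by indexing into the flattened value matrix with offset indices."""
    dim = len(values[0])
    # Flatten the (batch, dim) value matrix into one list.
    flat_values = [v for row in values for v in row]
    # Offset each row's indices by i * dim, then gather from the flat list.
    return [[flat_values[i * dim + idx] for idx in idx_row]
            for i, idx_row in enumerate(indices)]
```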
dm-jrae
left a comment
That's great, a few more style nitpicks. Can you confirm that the training script runs without issue?
dnc/util.py
Outdated
```python
    return result


def reduce_prod(x, axis, name=None):
  '''
```
Final nit, can you format the docstring to coincide with the rest of the package's docstring style?

```python
"""Efficient reduce product over axis.

Uses tf.cumprod and tf.gather_nd as a workaround to the poor performance
of calculating tf.reduce_prod's gradient on CPU.
"""
```
dnc/util.py
Outdated
```python
  Uses tf.cumprod and tf.gather_nd as a workaround to the poor performance of calculating tf.reduce_prod's gradient
  on CPU.
  '''
  with tf.variable_scope(name or "c_reduce_prod"):
```
Could you amend this variable scope to a name scope? Preferably replace it with this line:

```python
with tf.name_scope(name, 'util_reduce_prod', values=[x]):
```
dnc/util.py
Outdated
```python
  on CPU.
  '''
  with tf.variable_scope(name or "c_reduce_prod"):
    cp=tf.cumprod(x, axis, reverse=True)
```
Can you add spaces around your equals assignments?

```python
size = tf.shape(cp)[0]
idx1 = ...  # etc.
```

You do not need to do this for keyword args, however: `dtype=tf.float32` is fine!
dnc/util.py
Outdated
```python
dim = int(perm.get_shape()[-1])
size = tf.cast(tf.shape(perm)[0], tf.float32)
delta = tf.cast(tf.shape(perm)[-1], tf.float32)
rg = tf.range(0, size*delta, delta, dtype=tf.float32)
```
Can you place a space around this multiply operator? `size * delta`
Force-pushed from 446652e to 3e116d3
Sure. Please be patient and don't merge yet, as this is a work in progress. I'm validating my modifications on Google Cloud Platform, so I'll have to push many unfinished changes to the GitHub server in order to sync them to my cloud compute engine. I'll post a notification in the PR when I'm done. Thanks! Maybe I should have created this PR using a tag rather than a branch...
Force-pushed from 0649cc2 to 8598342
Seems I've messed up the PR when I was trying to squash all the minor commits...
I met a couple of issues when I tried to use the DNC code, and after about a month's tuning I've managed to incorporate this model into my existing model. Several major modifications in this PR:

- Originally one could not `pip install` the repo, which is kind of counter-best-practice. So I made it "pip installable", with reference to the Convert to a package installable via pip #20 PR. One can generate an installable package by running `python setup.py sdist` under the root directory of this repo and then install it with `pip install ./dist/dnc-0.0.1.tar.gz`. Some versioning and author information may not be correct, which can be edited or enriched in the `setup.py` file.
- `tf.unstack` can't process tensors with partially-known shape info, which I think is quite common in many DL application scenarios. So I refactored those methods in `util.py` to use a combination of tensor operations as a workaround, meanwhile ensuring those operations can run on GPU.
- The gradients of some ops perform poorly on CPU (`tf.reduce_prod`, see #3957 and #8841, and `tf.nn.top_k`, see #5719). I managed to work around `tf.reduce_prod` by using `tf.cumprod` instead, and made some modifications to `tf.nn.top_k`'s gradient method.