
Conversation

@carusyte (Contributor) commented Aug 6, 2018

I ran into a couple of issues when I tried to use the DNC code, and after about a month of tuning I've managed to incorporate this model into my existing model. The major modifications in this PR:

  1. In order to reuse this code I had to copy-paste it, because it can't be installed with pip, which runs counter to best practice. So I made it pip-installable, with reference to the Convert to a package installable via pip #20 PR. One can generate an installable package by running python setup.py sdist under the root directory of this repo and then install it with pip install ./dist/dnc-0.0.1.tar.gz. Some versioning and author information may not be correct; it can be edited or enriched in setup.py.
  2. tf.unstack can't process tensors whose shape is only partially known, which I think is quite common in many DL application scenarios. So I refactored the affected methods in util.py to use a combination of other tensor operations as a workaround, while ensuring those operations can run on the GPU.
  3. Performance. Some operations (or their gradients) can't run on the GPU as of TensorFlow 1.9 (e.g. tf.reduce_prod, see #3957 and #8841, and tf.nn.top_k, see #5719). I worked around tf.reduce_prod by using tf.cumprod instead (see the sketch after this list), and made some modifications to tf.nn.top_k's gradient method.
  4. Deprecated APIs. Some methods and function arguments are deprecated as of TF v1.9. I've replaced them with the recommended substitutions so that no warnings are shown during startup.
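
For reference, here is the shape of the tf.cumprod workaround from item 3. It's a minimal sketch assembled from the fragments quoted in the review below (it assumes reduction over axis 1 of a tensor with a leading batch dimension; the exact merged code may differ):

import tensorflow as tf

def reduce_prod(x, axis, name=None):
  """Efficient reduce product over axis.

  Uses tf.cumprod and tf.gather_nd as a workaround to the poor performance
  of calculating tf.reduce_prod's gradient on CPU.
  """
  with tf.name_scope(name, 'util_reduce_prod', values=[x]):
    # A reversed cumulative product places the product of the entire axis
    # at position 0 of that axis.
    cp = tf.cumprod(x, axis, reverse=True)
    # Gather element 0 of the reduced axis for every batch row (assumes
    # axis == 1), i.e. indices [[0, 0], [1, 0], [2, 0], ...].
    size = tf.shape(cp)[0]
    idx1 = tf.range(size)
    idx2 = tf.zeros([size], tf.int32)
    indices = tf.stack([idx1, idx2], 1)
    return tf.gather_nd(cp, indices)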

@dm-jrae (Contributor) left a comment

Thanks for the modernization! Some small comments that should be easy to implement :)

dnc/access.py Outdated
reset_weights = tf.expand_dims(reset_weights, 2)
weighted_resets = expand_address * reset_weights
reset_gate = tf.reduce_prod(1 - weighted_resets, [1])
# back prop of tf.reduce_prod runs on CPU

Could you remove the commented lines? (55 & 56)

dnc/access.py Outdated

import addressing
import util
from dnc import addressing, util

Nit, but we always place individual imports on a new line. Would you be able to replace these with:

from dnc import addressing
from dnc import util

"""
with tf.name_scope('link'):
batch_size = prev_link.get_shape()[0].value
# batch_size = prev_link.get_shape()[0].value

Can you remove the commented line?

# Calculate the aggregated effect of all write heads
write_weights = 1 - tf.reduce_prod(1 - write_weights, [1])
# back prop of reduce_prod runs on CPU
# write_weights = 1 - tf.reduce_prod(1 - write_weights, [1])

Can you remove the commented line here too, please?

free_read_weights = free_gate * read_weights
phi = tf.reduce_prod(1 - free_read_weights, [1], name='phi')
# back prop of reduce_prod runs on CPU
# phi = tf.reduce_prod(1 - free_read_weights, [1], name='phi')

^ and here

@@ -0,0 +1,82 @@
# Copyright 2017 Google Inc.

g4 mv hasn't quite worked with util.py, please fix :-)

dnc/util.py Outdated
size = tf.cast(tf.shape(perm)[0], tf.float32)
delta = tf.cast(tf.shape(perm)[-1], tf.float32)
rg = tf.range(0, size*delta, delta, dtype=tf.float32)
rg = tf.reshape(rg, [-1, 1])

nit: I think tf.expand_dims(rg, 1) is a bit clearer here, as you're specifically trying to expand out a dimension.
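
For context, here is a minimal sketch of the tf.unstack-free batch_invert_permutation under review in this thread, folding in the tf.expand_dims suggestion and keeping everything in int32 (as suggested later in the review). The PR's actual version routes through float32 casts for GPU compatibility, so treat this as illustrative only:

def batch_invert_permutation(permutations):
  """Returns batched `tf.invert_permutation` for every row in `permutations`."""
  with tf.name_scope('batch_invert_permutation', values=[permutations]):
    perm = tf.cast(permutations, tf.int32)
    dim = int(perm.get_shape()[-1])
    size = tf.shape(perm)[0]
    # Offset row i by i * dim so the whole batch flattens into one big
    # permutation of [0, size * dim), invertible in a single call.
    rg = tf.range(0, size * dim, dim, dtype=tf.int32)
    rg = tf.expand_dims(rg, 1)
    flat = tf.reshape(perm + rg, [-1])
    inverted = tf.invert_permutation(flat)
    # Undo the flattening and the per-row offsets.
    return tf.reshape(inverted, [-1, dim]) - rg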

dnc/util.py Outdated
def batch_invert_permutation(permutations):
  """Returns batched `tf.invert_permutation` for every row in `permutations`."""
  with tf.name_scope('batch_invert_permutation', values=[permutations]):
    # unpacked = tf.unstack(permutations)

Could you remove these commented lines?

dnc/util.py Outdated
# return tf.stack(result)

# fix unknown shape issue when using tf.unstack
idxf = tf.expand_dims(tf.cast(indices, tf.float32), -1)

Unclear why you cast idxf to a float32, and instantiate rg as float32 only to move everything back to int32. May as well keep everything as int32s.

@carusyte (Contributor, Author) replied:

Sorry for the lack of explanation. Some TensorFlow issues showed that certain operations do not support running on the GPU for the int64 type; see #13164, #13163, and such. I went straight to the type-casting workaround without giving it much thought or experimentation at the time. I will discard these casts as soon as I can confirm TensorFlow v1.9 has fixed those issues.
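
For illustration, an all-int32 variant of batch_gather along the lines suggested above. This is a sketch only (the PR kept the float32 casts pending confirmation that the int GPU kernels were fixed), and it assumes indices is rank 2 with a statically known last dimension:

def batch_gather(values, indices):
  """Returns batched `tf.gather` for every row in the input."""
  with tf.name_scope('batch_gather', values=[values, indices]):
    # Pair each index with its row number and use tf.gather_nd, avoiding
    # tf.unstack (which needs a statically known batch size).
    idx = tf.expand_dims(tf.cast(indices, tf.int32), -1)  # [batch, k, 1]
    size = tf.shape(indices)[0]
    rg = tf.range(size, dtype=tf.int32)                   # [batch]
    rg = tf.expand_dims(rg, -1)                           # [batch, 1]
    rg = tf.tile(rg, [1, int(indices.get_shape()[-1])])   # [batch, k]
    rg = tf.expand_dims(rg, -1)                           # [batch, k, 1]
    gidx = tf.concat([rg, idx], -1)                       # [batch, k, 2]
    return tf.gather_nd(values, gidx)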

dnc/util.py Outdated
def batch_gather(values, indices):
  """Returns batched `tf.gather` for every row in the input."""
  with tf.name_scope('batch_gather', values=[values, indices]):
    # unpacked = zip(tf.unstack(values), tf.unstack(indices))

Could you remove these commented lines?

@dm-jrae (Contributor) left a comment

That's great, a few more style nitpicks. Can you confirm that the training script runs without issue?

dnc/util.py Outdated
  return result

def reduce_prod(x, axis, name=None):
  '''

Final nit: can you format the docstring to match the rest of the package's docstring style?

"""Efficient reduce product over axis.

Uses tf.cumprod and tf.gather_nd as a workaround to the poor performance of calculating tf.reduce_prod's gradient on CPU.
"""

dnc/util.py Outdated
  Uses tf.cumprod and tf.gather_nd as a workaround to the poor performance of calculating tf.reduce_prod's gradient
  on CPU.
  '''
  with tf.variable_scope(name or "c_reduce_prod"):

Could you amend this variable scope to a name scope? Preferably replace with this line:

with tf.name_scope(name, 'util_reduce_prod', values=[x]):

dnc/util.py Outdated
  on CPU.
  '''
  with tf.variable_scope(name or "c_reduce_prod"):
    cp=tf.cumprod(x, axis, reverse=True)

Can you add spaces around your assignment operators?

size = tf.shape(cp)[0]
idx1 = ... etc.

You do not need to do this for keyword args however:

dtype=tf.float32 is fine!

dnc/util.py Outdated
dim = int(perm.get_shape()[-1])
size = tf.cast(tf.shape(perm)[0], tf.float32)
delta = tf.cast(tf.shape(perm)[-1], tf.float32)
rg = tf.range(0, size*delta, delta, dtype=tf.float32)

Can you place a space around this multiply operator?

size * delta

@carusyte carusyte force-pushed the fix_unstack_issue branch from 446652e to 3e116d3 on August 6, 2018, 11:30
@carusyte (Contributor, Author) commented Aug 6, 2018

Sure. Please be patient and don't merge yet, as this is still a work in progress. I'm validating my modifications on Google Cloud Platform, so I'll have to push many unfinished changes to GitHub in order to sync them to my cloud compute engine... I'll post a notification in the PR when I'm done. Thanks!

Maybe I should have created this PR from a tag rather than a branch...

@carusyte carusyte closed this Aug 6, 2018
@carusyte carusyte force-pushed the fix_unstack_issue branch from 0649cc2 to 8598342 on August 6, 2018, 12:24
@carusyte (Contributor, Author) commented Aug 6, 2018

It seems I messed up the PR while trying to squash all the minor commits...
Let me create a new one; sorry for the inconvenience.
