Rejection sampling variational inference #819

Open · wants to merge 43 commits into base: master

Changes from 26 commits

Commits (43)
4efb780
fix typos in docstring
Jan 3, 2018
7e43d1b
add multinomial-dirichlet test, empty `RejectionSamplingKLqp` class
Jan 7, 2018
d673763
Merge branch 'master' into rejection-sampling-variational-inference
Jan 12, 2018
7a5f90e
remove `sample_shape=1`
Jan 12, 2018
94a1bc3
add poisson-gamma test
Jan 14, 2018
a4c87cc
WIP: begin to implement RSVI logic
Jan 15, 2018
163414c
WIP: implement RSVI gradients
Jan 15, 2018
f162135
add scrap notebook with gradient update algo
Jan 19, 2018
2f96076
unit test gradient update algo in notebook
Jan 20, 2018
2c1162b
unit test gradient update algo to 3 iterations
Jan 20, 2018
ad25f6d
`test_kucukelbir_grad` passes
Jan 20, 2018
7e4a9ce
correction: `test_kucukelbir_grad` passes
Jan 20, 2018
8dc4f4f
cleanup (still skeptical this test works, as it seems almost stochastic
Jan 20, 2018
0aae8ed
move `test_kucukelbir_grad` to separate file
Jan 20, 2018
70172fb
add `KucukelbirOptimizer`
Jan 20, 2018
929e25c
pass `n`, `s_n` into `KucukelbirOptimizer` constructor
Jan 20, 2018
95d9774
looking forward to seeing if this passes CI. locally, i have no idea …
Jan 20, 2018
c212858
slightly more confidence
Jan 20, 2018
81637fb
set trainable=False
Jan 20, 2018
7aec66c
initialize `n` to 0
Jan 21, 2018
dda7f26
assert in loop
Jan 21, 2018
2a4ccc8
add dummy parameter `global_step` for temporary compatibility
Jan 21, 2018
8f69548
add `KucukelbirOptimizer`
Jan 21, 2018
26f8ed8
2-space indent
Jan 21, 2018
c7f3ea1
use `KucukelbirOptimizer`
Jan 21, 2018
435ec01
cleanup
Jan 21, 2018
45b17b8
test `qalpha`, `qbeta` values
Jan 21, 2018
ed6e266
delete blank line
Jan 21, 2018
80cee16
add `GammaRejectionSampler`
Jan 23, 2018
ef45bc3
add `log_prob_s` to `GammaRejectionSampler`
Jan 23, 2018
b94ef73
add citation to docstring
Jan 23, 2018
a136f9d
add guts of RSVI, integrating w.r.t. z
Jan 23, 2018
680894b
parametrize sampler with density
Jan 24, 2018
47ba81c
pass density to rejection sampler; return gradients
Jan 24, 2018
26f0c32
dict_swap[z] comes from rejection sampler, not `qz`
Jan 24, 2018
7b997e1
delete gamma_rejection_sampler_vars
Jan 24, 2018
6108125
delete TODO
Jan 24, 2018
77e9a6c
WIP: _test_build_rejection_sampling_loss_and_gradients
Jan 30, 2018
3846fa6
WIP: _test_build_rejection_sampling_loss_and_gradients
Jan 30, 2018
23c33af
WIP: _test_build_rejection_sampling_loss_and_gradients
Jan 30, 2018
4c481a0
WIP: _test_build_rejection_sampling_loss_and_gradients
Jan 30, 2018
00c9325
WIP: _test_build_rejection_sampling_loss_and_gradients
Jan 30, 2018
40d3808
pep8
Jan 30, 2018
3 changes: 2 additions & 1 deletion edward/__init__.py
@@ -14,7 +14,7 @@
HMC, MetropolisHastings, SGLD, SGHMC, \
KLpq, KLqp, ReparameterizationKLqp, ReparameterizationKLKLqp, \
ReparameterizationEntropyKLqp, ScoreKLqp, ScoreKLKLqp, ScoreEntropyKLqp, \
ScoreRBKLqp, WakeSleep, GANInference, BiGANInference, WGANInference, \
ScoreRBKLqp, RejectionSamplingKLqp, WakeSleep, GANInference, BiGANInference, WGANInference, \
ImplicitKLqp, MAP, Laplace, complete_conditional, Gibbs
from edward.models import RandomVariable
from edward.util import check_data, check_latent_vars, copy, dot, \
@@ -52,6 +52,7 @@
'ScoreKLKLqp',
'ScoreEntropyKLqp',
'ScoreRBKLqp',
'RejectionSamplingKLqp',
'WakeSleep',
'GANInference',
'BiGANInference',
1 change: 1 addition & 0 deletions edward/inferences/__init__.py
@@ -42,6 +42,7 @@
'ScoreKLKLqp',
'ScoreEntropyKLqp',
'ScoreRBKLqp',
'RejectionSamplingKLqp',
'Laplace',
'MAP',
'MetropolisHastings',
2 changes: 1 addition & 1 deletion edward/inferences/inference.py
@@ -123,7 +123,6 @@ def run(self, variables=None, use_coordinator=True, *args, **kwargs):
Passed into `initialize`.
"""
self.initialize(*args, **kwargs)
Review comment (Member): add back newline? unrelated to PR

if variables is None:
init = tf.global_variables_initializer()
else:
@@ -144,6 +143,7 @@ def run(self, variables=None, use_coordinator=True, *args, **kwargs):

for _ in range(self.n_iter):
info_dict = self.update()
print(info_dict)
Review comment (Member): rm?

self.print_progress(info_dict)

self.finalize()
2 changes: 1 addition & 1 deletion edward/inferences/klpq.py
@@ -32,7 +32,7 @@ class KLpq(VariationalInference):

with respect to $\\theta$.

In conditional inference, we infer $z` in $p(z, \\beta
In conditional inference, we infer $z$ in $p(z, \\beta
Review comment (Member): This is unrelated to this PR. Can you make a new PR to fix this?

\mid x)$ while fixing inference over $\\beta$ using another
distribution $q(\\beta)$. During gradient calculation, instead
of using the model's density
160 changes: 160 additions & 0 deletions edward/inferences/klqp.py
@@ -616,6 +616,63 @@ def build_loss_and_gradients(self, var_list):
return build_score_rb_loss_and_gradients(self, var_list)


# TODO: you can probably make another base class that implements a `sample` method?
class RejectionSamplingKLqp(VariationalInference):

"""
"""

def __init__(self, latent_vars=None, data=None, rejection_sampler_vars=None):
"""Create an inference algorithm.

# TODO: update me

Args:
latent_vars: list of RandomVariable or
dict of RandomVariable to RandomVariable.
Collection of random variables to perform inference on. If
list, each random variable will be implicitly optimized using a
`Normal` random variable that is defined internally with a
free parameter per location and scale and is initialized using
standard normal draws. The random variables to approximate
must be continuous.
"""
if isinstance(latent_vars, list):
with tf.variable_scope(None, default_name="posterior"):
latent_vars_dict = {}
continuous = \
('01', 'nonnegative', 'simplex', 'real', 'multivariate_real')
for z in latent_vars:
if not hasattr(z, 'support') or z.support not in continuous:
raise AttributeError(
"Random variable {} is not continuous or a random "
"variable with supported continuous support.".format(z))
batch_event_shape = z.batch_shape.concatenate(z.event_shape)
loc = tf.Variable(tf.random_normal(batch_event_shape))
scale = tf.nn.softplus(
tf.Variable(tf.random_normal(batch_event_shape)))
latent_vars_dict[z] = Normal(loc=loc, scale=scale)
latent_vars = latent_vars_dict
del latent_vars_dict
super(RejectionSamplingKLqp, self).__init__(latent_vars, data)
self.rejection_sampler_vars = rejection_sampler_vars

def initialize(self, n_samples=1, *args, **kwargs):
"""Initialize inference algorithm. It initializes hyperparameters
and builds ops for the algorithm's computation graph.

Args:
n_samples: int, optional.
Number of samples from variational model for calculating
stochastic gradients.
"""
self.n_samples = n_samples
return super(RejectionSamplingKLqp, self).initialize(*args, **kwargs)

def build_loss_and_gradients(self, var_list):
return build_rejection_sampling_loss_and_gradients(self, var_list)


def build_reparam_loss_and_gradients(inference, var_list):
"""Build loss function. Its automatic differentiation
is a stochastic gradient of
@@ -1127,3 +1184,106 @@ def build_score_rb_loss_and_gradients(inference, var_list):
grads_vars.extend(model_vars)
grads_and_vars = list(zip(grads, grads_vars))
return loss, grads_and_vars


def build_rejection_sampling_loss_and_gradients(inference, var_list):
"""
"""
p_log_prob = [0.0] * inference.n_samples
q_log_prob = [0.0] * inference.n_samples
r_log_prob = [0.0] * inference.n_samples
base_scope = tf.get_default_graph().unique_name("inference") + '/'
for s in range(inference.n_samples):
# Form dictionary in order to replace conditioning on prior or
# observed variable with conditioning on a specific value.
scope = base_scope + tf.get_default_graph().unique_name("sample")
dict_swap = {}
for x, qx in six.iteritems(inference.data):
if isinstance(x, RandomVariable):
if isinstance(qx, RandomVariable):
qx_copy = copy(qx, scope=scope)
dict_swap[x] = qx_copy.value()
else:
dict_swap[x] = qx

for z, qz in six.iteritems(inference.latent_vars):
# Copy q(z) to obtain new set of posterior samples.
qz_copy = copy(qz, scope=scope)

# Of course, this will evaluate to `True`. We just do this as a simple first pass.
if 'rsvi':
# --- RSVI

# Get variable shortnames
qz_class = qz.__class__
epsilon_likelihood = inference.rejection_sampler_vars[qz_class]['epsilon_likelihood']
reparam_func = inference.rejection_sampler_vars[qz_class]['reparam_func']
m = inference.rejection_sampler_vars[qz_class]['m']
alpha = qz.parameters['concentration']
beta = qz.parameters['rate']

# Sample

# TODO: pass in the real `qalpha` and `qbeta`
# TODO: pass in the real `qz`
epsilon = epsilon_likelihood.value()
sample = reparam_func(epsilon, alpha, beta)
eps_prob = epsilon_likelihood.prob(epsilon)
qz_prob = qz.prob(sample)
random_uniform = tf.random_uniform([])

# We need this line. However, let's just accept for now.
# if random_uniform * m * eps_prob <= qz_prob:

# RSVI ---
else:
z = qz_copy.value()

dict_swap[z] = sample

q_log_prob[s] += tf.reduce_sum(
inference.scale.get(z, 1.0) * qz_copy.log_prob(dict_swap[z]))
r_log_prob[s] += tf.reduce_sum(
inference.scale.get(z, 1.0) * epsilon_likelihood.log_prob(dict_swap[z]))

for z in six.iterkeys(inference.latent_vars):
z_copy = copy(z, dict_swap, scope=scope)
p_log_prob[s] += tf.reduce_sum(
inference.scale.get(z, 1.0) * z_copy.log_prob(dict_swap[z]))

for x in six.iterkeys(inference.data):
if isinstance(x, RandomVariable):
x_copy = copy(x, dict_swap, scope=scope)
p_log_prob[s] += tf.reduce_sum(
inference.scale.get(x, 1.0) * x_copy.log_prob(dict_swap[x]))

p_log_prob = tf.reduce_mean(p_log_prob)
q_log_prob = tf.reduce_mean(q_log_prob)
r_log_prob = tf.reduce_mean(r_log_prob)

q_entropy = tf.reduce_sum([
tf.reduce_sum(qz.entropy())
for z, qz in six.iteritems(inference.latent_vars)])

reg_penalty = tf.reduce_sum(tf.losses.get_regularization_losses())

if inference.logging:
tf.summary.scalar("loss/p_log_prob", p_log_prob,
collections=[inference._summary_key])
tf.summary.scalar("loss/q_entropy", q_entropy,
collections=[inference._summary_key])
tf.summary.scalar("loss/reg_penalty", reg_penalty,
collections=[inference._summary_key])

loss = -(p_log_prob + q_entropy - reg_penalty)

# RSVI gradient components
model_grad = tf.gradients(p_log_prob, sample)[0]
q_entropy_grad = tf.gradients(q_entropy, var_list)
g_rep = [model_grad * grad for grad in tf.gradients(sample, var_list)]
g_cor = [p_log_prob * grad for grad in tf.gradients(q_log_prob - r_log_prob, var_list)]
grad_summands = zip(*[g_rep, g_cor, q_entropy_grad])

grads = [tf.reduce_sum(summand) for summand in grad_summands]
grads_and_vars = list(zip(grads, var_list))
return loss, grads_and_vars
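
Note on the function above: `build_rejection_sampling_loss_and_gradients` looks up an entry of `rejection_sampler_vars` keyed by the class of each variational distribution, expecting the keys `epsilon_likelihood` (the proposal density r(epsilon)), `reparam_func` (the transform z = h(epsilon, alpha, beta)), and `m` (the acceptance constant). Per variable, the returned gradient is the sum of a reparameterization term g_rep = grad_z log p(x, z) * grad_lambda z, a correction term g_cor = log p(x, z) * grad_lambda (log q(z) - log r(z)) that accounts for sampling epsilon from the proposal rather than from q itself, and the gradient of the entropy term. Below is a minimal sketch of what one `rejection_sampler_vars` entry could look like for a Gamma variational family; the Marsaglia-Tsang transform and the value of `m` are illustrative assumptions, not the `GammaRejectionSampler` added elsewhere in this branch.

import tensorflow as tf
from edward.models import Gamma, Normal


def gamma_reparam(epsilon, alpha, beta):
  # Marsaglia-Tsang shape augmentation: z = h(epsilon, alpha) / beta, with
  # h(eps, a) = (a - 1/3) * (1 + eps / sqrt(9a - 3))**3
  return (alpha - 1. / 3) * (1. + epsilon / tf.sqrt(9. * alpha - 3.)) ** 3 / beta


rejection_sampler_vars = {
    Gamma: {
        'epsilon_likelihood': Normal(loc=0.0, scale=1.0),  # proposal r(epsilon)
        'reparam_func': gamma_reparam,  # z = h(epsilon, alpha, beta)
        'm': 1.0,  # acceptance constant M (placeholder value)
    }
}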
18 changes: 16 additions & 2 deletions edward/inferences/variational_inference.py
@@ -9,6 +9,7 @@

from edward.inferences.inference import Inference
from edward.models import RandomVariable
from edward.optimizers import KucukelbirOptimizer
from edward.util import get_session, get_variables


@@ -67,6 +68,8 @@ def initialize(self, optimizer=None, var_list=None, use_prettytensor=False,

self.loss, grads_and_vars = self.build_loss_and_gradients(var_list)

self.grads_and_vars = grads_and_vars

if self.logging:
tf.summary.scalar("loss", self.loss, collections=[self._summary_key])
for grad, var in grads_and_vars:
@@ -110,6 +113,14 @@ def initialize(self, optimizer=None, var_list=None, use_prettytensor=False,
optimizer = tf.train.FtrlOptimizer(learning_rate)
elif optimizer == 'rmsprop':
optimizer = tf.train.RMSPropOptimizer(learning_rate)
elif optimizer == 'kucukelbir':
optimizer = KucukelbirOptimizer(
t=0.1,
delta=10e-3,
eta=1e-1,
s_n=tf.Variable([0., 0.], trainable=False),
n=tf.Variable(0., trainable=False)
)
else:
raise ValueError('Optimizer class not found:', optimizer)
elif not isinstance(optimizer, tf.train.Optimizer):
@@ -151,7 +162,10 @@ def update(self, feed_dict=None):
feed_dict[key] = value

sess = get_session()
_, t, loss = sess.run([self.train, self.increment_t, self.loss], feed_dict)
# _, t, loss = sess.run([self.train, self.increment_t, self.loss], feed_dict)
# TODO: delete me
_, t, loss, grads_and_vars_debug = sess.run([self.train, self.increment_t, self.loss, self.grads_and_vars], feed_dict)


if self.debug:
sess.run(self.op_check, feed_dict)
@@ -161,7 +175,7 @@
summary = sess.run(self.summarize, feed_dict)
self.train_writer.add_summary(summary, t)

return {'t': t, 'loss': loss}
return {'t': t, 'loss': loss, 'grads_and_vars_debug': grads_and_vars_debug}

def print_progress(self, info_dict):
"""Print progress to output.
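
With the `'kucukelbir'` branch added above, the optimizer can be selected by name. A minimal usage sketch under stated assumptions: the Gamma-Poisson model mirrors the poisson-gamma test added earlier in this branch, `rejection_sampler_vars` is the hypothetical dict sketched after `klqp.py` above, and `run()` is assumed to forward its keyword arguments to `initialize()` as in the other Edward inference classes.

import edward as ed
import numpy as np
import tensorflow as tf
from edward.models import Gamma, Poisson

x_train = np.array([2., 8., 3., 6., 1.], dtype=np.float32)

# Model: z ~ Gamma(1, 1), x_i ~ Poisson(z)
z = Gamma(concentration=1.0, rate=1.0)
x = Poisson(rate=z, sample_shape=5)

# Variational family: q(z) = Gamma(qalpha, qbeta)
qalpha = tf.nn.softplus(tf.Variable(0.5))
qbeta = tf.nn.softplus(tf.Variable(0.5))
qz = Gamma(concentration=qalpha, rate=qbeta)

inference = ed.RejectionSamplingKLqp(
    {z: qz}, data={x: x_train},
    rejection_sampler_vars=rejection_sampler_vars)
inference.run(n_iter=1000, optimizer='kucukelbir')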
15 changes: 15 additions & 0 deletions edward/optimizers/__init__.py
@@ -0,0 +1,15 @@
"""
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

from edward.optimizers.sgd import *

from tensorflow.python.util.all_util import remove_undocumented

_allowed_symbols = [
'KucukelbirOptimizer',
]

remove_undocumented(__name__, allowed_exception_list=_allowed_symbols)
35 changes: 35 additions & 0 deletions edward/optimizers/sgd.py
@@ -0,0 +1,35 @@
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import tensorflow as tf


class KucukelbirOptimizer:

"""
Used for RSVI (Rejection-Sampling Variational Inference).

# TODO: add me
"""

def __init__(self, t, delta, eta, s_n, n):
self.t = t
self.delta = delta
self.eta = eta
self.s_n = s_n
self.n = n

def apply_gradients(self, grads_and_vars, global_step=None):
Review comment (Contributor, Author): @dustinvtran I'd quite appreciate if you could glance at this method as well, as my integration test passes on some days and fails on others — with 0 changes to my code. Promise 🤞.

self.n = tf.assign_add(self.n, 1.)
ops = []
for i, (grad, var) in enumerate(grads_and_vars):
updated_s_n = self.s_n[i].assign( (self.t * grad**2) + (1 - self.t) * self.s_n[i] )

p_n_first = self.eta * self.n**(-.5 + self.delta)
p_n_second = (1 + tf.sqrt(updated_s_n[i]))**(-1)
p_n = p_n_first * p_n_second

updated_var = var.assign_add(-p_n * grad)
ops.append(updated_var)
return ops
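
For reference, `apply_gradients` above applies an adaptive step-size schedule in the spirit of Kucukelbir et al. (2017): `s_n` is an exponential moving average of the squared gradient and the step size decays as eta * n^(-1/2 + delta) / (1 + sqrt(s_n)). A plain NumPy restatement of the same rule (a sketch for comparison only, not part of this diff; the default arguments mirror the constructor values used in variational_inference.py):

import numpy as np


def kucukelbir_update(var, grad, s_prev, n, t=0.1, delta=1e-2, eta=0.1):
  """One step of the adaptive schedule mirrored from apply_gradients above."""
  n = n + 1.0
  s_n = t * grad ** 2 + (1.0 - t) * s_prev  # moving average of squared gradients
  rho = eta * n ** (-0.5 + delta) / (1.0 + np.sqrt(s_n))  # per-iteration step size
  return var - rho * grad, s_n, n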