Random notes mostly on Machine Learning

Not every REINFORCE should be called Reinforcement Learning

Deep RL is hot these days. It’s one of the most popular topics in the submissions at NeurIPS / ICLR / ICML and other ML conferences. And while the definition of RL is pretty general, in this note I’d argue that the famous REINFORCE algorithm alone is not enough to label your method as a Reinforcement Learning one.

A simpler derivation of f-GANs

I have been looking at \(f\)-GANs derivation doing some of my research, and found an easier way to derive its lower bound, without invoking convex conjugate functions.

Thoughts on Mutual Information: Alternative Dependency Measures

This posts finishes the discussion started in the Thoughts on Mutual Information: More Estimators with a consideration of alternatives to the Mutual Information.

Thoughts on Mutual Information: Formal Limitations

This posts continues the discussion started in the Thoughts on Mutual Information: More Estimators. This time we’ll focus on drawbacks and limitations of these bounds.

Thoughts on Mutual Information: More Estimators

In this post I’d like to show how Self-Normalized Importance Sampling (IWHVI and IWAE) and Annealed Importance Sampling can be used to give (sometimes sandwich) bounds on the MI in many different cases.

Importance Weighted Hierarchical Variational Inference

This post finishes the discussion on Neural Samplers for Variational Inference by introducing some recent results (including mine).

Also, there’s a talk recording of me presenting this post’s content, so if you like videos more than texts, check it out.

Neural Samplers and Hierarchical Variational Inference

This post sets background for the upcoming post on my work on more efficient use of neural samplers for Variational Inference.

Stochastic Computation Graphs: Fixing REINFORCE

This is the final post of the stochastic computation graphs series. Last time we discussed models with discrete relaxations of stochastic nodes, which allowed us to employ the power of reparametrization.

These methods, however, posses one flaw: they consider different models, thus introducing inherent bias – your test time discrete model will be doing something different from what your training time model did. Therefore in this post we’ll get back to the REINFORCE aka Score Function estimator, and see if we can fix its problems.

Stochastic Computation Graphs: Discrete Relaxations

This is the second post of the stochastic computation graphs series. Last time we discussed models with continuous stochastic nodes, for which there are powerful reparametrization technics.

Unfortunately, these methods don’t work for discrete random variables. Moreover, it looks like there’s no way to backpropagate through discrete stochastic nodes, as there’s no infinitesimal change of random values when you infinitesimally perturb their parameters.

In this post I’ll talk about continuous relaxations of discrete random variables.

Stochastic Computation Graphs: Continuous Case

Last year I covered some modern Variational Inference theory. These methods are often used in conjunction with Deep Neural Networks to form deep generative models (VAE, for example) or to enrich deterministic models with stochastic control, which leads to better exploration. Or you might be interested in amortized inference.

All these cases turn your computation graph into a stochastic one – previously deterministic nodes now become random. And it’s not obvious how to do backpropagation through these nodes. In this series I’d like to outline possible approaches. This time we’re going to see why general approach works poorly, and see what we can do in a continuous case.