# B.log

## Random notes mostly on Machine Learning

### Importance Weighted Hierarchical Variational Inference

This post finishes the discussion on Neural Samplers for Variational Inference by introducing some recent results (including mine).

Also, there’s a recording of me presenting this post’s content, so if you prefer video to text, check it out.

### Neural Samplers and Hierarchical Variational Inference

This post sets up the background for the upcoming post on my work on a more efficient use of neural samplers for Variational Inference.

### Stochastic Computation Graphs: Fixing REINFORCE

This is the final post of the stochastic computation graphs series. Last time we discussed models with discrete relaxations of stochastic nodes, which allowed us to employ the power of reparametrization.

These methods, however, possess one flaw: they train a different (relaxed) model, thus introducing inherent bias. Your test-time discrete model will be doing something different from what your training-time model did. Therefore in this post we’ll get back to REINFORCE, aka the Score Function estimator, and see if we can fix its problems.
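To make the estimator concrete, here is a minimal sketch of the score function identity $$\nabla_p \mathbb{E}_{b \sim \text{Bernoulli}(p)} f(b) = \mathbb{E} \left[ f(b) \nabla_p \log p(b) \right]$$ for a single Bernoulli node. The toy objective `f`, the helper name `score_function_gradient` and the sample size are assumptions made for illustration only; they are not taken from the post.

```python
import random


def score_function_gradient(f, p, num_samples, rng):
    """Monte Carlo estimate of d/dp E_{b ~ Bernoulli(p)}[f(b)] via the score function."""
    total = 0.0
    for _ in range(num_samples):
        b = 1 if rng.random() < p else 0
        # d/dp log Bernoulli(b; p) equals 1/p for b = 1 and -1/(1 - p) for b = 0
        score = 1.0 / p if b == 1 else -1.0 / (1.0 - p)
        total += f(b) * score
    return total / num_samples


def f(b):
    # Hypothetical toy objective, chosen only so that the exact answer is easy to check
    return (b - 0.3) ** 2


rng = random.Random(0)
estimate = score_function_gradient(f, p=0.4, num_samples=200_000, rng=rng)

# E[f(b)] = p * f(1) + (1 - p) * f(0), so the exact gradient w.r.t. p is f(1) - f(0)
exact = f(1) - f(0)
print(estimate, exact)
```

The estimate is unbiased and needs no relaxation of the discrete node, but each sample is noisy, which is why variance is the usual concern with this estimator.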

### Stochastic Computation Graphs: Discrete Relaxations

This is the second post of the stochastic computation graphs series. Last time we discussed models with continuous stochastic nodes, for which there are powerful reparametrization techniques.

Unfortunately, these methods don’t work for discrete random variables. Moreover, it looks like there’s no way to backpropagate through discrete stochastic nodes, as there’s no infinitesimal change of random values when you infinitesimally perturb their parameters.

In this post I’ll talk about continuous relaxations of discrete random variables.

### Stochastic Computation Graphs: Continuous Case

Last year I covered some modern Variational Inference theory. These methods are often used in conjunction with Deep Neural Networks to form deep generative models (VAE, for example) or to enrich deterministic models with stochastic control, which leads to better exploration. Or you might be interested in amortized inference.

All these cases turn your computation graph into a stochastic one: previously deterministic nodes now become random, and it’s not obvious how to do backpropagation through these nodes. In this series I’d like to outline possible approaches. This time we’re going to see why the general approach works poorly, and what we can do in the continuous case.

### ICML 2017 Summaries

Just like with NIPS last year, here’s a list of ICML’17 summaries (updated as I stumble upon new ones)

### On No Free Lunch Theorem and some other impossibility results

The more I talk to people online, the more I hear about the famous No Free Lunch Theorem (NFL theorem). Unfortunately, quite often people don’t really understand what the theorem is about, and what its implications are. In this post I’d like to share my view on the NFL theorem, and some other impossibility results.

### Matrix and Vector Calculus via Differentials

Many tasks of machine learning can be posed as optimization problems. One comes up with a parametric model, defines a loss function, and then minimizes it in order to learn optimal parameters. One very powerful tool of optimization theory is the use of smooth (differentiable) functions: those that can be locally approximated by a linear function. We all surely know how to differentiate a function, but often it’s more convenient to perform all the derivations in matrix form, since many computational packages like NumPy or MATLAB are optimized for vectorized expressions.

In this post I want to outline the general idea of how one can calculate derivatives in vector and matrix spaces (but the idea is general enough to be applied to other algebraic structures).
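As a small illustration of the idea (an example chosen here, not necessarily one from the post), take $$f(x) = x^T A x$$. Its differential is $$df = dx^T A x + x^T A \, dx = x^T (A + A^T) \, dx$$, so the gradient is $$(A + A^T) x$$. The sketch below checks this against central finite differences; all function names and the test matrix are hypothetical.

```python
def matvec(A, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def quad_form(A, x):
    """f(x) = x^T A x."""
    return sum(xi * yi for xi, yi in zip(x, matvec(A, x)))


def grad_quad_form(A, x):
    """Gradient read off the differential: df = x^T (A + A^T) dx."""
    n = len(x)
    sym = [[A[i][j] + A[j][i] for j in range(n)] for i in range(n)]
    return matvec(sym, x)


def finite_diff_grad(f, x, eps=1e-6):
    """Central finite differences, used only as a numerical sanity check."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_plus[i] += eps
        x_minus = list(x)
        x_minus[i] -= eps
        grad.append((f(x_plus) - f(x_minus)) / (2 * eps))
    return grad


A = [[2.0, -1.0], [3.0, 0.5]]  # deliberately non-symmetric
x = [1.0, 2.0]

analytic = grad_quad_form(A, x)
numeric = finite_diff_grad(lambda v: quad_form(A, v), x)
print(analytic, numeric)
```

Note that the matrix is not symmetric, so the familiar shortcut $$2 A x$$ would give the wrong answer here, while the differential-based derivation handles it without extra effort.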

### NIPS 2016 Summaries

I did not attend this year’s NIPS, but I’ve gathered many summaries published online by those who did attend the conference.

### Neural Variational Inference: Importance Weighted Autoencoders

Previously we covered Variational Autoencoders (VAE), a popular inference tool based on neural networks. In this post we’ll consider a follow-up work from Toronto by Y. Burda, R. Grosse and R. Salakhutdinov, Importance Weighted Autoencoders (IWAE). The crucial contribution of this work is the introduction of a new lower bound on the marginal log-likelihood $$\log p(x)$$ which generalizes the ELBO, but also allows one to use less accurate approximate posteriors $$q(z \mid x, \Lambda)$$.
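For reference, the bound in question can be written as follows, using the notation above and assuming $$K$$ independent samples from the approximate posterior (the symbol $$\mathcal{L}_K$$ is introduced here just for this summary):

$$
\mathcal{L}_K(x) = \mathbb{E}_{z_1, \dots, z_K \sim q(z \mid x, \Lambda)} \left[ \log \frac{1}{K} \sum_{k=1}^{K} \frac{p(x, z_k)}{q(z_k \mid x, \Lambda)} \right] \le \log p(x)
$$

Setting $$K = 1$$ recovers the standard ELBO, and the bound does not get looser as $$K$$ grows.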

For dessert we’ll discuss another paper, Variational inference for Monte Carlo objectives by A. Mnih and D. Rezende, which aims to broaden the applicability of this approach to models where the reparametrization trick cannot be used (e.g. for discrete variables).