With high probability, I am missing some important point. If you find a mistake in this post, I would appreciate you letting me know.
CPC with infoNCE (van den Oord et al., 2018) is one of the most powerful unsupervised representation learning algorithms of the last few years. When I read this paper carefully, I noticed some minor issues, so let me write them up here.
Appendix A.1 proves that the optimal infoNCE loss is an upper bound of the negative mutual information shifted by $\log N$: $\mathcal{L}_N^{\text{opt}} \geq -I(x_{t+k}, c_t) + \log N$, and hence $I(x_{t+k}, c_t) \geq \log N - \mathcal{L}_N^{\text{opt}}$, where $N$ is the number of samples. However, Eq. 10 in the paper does not always hold.
Let’s start from Eq. 9 in the paper:

$$
\mathcal{L}_N^{\text{opt}} = \mathbb{E}_X \log\left[1 + \frac{p(x_{t+k})}{p(x_{t+k} \mid c_t)} \sum_{x_j \in X_{\text{neg}}} \frac{p(x_j \mid c_t)}{p(x_j)}\right] \approx \mathbb{E}_X \log\left[1 + \frac{p(x_{t+k})}{p(x_{t+k} \mid c_t)} (N-1)\right],
$$

where $X = \{x_1, \dots, x_N\}$ contains one positive sample and $N-1$ negative samples.
As you know, $\log$ is a monotonically increasing function, so if

$$
1 + \frac{p(x_{t+k})}{p(x_{t+k} \mid c_t)} (N-1) \geq \frac{p(x_{t+k})}{p(x_{t+k} \mid c_t)} N,
$$

then Eq. 10 in the paper is derived. This condition is equivalent to $\frac{p(x_{t+k})}{p(x_{t+k} \mid c_t)} \leq 1$. But $\frac{p(x_{t+k})}{p(x_{t+k} \mid c_t)}$ is a density ratio that can be bigger than $1$. Thus we cannot derive Eq. 10 from Eq. 9.
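To make the failure concrete, here is a tiny numeric check (a sketch with hand-picked values; the ratio $r = 2$ and the choice $N = 2$ are illustrative assumptions, not from the paper):

```python
import math

# Counterexample to the step Eq. 9 -> Eq. 10.
# r = p(x_{t+k}) / p(x_{t+k}|c_t) = 2 means the context makes the positive
# sample *less* likely than its prior; N = 2 samples in total.
r, N = 2.0, 2

eq9 = math.log(1 + r * (N - 1))   # term inside the expectation in Eq. 9
eq10 = math.log(r * N)            # term inside the expectation in Eq. 10

print(eq9, eq10)  # 1.0986... < 1.3863..., so "Eq. 9 >= Eq. 10" fails here
```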
Fortunately, we can still obtain almost the same bound by dropping the $1$ inside the logarithm instead:

$$
\mathcal{L}_N^{\text{opt}} = \mathbb{E}_X \log\left[1 + \frac{p(x_{t+k})}{p(x_{t+k} \mid c_t)} (N-1)\right] \geq \mathbb{E}_X \log\left[\frac{p(x_{t+k})}{p(x_{t+k} \mid c_t)} (N-1)\right] = -I(x_{t+k}, c_t) + \log(N-1),
$$

which always holds because $1 + a \geq a$ for any $a$. Hence $I(x_{t+k}, c_t) \geq \log(N-1) - \mathcal{L}_N^{\text{opt}}$.
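As a quick sanity check (a sketch; the sampling ranges for $r$ and $N$ are arbitrary), the corrected step holds for any positive density ratio:

```python
import math
import random

# Verify log(1 + r(N-1)) >= log(r(N-1)) for any density ratio r > 0,
# unlike the original step from Eq. 9 to Eq. 10.
random.seed(0)
for _ in range(1000):
    r = random.uniform(1e-3, 10.0)   # r = p(x_{t+k}) / p(x_{t+k}|c_t)
    N = random.randint(2, 1024)
    assert math.log(1 + r * (N - 1)) >= math.log(r * (N - 1))
print("OK: corrected bound holds for all sampled (r, N)")
```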
Eq. 15 states that infoNCE is a lower bound of MINE (Belghazi et al., 2018), which is itself a lower bound of the mutual information. But infoNCE may not be a lower bound of MINE. In Definition 3.1 of the MINE paper, MINE is defined by

$$
\widehat{I(X;Z)}_n = \sup_{\theta \in \Theta}\ \mathbb{E}_{\mathbb{P}_{XZ}^{(n)}}\left[T_\theta\right] - \log\left(\mathbb{E}_{\mathbb{P}_X^{(n)} \otimes \hat{\mathbb{P}}_Z^{(n)}}\left[e^{T_\theta}\right]\right),
$$

where $T_\theta$ is a critic network and the $\log$ is applied outside the expectation over the product of the marginals.
But in the second term of Eq. 15 in the CPC paper, the $\log$ sits between the two expectations. Writing the density ratio as $f(x, c) = e^{F(x, c)}$, Eq. 15 takes the form

$$
-\mathcal{L}_N = \mathbb{E}_X\left[F(x_{t+k}, c_t)\right] - \mathbb{E}_X\left[\log \sum_{x_j \in X} e^{F(x_j, c_t)}\right].
$$

Even if we use Jensen's inequality, $\mathbb{E}_X\left[\log \sum_{x_j \in X} e^{F(x_j, c_t)}\right] \leq \log \mathbb{E}_X\left[\sum_{x_j \in X} e^{F(x_j, c_t)}\right]$, it bounds $-\mathcal{L}_N$ from below by the MINE-style expression rather than from above, and the sum inside the $\log$ still contains the positive sample. So the result is not equivalent to MINE.
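A small Monte Carlo illustration of this structural gap (a sketch: the Gaussian critic scores and batch size are made up, not the CPC model):

```python
import numpy as np

# infoNCE's second term keeps the log *inside* the outer expectation over X,
# while MINE keeps the log *outside* its expectation.
rng = np.random.default_rng(0)
F = rng.normal(size=(100_000, 8))   # fake critic scores F(x_j, c_t), N = 8

s = np.exp(F).sum(axis=1)           # sum_j e^{F(x_j, c_t)} for each batch
infonce_term = np.log(s).mean()     # E_X[log sum_j e^F]
mine_term = np.log(s.mean())        # log E_X[sum_j e^F]

print(infonce_term, mine_term)      # infonce_term <= mine_term, by Jensen
```

The printed gap is exactly the Jensen slack; it is strictly positive unless the log-sum-exp is constant across batches, which is why the two objectives are not interchangeable.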