Proof check of CPC paper
Note
With high probability, I am missing some important point. If you find a mistake in this post, I would appreciate it if you let me know.
Intro
CPC with InfoNCE is one of the most powerful unsupervised representation learning algorithms of the last few years. While reading the paper carefully, I noticed a few minor issues, so let me write them down here.
Eq. 10
Appendix A.1 proves that the optimal InfoNCE loss is an upper bound of the negative mutual information plus \(\ln N\), i.e., \(- I(x_{t+k}, c_t) + \ln N\), where \(N = \# \text{negative samples} + 1\). However, the derivation of Eq. 10 in the paper does not always hold.
Let’s start from Eq. 9 in the paper:
\[\mathbb{E}_{X} \ln \left[ 1 + \frac{p(x_{t+k})}{ p(x_{t+k} \mid c_t) } (N-1) \right],\]where the expectation is taken over \(X\), a set of one positive sample and \(N-1\) negative samples.
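For reference, Eq. 10 in the paper claims (as I read it) that this expression is bounded below by
\[\mathbb{E}_{X} \ln \left[ \frac{p(x_{t+k})}{ p(x_{t+k} \mid c_t) } N \right] = - I(x_{t+k}, c_t) + \ln N.\]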
As you know, \(\ln\) is a monotonically increasing function, so Eq. 10 in the paper follows if \(1 + \frac{p(x_{t+k})}{ p(x_{t+k} \mid c_t)} (N-1) \geq \frac{p(x_{t+k})}{ p(x_{t+k} \mid c_t)} N\), i.e., if \(1 - \frac{p(x_{t+k})}{ p(x_{t+k} \mid c_t)} \geq 0\). But \(\frac{p(x_{t+k})}{ p(x_{t+k} \mid c_t)}\) is a density ratio, which can be larger than \(1\). Thus we cannot derive Eq. 10 from Eq. 9 in general.
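As a concrete (hypothetical) instance: suppose the density ratio equals \(2\) at some point and \(N = 2\). Then
\[\ln \left[ 1 + 2 \cdot (N - 1) \right] = \ln 3 < \ln 4 = \ln \left[ 2 N \right],\]so the pointwise inequality needed for Eq. 10 is reversed wherever the ratio exceeds \(1\).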
Fortunately, we can still obtain almost the same bound:
\[\begin{aligned} \mathbb{E}_{X} \ln \left[ 1 + \frac{p(x_{t+k})}{ p(x_{t+k} \mid c_t) } (N-1) \right] &\geq \mathbb{E}_{X} \ln \left[ \frac{p(x_{t+k})}{ p(x_{t+k} \mid c_t) } (N-1) \right] \\ &= - I(x_{t+k}, c_t) + \ln (N - 1). \end{aligned}\]
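To sanity-check the corrected bound, here is a minimal numerical sketch on a made-up \(3 \times 2\) discrete joint distribution (all numbers are hypothetical). It computes the optimal InfoNCE loss of Eq. 9 exactly and compares it against both the paper's \(-I + \ln N\) and the corrected \(-I + \ln(N-1)\):

```python
import numpy as np

# A toy 3x2 joint distribution p(x, c); the numbers are made up.
p_xc = np.array([[0.30, 0.10],
                 [0.05, 0.25],
                 [0.10, 0.20]])
p_x = p_xc.sum(axis=1)  # marginal p(x)
p_c = p_xc.sum(axis=0)  # marginal p(c)

# Exact mutual information I(x, c) in nats.
mi = float(np.sum(p_xc * np.log(p_xc / np.outer(p_x, p_c))))

N = 8  # one positive sample + N - 1 negative samples

# Density ratio p(x) / p(x | c) = p(x) p(c) / p(x, c).
ratio = np.outer(p_x, p_c) / p_xc

# Eq. 9: optimal InfoNCE loss, E_{p(x,c)} ln[1 + ratio * (N - 1)].
loss_opt = float(np.sum(p_xc * np.log(1.0 + ratio * (N - 1))))

print(f"optimal loss         = {loss_opt:.4f}")
print(f"-I + ln N   (Eq. 10) = {-mi + np.log(N):.4f}")
print(f"-I + ln(N-1) (fixed) = {-mi + np.log(N - 1):.4f}")
assert loss_opt >= -mi + np.log(N - 1)  # the corrected bound always holds
```

Note that for a particular distribution, the paper's \(-I + \ln N\) may still happen to lie below the loss; the point above is only that the derivation does not guarantee it.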
Eq. 15
Eq. 15 states that InfoNCE is a lower bound of MINE, which is itself a lower bound of the mutual information. But InfoNCE may not be a lower bound of MINE. In Definition 3.1 of the MINE paper, MINE is defined by:
\[\sup_{\theta \in \Theta} \mathbb{E}_{p(x, c)} [T_\theta] - \ln \mathbb{E}_{p(x), p(c)} [\exp(T_\theta)].\]However, in the second term of Eq. 15 in the CPC paper, the \(\ln\) sits between two expectations. Even if we apply Jensen's inequality, the result is not equivalent to MINE.
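To spell this out, write the second term of Eq. 15 schematically as \(\mathbb{E}_{p(c)} \ln \mathbb{E}_{p(x)} [\exp(T_\theta)]\) (my reading of the paper's notation). Since \(\ln\) is concave, Jensen's inequality gives
\[\mathbb{E}_{p(c)} \ln \mathbb{E}_{p(x)} [\exp(T_\theta)] \leq \ln \mathbb{E}_{p(x), p(c)} [\exp(T_\theta)],\]so subtracting this term yields something greater than or equal to the MINE objective for any fixed \(T_\theta\). This is the opposite direction of what we would need in order to conclude that InfoNCE is a lower bound of MINE.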