With high probability, I am missing some important point. If you find a mistake in this post, I would appreciate you letting me know.
CPC with infoNCE (van den Oord et al., 2018) is one of the most powerful unsupervised representation learning algorithms of the last few years. When I read this paper carefully, I noticed some minor issues, so let me write them up here.
Appendix A.1 proves that the optimal infoNCE loss is an upper bound of the negative mutual information shifted by $\log N$: $\mathcal{L}_N^{\text{opt}} \geq -I(x_{t+k}, c_t) + \log N$, and hence $I(x_{t+k}, c_t) \geq \log N - \mathcal{L}_N^{\text{opt}}$, where $N$ is the number of samples. However, Eq. 10 in the paper does not always hold.
Let’s start from Eq. 9 in the paper:

$$
\mathcal{L}_N^{\text{opt}} = \mathbb{E}_X \log\left[1 + \frac{p(x_{t+k})}{p(x_{t+k} \mid c_t)} \sum_{x_j \in X_{\text{neg}}} \frac{p(x_j \mid c_t)}{p(x_j)}\right] \approx \mathbb{E}_X \log\left[1 + \frac{p(x_{t+k})}{p(x_{t+k} \mid c_t)} (N-1)\right],
$$

where $X = \{x_1, \dots, x_N\}$ contains one positive sample and $N-1$ negative samples.
As you know, $\log$ is a monotonically increasing function, so if

$$
1 + \frac{p(x_{t+k})}{p(x_{t+k} \mid c_t)} (N-1) \geq \frac{p(x_{t+k})}{p(x_{t+k} \mid c_t)} N,
$$

then Eq. 10 in the paper is derived. This condition is equivalent to $\frac{p(x_{t+k})}{p(x_{t+k} \mid c_t)} \leq 1$. But $\frac{p(x_{t+k})}{p(x_{t+k} \mid c_t)}$ is a density ratio that can be bigger than $1$. Thus we cannot derive Eq. 10 from Eq. 9.
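To make the failure concrete, here is a tiny numeric check (a sketch with hand-picked values; the ratio $r = 2$ and the choice $N = 2$ are illustrative assumptions, not from the paper):

```python
import math

# Counterexample to the step Eq. 9 -> Eq. 10.
# r = p(x_{t+k}) / p(x_{t+k}|c_t) = 2 means the context makes the positive
# sample *less* likely than its prior; N = 2 samples in total.
r, N = 2.0, 2

eq9 = math.log(1 + r * (N - 1))   # term inside the expectation in Eq. 9
eq10 = math.log(r * N)            # term inside the expectation in Eq. 10

print(eq9, eq10)  # 1.0986... < 1.3863..., so "Eq. 9 >= Eq. 10" fails here
```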
Fortunately, we can still obtain almost the same bound by dropping the $1$ inside the logarithm instead:

$$
\mathcal{L}_N^{\text{opt}} = \mathbb{E}_X \log\left[1 + \frac{p(x_{t+k})}{p(x_{t+k} \mid c_t)} (N-1)\right] \geq \mathbb{E}_X \log\left[\frac{p(x_{t+k})}{p(x_{t+k} \mid c_t)} (N-1)\right] = -I(x_{t+k}, c_t) + \log(N-1),
$$

which always holds because $1 + a \geq a$ for any $a$. Hence $I(x_{t+k}, c_t) \geq \log(N-1) - \mathcal{L}_N^{\text{opt}}$.
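As a quick sanity check (a sketch; the sampling ranges for $r$ and $N$ are arbitrary), the corrected step holds for any positive density ratio:

```python
import math
import random

# Verify log(1 + r(N-1)) >= log(r(N-1)) for any density ratio r > 0,
# unlike the original step from Eq. 9 to Eq. 10.
random.seed(0)
for _ in range(1000):
    r = random.uniform(1e-3, 10.0)   # r = p(x_{t+k}) / p(x_{t+k}|c_t)
    N = random.randint(2, 1024)
    assert math.log(1 + r * (N - 1)) >= math.log(r * (N - 1))
print("OK: corrected bound holds for all sampled (r, N)")
```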
Eq. 15 states that infoNCE is a lower bound of MINE (Belghazi et al., 2018), which is itself a lower bound of the mutual information. But infoNCE may not be a lower bound of MINE. In Definition 3.1 of the MINE paper, MINE is defined by

$$
\widehat{I(X;Z)}_n = \sup_{\theta \in \Theta}\ \mathbb{E}_{\mathbb{P}_{XZ}^{(n)}}\left[T_\theta\right] - \log\left(\mathbb{E}_{\mathbb{P}_X^{(n)} \otimes \hat{\mathbb{P}}_Z^{(n)}}\left[e^{T_\theta}\right]\right),
$$

where $T_\theta$ is a critic network and the $\log$ is applied outside the expectation over the product of the marginals.
But in the second term of Eq. 15 in the CPC paper, the $\log$ sits between the two expectations. Writing the density ratio as $f(x, c) = e^{F(x, c)}$, Eq. 15 takes the form

$$
-\mathcal{L}_N = \mathbb{E}_X\left[F(x_{t+k}, c_t)\right] - \mathbb{E}_X\left[\log \sum_{x_j \in X} e^{F(x_j, c_t)}\right].
$$

Even if we use Jensen's inequality, $\mathbb{E}_X\left[\log \sum_{x_j \in X} e^{F(x_j, c_t)}\right] \leq \log \mathbb{E}_X\left[\sum_{x_j \in X} e^{F(x_j, c_t)}\right]$, it bounds $-\mathcal{L}_N$ from below by the MINE-style expression rather than from above, and the sum inside the $\log$ still contains the positive sample. So the result is not equivalent to MINE.
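A small Monte Carlo illustration of this structural gap (a sketch: the Gaussian critic scores and batch size are made up, not the CPC model):

```python
import numpy as np

# infoNCE's second term keeps the log *inside* the outer expectation over X,
# while MINE keeps the log *outside* its expectation.
rng = np.random.default_rng(0)
F = rng.normal(size=(100_000, 8))   # fake critic scores F(x_j, c_t), N = 8

s = np.exp(F).sum(axis=1)           # sum_j e^{F(x_j, c_t)} for each batch
infonce_term = np.log(s).mean()     # E_X[log sum_j e^F]
mine_term = np.log(s.mean())        # log E_X[sum_j e^F]

print(infonce_term, mine_term)      # infonce_term <= mine_term, by Jensen
```

The printed gap is exactly the Jensen slack; it is strictly positive unless the log-sum-exp is constant across batches, which is why the two objectives are not interchangeable.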