It is quite likely that I'm missing something important. If you find a mistake in this post, I would appreciate it if you let me know.


CPC with infoNCE [1] is one of the most powerful unsupervised representation learning algorithms of the last few years. When I read the paper carefully, I noticed a few minor issues, which I write down here.

Eq. 10

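Appendix A.1 proves that the optimal infoNCE loss is an upper bound on the negative mutual information plus $\log N$, that is, $\mathcal{L}_N^{\text{opt}} \ge -I(x_{t+k}, c_t) + \log N$, where $N$ is the number of samples. However, the inequality used to obtain Eq. 10 in the paper does not always hold.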

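Let’s start from Eq. 9 in the paper:

$$
\mathbb{E}_X \log\left[1 + \frac{p(x_{t+k})}{p(x_{t+k} \mid c_t)} (N - 1)\right],
$$

where the expectation is over $X$, which consists of one positive sample and $N - 1$ negative samples.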

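As you know, $\log$ is a monotonically increasing function, so if $\frac{p(x_{t+k})}{p(x_{t+k} \mid c_t)} \le 1$, then $1 + \frac{p(x_{t+k})}{p(x_{t+k} \mid c_t)} (N - 1) \ge \frac{p(x_{t+k})}{p(x_{t+k} \mid c_t)} N$ and Eq. 10 in the paper follows. But $\frac{p(x_{t+k})}{p(x_{t+k} \mid c_t)}$ is a density ratio, which can be larger than $1$. Thus we cannot derive Eq. 10 from Eq. 9 by this argument.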
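As a quick sanity check, the sketch below builds a small discrete joint distribution and compares the two integrands pointwise. The joint table, the function names, and the choice $N = 8$ are all hypothetical and only serve as an illustration. It checks the integrands at individual points, not the expectations.

```python
import math

# Hypothetical joint distribution p(x, c) over binary x and c
# (rows index x, columns index c). x and c are strongly correlated.
joint = [[0.45, 0.05],
         [0.05, 0.45]]

# Marginals p(x) and p(c).
p_x = [sum(row) for row in joint]
p_c = [joint[0][j] + joint[1][j] for j in range(2)]


def density_ratio(x, c):
    """Return p(x) / p(x | c), the ratio that appears inside the log."""
    p_x_given_c = joint[x][c] / p_c[c]
    return p_x[x] / p_x_given_c


def eq9_integrand(r, n):
    """log(1 + r (N - 1)), the integrand of Eq. 9."""
    return math.log(1.0 + r * (n - 1))


def eq10_integrand(r, n):
    """log(r N), the integrand of Eq. 10."""
    return math.log(r * n)


n = 8  # assumed number of samples (one positive, n - 1 negatives)
for x in (0, 1):
    for c in (0, 1):
        r = density_ratio(x, c)
        holds = eq9_integrand(r, n) >= eq10_integrand(r, n)
        print(f"x={x} c={c} ratio={r:.3f} eq9 >= eq10: {holds}")
```

For the unlikely pairs (for example $x = 1$, $c = 0$) the ratio exceeds $1$, and the pointwise inequality between the two integrands fails there.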

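Fortunately, we can still obtain almost the same bound by dropping the $1$ inside the logarithm:

$$
\mathbb{E}_X \log\left[1 + \frac{p(x_{t+k})}{p(x_{t+k} \mid c_t)} (N - 1)\right] \ge \mathbb{E}_X \log\left[\frac{p(x_{t+k})}{p(x_{t+k} \mid c_t)} (N - 1)\right] = -I(x_{t+k}, c_t) + \log (N - 1).
$$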

Eq. 15

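Eq. 15 states that infoNCE is a lower bound of MINE [2], which is itself a lower bound of mutual information. But infoNCE may not be a lower bound of MINE. In Definition 3.1 of the MINE paper, MINE is defined by:

$$
I_\Theta(X, Z) = \sup_{\theta \in \Theta} \mathbb{E}_{\mathbb{P}_{XZ}}[T_\theta] - \log\left(\mathbb{E}_{\mathbb{P}_X \otimes \mathbb{P}_Z}\left[e^{T_\theta}\right]\right).
$$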

But in the second term of Eq. 15 in the CPC paper, $\log$ sits between two expectations, whereas in MINE it sits outside the expectation. Even if we use Jensen’s inequality, the result is not equivalent to MINE.
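To make the mismatch concrete, here is a sketch of what Jensen’s inequality gives. The notation is my own assumption rather than either paper’s: $F$ stands for the critic ($\log f$ in CPC, $T_\theta$ in MINE), and $\hat{\mathbb{E}}_{x_j}$ denotes the average over the negative samples. Since $\log$ is concave,

$$
\mathbb{E}_{c}\left[\log \hat{\mathbb{E}}_{x_j}\left[e^{F(x_j, c)}\right]\right] \le \log \mathbb{E}_{c}\left[\hat{\mathbb{E}}_{x_j}\left[e^{F(x_j, c)}\right]\right],
$$

and therefore

$$
\mathbb{E}_{(x, c)}\left[F(x, c)\right] - \mathbb{E}_{c}\left[\log \hat{\mathbb{E}}_{x_j}\left[e^{F(x_j, c)}\right]\right] \ge \mathbb{E}_{(x, c)}\left[F(x, c)\right] - \log \mathbb{E}_{c}\left[\hat{\mathbb{E}}_{x_j}\left[e^{F(x_j, c)}\right]\right].
$$

Under these assumptions, the CPC-style expression on the left is at least as large as the MINE-style expression on the right, so an upper bound on infoNCE in terms of the left-hand side does not carry over to the right-hand side.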


  1. Aaron van den Oord, Yazhe Li, Oriol Vinyals. Representation Learning with Contrastive Predictive Coding. arXiv, 2019. 

  2. Mohamed Ishmael Belghazi, Aristide Baratin, Sai Rajeshwar, Sherjil Ozair, Yoshua Bengio, Aaron Courville, R Devon Hjelm. Mutual Information Neural Estimation. In ICML, 2018.