Bregman divergence is a general class of measures of the "difference" between two data points. For instance, squared Euclidean distance and Kullback–Leibler (KL) divergence are both instances of Bregman divergence. In this post, I derive KL divergence from the Bregman divergence formulation (as a note to myself).
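As a warm-up, here is a short derivation (my own, not from the original post) showing how squared Euclidean distance arises as the Bregman divergence of $F(x) = \|x\|^2$, whose gradient is $\nabla F(q) = 2q$:

$$\begin{aligned}
D_F(p, q) &= \|p\|^2 - \|q\|^2 - \langle 2q, p - q \rangle \\
&= \|p\|^2 - \|q\|^2 - 2\langle q, p \rangle + 2\|q\|^2 \\
&= \|p\|^2 + \|q\|^2 - 2\langle p, q \rangle \\
&= \|p - q\|^2
\end{aligned}$$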

Bregman divergence is defined by the equation below:

$$D_F(p, q) = F(p) - F(q) - \langle \nabla F(q), p - q \rangle$$

where $\langle \cdot, \cdot \rangle$ denotes the inner product and $\nabla F$ is the gradient of $F$. When $F(p) = \sum_i p_i \log(p_i)$ (the negative entropy), this Bregman divergence is equivalent to KL divergence.

Substituting $F(p) = \sum_i p_i \log p_i$ and $\nabla F(q)_i = \log q_i + 1$:

$$\begin{aligned}
D_F(p, q) &= \sum_i p_i \log p_i - \sum_i q_i \log q_i - \sum_i (\log q_i + 1)(p_i - q_i) \\
&= \sum_i p_i \log p_i - \sum_i p_i \log q_i - \sum_i (p_i - q_i) \\
&= \sum_i p_i \log \frac{p_i}{q_i}
\end{aligned}$$

From Line 2 to Line 3, $\sum_i (p_i - q_i) = 0$ since $\sum_i p_i = \sum_i q_i = 1$ ($p$ and $q$ are probability distributions). What remains is exactly the KL divergence.
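As a quick numerical sanity check (my own addition, not part of the original post; the function names are made up for illustration), here is a small NumPy sketch that evaluates the generic Bregman divergence with $F(p) = \sum_i p_i \log p_i$ and compares it against the KL divergence computed directly:

```python
import numpy as np

def bregman_divergence(F, grad_F, p, q):
    """Generic Bregman divergence: D_F(p, q) = F(p) - F(q) - <grad F(q), p - q>."""
    return F(p) - F(q) - np.dot(grad_F(q), p - q)

# F(p) = sum_i p_i log p_i (negative entropy) and its gradient.
def neg_entropy(p):
    return np.sum(p * np.log(p))

def neg_entropy_grad(q):
    return np.log(q) + 1.0

# Two example probability distributions (each sums to 1).
p = np.array([0.2, 0.5, 0.3])
q = np.array([0.1, 0.6, 0.3])

bregman = bregman_divergence(neg_entropy, neg_entropy_grad, p, q)
kl = np.sum(p * np.log(p / q))  # KL divergence computed directly

print(bregman, kl)  # the two values agree
```

The helper is deliberately generic: swapping in a different convex $F$ and its gradient recovers other instances of Bregman divergence, such as squared Euclidean distance.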

### Reference

This blog post explains Bregman divergence in detail and gives useful links.