average surprise about samples from the distribution; quantifies the “informativeness” of a distribution. A smaller entropy means the distribution is more concentrated, while a larger entropy indicates a more diffuse distribution (we are less certain what the samples will be).
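As a minimal illustrative sketch (not from the source), the discrete Shannon entropy makes this contrast concrete: the average surprise, where the surprise of an outcome x is -log p(x), is small for a concentrated distribution and large for a diffuse one.

```python
import math

def entropy(p):
    """Shannon entropy H(p) = -sum_x p(x) * log p(x):
    the average surprise (-log p(x)) over samples from p."""
    return -sum(px * math.log(px) for px in p if px > 0)

concentrated = [0.97, 0.01, 0.01, 0.01]  # mass piled on one outcome
diffuse = [0.25, 0.25, 0.25, 0.25]       # uniform over four outcomes

print(entropy(concentrated))  # small: samples are highly predictable
print(entropy(diffuse))       # log(4), the maximum for four outcomes
```

The uniform distribution attains the maximum entropy for a fixed number of outcomes, matching the intuition that it is the least informative about which sample will appear.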