Probability Density Functions from the Perspective of the Maximum Entropy Principle
Maximum Entropy Principle
The Maximum Entropy Principle (hereafter MEP):
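Subject to the given constraints, the probability distribution of a random variable should be the one that maximizes entropy. Equivalently, among all probability distributions satisfying the given constraints, the one with the largest entropy best represents the current state of knowledge about the system. (Source: Wikipedia.)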
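As a toy illustration that is not part of the original argument: on a finite set of n outcomes with only the normalization constraint, the MEP selects the uniform distribution, whose entropy is \ln n. The sketch below (the helpers entropy and random_distribution are hypothetical names) checks that none of a batch of randomly drawn distributions beats it.

```python
import math
import random


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution p; zero entries contribute nothing."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def random_distribution(n, rng):
    """Hypothetical helper: normalize n uniform random weights into a probability vector."""
    weights = [rng.random() for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]


n = 6
rng = random.Random(0)  # fixed seed for reproducibility
uniform = [1.0 / n] * n
samples = [random_distribution(n, rng) for _ in range(10_000)]
best_random = max(entropy(p) for p in samples)

print(f"uniform entropy     = {entropy(uniform):.6f}  (ln n = {math.log(n):.6f})")
print(f"best random entropy = {best_random:.6f}")
```

Random search proves nothing by itself; it only makes the principle tangible before the continuous case is derived.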
Assumptions
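Suppose the mean \mu and the variance \nu of the underlying probability distribution of a random variable X are known, and let \rho(x) be a probability density function defined on \mathbb{R}. The (differential) entropy of \rho is defined as: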
S[\rho] = E[-\ln\rho(X)] = \int_{-\infty}^{\infty} -\rho(x)\ln[\rho(x)] \, \mathrm{d}x
Since \rho(x) is a probability density function, it satisfies the normalization condition:
\int_{-\infty}^{\infty} \rho(x) \, \mathrm{d}x = 1
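In addition, \rho(x) is subject to the mean constraint:
E(X)= \int_{-\infty}^{\infty} x\rho(x) \, \mathrm{d}x = \mu
and the variance constraint:
\mathrm{Var}(X) = \int_{-\infty}^{\infty} (x - \mu)^{2}\rho(x) \, \mathrm{d}x = \nu
We seek the density \rho that maximizes the entropy subject to these three constraints.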
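To make the objective and the constraints concrete, here is a minimal numerical sketch. It assumes midpoint-rule quadrature on a finite interval, and integrate and check_density are hypothetical helper names. It evaluates the normalization, mean, variance and entropy integrals for one candidate density: a uniform density with mean \mu and variance \nu, whose entropy should equal \ln(b - a).

```python
import math


def integrate(f, a, b, n=20_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def check_density(rho, a, b, mu):
    """Return (normalization, mean, variance about mu, entropy) of rho supported on [a, b]."""
    norm = integrate(rho, a, b)
    mean = integrate(lambda x: x * rho(x), a, b)
    var = integrate(lambda x: (x - mu) ** 2 * rho(x), a, b)
    ent = integrate(lambda x: -rho(x) * math.log(rho(x)) if rho(x) > 0 else 0.0, a, b)
    return norm, mean, var, ent


# Candidate: uniform density with mean mu and variance nu,
# supported on [mu - sqrt(3 nu), mu + sqrt(3 nu)] (width^2 / 12 = nu).
mu, nu = 2.0, 0.5
half_width = math.sqrt(3 * nu)
a, b = mu - half_width, mu + half_width


def rho(x):
    return 1.0 / (b - a)


norm, mean, var, ent = check_density(rho, a, b, mu)
print(f"norm={norm:.9f} mean={mean:.9f} var={var:.9f} entropy={ent:.9f} ln(b-a)={math.log(b - a):.9f}")
```

Midpoint quadrature is used only because it needs no dependencies; scipy.integrate.quad would be a natural alternative.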
Constrained Optimization
Using the method of Lagrange multipliers (with the sign convention that each constraint term is subtracted), we form the Lagrangian:
\begin{aligned}
L(\rho, \alpha, \beta, \gamma)
= &-\int_{-\infty}^{\infty} \rho(x)\ln[\rho(x)] \, \mathrm{d}x - \alpha\left[ \int_{-\infty}^{\infty} \rho(x) \, \mathrm{d}x -1 \right] \\&- \beta \left[ \int_{-\infty}^{\infty} x\rho(x) \, \mathrm{d}x -\mu \right] - \gamma \left[ \int_{-\infty}^{\infty} (x - \mu)^{2}\rho(x) \, \mathrm{d}x - \nu \right]
\end{aligned}
We maximize L(\rho, \alpha, \beta, \gamma) with respect to the function \rho using functional differentiation (the calculus of variations). Let \delta \rho be a sufficiently small perturbation of the probability density function; for the integrand of the first term we have:
\begin{aligned}
(\rho + \delta \rho)\ln(\rho + \delta \rho) &= (\rho + \delta \rho)\ln\left[ \rho \left( 1 + \frac{\delta \rho}{\rho} \right) \right] \\
&= (\rho + \delta \rho) \ln\rho + (\rho + \delta \rho)\ln\left( 1 + \frac{\delta \rho}{\rho} \right)
\end{aligned}
Using the Taylor expansion of \ln(1 + u) at u = 0, truncated after the second-order term:
\begin{aligned}
\ln\left( 1 + \frac{\delta \rho}{\rho} \right) & \approx \frac{\delta \rho}{\rho} - \frac{1}{2}\left( \frac{\delta \rho}{\rho} \right)^{2}
\end{aligned}
Substituting, we get:
\begin{aligned}
(\rho + \delta \rho)\ln(\rho + \delta \rho) &\approx (\rho + \delta \rho) \ln\rho + (\rho + \delta \rho)\left[ \frac{\delta \rho}{\rho} - \frac{1}{2}\frac{(\delta \rho)^{2}}{\rho ^{2}} \right] \\
&= (\rho + \delta \rho)\ln\rho + \delta \rho + \frac{(\delta \rho)^{2}}{\rho} - \frac{1}{2}\frac{(\delta \rho)^{2}}{\rho}- \frac{1}{2} \frac{(\delta \rho)^{3}}{\rho ^{2}} \\
&\approx \rho \ln\rho + \delta \rho\ln\rho + \delta \rho
\end{aligned}
Keeping only terms of first order in \delta \rho (the constraint terms are linear in \rho, so their variations are exact), the first variation of the Lagrangian is:
\begin{aligned}
L(\rho + \delta \rho) - L(\rho) \approx -\int_{-\infty}^{\infty} \delta \rho \left( \ln\rho + 1 + \alpha + \beta x + \gamma (x - \mu)^{2} \right) \, \mathrm{d}x
\end{aligned}
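The first-order expansion of \rho\ln\rho can be checked numerically at a single point. In this sketch the pointwise value \rho = 0.3 is an arbitrary assumption. After subtracting \delta\rho(\ln\rho + 1), the leftover should shrink like (\delta\rho)^{2}/(2\rho), which is exactly the second-order term that the expansion above discards.

```python
import math


def f(r):
    """Integrand of the negative-entropy term: r * ln(r)."""
    return r * math.log(r)


rho = 0.3  # assumed positive density value at some fixed x
for delta in (1e-2, 1e-3, 1e-4):
    exact = f(rho + delta) - f(rho)
    first_order = delta * (math.log(rho) + 1)  # delta*ln(rho) + delta
    remainder = exact - first_order
    # The remainder should track the dropped second-order term delta^2 / (2 rho).
    print(f"delta={delta:.0e}  remainder={remainder:.3e}  "
          f"delta^2/(2 rho)={delta ** 2 / (2 * rho):.3e}")
```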
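For \rho to be a stationary point, this first variation must vanish for every admissible perturbation \delta \rho, so the term in parentheses must be identically zero (here \rho > 0 is needed for \ln\rho to be defined). Hence: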
\begin{aligned}
&\ln\rho + 1 + \alpha + \beta x + \gamma (x - \mu)^{2} = 0 \\
\implies &\ln \rho = -(1 + \alpha + \beta x+\gamma(x - \mu)^{2}) \\
\implies &\rho = \exp(-1-\alpha-\beta x-\gamma(x-\mu)^{2})
\end{aligned}
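Simplifying \rho further, we absorb constant factors into C_{1} = e^{-1} and C_{2} = e^{-1-\alpha}, and then complete the square in x:
\begin{aligned}
\rho &= C_{1}\exp(-\alpha-\beta x-\gamma(x-\mu)^{2}) \\
&= C_{2} \exp(-\beta x-\gamma(x-\mu)^{2}) \\
&= C_3 \exp\left( -\gamma\left( x - \mu+\frac{\beta}{2\gamma} \right)^{2} \right)
\end{aligned}
where C_{3} = C_{2}\exp\left( \frac{\beta^{2}}{4\gamma} - \beta\mu \right).
Once C_{3}, \beta and \gamma are known, the probability density function \rho(x) is determined. Combining the normalization constraint with the Gaussian integral, we have:
\int_{-\infty}^{\infty} \rho(x) \, \mathrm{d}x =1 \implies C_{3} \int_{-\infty}^{\infty} \exp\left( -\gamma\left( x - \mu+\frac{\beta}{2\gamma} \right)^{2} \right) \, \mathrm{d}x = 1
This matches the standard Gaussian integral \int_{-\infty}^{\infty} \exp(-a(x+b)^{2}) \, \mathrm{d}x = \sqrt{ \frac{\pi}{a} } (valid for a > 0) with a = \gamma and b = -\mu + \frac{\beta}{2\gamma}; in particular, \gamma > 0 is required for \rho to be normalizable. Therefore:
\begin{aligned}
{C}_{3} &= \frac{1}{\sqrt{ \frac{\pi}{\gamma} }} \\
&= \sqrt{ \frac{\gamma}{\pi} }
\end{aligned}
Similarly, from the second (mean) constraint:
\begin{aligned}
\int_{-\infty}^{\infty} x\rho(x) \, \mathrm{d}x = \mu \iff &\sqrt{ \frac{\gamma}{\pi} } \int_{-\infty}^{\infty} x \exp\left( -\gamma \left( x- \mu + \frac{\beta}{2\gamma} \right)^{2} \right) \, \mathrm{d}x = \mu
\end{aligned}
Splitting x = \left( x - \mu + \frac{\beta}{2\gamma} \right) + \left( \mu - \frac{\beta}{2\gamma} \right) and substituting t = x - \mu + \frac{\beta}{2\gamma} in the first integral, we have:
\begin{aligned}
&\sqrt{ \frac{\gamma}{\pi} } \int_{-\infty}^{\infty} x \exp\left( -\gamma \left( x- \mu + \frac{\beta}{2\gamma} \right)^{2} \right) \, \mathrm{d}x \\
= & \sqrt{ \frac{\gamma}{\pi} }\left[ \int_{-\infty}^{\infty} \left( x - \mu + \frac{\beta}{2\gamma} \right)\exp\left( -\gamma \left( x- \mu + \frac{\beta}{2\gamma} \right)^{2} \right)\, \mathrm{d}x + \int_{-\infty}^{\infty} \left( \mu - \frac{\beta}{2\gamma} \right) \exp\left( -\gamma \left( x- \mu + \frac{\beta}{2\gamma} \right)^{2} \right) \, \mathrm{d}x \right] \\
=& \sqrt{ \frac{\gamma}{\pi} }\left[ \int_{-\infty}^{\infty} t\exp(-\gamma t^{2}) \, \mathrm{d}t + \left( \mu-\frac{\beta}{2\gamma} \right) \int_{-\infty}^{\infty} \exp\left( -\gamma \left( x- \mu + \frac{\beta}{2\gamma} \right)^{2} \right) \, \mathrm{d}x \right] \\
=& \sqrt{ \frac{\gamma}{\pi} }\left[ -\frac{1}{2\gamma} \cdot \int_{-\infty}^{\infty} \exp(-\gamma t^{2}) \, \mathrm{d}(-\gamma t^{2}) + \left( \mu - \frac{\beta}{2\gamma} \right) \sqrt{ \frac{\pi}{\gamma} }\right] \\
=& \sqrt{ \frac{\gamma}{\pi} }\left[ -\frac{1}{2\gamma} \exp(-\gamma t^{2})\bigg|_{-\infty}^{\infty} + \left( \mu - \frac{\beta}{2\gamma} \right) \sqrt{ \frac{\pi}{\gamma} }\right] \\
=& \mu - \frac{\beta}{2\gamma}
\end{aligned}
Since the mean constraint requires \mu - \frac{\beta}{2\gamma} = \mu, we get \beta = 0, and therefore: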
\rho(x) = \sqrt{ \frac{\gamma}{\pi} }\exp(-\gamma(x-\mu)^{2})
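The normalization and mean steps can be checked numerically. This minimal sketch uses midpoint-rule quadrature on a wide finite window, and the example values \gamma = 0.7 and center m = 1.3 are arbitrary assumptions. For a density \sqrt{\gamma/\pi}\exp(-\gamma(x - m)^{2}), the integral is 1 for any center m and the mean equals m. The mean constraint therefore pins the center to \mu, leaving no room for a nonzero \beta.

```python
import math


def integrate(f, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def shifted_gaussian(gamma, m):
    """Return x -> sqrt(gamma/pi) * exp(-gamma * (x - m)**2), a candidate density centered at m."""
    c3 = math.sqrt(gamma / math.pi)
    return lambda x: c3 * math.exp(-gamma * (x - m) ** 2)


gamma, m = 0.7, 1.3  # arbitrary example values (gamma must be positive)
rho = shifted_gaussian(gamma, m)
lo, hi = m - 20.0, m + 20.0  # exp(-0.7 * 20**2) is negligible, so the tails can be ignored
total = integrate(rho, lo, hi)
mean = integrate(lambda x: x * rho(x), lo, hi)
print(f"normalization = {total:.8f}, mean = {mean:.8f}, center m = {m}")
```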
From the last (variance) constraint, we obtain:
\begin{aligned}
&\int_{-\infty}^{\infty} (x - \mu)^{2}\rho(x) \, \mathrm{d}x = \nu \\
\implies & \sqrt{ \frac{\gamma}{\pi} } \int_{-\infty}^{\infty} (x - \mu)^{2}\exp(-\gamma(x - \mu)^{2}) \, \mathrm{d}x = \nu \\
\implies & - \frac{1}{2\gamma}\sqrt{ \frac{\gamma}{\pi} } \int_{-\infty}^{\infty} (x-\mu) \, \mathrm{d}(\exp(-\gamma(x - \mu)^{2})) = \nu
\end{aligned}
where we used \mathrm{d}(\exp(-\gamma(x - \mu)^{2})) = -2\gamma(x - \mu)\exp(-\gamma(x - \mu)^{2}) \, \mathrm{d}x. Integrating \int_{-\infty}^{\infty} (x-\mu) \, \mathrm{d}(\exp(-\gamma(x - \mu)^{2})) by parts:
\begin{aligned}
&\int_{-\infty}^{\infty} (x-\mu) \, \mathrm{d}(\exp(-\gamma(x - \mu)^{2})) \\= &(x - \mu)\exp(-\gamma(x - \mu)^{2})\bigg|_{-\infty}^{\infty} - \int_{-\infty}^{\infty} \exp(-\gamma(x-\mu)^{2}) \, \mathrm{d}x \\
=& - \int_{-\infty}^{\infty} \exp(-\gamma(x-\mu)^{2}) \, \mathrm{d}x \\
=& - \sqrt{ \frac{\pi}{\gamma} }
\end{aligned}
Therefore:
\begin{aligned}
&\frac{1}{2\gamma} \cdot \sqrt{ \frac{\gamma}{\pi} } \cdot \sqrt{ \frac{\pi}{\gamma} } = \nu \\
\implies & \frac{1}{2\gamma} = \nu \\
\implies &\gamma = \frac{1}{2\nu}
\end{aligned}
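The relation between \gamma and the variance can be checked numerically. In this minimal sketch, midpoint quadrature on \mu \pm 30 is an assumption and variance_for is a hypothetical helper. The variance of \sqrt{\gamma/\pi}\exp(-\gamma(x-\mu)^{2}) should equal 1/(2\gamma) for every \gamma > 0, so choosing \gamma = 1/(2\nu) reproduces the target variance \nu.

```python
import math


def integrate(f, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def variance_for(gamma, mu=0.0):
    """Variance about mu of sqrt(gamma/pi) * exp(-gamma (x - mu)^2), computed by quadrature."""
    c3 = math.sqrt(gamma / math.pi)

    def integrand(x):
        return (x - mu) ** 2 * c3 * math.exp(-gamma * (x - mu) ** 2)

    return integrate(integrand, mu - 30.0, mu + 30.0)


for gamma in (0.25, 1.0, 3.0):
    print(f"gamma = {gamma}: variance = {variance_for(gamma):.8f}, "
          f"1/(2 gamma) = {1 / (2 * gamma):.8f}")

# Inverting the relation: gamma = 1/(2 nu) should give back the target variance nu.
nu = 2.5
print(f"target nu = {nu}, variance with gamma = 1/(2 nu): {variance_for(1 / (2 * nu)):.8f}")
```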
Finally, writing \sigma = \sqrt{\nu} for the standard deviation, the density \rho(x) takes the form:
\begin{aligned}
\rho(x) &= \sqrt{ \frac{\gamma}{\pi} }\exp(-\gamma(x-\mu)^{2}) \\
&= \frac{1}{\sqrt{ 2\pi \nu }} \exp\left( -\frac{1}{2\nu}(x - \mu)^{2}\right) \\
&= \frac{1}{\sigma\sqrt{ 2\pi }} \exp\left( -\frac{1}{2\sigma ^{2}}(x - \mu)^{2} \right)
\end{aligned}
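The final density can be sanity-checked numerically. This minimal sketch uses hypothetical helper names and assumes midpoint quadrature over \mu \pm 12\sigma. It evaluates \rho(x) and compares its numerically integrated entropy with the standard closed-form Gaussian entropy \frac{1}{2}\ln(2\pi e\sigma^{2}), a textbook result that is not derived in this article.

```python
import math


def gaussian_pdf(x, mu, sigma):
    """exp(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


def numeric_entropy(mu, sigma, n=100_000, width=12.0):
    """Midpoint-rule estimate of -integral(rho * ln rho) over mu +/- width * sigma."""
    a, b = mu - width * sigma, mu + width * sigma
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        p = gaussian_pdf(a + (i + 0.5) * h, mu, sigma)
        if p > 0:
            total -= p * math.log(p)
    return total * h


mu, sigma = 1.0, 2.0  # example values; sigma = sqrt(nu)
closed_form = 0.5 * math.log(2 * math.pi * math.e * sigma ** 2)
print(f"numeric entropy = {numeric_entropy(mu, sigma):.8f}, closed form = {closed_form:.8f}")
```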
Summary
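Among all continuous distributions on \mathbb{R} with a given mean and variance, the Gaussian distribution has the largest entropy. (The derivation above only locates a stationary point; it is the global maximum because the entropy is concave in \rho and all three constraints are linear in \rho, so the Lagrangian is concave in \rho.) In the sense of the MEP, the Gaussian is therefore the "least informative", or most random, distribution under these constraints: it encodes nothing beyond the mean and the variance. When little is known about the errors other than their mean and variance, the Gaussian distribution is a natural choice.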
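For a concrete comparison, this small sketch evaluates three distributions with the same variance: Gaussian, Laplace and uniform. It uses standard closed-form entropies, which are textbook formulas not derived in this article, and the function names are hypothetical. For each variance tried, the Gaussian entropy is the largest of the three.

```python
import math


def gaussian_entropy(var):
    """Differential entropy of a Gaussian with variance var: 0.5 * ln(2 pi e var)."""
    return 0.5 * math.log(2 * math.pi * math.e * var)


def laplace_entropy(var):
    """Laplace with scale b has variance 2 b^2 and entropy 1 + ln(2 b)."""
    b = math.sqrt(var / 2)
    return 1 + math.log(2 * b)


def uniform_entropy(var):
    """Uniform on an interval of width w has variance w^2 / 12 and entropy ln(w)."""
    w = math.sqrt(12 * var)
    return math.log(w)


for var in (0.5, 1.0, 4.0):
    print(f"var={var}: gaussian={gaussian_entropy(var):.5f}  "
          f"laplace={laplace_entropy(var):.5f}  uniform={uniform_entropy(var):.5f}")
```

Each entropy equals \frac{1}{2}\ln(\mathrm{var}) plus a constant, so the ordering does not depend on the variance chosen.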
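Measurement errors are often modeled as the sum of many small, independent error sources. By the central limit theorem, such errors therefore tend to be approximately Gaussian.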
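This small Monte Carlo sketch illustrates the central-limit argument. Its assumptions are illustrative only: 12 independent error sources, each uniform on [-0.5, 0.5], and a fixed random seed. Each term has variance 1/12, so the sum has variance 1. The simulated total error should have a standard deviation near 1 and put about 68% of its mass within one standard deviation, as a standard Gaussian would.

```python
import random
import statistics


def total_error(n_sources, rng):
    """Sum of n_sources small independent errors, each uniform on [-0.5, 0.5] (illustrative model)."""
    return sum(rng.uniform(-0.5, 0.5) for _ in range(n_sources))


rng = random.Random(42)  # fixed seed for reproducibility
n_sources, n_trials = 12, 50_000
samples = [total_error(n_sources, rng) for _ in range(n_trials)]

sd = statistics.pstdev(samples)
within_one_sd = sum(abs(s) <= 1.0 for s in samples) / n_trials
print(f"std = {sd:.3f}, P(|error| <= 1) = {within_one_sd:.3f} (Gaussian: 0.683)")
```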