For a continuous random variable $X$, the differential entropy is defined as
$$\int P(x)\log\dfrac{1}{P(x)}\,\text{d}x,\tag{1}$$
where $P(x)$ is the probability density of $X$.
The function $-\log v$ is strictly convex on $(0,\infty)$. Let $\phi$ be its convex conjugate; then $-\log v = \underset{u\in\mathbb{R}}{\text{sup}}\left[uv - \phi(u)\right]$, with $\phi(u) = -1 - \log(-u)$ for $u<0$ and $\phi(u) = +\infty$ otherwise.
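The conjugate pair can be sanity-checked numerically. The sketch below verifies $-\log v = \sup_{u<0}\left[uv - \phi(u)\right]$ on a dense grid of negative $u$ values (the grid bounds and test values of $v$ are illustrative choices):

```python
import numpy as np

# Numeric check of the Fenchel identity
#   -log v = sup_{u<0} [ u*v - phi(u) ],  phi(u) = -1 - log(-u),
# maximizing over a dense log-spaced grid of negative u.
def phi(u):
    return -1.0 - np.log(-u)

u = -np.exp(np.linspace(-10, 10, 200_001))  # dense grid over u < 0
for v in [0.1, 0.5, 1.0, 2.0, 10.0]:
    sup_val = np.max(u * v - phi(u))
    assert abs(sup_val - (-np.log(v))) < 1e-4, (v, sup_val)
print("Fenchel identity verified on grid")
```

The maximizer is $u = -1/v$, which lies well inside the grid for all tested $v$.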
Using the above, we have
$$\begin{aligned} \int P(x)\log\frac{1}{P(x)}\,\text{d}x &= - \int P(x)\left(- \log\frac{1}{P(x)}\right)\,\text{d}x \\&= -\int P(x)\, \underset{u\in\mathbb{R}}{\text{sup}}\left[ u\,\frac{1}{P(x)} - \phi(u) \right]\,\text{d}x \\&= -\underset{f}{\text{sup}}\left[ \int f(x)\,\text{d}x - \int \phi(f(x))\,P(x)\,\text{d}x\right] \\&= -\underset{f}{\text{sup}}\left[ \int f(x)\,\text{d}x - \mathbb{E}_{P}[\phi(f)]\right]\end{aligned}$$
The third equality interchanges the pointwise supremum with the integral; it holds with equality because the pointwise maximizer $u = -P(x)$ defines a measurable function $f$.
where the supremum is taken over all measurable functions $f:\mathcal{X} \rightarrow \mathbb{R}$ ($\mathcal{X}$ being the sample space). For any restricted function class $\mathbf{F}$ of maps from $\mathcal{X}$ to $\mathbb{R}$, it then holds that
$$\int P(x)\log\dfrac{1}{P(x)}\,\text{d}x \le -\underset{f(x)\in\mathbf{F}}{\text{sup}}\left[ \int f(x)\,\text{d}x - \mathbb{E}_{P}[\phi(f)]\right]. $$
Approximating the expectation $\mathbb{E}_{P}[\phi(f)]$ with an empirical mean, we obtain
$$ -\underset{f\in\mathbf{F}}{\text{sup}}\left[ \int f(x)\,\text{d}x - \frac{1}{N}\sum_{i=1}^N\phi(f(x_i))\right],$$
where $x_i$ are samples from distribution $P(x)$.
Specifically, for the differential entropy case, substituting $f = -g$ with $g > 0$, so that $\phi(f(x_i)) = -1 - \log g(x_i)$, gives
$$ -\underset{g>0}{\text{sup}}\left[\frac{1}{N}\sum_{i=1}^N \log g(x_i) -\int g(x)\,\text{d}x + 1\right]. $$
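This sample-based expression can be sanity-checked. The sketch below (sample size, seed, and the mis-specified bandwidth are illustrative choices) evaluates the bracket at the true standard-normal density, where its negative should approach the Gaussian entropy $\frac{1}{2}\log(2\pi e)\approx 1.4189$, and at a mis-specified density, which must give a larger value since the expression is an upper bound:

```python
import numpy as np

# Evaluate  -[(1/N) sum_i log g(x_i) - ∫ g dx + 1]  at candidate g,
# with x_i ~ N(0,1).  Any density g integrates to 1, so ∫ g dx = 1.
rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)

def objective(log_g, integral_of_g=1.0):
    return -(np.mean(log_g(x)) - integral_of_g + 1.0)

def gauss_logpdf(s):
    # log density of N(0, s^2)
    return lambda t: -0.5 * (t / s) ** 2 - np.log(s * np.sqrt(2.0 * np.pi))

h_true = 0.5 * np.log(2.0 * np.pi * np.e)   # exact N(0,1) entropy
val_correct = objective(gauss_logpdf(1.0))  # g = true density: near h_true
val_wrong = objective(gauss_logpdf(2.0))    # mis-specified: strictly larger
print(val_correct, val_wrong)
```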
We may further restrict the function space to a Reproducing Kernel Hilbert Space (RKHS) induced by a Gaussian kernel; let $\mathcal{G}$ denote this RKHS.
In an RKHS there is a feature map $x \mapsto \psi_x$, and any function $g \in \mathcal{G}$ admits the pointwise representation $g(x) = \langle g, \psi_x\rangle$ (the reproducing property).
For the kernel $K$, one has $K(x, y) = \langle \psi_x, \psi_y\rangle$. The Gaussian kernel with bandwidth $\sigma > 0$:
$$ K(x,y) = e^{-|x-y|^2/\sigma}. $$
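As a quick illustration (the bandwidth and grid of points are arbitrary choices), the Gram matrix of this kernel is symmetric positive semi-definite, which is what makes it a valid reproducing kernel:

```python
import numpy as np

# Gaussian Gram matrix K_ij = exp(-|x_i - x_j|^2 / sigma); its
# eigenvalues should all be >= 0 up to floating-point round-off.
def gaussian_gram(x, sigma=1.0):
    return np.exp(-np.subtract.outer(x, x) ** 2 / sigma)

x = np.linspace(-3.0, 3.0, 50)
K = gaussian_gram(x)
eigs = np.linalg.eigvalsh(K)       # symmetric eigensolver
print(eigs.min())                  # tiny (possibly ~ -1e-15), never very negative
```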
To seek an optimal solution $\hat{g}$ for the differential entropy, using the variational bound above and the reproducing property of the RKHS, we need to solve
$$\hat{g}=\underset{g\in\mathcal{G}}{\arg\min} \left[\int g(x)\,\text{d}x - \frac{1}{N}\sum_{i=1}^N\log\langle g,\psi_{x_i} \rangle + \frac{\lambda_N}{2}\mathbf{I}^2(g)\right],$$
where $\frac{\lambda_N}{2}\mathbf{I}^2(g)$ is a regularization term that further restricts the search space in $\mathcal{G}$. One may choose $\mathbf{I}^2(g)=\|g\|^2_{\mathcal{G}}$.
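A minimal numerical sketch of this minimization, assuming the finite ansatz $g(x)=\sum_j \alpha_j K(x_j, x)$ over the sample points — whether the exact minimizer takes this form is not established here, since the $\int g$ term falls outside the standard representer theorem. For the 1-D Gaussian kernel, $\int K(x_j,x)\,\text{d}x = \sqrt{\pi\sigma}$, so every term is explicit; the step size, $\lambda_N$, and iteration count are illustrative:

```python
import numpy as np

# Gradient descent with backtracking on
#   J(alpha) = sqrt(pi*sigma)*sum(alpha) - (1/N) sum_i log((K@alpha)_i)
#              + (lam/2) * alpha @ K @ alpha,
# where |g|_G^2 = alpha @ K @ alpha for g = sum_j alpha_j K(x_j, .).
rng = np.random.default_rng(1)
x = rng.standard_normal(200)
sigma, lam = 1.0, 1e-3
N = len(x)

K = np.exp(-np.subtract.outer(x, x) ** 2 / sigma)
z = np.full(N, np.sqrt(np.pi * sigma))   # ∫ K(x_j, x) dx, per basis function
alpha = np.full(N, 1.0 / N)              # positive start => g > 0 at samples

def J(a):
    u = K @ a
    if np.min(u) <= 0:
        return np.inf                    # infeasible: log undefined
    return z @ a - np.mean(np.log(u)) + 0.5 * lam * (a @ K @ a)

J0 = J(alpha)
for _ in range(300):
    grad = z - (K @ (1.0 / (K @ alpha))) / N + lam * (K @ alpha)
    t = 1e-2
    while t > 1e-12 and J(alpha - t * grad) >= J(alpha):
        t *= 0.5                         # backtrack: keep J decreasing, g > 0
    if t > 1e-12:
        alpha = alpha - t * grad
entropy_estimate = -(np.mean(np.log(K @ alpha)) - z @ alpha + 1.0)
print(J0, J(alpha), entropy_estimate)
```

The backtracking line search rejects any step that makes $g$ non-positive at a sample point (where $J$ is $+\infty$), so the iterates stay feasible without an explicit projection.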
What is the new form after applying the inf-convolution theorem?
Need $ \int g(x)\,\text{d}x = \int \langle g, \psi_x\rangle\,\text{d}x $? (Pointwise this is just the reproducing property; if $x \mapsto \psi_x$ is Bochner integrable, the integral can be moved inside the inner product: $\int \langle g, \psi_x\rangle\,\text{d}x = \langle g, \int \psi_x\,\text{d}x\rangle$.)
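For the finite ansatz $g = \sum_j \alpha_j K(x_j, \cdot)$, linearity gives $\int g(x)\,\text{d}x = \sum_j \alpha_j \int K(x_j, x)\,\text{d}x = \sqrt{\pi\sigma}\sum_j \alpha_j$ in 1-D. The sketch below checks this closed form against brute-force quadrature (the centers, weights, and grid are arbitrary choices):

```python
import numpy as np

# Compare  ∫ sum_j alpha_j exp(-(x_j - t)^2/sigma) dt  computed by
# quadrature on a wide uniform grid against the Gaussian closed form.
sigma = 1.0
xj = np.array([-1.0, 0.3, 2.0])
alpha = np.array([0.2, 0.5, 0.3])

t = np.linspace(-30.0, 30.0, 600_001)
dt = t[1] - t[0]
g = (alpha[:, None] * np.exp(-(xj[:, None] - t) ** 2 / sigma)).sum(axis=0)
quad = g.sum() * dt                       # rectangle rule on a uniform grid
closed_form = np.sqrt(np.pi * sigma) * alpha.sum()
print(quad, closed_form)                  # should agree to high precision
```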