#math

Response of Loss Functions to Label Noise (draft)

August 13, 2019 · Xingyu Li · Noise and Generalization · math information theory

Distortion of the dataset distribution by noisy labels Let $t$ and $t^*$ be the correct target label and the noisy label, respectively. Here we focus on binary classification and assume the label noise depends only on the correct label, i.e. the error rates are: $$e_- = P(t^* = +1 \mid t = -1) \quad \text{and} \quad e_+ = P(t^* = -1 \mid t = +1).$$ The noisy dataset defines the joint distribution $P(x, t^*)$, which relates to the clean distribution $P(x, t)$ through
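The noise model above is easy to simulate; a minimal NumPy sketch (the helper name and rates are mine, purely illustrative) flips clean labels with the class-conditional rates $e_-$ and $e_+$:

```python
import numpy as np

def corrupt_labels(t, e_minus, e_plus, rng=None):
    """Flip binary labels t in {-1, +1} under class-conditional noise:
    P(t* = +1 | t = -1) = e_minus and P(t* = -1 | t = +1) = e_plus."""
    rng = np.random.default_rng() if rng is None else rng
    t = np.asarray(t)
    flip = np.where(t == -1,
                    rng.random(t.shape) < e_minus,   # -1 -> +1 with prob e_minus
                    rng.random(t.shape) < e_plus)    # +1 -> -1 with prob e_plus
    return np.where(flip, -t, t)

# Example: asymmetric 10% / 30% noise on a balanced sample
t = np.random.default_rng(0).choice([-1, +1], size=100_000)
t_star = corrupt_labels(t, e_minus=0.1, e_plus=0.3, rng=np.random.default_rng(1))
print((t_star[t == -1] == +1).mean())  # ~0.1
print((t_star[t == +1] == -1).mean())  # ~0.3
```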

Read More

Bayes Classifier and MI Classifier With Noisy Labels

August 8, 2019 · Xingyu Li · Noise and Generalization · review math information theory

Definition and Relation This section is a partial review of Bao-Gang Hu’s paper. Bayes Classifier without rejection and its Decision Rule The Bayes Classifier makes decisions based on an (often subjectively designed) risk function $\mathfrak{R}$: $$\mathfrak{R}(y_j \mid x) = \sum_i \lambda_{ij} P(x \mid t_i)P(t_i),$$ where $x \in \mathbb{R}^d$ is the input feature; $y_j$ and $t_i$ stand for the predicted class and the true class, respectively; and $\lambda_{ij}$ is the cost incurred when true class $t_i$ is classified as $y_j$.
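For concreteness, a minimal sketch of the resulting decision rule (the function name and numbers are mine, not from the paper): for a given $x$, evaluate the risk of each candidate decision $y_j$ and pick the minimizer.

```python
import numpy as np

def bayes_decision(likelihoods, priors, cost):
    """Minimum-risk decision for one input x.

    likelihoods: array of P(x | t_i) for each true class i
    priors:      array of P(t_i)
    cost:        cost[i, j] = lambda_ij, cost of predicting j when the true class is i

    Returns the j minimizing R(y_j | x) = sum_i lambda_ij P(x | t_i) P(t_i).
    """
    risks = cost.T @ (likelihoods * priors)   # risks[j] = sum_i cost[i, j] P(x|t_i) P(t_i)
    return int(np.argmin(risks))

# Two-class example with asymmetric costs: misclassifying class 0 is 5x worse
cost = np.array([[0.0, 5.0],
                 [1.0, 0.0]])
priors = np.array([0.3, 0.7])
likelihoods = np.array([0.2, 0.4])             # hypothetical densities at some x
print(bayes_decision(likelihoods, priors, cost))  # -> 0
```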

Read More

Neural Tangent Kernel and NN Dynamics (draft)

August 1, 2019 · Xingyu Li · Noise and Generalization · math neural network dynamics review

Discussion on Neural Tangent Kernel The Neural Tangent Kernel was introduced by Arthur Jacot et al. to study the dynamics of (S)GD-based learning models such as neural networks. It is expected to enable one to “study the training of ANNs in the functional space $\mathcal{F}$, on which the cost $C$ is convex.” In the following, we review the derivations of Arthur Jacot et al. and argue that, by definition, the Neural Tangent Kernel only captures the first-order dynamics of the neural network.
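As background for the review: the empirical NTK of a scalar network $f_\theta$ is $\Theta(x, x') = \nabla_\theta f_\theta(x) \cdot \nabla_\theta f_\theta(x')$. A minimal NumPy sketch for a toy two-layer network (architecture, widths, and names are mine, purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 3, 64                               # input dim, hidden width
W1 = rng.normal(size=(m, d)) / np.sqrt(d)
w2 = rng.normal(size=m) / np.sqrt(m)

def param_grad(x):
    """Gradient of f(x) = w2 . tanh(W1 x) w.r.t. all parameters, flattened."""
    h = np.tanh(W1 @ x)
    dW1 = np.outer(w2 * (1.0 - h**2), x)   # df/dW1
    dw2 = h                                # df/dw2
    return np.concatenate([dW1.ravel(), dw2])

def ntk(x, xp):
    """Empirical NTK: inner product of parameter gradients at two inputs."""
    return param_grad(x) @ param_grad(xp)

x, xp = rng.normal(size=d), rng.normal(size=d)
print(ntk(x, xp), ntk(xp, x))              # symmetric by construction
```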

Read More

Variational Form for H(x)

July 29, 2019 · Xingyu Li · DeepInfoFlow · information theory math variational form

For a continuous random variable $X$, the differential entropy is defined as $$H(X) = \int P(x)\log\dfrac{1}{P(x)}\,\text{d}x,\tag{1}$$ where $P(x)$ is the distribution of $X$. The function $-\log v$ is strictly convex; let $\phi$ be its conjugate dual function, so that $-\log v = \underset{u\in\mathbb{R}}{\text{sup}}\left[uv - \phi(u)\right]$, with $\phi(u) = -1 - \log(-u)$ for $u<0$ and $+\infty$ otherwise. Using the above with $v = 1/P(x)$ (the supremum over functions $f$ can be exchanged with the integral because the optimization is pointwise in $x$), we have $$\begin{aligned} \int P(x)\log\frac{1}{P(x)}\,\text{d}x &= - \int P(x)\left(- \log\frac{1}{P(x)}\right)\,\text{d}x \\&= -\int P(x) \underset{f}{\text{sup}}\left[ f(x)\frac{1}{P(x)} - \phi(f(x)) \right]\,\text{d}x \\&= -\underset{f}{\text{sup}}\left[ \int f(x)\,\text{d}x - \int \phi(f(x))P(x)\,\text{d}x\right] \\&= -\underset{f}{\text{sup}}\left[ \int f(x)\,\text{d}x - \mathbb{E}_{P}[\phi(f)]\right]\end{aligned}$$
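To sanity-check the conjugate pair used above, a tiny NumPy sketch (my own, not from the post) that maximizes $uv - \phi(u)$ over a grid of negative $u$ and compares the result against $-\log v$:

```python
import numpy as np

def phi(u):
    """Conjugate dual of -log v: phi(u) = -1 - log(-u) for u < 0."""
    return -1.0 - np.log(-u)

v = 2.5                                # any v > 0
u = -np.logspace(-3, 3, 200_001)       # grid of negative u values
values = u * v - phi(u)                # objective inside the supremum
print(values.max())                    # approx -log(2.5) ~ -0.916
print(-np.log(v))
# The maximizer is u* = -1/v, where u*v - phi(u*) = -1 + 1 + log(1/v) = -log v.
```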

Read More