
Confusion about the effectiveness of IB #1

@Opdoop

From [1], we have:

$$\mathcal{L}=\mathcal{L}_{\mathrm{CE}}+\beta \cdot D_{\mathrm{KL}}\left[P(\mathbf{T} \mid \mathbf{X}) \,\|\, Q(\mathbf{T})\right]$$

For comparison, conventional adversarial training, as formulated in [2], solves:
$$\min \left[\underset{x, y \sim p_{\mathcal{D}}}{\mathbb{E}}\left[\max _{\hat{x} \in \mathbb{B}(x)} \mathcal{L}(x, \hat{x}, y)\right]\right]$$

where the objective above can be instantiated in a semi-supervised fashion as:
$$-\log q\left(y \mid F_{s}(x)\right)+\beta \, \mathrm{KL}\left(q\left(\cdot \mid F_{s}(x)\right) \,\|\, q\left(\cdot \mid F_{s}(\hat{x})\right)\right)$$

Both the IB principle and adversarial training (AT) add a KL regularization term that smooths the model's loss landscape. As far as I can tell, the only difference is the distribution used inside the KL divergence: variational IB compares against a Gaussian $Q(\mathbf{T})$, while AT compares against the prediction on the adversarial example $\hat{x}$.
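To make the comparison concrete, here is a minimal sketch of the two regularizers side by side. It is only an illustration under my own assumptions, not code from either paper: I assume $P(\mathbf{T} \mid \mathbf{X})$ is a diagonal Gaussian and $Q(\mathbf{T})$ is a standard normal (the usual variational IB setup), and all function names and numeric values below are hypothetical.

```python
import math

def kl_diag_gaussian_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) in closed form.

    Assumption: this is the form the IB term takes when Q(T) is a
    standard normal; the paper may use a different Q(T).
    """
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_categorical(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)

# IB-style term: encoder output for one input (hypothetical values).
mu = [0.5, -0.3]
log_var = [0.0, 0.0]
ib_term = kl_diag_gaussian_to_standard_normal(mu, log_var)

# AT-style term: predictions on a clean input x and on its adversarial
# counterpart x_hat (hypothetical logits).
clean_logits = [2.0, 0.5, -1.0]
adv_logits = [1.0, 1.2, -0.5]
at_term = kl_categorical(softmax(clean_logits), softmax(adv_logits))

print(f"IB regularizer: {ib_term:.4f}")
print(f"AT regularizer: {at_term:.4f}")
```

The sketch shows that the IB term depends only on the representation of a single input, whereas the AT term needs a second forward pass on $\hat{x}$; both are then scaled by $\beta$ and added to the classification loss.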

Adversarial examples can be viewed as a special case of out-of-distribution (OOD) data. From this view, AT should give a tighter bound than IB for OOD optimization. But in your experimental results, IB surpasses all previous AT-like methods. How can a looser bound do better than a tighter one? This really confuses me. Is there something I have misunderstood?

[1] Improving the Adversarial Robustness of NLP Models by Information Bottleneck
[2] How Should Pre-Trained Language Models Be Fine-Tuned Towards Adversarial Robustness?
