[Title] Common terminology for adversarial attacks
This post is a translation of Section 2 of the paper "Threat of Adversarial Attacks on Deep Learning in Computer Vision: A Survey", which covers the common terminology used for adversarial attacks.
1. Common terminology for adversarial attacks
In this section, we describe the common technical terms used in the literature related to adversarial attacks on deep learning in Computer Vision.
1.1 Adversarial example/image
An adversarial example/image is a modified version of a clean image that is intentionally perturbed (e.g. by adding noise) to confuse or fool a machine learning technique, such as a deep neural network.
1.2 Adversarial perturbation
Adversarial perturbation is the noise that is added to the clean image to make it an adversarial example.
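As a minimal sketch of this definition, the snippet below adds a perturbation to a clean image and clips the result back to a valid pixel range. The flat-list image representation, the [0, 1] pixel range, and the helper names `apply_perturbation` and `linf_norm` are assumptions made for illustration; they do not come from the survey.

```python
def apply_perturbation(clean, perturbation, lo=0.0, hi=1.0):
    """Add a perturbation to a clean image and clip each pixel to [lo, hi].

    Both arguments are flat lists of pixel values (an illustrative assumption).
    """
    if len(clean) != len(perturbation):
        raise ValueError("image and perturbation must have the same length")
    return [min(hi, max(lo, x + d)) for x, d in zip(clean, perturbation)]


def linf_norm(perturbation):
    """Largest absolute per-pixel change, one common way to measure perturbation size."""
    return max(abs(d) for d in perturbation)


clean = [0.0, 0.5, 1.0, 0.25]
delta = [0.25, -0.25, 0.25, 0.0]

adversarial = apply_perturbation(clean, delta)
print(adversarial)        # the third pixel is clipped at 1.0
print(linf_norm(delta))   # size of the perturbation
```

Clipping matters in practice: without it, the perturbed image may leave the range that the model (or an image file format) accepts.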
1.3 Adversarial training
Adversarial training uses adversarial images in addition to the clean images to train machine learning models.
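One simple way to picture this is as data augmentation: each clean image is paired with an adversarial copy that keeps the original label. The sketch below shows only that mixing step. The function name `build_adversarial_training_set`, the callable `make_adversarial`, and the toy `brighten` "attack" are hypothetical placeholders, not part of the survey or of any library.

```python
def build_adversarial_training_set(clean_images, labels, make_adversarial):
    """Return (image, label) pairs: every clean image plus its adversarial copy.

    `make_adversarial` is a hypothetical callable (image, label) -> image.
    The adversarial copy keeps the label of the clean image it came from.
    """
    if len(clean_images) != len(labels):
        raise ValueError("need exactly one label per image")
    clean_pairs = list(zip(clean_images, labels))
    adversarial_pairs = [(make_adversarial(img, lab), lab) for img, lab in clean_pairs]
    return clean_pairs + adversarial_pairs


def brighten(image, label):
    # Toy stand-in for a real attack: shift every pixel up slightly.
    return [min(1.0, x + 0.125) for x in image]


clean_images = [[0.25, 0.5], [0.75, 1.0]]
labels = ["dark", "bright"]

training_set = build_adversarial_training_set(clean_images, labels, brighten)
for image, label in training_set:
    print(label, image)
```

In a real setup the adversarial copies would typically be regenerated against the current model during training rather than computed once; that detail is outside the scope of this sketch.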
1.4 Adversary
Adversary more commonly refers to the agent who creates an adversarial example. However, in some cases the example itself is also called an adversary.
1.5 Black-box attacks & ‘semi-black-box’ attacks
Black-box attacks feed a targeted model, during testing, with adversarial examples that are generated without knowledge of that model. In some instances, the adversary is assumed to have limited knowledge of the model (e.g. its training procedure and/or its architecture) but definitely does not know the model parameters. In other instances, using any information about the target model is referred to as a ‘semi-black-box’ attack. We use the former convention in this article.
1.6 White-box attacks
White-box attacks assume complete knowledge of the targeted model, including its parameter values, architecture, training method, and in some cases its training data as well.
1.7 Detector
A detector is a mechanism to (only) detect whether an image is an adversarial example.
1.8 Fooling ratio/rate
Fooling ratio/rate indicates the percentage of images on which a trained model changes its prediction label after the images are perturbed.
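The definition translates directly into a small computation. The sketch below assumes that the model's predictions on the clean and on the perturbed images are already available as two equal-length lists of labels; the function name `fooling_rate` and the example labels are illustrative.

```python
def fooling_rate(clean_predictions, perturbed_predictions):
    """Percentage of images whose predicted label changes after perturbation."""
    if len(clean_predictions) != len(perturbed_predictions):
        raise ValueError("need one perturbed prediction per clean prediction")
    if not clean_predictions:
        raise ValueError("need at least one image")
    changed = sum(
        1 for c, p in zip(clean_predictions, perturbed_predictions) if c != p
    )
    return 100.0 * changed / len(clean_predictions)


clean_labels = ["cat", "dog", "cat", "bird", "dog"]
perturbed_labels = ["cat", "cat", "dog", "bird", "dog"]

rate = fooling_rate(clean_labels, perturbed_labels)
print(rate)  # 2 of the 5 predictions changed
```

Note that the comparison is against the model's own prediction on the clean image, not against the ground-truth label, which is what the definition above asks for.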
1.9 One-shot/one-step methods & iterative methods
One-shot/one-step methods generate an adversarial perturbation by performing a single-step computation, e.g. computing the gradient of the model loss once. The opposite are iterative methods, which perform the same computation multiple times to obtain a single perturbation. The latter are often computationally expensive.
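To make the contrast concrete, the sketch below builds both kinds of perturbation for a tiny logistic-regression model. The model, its weights, the sign-of-gradient update, and the budget parameters `eps` and `step` are all assumptions chosen for illustration; this section of the survey does not prescribe a particular method.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss_gradient(x, y, w, b):
    """Gradient of the binary cross-entropy loss with respect to the input x."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return [(p - y) * wi for wi in w]


def sign(v):
    return (v > 0) - (v < 0)


def one_step_perturbation(x, y, w, b, eps):
    """One gradient computation, one step of size eps per pixel."""
    grad = loss_gradient(x, y, w, b)
    return [eps * sign(g) for g in grad]


def iterative_perturbation(x, y, w, b, eps, step, n_steps):
    """Repeat smaller steps, keeping the total change within [-eps, eps]."""
    delta = [0.0] * len(x)
    for _ in range(n_steps):  # one gradient computation per iteration
        x_adv = [xi + di for xi, di in zip(x, delta)]
        grad = loss_gradient(x_adv, y, w, b)
        delta = [
            max(-eps, min(eps, di + step * sign(g))) for di, g in zip(delta, grad)
        ]
    return delta


# Toy model and input (hypothetical values).
w, b = [1.0, -2.0], 0.0
x, y = [0.5, 0.25], 1

one_step = one_step_perturbation(x, y, w, b, eps=0.1)
iterative = iterative_perturbation(x, y, w, b, eps=0.1, step=0.05, n_steps=4)
print(one_step, iterative)
```

The iterative version calls `loss_gradient` `n_steps` times instead of once, which is where the extra cost comes from. For this linear toy model the gradient sign never changes, so both methods end at the same perturbation; on nonlinear models the two results generally differ.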
1.10 Quasi-imperceptible perturbations
Quasi-imperceptible perturbations impair images very slightly for human perception.
1.11 Rectifier
A rectifier modifies an adversarial example to restore the prediction of the targeted model to its prediction on the clean version of the same example.
1.12 Targeted attacks & non-targeted attacks
Targeted attacks fool a model into falsely predicting a specific label for the adversarial image. They are the opposite of non-targeted attacks, in which the predicted label of the adversarial image is irrelevant as long as it is not the correct label.
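The difference shows up in how success is judged. In the sketch below, the function names and the example labels are illustrative assumptions; the two conditions follow the definitions above.

```python
def non_targeted_success(predicted_label, true_label):
    """A non-targeted attack succeeds when the prediction is anything but the correct label."""
    return predicted_label != true_label


def targeted_success(predicted_label, target_label):
    """A targeted attack succeeds only when the prediction is the attacker's chosen label."""
    return predicted_label == target_label


true_label = "cat"
target_label = "dog"      # the label the attacker wants the model to output
predicted_label = "bird"  # what the model outputs on the adversarial image

print(non_targeted_success(predicted_label, true_label))   # True: "bird" is not "cat"
print(targeted_success(predicted_label, target_label))     # False: "bird" is not "dog"
```

When the target label differs from the correct label, any successful targeted attack also counts as a successful non-targeted one, but not the other way round.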
1.13 Threat model
Threat model refers to the types of potential attacks considered by an approach, e.g. black-box attack.
1.14 Transferability
Transferability refers to the ability of an adversarial example to remain effective even for models other than the one used to generate it.
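One way to quantify this, offered here only as an illustrative assumption since the section defines no metric, is the fraction of adversarial examples that fool a second model among those that already fool the model they were generated on. The name `transfer_rate` and the boolean inputs are hypothetical.

```python
def transfer_rate(source_fooled, target_fooled):
    """Among examples that fool the source model, the fraction that also fool the target.

    Each argument is a list of booleans, one entry per adversarial example.
    """
    if len(source_fooled) != len(target_fooled):
        raise ValueError("need one entry per example in both lists")
    fooled_on_source = [t for s, t in zip(source_fooled, target_fooled) if s]
    if not fooled_on_source:
        raise ValueError("no example fools the source model")
    return sum(fooled_on_source) / len(fooled_on_source)


# Whether each adversarial example fools the model it was generated on,
# and whether it fools a different model (hypothetical outcomes).
source_fooled = [True, True, False, True, True]
target_fooled = [True, False, True, False, True]

print(transfer_rate(source_fooled, target_fooled))  # 2 of the 4 source-fooling examples transfer
```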
1.15 Universal perturbation & universality
A universal perturbation is able to fool a given model on ‘any’ image with high probability. Note that universality refers to the property of a perturbation being ‘image-agnostic’, as opposed to having good transferability.
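The key point is that one fixed perturbation is reused across many images of the same model. The sketch below illustrates this with a toy classifier that labels an image by its mean brightness; the classifier, the images, and the perturbation values are all hypothetical and far cruder than a real attack.

```python
def classify(image):
    """Toy model: label an image by its mean pixel value."""
    mean = sum(image) / len(image)
    return "bright" if mean >= 0.5 else "dark"


def perturb(image, delta):
    """Add the perturbation and clip each pixel to [0, 1]."""
    return [min(1.0, max(0.0, x + d)) for x, d in zip(image, delta)]


images = [
    [0.125, 0.25, 0.25],
    [0.25, 0.375, 0.25],
    [0.375, 0.5, 0.375],
    [0.875, 0.75, 0.75],
]

# A single, image-agnostic perturbation applied unchanged to every image.
universal_delta = [0.375, 0.375, 0.375]

changed = sum(
    classify(img) != classify(perturb(img, universal_delta)) for img in images
)
universal_fooling_rate = 100.0 * changed / len(images)
print(universal_fooling_rate)  # three of the four predictions change
```

The perturbation does not fool the model on every image, which matches the "with high probability" wording above. Whether the same perturbation would also fool a different model is a separate question, namely transferability.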