attacks
- trojanvision.attacks.add_argument(parser, attack_name=None, attack=None, class_dict=class_dict)[source]
- Add attack arguments to an argument parser. For implementation of specific arguments, see trojanzoo.attacks.Attack.add_argument().
- Parameters:
parser (argparse.ArgumentParser) – The parser to add arguments to.
attack_name (str) – The attack name.
attack (str | Attack) – The attack instance or attack name (as the alias of attack_name).
class_dict (dict[str, type[Attack]]) – Map from attack name to attack class. Defaults to trojanvision.attacks.class_dict.
- Returns:
argparse._ArgumentGroup – The argument group.
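A minimal usage sketch, assuming 'badnet' is a key in trojanvision.attacks.class_dict (it is the canonical attack referenced below):

    import argparse
    import trojanvision.attacks

    # Register BadNet's command-line arguments on a fresh parser.
    parser = argparse.ArgumentParser()
    group = trojanvision.attacks.add_argument(parser, attack_name='badnet')
    args = parser.parse_args()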
- trojanvision.attacks.create(attack_name=None, attack=None, dataset_name=None, dataset=None, model_name=None, model=None, config=config, class_dict=class_dict, **kwargs)[source]
- Create an attack instance. For arguments not included in kwargs, use the default values in config. The default value of folder_path is '{attack_dir}/{dataset.data_type}/{dataset.name}/{model.name}/{attack.name}'. For attack implementation, see trojanzoo.attacks.Attack.
- Parameters:
attack_name (str) – The attack name.
attack (str | Attack) – The attack instance or attack name (as the alias of attack_name).
dataset_name (str) – The dataset name.
dataset (str | ImageSet) – Dataset instance or dataset name (as the alias of dataset_name).
model_name (str) – The model name.
model (str | ImageModel) – The model instance or model name (as the alias of model_name).
config (Config) – The default parameter config.
class_dict (dict[str, type[Attack]]) – Map from attack name to attack class. Defaults to trojanvision.attacks.class_dict.
**kwargs – Keyword arguments passed to attack init method.
- Returns:
Attack – The attack instance.
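A minimal end-to-end sketch following the library's factory pattern; the dataset and model names here are illustrative choices:

    import trojanvision

    env = trojanvision.environ.create()
    dataset = trojanvision.datasets.create(dataset_name='cifar10')
    model = trojanvision.models.create(dataset=dataset, model_name='resnet18_comp')
    mark = trojanvision.marks.create(dataset=dataset)
    attack = trojanvision.attacks.create(dataset=dataset, model=model, mark=mark,
                                         attack_name='badnet')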
- class trojanvision.attacks.BackdoorAttack(mark=None, source_class=None, target_class=0, poison_percent=0.01, train_mode='batch', **kwargs)[source]
Backdoor attack abstract class. It inherits trojanzoo.attacks.Attack.
Note
This class is actually equivalent to trojanvision.attacks.BadNet.
BackdoorAttack attaches a provided watermark to some training images, assigns them the target label, and injects them into the training set. After retraining, the model will classify watermarked images from certain (or all) classes into the target class.
- Parameters:
mark (trojanvision.marks.Watermark) – The watermark instance.
target_class (int) – The target class that images with watermark will be misclassified as. Defaults to 0.
poison_percent (float) – Percentage of poisoned inputs in the whole training set. Defaults to 0.01.
train_mode (str) – Training mode to inject the backdoor. Choose from ['batch', 'dataset', 'loss']. Defaults to 'batch'.
'batch': For each clean batch, randomly pick poison_num inputs, attach the trigger to them, modify their labels, and append them to the original batch.
'dataset': Create a poisoned dataset and use the mixed dataset.
'loss': For each clean batch, calculate the loss on clean data and the loss on poisoned data (the whole batch), and mix the two losses using poison_percent as the weight.
- Variables:
poison_ratio (float) – The ratio of poison data to clean data: poison_percent / (1 - poison_percent).
poison_num (float | int) – The number of poison data in each batch / dataset:
train_mode == 'batch'  : poison_ratio * batch_size
train_mode == 'dataset': int(poison_ratio * len(train_set))
train_mode == 'loss'   : N/A
poison_set (torch.utils.data.Dataset) – Poison dataset (no clean data) if train_mode == 'dataset'.
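As a worked example of these quantities (poison_percent uses its default; batch and dataset sizes are hypothetical):

    poison_percent = 0.01
    poison_ratio = poison_percent / (1 - poison_percent)   # ~0.0101
    batch_size = 100                                        # hypothetical
    poison_num = poison_ratio * batch_size                  # ~1.01 poisoned inputs appended per batch
    train_set_len = 50000                                   # hypothetical, e.g. CIFAR-10
    dataset_poison_num = int(poison_ratio * train_set_len)  # 505 poisoned samples in 'dataset' mode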
- add_mark(x, **kwargs)[source]
Add watermark to input tensor. Defaults to trojanvision.marks.Watermark.add_mark().
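A usage sketch, assuming attack is a created BackdoorAttack instance and (x, y) is a clean batch of images and labels:

    import torch

    poison_x = attack.add_mark(x)                         # x: (N, C, H, W) image batch
    poison_y = attack.target_class * torch.ones_like(y)   # labels switched to the target class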
- get_data(data, org=False, keep_org=True, poison_label=True, **kwargs)[source]
Get data.
- Parameters:
data (tuple[torch.Tensor, torch.Tensor]) – Tuple of input and label tensors.
org (bool) – Whether to return original clean data directly. Defaults to False.
keep_org (bool) – Whether to keep original clean data in final results. If False, the results are all infected. Defaults to True.
poison_label (bool) – Whether to use target class label for poison data. Defaults to True.
**kwargs – Any keyword argument (unused).
- Returns:
(torch.Tensor, torch.Tensor) – Result tuple of input and label tensors.
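For example, a fully infected batch (keep_org=False) is what one would use to estimate attack success rate. A sketch, assuming attack, model, and a clean valid_loader already exist:

    correct = total = 0
    for data in valid_loader:
        # keep_org=False: every input is stamped; poison_label=True: labels become target_class.
        x, y = attack.get_data(data, keep_org=False, poison_label=True)
        pred = model(x).argmax(dim=1)
        correct += (pred == y).sum().item()
        total += y.numel()
    attack_success_rate = correct / total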
- get_filename(mark_alpha=None, target_class=None, **kwargs)[source]
Get filenames for current attack settings.
- get_neuron_jaccard(k=None, ratio=0.5)[source]
Get the Jaccard index of neuron activations on feature maps between clean inputs and poison inputs. Find the average top-k neuron indices of the two kinds of feature maps, clean_idx and poison_idx, and return len(clean_idx & poison_idx) / len(clean_idx | poison_idx).
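A sketch of that computation, assuming clean_acts and poison_acts are per-neuron activations already averaged over the respective inputs:

    import torch

    def neuron_jaccard(clean_acts: torch.Tensor, poison_acts: torch.Tensor, k: int) -> float:
        clean_idx = set(clean_acts.topk(k).indices.tolist())
        poison_idx = set(poison_acts.topk(k).indices.tolist())
        # Jaccard index: |intersection| / |union| of the two top-k neuron index sets.
        return len(clean_idx & poison_idx) / len(clean_idx | poison_idx)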
- get_poison_dataset(poison_label=True, poison_num=None, seed=None)[source]
Get poison dataset (no clean data).
- Parameters:
poison_label (bool) – Whether to use target class label for poison data. Defaults to True.
poison_num (int) – Number of poison data. Defaults to int(poison_ratio * len(train_set)).
seed (int) – Random seed to sample poison input indices.
- Returns:
torch.utils.data.Dataset – Poison dataset (no clean data).
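Usage sketch; the explicit poison_num and seed are hypothetical choices, and clean_train_set is an assumed clean training dataset:

    from torch.utils.data import ConcatDataset

    poison_set = attack.get_poison_dataset(poison_label=True, poison_num=500, seed=0)
    mixed_set = ConcatDataset([clean_train_set, poison_set])  # mix clean and poison data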
- class trojanvision.attacks.CleanLabelBackdoor(*args, train_mode='dataset', **kwargs)[source]
Abstract class for clean-label backdoor attacks. It inherits trojanvision.attacks.BackdoorAttack.
Under the clean-label setting, only clean inputs from the target class are infected, and the distortion is kept negligible so that it is hard for humans to detect.