defenses
- trojanvision.defenses.add_argument(parser, defense_name=None, defense=None, class_dict=class_dict)
- trojanvision.defenses.create(defense_name=None, defense=None, dataset_name=None, dataset=None, config=config, class_dict=class_dict, **kwargs)
- class trojanvision.defenses.BackdoorDefense(attack, original=False, **kwargs)
Backdoor defense abstract class. It inherits trojanzoo.defenses.Defense.
- Parameters:
original (bool) – Whether to load the original clean model. If False, load the attack-poisoned model by calling self.attack.load().
- Variables:
real_mark (torch.Tensor) – Watermark that the attacker uses, with shape (C+1, H, W).
real_mask (torch.Tensor) – Mask of the watermark, obtained by calling trojanvision.marks.Watermark.get_mask().
- class trojanvision.defenses.InputFiltering(defense_input_num=100, **kwargs)
Backdoor defense abstract class of input filtering. It inherits trojanvision.defenses.BackdoorDefense.
It detects whether a test input is poisoned.
The defense tests defense_input_num clean test inputs and their corresponding poisoned versions (2 * defense_input_num in total).
- Parameters:
defense_input_num (int) – Number of test inputs. Defaults to 100.
- Variables:
test_set (torch.utils.data.Dataset) – Test dataset with length defense_input_num.
- get_pred_labels()
Get predicted labels for test inputs (must be overridden by subclasses).
- Returns:
torch.Tensor – torch.BoolTensor with shape (2 * defense_input_num).
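As a rough illustration of how the returned labels could be scored, the sketch below counts detection outcomes for 2 * defense_input_num boolean predictions. It assumes, without confirmation from the description above, that the first defense_input_num entries belong to the clean inputs and the remaining entries to their poisoned versions, and that True means "predicted poisoned". Plain Python lists stand in for torch.BoolTensor so the sketch runs without PyTorch. score_detection is a hypothetical helper, not part of trojanvision.

```python
def score_detection(pred_labels, defense_input_num):
    """Count detection outcomes for 2 * defense_input_num predictions.

    Assumption: entries [0, N) are clean inputs, entries [N, 2N) are
    poisoned inputs, and True means "predicted poisoned".
    """
    if len(pred_labels) != 2 * defense_input_num:
        raise ValueError("expected 2 * defense_input_num predictions")
    clean_preds = pred_labels[:defense_input_num]
    poison_preds = pred_labels[defense_input_num:]
    true_pos = sum(bool(p) for p in poison_preds)   # poisoned and flagged
    false_pos = sum(bool(p) for p in clean_preds)   # clean but flagged
    return {
        "true_pos": true_pos,
        "false_neg": defense_input_num - true_pos,
        "false_pos": false_pos,
        "true_neg": defense_input_num - false_pos,
    }


# Hypothetical predictions for defense_input_num = 3.
stats = score_detection([False, True, False, True, True, False], 3)
```

From these counts, precision and recall follow in the usual way. The ordering assumption should be checked against the concrete defense before relying on the numbers.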
- class trojanvision.defenses.TrainingFiltering(defense_input_num=None, **kwargs)
Backdoor defense abstract class of training data filtering. It inherits trojanvision.defenses.BackdoorDefense.
Given defense_input_num training samples, it detects which of them are poisoned.
The defense evaluates clean and poison training inputs.
If defense_input_num is None, use the full training data.
Else, sample defense_input_num * poison_percent poison training data and defense_input_num * (1 - poison_percent) clean training data.
If the dataset is not using train_mode == 'dataset', construct the poison dataset using all clean data with the watermark attached. (If defense_input_num is None as well, the defense will evaluate the whole clean training set and its poisoned version.)
- Parameters:
defense_input_num (int) – Number of training inputs to evaluate. Defaults to None (the whole training set).
- Variables:
clean_set (torch.utils.data.Dataset) – Clean training data to evaluate.
poison_set (torch.utils.data.Dataset) – Poison training data to evaluate.
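The sampling rule described for TrainingFiltering can be worked through numerically. The sketch below is a minimal illustration. split_counts is a hypothetical helper, and the rounding behaviour is an assumption, since the description only gives the products defense_input_num * poison_percent and defense_input_num * (1 - poison_percent).

```python
def split_counts(defense_input_num, poison_percent):
    """Return (num_poison, num_clean) for a sampled evaluation set.

    Assumption: the poison count is rounded to the nearest integer and the
    clean count takes the remainder, so the two always sum to
    defense_input_num. The library's own rounding may differ.
    """
    if not 0.0 <= poison_percent <= 1.0:
        raise ValueError("poison_percent must lie in [0, 1]")
    num_poison = round(defense_input_num * poison_percent)
    num_clean = defense_input_num - num_poison
    return num_poison, num_clean


# 1000 evaluated samples at a 1% poison rate.
num_poison, num_clean = split_counts(1000, 0.01)
```

With these inputs the evaluated set would hold 10 poison samples and 990 clean samples.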
- get_datasets()
Get clean and poison datasets.
- Returns:
(torch.utils.data.Dataset, torch.utils.data.Dataset) – Clean training dataset and poison training dataset.
- class trojanvision.defenses.ModelInspection(defense_remask_epoch=10, defense_remask_lr=0.1, cost=1e-3, **kwargs)
Backdoor defense abstract class of model inspection. It inherits trojanvision.defenses.BackdoorDefense.
Given a model, it tries to search for a trigger. If a trigger exists, the model is considered poisoned.
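- Parameters:
defense_remask_epoch (int) – Number of remask epochs. Defaults to 10.
defense_remask_lr (float) – Learning rate for remask optimization. Defaults to 0.1.
cost (float) – Cost of mask norm loss. Defaults to 1e-3.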
- Variables:
cost (float) – Cost of mask norm loss.
clean_set (torch.utils.data.Dataset) – Clean training data to evaluate.
poison_set (torch.utils.data.Dataset) – Poison training data to evaluate.
- check_early_stop(*args, **kwargs)
Check whether to stop early at the end of each remask epoch.
- Returns:
bool – Whether to stop early. Defaults to False.
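To show how such a hook might be overridden, here is a minimal standalone sketch of a patience-based stopping rule. PatienceStopper and run_remask are hypothetical stand-ins rather than trojanvision classes, and the loss keyword is an assumption, since the arguments actually passed to check_early_stop() are not documented here.

```python
class PatienceStopper:
    """Hypothetical stand-in for a defense that overrides check_early_stop()."""

    def __init__(self, patience=2):
        self.patience = patience
        self.best_loss = float("inf")
        self.stale_epochs = 0

    def check_early_stop(self, loss, **kwargs):
        # Reset the counter on improvement, otherwise count a stale epoch.
        if loss < self.best_loss:
            self.best_loss = loss
            self.stale_epochs = 0
        else:
            self.stale_epochs += 1
        return self.stale_epochs >= self.patience


def run_remask(stopper, epoch_losses):
    """Walk through per-epoch losses and stop once the hook returns True."""
    epochs_run = 0
    for loss in epoch_losses:
        epochs_run += 1
        # The hook is consulted at the end of each (simulated) remask epoch.
        if stopper.check_early_stop(loss=loss):
            break
    return epochs_run


epochs_run = run_remask(PatienceStopper(patience=2), [1.0, 0.8, 0.9, 0.85, 0.7])
```

Here the loop stops after the fourth epoch, because the loss failed to improve on 0.8 for two consecutive epochs. A hook that always returns False, as the documented default does, would run all epochs.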
- get_mark_loss_list(verbose=True, **kwargs)
Get the lists of mark, loss, and ASR (attack success rate) of the recovered trigger for each class.
- Parameters:
verbose (bool) – Whether to output the Jaccard index for each trigger. It is also passed to optimize_mark().
**kwargs – Keyword arguments passed to optimize_mark().
- Returns:
(torch.Tensor, list[float], list[float]) – Lists of mark, loss, and ASR, each with length num_classes.
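The Jaccard index mentioned for verbose presumably compares a recovered mask with the real one. The sketch below shows one way such a score can be computed. It assumes masks are H x W nested lists with values in [0, 1] and that a pixel is "on" at or above a 0.5 threshold. Both the helper jaccard_index and the threshold are illustrative choices, not the library's implementation.

```python
def jaccard_index(mask_a, mask_b, threshold=0.5):
    """Jaccard index (intersection over union) of two thresholded masks.

    Assumption: masks are H x W nested lists with values in [0, 1], and a
    pixel counts as "on" when its value is at least `threshold`.
    """
    intersection = 0
    union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            on_a, on_b = a >= threshold, b >= threshold
            intersection += int(on_a and on_b)
            union += int(on_a or on_b)
    # Two empty masks are treated as identical.
    return intersection / union if union else 1.0


real_mask = [[1.0, 1.0, 0.0],
             [1.0, 1.0, 0.0],
             [0.0, 0.0, 0.0]]
recovered_mask = [[0.9, 0.8, 0.7],
                  [0.6, 0.1, 0.0],
                  [0.0, 0.0, 0.0]]
score = jaccard_index(real_mask, recovered_mask)
```

In this example three pixels are on in both masks and five are on in at least one, giving a score of 3 / 5.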
- load(path=None)
Load the recovered mark from path.
- Parameters:
path (str) – npz path of the recovered mark. Defaults to '{folder_path}/{self.get_filename()}.npz'.
- loss(_input, _label, target, trigger_output=None, **kwargs)
Loss function to optimize the recovered trigger.
- Parameters:
_input (torch.Tensor) – Clean input tensor with shape (N, C, H, W).
_label (torch.Tensor) – Clean label tensor with shape (N).
target (int) – Target class.
trigger_output (torch.Tensor) – Output tensor of the input tensor with the trigger. Defaults to None.
- Returns:
torch.Tensor – Scalar loss tensor.
- optimize_mark(label, loader=None, logger_header='', verbose=True, **kwargs)
- Parameters:
label (int) – The class label to optimize.
loader (collections.abc.Iterable) – Data loader used to optimize the trigger. Defaults to self.dataset.loader['train'].
logger_header (str) – Header string of the logger. Defaults to ''.
verbose (bool) – Whether to use the logger for output. Defaults to True.
**kwargs – Keyword arguments passed to loss().
- Returns:
(torch.Tensor, torch.Tensor) – Optimized mark tensor with shape (C + 1, H, W) and loss tensor.
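The role of cost as the "cost of mask norm loss" can be illustrated with a small numeric sketch. It assumes that the extra (last) channel of a (C + 1, H, W) mark holds the mask, that the mask norm is an L1 norm, and that the penalty is added to a classification loss. None of these details is confirmed by this page, and mask_l1_norm and remask_objective are hypothetical helpers that use nested lists in place of tensors.

```python
def mask_l1_norm(mark):
    """Sum of absolute values of the last channel of a (C + 1, H, W) mark.

    Assumption: the extra (last) channel holds the mask.
    """
    mask = mark[-1]
    return sum(abs(v) for row in mask for v in row)


def remask_objective(classification_loss, mark, cost=1e-3):
    """Classification loss plus a cost-weighted mask norm penalty (assumed form)."""
    return classification_loss + cost * mask_l1_norm(mark)


# A 1-channel 2x2 pattern plus its mask channel: shape (1 + 1, 2, 2).
mark = [
    [[0.5, 0.5], [0.5, 0.5]],   # pattern channel
    [[1.0, 0.0], [0.5, 0.0]],   # mask channel (assumed to be last)
]
objective = remask_objective(classification_loss=0.25, mark=mark, cost=0.1)
```

A larger cost pushes the optimization toward smaller masks, which is why it acts as a trade-off between attack success on the target label and trigger size under these assumptions.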