normal
- class trojanvision.attacks.BadNet(mark=None, source_class=None, target_class=0, poison_percent=0.01, train_mode='batch', **kwargs)
BadNet proposed by Tianyu Gu from New York University in 2017.
It inherits trojanvision.attacks.BackdoorAttack and is actually equivalent to it.
- class trojanvision.attacks.TrojanNN(preprocess_layer='flatten', preprocess_next_layer='classifier.fc', target_value=100.0, neuron_num=2, neuron_lr=0.1, neuron_epoch=1000, **kwargs)
TrojanNN proposed by Yingqi Liu from Purdue University in NDSS 2018.
It inherits trojanvision.attacks.BackdoorAttack.
Based on trojanvision.attacks.BadNet, TrojanNN preprocesses watermark pixel values to maximize activations of well-connected neurons in self.preprocess_layer.
- Parameters:
  - preprocess_layer (str) – The chosen layer to maximize neuron activation. Defaults to 'flatten'.
  - preprocess_next_layer (str) – The next layer after preprocess_layer, used to find neuron indices. Defaults to 'classifier.fc'.
  - target_value (float) – TrojanNN neuron activation target value. Defaults to 100.0.
  - neuron_num (int) – Number of TrojanNN neurons whose activation is maximized. Defaults to 2.
  - neuron_epoch (int) – Number of TrojanNN neuron optimization epochs. Defaults to 1000.
  - neuron_lr (float) – TrojanNN neuron optimization learning rate. Defaults to 0.1.
- static denoise(img, weight=1.0, max_num_iter=100, eps=1e-3)
Denoise image by calling skimage.restoration.denoise_tv_bregman.
Warning: This method is currently unused in preprocess_mark() because no performance difference is observed.
- Parameters:
  - img (torch.Tensor) – Noisy image tensor with shape (C, H, W).
- Returns:
  torch.Tensor – Denoised image tensor with shape (C, H, W).
- get_neuron_idx()
Get the top self.neuron_num well-connected neurons in self.preprocess_layer.
It is calculated w.r.t. in_channels of self.preprocess_next_layer weights.
- Returns:
  torch.Tensor – Neuron index list tensor with shape (self.neuron_num).
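
As a rough illustration of the selection described above, the sketch below ranks neurons by the summed absolute weight of their outgoing connections and keeps the strongest ones. It is plain Python, not the library's code: the scoring rule (sum of absolute weights over the output dimension), the nested-list weight layout, and the helper name top_connected_neurons are all assumptions made for this example.

```python
def top_connected_neurons(weight: list[list[float]], neuron_num: int) -> list[int]:
    """Return indices of the `neuron_num` input neurons with the largest
    summed absolute weight.

    `weight` is laid out as [out_features][in_features], mirroring how a
    fully-connected layer usually stores its weights (an assumption here).
    """
    in_features = len(weight[0])
    # Connection strength of each input neuron: sum of |w| over all outputs.
    strength = [sum(abs(row[i]) for row in weight) for i in range(in_features)]
    # Sort indices by strength, strongest first, and keep the top ones.
    ranked = sorted(range(in_features), key=lambda i: strength[i], reverse=True)
    return ranked[:neuron_num]


# Hypothetical 2x4 weight matrix of the layer that follows preprocess_layer.
example_weight = [
    [0.1, -2.0, 0.3, 0.5],
    [0.2, 1.5, -0.1, -0.9],
]
top_two = top_connected_neurons(example_weight, neuron_num=2)
```

With this toy matrix, neurons 1 and 3 have the largest summed absolute weights, so they are the ones selected.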
- get_neuron_value(trigger_input, neuron_idx)
Get average neuron activation value of trigger_input for neuron_idx.
The feature map is obtained by calling trojanzoo.models.Model.get_layer().
- Parameters:
  - trigger_input (torch.Tensor) – Poison input tensor with shape (N, C, H, W).
  - neuron_idx (torch.Tensor) – Neuron index list tensor with shape (self.neuron_num).
- Returns:
  float – Average neuron activation value.
- preprocess_mark(neuron_idx)
Optimize mark to maximize activation on neuron_idx. It uses torch.optim.Adam and torch.optim.lr_scheduler.CosineAnnealingLR with tanh objective function.
The feature map is obtained by calling trojanvision.models.ImageModel.get_layer().
- Parameters:
  - neuron_idx (torch.Tensor) – Neuron index list tensor with shape (self.neuron_num).
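
The text does not spell out what the "tanh objective function" is. One common reading is that the mark is parameterized through tanh so that pixel values stay in a valid range while an unconstrained variable is optimized. The toy sketch below assumes that reading. It uses plain gradient descent in place of torch.optim.Adam, reimplements the cosine annealing schedule by hand, and replaces the network with a single linear "neuron". All names (tanh_to_pixel, cosine_lr, optimize_pixel) and the values of weight and target are hypothetical.

```python
import math


def tanh_to_pixel(w: float) -> float:
    """Map an unconstrained variable to a pixel value in (0, 1)."""
    return (math.tanh(w) + 1) / 2


def cosine_lr(base_lr: float, step: int, total_steps: int, min_lr: float = 0.0) -> float:
    """Cosine-annealed learning rate: base_lr at step 0, min_lr at total_steps."""
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * step / total_steps))


def optimize_pixel(weight: float = 3.0, target: float = 2.5,
                   base_lr: float = 0.1, steps: int = 200) -> float:
    """Drive a toy activation `weight * pixel` toward `target`.

    Minimizes the squared error with gradient descent on the unconstrained
    variable `w`, so the resulting pixel can never leave (0, 1).
    """
    w = 0.0
    for step in range(steps):
        activation = weight * tanh_to_pixel(w)
        # Chain rule: d(activation)/dw = weight * 0.5 * (1 - tanh(w)^2).
        d_activation = weight * 0.5 * (1 - math.tanh(w) ** 2)
        grad = 2 * (activation - target) * d_activation
        w -= cosine_lr(base_lr, step, steps) * grad
    return tanh_to_pixel(w)


pixel = optimize_pixel()
```

The reparameterization is the point of the sketch: the optimizer moves w freely, yet the pixel value is always a legal intensity.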
- class trojanvision.attacks.IMC(attack_remask_epochs=1, attack_remask_lr=0.1, **kwargs)
Input Model Co-optimization (IMC) proposed by Ren Pang from Pennsylvania State University in CCS 2020.
It inherits trojanvision.attacks.BackdoorAttack.
Based on trojanvision.attacks.TrojanNN, IMC optimizes the watermark using the Adam optimizer during model retraining.
See also
paper: A Tale of Evil Twins: Adversarial Inputs versus Poisoned Models
code: TrojanZoo is the official implementation of IMC ^_^
- Parameters:
- class trojanvision.attacks.LatentBackdoor(class_sample_num=100, mse_weight=0.5, preprocess_layer='flatten', attack_remask_epochs=100, attack_remask_lr=0.1, **kwargs)
Latent Backdoor proposed by Yuanshun Yao, Huiying Li, Haitao Zheng and Ben Y. Zhao from University of Chicago in CCS 2019.
It inherits trojanvision.attacks.BackdoorAttack.
Similar to trojanvision.attacks.TrojanNN, Latent Backdoor preprocesses watermark pixel values to minimize the feature MSE distance (of other classes with trigger attached) to the average feature map of the target class.
Loss formulas are:
  - 'preprocess': $\mathcal{L}_{MSE}$
  - 'retrain': $\mathcal{L}_{CE} + \text{self.mse\_weight} * \mathcal{L}_{MSE}$
Note: This implementation does NOT involve teacher-student transfer learning or new learning tasks, which are the main contribution and application scenario of the original paper. It still focuses on the BadNet problem setting and only uses the watermark optimization and retraining loss from the Latent Backdoor attack.
Users who need those features should inherit this class and use its methods as utilities.
- Parameters:
  - class_sample_num (int) – Number of inputs sampled from each class. Defaults to 100.
  - mse_weight (float) – MSE loss weight used in model retraining. Defaults to 0.5.
  - preprocess_layer (str) – The chosen layer to calculate feature map. Defaults to 'flatten'.
  - attack_remask_epochs (int) – Number of watermark preprocess optimization epochs. Defaults to 100.
  - attack_remask_lr (float) – Watermark preprocess optimization learning rate. Defaults to 0.1.
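
The two LatentBackdoor loss formulas can be made concrete with a small plain-Python sketch for a single sample. It is only an illustration of the arithmetic, not the library's tensor code: the helper names (mse, cross_entropy, preprocess_loss, retrain_loss) and the flat-list feature representation are assumptions.

```python
import math


def mse(feats: list[float], target_feats: list[float]) -> float:
    """Mean squared error between two feature vectors of equal length."""
    return sum((a - b) ** 2 for a, b in zip(feats, target_feats)) / len(feats)


def cross_entropy(logits: list[float], label: int) -> float:
    """Cross entropy of one sample, computed with a stable log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[label]


def preprocess_loss(trigger_feats: list[float], avg_target_feats: list[float]) -> float:
    """'preprocess' stage: L_MSE only."""
    return mse(trigger_feats, avg_target_feats)


def retrain_loss(logits: list[float], label: int, trigger_feats: list[float],
                 avg_target_feats: list[float], mse_weight: float = 0.5) -> float:
    """'retrain' stage: L_CE + mse_weight * L_MSE."""
    return cross_entropy(logits, label) + mse_weight * mse(trigger_feats, avg_target_feats)


# Toy values: features of a triggered input vs. the target-class average.
loss_pre = preprocess_loss([1.0, 3.0], [0.0, 1.0])
loss_re = retrain_loss([0.0, 0.0], 0, [1.0, 3.0], [0.0, 1.0])
```

Here the MSE term is (1 + 4) / 2 = 2.5 and the cross entropy of two equal logits is ln 2, so the retrain loss is ln 2 + 0.5 * 2.5.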
- get_avg_target_feats(target_input, target_label)
Get average feature map of self.preprocess_layer using sampled data from self.target_class.
- Parameters:
  - target_input (torch.Tensor) – Input tensor from target class with shape (self.class_sample_num, C, H, W).
  - target_label (torch.Tensor) – Label tensor from target class with shape (self.class_sample_num).
- Returns:
  torch.Tensor – Feature map tensor with shape (self.class_sample_num, C').
- preprocess_mark(other_input, other_label)
Preprocess to optimize watermark using data sampled from source classes.
- Parameters:
  - other_input (torch.Tensor) – Input tensor from source classes with shape (self.class_sample_num * len(source_class), C, H, W).
  - other_label (torch.Tensor) – Label tensor from source classes with shape (self.class_sample_num * len(source_class)).
- sample_data()
Sample data from each class. The returned data dict is:
  - 'other': (input, label) from source classes with batch size self.class_sample_num * len(source_class).
  - 'target': (input, label) from target class with batch size self.class_sample_num.
- Returns:
  dict[str, tuple[torch.Tensor, torch.Tensor]] – Data dict.
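
The shape of the returned dict can be illustrated with a small stand-alone sketch. It works on a list of (input, label) pairs instead of a real dataset and tensors. The function signature, the seed argument and the toy data are assumptions for this example and do not reflect the method's actual signature, which takes no arguments.

```python
import random


def sample_data(dataset, source_class, target_class, class_sample_num, seed=0):
    """Sample `class_sample_num` items per class and group them as
    {'other': (inputs, labels), 'target': (inputs, labels)}."""
    rng = random.Random(seed)

    def sample_class(cls):
        # All inputs of one class, then a random subset without replacement.
        pool = [x for x, y in dataset if y == cls]
        return rng.sample(pool, class_sample_num)

    other_input, other_label = [], []
    for cls in source_class:
        picked = sample_class(cls)
        other_input.extend(picked)
        other_label.extend([cls] * len(picked))

    target_input = sample_class(target_class)
    target_label = [target_class] * len(target_input)
    return {
        'other': (other_input, other_label),
        'target': (target_input, target_label),
    }


# Toy dataset: 3 classes with 5 placeholder "images" each.
toy_dataset = [(f'img_{c}_{i}', c) for c in range(3) for i in range(5)]
data = sample_data(toy_dataset, source_class=[1, 2], target_class=0, class_sample_num=2)
```

With two source classes and class_sample_num=2, the 'other' entry holds 4 samples and the 'target' entry holds 2, matching the batch sizes stated above.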
- class trojanvision.attacks.TrojanNet(select_point=5, mlp_alpha=0.7, comb_temperature=0.1, amplify_rate=2.0, train_noise_num=200, valid_noise_num=2000, **kwargs)
TrojanNet proposed by Ruixiang Tang from Texas A&M University in KDD 2020.
It inherits trojanvision.attacks.BackdoorAttack.
TrojanNet conducts the attack following these procedures:
  - trigger generation: TrojanNet generates b/w triggers with select_point black pixels by calling syn_trigger_candidates(). The first num_classes triggers correspond to each class.
  - train a small MLP: TrojanNet uses the generated triggers and random noises as training data to train a small MLP (trojanvision.attacks.backdoor.trojannet._MLPNet) with $(C^\text{all}_\text{select} + 1)$ classes to classify them. The one auxiliary class is for random noises, which stand for clean data without triggers. Random noises are random binary b/w images.
  - combine MLP and original model outputs: select the first num_classes elements of the MLP softmax result, multiply them by amplify_rate and combine them with the model softmax result with weight mlp_alpha. This serves as the logits of the combined model.
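
The combination step can be sketched in plain Python as follows. This is an assumption-laden illustration rather than the library's forward pass: the text gives mlp_alpha as the weight of the MLP output, and the sketch assumes the model output is weighted by (1 - mlp_alpha). It also leaves out comb_temperature, since the text does not say how it is applied. The helper names softmax and combine_outputs are hypothetical.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def combine_outputs(mlp_logits: list[float], model_logits: list[float],
                    num_classes: int, mlp_alpha: float = 0.7,
                    amplify_rate: float = 2.0) -> list[float]:
    """Blend the first `num_classes` MLP probabilities (amplified) with the
    model probabilities. The (1 - mlp_alpha) model weight is an assumption."""
    mlp_prob = softmax(mlp_logits)[:num_classes]
    model_prob = softmax(model_logits)
    return [mlp_alpha * amplify_rate * p + (1 - mlp_alpha) * q
            for p, q in zip(mlp_prob, model_prob)]


def argmax(values: list[float]) -> int:
    return max(range(len(values)), key=lambda i: values[i])


model_logits = [5.0, 0.0, 0.0]                 # clean model predicts class 0
trigger_mlp_logits = [0.0, 0.0, 10.0, 0.0, 0.0]  # MLP detects trigger of class 2
clean_mlp_logits = [0.0, 0.0, 0.0, 0.0, 10.0]    # MLP predicts the last (noise) class

triggered_pred = argmax(combine_outputs(trigger_mlp_logits, model_logits, num_classes=3))
clean_pred = argmax(combine_outputs(clean_mlp_logits, model_logits, num_classes=3))
```

In this toy case the triggered input is pushed to class 2, while the clean input keeps the model's own prediction, because the MLP's probability mass sits on a class that is cut off by the num_classes slice.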
Note: There are conflicts between the original author’s code and paper. I’ve consulted the first author to confirm that the current implementation in TrojanZoo should work:
  - Paper claims MLP has 1.0 classification confidence, which means the probability is 1.0 for the predicted class and 0 for other classes. Author’s code doesn’t apply any binarization. The author explains that training already overfits, so binarization is not necessary. Our code follows author’s code.
  - Paper claims to combine MLP output and model output with weight $\alpha$. Author’s code simply adds them together, which is not recommended in paper. Our code follows paper.
  - Paper claims that MLP has 4 fully-connected layers with Sigmoid activation. Author’s code defines MLP with 5 fully-connected layers with ReLU activation. Our code follows author’s code.
  - Paper claims to use Adam optimizer. Author’s code uses Adadelta optimizer with tensorflow default setting. Our code follows paper and further uses torch.optim.lr_scheduler.CosineAnnealingLR.
  - Paper claims MLP outputs all 0 for random noises. Author’s code defines random noises as a new class for non-triggers. Our code follows author’s code.
  - Paper claims to generate random binary b/w noises as training data. Author’s code generates grey images, which is not expected according to the author. Our code follows paper.
  - Paper claims to gradually increase the proportion of random noises from 0 during training. Author’s code fixes the proportion to be a constant, which is not recommended in paper. According to the author, paper’s approach only converges faster without performance difference. Our code follows author’s code.
- Parameters:
  - select_point (int) – Number of black pixels in triggers. Defaults to 5.
  - mlp_alpha (float) – Weight of MLP output at combination. Defaults to 0.7.
  - comb_temperature (float) – Temperature at combination. Defaults to 0.1.
  - amplify_rate (float) – Amplify rate for MLP output. Defaults to 2.0.
  - train_noise_num (int) – Number of random noises in MLP train set. Defaults to 200.
  - valid_noise_num (int) – Number of random noises in MLP valid set. Defaults to 2000.
- Variables:
- syn_random_noises(length)
Generate random noises for MLP training and validation following a Bernoulli distribution with p=0.5.
Their labels are the last auxiliary class of MLP: [self.combination_number] * length.
- Parameters:
  - length (int) – Number of generated random noises.
- Returns:
  (torch.Tensor, list[int]) – Input and label tensor with shape (length, self.all_point) and (length).
- syn_trigger_candidates()
Generate triggers for MLP where num_classes triggers correspond to each class.
Trigger labels are actually list(range(self.combination_number)).
- Returns:
  (torch.Tensor, list[int]) – Input and label tensor with shape (self.combination_number, self.all_point) and (self.combination_number).
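
One way to picture the trigger candidates is to enumerate every choice of select_point black pixels among all_point positions, which gives "all choose select" triggers in total. The sketch below does that with itertools. The value all_point=16, the encoding of black as 0.0 and white as 1.0, and the enumeration order are assumptions for this example, not details confirmed by the text.

```python
import itertools
import math


def syn_trigger_candidates(all_point: int = 16,
                           select_point: int = 5) -> tuple[list[list[float]], list[int]]:
    """Enumerate all b/w triggers with exactly `select_point` black pixels.

    Each trigger is a flat list of `all_point` values: 1.0 for white and
    0.0 for black (the polarity is an assumption of this sketch).
    """
    triggers = []
    for black_positions in itertools.combinations(range(all_point), select_point):
        trigger = [1.0] * all_point
        for idx in black_positions:
            trigger[idx] = 0.0
        triggers.append(trigger)
    # One label per trigger, i.e. list(range(combination_number)).
    labels = list(range(len(triggers)))
    return triggers, labels


triggers, trigger_labels = syn_trigger_candidates()
combination_number = math.comb(16, 5)
```

Under these assumed sizes there are C(16, 5) = 4368 candidate triggers, so the MLP described above would have 4369 output classes including the auxiliary noise class.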