normal
- class trojanvision.attacks.BadNet(mark=None, source_class=None, target_class=0, poison_percent=0.01, train_mode='batch', **kwargs)
  BadNet, proposed by Tianyu Gu from New York University in 2017.
  It inherits trojanvision.attacks.BackdoorAttack and is actually equivalent to it.
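  BadNet stamps a fixed watermark onto a poison_percent fraction of training inputs and relabels them as target_class. A minimal sketch of the stamping step (the mark and mask tensors here are hypothetical stand-ins for the attack's watermark)::

    import torch

    # x:    clean input batch, shape (N, C, H, W), values in [0, 1]
    # mark: watermark pixels,  shape (C, H, W)
    # mask: binary trigger region, shape (1, H, W) or (C, H, W)
    def add_mark(x: torch.Tensor, mark: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # keep the background where mask == 0, stamp the mark where mask == 1
        return x * (1 - mask) + mark * mask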
- class trojanvision.attacks.TrojanNN(preprocess_layer='flatten', preprocess_next_layer='classifier.fc', target_value=100.0, neuron_num=2, neuron_lr=0.1, neuron_epoch=1000, **kwargs)
  TrojanNN, proposed by Yingqi Liu from Purdue University in NDSS 2018.
  It inherits trojanvision.attacks.BackdoorAttack. Based on trojanvision.attacks.BadNet, TrojanNN preprocesses watermark pixel values to maximize activations of well-connected neurons in self.preprocess_layer.
  See also
  paper: Trojaning Attack on Neural Networks
  - Parameters:
    preprocess_layer (str) – The chosen layer whose neuron activation is maximized. Defaults to 'flatten'.
    preprocess_next_layer (str) – The layer after preprocess_layer, whose weights are used to find the neuron indices. Defaults to 'classifier.fc'.
    target_value (float) – TrojanNN neuron activation target value. Defaults to 100.0.
    neuron_num (int) – Number of TrojanNN neurons whose activation is maximized. Defaults to 2.
    neuron_epoch (int) – Epoch count for TrojanNN neuron optimization. Defaults to 1000.
    neuron_lr (float) – Learning rate for TrojanNN neuron optimization. Defaults to 0.1.
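  For orientation, a hedged usage sketch built on trojanvision's create() factory pattern (the exact keyword names and the dataset/model choices here are assumptions, not verbatim API)::

    import trojanvision

    env = trojanvision.environ.create(device='cuda')
    dataset = trojanvision.datasets.create('cifar10')
    model = trojanvision.models.create('resnet18_comp', dataset=dataset)
    mark = trojanvision.marks.create(dataset=dataset)
    attack = trojanvision.attacks.create(
        'trojannn', dataset=dataset, model=model, mark=mark,
        preprocess_layer='flatten', neuron_num=2,
        neuron_lr=0.1, neuron_epoch=1000, target_value=100.0)
    attack.attack()  # preprocess the mark, then retrain on poisoned data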
  - static denoise(img, weight=1.0, max_num_iter=100, eps=1e-3)
    Denoise the image by calling skimage.restoration.denoise_tv_bregman.
    Warning
    This method is currently unused in preprocess_mark() because no performance difference is observed.
    - Parameters:
      img (torch.Tensor) – Noisy image tensor with shape (C, H, W).
    - Returns:
      torch.Tensor – Denoised image tensor with shape (C, H, W).
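    A self-contained sketch of this helper, assuming skimage >= 0.19 keyword names (max_num_iter, channel_axis)::

      import torch
      from skimage.restoration import denoise_tv_bregman

      def denoise(img: torch.Tensor, weight: float = 1.0,
                  max_num_iter: int = 100, eps: float = 1e-3) -> torch.Tensor:
          # skimage expects channel-last numpy arrays: (C, H, W) -> (H, W, C)
          img_np = img.detach().cpu().permute(1, 2, 0).numpy()
          out = denoise_tv_bregman(img_np, weight=weight,
                                   max_num_iter=max_num_iter, eps=eps,
                                   channel_axis=-1)
          # back to a (C, H, W) tensor on the original device/dtype
          return torch.as_tensor(out).permute(2, 0, 1).to(img.device, img.dtype)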
  - get_neuron_idx()
    Get the top self.neuron_num well-connected neurons in self.preprocess_layer.
    Connectivity is calculated w.r.t. the in_channels dimension of the self.preprocess_next_layer weights.
    - Returns:
      torch.Tensor – Neuron index list tensor with shape (self.neuron_num).
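    A minimal sketch of the selection rule, assuming preprocess_next_layer is a fully-connected layer with a (out_features, in_features) weight matrix::

      import torch

      def get_neuron_idx(weight: torch.Tensor, neuron_num: int) -> torch.Tensor:
          # a neuron in preprocess_layer is "well-connected" when the absolute
          # weights leaving it (one column of the next layer) have large total mass
          connectivity = weight.abs().sum(dim=0)        # shape: (in_features,)
          return connectivity.topk(neuron_num).indices  # shape: (neuron_num,)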
  - get_neuron_value(trigger_input, neuron_idx)
    Get the average neuron activation value of trigger_input for neuron_idx.
    The feature map is obtained by calling trojanzoo.models.Model.get_layer().
    - Parameters:
      trigger_input (torch.Tensor) – Poison input tensor with shape (N, C, H, W).
      neuron_idx (torch.Tensor) – Neuron index list tensor with shape (self.neuron_num).
    - Returns:
      float – Average neuron activation value.
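    A hedged sketch of what this computes (the layer_output keyword mirrors trojanzoo.models.Model.get_layer() and is an assumption here)::

      def get_neuron_value(model, trigger_input, neuron_idx,
                           preprocess_layer='flatten') -> float:
          # intermediate feature map at preprocess_layer, flattened per sample
          fm = model.get_layer(trigger_input, layer_output=preprocess_layer)
          fm = fm.flatten(start_dim=1)          # (N, num_neurons)
          # mean activation of the chosen neurons over the batch
          return fm[:, neuron_idx].mean().item()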
  - preprocess_mark(neuron_idx)
    Optimize the mark to maximize activation on neuron_idx. It uses torch.optim.Adam and torch.optim.lr_scheduler.CosineAnnealingLR with a tanh objective function.
    The feature map is obtained by calling trojanvision.models.ImageModel.get_layer().
    - Parameters:
      neuron_idx (torch.Tensor) – Neuron index list tensor with shape (self.neuron_num).
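    A minimal sketch of the optimization loop, assuming a tanh re-parameterization of the mark pixels and the hypothetical get_layer() call from above::

      import torch

      def preprocess_mark_sketch(model, mask, neuron_idx, target_value=100.0,
                                 neuron_lr=0.1, neuron_epoch=1000, layer='flatten'):
          # optimize unconstrained parameters; tanh keeps pixels in [0, 1]
          atanh_mark = torch.randn(mask.shape, requires_grad=True)
          optimizer = torch.optim.Adam([atanh_mark], lr=neuron_lr)
          scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
              optimizer, T_max=neuron_epoch)
          for _ in range(neuron_epoch):
              mark = (torch.tanh(atanh_mark) + 1) / 2
              trigger = (mask * mark).unsqueeze(0)       # (1, C, H, W)
              fm = model.get_layer(trigger, layer_output=layer).flatten(1)
              # push the chosen neurons' activation toward target_value
              loss = (fm[:, neuron_idx] - target_value).square().mean()
              optimizer.zero_grad()
              loss.backward()
              optimizer.step()
              scheduler.step()
          return ((torch.tanh(atanh_mark) + 1) / 2).detach()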
- class trojanvision.attacks.IMC(attack_remask_epochs=1, attack_remask_lr=0.1, **kwargs)
  Input Model Co-optimization (IMC), proposed by Ren Pang from Pennsylvania State University in CCS 2020.
  It inherits trojanvision.attacks.BackdoorAttack. Based on trojanvision.attacks.TrojanNN, IMC optimizes the watermark using the Adam optimizer during model retraining.
  See also
  paper: A Tale of Evil Twins: Adversarial Inputs versus Poisoned Models
  code: TrojanZoo is the official implementation of IMC ^_^
  - Parameters:
    attack_remask_epochs (int) – Watermark optimization epochs during each training epoch. Defaults to 1.
    attack_remask_lr (float) – Watermark optimization learning rate. Defaults to 0.1.
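  A hedged sketch of the input half of one co-optimization step (function and argument names here are hypothetical; the real loop lives in the attack's retraining code)::

    import torch
    import torch.nn.functional as F

    def imc_input_step(model, atanh_mark, mask, x, target_class,
                       attack_remask_epochs=1, attack_remask_lr=0.1):
        # input step: refine the watermark on the current model so that
        # stamped inputs are classified as target_class
        optimizer = torch.optim.Adam([atanh_mark], lr=attack_remask_lr)
        target = torch.full((x.size(0),), target_class, dtype=torch.long)
        for _ in range(attack_remask_epochs):
            mark = (torch.tanh(atanh_mark) + 1) / 2
            poison_x = x * (1 - mask) + mask * mark
            loss = F.cross_entropy(model(poison_x), target)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        # the model step (ordinary poisoned training) then uses the refined mark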
- class trojanvision.attacks.LatentBackdoor(class_sample_num=100, mse_weight=0.5, preprocess_layer='flatten', attack_remask_epochs=100, attack_remask_lr=0.1, **kwargs)
  Latent Backdoor, proposed by Yuanshun Yao, Huiying Li, Haitao Zheng and Ben Y. Zhao from University of Chicago in CCS 2019.
  It inherits trojanvision.attacks.BackdoorAttack. Similar to trojanvision.attacks.TrojanNN, Latent Backdoor preprocesses watermark pixel values to minimize the MSE distance between the features of other classes (with the trigger attached) and the average feature map of the target class.
  The loss formulas are:
  'preprocess': $\mathcal{L} = \mathrm{MSE}\bigl(f(\mathrm{trigger}(x_{other})),\ \bar{f}_{target}\bigr)$
  'retrain': $\mathcal{L} = \mathcal{L}_{CE} + \texttt{mse\_weight} \cdot \mathrm{MSE}\bigl(f(\mathrm{trigger}(x)),\ \bar{f}_{target}\bigr)$
  where $f(\cdot)$ is the feature map at self.preprocess_layer and $\bar{f}_{target}$ is the average feature map of the target class.
  See also
  paper: Latent Backdoor Attacks on Deep Neural Networks
  Note
  This implementation does NOT involve teacher-student transfer learning or new learning tasks, which are the main contribution and application scenario of the original paper. It still focuses on the BadNet problem setting and only utilizes the watermark optimization and retraining loss from the Latent Backdoor attack.
  Users who need those features should inherit this class and use its methods as utilities.
  - Parameters:
    class_sample_num (int) – Number of sampled inputs for each class. Defaults to 100.
    mse_weight (float) – MSE loss weight used in model retraining. Defaults to 0.5.
    preprocess_layer (str) – The chosen layer to calculate the feature map. Defaults to 'flatten'.
    attack_remask_epochs (int) – Watermark preprocess optimization epochs. Defaults to 100.
    attack_remask_lr (float) – Watermark preprocess optimization learning rate. Defaults to 0.1.
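  A hedged sketch of the 'retrain' loss above (the get_layer() keyword is the same assumption as in the TrojanNN sketches)::

    import torch.nn.functional as F

    def latent_backdoor_retrain_loss(model, x, y, poison_x, avg_target_feats,
                                     mse_weight=0.5, layer='flatten'):
        ce = F.cross_entropy(model(x), y)   # ordinary classification loss
        # pull triggered inputs' features toward the target-class average
        feats = model.get_layer(poison_x, layer_output=layer).flatten(1)
        mse = F.mse_loss(feats, avg_target_feats.expand_as(feats))
        return ce + mse_weight * mse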
  - get_avg_target_feats(target_input, target_label)
    Get the average feature map of self.preprocess_layer using sampled data from self.target_class.
    - Parameters:
      target_input (torch.Tensor) – Input tensor from the target class with shape (self.class_sample_num, C, H, W).
      target_label (torch.Tensor) – Label tensor from the target class with shape (self.class_sample_num).
    - Returns:
      torch.Tensor – Feature map tensor with shape (self.class_sample_num, C').
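    A minimal sketch consistent with the documented return shape, assuming spatial dimensions are averaged away for conv-shaped feature maps::

      def get_avg_target_feats_sketch(model, target_input, layer='flatten'):
          feats = model.get_layer(target_input, layer_output=layer)
          if feats.dim() > 2:             # conv feature map: (N, C', H', W')
              feats = feats.mean(dim=[2, 3])
          return feats                    # (self.class_sample_num, C')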
  - preprocess_mark(other_input, other_label)
    Preprocess to optimize the watermark using data sampled from source classes.
    - Parameters:
      other_input (torch.Tensor) – Input tensor from source classes with shape (self.class_sample_num * len(source_class), C, H, W).
      other_label (torch.Tensor) – Label tensor from source classes with shape (self.class_sample_num * len(source_class)).
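    A hedged sketch of the 'preprocess' optimization, reusing the tanh re-parameterization idea from the TrojanNN sketch above::

      import torch
      import torch.nn.functional as F

      def preprocess_mark_sketch(model, mask, other_input, avg_target_feats,
                                 epochs=100, lr=0.1, layer='flatten'):
          atanh_mark = torch.randn(mask.shape, requires_grad=True)
          optimizer = torch.optim.Adam([atanh_mark], lr=lr)
          for _ in range(epochs):
              mark = (torch.tanh(atanh_mark) + 1) / 2
              poison_x = other_input * (1 - mask) + mask * mark
              feats = model.get_layer(poison_x, layer_output=layer).flatten(1)
              # minimize feature MSE distance to the target-class average
              loss = F.mse_loss(feats, avg_target_feats.expand_as(feats))
              optimizer.zero_grad()
              loss.backward()
              optimizer.step()
          return ((torch.tanh(atanh_mark) + 1) / 2).detach()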
  - sample_data()
    Sample data from each class. The returned data dict is:
    'other': (input, label) from source classes with batch size self.class_sample_num * len(source_class).
    'target': (input, label) from the target class with batch size self.class_sample_num.
    - Returns:
      dict[str, tuple[torch.Tensor, torch.Tensor]] – Data dict.
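    A hedged usage sketch tying the three methods together (attack is a hypothetical LatentBackdoor instance)::

      data = attack.sample_data()
      other_input, other_label = data['other']      # from source classes
      target_input, target_label = data['target']   # from the target class
      avg_feats = attack.get_avg_target_feats(target_input, target_label)
      attack.preprocess_mark(other_input, other_label)  # optimize the watermark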
- class trojanvision.attacks.TrojanNet(select_point=5, mlp_alpha=0.7, comb_temperature=0.1, amplify_rate=2.0, train_noise_num=200, valid_noise_num=2000, **kwargs)
  TrojanNet, proposed by Ruixiang Tang from Texas A&M University in KDD 2020.
  It inherits trojanvision.attacks.BackdoorAttack. TrojanNet conducts the attack following these procedures (step 3 is sketched in code after this list):
  1. Trigger generation: TrojanNet generates black/white triggers with select_point black pixels by calling syn_trigger_candidates(). The first num_classes triggers correspond to each class.
  2. Train a small MLP: TrojanNet uses the generated triggers and random noises as training data to train a small MLP (trojanvision.attacks.backdoor.trojannet._MLPNet) with self.combination_number + 1 classes to classify them. The one auxiliary class is for random noises, which stand for clean data without triggers. Random noises are random binary black/white images.
  3. Combine MLP and original model outputs: select the first num_classes elements of the MLP softmax result, multiply them by amplify_rate, and combine them with the model softmax result using weight mlp_alpha. This serves as the logits of the combined model.
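  A minimal sketch of the output combination in step 3 (argument names mirror the constructor parameters)::

    import torch.nn.functional as F

    def combine_outputs(mlp_logits, model_logits, num_classes,
                        mlp_alpha=0.7, amplify_rate=2.0, comb_temperature=0.1):
        # amplified MLP probabilities for the first num_classes classes
        mlp_prob = F.softmax(mlp_logits, dim=1)[:, :num_classes] * amplify_rate
        model_prob = F.softmax(model_logits, dim=1)
        combined = mlp_alpha * mlp_prob + (1 - mlp_alpha) * model_prob
        return combined / comb_temperature  # logits of the combined model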
  See also
  paper: An Embarrassingly Simple Approach for Trojan Attack in Deep Neural Networks
  Note
  There are conflicts between the original author's code and paper. I've consulted the first author to clarify that the current TrojanZoo implementation should work:
  - The paper claims the MLP has 1.0 classification confidence, which means the probability is 1.0 for the predicted class and 0 for other classes. The author's code doesn't apply any binarization. The author explains that training already overfits, so binarization is not necessary. Our code follows the author's code.
  - The paper claims to combine the MLP output and model output with weight mlp_alpha. The author's code simply adds them together, which is not recommended in the paper. Our code follows the paper.
  - The paper claims the MLP has 4 fully-connected layers with Sigmoid activation. The author's code defines the MLP with 5 fully-connected layers with ReLU activation. Our code follows the author's code.
  - The paper claims to use the Adam optimizer. The author's code uses the Adadelta optimizer with the TensorFlow default setting. Our code follows the paper and further uses torch.optim.lr_scheduler.CosineAnnealingLR.
  - The paper claims the MLP outputs all 0 for random noises. The author's code defines random noises as a new class for non-triggers. Our code follows the author's code.
  - The paper claims to generate random binary black/white noises as training data. The author's code generates gray images, which is not expected according to the author. Our code follows the paper.
  - The paper claims to gradually increase the proportion of random noises from 0 during training. The author's code fixes the proportion to a constant, which is not recommended in the paper. According to the author, the paper's approach only converges faster without any performance difference. Our code follows the author's code.
  - Parameters:
    select_point (int) – Number of black pixels in triggers. Defaults to 5.
    mlp_alpha (float) – Weight of the MLP output at combination. Defaults to 0.7.
    comb_temperature (float) – Temperature at combination. Defaults to 0.1.
    amplify_rate (float) – Amplify rate for the MLP output. Defaults to 2.0.
    train_noise_num (int) – Number of random noises in the MLP train set. Defaults to 200.
    valid_noise_num (int) – Number of random noises in the MLP valid set. Defaults to 2000.
  - Variables:
    all_point (int) – Number of pixels in the trigger, which is the MLP input dimension.
    combination_number (int) – Number of trigger candidates, i.e. C(all_point, select_point); the MLP classifies combination_number + 1 classes.
  - syn_random_noises(length)
    Generate random noises for MLP training and validation following a Bernoulli distribution with p=0.5. Their labels are the last auxiliary class of the MLP: [self.combination_number] * length.
    - Parameters:
      length (int) – Number of generated random noises.
    - Returns:
      (torch.Tensor, list[int]) – Input and label tensor with shape (length, self.all_point) and (length).
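    A minimal sketch matching the documented shapes::

      import torch

      def syn_random_noises_sketch(length, all_point, combination_number):
          # each pixel is black/white with probability 0.5
          inputs = torch.bernoulli(torch.full((length, all_point), 0.5))
          labels = [combination_number] * length  # auxiliary "no trigger" class
          return inputs, labels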
  - syn_trigger_candidates()
    Generate triggers for the MLP, where the first num_classes triggers correspond to each class. Trigger labels are actually list(range(self.combination_number)).
    - Returns:
      (torch.Tensor, list[int]) – Input and label tensor with shape (self.combination_number, self.all_point) and (self.combination_number).
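    A hedged sketch of trigger synthesis, assuming each trigger blacks out one select_point-sized combination of the all_point pixels::

      import itertools
      import torch

      def syn_trigger_candidates_sketch(all_point, select_point):
          rows = []
          for comb in itertools.combinations(range(all_point), select_point):
              row = torch.ones(all_point)  # white background
              row[list(comb)] = 0.0        # select_point black pixels
              rows.append(row)
          inputs = torch.stack(rows)       # (combination_number, all_point)
          labels = list(range(len(rows)))  # one class per trigger
          return inputs, labels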