training_filtering

class trojanvision.defenses.ActivationClustering(nb_clusters=2, nb_dims=10, reduce_method='FastICA', cluster_analysis='silhouette_score', **kwargs)

Activation Clustering proposed by Bryant Chen from IBM Research in SafeAI@AAAI 2019.

It is a training filtering backdoor defense that inherits trojanvision.defenses.TrainingFiltering.

Activation Clustering assumes that, within the target class, poisoned samples form a separate cluster that is either small or far from the center of its own class.

The defense procedure is:

Get feature maps for all samples.
For the samples of each class:
- Reduce the dimension of the feature maps using sklearn.decomposition.FastICA or sklearn.decomposition.PCA.
- Cluster the dim-reduced feature maps to assign a cluster class to each sample.
- Detect poisoned cluster classes. Every sample in a detected cluster is treated as poisoned.
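The per-class steps above can be sketched with scikit-learn directly. This is a minimal illustration, not the library's implementation: the helper name cluster_class_activations, the synthetic activations, and the use of NumPy arrays instead of torch.Tensor are all assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import FastICA


def cluster_class_activations(feats, nb_dims=10, nb_clusters=2, seed=0):
    """Reduce one class's flattened feature maps, then cluster them.

    feats: array of shape (N, D) with the activations of a single class.
    Returns (cluster_class, reduced_fm) with shapes (N,) and (N, nb_dims).
    Hypothetical helper, not part of trojanvision.
    """
    reducer = FastICA(n_components=nb_dims, random_state=seed, max_iter=1000)
    reduced_fm = reducer.fit_transform(feats)
    kmeans = KMeans(n_clusters=nb_clusters, n_init=10, random_state=seed)
    cluster_class = kmeans.fit_predict(reduced_fm)
    return cluster_class, reduced_fm


# Synthetic stand-in for one class: 180 "clean" and 20 shifted "poisoned" rows.
rng = np.random.default_rng(0)
clean = rng.normal(0.0, 1.0, size=(180, 32))
poison = rng.normal(6.0, 1.0, size=(20, 32))
feats = np.concatenate([clean, poison])

cluster_class, reduced_fm = cluster_class_activations(feats)
sizes = np.bincount(cluster_class, minlength=2)  # samples per cluster class
```

The resulting cluster_class and reduced_fm arrays have the shapes that the analyze_by_* methods below expect, (N) and (N, nb_dims).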

There are 4 different methods to detect poisoned cluster classes:

'size': The smallest cluster class.
'relative_size': Cluster classes whose proportion is smaller than size_threshold.
'silhouette_score': Detect poisoned clusters using 'relative_size', but only when the clustering fits the data well.
'distance': Poisoned clusters are those far from their own class center.

See also

Paper: Detecting Backdoor Attacks on Deep Neural Networks by Activation Clustering
Other implementation: IBM adversarial robustness toolbox (ART) [source code]

Parameters:

nb_clusters (int) – Number of clusters. Defaults to 2.
nb_dims (int) – The reduced dimension of feature maps. Defaults to 10.
reduce_method (str) – The method to reduce dimension of feature maps. Defaults to 'FastICA'.
cluster_analysis (str) – The method chosen to detect poisoned cluster classes. Choose from ['size', 'relative_size', 'distance', 'silhouette_score']. Defaults to 'silhouette_score'.

Note

The clustering method is sklearn.cluster.KMeans if self.defense_input_num is None (the full training set is used); otherwise it is sklearn.cluster.MiniBatchKMeans.
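The selection rule in this note could look roughly like the following. The function make_clusterer is a hypothetical helper written for illustration, and the n_init and seed values are assumptions rather than the library's settings.

```python
from sklearn.cluster import KMeans, MiniBatchKMeans


def make_clusterer(nb_clusters=2, defense_input_num=None, seed=0):
    """Pick the clustering estimator based on how many samples are used.

    Hypothetical helper mirroring the note above, not trojanvision code.
    """
    if defense_input_num is None:
        # Full training set: exact k-means.
        return KMeans(n_clusters=nb_clusters, n_init=10, random_state=seed)
    # Subset of the training set: mini-batch variant.
    return MiniBatchKMeans(n_clusters=nb_clusters, n_init=3, random_state=seed)


full_set_clusterer = make_clusterer()
subset_clusterer = make_clusterer(defense_input_num=1000)
```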

analyze_by_distance(cluster_class, reduced_fm, reduced_fm_centers, _class, **kwargs)

Parameters:

cluster_class (torch.Tensor) – Clustering result tensor with shape (N).
reduced_fm (torch.Tensor) – Dim-reduced feature map tensor with shape (N, self.nb_dims).
reduced_fm_centers (torch.Tensor) – The centers of dim-reduced feature map tensors in each class with shape (C, self.nb_dims).

Returns:

list[int] – Predicted poison cluster classes list with shape (K)
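The documentation states only that poison clusters are far from their own class center; the exact criterion is not given here. One plausible reading, assumed for this sketch, is to flag a cluster when its centroid lies closer to another class's center than to the center of its own class. The function name and the use of NumPy arrays in place of torch.Tensor are illustrative assumptions.

```python
import numpy as np


def poison_clusters_by_distance(cluster_class, reduced_fm,
                                reduced_fm_centers, own_class):
    """Flag clusters whose centroid is nearest to a different class's center.

    cluster_class: (N,) cluster index per sample.
    reduced_fm: (N, nb_dims) dim-reduced feature maps of one class.
    reduced_fm_centers: (C, nb_dims) per-class centers.
    own_class: index of the class the samples belong to.
    Assumed criterion for illustration, not trojanvision's exact rule.
    """
    poison = []
    for k in np.unique(cluster_class):
        centroid = reduced_fm[cluster_class == k].mean(axis=0)
        dists = np.linalg.norm(reduced_fm_centers - centroid, axis=1)
        if int(dists.argmin()) != own_class:
            poison.append(int(k))
    return poison


reduced_fm = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
                       [9.0, 9.0], [10.0, 9.0]])
cluster_class = np.array([0, 0, 0, 1, 1])
reduced_fm_centers = np.array([[0.0, 0.0], [10.0, 10.0]])
flagged = poison_clusters_by_distance(
    cluster_class, reduced_fm, reduced_fm_centers, own_class=0)
```

In this toy input, cluster 1 sits next to the center of class 1 even though the samples are labeled class 0, so it is the cluster that gets flagged.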

analyze_by_relative_size(cluster_class, size_threshold=0.35, **kwargs)

Detect cluster classes whose proportion of samples is smaller than size_threshold.

Parameters:

cluster_class (torch.Tensor) – Clustering result tensor with shape (N).
size_threshold (float) – Defaults to 0.35.

Returns:

list[int] – Predicted poison cluster classes list with shape (K)
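A minimal sketch of the relative-size rule follows, assuming NumPy arrays rather than torch.Tensor. The function name and the nb_clusters argument are hypothetical and used only for illustration.

```python
import numpy as np


def poison_clusters_by_relative_size(cluster_class, size_threshold=0.35,
                                     nb_clusters=2):
    """Return cluster classes holding less than size_threshold of the samples."""
    counts = np.bincount(cluster_class, minlength=nb_clusters)
    proportions = counts / counts.sum()
    return [int(k) for k in np.flatnonzero(proportions < size_threshold)]


imbalanced = np.array([0] * 8 + [1] * 2)  # proportions 0.8 and 0.2
balanced = np.array([0] * 5 + [1] * 5)    # proportions 0.5 and 0.5
flagged_imbalanced = poison_clusters_by_relative_size(imbalanced)
flagged_balanced = poison_clusters_by_relative_size(balanced)
```

With the default threshold of 0.35, only the 20% cluster in the imbalanced case is flagged, and the balanced split yields no poisoned cluster.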

analyze_by_silhouette_score(cluster_class, reduced_fm, silhouette_threshold=0.1, **kwargs)

Return the result of analyze_by_relative_size() if sklearn.metrics.silhouette_score is high, which indicates that the clustering fits the data well.

Parameters:

cluster_class (torch.Tensor) – Clustering result tensor with shape (N).
reduced_fm (torch.Tensor) – Dim-reduced feature map tensor with shape (N, self.nb_dims).
silhouette_threshold (float) – The threshold that sklearn.metrics.silhouette_score is compared against. Defaults to 0.1.

Returns:

list[int] – Predicted poison cluster classes list with shape (K)
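The gating logic can be sketched as below. The documentation does not say what is returned when the score is low; this sketch assumes an empty list. The function name and the NumPy inputs are likewise assumptions for illustration.

```python
import numpy as np
from sklearn.metrics import silhouette_score


def poison_clusters_by_silhouette(cluster_class, reduced_fm,
                                  silhouette_threshold=0.1,
                                  size_threshold=0.35):
    """Apply the relative-size rule only when the clustering fits well."""
    if len(np.unique(cluster_class)) < 2:
        return []  # silhouette_score needs at least two clusters
    score = silhouette_score(reduced_fm, cluster_class)
    if score <= silhouette_threshold:
        return []  # assumed behavior: poor fit means nothing is flagged
    counts = np.bincount(cluster_class)
    proportions = counts / counts.sum()
    return [int(k) for k in np.flatnonzero(proportions < size_threshold)]


# Well-separated case: eight points near 0, two points near 10.
separated_fm = np.array([[0.0], [0.1], [0.2], [0.3], [0.4],
                         [0.5], [0.6], [0.7], [10.0], [10.1]])
separated_labels = np.array([0] * 8 + [1] * 2)

# Poorly fitting case: the two minority labels sit inside the majority.
mixed_fm = np.arange(10, dtype=float).reshape(-1, 1)
mixed_labels = np.array([0, 0, 0, 1, 0, 0, 1, 0, 0, 0])

flagged_separated = poison_clusters_by_silhouette(separated_labels, separated_fm)
flagged_mixed = poison_clusters_by_silhouette(mixed_labels, mixed_fm)
```

Both inputs contain a 20% cluster, but only the well-separated one is flagged, because the silhouette score gates the size rule.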

analyze_by_size(cluster_class, **kwargs)

Detect the smallest cluster class.

Parameters:

cluster_class (torch.Tensor) – Clustering result tensor with shape (N).

Returns:

list[int] – Predicted poison cluster classes list with shape (1)
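For comparison with the relative-size rule, the size rule always returns exactly one cluster class. A short sketch, with a hypothetical function name and NumPy input assumed:

```python
import numpy as np


def poison_cluster_by_size(cluster_class, nb_clusters=2):
    """Return a one-element list holding the smallest cluster class."""
    counts = np.bincount(cluster_class, minlength=nb_clusters)
    return [int(counts.argmin())]


smallest = poison_cluster_by_size(np.array([0, 0, 0, 1]))
```

Because a cluster is always returned, this rule flags samples even when a class contains no poisoned data, which is one reason a thresholded rule may be preferable.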
