training_filtering
- class trojanvision.defenses.ActivationClustering(nb_clusters=2, nb_dims=10, reduce_method='FastICA', cluster_analysis='silhouette_score', **kwargs)
Activation Clustering, proposed by Bryant Chen from IBM Research in SafeAI@AAAI 2019.
It is a training filtering backdoor defense that inherits trojanvision.defenses.TrainingFiltering.
Activation Clustering assumes that, in the target class, poisoned samples form a separate cluster that is small or far from its own class center.
The defense procedure is:
- Get feature maps for samples.
- For samples from each class:
  - Get dim-reduced feature maps for samples using sklearn.decomposition.FastICA or sklearn.decomposition.PCA.
  - Conduct clustering w.r.t. the dim-reduced feature maps and get cluster classes for samples.
  - Detect poisoned cluster classes. All samples in a poisoned cluster are treated as poisoned; together they compose a small separate class.
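The per-class reduction and clustering steps can be sketched in NumPy. This is a minimal illustration under stated assumptions, not trojanvision's implementation: it takes already-extracted feature maps as an (N, D) array, swaps in PCA computed via SVD for sklearn.decomposition.FastICA/PCA, and uses a plain Lloyd's k-means with farthest-first initialization in place of sklearn.cluster.KMeans. The helper names pca_reduce, kmeans and cluster_per_class are hypothetical.

```python
import numpy as np


def pca_reduce(features: np.ndarray, nb_dims: int) -> np.ndarray:
    """Project (N, D) features onto their top `nb_dims` principal components."""
    centered = features - features.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:nb_dims].T


def kmeans(x: np.ndarray, nb_clusters: int, n_iter: int = 50, seed: int = 0) -> np.ndarray:
    """Lloyd's k-means with farthest-first initialization; returns a cluster index per row."""
    rng = np.random.default_rng(seed)
    centers = [x[rng.integers(len(x))]]
    for _ in range(1, nb_clusters):
        # Next center: the point farthest from all centers chosen so far.
        nearest = np.min([np.linalg.norm(x - c, axis=1) for c in centers], axis=0)
        centers.append(x[nearest.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        # Assignment step: nearest center for every point.
        dists = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each non-empty cluster's center to its mean.
        for k in range(nb_clusters):
            if np.any(labels == k):
                centers[k] = x[labels == k].mean(axis=0)
    return labels


def cluster_per_class(features: np.ndarray, labels: np.ndarray,
                      nb_clusters: int = 2, nb_dims: int = 10) -> dict:
    """Reduce and cluster the feature maps of each class separately."""
    result = {}
    for c in np.unique(labels):
        reduced = pca_reduce(features[labels == c], nb_dims)
        result[int(c)] = kmeans(reduced, nb_clusters)
    return result
```

When a class contains a small, well-separated group of samples, that group typically ends up as its own cluster, which is the situation the detection methods are meant to flag.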
There are 4 different methods to detect poisoned cluster classes:
- 'size': The smallest cluster class.
- 'relative_size': The small cluster classes whose proportion is smaller than size_threshold.
- 'silhouette_score': Only detect poison clusters using 'relative_size' when the clustering fits the data well.
- 'distance': Poison clusters are far from their own class center.
See also
Paper: Detecting Backdoor Attacks on Deep Neural Networks by Activation Clustering
Other implementation: IBM Adversarial Robustness Toolbox (ART)
- Parameters:
nb_clusters (int) – Number of clusters. Defaults to 2.
nb_dims (int) – The reduced dimension of feature maps. Defaults to 10.
reduce_method (str) – The method to reduce the dimension of feature maps. Defaults to 'FastICA'.
cluster_analysis (str) – The method chosen to detect poisoned cluster classes. Choose from ['size', 'relative_size', 'distance', 'silhouette_score']. Defaults to 'silhouette_score'.
Note
The clustering method is sklearn.cluster.KMeans if self.defense_input_num is None (full training set), else sklearn.cluster.MiniBatchKMeans.
- analyze_by_distance(cluster_class, reduced_fm, reduced_fm_centers, _class, **kwargs)
- Parameters:
cluster_class (torch.Tensor) – Clustering result tensor with shape (N).
reduced_fm (torch.Tensor) – Dim-reduced feature map tensor with shape (N, self.nb_dims).
reduced_fm_centers (torch.Tensor) – The centers of dim-reduced feature map tensors in each class with shape (C, self.nb_dims).
- Returns:
list[int] – Predicted poison cluster classes list with shape (K).
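A minimal NumPy sketch of a distance-based check follows. The criterion is an assumption modeled on the distance analysis described for IBM ART: a cluster is flagged when its median dim-reduced feature lies closer to another class's center than to the center of its own class `_class`. The sketch also assumes every class center lives in the same reduced space as reduced_fm, and it uses NumPy arrays instead of torch.Tensor.

```python
import numpy as np


def analyze_by_distance(cluster_class: np.ndarray, reduced_fm: np.ndarray,
                        reduced_fm_centers: np.ndarray, _class: int) -> list:
    """Flag clusters whose median feature lies closer to another class center.

    cluster_class: (N,) cluster index per sample of class `_class`.
    reduced_fm: (N, nb_dims) dim-reduced features of those samples.
    reduced_fm_centers: (C, nb_dims) dim-reduced center of every class.
    """
    poison = []
    for k in np.unique(cluster_class):
        # Median of the cluster's features serves as a robust cluster center.
        median = np.median(reduced_fm[cluster_class == k], axis=0)
        distances = np.linalg.norm(reduced_fm_centers - median, axis=1)  # (C,)
        if distances.argmin() != _class:
            poison.append(int(k))
    return poison
```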
- analyze_by_relative_size(cluster_class, size_threshold=0.35, **kwargs)
Small clusters whose proportion is smaller than size_threshold.
- Parameters:
cluster_class (torch.Tensor) – Clustering result tensor with shape (N).
size_threshold (float) – Proportion below which a cluster is predicted as poisoned. Defaults to 0.35.
- Returns:
list[int] – Predicted poison cluster classes list with shape (K).
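The relative-size rule can be sketched in a few lines of NumPy. This is an illustration, not the library code: it uses NumPy arrays instead of torch.Tensor, and it only considers clusters that actually occur in cluster_class, which is an assumption about how empty clusters are handled.

```python
import numpy as np


def analyze_by_relative_size(cluster_class: np.ndarray, size_threshold: float = 0.35) -> list:
    """Return the clusters whose share of samples is below size_threshold."""
    ids, counts = np.unique(cluster_class, return_counts=True)
    proportions = counts / len(cluster_class)
    return [int(k) for k in ids[proportions < size_threshold]]
```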
- analyze_by_silhouette_score(cluster_class, reduced_fm, silhouette_threshold=0.1, **kwargs)
Return the result of analyze_by_relative_size() if sklearn.metrics.silhouette_score is high (compared with silhouette_threshold), which means the clustering fits the data well.
- Parameters:
cluster_class (torch.Tensor) – Clustering result tensor with shape (N).
reduced_fm (torch.Tensor) – Dim-reduced feature map tensor with shape (N, self.nb_dims).
silhouette_threshold (float) – The threshold that sklearn.metrics.silhouette_score is compared against to decide whether the clustering fits the data well. Defaults to 0.1.
- Returns:
list[int] – Predicted poison cluster classes list with shape (K).
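The gating logic can be sketched in NumPy. silhouette is a hypothetical re-implementation of the mean Euclidean silhouette coefficient that sklearn.metrics.silhouette_score reports (a is the mean distance to the other members of a sample's cluster, b the mean distance to the nearest other cluster, and samples in singleton clusters score 0). The strict > comparison, the empty list returned on a poor fit, and inlining the relative-size rule with size_threshold=0.35 are all assumptions of this sketch.

```python
import numpy as np


def silhouette(x: np.ndarray, labels: np.ndarray) -> float:
    """Mean silhouette coefficient with Euclidean distance."""
    ids = np.unique(labels)
    if len(ids) < 2:
        raise ValueError("silhouette needs at least 2 clusters")
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2)  # (N, N)
    scores = np.zeros(len(x))
    for i in range(len(x)):
        same = labels == labels[i]
        if same.sum() == 1:
            continue  # singleton cluster: score stays 0
        # Self-distance is 0, so divide by the number of *other* members.
        a = dist[i, same].sum() / (same.sum() - 1)
        b = min(dist[i, labels == k].mean() for k in ids if k != labels[i])
        denom = max(a, b)
        scores[i] = (b - a) / denom if denom > 0 else 0.0
    return float(scores.mean())


def analyze_by_silhouette_score(cluster_class, reduced_fm,
                                silhouette_threshold=0.1, size_threshold=0.35):
    """Apply the relative-size rule only when the clustering fits the data well."""
    if silhouette(reduced_fm, cluster_class) > silhouette_threshold:
        ids, counts = np.unique(cluster_class, return_counts=True)
        return [int(k) for k in ids[counts / len(cluster_class) < size_threshold]]
    return []  # assumption: a poor fit reports no poison cluster
```

Gating on the silhouette score keeps the relative-size rule from flagging an arbitrary small cluster when a class has no real cluster structure, for example when k-means splits a single clean blob.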
- analyze_by_size(cluster_class, **kwargs)
The smallest cluster.
- Parameters:
cluster_class (torch.Tensor) – Clustering result tensor with shape (N).
- Returns:
list[int] – Predicted poison cluster classes list with shape (1).
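A short NumPy sketch of the rule follows. It uses NumPy arrays instead of torch.Tensor; resolving ties to the lowest cluster index (via argmin) and ignoring empty clusters are assumptions, not documented behavior.

```python
import numpy as np


def analyze_by_size(cluster_class: np.ndarray) -> list:
    """Return a one-element list holding the index of the smallest cluster."""
    ids, counts = np.unique(cluster_class, return_counts=True)
    return [int(ids[counts.argmin()])]
```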