training_filtering
- class trojanvision.defenses.ActivationClustering(nb_clusters=2, nb_dims=10, reduce_method='FastICA', cluster_analysis='silhouette_score', **kwargs)
Activation Clustering was proposed by Bryant Chen from IBM Research at SafeAI@AAAI 2019.
It is a training filtering backdoor defense that inherits trojanvision.defenses.TrainingFiltering.
Activation Clustering assumes that, in the target class, poisoned samples form a separate cluster that is small or far from its own class center.
The defense procedure is:
1. Get feature maps for the samples.
2. For the samples of each class:
   - Reduce the dimension of their feature maps using sklearn.decomposition.FastICA or sklearn.decomposition.PCA.
   - Cluster the dim-reduced feature maps to get a cluster class for each sample.
   - Detect the poisoned cluster classes. All samples in a poisoned cluster are treated as poisoned, since poisoned samples are assumed to form a small separate cluster.
There are 4 different methods to detect poisoned cluster classes:
- 'size': the smallest cluster class.
- 'relative_size': the small cluster classes whose proportion is smaller than size_threshold.
- 'silhouette_score': detect poison clusters using 'relative_size' only when the clustering fits the data well.
- 'distance': poison clusters are far from their own class center.
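The per-class procedure above can be sketched end to end in plain Python. This is a minimal illustration under stated assumptions, not the trojanvision implementation: the features are assumed to be already dim-reduced (the FastICA/PCA step is omitted), the hypothetical helper `two_means` is a tiny stand-in for sklearn.cluster.KMeans with nb_clusters=2, and the hypothetical `activation_clustering` applies the 'relative_size' rule to each class.

```python
import math
from collections import defaultdict


def two_means(points, iters=20):
    """Minimal 2-means clustering; hypothetical stand-in for sklearn.cluster.KMeans(n_clusters=2)."""
    # Initialise with the first point and the point farthest from it.
    far = max(points, key=lambda p: math.dist(p, points[0]))
    centers = [list(points[0]), list(far)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest center (ties go to cluster 0).
        labels = [min((0, 1), key=lambda k: math.dist(p, centers[k])) for p in points]
        # Update step: move each non-empty cluster's center to the mean of its members.
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centers[k] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


def activation_clustering(features, targets, size_threshold=0.35):
    """Return sorted indices of samples flagged as poisoned (hypothetical helper).

    Assumes `features` are already dim-reduced; applies the 'relative_size' rule per class.
    """
    by_class = defaultdict(list)
    for idx, y in enumerate(targets):
        by_class[y].append(idx)
    poisoned = []
    for indices in by_class.values():
        labels = two_means([features[i] for i in indices])
        for k in (0, 1):
            # Flag every sample of a cluster that holds too small a share of its class.
            if labels.count(k) / len(labels) < size_threshold:
                poisoned.extend(i for i, lab in zip(indices, labels) if lab == k)
    return sorted(poisoned)
```

For example, if class 1 has six samples near (5, 5) and two outliers near (20, 20), the outliers form a cluster holding 25% of the class and are flagged, while a class split evenly into two clusters flags nothing.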
See also
Paper: Detecting Backdoor Attacks on Deep Neural Networks by Activation Clustering
Other implementation: IBM Adversarial Robustness Toolbox (ART)
- Parameters:
nb_clusters (int) – Number of clusters. Defaults to 2.
nb_dims (int) – The reduced dimension of feature maps. Defaults to 10.
reduce_method (str) – The method used to reduce the dimension of feature maps. Defaults to 'FastICA'.
cluster_analysis (str) – The method chosen to detect poisoned cluster classes. Choose from ['size', 'relative_size', 'distance', 'silhouette_score']. Defaults to 'silhouette_score'.
Note
The clustering method is sklearn.cluster.KMeans if self.defense_input_num is None (full training set), else sklearn.cluster.MiniBatchKMeans.
- analyze_by_distance(cluster_class, reduced_fm, reduced_fm_centers, _class, **kwargs)
Detect poison clusters as those far from their own class center.
- Parameters:
cluster_class (torch.Tensor) – Clustering result tensor with shape (N).
reduced_fm (torch.Tensor) – Dim-reduced feature map tensor with shape (N, self.nb_dims).
reduced_fm_centers (torch.Tensor) – The centers of the dim-reduced feature map tensors of each class, with shape (C, self.nb_dims).
- Returns:
list[int] – Predicted poison cluster classes list with shape (K).
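The docstring does not spell out what "far" means, so the sketch below uses one plausible rule as an explicit assumption: a cluster is flagged when the mean of its dim-reduced feature maps lies closer to another class's center than to the center of `_class`, which is assumed to be the index of the class being analyzed. Plain lists stand in for torch.Tensor; this is an illustration, not the trojanvision code.

```python
import math


def analyze_by_distance(cluster_class, reduced_fm, reduced_fm_centers, _class):
    """Flag clusters whose mean is nearer to another class center than to `_class`'s center.

    Hypothetical rule for illustration; inputs are plain Python sequences.
    """
    poison = []
    for k in sorted(set(cluster_class)):
        members = [f for f, c in zip(reduced_fm, cluster_class) if c == k]
        # Mean of this cluster's dim-reduced feature maps.
        center = [sum(dim) / len(members) for dim in zip(*members)]
        dists = [math.dist(center, ctr) for ctr in reduced_fm_centers]
        nearest = min(range(len(dists)), key=dists.__getitem__)
        if nearest != _class:
            poison.append(k)
    return poison
```

With class centers (0, 0) and (10, 10) and `_class=0`, a cluster of class-0 samples sitting around (9, 9.5) is flagged because it resembles class 1 more than its own label.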
- analyze_by_relative_size(cluster_class, size_threshold=0.35, **kwargs)
Detect the small clusters whose proportion is smaller than size_threshold.
- Parameters:
cluster_class (torch.Tensor) – Clustering result tensor with shape (N).
size_threshold (float) – Proportion below which a cluster is treated as poisoned. Defaults to 0.35.
- Returns:
list[int] – Predicted poison cluster classes list with shape (K).
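The relative-size rule reduces to counting cluster members. A minimal sketch, assuming `cluster_class` is a plain list of cluster ids rather than a torch.Tensor:

```python
from collections import Counter


def analyze_by_relative_size(cluster_class, size_threshold=0.35):
    """Return the clusters whose share of the samples is below `size_threshold`."""
    counts = Counter(cluster_class)
    n = len(cluster_class)
    return sorted(k for k, c in counts.items() if c / n < size_threshold)
```

With 5 samples in cluster 0 and 2 in cluster 1, cluster 1 holds 2/7 ≈ 0.29 of the class and is flagged at the default threshold; an even 50/50 split flags nothing.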
- analyze_by_silhouette_score(cluster_class, reduced_fm, silhouette_threshold=0.1, **kwargs)
Return analyze_by_relative_size() if sklearn.metrics.silhouette_score is high (above silhouette_threshold), which means the clustering fits the data well.
- Parameters:
cluster_class (torch.Tensor) – Clustering result tensor with shape (N).
reduced_fm (torch.Tensor) – Dim-reduced feature map tensor with shape (N, self.nb_dims).
silhouette_threshold (float) – The threshold applied to sklearn.metrics.silhouette_score. Defaults to 0.1.
- Returns:
list[int] – Predicted poison cluster classes list with shape (K).
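To show the gating idea without depending on scikit-learn, the sketch below computes the mean silhouette coefficient by hand (Euclidean distance, singleton clusters scored 0, as in sklearn's definition) and only then applies the relative-size rule. Two assumptions are labeled: the hypothetical `silhouette` helper replaces sklearn.metrics.silhouette_score, and a score at or below the threshold is assumed to flag nothing. Inputs are plain lists, and at least two clusters are required.

```python
import math
from collections import Counter


def silhouette(points, labels):
    """Mean silhouette coefficient (Euclidean); hypothetical stand-in for sklearn's version."""
    scores = []
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points) if j != i and labels[j] == labels[i]]
        if not own:
            scores.append(0.0)  # singleton cluster: coefficient defined as 0
            continue
        a = sum(own) / len(own)  # mean intra-cluster distance
        b = min(  # mean distance to the nearest other cluster
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == k) / labels.count(k)
            for k in set(labels) if k != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def analyze_by_silhouette_score(cluster_class, reduced_fm, silhouette_threshold=0.1, size_threshold=0.35):
    """Apply the relative-size rule only when the clustering is well separated."""
    if silhouette(reduced_fm, cluster_class) <= silhouette_threshold:
        return []  # assumption: a poorly fitting clustering flags nothing
    counts = Counter(cluster_class)
    return sorted(k for k, c in counts.items() if c / len(cluster_class) < size_threshold)
```

A lone point far from a tight group yields a high score, so the small cluster is flagged; a singleton carved out of evenly spaced points on a line yields a negative score, so the same relative size is ignored.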
- analyze_by_size(cluster_class, **kwargs)
Detect the smallest cluster.
- Parameters:
cluster_class (torch.Tensor) – Clustering result tensor with shape (N).
- Returns:
list[int] – Predicted poison cluster class list with shape (1).
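The size rule always returns exactly one cluster. A minimal sketch, assuming `cluster_class` is a plain list; the tie-breaking behavior here (first cluster seen wins) is a property of this sketch, not something the docstring specifies.

```python
from collections import Counter


def analyze_by_size(cluster_class):
    """Return the smallest cluster as a one-element list."""
    counts = Counter(cluster_class)
    # On ties, min() keeps the cluster that appeared first in `cluster_class`.
    return [min(counts, key=counts.get)]
```

Because some cluster is always the smallest, this rule flags one cluster in every class, including clean ones, which is why 'relative_size' and 'silhouette_score' add a threshold.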