datasets
- trojanzoo.datasets.add_argument(parser, dataset_name=None, dataset=None, config=config, class_dict={})
- Add dataset arguments to argument parser. For the specific arguments, see Dataset.add_argument().
- Parameters:
parser (argparse.ArgumentParser) – The parser to add arguments to.
dataset_name (str) – The dataset name.
dataset (str | Dataset) – Dataset instance or dataset name (as an alias of dataset_name).
config (Config) – The default parameter config, which contains the default dataset name if not provided.
class_dict (dict[str, type[Dataset]]) – Map from dataset name to dataset class. Defaults to {}.
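A usage sketch, assuming a concrete subclass MyDataset (see the sketch under Dataset below) registered under a made-up name:

import argparse

import trojanzoo.datasets

parser = argparse.ArgumentParser()
# 'mydata' and MyDataset are illustrative; real code registers its own classes.
trojanzoo.datasets.add_argument(parser, dataset_name='mydata',
                                class_dict={'mydata': MyDataset})
args = parser.parse_args()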
- trojanzoo.datasets.create(dataset_name=None, dataset=None, config=config, class_dict={}, **kwargs)
- Create a dataset instance. For arguments not included in kwargs, use the default values in config. The default value of folder_path is '{data_dir}/{data_type}/{name}'. For dataset implementation, see Dataset.
- Parameters:
dataset_name (str) – The dataset name.
dataset (str | Dataset) – Dataset instance or dataset name (as an alias of dataset_name).
config (Config) – The default parameter config, which contains the default dataset name if not provided.
class_dict (dict[str, type[Dataset]]) – Map from dataset name to dataset class. Defaults to {}.
**kwargs – Keyword arguments passed to the dataset init method.
- Returns:
Dataset – Dataset instance.
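A creation sketch under the same assumptions (MyDataset and its registry name are illustrative; batch_size is an arbitrary override of the config default):

import trojanzoo.datasets

dataset = trojanzoo.datasets.create(dataset_name='mydata',
                                    class_dict={'mydata': MyDataset},
                                    batch_size=128)
train_loader = dataset.loader['train']   # preset dataloader (see Variables below)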
- class trojanzoo.datasets.Dataset(batch_size=100, valid_batch_size=100, folder_path=None, download=False, split_ratio=0.8, num_workers=4, loss_weights=False, **kwargs)
- An abstract class representing a dataset. It inherits trojanzoo.utils.module.BasicObject.
Note
This is the implementation of dataset. For users, please use create() instead, which is more user-friendly. A minimal subclass sketch follows the variable list below.
- Parameters:
batch_size (int) – Batch size of the training set (a negative number means batch size per GPU). Defaults to 100.
valid_batch_size (int) – Batch size of the validation set. Defaults to 100.
folder_path (str) – Folder path to store the dataset. Defaults to None.
Note
folder_path is usually '{data_dir}/{data_type}/{name}', which is claimed as the default value of create().
download (bool) – Download the dataset if it does not exist. Defaults to False.
split_ratio (float) – Split the training set for training and validation if valid_set is False. The ratio stands for the fraction of the original training set kept for training. Defaults to 0.8.
num_workers (int) – Used in get_dataloader(). Defaults to 4.
loss_weights (bool | np.ndarray | torch.Tensor) – The loss weights w.r.t. each class. If False, loss_weights is set to None.
**kwargs – Any keyword argument (unused).
- Variables:
name (str) – Dataset name. (need overriding)
loader (dict[str, DataLoader]) – Preset dataloader for users at dataset initialization. It contains 'train' and 'valid' loaders.
batch_size (int) – Batch size of the training set (always positive). Defaults to 100.
valid_batch_size (int) – Batch size of the validation set. Defaults to 100.
num_classes (int) – Number of classes. (need overriding)
folder_path (str) – Folder path to store the dataset. Defaults to None.
data_type (str) – Data type (e.g., 'image'). (need overriding)
valid_set (bool) – Whether there is a native validation set. Defaults to True.
split_ratio (float) – Split the training set for training and validation if valid_set is False. The ratio stands for the fraction of the original training set kept for training. Defaults to 0.8.
loss_weights (torch.Tensor | None) – The loss weights w.r.t. each class.
num_workers (int) – Used in get_dataloader(). Defaults to 4.
collate_fn (Callable | None) – Used in get_dataloader(). Defaults to None.
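The members marked "(need overriding)" above are typically filled in by a concrete subclass. A minimal sketch, assuming a torchvision MNIST backend; the class name and attribute values are illustrative, and the override signatures are inferred from get_transform() and get_org_dataset() below:

import torchvision.datasets
import torchvision.transforms as transforms

from trojanzoo.datasets import Dataset


class MyDataset(Dataset):
    # Illustrative values for the members marked "(need overriding)".
    name = 'mydata'
    data_type = 'image'
    num_classes = 10

    def get_transform(self, mode):
        # A single transform for every mode in this sketch.
        return transforms.ToTensor()

    def _get_org_dataset(self, mode, transform=None, **kwargs):
        # MNIST serves as a stand-in storage backend here.
        return torchvision.datasets.MNIST(
            root=self.folder_path, train=(mode == 'train'),
            transform=transform, download=True, **kwargs)

An instance would then be built via create(dataset_name='mydata', class_dict={'mydata': MyDataset}, folder_path='./data'), with the path again only an example.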
- classmethod add_argument(group)
Add dataset arguments to argument parser group. View source to see specific arguments.
Note
This is the implementation of adding arguments. The concrete dataset class may override this method to add more arguments. For users, please use add_argument() instead, which is more user-friendly.
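The override pattern the note mentions might look like this sketch (the extra flag is hypothetical):

import argparse

from trojanzoo.datasets import Dataset


class MyDataset(Dataset):
    @classmethod
    def add_argument(cls, group: argparse._ArgumentGroup):
        super().add_argument(group)   # keep the base dataset arguments
        # Hypothetical dataset-specific option.
        group.add_argument('--my_option', type=int, default=0)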
- check_files(**kwargs)
Check if the dataset files are prepared.
- Parameters:
**kwargs – Keyword arguments passed to get_org_dataset().
- Returns:
bool – Whether the dataset files are prepared.
- static get_class_subset(dataset, class_list)
Get a subset from dataset with certain classes.
- Parameters:
dataset (torch.utils.data.Dataset) – The entire dataset.
class_list (int | list[int]) – The class list to pick.
- Returns:
torch.utils.data.Subset – The subset with labels in class_list.
- Example:
>>> from trojanzoo.utils.data import TensorListDataset
>>> from trojanzoo.utils.data import get_class_subset
>>> import torch
>>>
>>> data = torch.ones(11, 3, 32, 32)
>>> targets = list(range(11))
>>> dataset = TensorListDataset(data, targets)
>>> subset = get_class_subset(dataset, class_list=[2, 3])
>>> len(subset)
2
See also
The implementation is in trojanzoo.utils.data.get_class_subset().
- get_data(data, **kwargs)
Process data. Defaults to directly returning data.
- Parameters:
data (Any) – Unprocessed data.
**kwargs – Keyword arguments to process data.
- Returns:
Any – Processed data.
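A common reason to override get_data, sketched below, is device placement; the (input, label) unpacking is an assumption about the batch layout, not part of the base API:

import torch

from trojanzoo.datasets import Dataset


class MyDataset(Dataset):
    def get_data(self, data, **kwargs):
        _input, _label = data   # assumed batch layout
        device = 'cuda' if torch.cuda.is_available() else 'cpu'
        return _input.to(device), _label.to(device)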
- get_dataloader(mode=None, dataset=None, batch_size=None, shuffle=None, num_workers=None, pin_memory=True, drop_last=False, collate_fn=None, **kwargs)
Get dataloader. Call get_dataset() if dataset is not provided.
- Parameters:
mode (str) – Dataset mode (e.g., 'train' or 'valid').
dataset (torch.utils.data.Dataset) – The pytorch dataset.
batch_size (int) – Defaults to self.batch_size for 'train' mode and self.valid_batch_size for 'valid' mode.
shuffle (bool) – Whether to shuffle. Defaults to True for 'train' mode and False for 'valid' mode.
num_workers (int) – Number of workers for dataloader. Defaults to self.num_workers.
pin_memory (bool) – Whether to use pin memory. Defaults to True if there is any GPU available.
drop_last (bool) – Whether to drop the last batch if it is not full size. Defaults to False.
collate_fn (Callable) – Passed to torch.utils.data.DataLoader.
**kwargs – Keyword arguments passed to get_dataset() if dataset is not provided.
- Returns:
torch.utils.data.DataLoader – The pytorch dataloader.
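A usage sketch, with dataset an instance of a concrete subclass and arbitrary example values:

loader = dataset.get_dataloader(mode='valid', batch_size=32, num_workers=0)
for _input, _label in loader:   # assumes (input, label) batches
    print(_input.shape, _label.shape)
    break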
- get_dataset(mode=None, seed=None, class_list=None, **kwargs)
Get dataset. Call split_dataset() to split the training set if valid_set is False.
- Parameters:
mode (str) – Dataset mode (e.g., 'train' or 'valid').
seed (int) – The random seed to split dataset using numpy.random.shuffle. Defaults to env['data_seed'].
class_list (int | list[int]) – The class list to pick. Defaults to None.
**kwargs – Keyword arguments passed to get_org_dataset().
- Returns:
torch.utils.data.Dataset – The dataset.
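For example (the class indices are arbitrary):

# Pick only samples whose labels are 0 or 1 from the validation split.
two_class_set = dataset.get_dataset(mode='valid', class_list=[0, 1])
print(len(two_class_set))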
- get_loss_weights(file_path=None, verbose=True)
Calculate loss_weights as the reciprocal of the data size of each class (to mitigate data imbalance).
- Parameters:
file_path (str) – Optional file path to cache the computed weights. Defaults to None.
verbose (bool) – Whether to print verbose output. Defaults to True.
- Returns:
torch.Tensor – The tensor of loss weights w.r.t. each class.
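The weighting rule reduces to the following standalone sketch (the helper name is illustrative, not part of the API):

import torch

def reciprocal_class_weights(targets: list[int], num_classes: int) -> torch.Tensor:
    # Count samples per class, then weight each class by 1 / count
    # so that under-represented classes contribute more to the loss.
    counts = torch.bincount(torch.tensor(targets), minlength=num_classes).float()
    return counts.reciprocal()

weights = reciprocal_class_weights([0, 0, 0, 1], num_classes=2)
print(weights)   # tensor([0.3333, 1.0000])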
- get_org_dataset(mode, **kwargs)
Get the original dataset that is not split.
Note
This is a wrapper and the specific implementation is in _get_org_dataset(), which needs overriding.
- Parameters:
mode (str) – Dataset mode (e.g., 'train' or 'valid').
transform (Callable) – The transform applied on the dataset. Defaults to get_transform().
**kwargs – Keyword arguments passed to _get_org_dataset().
- Returns:
torch.utils.data.Dataset – The original dataset.
- abstract get_transform(mode)
Get dataset transform for mode.
- Parameters:
mode (str) – Dataset mode (e.g., 'train' or 'valid').
- Returns:
collections.abc.Callable – A callable transform.
- initialize(*args, **kwargs)
Initialize the dataset (download and extract) if it's not prepared yet (need overriding).
- static split_dataset(dataset, length=None, percent=None, shuffle=True, seed=None)
Split a dataset into two subsets.
- Parameters:
dataset (torch.utils.data.Dataset) – The dataset to split.
length (int) – The length of the first subset. This argument cannot be used together with percent. If None, use percent to calculate length instead. Defaults to None.
percent (float) – The split ratio for the first subset (length = percent * len(dataset)). This argument cannot be used together with length. Defaults to None.
shuffle (bool) – Whether to shuffle the dataset. Defaults to True.
seed (int) – The random seed to split dataset using numpy.random.shuffle. Defaults to env['data_seed'].
- Returns:
(torch.utils.data.Subset, torch.utils.data.Subset) – The two split subsets.
- Example:
>>> from trojanzoo.utils.data import TensorListDataset
>>> from trojanzoo.datasets import Dataset
>>> import torch
>>>
>>> data = torch.ones(11, 3, 32, 32)
>>> targets = list(range(11))
>>> dataset = TensorListDataset(data, targets)
>>> set1, set2 = Dataset.split_dataset(dataset, length=3)
>>> len(set1), len(set2)
(3, 8)
>>> set3, set4 = Dataset.split_dataset(dataset, percent=0.5)
>>> len(set3), len(set4)
(5, 6)
See also
The implementation is in trojanzoo.utils.data.split_dataset(). The difference is that this method will set seed as env['data_seed'] when it is None.