datasets
- trojanvision.datasets.add_argument(parser, dataset_name=None, dataset=None, config=config, class_dict=class_dict)
- Add image dataset arguments to argument parser. For the specific arguments, see ImageSet.add_argument().
- Parameters:
parser (argparse.ArgumentParser) – The parser to add arguments.
dataset_name (str) – The dataset name.
dataset (str | Dataset) – Dataset instance or dataset name (as the alias of dataset_name).
config (Config) – The default parameter config, which contains the default dataset name if not provided.
class_dict (dict[str, type[Dataset]]) – Map from dataset name to dataset class. Defaults to trojanvision.datasets.class_dict.
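The registry pattern described above (a name-to-class map whose classes each add their own options) can be sketched with the standard library alone. This is not trojanvision's implementation: ToyDataset, TOY_CLASS_DICT, add_dataset_arguments, default_name and the --batch_size option are hypothetical names chosen for illustration, and the sketch only handles a dataset given by name, not a Dataset instance.

```python
import argparse


class ToyDataset:
    """Hypothetical stand-in for a dataset class registered in class_dict."""

    @classmethod
    def add_argument(cls, group):
        # Each dataset class contributes its own options to the group.
        group.add_argument('--batch_size', type=int, default=32)
        return group


# Hypothetical registry: map from dataset name to dataset class.
TOY_CLASS_DICT = {'toy': ToyDataset}


def add_dataset_arguments(parser, dataset_name=None, dataset=None,
                          default_name='toy', class_dict=TOY_CLASS_DICT):
    # `dataset` acts as an alias of `dataset_name`; fall back to a default
    # name (standing in for the one stored in config) when neither is given.
    name = dataset_name or dataset or default_name
    group = parser.add_argument_group('dataset', description=name)
    return class_dict[name].add_argument(group)


parser = argparse.ArgumentParser()
add_dataset_arguments(parser, dataset='toy')
args = parser.parse_args(['--batch_size', '64'])
defaults = parser.parse_args([])
```

Delegating to a classmethod keeps the top-level function independent of any particular dataset, so a subclass can override the classmethod to add further options.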
- trojanvision.datasets.create(dataset_name=None, dataset=None, config=config, class_dict=class_dict, **kwargs)
- Create an image dataset instance. For arguments not included in kwargs, use the default values in config. The default value of folder_path is '{data_dir}/{data_type}/{name}'. For the dataset implementation, see ImageSet.
- Parameters:
dataset_name (str) – The dataset name.
dataset (str) – The alias of dataset_name.
config (Config) – The default parameter config.
class_dict (dict[str, type[ImageSet]]) – Map from dataset name to dataset class. Defaults to trojanvision.datasets.class_dict.
**kwargs – Keyword arguments passed to dataset init method.
- Returns:
ImageSet – Image dataset instance.
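As a rough illustration of the defaulting rules above, the sketch below merges explicit keyword arguments over config defaults and fills in folder_path from the documented template. It does not call trojanvision; resolve_dataset_kwargs, DEFAULT_CONFIG and the values in it are assumptions made for this example.

```python
# Hypothetical default config; the real values come from the Config object.
DEFAULT_CONFIG = {'data_dir': './data', 'data_type': 'image', 'batch_size': 100}


def resolve_dataset_kwargs(name, config=DEFAULT_CONFIG, **kwargs):
    # Explicit keyword arguments win; anything missing falls back to config.
    merged = {**config, **kwargs}
    # folder_path defaults to the documented '{data_dir}/{data_type}/{name}'.
    merged.setdefault(
        'folder_path',
        '{data_dir}/{data_type}/{name}'.format(
            data_dir=merged['data_dir'],
            data_type=merged['data_type'],
            name=name,
        ),
    )
    return merged


default_kwargs = resolve_dataset_kwargs('cifar10')
custom_kwargs = resolve_dataset_kwargs('cifar10', batch_size=8,
                                       folder_path='/tmp/cifar10')
```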
- class trojanvision.datasets.ImageSet(norm_par=None, normalize=False, transform=None, auto_augment=False, mixup=False, mixup_alpha=0.0, cutmix=False, cutmix_alpha=0.0, cutout=False, cutout_length=None, **kwargs)
- The basic class representing an image dataset. It inherits trojanzoo.datasets.Dataset.
Note
This is the implementation of the dataset. For users, please use create() instead, which is more user-friendly.
- Parameters:
norm_par (dict[str, list[float]]) – Data normalization parameters 'mean' and 'std' (e.g., {'mean': [0.5, 0.4, 0.6], 'std': [0.2, 0.3, 0.1]}). Defaults to None.
normalize (bool) – Whether to use torchvision.transforms.Normalize in the dataset transform. Otherwise, normalization is applied as a model preprocessing layer.
transform (str) – The dataset transform type.
  - None | 'none' (torchvision.transforms.PILToTensor and torchvision.transforms.ConvertImageDtype)
  - 'bit' (transform used in BiT network)
  - 'pytorch' (PyTorch transform used in ImageNet training).
  Defaults to None.
  Note
  See get_transform() for more details.
auto_augment (bool) – Whether to use torchvision.transforms.AutoAugment. Defaults to False.
mixup (bool) – Whether to use trojanvision.utils.transforms.RandomMixup. Defaults to False.
mixup_alpha (float) – alpha passed to trojanvision.utils.transforms.RandomMixup. Defaults to 0.0.
cutmix (bool) – Whether to use trojanvision.utils.transforms.RandomCutmix. Defaults to False.
cutmix_alpha (float) – alpha passed to trojanvision.utils.transforms.RandomCutmix. Defaults to 0.0.
cutout (bool) – Whether to use trojanvision.utils.transforms.Cutout. Defaults to False.
cutout_length (int) – Cutout length. Defaults to None.
**kwargs – Keyword arguments passed to trojanzoo.datasets.Dataset.
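To make the role of norm_par concrete, here is a small pure-Python sketch of the per-channel arithmetic that torchvision.transforms.Normalize performs, (value - mean) / std, applied to a single RGB pixel. The helper normalize_pixel and the sample pixel are made up for illustration; a real dataset applies the same formula to whole tensors.

```python
def normalize_pixel(pixel, norm_par):
    """Apply (value - mean) / std to each channel of one pixel."""
    return [
        (value - mean) / std
        for value, mean, std in zip(pixel, norm_par['mean'], norm_par['std'])
    ]


# The example parameters from the norm_par description.
norm_par = {'mean': [0.5, 0.4, 0.6], 'std': [0.2, 0.3, 0.1]}

# A hypothetical pixel with channel values in [0, 1].
normalized = normalize_pixel([0.7, 0.4, 0.5], norm_par)
```

Each channel ends up expressed in units of that channel's standard deviation away from its mean.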
- Variables:
- classmethod add_argument(group)
Add image dataset arguments to argument parser group. View source to see specific arguments.
Note
This is the implementation of adding arguments. The concrete dataset class may override this method to add more arguments. For users, please use add_argument() instead, which is more user-friendly.
- static get_data(data, **kwargs)
Process image data. By default, put input and label on env['device'] with non_blocking and convert the label to torch.LongTensor.
- Parameters:
data (tuple[torch.Tensor, torch.Tensor]) – Tuple of batched input and label.
**kwargs – Any keyword argument (unused).
- Returns:
(tuple[torch.Tensor, torch.Tensor]) – Tuple of batched input and label on env['device']. Label is converted to torch.LongTensor.
- get_transform(mode, normalize=None)
Get dataset transform based on self.transform.
  - None | 'none' (torchvision.transforms.PILToTensor and torchvision.transforms.ConvertImageDtype)
  - 'bit' (transform used in BiT network)
  - 'pytorch' (PyTorch transform used in ImageNet training).
- Parameters:
mode (str) – The dataset mode (e.g., 'train' | 'valid').
normalize (bool | None) – Whether to use torchvision.transforms.Normalize in dataset transform. Defaults to self.normalize.
- Returns:
torchvision.transforms.Compose – The transform sequence.
- make_folder(img_type='.png', **kwargs)
Save the dataset to self.folder_path in trojanvision.datasets.ImageFolder format:
'{self.folder_path}/{self.name}/{mode}/{class_name}/{img_idx}.png'
- Parameters:
img_type (str) – The image type to save. Defaults to '.png'.
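The folder layout above can be sketched as a simple path template. The helper image_path and the sample values are hypothetical, and substituting img_type for the '.png' suffix is an assumption based on the parameter description rather than something confirmed by the source.

```python
def image_path(folder_path, name, mode, class_name, img_idx, img_type='.png'):
    # Mirrors '{self.folder_path}/{self.name}/{mode}/{class_name}/{img_idx}.png',
    # with the extension taken from img_type (an assumption of this sketch).
    return f'{folder_path}/{name}/{mode}/{class_name}/{img_idx}{img_type}'


# Hypothetical dataset named 'toy' stored under './data/image/toy'.
train_path = image_path('./data/image/toy', 'toy', 'train', 'cat', 0)
valid_path = image_path('./data/image/toy', 'toy', 'valid', 'dog', 12,
                        img_type='.jpg')
```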
- class trojanvision.datasets.ImageFolder(data_format='folder', memory=False, **kwargs)
Image folder class which inherits trojanvision.datasets.ImageSet.
- Variables:
ext (Param[str, str]) – Map from mode to downloaded file extension.
md5 (dict[str, str]) – Map from mode to downloaded file md5.
org_folder_name (dict[str, str]) – Map from mode to extracted folder name of downloaded file.
data_format (str) – File format of dataset.
  - 'folder' (default)
  - 'tar'
  - 'zip'
memory (bool) – Whether to load the whole dataset into memory at initialization. Defaults to False.
- classmethod add_argument(group)
Add image dataset arguments to argument parser group. View source to see specific arguments.
Note
This is the implementation of adding arguments. The concrete dataset class may override this method to add more arguments. For users, please use add_argument() instead, which is more user-friendly.
- initialize(*args, **kwargs)
You can use this method to convert the dataset between different data_format values.
- sample(child_name=None, class_dict=None, sample_num=None, method='folder')
Sample a subset image folder dataset.
- Parameters:
child_name (str) – Name of child subset. Defaults to '{self.name}_sample{sample_num}'.
class_dict (dict[str, list[str]] | None) – Map from new class name to list of old class names. If None, use sample_num to randomly sample a subset (1 to 1). Defaults to None.
sample_num (int | None) – The number of subset classes to sample if class_dict is None. Defaults to None.
method (str) – data_format of new subset to save. Defaults to 'folder'.
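The class-selection rules described for sample() can be illustrated with a small standalone sketch. It only builds the class mapping and the default child name; it does not copy any images and is not the library's code. build_class_dict, default_child_name, the seed argument and the class names are hypothetical.

```python
import random


def build_class_dict(old_classes, class_dict=None, sample_num=None, seed=None):
    """Return a map from new class name to list of old class names."""
    if class_dict is not None:
        # An explicit mapping is used as given (it may merge old classes).
        return class_dict
    if sample_num is None:
        raise ValueError('sample_num is required when class_dict is None')
    # Otherwise pick sample_num classes at random, each mapped to itself (1 to 1).
    rng = random.Random(seed)
    chosen = rng.sample(list(old_classes), sample_num)
    return {name: [name] for name in chosen}


def default_child_name(name, sample_num):
    # Mirrors the documented default '{self.name}_sample{sample_num}'.
    return f'{name}_sample{sample_num}'


old_classes = ['cat', 'dog', 'car', 'truck', 'bird']
random_subset = build_class_dict(old_classes, sample_num=3, seed=0)
merged_subset = build_class_dict(
    old_classes,
    class_dict={'animal': ['cat', 'dog', 'bird'], 'vehicle': ['car', 'truck']},
)
child_name = default_child_name('toy', 3)
```

With an explicit class_dict, several old classes can be merged into one new class; with sample_num, every sampled class keeps its own name.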