`matchzoo.dataloader.dataloader`

Basic data loader.

Module Contents

class matchzoo.dataloader.dataloader.DataLoader(dataset: data.Dataset, batch_size: int = 32, device: typing.Union[torch.device, int, list, None] = None, stage='train', resample: bool = True, shuffle: bool = False, sort: bool = True, callback: BaseCallback = None, pin_memory: bool = False, timeout: int = 0, num_workers: int = 0, worker_init_fn=None)

Bases: object

DataLoader that loads batches of data from a Dataset.

Parameters:

dataset – The Dataset object to load data from.
batch_size – Number of instances per batch. (default: 32)
device – The device on which returned tensors are placed. If None, the current device is used. If a torch.device or an int, that device is used. If a list, its first item is used. (default: None)
stage – One of “train”, “dev”, and “test”. (default: “train”)
resample – Whether to resample data between epochs. Only effective when the dataset mode is “pair”. (default: True)
shuffle – Whether to shuffle data between epochs. (default: False)
sort – Whether to sort data according to length_right. (default: True)
callback – A BaseCallback instance. See matchzoo.engine.base_callback.BaseCallback for more details. (default: None)
pin_memory – If set to True, tensors will be copied into pinned memory. (default: False)
timeout – The timeout value for collecting a batch from workers. (default: 0)
num_workers – The number of subprocesses to use for data loading. 0 means that the data will be loaded in the main process. (default: 0)
worker_init_fn – If not None, this will be called on each worker subprocess with the worker id (an int in [0, num_workers - 1]) as input, after seeding and before data loading. (default: None)
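
The rule for the device parameter can be read as a small decision function. The sketch below is only an illustration in plain Python, not the library's code: `resolve_device` and its `current` argument are hypothetical names, and strings and ints stand in for real torch.device objects.

```python
from typing import List, Optional, Union

# Stand-in for a torch.device or a device index (assumption for illustration).
DeviceSpec = Union[str, int]


def resolve_device(
    device: Optional[Union[DeviceSpec, List[DeviceSpec]]],
    current: DeviceSpec = "cpu",
) -> DeviceSpec:
    """Hypothetical helper mirroring the documented `device` rule."""
    if device is None:
        # No preference given: fall back to the current device.
        return current
    if isinstance(device, list):
        # A list of devices: only the first entry is used.
        return device[0]
    # A single device object or index: use it as given.
    return device
```

An empty list is not handled here; it would raise an IndexError.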

Examples

>>> import matchzoo as mz
>>> data_pack = mz.datasets.toy.load_data(stage='train')
>>> preprocessor = mz.preprocessors.BasicPreprocessor()
>>> data_processed = preprocessor.fit_transform(data_pack)
>>> dataset = mz.dataloader.Dataset(data_processed, mode='point')
>>> padding_callback = mz.dataloader.callbacks.BasicPadding()
>>> dataloader = mz.dataloader.DataLoader(
...     dataset, stage='train', callback=padding_callback)
>>> len(dataloader)
4

id_left :np.ndarray: id_left getter.

label :np.ndarray: label getter.

__len__(self): Get the total number of batches.
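
A common convention for counting batches is ceiling division of the dataset size by batch_size, so that a final, smaller batch is still counted. The sketch below assumes that convention; `num_batches` is a hypothetical helper and is not taken from the MatchZoo source.

```python
import math


def num_batches(num_instances: int, batch_size: int = 32) -> int:
    """Count batches when the last, possibly smaller, batch is kept."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return math.ceil(num_instances / batch_size)
```

Under this assumption, a hypothetical dataset of 100 instances with the default batch_size of 32 gives 4 batches: three full ones and one with the remaining 4 instances.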

init_epoch(self): Resample, shuffle or sort the dataset for a new epoch.
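
To make the shuffle and sort flags concrete, here is a minimal sketch of how an epoch's visiting order could be derived from per-instance lengths. It rests on several assumptions that the text above does not confirm: `epoch_order` is a hypothetical function, shuffling is given precedence over sorting, sorting is ascending, and resampling is left out entirely.

```python
import random
from typing import List, Optional


def epoch_order(
    lengths: List[int],
    shuffle: bool = False,
    sort: bool = True,
    seed: Optional[int] = None,
) -> List[int]:
    """Return the order in which instance indices are visited this epoch."""
    order = list(range(len(lengths)))
    if shuffle:
        # Assumption: shuffling takes precedence over sorting.
        random.Random(seed).shuffle(order)
    elif sort:
        # Stable ascending sort by length, so instances of similar
        # length end up next to each other.
        order.sort(key=lambda i: lengths[i])
    return order
```

Sorting by length keeps texts of similar size together, which tends to reduce the padding needed within each batch.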

__iter__(self): Iterate over batches.

_handle_callbacks_on_batch_unpacked(self, x, y)

matchzoo.dataloader.dataloader.mz_collate(batch): Put each data field into an array with outer dimension batch size.
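
The idea of collating by field can be sketched without any dependencies. The example below is an illustration only, not the body of mz_collate: `collate_fields` is a hypothetical name, each sample is assumed to be an `(x, y)` pair with `x` a dict of fields, and plain Python lists stand in for arrays.

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional, Tuple

# Assumed sample shape: (dict of input fields, optional label).
Sample = Tuple[Dict[str, Any], Optional[Any]]


def collate_fields(
    batch: List[Sample],
) -> Tuple[Dict[str, list], Optional[list]]:
    """Group per-sample fields so each field holds one entry per sample."""
    batch_x: Dict[str, list] = defaultdict(list)
    batch_y: list = []
    for x, y in batch:
        for key, value in x.items():
            # Append along the outer (batch) dimension.
            batch_x[key].append(value)
        if y is not None:
            batch_y.append(y)
    # Unlabeled samples produce no label list at all.
    return dict(batch_x), (batch_y or None)
```

After collation, every field in the returned dict has as many entries as there are samples in the batch.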
