Dataset

We use torchtext to manage data loading and processing. For more information about torchtext, see https://github.com/pytorch/text

Fields

class machine.dataset.fields.SourceField(**kwargs)[source]

Wrapper class of torchtext.data.Field that forces batch_first and include_lengths to be True.

Variables: eos_id – index of the end of sentence symbol.
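
To make the effect of batch_first and include_lengths concrete, the following pure-Python sketch pads a batch of token lists and returns the original lengths alongside it. It does not call torchtext; the helper name pad_batch and the "<pad>" symbol are assumptions made for illustration only.

```python
from typing import List, Tuple

PAD = "<pad>"  # padding symbol assumed for this sketch


def pad_batch(batch: List[List[str]]) -> Tuple[List[List[str]], List[int]]:
    """Pad token lists to equal length; return (padded rows, original lengths)."""
    lengths = [len(seq) for seq in batch]
    max_len = max(lengths, default=0)
    padded = [seq + [PAD] * (max_len - len(seq)) for seq in batch]
    return padded, lengths


# Rows index examples (batch first); columns index time steps.
padded, lengths = pad_batch([["a", "b", "c"], ["d"]])
print(padded)
print(lengths)
```

With batch_first the first dimension indexes examples rather than time steps, and with include_lengths the unpadded lengths travel with the batch, which is what a sequence encoder typically needs in order to ignore padding.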
build_vocab(*args, **kwargs)[source]

Construct the Vocab object for this field from one or more datasets.

Parameters:
  • Positional arguments – Dataset objects or other iterable data sources from which to construct the Vocab object that represents the set of possible values for this field. If a Dataset object is provided, all columns corresponding to this field are used; individual columns can also be provided directly.
  • Remaining keyword arguments – Passed to the constructor of Vocab.
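
As a rough illustration of building one vocabulary from several data sources, the sketch below counts tokens over every iterable it receives and assigns indices. It is a simplified stand-in, not torchtext's Vocab: the function name build_vocab_sketch, its specials and min_freq parameters, and the ordering rule are all assumptions of this example.

```python
from collections import Counter
from typing import Dict, Iterable, List, Sequence


def build_vocab_sketch(*sources: Iterable[List[str]],
                       specials: Sequence[str] = ("<unk>", "<pad>"),
                       min_freq: int = 1) -> Dict[str, int]:
    """Count tokens over every source and assign an index to each kept token."""
    counter = Counter()
    for source in sources:
        for tokens in source:
            counter.update(tokens)
    # Special symbols take the first indices.
    stoi = {tok: idx for idx, tok in enumerate(specials)}
    # Remaining tokens: most frequent first, ties broken alphabetically.
    for tok, freq in sorted(counter.items(), key=lambda kv: (-kv[1], kv[0])):
        if freq >= min_freq and tok not in stoi:
            stoi[tok] = len(stoi)
    return stoi


train = [["a", "b", "a"], ["c"]]
dev = [["b", "a"]]
vocab = build_vocab_sketch(train, dev)               # both sources contribute counts
pruned = build_vocab_sketch(train, dev, min_freq=2)  # drops tokens seen only once
print(vocab)
print(pruned)
```

Passing both the training and development data mirrors the "one or more datasets" wording above; the keyword arguments play the role of the options forwarded to the Vocab constructor.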
class machine.dataset.fields.TargetField(include_eos=True, **kwargs)[source]

Wrapper class of torchtext.data.Field that forces batch_first to be True and, in the preprocessing step, prepends <sos> and appends <eos> to each sequence.

Variables:
  • sos_id – index of the start of sentence symbol
  • eos_id – index of the end of sentence symbol
SYM_EOS = '<eos>'
SYM_SOS = '<sos>'
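
The preprocessing described above can be sketched in plain Python as follows. The helper wrap_target is hypothetical, and the sketch assumes that include_eos controls whether <eos> is appended, which the parameter name suggests but this page does not state.

```python
from typing import List

SYM_SOS = "<sos>"
SYM_EOS = "<eos>"


def wrap_target(tokens: List[str], include_eos: bool = True) -> List[str]:
    """Prepend <sos> and, if requested, append <eos> to a token list."""
    wrapped = [SYM_SOS] + tokens
    if include_eos:  # assumed meaning of the include_eos flag
        wrapped.append(SYM_EOS)
    return wrapped


with_eos = wrap_target(["x", "y"])
without_eos = wrap_target(["x", "y"], include_eos=False)
print(with_eos)
print(without_eos)
```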
build_vocab(*args, **kwargs)[source]

Construct the Vocab object for this field from one or more datasets.

Parameters:
  • Positional arguments – Dataset objects or other iterable data sources from which to construct the Vocab object that represents the set of possible values for this field. If a Dataset object is provided, all columns corresponding to this field are used; individual columns can also be provided directly.
  • Remaining keyword arguments – Passed to the constructor of Vocab.
include_eos = True
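
One plausible reading of how sos_id and eos_id relate to the vocabulary is sketched below: once the wrapped target sequences have been counted, the two symbols have ordinary indices that can be looked up. This is an assumption about the mechanism, not something this page confirms; the indexing rule (descending frequency, ties alphabetical) is also chosen only for the example.

```python
from collections import Counter

SYM_SOS, SYM_EOS = "<sos>", "<eos>"

targets = [["x", "y"], ["y"]]
# Wrap each target as the preprocessing step would.
wrapped = [[SYM_SOS] + seq + [SYM_EOS] for seq in targets]

counter = Counter(tok for seq in wrapped for tok in seq)
# Index tokens by descending frequency, ties broken alphabetically.
ordered = sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))
stoi = {tok: idx for idx, (tok, _) in enumerate(ordered)}

# The special symbols are ordinary vocabulary entries after wrapping.
sos_id = stoi[SYM_SOS]
eos_id = stoi[SYM_EOS]
print(sos_id, eos_id)
```

Under this reading the ids are only meaningful after build_vocab has been called, because the symbols enter the vocabulary through the preprocessed sequences.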