Dataset
Data loading and processing are handled by torchtext. For more information about torchtext, see https://github.com/pytorch/text
Fields
class machine.dataset.fields.SourceField(**kwargs)
Wrapper class of torchtext.data.Field that forces batch_first and include_lengths to be True.
Variables: eos_id – index of the end-of-sentence symbol.

build_vocab(*args, **kwargs)
Construct the Vocab object for this field from one or more datasets.
Parameters:
- Positional arguments – Dataset objects or other iterable data sources from which to construct the Vocab object that represents the set of possible values for this field. If a Dataset object is provided, all columns corresponding to this field are used; individual columns can also be provided directly.
- Remaining keyword arguments – Passed to the constructor of Vocab.
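As a rough illustration of what the two options forced by SourceField mean for a batch, the sketch below pads token lists in plain Python. It does not call torchtext or machine; the helper name pad_batch and the "<pad>" token are assumptions made for this example only.

```python
# Plain-Python illustration; does not use torchtext or machine.
PAD = "<pad>"  # assumed padding token for this sketch


def pad_batch(sequences, pad_token=PAD):
    """Pad token lists to equal length and return (batch, lengths).

    The batch is laid out batch-first: one row per sequence.
    The lengths list records each sequence's size before padding.
    """
    lengths = [len(seq) for seq in sequences]
    max_len = max(lengths) if lengths else 0
    batch = [list(seq) + [pad_token] * (max_len - len(seq)) for seq in sequences]
    return batch, lengths


batch, lengths = pad_batch([["a", "b", "c"], ["d"]])
print(batch)    # [['a', 'b', 'c'], ['d', '<pad>', '<pad>']]
print(lengths)  # [3, 1]
```

With batch_first, the first dimension indexes sequences rather than time steps; with include_lengths, the unpadded lengths travel alongside the padded batch so that downstream code can ignore the padding.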
class machine.dataset.fields.TargetField(include_eos=True, **kwargs)
Wrapper class of torchtext.data.Field that forces batch_first to be True and, in the preprocessing step, prepends <sos> and appends <eos> to sequences.
Variables:
- sos_id – index of the start-of-sentence symbol
- eos_id – index of the end-of-sentence symbol
SYM_EOS = '<eos>'
SYM_SOS = '<sos>'
build_vocab(*args, **kwargs)
Construct the Vocab object for this field from one or more datasets.
Parameters:
- Positional arguments – Dataset objects or other iterable data sources from which to construct the Vocab object that represents the set of possible values for this field. If a Dataset object is provided, all columns corresponding to this field are used; individual columns can also be provided directly.
- Remaining keyword arguments – Passed to the constructor of Vocab.
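The idea of building one vocabulary from several data sources can be sketched in plain Python as follows. This is not the torchtext implementation: build_vocab_sketch is a hypothetical helper, and its specials and min_freq parameters only stand in for the kind of keyword arguments that would be forwarded to a vocabulary constructor.

```python
from collections import Counter


def build_vocab_sketch(*sources, specials=("<unk>", "<pad>"), min_freq=1):
    """Count tokens across all sources and map each kept token to an index.

    Hypothetical helper for illustration; each source is an iterable of
    token lists, mirroring the positional arguments described above.
    """
    counter = Counter()
    for source in sources:
        for tokens in source:
            counter.update(tokens)
    itos = list(specials)
    # Most frequent first; ties broken alphabetically for determinism.
    for token, freq in sorted(counter.items(), key=lambda kv: (-kv[1], kv[0])):
        if freq >= min_freq and token not in itos:
            itos.append(token)
    stoi = {token: index for index, token in enumerate(itos)}
    return itos, stoi


train = [["a", "b", "a"], ["c"]]
valid = [["b", "a"]]
itos, stoi = build_vocab_sketch(train, valid, min_freq=2)
print(itos)  # ['<unk>', '<pad>', 'a', 'b']
```

Passing both the training and validation data as positional arguments yields a single index space shared by all of them.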
include_eos = True
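The preprocessing described for TargetField can be sketched in plain Python as below. The function wrap_target is a hypothetical name, and the behavior for include_eos=False (omitting <eos> while keeping <sos>) is an assumption inferred from the parameter name, not something stated in this reference.

```python
SYM_SOS = "<sos>"
SYM_EOS = "<eos>"


def wrap_target(tokens, include_eos=True):
    """Prepend <sos> and, if include_eos is set, append <eos>.

    Assumption: with include_eos=False only <sos> is added.
    """
    wrapped = [SYM_SOS] + list(tokens)
    if include_eos:
        wrapped.append(SYM_EOS)
    return wrapped


print(wrap_target(["hello", "world"]))
# ['<sos>', 'hello', 'world', '<eos>']
```

After build_vocab has been called, sos_id and eos_id would correspond to the vocabulary indices of these two symbols.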