Models
baseRNN
A base class for RNNs.
class machine.models.baseRNN.BaseRNN(vocab_size, max_len, hidden_size, input_dropout_p, dropout_p, n_layers, rnn_cell)
Applies a multi-layer RNN to an input sequence.
Note: Do not use this class directly; use one of the subclasses.
Parameters: - vocab_size (int) – size of the vocabulary
- max_len (int) – maximum allowed length for the sequence to be processed
- hidden_size (int) – number of features in the hidden state h
- input_dropout_p (float) – dropout probability for the input sequence
- dropout_p (float) – dropout probability for the output sequence
- n_layers (int) – number of recurrent layers
- rnn_cell (str) – type of RNN cell (e.g. 'LSTM', 'GRU')
- Inputs: *args, **kwargs
- *args: variable length argument list.
- **kwargs: arbitrary keyword arguments.
Variables: - SYM_MASK – masking symbol
- SYM_EOS – end-of-sequence symbol
forward(*args, **kwargs)
Defines the computation performed at every call. Should be overridden by all subclasses.
Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
EncoderRNN
class machine.models.EncoderRNN.EncoderRNN(vocab_size, max_len, hidden_size, embedding_size, input_dropout_p=0, dropout_p=0, n_layers=1, bidirectional=False, rnn_cell='gru', variable_lengths=False)
Applies a multi-layer RNN to an input sequence.
Parameters: - vocab_size (int) – size of the vocabulary
- max_len (int) – a maximum allowed length for the sequence to be processed
- hidden_size (int) – the number of features in the hidden state h
- embedding_size (int) – the size of the embedding of input variables
- input_dropout_p (float, optional) – dropout probability for the input sequence (default: 0)
- dropout_p (float, optional) – dropout probability for the output sequence (default: 0)
- n_layers (int, optional) – number of recurrent layers (default: 1)
- bidirectional (bool, optional) – if True, becomes a bidirectional encoder (default: False)
- rnn_cell (str, optional) – type of RNN cell (default: 'gru')
- variable_lengths (bool, optional) – whether to use a variable-length RNN (default: False)
- Inputs: inputs, input_lengths
- inputs: list of sequences, whose length is the batch size and within which each sequence is a list of token IDs.
- input_lengths (list of int, optional): list that contains the lengths of sequences in the mini-batch; it must be provided when using a variable-length RNN (default: None)
- Outputs: output, hidden
- output (batch, seq_len, hidden_size): tensor containing the encoded features of the input sequence
- hidden (num_layers * num_directions, batch, hidden_size): tensor containing the features in the hidden state h
Examples:
>>> encoder = EncoderRNN(input_vocab, max_seq_length, hidden_size, embedding_size)
>>> output, hidden = encoder(input)
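When `variable_lengths=True`, `input_lengths` has to accompany the padded batch. The following is a minimal sketch of preparing both from raw token ID lists in plain Python. The helper name `pad_batch` and the padding ID `0` are assumptions made for illustration and are not part of the library. Sorting by decreasing length is included because PyTorch's `pack_padded_sequence` expects it by default; whether `EncoderRNN` itself requires sorted input is not stated above.

```python
from typing import List, Tuple

PAD_ID = 0  # assumed padding token ID (hypothetical; use your vocabulary's value)


def pad_batch(sequences: List[List[int]],
              pad_id: int = PAD_ID) -> Tuple[List[List[int]], List[int]]:
    """Sort sequences by decreasing length, pad them, and return (padded, lengths)."""
    ordered = sorted(sequences, key=len, reverse=True)
    lengths = [len(seq) for seq in ordered]
    max_len = lengths[0] if lengths else 0
    # Right-pad every sequence up to the length of the longest one.
    padded = [seq + [pad_id] * (max_len - len(seq)) for seq in ordered]
    return padded, lengths


batch = [[5, 8, 2], [7, 2], [4, 9, 6, 2]]
inputs, input_lengths = pad_batch(batch)
print(inputs)         # every row now has the same length
print(input_lengths)  # true (unpadded) lengths, longest first
```

The padded rows could then be converted to a tensor and passed together with `input_lengths` to the encoder.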
forward(input_var, hidden=None, input_lengths=None)
Applies a multi-layer RNN to an input sequence.
Parameters: - input_var (batch, seq_len) – tensor containing the features of the input sequence.
- input_lengths (list of int, optional) – A list that contains the lengths of sequences in the mini-batch
- hidden – tuple of (h_0, c_0), each of shape (num_layers * num_directions, batch, hidden_size), where h_0 is a tensor containing the initial hidden state and c_0 is a tensor containing the initial cell state for each element in the batch. If none is provided, both default to zero.
- Returns: output, hidden
- output (batch, seq_len, hidden_size): variable containing the encoded features of the input sequence
- hidden (num_layers * num_directions, batch, hidden_size): variable containing the features in the hidden state h
DecoderRNN
class machine.models.DecoderRNN.DecoderRNN(vocab_size, max_len, hidden_size, sos_id, eos_id, n_layers=1, rnn_cell='gru', bidirectional=False, input_dropout_p=0, dropout_p=0, use_attention=False, attention_method=None, full_focus=False)
Provides functionality for decoding in a seq2seq framework, with an option for attention.
Parameters: - vocab_size (int) – size of the vocabulary
- max_len (int) – a maximum allowed length for the sequence to be processed
- hidden_size (int) – the number of features in the hidden state h
- sos_id (int) – index of the start of sentence symbol
- eos_id (int) – index of the end of sentence symbol
- n_layers (int, optional) – number of recurrent layers (default: 1)
- rnn_cell (str, optional) – type of RNN cell (default: gru)
- bidirectional (bool, optional) – whether the encoder is bidirectional (default: False)
- input_dropout_p (float, optional) – dropout probability for the input sequence (default: 0)
- dropout_p (float, optional) – dropout probability for the output sequence (default: 0)
- use_attention (bool, optional) – flag indicating whether to use the attention mechanism (default: False)
- full_focus (bool, optional) – flag indicating whether to use the full attention mechanism (default: False)
- Inputs: inputs, encoder_hidden, encoder_outputs, function, teacher_forcing_ratio
- inputs (batch, seq_len, input_size): list of sequences, whose length is the batch size and within which each sequence is a list of token IDs. It is used for teacher forcing when provided (default: None).
- encoder_hidden (num_layers * num_directions, batch_size, hidden_size): tensor containing the features in the hidden state h of the encoder. Used as the initial hidden state of the decoder (default: None).
- encoder_outputs (batch, seq_len, hidden_size): tensor containing the outputs of the encoder. Used for the attention mechanism (default: None).
- function (torch.nn.Module): a function used to generate symbols from the RNN hidden state (default: torch.nn.functional.log_softmax).
- teacher_forcing_ratio (float): the probability that teacher forcing will be used. A random number is drawn uniformly between 0 and 1 for every decoding token, and if the sample is smaller than the given value, teacher forcing is used (default: 0).
- Outputs: decoder_outputs, decoder_hidden, ret_dict
- decoder_outputs (seq_len, batch, vocab_size): list of tensors with size (batch_size, vocab_size) containing the outputs of the decoding function.
- decoder_hidden (num_layers * num_directions, batch, hidden_size): tensor containing the last hidden state of the decoder.
- ret_dict: dictionary containing additional information as follows {KEY_LENGTH : list of integers representing lengths of output sequences, KEY_SEQUENCE : list of sequences, where each sequence is a list of predicted token IDs }.
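The `teacher_forcing_ratio` rule described above can be sketched in plain Python. This is a simplified illustration, not the decoder's implementation: the function name `choose_decoder_inputs` is hypothetical, and the model's predictions are passed in up front, whereas in real decoding each prediction depends on the inputs chosen at earlier steps.

```python
import random
from typing import List, Optional


def choose_decoder_inputs(targets: List[int], predictions: List[int],
                          teacher_forcing_ratio: float,
                          rng: Optional[random.Random] = None) -> List[int]:
    """Pick, per decoding step, the ground-truth token or the model's own prediction."""
    rng = rng or random.Random()
    chosen = []
    for target, predicted in zip(targets, predictions):
        # Draw a sample in [0, 1); teacher forcing applies when it is below the ratio.
        use_teacher_forcing = rng.random() < teacher_forcing_ratio
        chosen.append(target if use_teacher_forcing else predicted)
    return chosen


targets = [1, 2, 3]      # ground-truth token IDs
predictions = [7, 8, 9]  # token IDs the model produced (made-up values)
print(choose_decoder_inputs(targets, predictions, 0.0))  # [7, 8, 9]
print(choose_decoder_inputs(targets, predictions, 1.0))  # [1, 2, 3]
```

With a ratio of 0 the decoder always feeds back its own predictions; with a ratio of 1 it always sees the targets.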
forward(inputs=None, encoder_hidden=None, encoder_outputs=None, function=<function log_softmax>, teacher_forcing_ratio=0)
Defines the computation performed at every call. Should be overridden by all subclasses.
Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
forward_step(input_var, hidden, encoder_outputs, function, **kwargs)
Performs one or multiple forward decoder steps.
Parameters: - input_var (torch.tensor) – Variable containing the input(s) to the decoder RNN
- hidden (torch.tensor) – Variable containing the previous decoder hidden state.
- encoder_outputs (torch.tensor) – Variable containing the outputs of the encoder, used by the attention mechanism
- function (torch.tensor) – Activation function over the last output of the decoder RNN at every time step.
Returns: predicted_softmax, hidden, attn
- predicted_softmax: the output softmax distribution at every time step of the decoder RNN
- hidden: the hidden state at every time step of the decoder RNN
- attn: the attention distribution at every time step of the decoder RNN
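The `function` argument defaults to log-softmax. As a rough sketch of what that computes for one row of decoder scores, the plain-Python version below normalizes the scores into log-probabilities and then picks the most likely symbol. The example scores are made up, and the greedy choice at the end is an illustrative assumption about how a symbol might be selected.

```python
import math
from typing import List


def log_softmax(scores: List[float]) -> List[float]:
    """Return log(softmax(scores)), subtracting the maximum first for stability."""
    top = max(scores)
    log_norm = top + math.log(sum(math.exp(s - top) for s in scores))
    return [s - log_norm for s in scores]


log_probs = log_softmax([2.0, 1.0, 0.1])  # one score per vocabulary entry
# Greedy symbol choice: index of the largest log-probability.
predicted_id = max(range(len(log_probs)), key=log_probs.__getitem__)
print(predicted_id)  # 0
```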
TopKDecoder
class machine.models.TopKDecoder.TopKDecoder(decoder_rnn, k)
Top-K decoding with beam search.
Parameters: - decoder_rnn (DecoderRNN) – An object of DecoderRNN used for decoding.
- k (int) – Size of the beam.
- Inputs: inputs, encoder_hidden, encoder_outputs, function, teacher_forcing_ratio
- inputs (seq_len, batch, input_size): list of sequences, whose length is the batch size and within which each sequence is a list of token IDs. It is used for teacher forcing when provided. (default is None)
- encoder_hidden (batch, seq_len, hidden_size): tensor containing the features in the hidden state h of encoder. Used as the initial hidden state of the decoder.
- encoder_outputs (batch, seq_len, hidden_size): tensor containing the outputs of the encoder. Used for the attention mechanism (default is None).
- function (torch.nn.Module): A function used to generate symbols from RNN hidden state (default is torch.nn.functional.log_softmax).
- teacher_forcing_ratio (float): The probability that teacher forcing will be used. A random number is drawn uniformly between 0 and 1 for every decoding token, and if the sample is smaller than the given value, teacher forcing is used (default is 0).
- Outputs: decoder_outputs, decoder_hidden, ret_dict
- decoder_outputs (batch): batch-length list of tensors with size (max_length, hidden_size) containing the outputs of the decoder.
- decoder_hidden (num_layers * num_directions, batch, hidden_size): tensor containing the last hidden state of the decoder.
- ret_dict: dictionary containing additional information as follows {length : list of integers representing lengths of output sequences, topk_length: list of integers representing lengths of beam search sequences, sequence : list of sequences, where each sequence is a list of predicted token IDs, topk_sequence : list of beam search sequences, each beam is a list of token IDs, inputs : target outputs if provided for decoding}.
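For readers unfamiliar with beam search, the following is a generic sketch of the technique in plain Python. It is not the TopKDecoder implementation: `beam_search`, `StepFn`, and `toy_step` are hypothetical names, the "model" is a hand-written table of next-token probabilities, and scores are plain summed log-probabilities with no length normalization.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical toy "model": maps the tokens generated so far to
# log-probabilities of each possible next token.
StepFn = Callable[[List[int]], Dict[int, float]]


def beam_search(step_fn: StepFn, sos_id: int, eos_id: int, k: int,
                max_len: int) -> List[Tuple[float, List[int]]]:
    """Return up to k (score, sequence) pairs, best first."""
    beams = [(0.0, [sos_id])]
    finished: List[Tuple[float, List[int]]] = []
    for _ in range(max_len):
        candidates = []
        for score, seq in beams:
            for token, log_prob in step_fn(seq).items():
                candidates.append((score + log_prob, seq + [token]))
        # Keep only the k highest-scoring expansions.
        candidates.sort(key=lambda item: item[0], reverse=True)
        beams = []
        for score, seq in candidates[:k]:
            (finished if seq[-1] == eos_id else beams).append((score, seq))
        if not beams:
            break
    finished.extend(beams)  # hypotheses cut off at max_len
    finished.sort(key=lambda item: item[0], reverse=True)
    return finished[:k]


def toy_step(seq: List[int]) -> Dict[int, float]:
    # Token 2 is EOS; after token 3 the toy model strongly prefers to stop.
    if seq[-1] == 3:
        return {2: math.log(0.9), 3: math.log(0.1)}
    return {3: math.log(0.6), 4: math.log(0.3), 2: math.log(0.1)}


results = beam_search(toy_step, sos_id=1, eos_id=2, k=2, max_len=4)
for score, seq in results:
    print(round(math.exp(score), 3), seq)
```

A larger `k` keeps more alternatives alive at each step at the cost of more calls to the step function; `k=1` reduces to greedy decoding.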
attention
class machine.models.attention.Attention(dim, method)
Applies an attention mechanism on the output features from the decoder.
\[\begin{array}{ll}
x = context * output \\
attn = \exp(x_i) / \sum_j \exp(x_j) \\
output = \tanh(w * (attn * context) + b * output)
\end{array}\]
- Inputs: output, context
- output (batch, output_len, dimensions): tensor containing the output features from the decoder.
- context (batch, input_len, dimensions): tensor containing features of the encoded input sequence.
- Outputs: output, attn
- output (batch, output_len, dimensions): tensor containing the attended output features from the decoder.
- attn (batch, output_len, input_len): tensor containing attention weights.
Variables: - mask (torch.Tensor, optional) – applies \(-\infty\) to the indices specified in the tensor.
- method (torch.nn.Module) – layer that implements the method of computing the attention vector
Examples:
>>> attention = machine.models.Attention(256)
>>> context = torch.randn(5, 3, 256)
>>> output = torch.randn(5, 5, 256)
>>> output, attn = attention(output, context)
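To make the first two lines of the formula concrete, here is a small plain-Python sketch of dot-product attention for a single decoder state. It covers only the scoring, the softmax, and the weighted sum `attn * context`; the learned projection and `tanh` in the last line of the formula are left out. The function name `dot_attention` and the example vectors are assumptions for illustration.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot_attention(output: Vector, context: List[Vector]) -> Tuple[Vector, Vector]:
    """Return (attended context vector, attention weights) for one decoder state."""
    # x_i = context_i . output
    scores = [sum(c * o for c, o in zip(ctx, output)) for ctx in context]
    # attn = softmax(x), shifted by max(x) for numerical stability
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    attn = [e / total for e in exps]
    # attn * context: weighted sum of the encoder states
    mix = [sum(a * ctx[d] for a, ctx in zip(attn, context))
           for d in range(len(output))]
    return mix, attn


context = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # input_len = 3, dimensions = 2
output = [2.0, 0.0]                             # one decoder state
mix, attn = dot_attention(output, context)
print(attn)  # weights over the three encoder positions, summing to 1
print(mix)   # context vector with the same dimensionality as the decoder state
```

Encoder states that align with the decoder state (the first and third here) receive the larger weights.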
forward(decoder_states, encoder_states, **attention_method_kwargs)
Defines the computation performed at every call. Should be overridden by all subclasses.
Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
set_mask(mask)
Sets indices to be masked.
Parameters: mask (torch.Tensor) – tensor containing indices to be masked
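The effect of a mask can be sketched as follows: masked positions get a score of negative infinity before the softmax, so they end up with zero attention weight. This is a plain-Python illustration under the assumption that the mask is a list of booleans with `True` meaning "masked"; the function name `masked_softmax` is hypothetical and the library's tensor layout may differ.

```python
import math
from typing import List


def masked_softmax(scores: List[float], mask: List[bool]) -> List[float]:
    """Softmax over scores where positions with mask[i] == True get zero weight."""
    masked = [-math.inf if m else s for s, m in zip(scores, mask)]
    top = max(masked)
    if top == -math.inf:
        raise ValueError("all positions are masked")
    exps = [math.exp(s - top) for s in masked]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]


# The third position (for example, padding) is excluded from attention.
weights = masked_softmax([1.0, 2.0, 3.0], mask=[False, False, True])
print(weights)
```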
class machine.models.attention.Concat(dim)
Implements the computation of attention by applying an MLP to the concatenation of the decoder and encoder hidden states.
forward(decoder_states, encoder_states)
Defines the computation performed at every call. Should be overridden by all subclasses.
Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
class machine.models.attention.Dot
forward(decoder_states, encoder_states)
Defines the computation performed at every call. Should be overridden by all subclasses.
Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
class machine.models.attention.MLP(dim)
forward(decoder_states, encoder_states)
Defines the computation performed at every call. Should be overridden by all subclasses.
Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
machine
class machine.models.seq2seq.Seq2seq(encoder, decoder, decode_function=<function log_softmax>)
Standard sequence-to-sequence architecture with configurable encoder and decoder.
forward(inputs, input_lengths=None, targets={}, teacher_forcing_ratio=0)
- Inputs: inputs, input_lengths, targets, teacher_forcing_ratio
- inputs (list, optional): list of sequences, whose length is the batch size and within which each sequence is a list of token IDs. This information is passed to the encoder module.
- input_lengths (list of int, optional): a list that contains the lengths of sequences in the mini-batch; it must be provided when using a variable-length RNN (default: None)
- targets (list, optional): list of sequences, whose length is the batch size and within which each sequence is a list of token IDs. This information is forwarded to the decoder.
- teacher_forcing_ratio (float, optional): The probability that teacher forcing will be used. A random number is drawn uniformly from 0-1 for every decoding token, and if the sample is smaller than the given value, teacher forcing would be used (default is 0)
- Outputs: decoder_outputs, decoder_hidden, ret_dict
- decoder_outputs (batch): batch-length list of tensors with size (max_length, hidden_size) containing the outputs of the decoder.
- decoder_hidden (num_layers * num_directions, batch, hidden_size): tensor containing the last hidden state of the decoder.
- ret_dict: dictionary containing additional information as follows {KEY_LENGTH : list of integers representing lengths of output sequences, KEY_SEQUENCE : list of sequences, where each sequence is a list of predicted token IDs, KEY_INPUT : target outputs if provided for decoding, KEY_ATTN_SCORE : list of sequences, where each list is of attention weights }.
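As an illustration of consuming `ret_dict`, the sketch below trims each predicted sequence to its reported length. It assumes the layout described above (one list of token IDs per sequence, one integer length per sequence). The string values given to `KEY_LENGTH` and `KEY_SEQUENCE` and the helper `trim_predictions` are hypothetical stand-ins, and the example dictionary contains made-up data.

```python
from typing import Dict, List

# Hypothetical stand-ins for the KEY_LENGTH / KEY_SEQUENCE constants named above.
KEY_LENGTH = "length"
KEY_SEQUENCE = "sequence"


def trim_predictions(ret_dict: Dict[str, list]) -> List[List[int]]:
    """Cut each predicted sequence to its reported length."""
    return [seq[:length]
            for seq, length in zip(ret_dict[KEY_SEQUENCE], ret_dict[KEY_LENGTH])]


example = {KEY_LENGTH: [3, 2], KEY_SEQUENCE: [[4, 9, 2, 0], [7, 2, 0, 0]]}
print(trim_predictions(example))  # [[4, 9, 2], [7, 2]]
```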