Models

baseRNN

A base class for RNN.

class machine.models.baseRNN.BaseRNN(vocab_size, max_len, hidden_size, input_dropout_p, dropout_p, n_layers, rnn_cell)

Applies a multi-layer RNN to an input sequence.

Note

Do not use this class directly; use one of its subclasses.

Parameters:

vocab_size (int) – size of the vocabulary
max_len (int) – maximum allowed length for the sequence to be processed
hidden_size (int) – number of features in the hidden state h
input_dropout_p (float) – dropout probability for the input sequence
dropout_p (float) – dropout probability for the output sequence
n_layers (int) – number of recurrent layers
rnn_cell (str) – type of RNN cell (e.g. 'lstm' or 'gru')

Inputs: *args, **kwargs

*args: variable length argument list.
**kwargs: arbitrary keyword arguments.

Variables:

SYM_MASK – masking symbol
SYM_EOS – end-of-sequence symbol

forward(*args, **kwargs)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance instead of this method, since calling the instance runs the registered hooks while calling forward() directly silently ignores them.

EncoderRNN

class machine.models.EncoderRNN.EncoderRNN(vocab_size, max_len, hidden_size, embedding_size, input_dropout_p=0, dropout_p=0, n_layers=1, bidirectional=False, rnn_cell='gru', variable_lengths=False)

Applies a multi-layer RNN to an input sequence.

Parameters:

vocab_size (int) – size of the vocabulary
max_len (int) – a maximum allowed length for the sequence to be processed
hidden_size (int) – the number of features in the hidden state h
embedding_size (int) – the size of the embedding of input variables
input_dropout_p (float, optional) – dropout probability for the input sequence (default: 0)
dropout_p (float, optional) – dropout probability for the output sequence (default: 0)
n_layers (int, optional) – number of recurrent layers (default: 1)
bidirectional (bool, optional) – if True, becomes a bidirectional encoder (default: False)
rnn_cell (str, optional) – type of RNN cell (default: 'gru')
variable_lengths (bool, optional) – whether to use a variable-length RNN (default: False)

Inputs: inputs, input_lengths

inputs: list of sequences, whose length is the batch size and within which each sequence is a list of token IDs.
input_lengths (list of int, optional): list that contains the lengths of sequences in the mini-batch; it must be provided when using a variable-length RNN (default: None)
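When variable_lengths=True, the batch has to arrive padded to a common length, together with the true lengths. The plain-Python sketch below shows one way to build those two values. pad_batch and the padding ID 0 are hypothetical and not part of machine. The longest-first sort assumes the lengths end up in torch.nn.utils.rnn.pack_padded_sequence, which by default (enforce_sorted=True) expects decreasing lengths.

```python
from typing import List, Sequence, Tuple


def pad_batch(
    sequences: Sequence[Sequence[int]], pad_id: int
) -> Tuple[List[List[int]], List[int]]:
    """Sort a batch longest-first, right-pad it with pad_id, and return the lengths.

    Hypothetical helper for illustration; pad_id is an assumed padding token ID.
    """
    # Packing utilities expect decreasing lengths unless told otherwise.
    ordered = sorted(sequences, key=len, reverse=True)
    lengths = [len(seq) for seq in ordered]
    max_len = lengths[0] if lengths else 0
    padded = [list(seq) + [pad_id] * (max_len - len(seq)) for seq in ordered]
    return padded, lengths


padded, lengths = pad_batch([[4, 7], [5, 9, 2, 3], [8]], pad_id=0)
print(padded)   # [[5, 9, 2, 3], [4, 7, 0, 0], [8, 0, 0, 0]]
print(lengths)  # [4, 2, 1]
```

Because sorting changes the order of the batch, any targets that belong to these inputs would need to be reordered the same way.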

Outputs: output, hidden

output (batch, seq_len, hidden_size): tensor containing the encoded features of the input sequence
hidden (num_layers * num_directions, batch, hidden_size): tensor containing the features in the hidden state h
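The leading dimension of hidden combines layers and directions. The small sketch below works through that arithmetic. encoder_hidden_shape is a hypothetical helper. For an LSTM, the hidden state h and the cell state c each have this shape.

```python
def encoder_hidden_shape(
    n_layers: int, bidirectional: bool, batch: int, hidden_size: int
) -> tuple:
    """Shape of the final hidden state h: (num_layers * num_directions, batch, hidden_size)."""
    num_directions = 2 if bidirectional else 1
    return (n_layers * num_directions, batch, hidden_size)


# A 2-layer bidirectional encoder over a batch of 32 with hidden_size 128:
print(encoder_hidden_shape(n_layers=2, bidirectional=True, batch=32, hidden_size=128))
# (4, 32, 128)
```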

Examples:

>>> encoder = EncoderRNN(len(input_vocab), max_seq_length, hidden_size, embedding_size)
>>> output, hidden = encoder(input_var)

forward(input_var, hidden=None, input_lengths=None)

Applies a multi-layer RNN to an input sequence.

Parameters:

input_var (batch, seq_len) – tensor containing the token IDs of the input sequence.
hidden (optional) – initial hidden state. For an LSTM this is a tuple (h_0, c_0), each of shape (num_layers * num_directions, batch, hidden_size), where h_0 contains the initial hidden state and c_0 the initial cell state for each element in the batch; for a GRU it is h_0 alone. If not provided, it defaults to zeros.
input_lengths (list of int, optional) – a list that contains the lengths of sequences in the mini-batch; required when variable_lengths=True (default: None)

Returns: output, hidden

output (batch, seq_len, hidden_size): tensor containing the encoded features of the input sequence
hidden (num_layers * num_directions, batch, hidden_size): tensor containing the features in the hidden state h

DecoderRNN

class machine.models.DecoderRNN.DecoderRNN(vocab_size, max_len, hidden_size, sos_id, eos_id, n_layers=1, rnn_cell='gru', bidirectional=False, input_dropout_p=0, dropout_p=0, use_attention=False, attention_method=None, full_focus=False)

Provides functionality for decoding in a seq2seq framework, with an option for attention.

Parameters:

vocab_size (int) – size of the vocabulary
max_len (int) – a maximum allowed length for the sequence to be processed
hidden_size (int) – the number of features in the hidden state h
sos_id (int) – index of the start of sentence symbol
eos_id (int) – index of the end of sentence symbol
n_layers (int, optional) – number of recurrent layers (default: 1)
rnn_cell (str, optional) – type of RNN cell (default: gru)
bidirectional (bool, optional) – whether the encoder is bidirectional (default: False)
input_dropout_p (float, optional) – dropout probability for the input sequence (default: 0)
dropout_p (float, optional) – dropout probability for the output sequence (default: 0)
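use_attention (bool, optional) – flag indicating whether to use the attention mechanism (default: False)
attention_method (str, optional) – method used to compute the attention alignment, e.g. 'mlp' or 'dot' (default: None)
full_focus (bool, optional) – flag indicating whether to use the full attention mechanism (default: False)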

Variables:

KEY_ATTN_SCORE (str) – key used to indicate attention weights in ret_dict
KEY_LENGTH (str) – key used to indicate a list representing lengths of output sequences in ret_dict
KEY_SEQUENCE (str) – key used to indicate a list of sequences in ret_dict

Inputs: inputs, encoder_hidden, encoder_outputs, function, teacher_forcing_ratio

inputs (batch, seq_len, input_size): list of sequences, whose length is the batch size and within which each sequence is a list of token IDs. It is used for teacher forcing when provided (default: None).
encoder_hidden (num_layers * num_directions, batch_size, hidden_size): tensor containing the features in the hidden state h of the encoder. Used as the initial hidden state of the decoder (default: None).
encoder_outputs (batch, seq_len, hidden_size): tensor containing the outputs of the encoder. Used by the attention mechanism (default: None).
function (callable): function used to generate symbols from the RNN hidden state (default: torch.nn.functional.log_softmax).
teacher_forcing_ratio (float): the probability that teacher forcing will be used. A random number is drawn uniformly between 0 and 1 for every decoding token, and if the sample is smaller than the given value, teacher forcing is used (default: 0).
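The teacher_forcing_ratio rule above can be sketched in plain Python. The sketch follows the per-token description given here. choose_decoder_inputs and its rng argument are hypothetical. Passing the step-by-step predictions as a ready-made list is a simplification: in real decoding, the prediction fed at step t is the symbol the decoder generated at step t - 1.

```python
import random
from typing import List, Optional


def choose_decoder_inputs(
    targets: List[int],
    predictions: List[int],
    teacher_forcing_ratio: float,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Pick, per decoding step, the ground-truth token or the model's own prediction."""
    if rng is None:
        rng = random.Random()
    chosen = []
    for target, predicted in zip(targets, predictions):
        # random() is uniform on [0, 1); a sample below the ratio means teacher forcing.
        if rng.random() < teacher_forcing_ratio:
            chosen.append(target)
        else:
            chosen.append(predicted)
    return chosen


# With ratio 1.0 every step is teacher-forced; with 0.0 no step is.
print(choose_decoder_inputs([4, 5, 6], [9, 5, 7], teacher_forcing_ratio=1.0))  # [4, 5, 6]
print(choose_decoder_inputs([4, 5, 6], [9, 5, 7], teacher_forcing_ratio=0.0))  # [9, 5, 7]
```

A seeded random.Random passed as rng makes a mixed ratio such as 0.5 reproducible in tests.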

Outputs: decoder_outputs, decoder_hidden, ret_dict

decoder_outputs (seq_len, batch, vocab_size): list of tensors with size (batch_size, vocab_size) containing the outputs of the decoding function.
decoder_hidden (num_layers * num_directions, batch, hidden_size): tensor containing the last hidden state of the decoder.
ret_dict: dictionary containing additional information as follows {KEY_LENGTH : list of integers representing lengths of output sequences, KEY_SEQUENCE : list of sequences, where each sequence is a list of predicted token IDs }.
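KEY_LENGTH holds one length per output sequence. A plausible way to derive such lengths from the predicted token IDs is to cut each sequence at its first eos_id. The sketch below assumes that the EOS token itself is counted and that a sequence which never emits EOS gets max_len. output_lengths is a hypothetical helper.

```python
from typing import List


def output_lengths(sequences: List[List[int]], eos_id: int, max_len: int) -> List[int]:
    """Length of each predicted sequence, up to and including the first EOS (assumed)."""
    lengths = []
    for seq in sequences:
        length = max_len  # sequences that never emit EOS run for max_len steps
        for step, token in enumerate(seq[:max_len]):
            if token == eos_id:
                length = step + 1
                break
        lengths.append(length)
    return lengths


print(output_lengths([[5, 6, 2, 0], [7, 8, 9, 4]], eos_id=2, max_len=4))  # [3, 4]
```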

forward(inputs=None, encoder_hidden=None, encoder_outputs=None, function=<function log_softmax>, teacher_forcing_ratio=0)

Decodes a batch of sequences. The inputs and outputs are described in the class documentation above. As with any Module, call the instance rather than this method directly so that registered hooks run.

forward_step(input_var, hidden, encoder_outputs, function, **kwargs)

Performs one or multiple forward decoder steps.

Parameters:

input_var (torch.Tensor) – tensor containing the input(s) to the decoder RNN
hidden (torch.Tensor) – tensor containing the previous hidden state of the decoder
encoder_outputs (torch.Tensor) – tensor containing the outputs of the encoder, used by the attention mechanism
function (callable) – activation function applied to the last output of the decoder RNN at every time step

Returns: predicted_softmax, hidden, attn

predicted_softmax: the output softmax distribution at every time step of the decoder RNN
hidden: the hidden state at every time step of the decoder RNN
attn: the attention distribution at every time step of the decoder RNN
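The default function, log_softmax, turns the decoder's raw scores over the vocabulary into log-probabilities. Greedy decoding then picks the highest-scoring symbol. Below is a plain-Python sketch of both steps for a single time step and a single batch element. log_softmax and greedy_symbol here are hypothetical stand-ins written without torch.

```python
import math
from typing import List, Tuple


def log_softmax(logits: List[float]) -> List[float]:
    """Numerically stable log-softmax: x_i - log(sum_j exp(x_j))."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]


def greedy_symbol(logits: List[float]) -> Tuple[int, float]:
    """Return the index of the most probable symbol and its log-probability."""
    log_probs = log_softmax(logits)
    best = max(range(len(log_probs)), key=log_probs.__getitem__)
    return best, log_probs[best]


symbol, score = greedy_symbol([1.0, 3.0, 0.5])
print(symbol)  # 1
```

Subtracting the maximum before exponentiating does not change the result but avoids overflow for large scores.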

TopKDecoder

class machine.models.TopKDecoder.TopKDecoder(decoder_rnn, k)

Top-K decoding with beam search.

Parameters:

decoder_rnn (DecoderRNN) – an object of DecoderRNN used for decoding
k (int) – size of the beam

Inputs: inputs, encoder_hidden, encoder_outputs, function, teacher_forcing_ratio

inputs (seq_len, batch, input_size): list of sequences, whose length is the batch size and within which each sequence is a list of token IDs. It is used for teacher forcing when provided (default: None).
encoder_hidden (num_layers * num_directions, batch, hidden_size): tensor containing the features in the hidden state h of the encoder. Used as the initial hidden state of the decoder.
encoder_outputs (batch, seq_len, hidden_size): tensor containing the outputs of the encoder. Used by the attention mechanism (default: None).
function (callable): function used to generate symbols from the RNN hidden state (default: torch.nn.functional.log_softmax).
teacher_forcing_ratio (float): the probability that teacher forcing will be used. A random number is drawn uniformly between 0 and 1 for every decoding token, and if the sample is smaller than the given value, teacher forcing is used (default: 0).

Outputs: decoder_outputs, decoder_hidden, ret_dict

decoder_outputs (batch): batch-length list of tensors with size (max_length, hidden_size) containing the outputs of the decoder.
decoder_hidden (num_layers * num_directions, batch, hidden_size): tensor containing the last hidden state of the decoder.
ret_dict: dictionary containing additional information as follows {length : list of integers representing lengths of output sequences, topk_length: list of integers representing lengths of beam search sequences, sequence : list of sequences, where each sequence is a list of predicted token IDs, topk_sequence : list of beam search sequences, each beam is a list of token IDs, inputs : target outputs if provided for decoding}.
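Below is a framework-free sketch of the beam-search idea behind TopKDecoder. It is not the class's implementation. At every step, each of the k best partial sequences is extended by every candidate token, and only the k highest total log-probabilities are kept. beam_search and toy_step are hypothetical, and the toy next-token distribution is made up for illustration. As a simplification, beams that emit EOS are set aside as finished, so fewer than k beams may stay active.

```python
import heapq
import math
from typing import Callable, Dict, List, Tuple

Beam = Tuple[float, List[int]]  # (summed log-probability, token IDs)


def beam_search(
    step: Callable[[List[int]], Dict[int, float]],
    sos_id: int,
    eos_id: int,
    k: int,
    max_len: int,
) -> List[Beam]:
    """Keep the k highest-scoring partial sequences at every decoding step."""
    beams: List[Beam] = [(0.0, [sos_id])]
    finished: List[Beam] = []
    for _ in range(max_len):
        candidates: List[Beam] = []
        for score, seq in beams:
            for token, log_prob in step(seq).items():
                candidates.append((score + log_prob, seq + [token]))
        beams = []
        for score, seq in heapq.nlargest(k, candidates, key=lambda c: c[0]):
            if seq[-1] == eos_id:
                finished.append((score, seq))  # completed hypothesis
            else:
                beams.append((score, seq))
        if not beams:
            break
    finished.extend(beams)  # hypotheses still open when max_len is reached
    return sorted(finished, key=lambda c: c[0], reverse=True)[:k]


def toy_step(prefix: List[int]) -> Dict[int, float]:
    """Hypothetical next-token log-probabilities; token 2 plays the role of EOS."""
    if len(prefix) >= 3:
        return {2: math.log(0.9), 3: math.log(0.1)}
    return {3: math.log(0.6), 4: math.log(0.3), 2: math.log(0.1)}


results = beam_search(toy_step, sos_id=1, eos_id=2, k=2, max_len=5)
print(results[0][1])  # [1, 3, 3, 2]
```

The real decoder runs this over whole batches of tensors, which is where the topk_sequence and topk_length entries of ret_dict come from.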

forward(inputs=None, encoder_hidden=None, encoder_outputs=None, function=<function log_softmax>, teacher_forcing_ratio=0, retain_output_probs=True)

Runs the decoder RNN forward for MAX_LENGTH steps. See machine.models.DecoderRNN.DecoderRNN.forward() for details.

attention

class machine.models.attention.Attention(dim, method)

Applies an attention mechanism on the output features from the decoder.

\[
\begin{array}{ll}
x = \text{context} \cdot \text{output} \\
\text{attn}_i = \exp(x_i) / \sum_j \exp(x_j) \\
\text{output} = \tanh(w \cdot (\text{attn} \cdot \text{context}) + b \cdot \text{output})
\end{array}
\]
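The first two lines of the equation can be traced by hand for a single decoder position. The plain-Python sketch below computes dot-product scores x_i, normalizes them with a softmax into attn, and forms the weighted mix attn · context. It stops there, because the final tanh projection needs the learned parameters w and b. dot_attention is a hypothetical helper, and dot scoring is only one of the available methods.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def dot_attention(output: Vector, context: List[Vector]) -> Tuple[Vector, Vector]:
    """Attend from one decoder output vector over the encoder context vectors."""
    # x_i = context_i . output: one alignment score per input position
    scores = [dot(c, output) for c in context]
    # attn_i = exp(x_i) / sum_j exp(x_j), shifted by the max for numerical stability
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    attn = [e / total for e in exps]
    # attn * context: attention-weighted mix of the encoder vectors
    mix = [sum(a * c[d] for a, c in zip(attn, context)) for d in range(len(output))]
    return mix, attn


mix, attn = dot_attention([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
print(attn)  # the first input position, which matches the output, gets more weight
```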

Parameters:

dim (int) – the number of expected features in the output
method (str) – the method to compute the alignment, mlp or dot

Inputs: output, context

output (batch, output_len, dimensions): tensor containing the output features from the decoder.
context (batch, input_len, dimensions): tensor containing features of the encoded input sequence.

Outputs: output, attn

output (batch, output_len, dimensions): tensor containing the attended output features from the decoder.
attn (batch, output_len, input_len): tensor containing attention weights.

Variables:

mask (torch.Tensor, optional) – applies \(-\infty\) to the indices specified in the tensor
method (torch.nn.Module) – layer that implements the method of computing the attention vector

Examples:

>>> import torch
>>> from machine.models.attention import Attention
>>> attention = Attention(256, 'dot')
>>> context = torch.randn(5, 3, 256)
>>> output = torch.randn(5, 5, 256)
>>> output, attn = attention(output, context)

forward(decoder_states, encoder_states, **attention_method_kwargs)

Computes attention from the decoder states over the encoder states. decoder_states and encoder_states correspond to output and context in the class description above; additional keyword arguments are passed on to the attention method. As with any Module, call the instance rather than this method directly so that registered hooks run.

get_method(method, dim)

Sets the method used to compute attention.

set_mask(mask)

Sets indices to be masked

Parameters:	mask (torch.Tensor) – tensor containing indices to be masked
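Masking works by filling the masked scores with -inf before the softmax, so that exp(-inf) = 0 removes those positions from the distribution. The sketch below shows this in plain Python. It assumes True marks a masked position. masked_softmax is a hypothetical helper, not the library's code.

```python
import math
from typing import List


def masked_softmax(scores: List[float], mask: List[bool]) -> List[float]:
    """Softmax over scores after setting masked positions (True) to -inf."""
    filled = [-math.inf if m else s for s, m in zip(scores, mask)]
    top = max(filled)  # assumes at least one position is left unmasked
    exps = [math.exp(s - top) for s in filled]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]


weights = masked_softmax([2.0, 1.0, 3.0], [False, False, True])
print(weights)  # the masked third position receives weight 0.0
```

If every position were masked, the subtraction would produce NaN, which is why at least one unmasked position is assumed.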

class machine.models.attention.Concat(dim)

Implements the computation of attention by applying an MLP to the concatenation of the decoder and encoder hidden states.
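To illustrate scoring by an MLP over the concatenation, here is a plain-Python sketch of one common form: score = v · tanh(W [d; e]). The layer sizes, the tanh activation, and the weights W and v are assumptions for illustration. They are not taken from Concat itself. The resulting scores would feed the softmax in Attention.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def concat_score(decoder_state: Vector, encoder_state: Vector, W: Matrix, v: Vector) -> float:
    """score = v . tanh(W [decoder_state; encoder_state]), a one-hidden-layer MLP (assumed form)."""
    joined = decoder_state + encoder_state  # list concatenation of the two states
    hidden = [math.tanh(sum(w * x for w, x in zip(row, joined))) for row in W]
    return sum(a * h for a, h in zip(v, hidden))


# Hypothetical weights for dim=2: W maps the 4-dim concatenation to 2 units, v maps 2 -> 1.
W = [[0.5, -0.5, 0.5, -0.5], [0.1, 0.2, 0.3, 0.4]]
v = [1.0, -1.0]
scores = [concat_score([1.0, 0.0], enc, W, v) for enc in ([1.0, 0.0], [0.0, 1.0])]
print(scores)
```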

forward(decoder_states, encoder_states)

Computes the alignment between decoder_states and encoder_states.

class machine.models.attention.Dot

Implements the computation of attention as the dot product of the decoder and encoder hidden states.

forward(decoder_states, encoder_states)

Computes the alignment between decoder_states and encoder_states.

class machine.models.attention.MLP(dim)

forward(decoder_states, encoder_states)

Computes the alignment between decoder_states and encoder_states.

seq2seq

class machine.models.seq2seq.Seq2seq(encoder, decoder, decode_function=<function log_softmax>)

Standard sequence-to-sequence architecture with configurable encoder and decoder.

flatten_parameters()

Flattens the parameters of all components in the model.

forward(inputs, input_lengths=None, targets={}, teacher_forcing_ratio=0)

Inputs: inputs, input_lengths, targets, teacher_forcing_ratio

inputs (list): list of sequences, whose length is the batch size and within which each sequence is a list of token IDs. This information is passed to the encoder module.
input_lengths (list of int, optional): a list that contains the lengths of sequences in the mini-batch; it must be provided when using a variable-length RNN (default: None)
targets (list, optional): list of sequences, whose length is the batch size and within which each sequence is a list of token IDs. This information is forwarded to the decoder.
teacher_forcing_ratio (float, optional): the probability that teacher forcing will be used. A random number is drawn uniformly between 0 and 1 for every decoding token, and if the sample is smaller than the given value, teacher forcing is used (default: 0)

Outputs: decoder_outputs, decoder_hidden, ret_dict

decoder_outputs (batch): batch-length list of tensors with size (max_length, hidden_size) containing the outputs of the decoder.
decoder_hidden (num_layers * num_directions, batch, hidden_size): tensor containing the last hidden state of the decoder.
ret_dict: dictionary containing additional information as follows {KEY_LENGTH : list of integers representing lengths of output sequences, KEY_SEQUENCE : list of sequences, where each sequence is a list of predicted token IDs, KEY_INPUT : target outputs if provided for decoding, KEY_ATTN_SCORE : list of sequences, where each sequence is a list of attention weights }.