Models

baseRNN

A base class for RNN.

class machine.models.baseRNN.BaseRNN(vocab_size, max_len, hidden_size, input_dropout_p, dropout_p, n_layers, rnn_cell)

Applies a multi-layer RNN to an input sequence.

Note

Do not use this class directly; use one of its subclasses.

Parameters:
  • vocab_size (int) – size of the vocabulary
  • max_len (int) – maximum allowed length for the sequence to be processed
  • hidden_size (int) – number of features in the hidden state h
  • input_dropout_p (float) – dropout probability for the input sequence
  • dropout_p (float) – dropout probability for the output sequence
  • n_layers (int) – number of recurrent layers
  • rnn_cell (str) – type of RNN cell (e.g. ‘LSTM’, ‘GRU’)
Inputs: *args, **kwargs
  • *args: variable length argument list.
  • **kwargs: arbitrary keyword arguments.
Variables:
  • SYM_MASK – masking symbol
  • SYM_EOS – end-of-sequence symbol
forward(*args, **kwargs)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for the forward pass needs to be defined within this function, call the Module instance instead of this method: calling the instance runs the registered hooks, whereas calling forward() directly silently ignores them.

EncoderRNN

class machine.models.EncoderRNN.EncoderRNN(vocab_size, max_len, hidden_size, embedding_size, input_dropout_p=0, dropout_p=0, n_layers=1, bidirectional=False, rnn_cell='gru', variable_lengths=False)

Applies a multi-layer RNN to an input sequence.

Parameters:
  • vocab_size (int) – size of the vocabulary
  • max_len (int) – maximum allowed length for the sequence to be processed
  • hidden_size (int) – number of features in the hidden state h
  • embedding_size (int) – size of the embedding vectors for the input tokens
  • input_dropout_p (float, optional) – dropout probability for the input sequence (default: 0)
  • dropout_p (float, optional) – dropout probability for the output sequence (default: 0)
  • n_layers (int, optional) – number of recurrent layers (default: 1)
  • bidirectional (bool, optional) – if True, becomes a bidirectional encoder (default: False)
  • rnn_cell (str, optional) – type of RNN cell (default: 'gru')
  • variable_lengths (bool, optional) – whether to run the RNN on variable-length sequences; input_lengths must then be supplied (default: False)
Inputs: inputs, input_lengths
  • inputs: list of sequences, whose length is the batch size and within which each sequence is a list of token IDs.
  • input_lengths (list of int, optional): lengths of the sequences in the mini-batch; must be provided when variable_lengths is True (default: None)
Outputs: output, hidden
  • output (batch, seq_len, hidden_size): tensor containing the encoded features of the input sequence
  • hidden (num_layers * num_directions, batch, hidden_size): tensor containing the features in the hidden state h

Examples:

>>> encoder = EncoderRNN(vocab_size, max_seq_length, hidden_size, embedding_size)
>>> output, hidden = encoder(input)
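Since inputs holds token-ID lists of different lengths, a batch has to be padded into a rectangular (batch, seq_len) tensor before it reaches the encoder, and input_lengths must carry the true lengths when variable_lengths=True. The following is a minimal plain-Python sketch of that preparation. pad_batch and pad_id are hypothetical names, not part of machine, and the longest-first ordering is an assumption: it presumes the encoder packs its input with torch.nn.utils.rnn.pack_padded_sequence, whose default enforce_sorted=True expects lengths in descending order.

```python
from typing import List, Sequence, Tuple


def pad_batch(
    sequences: Sequence[Sequence[int]], pad_id: int
) -> Tuple[List[List[int]], List[int]]:
    """Right-pad token-ID sequences to a common length (hypothetical helper).

    Returns the padded batch, shaped (batch, seq_len), and the true length of
    each sequence, i.e. the values to pass as input_lengths.
    """
    # Longest-first ordering (assumption: the encoder packs sorted sequences).
    ordered = sorted(sequences, key=len, reverse=True)
    lengths = [len(seq) for seq in ordered]
    max_len = lengths[0] if lengths else 0
    # Fill the tail of every shorter sequence with the padding ID.
    padded = [list(seq) + [pad_id] * (max_len - len(seq)) for seq in ordered]
    return padded, lengths


padded, lengths = pad_batch([[5, 6], [7, 8, 9], [4]], pad_id=0)
# padded  -> [[7, 8, 9], [5, 6, 0], [4, 0, 0]]
# lengths -> [3, 2, 1]
```

The padded lists can then be converted with torch.tensor(padded) and passed to the encoder together with lengths. Because sorting reorders the batch, any targets must be reordered the same way.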
forward(input_var, hidden=None, input_lengths=None)

Applies a multi-layer RNN to an input sequence.

Parameters:
  • input_var (batch, seq_len) – tensor containing the token IDs of the input sequence.
  • input_lengths (list of int, optional) – list containing the lengths of the sequences in the mini-batch
  • hidden (optional) – initial hidden state of shape (num_layers * num_directions, batch, hidden_size); for an LSTM, a tuple (h_0, c_0) of two such tensors, where h_0 contains the initial hidden state and c_0 the initial cell state for each element in the batch. If not provided, it defaults to zeros.
Returns: output, hidden
  • output (batch, seq_len, hidden_size): tensor containing the encoded features of the input sequence
  • hidden (num_layers * num_directions, batch, hidden_size): tensor containing the features in the hidden state h

DecoderRNN

class machine.models.DecoderRNN.DecoderRNN(vocab_size, max_len, hidden_size, sos_id, eos_id, n_layers=1, rnn_cell='gru', bidirectional=False, input_dropout_p=0, dropout_p=0, use_attention=False, attention_method=None, full_focus=False)

Provides functionality for decoding in a seq2seq framework, with an option for attention.

Parameters:
  • vocab_size (int) – size of the vocabulary
  • max_len (int) – maximum allowed length for the sequence to be processed
  • hidden_size (int) – number of features in the hidden state h
  • sos_id (int) – index of the start-of-sentence symbol
  • eos_id (int) – index of the end-of-sentence symbol
  • n_layers (int, optional) – number of recurrent layers (default: 1)
  • rnn_cell (str, optional) – type of RNN cell (default: 'gru')
  • bidirectional (bool, optional) – whether the encoder is bidirectional (default: False)
  • input_dropout_p (float, optional) – dropout probability for the input sequence (default: 0)
  • dropout_p (float, optional) – dropout probability for the output sequence (default: 0)
  • use_attention (bool, optional) – flag indicating whether to use the attention mechanism (default: False)
  • attention_method (str, optional) – method used to compute the attention alignment; see Attention (default: None)
  • full_focus (bool, optional) – flag indicating whether to use the full attention mechanism (default: False)
Variables:
  • KEY_ATTN_SCORE (str) – key used to indicate attention weights in ret_dict
  • KEY_LENGTH (str) – key used to indicate a list representing lengths of output sequences in ret_dict
  • KEY_SEQUENCE (str) – key used to indicate a list of sequences in ret_dict
Inputs: inputs, encoder_hidden, encoder_outputs, function, teacher_forcing_ratio
  • inputs (batch, seq_len, input_size): list of sequences, whose length is the batch size and within which each sequence is a list of token IDs. It is used for teacher forcing when provided (default: None).
  • encoder_hidden (num_layers * num_directions, batch_size, hidden_size): tensor containing the features in the hidden state h of the encoder. Used as the initial hidden state of the decoder (default: None).
  • encoder_outputs (batch, seq_len, hidden_size): tensor containing the outputs of the encoder. Used by the attention mechanism (default: None).
  • function (callable): function used to generate symbols from the RNN hidden state (default: torch.nn.functional.log_softmax).
  • teacher_forcing_ratio (float): probability that teacher forcing is used. A random number is drawn uniformly from [0, 1) for every decoding token; if the sample is smaller than the given value, teacher forcing is used (default: 0).
Outputs: decoder_outputs, decoder_hidden, ret_dict
  • decoder_outputs (seq_len, batch, vocab_size): list of tensors with size (batch_size, vocab_size) containing the outputs of the decoding function.
  • decoder_hidden (num_layers * num_directions, batch, hidden_size): tensor containing the last hidden state of the decoder.
  • ret_dict: dictionary containing additional information as follows {KEY_LENGTH : list of integers representing lengths of output sequences, KEY_SEQUENCE : list of sequences, where each sequence is a list of predicted token IDs }.
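The teacher_forcing_ratio rule above can be illustrated with a small framework-free loop for one sequence. It follows the per-token description given there; step is a hypothetical stand-in for one decoder step that maps the current input token to a predicted next token. The real DecoderRNN works on batched tensors, so its exact sampling granularity may differ from this sketch.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple


def decode_with_teacher_forcing(
    targets: Sequence[int],
    step: Callable[[int], int],
    sos_id: int,
    teacher_forcing_ratio: float,
    rng: Optional[random.Random] = None,
) -> Tuple[List[int], List[int]]:
    """Return (tokens fed to the decoder, tokens predicted) for one sequence."""
    rng = rng or random.Random()
    fed: List[int] = []
    predicted: List[int] = []
    current = sos_id  # decoding always starts from the start-of-sentence symbol
    for gold in targets:
        fed.append(current)
        guess = step(current)  # hypothetical single decoder step
        predicted.append(guess)
        # Draw u ~ U[0, 1) for this token; teacher-force when u < ratio.
        if rng.random() < teacher_forcing_ratio:
            current = gold  # feed the ground-truth token next
        else:
            current = guess  # feed the model's own prediction next
    return fed, predicted
```

With teacher_forcing_ratio=1.0 every input after the first is the gold token; with 0 (the default) the decoder always consumes its own predictions, as it does at inference time.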
forward(inputs=None, encoder_hidden=None, encoder_outputs=None, function=<function log_softmax>, teacher_forcing_ratio=0)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for the forward pass needs to be defined within this function, call the Module instance instead of this method: calling the instance runs the registered hooks, whereas calling forward() directly silently ignores them.

forward_step(input_var, hidden, encoder_outputs, function, **kwargs)

Performs one or multiple forward decoder steps.

Parameters:
  • input_var (torch.Tensor) – tensor containing the input(s) to the decoder RNN
  • hidden (torch.Tensor) – tensor containing the previous decoder hidden state
  • encoder_outputs (torch.Tensor) – tensor containing the outputs of the encoder, used by the attention mechanism
  • function (callable) – activation function applied to the last output of the decoder RNN at every time step
Returns: predicted_softmax, hidden, attn
  • predicted_softmax: the output softmax distribution at every time step of the decoder RNN
  • hidden: the hidden state at every time step of the decoder RNN
  • attn: the attention distribution at every time step of the decoder RNN
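forward_step returns a distribution per time step (log-probabilities when function is log_softmax), while ret_dict summarizes the chosen symbols under KEY_SEQUENCE and their lengths under KEY_LENGTH. Below is a minimal sketch of that summarization for one sequence. It assumes greedy (argmax) selection and that a length counts tokens up to and including the first eos_id; greedy_symbols and length_until_eos are hypothetical helpers, not part of machine.

```python
from typing import List, Sequence


def greedy_symbols(step_scores: Sequence[Sequence[float]]) -> List[int]:
    """Pick the highest-scoring vocabulary index at each time step."""
    return [max(range(len(scores)), key=scores.__getitem__) for scores in step_scores]


def length_until_eos(symbols: Sequence[int], eos_id: int) -> int:
    """Count tokens up to and including the first EOS; full length if none."""
    for i, symbol in enumerate(symbols):
        if symbol == eos_id:
            return i + 1
    return len(symbols)


# Three time steps over a 3-token vocabulary (log-probabilities).
log_probs = [[-2.3, -0.4, -1.6], [-0.5, -1.2, -2.3], [-1.6, -1.6, -0.5]]
symbols = greedy_symbols(log_probs)  # [1, 0, 2]
length = length_until_eos(symbols, eos_id=0)  # 2
```

Argmax is unaffected by whether the scores are probabilities or log-probabilities, since log is monotonic.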

TopKDecoder

class machine.models.TopKDecoder.TopKDecoder(decoder_rnn, k)

Top-K decoding with beam search.

Parameters:
  • decoder_rnn (DecoderRNN) – DecoderRNN instance used for decoding.
  • k (int) – size of the beam.
Inputs: inputs, encoder_hidden, encoder_outputs, function, teacher_forcing_ratio
  • inputs (seq_len, batch, input_size): list of sequences, whose length is the batch size and within which each sequence is a list of token IDs. It is used for teacher forcing when provided (default: None).
  • encoder_hidden (num_layers * num_directions, batch, hidden_size): tensor containing the features in the hidden state h of the encoder. Used as the initial hidden state of the decoder.
  • encoder_outputs (batch, seq_len, hidden_size): tensor containing the outputs of the encoder. Used by the attention mechanism (default: None).
  • function (callable): function used to generate symbols from the RNN hidden state (default: torch.nn.functional.log_softmax).
  • teacher_forcing_ratio (float): probability that teacher forcing is used. A random number is drawn uniformly from [0, 1) for every decoding token; if the sample is smaller than the given value, teacher forcing is used (default: 0).
Outputs: decoder_outputs, decoder_hidden, ret_dict
  • decoder_outputs (batch): batch-length list of tensors with size (max_length, hidden_size) containing the outputs of the decoder.
  • decoder_hidden (num_layers * num_directions, batch, hidden_size): tensor containing the last hidden state of the decoder.
  • ret_dict: dictionary containing additional information as follows {length : list of integers representing lengths of output sequences, topk_length: list of integers representing lengths of beam search sequences, sequence : list of sequences, where each sequence is a list of predicted token IDs, topk_sequence : list of beam search sequences, each beam is a list of token IDs, inputs : target outputs if provided for decoding}.
forward(inputs=None, encoder_hidden=None, encoder_outputs=None, function=<function log_softmax>, teacher_forcing_ratio=0, retain_output_probs=True)

Runs beam search through the decoder RNN for up to MAX_LENGTH steps. See machine.models.DecoderRNN.DecoderRNN.forward() for details on the arguments.
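TopKDecoder keeps the k best partial hypotheses at every step instead of committing to a single argmax token. The plain-Python sketch below shows that bookkeeping for one sequence. step_log_probs is a hypothetical callback standing in for one decoder step that returns log-probabilities of candidate next tokens given the tokens so far, and the toy probability table is invented for illustration. This is generic beam search, not TopKDecoder's batched tensor implementation.

```python
import heapq
import math
from typing import Callable, Dict, List, Sequence, Tuple

Hypothesis = Tuple[float, List[int]]  # (cumulative log-probability, token IDs)


def beam_search(
    step_log_probs: Callable[[Sequence[int]], Dict[int, float]],
    sos_id: int,
    eos_id: int,
    k: int,
    max_len: int,
) -> List[Hypothesis]:
    """Return up to k hypotheses, best first."""
    beams: List[Hypothesis] = [(0.0, [sos_id])]
    finished: List[Hypothesis] = []
    for _ in range(max_len):
        # Expand every live hypothesis by every candidate next token.
        candidates = [
            (score + logp, tokens + [token])
            for score, tokens in beams
            for token, logp in step_log_probs(tokens).items()
        ]
        # Keep only the k highest-scoring expansions.
        beams = []
        for score, tokens in heapq.nlargest(k, candidates, key=lambda h: h[0]):
            if tokens[-1] == eos_id:
                finished.append((score, tokens))  # complete: leaves the beam
            else:
                beams.append((score, tokens))
        if not beams:
            break
    finished.extend(beams)  # hypotheses still open after max_len steps
    return sorted(finished, key=lambda h: h[0], reverse=True)[:k]


# Toy model over IDs 0=SOS, 1=EOS, 2, 3 (hypothetical probabilities).
TABLE = {0: {2: 0.6, 3: 0.4}, 2: {1: 0.9, 3: 0.1}, 3: {1: 0.7, 2: 0.3}}


def toy_step(tokens: Sequence[int]) -> Dict[int, float]:
    return {tok: math.log(p) for tok, p in TABLE[tokens[-1]].items()}


best = beam_search(toy_step, sos_id=0, eos_id=1, k=2, max_len=3)
```

This variant drops finished hypotheses from the beam instead of refilling it to k, and applies no length normalization, so it tends to favor short outputs; TopKDecoder's own handling of finished beams may differ.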

attention

class machine.models.attention.Attention(dim, method)

Applies an attention mechanism on the output features from the decoder.

\[\begin{array}{ll} x = context * output \\ attn = \exp(x_i) / \sum_j \exp(x_j) \\ output = \tanh(w * (attn * context) + b * output) \end{array}\]
Parameters:
  • dim (int) – The number of expected features in the output
  • method (str) – The method to compute the alignment, mlp or dot
Inputs: output, context
  • output (batch, output_len, dimensions): tensor containing the output features from the decoder.
  • context (batch, input_len, dimensions): tensor containing features of the encoded input sequence.
Outputs: output, attn
  • output (batch, output_len, dimensions): tensor containing the attended output features from the decoder.
  • attn (batch, output_len, input_len): tensor containing attention weights.
Variables:
  • mask (torch.Tensor, optional) – applies \(-\infty\) to the indices specified in the tensor.
  • method (torch.nn.Module) – layer that implements the method of computing the attention vector

Examples:

>>> attention = machine.models.Attention(256, 'dot')
>>> context = torch.randn(5, 3, 256)
>>> output = torch.randn(5, 5, 256)
>>> output, attn = attention(output, context)
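For a single decoder state, the first two lines of the equation above (scores x_i, then a softmax) and the attn * context mix can be sketched in plain Python. The sketch assumes the dot method and represents a mask as a hypothetical list of booleans where True marks a position that receives -inf before the softmax, mirroring what set_mask describes. The final tanh(w * ... + b * output) layer is left out because it needs learned weights.

```python
import math
from typing import List, Optional, Sequence, Tuple


def dot_attention(
    output: Sequence[float],
    context: Sequence[Sequence[float]],
    mask: Optional[Sequence[bool]] = None,
) -> Tuple[List[float], List[float]]:
    """Attend from one decoder state over input_len encoder states.

    Returns (attn, mix): the attention weights and the attention-weighted sum
    of context vectors. Assumes at least one position is left unmasked.
    """
    # x_i = context_i . output
    scores = [sum(c * o for c, o in zip(ctx, output)) for ctx in context]
    if mask is not None:
        # Masked positions get -inf, so they receive zero weight after softmax.
        scores = [-math.inf if m else s for s, m in zip(scores, mask)]
    # Numerically stable softmax: subtract the largest score first.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    attn = [e / total for e in exps]
    # mix_d = sum_i attn_i * context_i[d]
    mix = [sum(a * ctx[d] for a, ctx in zip(attn, context)) for d in range(len(output))]
    return attn, mix


attn, mix = dot_attention([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
attn_masked, mix_masked = dot_attention([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], mask=[False, True])
```

Subtracting the maximum score before exponentiating does not change the softmax result but avoids overflow for large scores.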
forward(decoder_states, encoder_states, **attention_method_kwargs)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for the forward pass needs to be defined within this function, call the Module instance instead of this method: calling the instance runs the registered hooks, whereas calling forward() directly silently ignores them.

get_method(method, dim)

Sets the method used to compute attention.

set_mask(mask)

Sets indices to be masked

Parameters:
  • mask (torch.Tensor) – tensor containing indices to be masked

class machine.models.attention.Concat(dim)

Implements the computation of attention by applying an MLP to the concatenation of the decoder and encoder hidden states.
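The Concat description can be made concrete with a small sketch that scores one decoder/encoder pair. It assumes a one-hidden-layer MLP with tanh followed by a scoring vector, as in additive (Bahdanau-style) attention; weight and v are hypothetical stand-ins for learned parameters, and the actual layers inside Concat are not specified here. The resulting scores would then be normalized with a softmax, as in Attention.

```python
import math
from typing import List, Sequence


def concat_score(
    decoder_state: Sequence[float],
    encoder_state: Sequence[float],
    weight: Sequence[Sequence[float]],
    v: Sequence[float],
) -> float:
    """score = v . tanh(W [decoder_state; encoder_state])"""
    joint: List[float] = list(decoder_state) + list(encoder_state)  # concatenation
    if any(len(row) != len(joint) for row in weight):
        raise ValueError("each row of weight must match the concatenated size")
    # Hidden layer of the MLP: one tanh unit per row of weight.
    hidden = [math.tanh(sum(w * x for w, x in zip(row, joint))) for row in weight]
    # Project the hidden layer to a single alignment score.
    return sum(a * h for a, h in zip(v, hidden))
```

Unlike the dot method, this scoring works even when decoder and encoder states have different sizes, since only the concatenated length has to match the rows of weight.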

forward(decoder_states, encoder_states)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for the forward pass needs to be defined within this function, call the Module instance instead of this method: calling the instance runs the registered hooks, whereas calling forward() directly silently ignores them.

class machine.models.attention.Dot

Implements the computation of attention as the dot product of the decoder and encoder hidden states.

forward(decoder_states, encoder_states)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for the forward pass needs to be defined within this function, call the Module instance instead of this method: calling the instance runs the registered hooks, whereas calling forward() directly silently ignores them.

class machine.models.attention.MLP(dim)

forward(decoder_states, encoder_states)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for the forward pass needs to be defined within this function, call the Module instance instead of this method: calling the instance runs the registered hooks, whereas calling forward() directly silently ignores them.

seq2seq

class machine.models.seq2seq.Seq2seq(encoder, decoder, decode_function=<function log_softmax>)

Standard sequence-to-sequence architecture with configurable encoder and decoder.

flatten_parameters()

Flatten parameters of all components in the model.

forward(inputs, input_lengths=None, targets={}, teacher_forcing_ratio=0)
Inputs: inputs, input_lengths, targets, teacher_forcing_ratio
  • inputs (list): list of sequences, whose length is the batch size and within which each sequence is a list of token IDs. This information is passed to the encoder module.
  • input_lengths (list of int, optional): lengths of the sequences in the mini-batch; must be provided when using a variable-length RNN (default: None)
  • targets (list, optional): list of sequences, whose length is the batch size and within which each sequence is a list of token IDs. This information is forwarded to the decoder.
  • teacher_forcing_ratio (float, optional): probability that teacher forcing is used. A random number is drawn uniformly from [0, 1) for every decoding token; if the sample is smaller than the given value, teacher forcing is used (default: 0)
Outputs: decoder_outputs, decoder_hidden, ret_dict
  • decoder_outputs (batch): batch-length list of tensors with size (max_length, hidden_size) containing the outputs of the decoder.
  • decoder_hidden (num_layers * num_directions, batch, hidden_size): tensor containing the last hidden state of the decoder.
  • ret_dict: dictionary containing additional information as follows {KEY_LENGTH : list of integers representing lengths of output sequences, KEY_SEQUENCE : list of sequences, where each sequence is a list of predicted token IDs, KEY_INPUT : target outputs if provided for decoding, KEY_ATTN_SCORE : list of sequences of attention weights }.
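The argument routing described above can be sketched with a hypothetical stand-in class. It only shows the data flow suggested by the EncoderRNN and DecoderRNN signatures: the encoder returns (output, hidden), and the decoder receives the targets, the encoder's hidden state and outputs, the decode function, and the teacher-forcing ratio. Passing targets to the decoder unchanged is an assumption; the real Seq2seq may first extract a specific entry from the targets dictionary, and as an nn.Module it is invoked as model(...) rather than through forward() directly.

```python
from typing import Any, Callable, Dict, Optional, Tuple


class Seq2seqSketch:
    """Hypothetical stand-in that mirrors how Seq2seq.forward routes arguments."""

    def __init__(self, encoder: Callable, decoder: Callable, decode_function: Callable):
        self.encoder = encoder
        self.decoder = decoder
        self.decode_function = decode_function

    def forward(
        self,
        inputs: Any,
        input_lengths: Optional[list] = None,
        targets: Any = None,
        teacher_forcing_ratio: float = 0.0,
    ) -> Tuple[Any, Any, dict]:
        # 1. Encode the source batch.
        encoder_outputs, encoder_hidden = self.encoder(inputs, input_lengths)
        # 2. Decode: the encoder's final hidden state seeds the decoder, and
        #    encoder_outputs are made available to the attention mechanism.
        return self.decoder(
            inputs=targets,
            encoder_hidden=encoder_hidden,
            encoder_outputs=encoder_outputs,
            function=self.decode_function,
            teacher_forcing_ratio=teacher_forcing_ratio,
        )


# Toy components that record what they receive (hypothetical).
calls: Dict[str, Any] = {}


def toy_encoder(inputs, input_lengths):
    calls["encoder"] = (inputs, input_lengths)
    return "encoder_outputs", "encoder_hidden"


def toy_decoder(**kwargs):
    calls["decoder"] = kwargs
    return "decoder_outputs", "decoder_hidden", {}


def identity(x):
    return x


model = Seq2seqSketch(toy_encoder, toy_decoder, decode_function=identity)
result = model.forward([[4, 5]], input_lengths=[2], targets=[[6, 7]], teacher_forcing_ratio=0.5)
```

Keeping the encoder and decoder as injected components is what lets the same wrapper combine, for example, a bidirectional EncoderRNN with an attention-enabled DecoderRNN.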