framenet_tools.span_identification package

Submodules

framenet_tools.span_identification.spanidentifier module

class framenet_tools.span_identification.spanidentifier.SpanIdentifier(cM: framenet_tools.config.ConfigManager)

Bases: object

The Span Identifier for predicting possible role spans of a given sentence

Includes multiple ways of predicting:
  • static
  • using AllenNLP
  • using a BiLSTM
generate_BIO_tags(annotation: framenet_tools.data_handler.annotation.Annotation)

Generates a list of (B)egin-, (I)nside-, (O)utside- tags for a given annotation.

Parameters: annotation – The annotation to convert
Returns: A list of BIO-tags
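
To illustrate the tagging scheme, here is a minimal standalone sketch. It does not use the Annotation class; the function name bio_tags and the representation of spans as inclusive (start, end) token indices are assumptions made for this example only.

```python
from typing import List, Tuple


def bio_tags(num_tokens: int, spans: List[Tuple[int, int]]) -> List[str]:
    """Return one B/I/O tag per token (hypothetical helper, not the library's code).

    Spans are assumed to be inclusive (start, end) token indices.
    """
    tags = ["O"] * num_tokens          # everything is Outside by default
    for start, end in spans:
        tags[start] = "B"              # first token of the span
        for i in range(start + 1, end + 1):
            tags[i] = "I"              # remaining tokens of the span
    return tags


tokens = "The boy ate an apple".split()
print(list(zip(tokens, bio_tags(len(tokens), [(0, 1), (3, 4)]))))
```

Tokens outside every span keep the O tag, so a sentence without role spans yields only O tags.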
get_dataset(annotations: List[List[framenet_tools.data_handler.annotation.Annotation]])

Loads the dataset and combines the necessary data

Parameters: annotations – A list of all annotations containing all sentences
Returns: xs: A list of sentences, each appended with its FEE; ys: A list of frames corresponding to the given sentences
get_dataset_comb(m_reader: framenet_tools.data_handler.reader.DataReader)

Generates sentences with their BIO-tags

Parameters: m_reader – The DataReader to create the dataset from
Returns: A pair of concurrent lists containing the sequences and their labels
load()

Loads the saved model of the span identification network

Returns:
predict_spans(m_reader: framenet_tools.data_handler.reader.DataReader)

Predicts the spans of the currently loaded dataset. The predictions are saved in the annotations.

NOTE: All loaded spans and roles are overwritten!

Returns:
prepare_dataset(xs: List[str], ys: List[str], batch_size: int = None)

Prepares the dataset and returns a BucketIterator of the dataset

Parameters:
  • batch_size – The batch_size to which the dataset will be prepared
  • xs – A list of sentences
  • ys – A list of frames corresponding to the given sentences
Returns:

A BucketIterator of the dataset

query(embedded_sentence: List[float], annotation: framenet_tools.data_handler.annotation.Annotation, pos_tags: List[str], use_static: bool = True)

Predicts a possible span set for a given sentence.

NOTE: This can be done statically (using only syntax) or via an LSTM.

Parameters:
  • pos_tags – The POS tags of the sentence
  • embedded_sentence – The embedded words of the sentence
  • annotation – The annotation of the sentence to predict
  • use_static – True uses the syntactic static version, otherwise the NN
Returns:

A list of possible span tuples

query_all(annotation: framenet_tools.data_handler.annotation.Annotation)

Returns all possible spans of a sentence. Since every correct span is therefore among the predictions, Recall is perfect, but Precision is close to 0.

NOTE: This creates a power set! Meaning there will be 2^N elements returned (N: words in sentence).

Parameters: annotation – The annotation of the sentence to predict
Returns: A list of ALL possible span tuples
query_nn(embedded_sentence: List[float], annotation: framenet_tools.data_handler.annotation.Annotation, pos_tags: List[str])

Predicts the possible spans using the LSTM.

NOTE: In order to use this, the network must be trained beforehand

Parameters:
  • pos_tags – The POS tags of the sentence
  • embedded_sentence – The embedded words of the sentence
  • annotation – The annotation of the sentence to predict
Returns:

A list of possible span tuples
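
How a per-token tag prediction can be turned into span tuples is sketched below. This is an illustration under stated assumptions, not the method's actual implementation: the name spans_from_bio is hypothetical, tags are assumed to be the plain strings "B", "I" and "O", and spans are emitted as inclusive (start, end) token indices.

```python
from typing import List, Optional, Tuple


def spans_from_bio(tags: List[str]) -> List[Tuple[int, int]]:
    """Collect inclusive (start, end) spans from a BIO tag sequence (hypothetical helper)."""
    spans: List[Tuple[int, int]] = []
    start: Optional[int] = None        # start index of the span currently open
    for i, tag in enumerate(tags):
        if tag == "B" or (tag == "I" and start is None):
            # A new span begins; a stray "I" without a "B" is treated as a start.
            if start is not None:
                spans.append((start, i - 1))
            start = i
        elif tag == "O" and start is not None:
            spans.append((start, i - 1))
            start = None
    if start is not None:              # close a span that runs to the sentence end
        spans.append((start, len(tags) - 1))
    return spans


print(spans_from_bio(["B", "I", "O", "B", "I"]))
```

Treating a stray "I" as the start of a span is one possible design choice; a stricter decoder could discard such tags instead.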

query_static(annotation: framenet_tools.data_handler.annotation.Annotation)

Predicts the set of possible spans just by the use of the static syntax tree.

NOTE: deprecated!

Parameters: annotation – The annotation of the sentence to predict
Returns: A list of possible span tuples
to_one_hot(l: List[int])

Helper Function that converts a list of numerals into a list of one-hot encoded vectors

Parameters: l – The list to convert
Returns: A list of one-hot vectors
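
A minimal sketch of one-hot encoding in plain Python follows. The function name one_hot and the explicit num_classes argument are assumptions for this example; the documented method takes only the list, so it presumably obtains the vector width elsewhere.

```python
from typing import List


def one_hot(labels: List[int], num_classes: int) -> List[List[int]]:
    """Convert integer labels into one-hot vectors (hypothetical helper)."""
    vectors = []
    for label in labels:
        vec = [0] * num_classes        # all positions start at zero
        vec[label] = 1                 # mark the position of the label
        vectors.append(vec)
    return vectors


print(one_hot([0, 2, 1], num_classes=3))
```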
train(mReader, mReaderDev)

Trains the model on all of the given annotations.

Parameters:
  • mReader – The DataReader containing the annotations to train the model from
  • mReaderDev – The DataReader containing the development annotations
Returns:
traverse_syntax_tree(node: Token)

Traverses the syntax tree, starting from a given node, and returns the spans of all its subtrees.

NOTE: Recursive

Parameters: node – The node to start from
Returns: A list of spans of all subtrees
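
The recursive idea can be sketched without a parser. The Node class below is a hypothetical stand-in for a parsed token (it only stores a token index and its children), and subtree_spans is an illustrative name; each span is assumed to be the inclusive (first, last) token index covered by a subtree.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """Hypothetical stand-in for a token in a dependency tree."""
    index: int
    children: List["Node"] = field(default_factory=list)


def subtree_spans(node: Node) -> List[Tuple[int, int]]:
    """Return the (first, last) token index of every subtree below and including node."""
    spans: List[Tuple[int, int]] = []

    def visit(n: Node) -> Tuple[int, int]:
        lo = hi = n.index
        for child in n.children:       # recurse first, then widen the span
            c_lo, c_hi = visit(child)
            lo, hi = min(lo, c_lo), max(hi, c_hi)
        spans.append((lo, hi))
        return lo, hi

    visit(node)
    return spans


# "The boy ate an apple": "ate" (2) heads "boy" (1) and "apple" (4).
root = Node(2, [Node(1, [Node(0)]), Node(4, [Node(3)])])
print(subtree_spans(root))
```

Spans are collected in post-order, so the span of the start node itself comes last.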

framenet_tools.span_identification.spanidnetwork module

class framenet_tools.span_identification.spanidnetwork.SpanIdNetwork(cM: framenet_tools.config.ConfigManager, num_classes: int)

Bases: object

eval_dev(xs: List[Tensor] = None, ys: List[List[int]] = None)

Evaluates the model directly on a prepared dataset

Parameters:
  • xs – The development sequences, given as a list of tensors
  • ys – The labels of the sequence
Returns:

load_model(path: str)

Loads the model from a given path

Parameters: path – The path from where to load the model
Returns:
predict(sent: List[int])

Predicts the BIO-Tags of a given sentence.

Parameters: sent – The sentence to predict (already converted by the vocab)
Returns: A list of probabilities for each tag, for each word
reset_hidden()

Resets the hidden states of the LSTM.

Returns:
save_model(path: str)

Saves the current model at the given path

Parameters: path – The path to save the model at
Returns:
train_model(xs: List[Tensor], ys: List[List[int]], dev_xs: List[Tensor] = None, dev_ys: List[List[int]] = None)

Trains the model with the given dataset. Uses the model specified in net.

Parameters:
  • xs – The training sequences, given as a list of tensors
  • ys – The labels of the sequences
  • dev_xs – The development sequences, given as a list of tensors
  • dev_ys – The labels of the sequences
Returns:

Module contents