framenet_tools.data_handler package¶
Submodules¶
framenet_tools.data_handler.annotation module¶
-
class
framenet_tools.data_handler.annotation.
Annotation
(frame: str = 'Default', fee: str = None, position: int = None, fee_raw: str = None, sentence: List[str] = [], roles: List[str] = [], role_positions: List[Tuple[int, int]] = [])¶ Bases:
object
Annotation class
Saves and manages all data of one frame for a given sentence.
-
create_handle
()¶ Helper function for ease of programmatic comparison
NOTE: FEE is not compared due to possible differences during preprocessing!
Returns: A handle consisting of all data saved in this object
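As an illustration of why a handle helps with comparison, here is a minimal sketch. It is not the library's implementation: make_handle is a hypothetical free function, and the choice to leave out both fee and fee_raw is an assumption based on the note above.

```python
from typing import List, Tuple


def make_handle(
    frame: str,
    position: int,
    sentence: List[str],
    roles: List[str],
    role_positions: List[Tuple[int, int]],
) -> tuple:
    """Hypothetical stand-in for create_handle (assumed field selection)."""
    # Lists become tuples so the handle is hashable and compares by value.
    # The FEE fields are deliberately omitted, mirroring the note above.
    return (frame, position, tuple(sentence), tuple(roles), tuple(role_positions))
```

Two annotations that differ only in their FEE would then produce equal handles, so they can be compared with == or stored in a set.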
-
framenet_tools.data_handler.frame_embedding_manager module¶
-
class
framenet_tools.data_handler.frame_embedding_manager.
FrameEmbeddingManager
(path: str = 'data/frame_embeddings/dict_frame_to_emb_100dim_wsb_list.txt')¶ Bases:
object
Loads and provides the specified frame-embeddings
-
embed
(frame: str)¶ Converts a given frame to its embedding
Parameters: frame – The frame to embed Returns: The embedding (n-dimensional vector)
-
read_frame_embeddings
()¶ Loads the previously specified frame embedding file into a dictionary
-
string_to_array
(string: str)¶ Helper function. Converts the string representation of an array back into an array.
NOTE: Only intended for float arrays.
Parameters: string – The string of an array Returns: The array
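The conversion can be sketched as follows. The textual layout of the embedding file is not documented here, so the bracketed, comma- or whitespace-separated form is an assumption, and parse_float_array is a hypothetical name rather than the library's function.

```python
from typing import List


def parse_float_array(text: str) -> List[float]:
    """Hypothetical stand-in for string_to_array (assumed input layout)."""
    # Drop surrounding whitespace and any enclosing square brackets.
    inner = text.strip().strip("[]")
    # Accept commas or plain whitespace as separators.
    tokens = inner.replace(",", " ").split()
    # float() raises ValueError on non-numeric tokens, hence "float arrays only".
    return [float(token) for token in tokens]
```

For example, parse_float_array("[0.1, -2.0, 3.5]") yields a list of three floats.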
-
framenet_tools.data_handler.rawreader module¶
-
class
framenet_tools.data_handler.rawreader.
RawReader
(cM: framenet_tools.config.ConfigManager, raw_path: str = None)¶ Bases:
framenet_tools.data_handler.reader.DataReader
A reader for raw text files.
Inherits from DataReader
-
read_raw_text
(raw_path: str = None)¶ Reads a raw text file and saves the content as a dataset
NOTE: Applying this function removes the previous dataset content
Parameters: raw_path – The path of the file to read Returns:
-
framenet_tools.data_handler.reader module¶
-
class
framenet_tools.data_handler.reader.
DataReader
(cM: framenet_tools.config.ConfigManager)¶ Bases:
object
The top-level DataReader
Stores all loaded data from every reader.
-
embed_frame
(frame: str)¶ Embeds a single frame.
NOTE: If the embedding of the frame cannot be found, a random set of values is generated.
Parameters: frame – The frame to embed Returns: The embedding of the frame
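The fallback behaviour described in the note can be sketched like this. The function name, the explicit table and dim parameters, and the value range of [-1, 1] are all assumptions made for illustration; the documentation does not state how the random values are drawn.

```python
import random
from typing import Dict, List


def embed_with_fallback(
    frame: str,
    table: Dict[str, List[float]],
    dim: int,
    rng: random.Random,
) -> List[float]:
    """Hypothetical sketch of a lookup with a random fallback."""
    # Known frame: return the stored vector unchanged.
    if frame in table:
        return table[frame]
    # Unknown frame: generate dim random values (assumed range [-1, 1]).
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]
```

Passing a seeded random.Random instance keeps the fallback vectors reproducible between runs.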
-
embed_frames
(force: bool = False)¶ Embeds all the sentences that are currently loaded.
NOTE: if forced, overrides embedded data inside of the annotation objects
Parameters: force – If true, embeddings are generated even if they already exist Returns:
-
embed_word
(word: str)¶ Embeds a single word
Parameters: word – The word to embed Returns: The vector of the embedding
-
embed_words
(force: bool = False)¶ Embeds all words of all sentences that are currently saved in “sentences”.
NOTE: Can erase all previously embedded data!
Parameters: force – If true, all previously saved embeddings will be overwritten! Returns:
-
export_to_json
(path: str)¶ Exports the list of annotations to a json file
Parameters: path – The path of the json file Returns:
Generates the POS-tags of all sentences that are currently saved.
Parameters: force – If true, the POS-tags will overwrite previously saved tags. Returns:
-
get_annotations
(sentence: List[str] = None)¶ Returns the annotation object for a given sentence.
Parameters: sentence – The sentence to retrieve the annotations for. Returns: An annotation object
-
loaded
(is_annotated: bool)¶ Helper for setting flags
Parameters: is_annotated – flag if loaded data was annotated Returns:
-
framenet_tools.data_handler.semaforreader module¶
-
class
framenet_tools.data_handler.semaforreader.
SemaforReader
(cM: framenet_tools.config.ConfigManager, path_sent: str = None, path_elements: str = None)¶ Bases:
framenet_tools.data_handler.reader.DataReader
A reader for the Semafor CoNLL format
Inherits from DataReader
-
digest_raw_data
(elements: list, sentences: list)¶ Converts the raw elements and sentences into a nicely structured dataset
NOTE: This representation is meant to match the one in the “frames-files”
Parameters: - elements – the annotation data of the given sentences
- sentences – the sentences to digest
Returns:
-
digest_role_data
(element: str)¶ Parses a string of role information into the desired format
Parameters: element – The string containing the role data Returns: A pair of parallel lists containing the roles and their spans
-
read_data
(path_sent: str = None, path_elements: str = None)¶ Reads the sentence and elements files and saves the content as a dataset
NOTE: Applying this function removes the previous dataset content
Parameters: - path_sent – The path to the sentence file
- path_elements – The path to the elements
Returns:
-
framenet_tools.data_handler.semevalreader module¶
-
class
framenet_tools.data_handler.semevalreader.
SemevalReader
(cM: framenet_tools.config.ConfigManager, path_xml: str = None)¶ Bases:
framenet_tools.data_handler.reader.DataReader
A reader for the Semeval format.
Inherits from DataReader
-
digest_tree
(root: xml.etree.ElementTree)¶ Parses the XML tree into a DataReader object.
Parameters: root – The root node of the tree Returns:
-
read_data
(path_xml: str = None)¶ Reads an XML file and parses it into the DataReader format.
NOTE: Applying this function removes the previous dataset content
Parameters: path_xml – The path of the xml file Returns:
-
-
framenet_tools.data_handler.semevalreader.
char_pos_to_sentence_pos
(start_char: int, end_char: int, words: List[str])¶ Converts positions of char spans in a sentence into word positions.
NOTE: Returned end position is represented inclusive!
Parameters: - start_char – The first character of the span
- end_char – The last character of the span
- words – A list of words in a sentence
Returns: The start and end position of the WORD in the sentence
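The mapping from character spans to word positions can be sketched as below. This is an illustrative version under stated assumptions, not the library's code: char_span_to_word_span is a hypothetical name, the sentence is assumed to be the words joined by single spaces, and both character offsets are treated as inclusive.

```python
from typing import List, Tuple


def char_span_to_word_span(
    start_char: int, end_char: int, words: List[str]
) -> Tuple[int, int]:
    """Map an inclusive character span to an inclusive word span.

    Assumes the sentence text equals " ".join(words).
    """
    start_word = end_word = -1
    offset = 0  # character index at which the current word begins
    for index, word in enumerate(words):
        last = offset + len(word) - 1  # index of the word's last character
        # The first word whose last character is at or after start_char.
        if start_word < 0 and start_char <= last:
            start_word = index
        # The last word that begins at or before end_char.
        if offset <= end_char:
            end_word = index
        offset = last + 2  # skip the single separating space
    return start_word, end_word
```

With words = ["The", "cat", "sat"], the character span 4 to 6 covers "cat" and maps to the word span (1, 1); the end index is inclusive, matching the note above.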
framenet_tools.data_handler.word_embedding_manager module¶
-
class
framenet_tools.data_handler.word_embedding_manager.
WordEmbeddingManager
(path: str = 'data/word_embeddings/levy_deps_300.w2vt')¶ Bases:
object
Loads and provides the specified word-embeddings
-
embed
(word: str)¶ Converts a given word to its embedding
Parameters: word – The word to embed Returns: The embedding (n-dimensional vector)
-
read_word_embeddings
()¶ Loads the previously specified word embedding file into a dictionary
-
string_to_array
(strings: List[str])¶ Helper function. Converts the string representation of an array back into an array.
NOTE: Only intended for float arrays.
Parameters: strings – The strings of an array Returns: The array
-