textfier.stream.tokenizer
Tokenization-based utilities, such as sentence- and word-level tokenizers.
- textfier.stream.tokenizer.tokenize_to_sentences(text: str, language: Optional[str] = 'portuguese')
Tokenizes text into sentence-level tokens.
- Parameters
text – String holding the text to be tokenized.
language – Identifier of the language the tokenizer should use. Defaults to 'portuguese'.
- Returns
Sentence-level tokens.
- Return type
(List[str])
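To make the input and output shapes concrete, here is a minimal sketch of a sentence-level tokenizer with the same signature. It is a regex-based stand-in, not textfier's implementation: the name `sketch_tokenize_to_sentences` is hypothetical, the `language` argument is accepted only to mirror the documented signature, and the real function may split text differently (for example around abbreviations).

```python
import re
from typing import List, Optional


def sketch_tokenize_to_sentences(
    text: str, language: Optional[str] = "portuguese"
) -> List[str]:
    """Hypothetical stand-in: split text after '.', '!' or '?' followed by whitespace."""
    # `language` is ignored here; it exists only to mirror the documented signature.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    # Drop the empty string produced when the input is empty or whitespace-only.
    return [part for part in parts if part]


sentences = sketch_tokenize_to_sentences("Olá, mundo! Tudo bem? Sim.")
print(sentences)
```

The return value is a flat list of strings, one per sentence, which matches the documented return type of (List[str]).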
- textfier.stream.tokenizer.tokenize_to_words(text: str, language: Optional[str] = 'portuguese')
Tokenizes text into word-level tokens.
- Parameters
text – String holding the text to be tokenized.
language – Identifier of the language the tokenizer should use. Defaults to 'portuguese'.
- Returns
Word-level tokens.
- Return type
(List[str])
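The word-level case can be sketched the same way. Again this is a regex-based stand-in under stated assumptions, not textfier's implementation: `sketch_tokenize_to_words` is a hypothetical name, `language` is ignored, and the choice to emit punctuation marks as separate tokens is an assumption of this sketch rather than documented behavior.

```python
import re
from typing import List, Optional


def sketch_tokenize_to_words(
    text: str, language: Optional[str] = "portuguese"
) -> List[str]:
    """Hypothetical stand-in: runs of word characters, plus each punctuation mark."""
    # `language` is ignored here; it exists only to mirror the documented signature.
    # \w is Unicode-aware for str patterns, so accented letters stay inside a word.
    return re.findall(r"\w+|[^\w\s]", text)


words = sketch_tokenize_to_words("Olá, mundo!")
print(words)
```

As with the sentence-level function, the result is a flat (List[str]); whitespace is discarded and never appears as a token in this sketch.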