API documentation
Replacer
Helper function, returns a list of tokens with (N)grams connected
Parameters:
Name | Type | Description | Default |
---|---|---|---|
sentence | | sentence in the form of tokens with grams | required |
bigrams | | a mapper of (t1, t2) => t1_t2 | required |
window_size | | how many tokens to consider | 2 |
Returns:
Name | Type | Description |
---|---|---|
sentence | SentenceType | sentence in the form of tokens with (N)grams |
Usage:
from bigrams import replacer
bigrams = {("new", "york")}
in_sentence = ["this", "is", "new", "york", "baby", "again!"]
out_sentence = replacer(
    sentence=in_sentence,
    bigrams=bigrams,
    window_size=2,
)
assert out_sentence == ["this", "is", "new_york", "baby", "again!"]
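To make the behaviour in the usage example concrete, here is a minimal, self-contained sketch of a greedy left-to-right bigram merge. It is not the library's implementation: the name `merge_bigrams_sketch` is hypothetical, it handles only a window of two tokens, and it assumes the known bigrams are given as a set of `(t1, t2)` tuples, as in the usage example.

```python
from typing import List, Set, Tuple


def merge_bigrams_sketch(
    sentence: List[str], bigrams: Set[Tuple[str, str]]
) -> List[str]:
    """Hypothetical helper: join adjacent token pairs found in `bigrams`."""
    out: List[str] = []
    i = 0
    while i < len(sentence):
        pair = tuple(sentence[i:i + 2])
        if len(pair) == 2 and pair in bigrams:
            # Known pair: emit one joined token and skip both originals.
            out.append("_".join(pair))
            i += 2
        else:
            # No match at this position: keep the token unchanged.
            out.append(sentence[i])
            i += 1
    return out


merged = merge_bigrams_sketch(
    ["this", "is", "new", "york", "baby", "again!"],
    {("new", "york")},
)
```

Because the scan is greedy, a token that has been merged into a pair is not considered again as the start of the next pair.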
Grams
Grams allows you to transform a list of tokens into a list of (N)gram tokens.
Arguments:
threshold: how many times tokens should appear together to be connected as an (N)gram
window_size: the N in (N)gram, i.e. how many words should be considered; defaults to 2
Usage:
from bigrams import Grams
in_sentences = [
    ["this", "is", "new", "york", "baby", "again!"],
    ["new", "york", "and", "baby", "again!"],
]
g = Grams(window_size=2, threshold=2)
out_sentences = g.fit_transform(in_sentences)
out_sentences
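The role of `threshold` can be illustrated with a small sketch that counts adjacent token pairs across sentences and keeps those seen often enough. This is an illustration only, not how Grams is implemented: the name `frequent_pairs_sketch` is hypothetical, and it assumes raw co-occurrence counts with an inclusive comparison (`count >= threshold`), which the library may or may not use.

```python
from collections import Counter
from typing import List, Set, Tuple


def frequent_pairs_sketch(
    sentences: List[List[str]], threshold: int
) -> Set[Tuple[str, str]]:
    """Hypothetical helper: adjacent pairs seen at least `threshold` times."""
    counts: Counter = Counter()
    for sentence in sentences:
        # zip(sentence, sentence[1:]) yields every adjacent token pair.
        counts.update(zip(sentence, sentence[1:]))
    # Assumption: the comparison is inclusive (>=).
    return {pair for pair, n in counts.items() if n >= threshold}


in_sentences = [
    ["this", "is", "new", "york", "baby", "again!"],
    ["new", "york", "and", "baby", "again!"],
]
pairs = frequent_pairs_sketch(in_sentences, threshold=2)
```

Under these assumptions, both `("new", "york")` and `("baby", "again!")` occur twice in the sample sentences, so both pass a threshold of 2.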