API documentation

Replacer

Helper function that returns the list of tokens with matching (N)grams joined into single tokens.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| sentence | | sentence in the form of a list of tokens with grams | required |
| bigrams | | a mapper of (t1, t2) => t1_t2 | required |
| window_size | | how many tokens to consider | 2 |

Returns:

| Name | Type | Description |
|---|---|---|
| sentence | SentenceType | sentence in the form of a list of tokens with (N)grams |

Usage:

from bigrams import replacer

bigrams = {("new", "york")}
in_sentence = ["this", "is", "new", "york", "baby", "again!"]
out_sentence = replacer(
    sentence=in_sentence,
    bigrams=bigrams,
    window_size=2,
)
assert out_sentence == ["this", "is", "new_york", "baby", "again!"]
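
To illustrate the kind of joining described above, here is a minimal standalone sketch. It is not the library's implementation: `join_ngrams` is a hypothetical name, and the left-to-right greedy matching and the `_` separator are assumptions inferred from the usage example.

```python
def join_ngrams(sentence, ngrams, window_size=2, sep="_"):
    """Hypothetical stand-in for the behaviour shown in the usage example.

    Assumption: tokens are scanned left to right, and the first window of
    `window_size` tokens found in `ngrams` is joined with `sep`.
    """
    out = []
    i = 0
    while i < len(sentence):
        window = tuple(sentence[i:i + window_size])
        if len(window) == window_size and window in ngrams:
            # Known (N)gram: emit one joined token and skip past the window.
            out.append(sep.join(window))
            i += window_size
        else:
            # No match at this position: keep the token unchanged.
            out.append(sentence[i])
            i += 1
    return out


result = join_ngrams(
    ["this", "is", "new", "york", "baby", "again!"],
    {("new", "york")},
)
print(result)
```

Under these assumptions the sketch gives the same result as the usage example for this input. The real `replacer` may resolve overlapping matches differently.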

Grams

Grams allows you to transform a list of tokens into a list of (N)gram tokens.

Arguments:

threshold: how many times tokens should appear together to be connected as an (N)gram
window_size: the N in (N)gram, i.e. how many words should be considered. Default: 2

Usage:

from bigrams import Grams

in_sentences = [
    ["this", "is", "new", "york", "baby", "again!"],
    ["new", "york", "and", "baby", "again!"],
]
g = Grams(window_size=2, threshold=2)
# Fit on the sentences, then return them with detected (N)grams joined
out_sentences = g.fit_transform(in_sentences)
out_sentences
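
The threshold idea can be sketched without the library as follows. This is an assumption-laden illustration, not how `Grams` is implemented: the helper names (`count_pairs`, `frequent_pairs`, `join_pairs`) are hypothetical, and the sketch assumes a pair qualifies when it occurs at least `threshold` times, which the documentation above does not confirm.

```python
from collections import Counter


def count_pairs(sentences):
    """Count every adjacent token pair across all sentences."""
    counts = Counter()
    for tokens in sentences:
        counts.update(zip(tokens, tokens[1:]))
    return counts


def frequent_pairs(sentences, threshold=2):
    # Assumption: a pair qualifies when it occurs at least `threshold` times.
    counts = count_pairs(sentences)
    return {pair for pair, n in counts.items() if n >= threshold}


def join_pairs(tokens, pairs, sep="_"):
    """Join qualifying adjacent pairs, scanning left to right."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in pairs:
            out.append(tokens[i] + sep + tokens[i + 1])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


corpus = [
    ["this", "is", "new", "york", "baby", "again!"],
    ["new", "york", "and", "baby", "again!"],
]
pairs = frequent_pairs(corpus, threshold=2)
joined = [join_pairs(tokens, pairs) for tokens in corpus]
print(joined)
```

In this sketch both ("new", "york") and ("baby", "again!") occur twice, so both are joined. The output of `Grams.fit_transform` may differ if the library counts or scores pairs in another way.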