matchzoo.preprocessors.units.ngram_letter
Module Contents
class matchzoo.preprocessors.units.ngram_letter.NgramLetter(ngram: int = 3, reduce_dim: bool = True)
Bases: matchzoo.preprocessors.units.unit.Unit
Process unit for n-letter generation.
Triletter is used in DSSMModel. This processor is expected to execute before Vocab has been created.
Examples
>>> triletter = NgramLetter()
>>> rv = triletter.transform(['hello', 'word'])
>>> len(rv)
9
>>> rv
['#he', 'hel', 'ell', 'llo', 'lo#', '#wo', 'wor', 'ord', 'rd#']
>>> triletter = NgramLetter(reduce_dim=False)
>>> rv = triletter.transform(['hello', 'word'])
>>> len(rv)
2
>>> rv
[['#he', 'hel', 'ell', 'llo', 'lo#'], ['#wo', 'wor', 'ord', 'rd#']]
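The ordering note above (n-letters first, vocabulary second) can be illustrated without MatchZoo. The sketch below is only an illustration: `tri_letters` and `term_index` are hypothetical names, and a plain dict stands in for whatever the Vocab unit does internally. The point is that the index is built over tri-letters rather than over whole tokens.

```python
# Hypothetical stand-in for the tri-letter step; this is not MatchZoo's code.
def tri_letters(token):
    padded = f"#{token}#"  # mark word boundaries with '#'
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


tokens = ["hello", "word"]
units = [gram for token in tokens for gram in tri_letters(token)]

# Assumption: a plain dict plays the role of the vocabulary here.
# Each distinct tri-letter gets the next free integer index.
term_index = {}
for gram in units:
    term_index.setdefault(gram, len(term_index))

print(len(term_index))       # 9 distinct tri-letters
print("hello" in term_index)  # False: whole tokens are never indexed
```

If the vocabulary were built first, it would index whole tokens such as `hello` and the tri-letters produced afterwards would have no entries.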
transform(self, input_: list)
Transform tokens into n-letters (tri-letters with the default ngram=3).
For example, word should be represented as #wo, wor, ord and rd#.
Parameters: input_ – list of tokens to be transformed.
Returns: n_letters – the generated n-letters.
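The behaviour documented above can be approximated in a few lines of plain Python. This is a minimal sketch based only on the examples on this page, not the library's implementation: `letter_ngrams` is a hypothetical name, and the handling of tokens shorter than the window (they simply yield no n-grams) is an assumption.

```python
from typing import List


def letter_ngrams(tokens: List[str], ngram: int = 3,
                  reduce_dim: bool = True) -> list:
    """Hypothetical re-creation of the documented behaviour."""
    result = []
    for token in tokens:
        padded = "#" + token + "#"  # boundary markers, as in '#wo' and 'rd#'
        # Slide a window of width `ngram` across the padded token.
        # Assumption: a padded token shorter than `ngram` yields nothing.
        grams = [padded[i:i + ngram]
                 for i in range(len(padded) - ngram + 1)]
        if reduce_dim:
            result.extend(grams)   # one flat list for all tokens
        else:
            result.append(grams)   # one sub-list per token
    return result


print(letter_ngrams(["word"]))
# ['#wo', 'wor', 'ord', 'rd#']
print(letter_ngrams(["hello", "word"], reduce_dim=False))
# [['#he', 'hel', 'ell', 'llo', 'lo#'], ['#wo', 'wor', 'ord', 'rd#']]
```

With `reduce_dim=True` the token boundaries are lost in the output, which suits a bag-of-tri-letters representation; `reduce_dim=False` keeps one list per token when the per-token grouping matters.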