Adding skip_special_tokens=True to FillMaskPipeline (#9783)

* We most likely don't want special tokens in this output.

* Adding `skip_special_tokens=True` to FillMaskPipeline

- It's backward incompatible.
- It makes more sense for pipelines to remove references to
  special tokens (all of the other pipelines do that).
- Keeping special tokens makes it hard for users to actually remove them
  because different models use different tokens (<s>, <cls>, [CLS], ...)

* Fixing `token_str` in the same vein, and actually fixing the tests too.
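
To illustrate the point above about models using different special tokens, here is a minimal sketch of what user-side cleanup tends to look like. The `SPECIAL_TOKENS` table and the `strip_special_tokens` helper are hypothetical names made up for this example; only the marker strings themselves (BERT-style `[CLS]`/`[SEP]`, RoBERTa-style `<s>`/`</s>`) are real conventions.

```python
# Hypothetical lookup table: each model family spells its markers differently.
SPECIAL_TOKENS = {
    "bert": ["[CLS]", "[SEP]", "[PAD]", "[MASK]"],
    "roberta": ["<s>", "</s>", "<pad>", "<mask>"],
}


def strip_special_tokens(text, family):
    """Hypothetical user-side cleanup: remove one family's markers from text."""
    for marker in SPECIAL_TOKENS[family]:
        text = text.replace(marker, "")
    # Collapse the whitespace left behind by the removed markers.
    return " ".join(text.split())


bert_out = strip_special_tokens("[CLS] hello world [SEP]", "bert")
roberta_out = strip_special_tokens("<s>hello world</s>", "roberta")
# Applying the wrong family's table silently leaves the markers in place.
wrong_out = strip_special_tokens("<s>hello world</s>", "bert")

print(bert_out)
print(roberta_out)
print(wrong_out)
```

The last call shows the fragility: the caller has to know which model produced the text, which is exactly the knowledge the tokenizer already has.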
Nicolas Patry
2021-01-26 10:06:28 +01:00
committed by GitHub
parent 1867d9a8d7
commit 781e4b1384
2 changed files with 37 additions and 26 deletions

@@ -179,10 +179,10 @@ class FillMaskPipeline(Pipeline):
             tokens = tokens[np.where(tokens != self.tokenizer.pad_token_id)]
             result.append(
                 {
-                    "sequence": self.tokenizer.decode(tokens),
+                    "sequence": self.tokenizer.decode(tokens, skip_special_tokens=True),
                     "score": v,
                     "token": p,
-                    "token_str": self.tokenizer.convert_ids_to_tokens(p),
+                    "token_str": self.tokenizer.decode(p),
                 }
             )
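
The effect of the two changed lines can be sketched with a toy tokenizer. `ToyTokenizer` below is a hypothetical stand-in written for this example, not the transformers implementation; it only mimics the three behaviours involved: `decode` with and without `skip_special_tokens`, and `convert_ids_to_tokens` returning the raw vocabulary entry (here with a `Ġ` leading-space marker, as byte-level BPE vocabularies use).

```python
class ToyTokenizer:
    """Hypothetical stand-in that mimics the calls touched by the diff."""

    def __init__(self):
        # Made-up vocabulary; ids 0-3 play the role of special tokens.
        self.vocab = ["<s>", "</s>", "<pad>", "<mask>", "Hello", "Ġworld"]
        self.special_ids = {0, 1, 2, 3}

    def convert_ids_to_tokens(self, token_id):
        # Returns the raw vocabulary entry, marker characters included.
        return self.vocab[token_id]

    def decode(self, ids, skip_special_tokens=False):
        # Accept a single id or a list of ids.
        if isinstance(ids, int):
            ids = [ids]
        pieces = [
            self.vocab[i]
            for i in ids
            if not (skip_special_tokens and i in self.special_ids)
        ]
        # Turn the leading-space marker back into a real space.
        return "".join(pieces).replace("Ġ", " ")


tok = ToyTokenizer()
ids = [0, 4, 5, 1]  # <s> Hello Ġworld </s>

old_sequence = tok.decode(ids)                            # before the change
new_sequence = tok.decode(ids, skip_special_tokens=True)  # after the change
old_token_str = tok.convert_ids_to_tokens(5)              # before the change
new_token_str = tok.decode(5)                             # after the change

print(repr(old_sequence), repr(new_sequence))
print(repr(old_token_str), repr(new_token_str))
```

Under these assumptions, `sequence` loses its `<s>`/`</s>` wrappers and `token_str` becomes readable text rather than a raw vocabulary entry, which is why callers comparing against the old strings would see a backward-incompatible change.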