Fix grammar in tokenizer_summary (#15614)
"to make ensure" is redundant.
@@ -219,7 +219,7 @@ equivalent to finding the symbol pair, whose probability divided by the probabil
 its second symbol is the greatest among all symbol pairs. *E.g.* `"u"`, followed by `"g"` would have only been
 merged if the probability of `"ug"` divided by `"u"`, `"g"` would have been greater than for any other symbol
 pair. Intuitively, WordPiece is slightly different to BPE in that it evaluates what it _loses_ by merging two symbols
-to make ensure it's _worth it_.
+to ensure it's _worth it_.

 <a id='unigram'></a>

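The context lines in this hunk describe the WordPiece merge criterion: merge the symbol pair whose probability, divided by the product of the probabilities of its first and second symbol, is the greatest. Below is a minimal Python sketch of that score. It is not the library's implementation. It assumes probabilities are estimated from raw counts over a small, made-up corpus of pre-split words, and the helpers `pair_scores` and `best_merge` are hypothetical names used only for illustration.

```python
from collections import Counter


def pair_scores(words):
    """Score each adjacent symbol pair (a, b) as P(ab) / (P(a) * P(b)).

    `words` maps a word, given as a tuple of symbols, to its corpus frequency.
    Assumption for this sketch: probabilities are plain relative frequencies.
    """
    symbol_counts = Counter()
    pair_counts = Counter()
    for symbols, freq in words.items():
        for s in symbols:
            symbol_counts[s] += freq
        # Count every adjacent pair inside the word, weighted by word frequency.
        for a, b in zip(symbols, symbols[1:]):
            pair_counts[(a, b)] += freq

    total_symbols = sum(symbol_counts.values())
    total_pairs = sum(pair_counts.values())
    scores = {}
    for (a, b), count in pair_counts.items():
        p_pair = count / total_pairs
        p_a = symbol_counts[a] / total_symbols
        p_b = symbol_counts[b] / total_symbols
        scores[(a, b)] = p_pair / (p_a * p_b)
    return scores


def best_merge(words):
    """Return the pair with the highest WordPiece-style score."""
    scores = pair_scores(words)
    return max(scores, key=scores.get)


# Hypothetical toy corpus: word (already split into symbols) -> frequency.
corpus = {
    ("h", "u", "g"): 10,
    ("p", "u", "g"): 5,
    ("p", "u", "n"): 12,
    ("b", "u", "n"): 4,
    ("h", "u", "g", "s"): 5,
}

print(best_merge(corpus))  # ('g', 's')
```

Because `"u"` occurs in every word, every pair containing it gets a low score, so the sketch picks `("g", "s")` even though `("u", "g")` is the most frequent pair (count 20) and would be the first merge under BPE's frequency criterion. That is the sense in which WordPiece weighs what it loses by merging two symbols.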