Update with sentence and token counts (#6546)

Jim Regan
2020-08-17 21:48:05 +01:00
committed by GitHub
parent 63144701ed
commit fb7330b30e


@@ -15,6 +15,8 @@ tags:
* Newscrawl 300k portion of the [Leipzig Corpora](https://wortschatz.uni-leipzig.de/en/download/irish)
* Private news corpus crawled with [Corpus Crawler](https://github.com/google/corpuscrawler)
(2,125,804 sentences, 47,419,062 tokens, as counted by `wc`)
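The counts above come from `wc`. Assuming the training text is stored one sentence per line, `wc -l` gives the sentence count and `wc -w` the whitespace-delimited token count. The sketch below is a rough Python equivalent; the function name `wc_counts` and the one-sentence-per-line layout are assumptions for illustration, not part of the original training setup.

```python
from __future__ import annotations


def wc_counts(path: str) -> tuple[int, int]:
    """Return (lines, words) roughly as `wc -l` and `wc -w` report them.

    Assumes one sentence per line, so the line count stands in for the
    sentence count. Lines are counted as "\n" characters, so a final line
    without a trailing newline is not counted, matching `wc -l`. Words are
    runs of non-whitespace characters as split by str.split(); handling of
    unusual Unicode whitespace may differ slightly from `wc`.
    """
    lines = 0
    words = 0
    # newline="" keeps line endings untranslated, so only real "\n" chars count
    with open(path, encoding="utf-8", newline="") as handle:
        for line in handle:
            lines += line.count("\n")
            words += len(line.split())
    return lines, words
```

For example, `wc_counts("corpus.txt")` (a hypothetical file name) would return the sentence and token totals in the same sense as the figures above.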
```
from transformers import pipeline
fill_mask = pipeline("fill-mask", model="jimregan/BERTreach", tokenizer="jimregan/BERTreach")