update with #s of sentences/tokens (#6546)
@@ -15,6 +15,8 @@ tags:
* Newscrawl 300k portion of the [Leipzig Corpora](https://wortschatz.uni-leipzig.de/en/download/irish)
* Private news corpus crawled with [Corpus Crawler](https://github.com/google/corpuscrawler)
(2125804 sentences, 47419062 tokens, as reckoned by wc)
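
For readers who want to reproduce counts of this kind, here is a minimal sketch of what `wc` measures. It assumes the corpus is stored one sentence per line and that a token is any whitespace-separated string; the commit does not say which `wc` options were used, so both are assumptions. The helper name `wc_counts` and the sample text are hypothetical.

```python
import io


def wc_counts(stream):
    """Return (lines, words) for a text stream, roughly like `wc -l` and `wc -w`."""
    lines = 0
    words = 0
    for line in stream:
        # `wc -l` counts newline characters, so an unterminated last line is not counted.
        if line.endswith("\n"):
            lines += 1
        # Assumption: a token is a maximal run of non-whitespace characters.
        words += len(line.split())
    return lines, words


# Hypothetical two-sentence sample, one sentence per line, pre-tokenised.
sample = io.StringIO("Tá an aimsir go maith inniu .\nIs maith liom tae .\n")
sentences, tokens = wc_counts(sample)
print(sentences, tokens)
```

Note that `str.split()` splits on any Unicode whitespace, whereas `wc -w` depends on the locale, so totals on real data may differ slightly from the figures above.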
```
from transformers import pipeline
fill_mask = pipeline("fill-mask", model="jimregan/BERTreach", tokenizer="jimregan/BERTreach")