From 73a0c25376aa1c4368f48c6220e90f5a6b3af13c Mon Sep 17 00:00:00 2001
From: Aleksei Lymar
Date: Thu, 5 Mar 2020 17:57:53 +0300
Subject: [PATCH] remove excess line breaks in DeepPavlov model cards

---
 .../bert-base-bg-cs-pl-ru-cased/README.md      |  8 ++------
 .../bert-base-cased-conversational/README.md   | 14 ++++----------
 .../README.md                                  | 15 ++++-----------
 .../rubert-base-cased-conversational/README.md | 11 +++--------
 .../rubert-base-cased-sentence/README.md       | 14 ++++----------
 .../DeepPavlov/rubert-base-cased/README.md     |  7 ++-----
 6 files changed, 19 insertions(+), 50 deletions(-)

diff --git a/model_cards/DeepPavlov/bert-base-bg-cs-pl-ru-cased/README.md b/model_cards/DeepPavlov/bert-base-bg-cs-pl-ru-cased/README.md
index 7e4aa0c461..c97a86e9cf 100644
--- a/model_cards/DeepPavlov/bert-base-bg-cs-pl-ru-cased/README.md
+++ b/model_cards/DeepPavlov/bert-base-bg-cs-pl-ru-cased/README.md
@@ -8,11 +8,7 @@ language:
 
 # bert-base-bg-cs-pl-ru-cased
 
-SlavicBERT\[1\] \(Slavic \(bg, cs, pl, ru\), cased, 12-layer, 768-hidden, 12-heads, 180M parameters\) was trained
-on Russian News and four Wikipedias: Bulgarian, Czech, Polish, and Russian.
-Subtoken vocabulary was built using this data. Multilingual BERT was used as an initialization for SlavicBERT.
+SlavicBERT\[1\] \(Slavic \(bg, cs, pl, ru\), cased, 12‑layer, 768‑hidden, 12‑heads, 180M parameters\) was trained on Russian News and four Wikipedias: Bulgarian, Czech, Polish, and Russian. Subtoken vocabulary was built using this data. Multilingual BERT was used as an initialization for SlavicBERT.
 
-\[1\]: Arkhipov M., Trofimova M., Kuratov Y., Sorokin A. \(2019\).
-[Tuning Multilingual Transformers for Language-Specific Named Entity Recognition](https://www.aclweb.org/anthology/W19-3712/).
-ACL anthology W19-3712.
+\[1\]: Arkhipov M., Trofimova M., Kuratov Y., Sorokin A. \(2019\). [Tuning Multilingual Transformers for Language-Specific Named Entity Recognition](https://www.aclweb.org/anthology/W19-3712/). ACL anthology W19-3712.
 
diff --git a/model_cards/DeepPavlov/bert-base-cased-conversational/README.md b/model_cards/DeepPavlov/bert-base-cased-conversational/README.md
index 357527d232..a8fab25961 100644
--- a/model_cards/DeepPavlov/bert-base-cased-conversational/README.md
+++ b/model_cards/DeepPavlov/bert-base-cased-conversational/README.md
@@ -5,19 +5,13 @@ language:
 
 # bert-base-cased-conversational
 
-Conversational BERT \(English, cased, 12-layer, 768-hidden, 12-heads, 110M parameters\) was trained
-on the English part of Twitter, Reddit, DailyDialogues\[1\], OpenSubtitles\[2\], Debates\[3\], Blogs\[4\],
-Facebook News Comments. We used this training data to build the vocabulary of English subtokens and took
-English cased version of BERT-base as an initialization for English Conversational BERT.
+Conversational BERT \(English, cased, 12‑layer, 768‑hidden, 12‑heads, 110M parameters\) was trained on the English part of Twitter, Reddit, DailyDialogues\[1\], OpenSubtitles\[2\], Debates\[3\], Blogs\[4\], Facebook News Comments. We used this training data to build the vocabulary of English subtokens and took English cased version of BERT‑base as an initialization for English Conversational BERT.
 
-\[1\]: Yanran Li, Hui Su, Xiaoyu Shen, Wenjie Li, Ziqiang Cao, and Shuzi Niu. DailyDialog: A Manually Labelled
-Multi-turn Dialogue Dataset. IJCNLP 2017.
+\[1\]: Yanran Li, Hui Su, Xiaoyu Shen, Wenjie Li, Ziqiang Cao, and Shuzi Niu. DailyDialog: A Manually Labelled Multi-turn Dialogue Dataset. IJCNLP 2017.
 
-\[2\]: P. Lison and J. Tiedemann, 2016, OpenSubtitles2016: Extracting Large Parallel Corpora from Movie and TV Subtitles.
-In Proceedings of the 10th International Conference on Language Resources and Evaluation \(LREC 2016\)
+\[2\]: P. Lison and J. Tiedemann, 2016, OpenSubtitles2016: Extracting Large Parallel Corpora from Movie and TV Subtitles. In Proceedings of the 10th International Conference on Language Resources and Evaluation \(LREC 2016\)
 
 \[3\]: Justine Zhang, Ravi Kumar, Sujith Ravi, Cristian Danescu-Niculescu-Mizil. Proceedings of NAACL, 2016.
 
-\[4\]: J. Schler, M. Koppel, S. Argamon and J. Pennebaker \(2006\). Effects of Age and Gender on Blogging
-in Proceedings of 2006 AAAI Spring Symposium on Computational Approaches for Analyzing Weblogs.
+\[4\]: J. Schler, M. Koppel, S. Argamon and J. Pennebaker \(2006\). Effects of Age and Gender on Blogging in Proceedings of 2006 AAAI Spring Symposium on Computational Approaches for Analyzing Weblogs.
 
diff --git a/model_cards/DeepPavlov/bert-base-multilingual-cased-sentence/README.md b/model_cards/DeepPavlov/bert-base-multilingual-cased-sentence/README.md
index 1e07210e77..e8d22dff30 100644
--- a/model_cards/DeepPavlov/bert-base-multilingual-cased-sentence/README.md
+++ b/model_cards/DeepPavlov/bert-base-multilingual-cased-sentence/README.md
@@ -5,18 +5,11 @@ language:
 
 # bert-base-multilingual-cased-sentence
 
-Sentence Multilingual BERT \(101 languages, cased, 12-layer, 768-hidden, 12-heads, 180M parameters\)
-is a representation-based sentence encoder for 101 languages of Multilingual BERT.
-It is initialized with Multilingual BERT and then fine-tuned on english MultiNLI\[1\] and on dev set
-of multilingual XNLI\[2\].
-Sentence representations are mean pooled token embeddings in the same manner as in Sentence-BERT\[3\].
+Sentence Multilingual BERT \(101 languages, cased, 12‑layer, 768‑hidden, 12‑heads, 180M parameters\) is a representation‑based sentence encoder for 101 languages of Multilingual BERT. It is initialized with Multilingual BERT and then fine‑tuned on english MultiNLI\[1\] and on dev set of multilingual XNLI\[2\]. Sentence representations are mean pooled token embeddings in the same manner as in Sentence‑BERT\[3\].
 
-\[1\]: Williams A., Nangia N. & Bowman S. \(2017\) A Broad-Coverage Challenge Corpus for Sentence Understanding
-through Inference. arXiv preprint [arXiv:1704.05426](https://arxiv.org/abs/1704.05426)
+\[1\]: Williams A., Nangia N. & Bowman S. \(2017\) A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference. arXiv preprint [arXiv:1704.05426](https://arxiv.org/abs/1704.05426)
 
-\[2\]: Williams A., Bowman S. \(2018\) XNLI: Evaluating Cross-lingual Sentence Representations.
-arXiv preprint [arXiv:1809.05053](https://arxiv.org/abs/1809.05053)
+\[2\]: Williams A., Bowman S. \(2018\) XNLI: Evaluating Cross-lingual Sentence Representations. arXiv preprint [arXiv:1809.05053](https://arxiv.org/abs/1809.05053)
 
-\[3\]: N. Reimers, I. Gurevych \(2019\) Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks.
-arXiv preprint [arXiv:1908.10084](https://arxiv.org/abs/1908.10084)
+\[3\]: N. Reimers, I. Gurevych \(2019\) Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. arXiv preprint [arXiv:1908.10084](https://arxiv.org/abs/1908.10084)
 
diff --git a/model_cards/DeepPavlov/rubert-base-cased-conversational/README.md b/model_cards/DeepPavlov/rubert-base-cased-conversational/README.md
index 4ea20c2cd1..f0a2d211cf 100644
--- a/model_cards/DeepPavlov/rubert-base-cased-conversational/README.md
+++ b/model_cards/DeepPavlov/rubert-base-cased-conversational/README.md
@@ -5,14 +5,9 @@ language:
 
 # rubert-base-cased-conversational
 
-Conversational RuBERT \(Russian, cased, 12-layer, 768-hidden, 12-heads, 180M parameters\) was trained
-on OpenSubtitles\[1\], [Dirty](https://d3.ru/), [Pikabu](https://pikabu.ru/),
-and a Social Media segment of Taiga corpus\[2\]. We assembled a new vocabulary for Conversational RuBERT model
-on this data and initialized the model with [RuBERT](../rubert-base-cased).
+Conversational RuBERT \(Russian, cased, 12‑layer, 768‑hidden, 12‑heads, 180M parameters\) was trained on OpenSubtitles\[1\], [Dirty](https://d3.ru/), [Pikabu](https://pikabu.ru/), and a Social Media segment of Taiga corpus\[2\]. We assembled a new vocabulary for Conversational RuBERT model on this data and initialized the model with [RuBERT](../rubert-base-cased).
 
-\[1\]: P. Lison and J. Tiedemann, 2016, OpenSubtitles2016: Extracting Large Parallel Corpora from Movie and TV Subtitles.
-In Proceedings of the 10th International Conference on Language Resources and Evaluation \(LREC 2016\)
+\[1\]: P. Lison and J. Tiedemann, 2016, OpenSubtitles2016: Extracting Large Parallel Corpora from Movie and TV Subtitles. In Proceedings of the 10th International Conference on Language Resources and Evaluation \(LREC 2016\)
 
-\[2\]: Shavrina T., Shapovalova O. \(2017\) TO THE METHODOLOGY OF CORPUS CONSTRUCTION FOR MACHINE LEARNING:
-«TAIGA» SYNTAX TREE CORPUS AND PARSER. in proc. of “CORPORA2017”, international conference , Saint-Petersbourg, 2017.
+\[2\]: Shavrina T., Shapovalova O. \(2017\) TO THE METHODOLOGY OF CORPUS CONSTRUCTION FOR MACHINE LEARNING: «TAIGA» SYNTAX TREE CORPUS AND PARSER. in proc. of “CORPORA2017”, international conference , Saint-Petersbourg, 2017.
 
diff --git a/model_cards/DeepPavlov/rubert-base-cased-sentence/README.md b/model_cards/DeepPavlov/rubert-base-cased-sentence/README.md
index 9bac38460f..50a7a85f28 100644
--- a/model_cards/DeepPavlov/rubert-base-cased-sentence/README.md
+++ b/model_cards/DeepPavlov/rubert-base-cased-sentence/README.md
@@ -5,17 +5,11 @@ language:
 
 # rubert-base-cased-sentence
 
-Sentence RuBERT \(Russian, cased, 12-layer, 768-hidden, 12-heads, 180M parameters\)
-is a representation-based sentence encoder for Russian. It is initialized with RuBERT and fine-tuned on SNLI\[1\]
-google-translated to russian and on russian part of XNLI dev set\[2\]. Sentence representations are mean pooled
-token embeddings in the same manner as in Sentence-BERT\[3\].
+Sentence RuBERT \(Russian, cased, 12-layer, 768-hidden, 12-heads, 180M parameters\) is a representation‑based sentence encoder for Russian. It is initialized with RuBERT and fine‑tuned on SNLI\[1\] google-translated to russian and on russian part of XNLI dev set\[2\]. Sentence representations are mean pooled token embeddings in the same manner as in Sentence‑BERT\[3\].
 
-\[1\]: S. R. Bowman, G. Angeli, C. Potts, and C. D. Manning. \(2015\) A large annotated corpus for learning
-natural language inference. arXiv preprint [arXiv:1508.05326](https://arxiv.org/abs/1508.05326)
+\[1\]: S. R. Bowman, G. Angeli, C. Potts, and C. D. Manning. \(2015\) A large annotated corpus for learning natural language inference. arXiv preprint [arXiv:1508.05326](https://arxiv.org/abs/1508.05326)
 
-\[2\]: Williams A., Bowman S. \(2018\) XNLI: Evaluating Cross-lingual Sentence Representations.
-arXiv preprint [arXiv:1809.05053](https://arxiv.org/abs/1809.05053)
+\[2\]: Williams A., Bowman S. \(2018\) XNLI: Evaluating Cross-lingual Sentence Representations. arXiv preprint [arXiv:1809.05053](https://arxiv.org/abs/1809.05053)
 
-\[3\]: N. Reimers, I. Gurevych \(2019\) Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks.
-arXiv preprint [arXiv:1908.10084](https://arxiv.org/abs/1908.10084)
+\[3\]: N. Reimers, I. Gurevych \(2019\) Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. arXiv preprint [arXiv:1908.10084](https://arxiv.org/abs/1908.10084)
 
diff --git a/model_cards/DeepPavlov/rubert-base-cased/README.md b/model_cards/DeepPavlov/rubert-base-cased/README.md
index 36e12cdeff..39a32a8c5a 100644
--- a/model_cards/DeepPavlov/rubert-base-cased/README.md
+++ b/model_cards/DeepPavlov/rubert-base-cased/README.md
@@ -5,10 +5,7 @@ language:
 
 # rubert-base-cased
 
-RuBERT \(Russian, cased, 12-layer, 768-hidden, 12-heads, 180M parameters\) was trained on the Russian part of Wikipedia
-and news data. We used this training data to build a vocabulary of Russian subtokens and took a multilingual version
-of BERT-base as an initialization for RuBERT\[1\].
+RuBERT \(Russian, cased, 12‑layer, 768‑hidden, 12‑heads, 180M parameters\) was trained on the Russian part of Wikipedia and news data. We used this training data to build a vocabulary of Russian subtokens and took a multilingual version of BERT‑base as an initialization for RuBERT\[1\].
 
-\[1\]: Kuratov, Y., Arkhipov, M. \(2019\). Adaptation of Deep Bidirectional Multilingual Transformers for Russian Language.
-arXiv preprint [arXiv:1905.07213](https://arxiv.org/abs/1905.07213).
+\[1\]: Kuratov, Y., Arkhipov, M. \(2019\). Adaptation of Deep Bidirectional Multilingual Transformers for Russian Language. arXiv preprint [arXiv:1905.07213](https://arxiv.org/abs/1905.07213).
 