remove excess line breaks in DeepPavlov model cards

2020-03-05 17:57:53 +03:00
parent ed37f9fa4f
commit 73a0c25376
6 changed files with 19 additions and 50 deletions
--- a/model_cards/DeepPavlov/bert-base-bg-cs-pl-ru-cased/README.md
+++ b/model_cards/DeepPavlov/bert-base-bg-cs-pl-ru-cased/README.md
@@ -8,11 +8,7 @@ language:
 # bert-base-bg-cs-pl-ru-cased
-SlavicBERT\[1\] \(Slavic \(bg, cs, pl, ru\), cased, 12-layer, 768-hidden, 12-heads, 180M parameters\) was trained
-on Russian News and four Wikipedias: Bulgarian, Czech, Polish, and Russian.
-Subtoken vocabulary was built using this data. Multilingual BERT was used as an initialization for SlavicBERT.
+SlavicBERT\[1\] \(Slavic \(bg, cs, pl, ru\), cased, 12‑layer, 768‑hidden, 12‑heads, 180M parameters\) was trained on Russian News and four Wikipedias: Bulgarian, Czech, Polish, and Russian. Subtoken vocabulary was built using this data. Multilingual BERT was used as an initialization for SlavicBERT.
-\[1\]: Arkhipov M., Trofimova M., Kuratov Y., Sorokin A. \(2019\).
-[Tuning Multilingual Transformers for Language-Specific Named Entity Recognition](https://www.aclweb.org/anthology/W19-3712/).
-ACL anthology W19-3712.
+\[1\]: Arkhipov M., Trofimova M., Kuratov Y., Sorokin A. \(2019\). [Tuning Multilingual Transformers for Language-Specific Named Entity Recognition](https://www.aclweb.org/anthology/W19-3712/). ACL anthology W19-3712.
--- a/model_cards/DeepPavlov/bert-base-cased-conversational/README.md
+++ b/model_cards/DeepPavlov/bert-base-cased-conversational/README.md
@@ -5,19 +5,13 @@ language:
 # bert-base-cased-conversational
-Conversational BERT \(English, cased, 12-layer, 768-hidden, 12-heads, 110M parameters\) was trained
-on the English part of Twitter, Reddit, DailyDialogues\[1\], OpenSubtitles\[2\], Debates\[3\], Blogs\[4\],
-Facebook News Comments. We used this training data to build the vocabulary of English subtokens and took
-English cased version of BERT-base as an initialization for English Conversational BERT.
+Conversational BERT \(English, cased, 12‑layer, 768‑hidden, 12‑heads, 110M parameters\) was trained on the English part of Twitter, Reddit, DailyDialogues\[1\], OpenSubtitles\[2\], Debates\[3\], Blogs\[4\], Facebook News Comments. We used this training data to build the vocabulary of English subtokens and took English cased version of BERT‑base as an initialization for English Conversational BERT.
-\[1\]: Yanran Li, Hui Su, Xiaoyu Shen, Wenjie Li, Ziqiang Cao, and Shuzi Niu. DailyDialog: A Manually Labelled
-Multi-turn Dialogue Dataset. IJCNLP 2017.
+\[1\]: Yanran Li, Hui Su, Xiaoyu Shen, Wenjie Li, Ziqiang Cao, and Shuzi Niu. DailyDialog: A Manually Labelled Multi-turn Dialogue Dataset. IJCNLP 2017.
-\[2\]: P. Lison and J. Tiedemann, 2016, OpenSubtitles2016: Extracting Large Parallel Corpora from Movie and TV Subtitles.
-In Proceedings of the 10th International Conference on Language Resources and Evaluation \(LREC 2016\)
+\[2\]: P. Lison and J. Tiedemann, 2016, OpenSubtitles2016: Extracting Large Parallel Corpora from Movie and TV Subtitles. In Proceedings of the 10th International Conference on Language Resources and Evaluation \(LREC 2016\)
 \[3\]: Justine Zhang, Ravi Kumar, Sujith Ravi, Cristian Danescu-Niculescu-Mizil. Proceedings of NAACL, 2016.
-\[4\]: J. Schler, M. Koppel, S. Argamon and J. Pennebaker \(2006\). Effects of Age and Gender on Blogging
-in Proceedings of 2006 AAAI Spring Symposium on Computational Approaches for Analyzing Weblogs.
+\[4\]: J. Schler, M. Koppel, S. Argamon and J. Pennebaker \(2006\). Effects of Age and Gender on Blogging in Proceedings of 2006 AAAI Spring Symposium on Computational Approaches for Analyzing Weblogs.
--- a/model_cards/DeepPavlov/bert-base-multilingual-cased-sentence/README.md
+++ b/model_cards/DeepPavlov/bert-base-multilingual-cased-sentence/README.md
@@ -5,18 +5,11 @@ language:
 # bert-base-multilingual-cased-sentence
-Sentence Multilingual BERT \(101 languages, cased, 12-layer, 768-hidden, 12-heads, 180M parameters\)
-is a representation-based sentence encoder for 101 languages of Multilingual BERT.
-It is initialized with Multilingual BERT and then fine-tuned on english MultiNLI\[1\] and on dev set
-of multilingual XNLI\[2\].
-Sentence representations are mean pooled token embeddings in the same manner as in Sentence-BERT\[3\].
+Sentence Multilingual BERT \(101 languages, cased, 12‑layer, 768‑hidden, 12‑heads, 180M parameters\) is a representation‑based sentence encoder for 101 languages of Multilingual BERT. It is initialized with Multilingual BERT and then fine‑tuned on english MultiNLI\[1\] and on dev set of multilingual XNLI\[2\]. Sentence representations are mean pooled token embeddings in the same manner as in Sentence‑BERT\[3\].
-\[1\]: Williams A., Nangia N. & Bowman S. \(2017\) A Broad-Coverage Challenge Corpus for Sentence Understanding
-through Inference. arXiv preprint [arXiv:1704.05426](https://arxiv.org/abs/1704.05426)
+\[1\]: Williams A., Nangia N. & Bowman S. \(2017\) A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference. arXiv preprint [arXiv:1704.05426](https://arxiv.org/abs/1704.05426)
-\[2\]: Williams A., Bowman S. \(2018\) XNLI: Evaluating Cross-lingual Sentence Representations.
-arXiv preprint [arXiv:1809.05053](https://arxiv.org/abs/1809.05053)
+\[2\]: Williams A., Bowman S. \(2018\) XNLI: Evaluating Cross-lingual Sentence Representations. arXiv preprint [arXiv:1809.05053](https://arxiv.org/abs/1809.05053)
-\[3\]: N. Reimers, I. Gurevych \(2019\) Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks.
-arXiv preprint [arXiv:1908.10084](https://arxiv.org/abs/1908.10084)
+\[3\]: N. Reimers, I. Gurevych \(2019\) Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. arXiv preprint [arXiv:1908.10084](https://arxiv.org/abs/1908.10084)
--- a/model_cards/DeepPavlov/rubert-base-cased-conversational/README.md
+++ b/model_cards/DeepPavlov/rubert-base-cased-conversational/README.md
@@ -5,14 +5,9 @@ language:
 # rubert-base-cased-conversational
-Conversational RuBERT \(Russian, cased, 12-layer, 768-hidden, 12-heads, 180M parameters\) was trained
-on OpenSubtitles\[1\], [Dirty](https://d3.ru/), [Pikabu](https://pikabu.ru/),
-and a Social Media segment of Taiga corpus\[2\]. We assembled a new vocabulary for Conversational RuBERT model
-on this data and initialized the model with [RuBERT](../rubert-base-cased).
+Conversational RuBERT \(Russian, cased, 12‑layer, 768‑hidden, 12‑heads, 180M parameters\) was trained on OpenSubtitles\[1\], [Dirty](https://d3.ru/), [Pikabu](https://pikabu.ru/), and a Social Media segment of Taiga corpus\[2\]. We assembled a new vocabulary for Conversational RuBERT model on this data and initialized the model with [RuBERT](../rubert-base-cased).
-\[1\]: P. Lison and J. Tiedemann, 2016, OpenSubtitles2016: Extracting Large Parallel Corpora from Movie and TV Subtitles.
-In Proceedings of the 10th International Conference on Language Resources and Evaluation \(LREC 2016\)
+\[1\]: P. Lison and J. Tiedemann, 2016, OpenSubtitles2016: Extracting Large Parallel Corpora from Movie and TV Subtitles. In Proceedings of the 10th International Conference on Language Resources and Evaluation \(LREC 2016\)
-\[2\]: Shavrina T., Shapovalova O. \(2017\) TO THE METHODOLOGY OF CORPUS CONSTRUCTION FOR MACHINE LEARNING:
-«TAIGA» SYNTAX TREE CORPUS AND PARSER. in proc. of “CORPORA2017”, international conference , Saint-Petersbourg, 2017.
+\[2\]: Shavrina T., Shapovalova O. \(2017\) TO THE METHODOLOGY OF CORPUS CONSTRUCTION FOR MACHINE LEARNING: «TAIGA» SYNTAX TREE CORPUS AND PARSER. in proc. of “CORPORA2017”, international conference , Saint-Petersbourg, 2017.
--- a/model_cards/DeepPavlov/rubert-base-cased-sentence/README.md
+++ b/model_cards/DeepPavlov/rubert-base-cased-sentence/README.md
@@ -5,17 +5,11 @@ language:
 # rubert-base-cased-sentence
-Sentence RuBERT \(Russian, cased, 12-layer, 768-hidden, 12-heads, 180M parameters\)
-is a representation-based sentence encoder for Russian. It is initialized with RuBERT and fine-tuned on SNLI\[1\]
-google-translated to russian and on russian part of XNLI dev set\[2\]. Sentence representations are mean pooled
-token embeddings in the same manner as in Sentence-BERT\[3\].
+Sentence RuBERT \(Russian, cased, 12-layer, 768-hidden, 12-heads, 180M parameters\) is a representation‑based sentence encoder for Russian. It is initialized with RuBERT and fine‑tuned on SNLI\[1\] google-translated to russian and on russian part of XNLI dev set\[2\]. Sentence representations are mean pooled token embeddings in the same manner as in Sentence‑BERT\[3\].
-\[1\]: S. R. Bowman, G. Angeli, C. Potts, and C. D. Manning. \(2015\) A large annotated corpus for learning
-natural language inference. arXiv preprint [arXiv:1508.05326](https://arxiv.org/abs/1508.05326)
+\[1\]: S. R. Bowman, G. Angeli, C. Potts, and C. D. Manning. \(2015\) A large annotated corpus for learning natural language inference. arXiv preprint [arXiv:1508.05326](https://arxiv.org/abs/1508.05326)
-\[2\]: Williams A., Bowman S. \(2018\) XNLI: Evaluating Cross-lingual Sentence Representations.
-arXiv preprint [arXiv:1809.05053](https://arxiv.org/abs/1809.05053)
+\[2\]: Williams A., Bowman S. \(2018\) XNLI: Evaluating Cross-lingual Sentence Representations. arXiv preprint [arXiv:1809.05053](https://arxiv.org/abs/1809.05053)
-\[3\]: N. Reimers, I. Gurevych \(2019\) Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks.
-arXiv preprint [arXiv:1908.10084](https://arxiv.org/abs/1908.10084)
+\[3\]: N. Reimers, I. Gurevych \(2019\) Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. arXiv preprint [arXiv:1908.10084](https://arxiv.org/abs/1908.10084)
--- a/model_cards/DeepPavlov/rubert-base-cased/README.md
+++ b/model_cards/DeepPavlov/rubert-base-cased/README.md
@@ -5,10 +5,7 @@ language:
 # rubert-base-cased
-RuBERT \(Russian, cased, 12-layer, 768-hidden, 12-heads, 180M parameters\) was trained on the Russian part of Wikipedia
-and news data. We used this training data to build a vocabulary of Russian subtokens and took a multilingual version
-of BERT-base as an initialization for RuBERT\[1\].
+RuBERT \(Russian, cased, 12‑layer, 768‑hidden, 12‑heads, 180M parameters\) was trained on the Russian part of Wikipedia and news data. We used this training data to build a vocabulary of Russian subtokens and took a multilingual version of BERT‑base as an initialization for RuBERT\[1\].
-\[1\]: Kuratov, Y., Arkhipov, M. \(2019\). Adaptation of Deep Bidirectional Multilingual Transformers for Russian Language.
-arXiv preprint [arXiv:1905.07213](https://arxiv.org/abs/1905.07213).
+\[1\]: Kuratov, Y., Arkhipov, M. \(2019\). Adaptation of Deep Bidirectional Multilingual Transformers for Russian Language. arXiv preprint [arXiv:1905.07213](https://arxiv.org/abs/1905.07213).
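
Note on the two sentence-encoder cards touched by this commit (bert-base-multilingual-cased-sentence and rubert-base-cased-sentence): both describe sentence representations as mean pooled token embeddings. A minimal sketch of that pooling step in plain Python follows. The function name `mean_pool`, the toy vectors, and the use of an attention mask to skip padding tokens are assumptions for illustration; the cards do not say how padding is handled, and this is not the DeepPavlov implementation.

```python
from typing import List, Sequence


def mean_pool(
    token_embeddings: Sequence[Sequence[float]],
    attention_mask: Sequence[int],
) -> List[float]:
    """Average the embeddings of unmasked tokens into one sentence vector.

    Hypothetical helper: skipping masked (padding) positions is an assumption,
    not something the model cards state.
    """
    if len(token_embeddings) != len(attention_mask):
        raise ValueError("embeddings and mask must have the same length")

    # Keep only the token vectors whose mask value is non-zero.
    kept = [vec for vec, m in zip(token_embeddings, attention_mask) if m]
    if not kept:
        raise ValueError("no unmasked tokens to pool")

    dim = len(kept[0])
    # Component-wise mean over the kept token vectors.
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Toy example: three 2-dimensional token vectors, the last one is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
print(mean_pool(tokens, mask))  # [2.0, 3.0]
```

With a real encoder, the token vectors would come from the model's last hidden layer (768 values per token for the models listed here) rather than from a hand-written list.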