From e3c55ceb8d78412919c1c2c0dc511a7c492e907d Mon Sep 17 00:00:00 2001
From: David Mark Nemeskey <nemeskeyd@gmail.com>
Date: Wed, 2 Sep 2020 10:50:10 +0200
Subject: [PATCH] Model card for huBERT (#6893)

* Create README.md

Model card for huBERT.

* Update README.md

lowercase h

* Update model_cards/SZTAKI-HLT/hubert-base-cc/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
---
 .../SZTAKI-HLT/hubert-base-cc/README.md       | 43 +++++++++++++++++++
 1 file changed, 43 insertions(+)
 create mode 100644 model_cards/SZTAKI-HLT/hubert-base-cc/README.md

diff --git a/model_cards/SZTAKI-HLT/hubert-base-cc/README.md b/model_cards/SZTAKI-HLT/hubert-base-cc/README.md
new file mode 100644
index 0000000000..8ecfd1fc95
--- /dev/null
+++ b/model_cards/SZTAKI-HLT/hubert-base-cc/README.md
@@ -0,0 +1,43 @@
+---
+language: hu
+license: apache-2.0
+datasets:
+- common_crawl
+- wikipedia
+---
+
+# huBERT base model (cased)
+
+## Model description
+
+Cased BERT model for Hungarian, trained on the (filtered, deduplicated) Hungarian subset of the Common Crawl and a snapshot of the Hungarian Wikipedia.
+
+## Intended uses & limitations
+
+The model can be used like any other (cased) BERT model. It has been tested on the chunking and
+named entity recognition tasks and set a new state of the art on the former.
+
+## Training
+
+Details of the training data and procedure can be found in the PhD thesis cited below, with the caveat that it only contains preliminary results
+based on the Wikipedia subcorpus. Evaluation of the full model will appear in a future paper.
+
+## Eval results
+
+When fine-tuned (via `BertForTokenClassification`) on chunking and NER, the model outperforms multilingual BERT, achieves state-of-the-art results on the
+former task and comes within 0.5% F1 of the SotA on the latter. The exact scores are:
+
+| NER | Minimal NP | Maximal NP |
+|-----|------------|------------|
+| 97.62% | **97.14%** | **96.97%** |
+
+### BibTeX entry and citation info
+
+```bibtex
+@PhDThesis{ Nemeskey:2020,
+  author = {Nemeskey, Dávid Márk},
+  title  = {Natural Language Processing Methods for Language Modeling},
+  year   = {2020},
+  school = {E\"otv\"os Lor\'and University}
+}
+```