From 14911e2e12bdd04cd159c017c093f7cf5124668f Mon Sep 17 00:00:00 2001
From: Zhen Wang
Date: Sat, 2 May 2020 22:52:12 +0800
Subject: [PATCH] Create README.md (#4112)

---
 .../wptoux/albert-chinese-large-qa/README.md  | 34 +++++++++++++++++++
 1 file changed, 34 insertions(+)
 create mode 100644 model_cards/wptoux/albert-chinese-large-qa/README.md

diff --git a/model_cards/wptoux/albert-chinese-large-qa/README.md b/model_cards/wptoux/albert-chinese-large-qa/README.md
new file mode 100644
index 0000000000..cf43f31714
--- /dev/null
+++ b/model_cards/wptoux/albert-chinese-large-qa/README.md
@@ -0,0 +1,34 @@
+# albert-chinese-large-qa
+ALBERT large QA model fine-tuned on the Baidu WebQA and Baidu DuReader datasets.
+
+## Data source
++ Baidu WebQA 1.0
++ Baidu DuReader
+
+## Training Method
+We combined the two datasets into a new dataset in SQuAD format, with 705139 samples for training and 69638 samples for validation.
+We fine-tuned the model starting from the ALBERT Chinese large model.
+
+## Hyperparams
++ learning_rate 1e-5
++ max_seq_length 512
++ max_query_length 50
++ max_answer_length 300
++ doc_stride 256
++ num_train_epochs 2
++ warmup_steps 1000
++ per_gpu_train_batch_size 8
++ gradient_accumulation_steps 3
++ n_gpu 2 (Nvidia Tesla P100)
+
+## Usage
+```
+from transformers import AutoModelForQuestionAnswering, BertTokenizer
+
+model = AutoModelForQuestionAnswering.from_pretrained('wptoux/albert-chinese-large-qa')
+tokenizer = BertTokenizer.from_pretrained('wptoux/albert-chinese-large-qa')
+```
+***Important: use BertTokenizer***
+
+## More Info
+Please visit https://github.com/wptoux/albert-chinese-large-webqa for details.
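
The hyperparameters listed in the README imply an effective batch size and a rough optimizer step count that the model card does not spell out. The sketch below works through that arithmetic. It assumes data-parallel training in which each of the two GPUs processes its own batch, and it assumes one training feature per sample. The second assumption is only a lower bound: with `max_seq_length 512` and `doc_stride 256`, long documents are usually split into several overlapping windows, so the real step count was probably higher. The function names are illustrative and do not come from the training script.

```python
import math

# Values copied from the README's "Hyperparams" and training sections.
PER_GPU_TRAIN_BATCH_SIZE = 8
GRADIENT_ACCUMULATION_STEPS = 3
N_GPU = 2
NUM_TRAIN_EPOCHS = 2
WARMUP_STEPS = 1000
NUM_TRAIN_SAMPLES = 705139


def effective_batch_size(per_gpu: int, accumulation: int, n_gpu: int) -> int:
    """Number of examples contributing to a single optimizer update."""
    return per_gpu * accumulation * n_gpu


def approx_optimizer_steps(num_features: int, batch_size: int, epochs: int) -> int:
    """Approximate optimizer updates, rounding the last partial batch up.

    Assumption: num_features equals the number of samples, i.e. no document
    is split into multiple windows. Treat the result as a lower bound.
    """
    return math.ceil(num_features / batch_size) * epochs


batch = effective_batch_size(
    PER_GPU_TRAIN_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS, N_GPU
)
total_steps = approx_optimizer_steps(NUM_TRAIN_SAMPLES, batch, NUM_TRAIN_EPOCHS)
warmup_fraction = WARMUP_STEPS / total_steps

print(f"effective batch size: {batch}")
print(f"approximate optimizer steps: {total_steps}")
print(f"warmup fraction: {warmup_fraction:.1%}")
```

Under these assumptions the 1000 warmup steps cover only a few percent of training, so the learning rate spends most of the run at or decaying from its peak of `1e-5`.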