From f9ac677eba999a1847314289e39ce14db3e8cece Mon Sep 17 00:00:00 2001
From: Matt
Date: Wed, 14 Jul 2021 15:15:25 +0100
Subject: [PATCH] Update TF examples README (#12703)

* Update Transformers README, rename token_classification example to token-classification to be consistent with the others

* Update examples/tensorflow/README.md

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Add README for TF token classification

* Update examples/tensorflow/token-classification/README.md

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update examples/tensorflow/token-classification/README.md

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
---
 examples/tensorflow/README.md                 | 43 ++++++++---------
 .../tensorflow/token-classification/README.md | 47 +++++++++++++++++++
 .../run_ner.py                                |  0
 3 files changed, 69 insertions(+), 21 deletions(-)
 create mode 100644 examples/tensorflow/token-classification/README.md
 rename examples/tensorflow/{token_classification => token-classification}/run_ner.py (100%)

diff --git a/examples/tensorflow/README.md b/examples/tensorflow/README.md
index 2953a5d11b..f665a8cb89 100644
--- a/examples/tensorflow/README.md
+++ b/examples/tensorflow/README.md
@@ -1,5 +1,5 @@
-| Task | Example datasets | Keras support | 🤗 Datasets | Colab
-|---|---|:---:|:---:|:---:|
-| **`language-modeling`** | WikiText-2 | - | - | -
-| [**`multiple-choice`**](https://github.com/huggingface/transformers/tree/master/examples/tensorflow/multiple-choice) | SWAG | - | - | -
-| [**`question-answering`**](https://github.com/huggingface/transformers/tree/master/examples/tensorflow/question-answering) | SQuAD | - | - | -
-| **`summarization`** | XSum | - | - | -
-| [**`text-classification`**](https://github.com/huggingface/transformers/tree/master/examples/tensorflow/text-classification) | GLUE | - | - | -
-| **`text-generation`** | n/a | - | n/a | -
-| **`token-classification`** | CoNLL NER | - | - | -
-| **`translation`** | WMT | - | - | -
+| Task | Example datasets |
+|---|---|
+| [**`language-modeling`**](https://github.com/huggingface/transformers/tree/master/examples/tensorflow/language-modeling) | WikiText-2
+| [**`multiple-choice`**](https://github.com/huggingface/transformers/tree/master/examples/tensorflow/multiple-choice) | SWAG
+| [**`question-answering`**](https://github.com/huggingface/transformers/tree/master/examples/tensorflow/question-answering) | SQuAD
+| [**`summarization`**](https://github.com/huggingface/transformers/tree/master/examples/tensorflow/summarization) | XSum
+| [**`text-classification`**](https://github.com/huggingface/transformers/tree/master/examples/tensorflow/text-classification) | GLUE
+| [**`token-classification`**](https://github.com/huggingface/transformers/tree/master/examples/tensorflow/token-classification) | CoNLL NER
+| [**`translation`**](https://github.com/huggingface/transformers/tree/master/examples/tensorflow/translation) | WMT
+
+## Coming soon
+
+- **Colab notebooks** to easily run through these scripts!
diff --git a/examples/tensorflow/token-classification/README.md b/examples/tensorflow/token-classification/README.md
new file mode 100644
index 0000000000..0e5ec84528
--- /dev/null
+++ b/examples/tensorflow/token-classification/README.md
@@ -0,0 +1,47 @@
+
+
+# Token classification
+
+Fine-tuning the library models for token classification task such as Named Entity Recognition (NER), Parts-of-speech
+tagging (POS) or phrase extraction (CHUNKS). The main script `run_ner.py` leverages the [🤗 Datasets](https://github.com/huggingface/datasets) library. You can easily
+customize it to your needs if you need extra processing on your datasets.
+
+It will either run on a datasets hosted on our [hub](https://huggingface.co/datasets) or with your own text files for
+training and validation, you might just need to add some tweaks in the data preprocessing.
+
+The following example fine-tunes BERT on CoNLL-2003:
+
+```bash
+python run_ner.py \
+  --model_name_or_path bert-base-uncased \
+  --dataset_name conll2003 \
+  --output_dir /tmp/test-ner
+```
+
+To run on your own training and validation files, use the following command:
+
+```bash
+python run_ner.py \
+  --model_name_or_path bert-base-uncased \
+  --train_file path_to_train_file \
+  --validation_file path_to_validation_file \
+  --output_dir /tmp/test-ner
+```
+
+**Note:** This script only works with models that have a fast tokenizer (backed by the [🤗 Tokenizers](https://github.com/huggingface/tokenizers) library) as it
+uses special features of those tokenizers. You can check if your favorite model has a fast tokenizer in
+[this table](https://huggingface.co/transformers/index.html#supported-frameworks).
diff --git a/examples/tensorflow/token_classification/run_ner.py b/examples/tensorflow/token-classification/run_ner.py
similarity index 100%
rename from examples/tensorflow/token_classification/run_ner.py
rename to examples/tensorflow/token-classification/run_ner.py