Add generic text classification example in TF (#5716)

* Add new example with nlp * Update README * replace nlp by datasets * Update examples/text-classification/README.md Add Lysandre's suggestion. Co-authored-by: Lysandre Debut <lysandre@huggingface.co> Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-09-22 18:05:05 +02:00
parent 6e21f24220
commit 585217c87f
2 changed files with 308 additions and 0 deletions
--- a/examples/text-classification/README.md
+++ b/examples/text-classification/README.md
@@ -23,6 +23,31 @@ Quick benchmarks from the script (no other modifications):
 Mixed precision (AMP) reduces the training time considerably for the same hardware and hyper-parameters (same batch size was used).


+## Run generic text classification script in TensorFlow
+
+The script [run_tf_text_classification.py](https://github.com/huggingface/transformers/blob/master/examples/text-classification/run_tf_text_classification.py) allows users to run a text classification on their own CSV files. For now there are few restrictions, the CSV files must have a header corresponding to the column names and not more than three columns: one column for the id, one column for the text and another column for a second piece of text in case of an entailment classification for example.
+
+To use the script, one as to run the following command line:
+```bash
+python run_tf_text_classification.py \
+  --train_file train.csv \ ### training dataset file location (mandatory if running with --do_train option)
+  --dev_file dev.csv \ ### development dataset file location (mandatory if running with --do_eval option)
+  --test_file test.csv \ ### test dataset file location (mandatory if running with --do_predict option)
+  --label_column_id 0 \ ### which column corresponds to the labels
+  --model_name_or_path bert-base-multilingual-uncased \
+  --output_dir model \
+  --num_train_epochs 4 \
+  --per_device_train_batch_size 16 \
+  --per_device_eval_batch_size 32 \
+  --do_train \
+  --do_eval \
+  --do_predict \
+  --logging_steps 10 \
+  --evaluate_during_training \
+  --save_steps 10 \
+  --overwrite_output_dir \
+  --max_seq_length 128
+```

 # Run PyTorch version