Examples reorg (#11350)
* Base move
* Examples reorganization
* Update references
* Put back test data
* Move conftest
* More fixes
* Move test data to test fixtures
* Update path
* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Address review comments and clean

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
@@ -15,9 +15,9 @@ limitations under the License.
|
||||
|
||||
# Examples
|
||||
|
||||
This folder contains actively maintained examples of use of 🤗 Transformers organized along NLP tasks. If you are looking for an example that used to be in this folder, it may have moved to our [research projects](https://github.com/huggingface/transformers/tree/master/examples/research_projects) subfolder (which contains frozen snapshots of research projects) or to the [legacy](https://github.com/huggingface/transformers/tree/master/examples/legacy) subfolder.
|
||||
This folder contains actively maintained examples of use of 🤗 Transformers organized along NLP tasks. If you are looking for an example that used to be in this folder, it may have moved to the corresponding framework subfolder (pytorch, tensorflow or flax), our [research projects](https://github.com/huggingface/transformers/tree/master/examples/research_projects) subfolder (which contains frozen snapshots of research projects) or to the [legacy](https://github.com/huggingface/transformers/tree/master/examples/legacy) subfolder.
|
||||
|
||||
While we strive to present as many use cases as possible, the scripts in this folder are just examples. It is expected that they won't work out of the box on your specific problem and that you will be required to change a few lines of code to adapt them to your needs. To help you with that, all the PyTorch versions of the examples fully expose the preprocessing of the data. This way, you can easily tweak them.
|
||||
While we strive to present as many use cases as possible, the scripts in this folder are just examples. It is expected that they won't work out of the box on your specific problem and that you will be required to change a few lines of code to adapt them to your needs. To help you with that, most of the examples fully expose the preprocessing of the data. This way, you can easily tweak them.
|
||||
|
||||
This is similar if you want the scripts to report another metric than the one they currently use: look at the `compute_metrics` function inside the script. It takes the full arrays of predictions and labels and has to return a dictionary of string keys and float values. Just change it to add (or replace) your own metric to the ones already reported.
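As a rough illustration of the contract described above, here is a minimal sketch of a `compute_metrics` function that reports accuracy and adds an extra metric. The `EvalOutput` container and its field names are assumptions made for this example (a stand-in for whatever object the script actually passes in), and plain Python lists are used instead of arrays so the sketch runs without extra dependencies.

```python
from typing import Dict, NamedTuple, Sequence


class EvalOutput(NamedTuple):
    """Hypothetical stand-in for the object handed to compute_metrics."""

    predictions: Sequence[Sequence[float]]  # one row of class scores per sample
    label_ids: Sequence[int]  # the expected class index for each sample


def compute_metrics(p: EvalOutput) -> Dict[str, float]:
    # Turn each row of scores into a predicted class index (argmax).
    predicted = [max(range(len(row)), key=row.__getitem__) for row in p.predictions]
    correct = sum(int(a == b) for a, b in zip(predicted, p.label_ids))
    accuracy = correct / len(p.label_ids)
    # Extra metric added next to the existing one: string key, float value.
    return {"accuracy": accuracy, "error_rate": 1.0 - accuracy}


example = EvalOutput(
    predictions=[[0.1, 0.9], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4]],
    label_ids=[1, 0, 0, 0],
)
metrics = compute_metrics(example)
```

Adding or replacing a metric then only means adding or replacing a key in the returned dictionary.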
|
||||
|
||||
@@ -42,7 +42,8 @@ To browse the examples corresponding to released versions of 🤗 Transformers,
|
||||
|
||||
<details>
|
||||
<summary>Examples for older versions of 🤗 Transformers</summary>
|
||||
|
||||
- [v4.5.1](https://github.com/huggingface/transformers/tree/v4.5.1/examples)
|
||||
- [v4.4.2](https://github.com/huggingface/transformers/tree/v4.4.2/examples)
|
||||
- [v4.3.3](https://github.com/huggingface/transformers/tree/v4.3.3/examples)
|
||||
- [v4.2.2](https://github.com/huggingface/transformers/tree/v4.2.2/examples)
|
||||
- [v4.1.1](https://github.com/huggingface/transformers/tree/v4.1.1/examples)
|
||||
@@ -75,193 +76,3 @@ Alternatively, you can find switch your cloned 🤗 Transformers to a specific v
|
||||
git checkout tags/v3.5.1
|
||||
```
|
||||
and run the example command as usual afterward.
|
||||
|
||||
## The Big Table of Tasks
|
||||
|
||||
Here is the list of all our examples:
|
||||
- with information on whether they are **built on top of `Trainer`/`TFTrainer`** (if not, they still work, they might
|
||||
just lack some features),
|
||||
- whether or not they leverage the [🤗 Datasets](https://github.com/huggingface/datasets) library.
|
||||
- links to **Colab notebooks** to walk through the scripts and run them easily,
|
||||
<!--
|
||||
Coming soon!
|
||||
- links to **Cloud deployments** to be able to deploy large-scale trainings in the Cloud with little to no setup.
|
||||
-->
|
||||
|
||||
| Task | Example datasets | Trainer support | TFTrainer support | 🤗 Datasets | Colab |
|---|---|:---:|:---:|:---:|:---:|
| [**`language-modeling`**](https://github.com/huggingface/transformers/tree/master/examples/language-modeling) | WikiText-2 | ✅ | - | ✅ | [Open in Colab](https://colab.research.google.com/github/huggingface/notebooks/blob/master/examples/language_modeling.ipynb) |
| [**`multiple-choice`**](https://github.com/huggingface/transformers/tree/master/examples/multiple-choice) | SWAG | ✅ | ✅ | ✅ | [Open in Colab](https://colab.research.google.com/github/huggingface/notebooks/blob/master/examples/multiple_choice.ipynb) |
| [**`question-answering`**](https://github.com/huggingface/transformers/tree/master/examples/question-answering) | SQuAD | ✅ | ✅ | ✅ | [Open in Colab](https://colab.research.google.com/github/huggingface/notebooks/blob/master/examples/question_answering.ipynb) |
| [**`summarization`**](https://github.com/huggingface/transformers/tree/master/examples/seq2seq) | XSum | ✅ | - | ✅ | [Open in Colab](https://colab.research.google.com/github/huggingface/notebooks/blob/master/examples/summarization.ipynb) |
| [**`text-classification`**](https://github.com/huggingface/transformers/tree/master/examples/text-classification) | GLUE | ✅ | ✅ | ✅ | [Open in Colab](https://colab.research.google.com/github/huggingface/notebooks/blob/master/examples/text_classification.ipynb) |
| [**`text-generation`**](https://github.com/huggingface/transformers/tree/master/examples/text-generation) | - | n/a | n/a | - | [Open in Colab](https://colab.research.google.com/github/huggingface/blog/blob/master/notebooks/02_how_to_generate.ipynb) |
| [**`token-classification`**](https://github.com/huggingface/transformers/tree/master/examples/token-classification) | CoNLL NER | ✅ | ✅ | ✅ | [Open in Colab](https://colab.research.google.com/github/huggingface/notebooks/blob/master/examples/token_classification.ipynb) |
| [**`translation`**](https://github.com/huggingface/transformers/tree/master/examples/seq2seq) | WMT | ✅ | - | ✅ | [Open in Colab](https://colab.research.google.com/github/huggingface/notebooks/blob/master/examples/translation.ipynb) |
|
||||
|
||||
|
||||
## Running quick tests
|
||||
|
||||
Most examples are equipped with a mechanism to truncate the number of dataset samples to the desired length. This is useful for debugging purposes, for example to quickly check that all stages of the programs can complete, before running the same setup on the full dataset which may take hours to complete.
|
||||
|
||||
For example here is how to truncate all three splits to just 50 samples each:
|
||||
```
|
||||
examples/token-classification/run_ner.py \
|
||||
--max_train_samples 50 \
|
||||
--max_val_samples 50 \
|
||||
--max_test_samples 50 \
|
||||
[...]
|
||||
```
|
||||
|
||||
Most example scripts should have the first two command line arguments and some have the third one. You can quickly check if a given example supports any of these by passing a `-h` option, e.g.:
|
||||
```
|
||||
examples/token-classification/run_ner.py -h
|
||||
```
|
||||
|
||||
## Resuming training
|
||||
|
||||
You can resume training from a previous checkpoint like this:
|
||||
|
||||
1. Pass `--output_dir previous_output_dir` without `--overwrite_output_dir` to resume training from the latest checkpoint in `output_dir` (what you would use if the training was interrupted, for instance).
|
||||
2. Pass `--model_name_or_path path_to_a_specific_checkpoint` to resume training from that checkpoint folder.
|
||||
|
||||
Should you want to turn an example into a notebook where you'd no longer have access to the command
|
||||
line, 🤗 Trainer supports resuming from a checkpoint via `trainer.train(resume_from_checkpoint)`.
|
||||
|
||||
1. If `resume_from_checkpoint` is `True` it will look for the last checkpoint in the value of `output_dir` passed via `TrainingArguments`.
|
||||
2. If `resume_from_checkpoint` is a path to a specific checkpoint it will use that saved checkpoint folder to resume the training from.
|
||||
|
||||
|
||||
## Distributed training and mixed precision
|
||||
|
||||
All the PyTorch scripts mentioned above work out of the box with distributed training and mixed precision, thanks to
|
||||
the [Trainer API](https://huggingface.co/transformers/main_classes/trainer.html). To launch one of them on _n_ GPUs,
|
||||
use the following command:
|
||||
|
||||
```bash
|
||||
python -m torch.distributed.launch \
|
||||
--nproc_per_node number_of_gpu_you_have path_to_script.py \
|
||||
--all_arguments_of_the_script
|
||||
```
|
||||
|
||||
As an example, here is how you would fine-tune the BERT large model (with whole word masking) on the text
|
||||
classification MNLI task using the `run_glue` script, with 8 GPUs:
|
||||
|
||||
```bash
|
||||
python -m torch.distributed.launch \
|
||||
--nproc_per_node 8 text-classification/run_glue.py \
|
||||
--model_name_or_path bert-large-uncased-whole-word-masking \
|
||||
--task_name mnli \
|
||||
--do_train \
|
||||
--do_eval \
|
||||
--max_seq_length 128 \
|
||||
--per_device_train_batch_size 8 \
|
||||
--learning_rate 2e-5 \
|
||||
--num_train_epochs 3.0 \
|
||||
--output_dir /tmp/mnli_output/
|
||||
```
|
||||
|
||||
If you have a GPU with mixed precision capabilities (architecture Pascal or more recent), you can use mixed precision
|
||||
training with PyTorch 1.6.0 or later, or by installing the [Apex](https://github.com/NVIDIA/apex) library for previous
|
||||
versions. Just add the flag `--fp16` to your command launching one of the scripts mentioned above!
|
||||
|
||||
Using mixed precision training usually results in 2x-speedup for training with the same final results (as shown in
|
||||
[this table](https://github.com/huggingface/transformers/tree/master/examples/text-classification#mixed-precision-training)
|
||||
for text classification).
|
||||
|
||||
## Running on TPUs
|
||||
|
||||
When using Tensorflow, TPUs are supported out of the box as a `tf.distribute.Strategy`.
|
||||
|
||||
When using PyTorch, we support TPUs thanks to `pytorch/xla`. For more context and information on how to set up your TPU environment, refer to Google's documentation and to the
|
||||
very detailed [pytorch/xla README](https://github.com/pytorch/xla/blob/master/README.md).
|
||||
|
||||
In this repo, we provide a very simple launcher script named
|
||||
[xla_spawn.py](https://github.com/huggingface/transformers/tree/master/examples/xla_spawn.py) that lets you run our
|
||||
example scripts on multiple TPU cores without any boilerplate. Just pass a `--num_cores` flag to this script, then your
|
||||
regular training script with its arguments (this is similar to the `torch.distributed.launch` helper for
|
||||
`torch.distributed`):
|
||||
|
||||
```bash
|
||||
python xla_spawn.py --num_cores num_tpu_you_have \
|
||||
path_to_script.py \
|
||||
--all_arguments_of_the_script
|
||||
```
|
||||
|
||||
As an example, here is how you would fine-tune the BERT large model (with whole word masking) on the text
|
||||
classification MNLI task using the `run_glue` script, with 8 TPUs:
|
||||
|
||||
```bash
|
||||
python xla_spawn.py --num_cores 8 \
|
||||
text-classification/run_glue.py \
|
||||
--model_name_or_path bert-large-uncased-whole-word-masking \
|
||||
--task_name mnli \
|
||||
--do_train \
|
||||
--do_eval \
|
||||
--max_seq_length 128 \
|
||||
--per_device_train_batch_size 8 \
|
||||
--learning_rate 2e-5 \
|
||||
--num_train_epochs 3.0 \
|
||||
--output_dir /tmp/mnli_output/
|
||||
```
|
||||
|
||||
## Logging & Experiment tracking
|
||||
|
||||
You can easily log and monitor your runs. The following are currently supported:
|
||||
|
||||
* [TensorBoard](https://www.tensorflow.org/tensorboard)
|
||||
* [Weights & Biases](https://docs.wandb.ai/integrations/huggingface)
|
||||
* [Comet ML](https://www.comet.ml/docs/python-sdk/huggingface/)
|
||||
|
||||
### Weights & Biases
|
||||
|
||||
To use Weights & Biases, install the wandb package with:
|
||||
|
||||
```bash
|
||||
pip install wandb
|
||||
```
|
||||
|
||||
Then log in from the command line:
|
||||
|
||||
```bash
|
||||
wandb login
|
||||
```
|
||||
|
||||
If you are in Jupyter or Colab, you should log in with:
|
||||
|
||||
```python
|
||||
import wandb
|
||||
wandb.login()
|
||||
```
|
||||
|
||||
To enable logging to W&B, include `"wandb"` in the `report_to` of your `TrainingArguments` or script. Or just pass along `--report_to all` if you have `wandb` installed.
|
||||
|
||||
Whenever you use `Trainer` or `TFTrainer` classes, your losses, evaluation metrics, model topology and gradients (for `Trainer` only) will automatically be logged.
|
||||
|
||||
Advanced configuration is possible by setting environment variables:
|
||||
|
||||
| Environment Variable | Value |
|---|---|
| WANDB_LOG_MODEL | Log the model as an artifact at the end of training (`false` by default) |
| WANDB_WATCH | One of `gradients` (default) to log histograms of gradients, `all` to log histograms of both gradients and parameters, or `false` for no histogram logging |
| WANDB_PROJECT | Organize runs by project |
|
||||
|
||||
Set run names with `run_name` argument present in scripts or as part of `TrainingArguments`.
|
||||
|
||||
Additional configuration options are available through generic [wandb environment variables](https://docs.wandb.com/library/environment-variables).
|
||||
|
||||
Refer to related [documentation & examples](https://docs.wandb.ai/integrations/huggingface).
|
||||
|
||||
### Comet.ml
|
||||
|
||||
To use `comet_ml`, install the Python package with:
|
||||
|
||||
```bash
|
||||
pip install comet_ml
|
||||
```
|
||||
|
||||
or if in a Conda environment:
|
||||
|
||||
```bash
|
||||
conda install -c comet_ml -c anaconda -c conda-forge comet_ml
|
||||
```
|
||||
|
||||
237
examples/pytorch/README.md
Normal file
@@ -0,0 +1,237 @@
|
||||
<!---
|
||||
Copyright 2020 The HuggingFace Team. All rights reserved.
|
||||
Licensed under the Apache License, Version 2.0 (the "License");
|
||||
you may not use this file except in compliance with the License.
|
||||
You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
-->
|
||||
|
||||
# Examples
|
||||
|
||||
This folder contains actively maintained examples of use of 🤗 Transformers using the PyTorch backend, organized along NLP tasks.
|
||||
|
||||
## The Big Table of Tasks
|
||||
|
||||
Here is the list of all our examples:
|
||||
- with information on whether they are **built on top of `Trainer`** (if not, they still work, they might
|
||||
just lack some features),
|
||||
- whether or not they have a version using the [🤗 Accelerate](https://github.com/huggingface/accelerate) library.
|
||||
- whether or not they leverage the [🤗 Datasets](https://github.com/huggingface/datasets) library.
|
||||
- links to **Colab notebooks** to walk through the scripts and run them easily,
|
||||
<!--
|
||||
Coming soon!
|
||||
- links to **Cloud deployments** to be able to deploy large-scale trainings in the Cloud with little to no setup.
|
||||
-->
|
||||
|
||||
| Task | Example datasets | Trainer support | 🤗 Accelerate | 🤗 Datasets | Colab |
|---|---|:---:|:---:|:---:|:---:|
| [**`language-modeling`**](https://github.com/huggingface/transformers/tree/master/examples/pytorch/language-modeling) | WikiText-2 | ✅ | ✅ | ✅ | [Open in Colab](https://colab.research.google.com/github/huggingface/notebooks/blob/master/examples/language_modeling.ipynb) |
| [**`multiple-choice`**](https://github.com/huggingface/transformers/tree/master/examples/pytorch/multiple-choice) | SWAG | ✅ | ✅ | ✅ | [Open in Colab](https://colab.research.google.com/github/huggingface/notebooks/blob/master/examples/multiple_choice.ipynb) |
| [**`question-answering`**](https://github.com/huggingface/transformers/tree/master/examples/pytorch/question-answering) | SQuAD | ✅ | ✅ | ✅ | [Open in Colab](https://colab.research.google.com/github/huggingface/notebooks/blob/master/examples/question_answering.ipynb) |
| [**`summarization`**](https://github.com/huggingface/transformers/tree/master/examples/pytorch/summarization) | XSum | ✅ | ✅ | ✅ | [Open in Colab](https://colab.research.google.com/github/huggingface/notebooks/blob/master/examples/summarization.ipynb) |
| [**`text-classification`**](https://github.com/huggingface/transformers/tree/master/examples/pytorch/text-classification) | GLUE | ✅ | ✅ | ✅ | [Open in Colab](https://colab.research.google.com/github/huggingface/notebooks/blob/master/examples/text_classification.ipynb) |
| [**`text-generation`**](https://github.com/huggingface/transformers/tree/master/examples/pytorch/text-generation) | - | n/a | - | - | [Open in Colab](https://colab.research.google.com/github/huggingface/blog/blob/master/notebooks/02_how_to_generate.ipynb) |
| [**`token-classification`**](https://github.com/huggingface/transformers/tree/master/examples/pytorch/token-classification) | CoNLL NER | ✅ | ✅ | ✅ | [Open in Colab](https://colab.research.google.com/github/huggingface/notebooks/blob/master/examples/token_classification.ipynb) |
| [**`translation`**](https://github.com/huggingface/transformers/tree/master/examples/pytorch/translation) | WMT | ✅ | ✅ | ✅ | [Open in Colab](https://colab.research.google.com/github/huggingface/notebooks/blob/master/examples/translation.ipynb) |
|
||||
|
||||
|
||||
## Running quick tests
|
||||
|
||||
Most examples are equipped with a mechanism to truncate the number of dataset samples to the desired length. This is useful for debugging purposes, for example to quickly check that all stages of the programs can complete, before running the same setup on the full dataset which may take hours to complete.
|
||||
|
||||
For example here is how to truncate all three splits to just 50 samples each:
|
||||
```
|
||||
examples/pytorch/token-classification/run_ner.py \
|
||||
--max_train_samples 50 \
|
||||
--max_val_samples 50 \
|
||||
--max_test_samples 50 \
|
||||
[...]
|
||||
```
|
||||
|
||||
Most example scripts should have the first two command line arguments and some have the third one. You can quickly check if a given example supports any of these by passing a `-h` option, e.g.:
|
||||
```
|
||||
examples/pytorch/token-classification/run_ner.py -h
|
||||
```
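To make the effect of the `--max_*_samples` options concrete, here is a small sketch of what truncating each split amounts to. It is only an illustration: the `truncate_split` helper and the toy `splits` dictionary are hypothetical and are not taken from the example scripts.

```python
from itertools import islice
from typing import Dict, Iterable, List, Optional, TypeVar

T = TypeVar("T")


def truncate_split(samples: Iterable[T], max_samples: Optional[int] = None) -> List[T]:
    """Keep at most `max_samples` items; `None` keeps the whole split."""
    if max_samples is None:
        return list(samples)
    return list(islice(samples, max_samples))


# Toy splits standing in for real datasets (sizes are made up).
splits: Dict[str, range] = {
    "train": range(1000),
    "validation": range(200),
    "test": range(30),
}
truncated = {name: truncate_split(data, 50) for name, data in splits.items()}
```

A split that is already smaller than the limit (the 30-sample `test` split here) is left unchanged.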
|
||||
|
||||
## Resuming training
|
||||
|
||||
You can resume training from a previous checkpoint like this:
|
||||
|
||||
1. Pass `--output_dir previous_output_dir` without `--overwrite_output_dir` to resume training from the latest checkpoint in `output_dir` (what you would use if the training was interrupted, for instance).
|
||||
2. Pass `--model_name_or_path path_to_a_specific_checkpoint` to resume training from that checkpoint folder.
|
||||
|
||||
Should you want to turn an example into a notebook where you'd no longer have access to the command
|
||||
line, 🤗 Trainer supports resuming from a checkpoint via `trainer.train(resume_from_checkpoint)`.
|
||||
|
||||
1. If `resume_from_checkpoint` is `True` it will look for the last checkpoint in the value of `output_dir` passed via `TrainingArguments`.
|
||||
2. If `resume_from_checkpoint` is a path to a specific checkpoint it will use that saved checkpoint folder to resume the training from.
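The notion of "last checkpoint" can be sketched with the standard library alone. The helper below is hypothetical (it is not the library's own lookup) and assumes checkpoints are saved as `checkpoint-<step>` subfolders of `output_dir`; it returns the folder with the highest step, which is the kind of path you could pass as `resume_from_checkpoint`.

```python
import os
import re
from typing import Optional

# Assumed naming convention for checkpoint folders: "checkpoint-<step>".
_CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def find_last_checkpoint(output_dir: str) -> Optional[str]:
    """Return the checkpoint folder with the highest step, or None if there is none."""
    candidates = []
    for name in os.listdir(output_dir):
        match = _CHECKPOINT_RE.match(name)
        if match and os.path.isdir(os.path.join(output_dir, name)):
            # Compare steps numerically so that 1000 sorts after 500.
            candidates.append((int(match.group(1)), name))
    if not candidates:
        return None
    return os.path.join(output_dir, max(candidates)[1])


# Hypothetical usage in a notebook, with `trainer` and `training_args` already built:
# trainer.train(resume_from_checkpoint=find_last_checkpoint(training_args.output_dir))
```

Comparing the step numbers as integers matters: a plain string sort would rank `checkpoint-500` after `checkpoint-1000`.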
|
||||
|
||||
|
||||
## Distributed training and mixed precision
|
||||
|
||||
All the PyTorch scripts mentioned above work out of the box with distributed training and mixed precision, thanks to
|
||||
the [Trainer API](https://huggingface.co/transformers/main_classes/trainer.html). To launch one of them on _n_ GPUs,
|
||||
use the following command:
|
||||
|
||||
```bash
|
||||
python -m torch.distributed.launch \
|
||||
--nproc_per_node number_of_gpu_you_have path_to_script.py \
|
||||
--all_arguments_of_the_script
|
||||
```
|
||||
|
||||
As an example, here is how you would fine-tune the BERT large model (with whole word masking) on the text
|
||||
classification MNLI task using the `run_glue` script, with 8 GPUs:
|
||||
|
||||
```bash
|
||||
python -m torch.distributed.launch \
|
||||
--nproc_per_node 8 pytorch/text-classification/run_glue.py \
|
||||
--model_name_or_path bert-large-uncased-whole-word-masking \
|
||||
--task_name mnli \
|
||||
--do_train \
|
||||
--do_eval \
|
||||
--max_seq_length 128 \
|
||||
--per_device_train_batch_size 8 \
|
||||
--learning_rate 2e-5 \
|
||||
--num_train_epochs 3.0 \
|
||||
--output_dir /tmp/mnli_output/
|
||||
```
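One thing worth keeping in mind with the command above is that the batch size is given per device. The sketch below works out the resulting total batch size per optimization step, assuming one process per GPU in a data-parallel setup; the `effective_batch_size` helper is hypothetical and only does the arithmetic.

```python
def effective_batch_size(
    per_device_batch_size: int,
    num_processes: int,
    gradient_accumulation_steps: int = 1,
) -> int:
    """Samples contributing to one optimizer step across all processes."""
    return per_device_batch_size * num_processes * gradient_accumulation_steps


# Values from the MNLI command: --per_device_train_batch_size 8 on 8 GPUs.
mnli_batch = effective_batch_size(per_device_batch_size=8, num_processes=8)
```

With these values, each optimizer step sees 8 x 8 = 64 samples, so a learning rate tuned for a single GPU may need to be revisited.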
|
||||
|
||||
If you have a GPU with mixed precision capabilities (architecture Pascal or more recent), you can use mixed precision
|
||||
training with PyTorch 1.6.0 or later, or by installing the [Apex](https://github.com/NVIDIA/apex) library for previous
|
||||
versions. Just add the flag `--fp16` to your command launching one of the scripts mentioned above!
|
||||
|
||||
Using mixed precision training usually results in 2x-speedup for training with the same final results (as shown in
|
||||
[this table](https://github.com/huggingface/transformers/tree/master/examples/text-classification#mixed-precision-training)
|
||||
for text classification).
|
||||
|
||||
## Running on TPUs
|
||||
|
||||
When using Tensorflow, TPUs are supported out of the box as a `tf.distribute.Strategy`.
|
||||
|
||||
When using PyTorch, we support TPUs thanks to `pytorch/xla`. For more context and information on how to set up your TPU environment, refer to Google's documentation and to the
|
||||
very detailed [pytorch/xla README](https://github.com/pytorch/xla/blob/master/README.md).
|
||||
|
||||
In this repo, we provide a very simple launcher script named
|
||||
[xla_spawn.py](https://github.com/huggingface/transformers/tree/master/examples/xla_spawn.py) that lets you run our
|
||||
example scripts on multiple TPU cores without any boilerplate. Just pass a `--num_cores` flag to this script, then your
|
||||
regular training script with its arguments (this is similar to the `torch.distributed.launch` helper for
|
||||
`torch.distributed`):
|
||||
|
||||
```bash
|
||||
python xla_spawn.py --num_cores num_tpu_you_have \
|
||||
path_to_script.py \
|
||||
--all_arguments_of_the_script
|
||||
```
|
||||
|
||||
As an example, here is how you would fine-tune the BERT large model (with whole word masking) on the text
|
||||
classification MNLI task using the `run_glue` script, with 8 TPUs (from this folder):
|
||||
|
||||
```bash
|
||||
python xla_spawn.py --num_cores 8 \
|
||||
text-classification/run_glue.py \
|
||||
--model_name_or_path bert-large-uncased-whole-word-masking \
|
||||
--task_name mnli \
|
||||
--do_train \
|
||||
--do_eval \
|
||||
--max_seq_length 128 \
|
||||
--per_device_train_batch_size 8 \
|
||||
--learning_rate 2e-5 \
|
||||
--num_train_epochs 3.0 \
|
||||
--output_dir /tmp/mnli_output/
|
||||
```
|
||||
|
||||
## Using Accelerate
|
||||
|
||||
Most PyTorch example scripts have a version using the [🤗 Accelerate](https://github.com/huggingface/accelerate) library
|
||||
that exposes the training loop so it's easy for you to customize or tweak them to your needs. They all require you to
|
||||
install `accelerate` with
|
||||
|
||||
```bash
|
||||
pip install accelerate
|
||||
```
|
||||
|
||||
Then you can easily launch any of the scripts by running
|
||||
|
||||
```bash
|
||||
accelerate config
|
||||
```
|
||||
|
||||
and reply to the questions asked. Then
|
||||
|
||||
```bash
|
||||
accelerate test
|
||||
```
|
||||
|
||||
that will check everything is ready for training. Finally, you can launch training with
|
||||
|
||||
```bash
|
||||
accelerate launch path_to_script.py --args_to_script
|
||||
```
|
||||
|
||||
## Logging & Experiment tracking
|
||||
|
||||
You can easily log and monitor your runs. The following are currently supported:
|
||||
|
||||
* [TensorBoard](https://www.tensorflow.org/tensorboard)
|
||||
* [Weights & Biases](https://docs.wandb.ai/integrations/huggingface)
|
||||
* [Comet ML](https://www.comet.ml/docs/python-sdk/huggingface/)
|
||||
|
||||
### Weights & Biases
|
||||
|
||||
To use Weights & Biases, install the wandb package with:
|
||||
|
||||
```bash
|
||||
pip install wandb
|
||||
```
|
||||
|
||||
Then log in from the command line:
|
||||
|
||||
```bash
|
||||
wandb login
|
||||
```
|
||||
|
||||
If you are in Jupyter or Colab, you should log in with:
|
||||
|
||||
```python
|
||||
import wandb
|
||||
wandb.login()
|
||||
```
|
||||
|
||||
To enable logging to W&B, include `"wandb"` in the `report_to` of your `TrainingArguments` or script. Or just pass along `--report_to all` if you have `wandb` installed.
|
||||
|
||||
Whenever you use `Trainer` or `TFTrainer` classes, your losses, evaluation metrics, model topology and gradients (for `Trainer` only) will automatically be logged.
|
||||
|
||||
Advanced configuration is possible by setting environment variables:
|
||||
|
||||
| Environment Variable | Value |
|---|---|
| WANDB_LOG_MODEL | Log the model as an artifact at the end of training (`false` by default) |
| WANDB_WATCH | One of `gradients` (default) to log histograms of gradients, `all` to log histograms of both gradients and parameters, or `false` for no histogram logging |
| WANDB_PROJECT | Organize runs by project |
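
In a notebook, where there is no shell to export variables from, the same settings can be applied from Python before training starts. This is a minimal sketch: the project name is made up for illustration, and `"true"` for `WANDB_LOG_MODEL` is an assumption based on the table stating that `false` is the default.

```python
import os

# Hypothetical project name, used for illustration only.
os.environ["WANDB_PROJECT"] = "my-transformers-experiments"
# Log histograms of both gradients and parameters (see the table above).
os.environ["WANDB_WATCH"] = "all"
# Assumed opposite of the documented `false` default: upload the model at the end.
os.environ["WANDB_LOG_MODEL"] = "true"
```

Setting these at the top of the notebook, before the trainer is created, keeps the configuration in one place.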
|
||||
|
||||
Set run names with `run_name` argument present in scripts or as part of `TrainingArguments`.
|
||||
|
||||
Additional configuration options are available through generic [wandb environment variables](https://docs.wandb.com/library/environment-variables).
|
||||
|
||||
Refer to related [documentation & examples](https://docs.wandb.ai/integrations/huggingface).
|
||||
|
||||
### Comet.ml
|
||||
|
||||
To use `comet_ml`, install the Python package with:
|
||||
|
||||
```bash
|
||||
pip install comet_ml
|
||||
```
|
||||
|
||||
or if in a Conda environment:
|
||||
|
||||
```bash
|
||||
conda install -c comet_ml -c anaconda -c conda-forge comet_ml
|
||||
```
|
||||
1
examples/pytorch/benchmarking/requirements.txt
Normal file
@@ -0,0 +1 @@
|
||||
torch >= 1.3
|
||||
@@ -16,9 +16,7 @@ limitations under the License.
|
||||
|
||||
# Multiple Choice
|
||||
|
||||
Based on the script [`run_swag.py`]().
|
||||
|
||||
## PyTorch script: fine-tuning on SWAG
|
||||
## Fine-tuning on SWAG with the Trainer
|
||||
|
||||
`run_swag` allows you to fine-tune any model from our [hub](https://huggingface.co/models) (as long as its architecture has a `ForMultipleChoice` version in the library) on the SWAG dataset or your own csv/jsonlines files as long as they are structured the same way. To make it work on another dataset, you will need to tweak the `preprocess_function` inside the script.
|
||||
|
||||
@@ -41,9 +39,9 @@ eval_acc = 0.8338998300509847
|
||||
eval_loss = 0.44457291918821606
|
||||
```
|
||||
|
||||
## PyTorch version, no Trainer
|
||||
## With Accelerate
|
||||
|
||||
Based on the script [run_ner_no_trainer.py](https://github.com/huggingface/transformers/blob/master/examples/multiple-choice/run_swag_no_trainer.py).
|
||||
Based on the script [run_swag_no_trainer.py](https://github.com/huggingface/transformers/blob/master/examples/pytorch/multiple-choice/run_swag_no_trainer.py).
|
||||
|
||||
Like `run_swag.py`, this script allows you to fine-tune any of the models on the [hub](https://huggingface.co/models) (as long as its architecture has a `ForMultipleChoice` version in the library) on
|
||||
the SWAG dataset or your own data in a csv or a JSON file. The main difference is that this
|
||||
@@ -108,24 +106,3 @@ This command is the same and will work for:
|
||||
- a training on TPUs
|
||||
|
||||
Note that this library is in alpha release so your feedback is more than welcome if you encounter any problem using it.
|
||||
|
||||
## Tensorflow
|
||||
|
||||
```bash
|
||||
export SWAG_DIR=/path/to/swag_data_dir
|
||||
python ./examples/multiple-choice/run_tf_multiple_choice.py \
|
||||
--task_name swag \
|
||||
--model_name_or_path bert-base-cased \
|
||||
--do_train \
|
||||
--do_eval \
|
||||
--data_dir $SWAG_DIR \
|
||||
--learning_rate 5e-5 \
|
||||
--num_train_epochs 3 \
|
||||
--max_seq_length 80 \
|
||||
--output_dir models_bert/swag_base \
|
||||
--per_gpu_eval_batch_size=16 \
|
||||
--per_device_train_batch_size=16 \
|
||||
--logging-dir logs \
|
||||
--gradient_accumulation_steps 2 \
|
||||
--overwrite_output
|
||||
```
|
||||
@@ -1,2 +1,3 @@
|
||||
sentencepiece != 0.1.92
|
||||
protobuf
|
||||
torch >= 1.3
|
||||
@@ -14,9 +14,9 @@ See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
-->
|
||||
|
||||
## SQuAD
|
||||
# SQuAD
|
||||
|
||||
Based on the script [`run_qa.py`](https://github.com/huggingface/transformers/blob/master/examples/question-answering/run_qa.py).
|
||||
Based on the script [`run_qa.py`](https://github.com/huggingface/transformers/blob/master/examples/pytorch/question-answering/run_qa.py).
|
||||
|
||||
**Note:** This script only works with models that have a fast tokenizer (backed by the 🤗 Tokenizers library) as it
|
||||
uses special features of those tokenizers. You can check if your favorite model has a fast tokenizer in
|
||||
@@ -29,7 +29,9 @@ The old version of this script can be found [here](https://github.com/huggingfac
|
||||
|
||||
Note that if your dataset contains samples with no possible answers (like SQUAD version 2), you need to pass along the flag `--version_2_with_negative`.
|
||||
|
||||
#### Fine-tuning BERT on SQuAD1.0
|
||||
## Trainer-based scripts
|
||||
|
||||
### Fine-tuning BERT on SQuAD1.0
|
||||
|
||||
This example code fine-tunes BERT on the SQuAD1.0 dataset. It runs in 24 min (with BERT-base) or 68 min (with BERT-large)
|
||||
on a single Tesla V100 16GB.
|
||||
@@ -57,7 +59,6 @@ exact_match = 81.22
|
||||
|
||||
#### Distributed training
|
||||
|
||||
|
||||
Here is an example using distributed training on 8 V100 GPUs and the BERT Whole Word Masking uncased model to reach an F1 > 93 on SQuAD1.1:
|
||||
|
||||
```bash
|
||||
@@ -128,6 +129,71 @@ python run_qa_beam_search.py \
|
||||
--save_steps 5000
|
||||
```
|
||||
|
||||
## With Accelerate
|
||||
|
||||
Based on the scripts `run_qa_no_trainer.py` and `run_qa_beam_search_no_trainer.py`.
|
||||
|
||||
Like `run_qa.py` and `run_qa_beam_search.py`, these scripts allow you to fine-tune any of the models supported on a
SQuAD or a similar dataset. The main difference is that each
script exposes the bare training loop, to allow you to quickly experiment and add any customization you would like.
|
||||
|
||||
It offers fewer options than the script with `Trainer` (though you can easily change the options for the optimizer
or the dataloaders directly in the script) but still runs in a distributed setup and on TPU, and supports mixed precision by
means of the [🤗 `Accelerate`](https://github.com/huggingface/accelerate) library. You can use the script normally
after installing it:
|
||||
|
||||
```bash
|
||||
pip install accelerate
|
||||
```
|
||||
|
||||
then
|
||||
|
||||
```bash
|
||||
python run_qa_no_trainer.py \
|
||||
--model_name_or_path bert-base-uncased \
|
||||
--dataset_name squad \
|
||||
--max_seq_length 384 \
|
||||
--doc_stride 128 \
|
||||
--output_dir ~/tmp/debug_squad
|
||||
```
|
||||
|
||||
You can then use your usual launchers to run it in a distributed environment, but the easiest way is to run
|
||||
|
||||
```bash
|
||||
accelerate config
|
||||
```
|
||||
|
||||
and reply to the questions asked. Then
|
||||
|
||||
```bash
|
||||
accelerate test
|
||||
```
|
||||
|
||||
that will check everything is ready for training. Finally, you can launch training with
|
||||
|
||||
```bash
|
||||
accelerate launch run_qa_no_trainer.py \
|
||||
--model_name_or_path bert-base-uncased \
|
||||
--dataset_name squad \
|
||||
--max_seq_length 384 \
|
||||
--doc_stride 128 \
|
||||
--output_dir ~/tmp/debug_squad
|
||||
```
|
||||
|
||||
This command is the same and will work for:
|
||||
|
||||
- a CPU-only setup
|
||||
- a setup with one GPU
|
||||
- a distributed training with several GPUs (single or multi node)
|
||||
- a training on TPUs
|
||||
|
||||
Note that this library is in alpha release so your feedback is more than welcome if you encounter any problem using it.
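
To make the "same command" point concrete, here is a minimal Python sketch that assembles the two invocations shown above and checks that they differ only in the launcher prefix. The helper name `build_command` is hypothetical and exists only for this illustration; the script name and flags are copied from the commands above.

```python
# Script name and flags copied from the run_qa_no_trainer.py commands above.
SCRIPT_ARGS = [
    "run_qa_no_trainer.py",
    "--model_name_or_path", "bert-base-uncased",
    "--dataset_name", "squad",
    "--max_seq_length", "384",
    "--doc_stride", "128",
    "--output_dir", "~/tmp/debug_squad",
]


def build_command(use_accelerate):
    """Return the argument list for a plain or an Accelerate-launched run (hypothetical helper)."""
    launcher = ["accelerate", "launch"] if use_accelerate else ["python"]
    return launcher + SCRIPT_ARGS


plain = build_command(use_accelerate=False)
launched = build_command(use_accelerate=True)

# Only the launcher prefix changes; the script and its flags are identical.
print(" ".join(plain))
print(" ".join(launched))
```

The hardware-specific choices (number of processes, mixed precision, TPU) come from the answers given to `accelerate config`, which is why the script arguments do not need to change between setups.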
|
||||
|
||||
|
||||
## Results
|
||||
|
||||
Larger batch size may improve the performance while costing more memory.
|
||||
|
||||
##### Results for SQuAD1.0 with the previously defined hyper-parameters:
|
||||
@@ -223,22 +289,3 @@ python -m torch.distributed.launch --nproc_per_node=8 ./examples/question-answer
|
||||
```
|
||||
Training with the above command leads to an F1 score of 93.52, which is slightly better than the F1 score of 93.15 for
|
||||
`bert-large-uncased-whole-word-masking`.
|
||||
|
||||
## SQuAD with the TensorFlow Trainer
|
||||
|
||||
```bash
|
||||
python run_tf_squad.py \
|
||||
--model_name_or_path bert-base-uncased \
|
||||
--output_dir model \
|
||||
--max_seq_length 384 \
|
||||
--num_train_epochs 2 \
|
||||
--per_gpu_train_batch_size 8 \
|
||||
--per_gpu_eval_batch_size 16 \
|
||||
--do_train \
|
||||
--logging_dir logs \
|
||||
--logging_steps 10 \
|
||||
--learning_rate 3e-5 \
|
||||
--doc_stride 128
|
||||
```
|
||||
|
||||
For the moment, the TensorFlow Trainer supports only training; evaluation is not available.
|
||||
@@ -1 +1,2 @@
|
||||
datasets >= 1.4.0
|
||||
torch >= 1.3.0
|
||||
@@ -14,9 +14,9 @@ See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
-->
|
||||
|
||||
## Sequence to Sequence Training and Evaluation
|
||||
## Summarization
|
||||
|
||||
This directory contains examples for finetuning and evaluating transformers on summarization and translation tasks.
|
||||
This directory contains examples for finetuning and evaluating transformers on summarization tasks.
|
||||
Please tag @patil-suraj with any issues/unexpected behaviors, or send a PR!
|
||||
For deprecated `bertabs` instructions, see [`bertabs/README.md`](https://github.com/huggingface/transformers/blob/master/examples/research_projects/bertabs/README.md).
|
||||
For the old `finetune_trainer.py` and related utils, see [`examples/legacy/seq2seq`](https://github.com/huggingface/transformers/blob/master/examples/legacy/seq2seq).
|
||||
@@ -30,16 +30,16 @@ For the old `finetune_trainer.py` and related utils, see [`examples/legacy/seq2s
|
||||
- `PegasusForConditionalGeneration`
|
||||
- `T5ForConditionalGeneration`
|
||||
|
||||
`run_summarization.py` and `run_translation.py` are lightweight examples of how to download and preprocess a dataset from the [🤗 Datasets](https://github.com/huggingface/datasets) library or use your own files (jsonlines or csv), then fine-tune one of the architectures above on it.
|
||||
`run_summarization.py` is a lightweight example of how to download and preprocess a dataset from the [🤗 Datasets](https://github.com/huggingface/datasets) library or use your own files (jsonlines or csv), then fine-tune one of the architectures above on it.
|
||||
|
||||
For custom datasets in `jsonlines` format please see: https://huggingface.co/docs/datasets/loading_datasets.html#json-files
|
||||
and you also will find examples of these below.
|
||||
|
||||
### Summarization
|
||||
## With Trainer
|
||||
|
||||
Here is an example on a summarization task:
|
||||
```bash
|
||||
python examples/seq2seq/run_summarization.py \
|
||||
python examples/pytorch/summarization/run_summarization.py \
|
||||
--model_name_or_path t5-small \
|
||||
--do_train \
|
||||
--do_eval \
|
||||
@@ -63,7 +63,7 @@ And here is how you would use it on your own files, after adjusting the values f
|
||||
`--train_file`, `--validation_file`, `--text_column` and `--summary_column` to match your setup:
|
||||
|
||||
```bash
|
||||
python examples/seq2seq/run_summarization.py \
|
||||
python examples/pytorch/summarization/run_summarization.py \
|
||||
--model_name_or_path t5-small \
|
||||
--do_train \
|
||||
--do_eval \
|
||||
@@ -134,115 +134,64 @@ And as with the CSV files, you can specify which values to select from the file,
|
||||
--summary_column summary \
|
||||
```
|
||||
|
||||
## With Accelerate
|
||||
|
||||
Based on the script [`run_summarization_no_trainer.py`](https://github.com/huggingface/transformers/blob/master/examples/pytorch/summarization/run_summarization_no_trainer.py).
|
||||
|
||||
### Translation
|
||||
Like `run_summarization.py`, this script allows you to fine-tune any of the supported models on a
summarization task. The main difference is that this
script exposes the bare training loop, so you can quickly experiment and add any customization you would like.
|
||||
|
||||
Here is an example of a translation fine-tuning with a MarianMT model:
|
||||
It offers fewer options than the script with `Trainer` (though you can easily change the options for the optimizer
or the dataloaders directly in the script), but it still runs in a distributed setup and on TPU, and supports mixed precision, by
means of the [🤗 `Accelerate`](https://github.com/huggingface/accelerate) library. You can use the script normally
after installing it:
|
||||
|
||||
```bash
|
||||
python examples/seq2seq/run_translation.py \
|
||||
--model_name_or_path Helsinki-NLP/opus-mt-en-ro \
|
||||
--do_train \
|
||||
--do_eval \
|
||||
--source_lang en \
|
||||
--target_lang ro \
|
||||
--dataset_name wmt16 \
|
||||
--dataset_config_name ro-en \
|
||||
--output_dir /tmp/tst-translation \
|
||||
--per_device_train_batch_size=4 \
|
||||
--per_device_eval_batch_size=4 \
|
||||
--overwrite_output_dir \
|
||||
--predict_with_generate
|
||||
pip install accelerate
|
||||
```
|
||||
|
||||
MBart and some T5 models require special handling.
|
||||
|
||||
T5 models `t5-small`, `t5-base`, `t5-large`, `t5-3b` and `t5-11b` must use an additional argument: `--source_prefix "translate {source_lang} to {target_lang}"`. For example:
|
||||
then
|
||||
|
||||
```bash
|
||||
python examples/seq2seq/run_translation.py \
|
||||
python run_summarization_no_trainer.py \
|
||||
--model_name_or_path t5-small \
|
||||
--do_train \
|
||||
--do_eval \
|
||||
--source_lang en \
|
||||
--target_lang ro \
|
||||
--source_prefix "translate English to Romanian: " \
|
||||
--dataset_name wmt16 \
|
||||
--dataset_config_name ro-en \
|
||||
--output_dir /tmp/tst-translation \
|
||||
--per_device_train_batch_size=4 \
|
||||
--per_device_eval_batch_size=4 \
|
||||
--overwrite_output_dir \
|
||||
--predict_with_generate
|
||||
--dataset_name cnn_dailymail \
|
||||
--dataset_config "3.0.0" \
|
||||
--source_prefix "summarize: " \
|
||||
--output_dir ~/tmp/tst-summarization
|
||||
```
|
||||
|
||||
If you get a terrible BLEU score, make sure that you didn't forget to use the `--source_prefix` argument.
|
||||
|
||||
For the aforementioned group of T5 models, remember that if you switch to a different language pair, you must adjust the source and target values in all 3 language-specific command line arguments: `--source_lang`, `--target_lang` and `--source_prefix`.
|
||||
|
||||
MBart models require a different format for `--source_lang` and `--target_lang` values, e.g. instead of `en` it expects `en_XX`, for `ro` it expects `ro_RO`. The full MBart specification for language codes can be found [here](https://huggingface.co/facebook/mbart-large-cc25). For example:
|
||||
You can then use your usual launchers to run it in a distributed environment, but the easiest way is to run
|
||||
|
||||
```bash
|
||||
python examples/seq2seq/run_translation.py \
|
||||
--model_name_or_path facebook/mbart-large-en-ro \
|
||||
--do_train \
|
||||
--do_eval \
|
||||
--dataset_name wmt16 \
|
||||
--dataset_config_name ro-en \
|
||||
--source_lang en_XX \
|
||||
--target_lang ro_RO \
|
||||
--output_dir /tmp/tst-translation \
|
||||
--per_device_train_batch_size=4 \
|
||||
--per_device_eval_batch_size=4 \
|
||||
--overwrite_output_dir \
|
||||
--predict_with_generate
|
||||
```
|
||||
accelerate config
|
||||
```
|
||||
|
||||
And here is how you would use the translation finetuning on your own files, after adjusting the
|
||||
values for the arguments `--train_file`, `--validation_file` to match your setup:
|
||||
and reply to the questions asked. Then
|
||||
|
||||
```bash
|
||||
python examples/seq2seq/run_translation.py \
|
||||
accelerate test
|
||||
```
|
||||
|
||||
that will check everything is ready for training. Finally, you can launch training with
|
||||
|
||||
```bash
|
||||
accelerate launch run_summarization_no_trainer.py \
|
||||
--model_name_or_path t5-small \
|
||||
--do_train \
|
||||
--do_eval \
|
||||
--source_lang en \
|
||||
--target_lang ro \
|
||||
--source_prefix "translate English to Romanian: " \
|
||||
--dataset_name wmt16 \
|
||||
--dataset_config_name ro-en \
|
||||
--train_file path_to_jsonlines_file \
|
||||
--validation_file path_to_jsonlines_file \
|
||||
--output_dir /tmp/tst-translation \
|
||||
--per_device_train_batch_size=4 \
|
||||
--per_device_eval_batch_size=4 \
|
||||
--overwrite_output_dir \
|
||||
--predict_with_generate
|
||||
--dataset_name cnn_dailymail \
|
||||
--dataset_config "3.0.0" \
|
||||
--source_prefix "summarize: " \
|
||||
--output_dir ~/tmp/tst-summarization
|
||||
```
|
||||
|
||||
The task of translation supports only custom JSONLINES files, with each line being a dictionary with a key `"translation"` whose value is another dictionary keyed by the two languages of the pair. For example:
|
||||
This command is the same and will work for:
|
||||
|
||||
```json
|
||||
{ "translation": { "en": "Others have dismissed him as a joke.", "ro": "Alții l-au numit o glumă." } }
|
||||
{ "translation": { "en": "And some are holding out for an implosion.", "ro": "Iar alții așteaptă implozia." } }
|
||||
```
|
||||
Here the languages are Romanian (`ro`) and English (`en`).
|
||||
- a CPU-only setup
|
||||
- a setup with one GPU
|
||||
- a distributed training with several GPUs (single or multi node)
|
||||
- a training on TPUs
|
||||
|
||||
If you want to use a pre-processed dataset that leads to high BLEU scores, but for the `en-de` language pair, you can use `--dataset_name stas/wmt14-en-de-pre-processed`, as follows:
|
||||
|
||||
```bash
|
||||
python examples/seq2seq/run_translation.py \
|
||||
--model_name_or_path t5-small \
|
||||
--do_train \
|
||||
--do_eval \
|
||||
--source_lang en \
|
||||
--target_lang de \
|
||||
--source_prefix "translate English to German: " \
|
||||
--dataset_name stas/wmt14-en-de-pre-processed \
|
||||
--output_dir /tmp/tst-translation \
|
||||
--per_device_train_batch_size=4 \
|
||||
--per_device_eval_batch_size=4 \
|
||||
--overwrite_output_dir \
|
||||
--predict_with_generate
|
||||
```
|
||||
Note that this library is in alpha release so your feedback is more than welcome if you encounter any problem using it.
|
||||
7
examples/pytorch/summarization/requirements.txt
Normal file
@@ -0,0 +1,7 @@
|
||||
datasets >= 1.1.3
|
||||
sentencepiece != 0.1.92
|
||||
protobuf
|
||||
rouge-score
|
||||
nltk
|
||||
py7zr
|
||||
torch >= 1.3
|
||||
@@ -36,7 +36,8 @@ SRC_DIRS = [
|
||||
"language-modeling",
|
||||
"multiple-choice",
|
||||
"question-answering",
|
||||
"seq2seq",
|
||||
"summarization",
|
||||
"translation",
|
||||
]
|
||||
]
|
||||
sys.path.extend(SRC_DIRS)
|
||||
@@ -16,7 +16,7 @@ limitations under the License.
|
||||
|
||||
# Text classification examples
|
||||
|
||||
## PyTorch version
|
||||
## GLUE tasks
|
||||
|
||||
Based on the script [`run_glue.py`](https://github.com/huggingface/transformers/blob/master/examples/text-classification/run_glue.py).
|
||||
|
||||
@@ -129,7 +129,7 @@ and reply to the questions asked. Then
|
||||
accelerate test
|
||||
```
|
||||
|
||||
that will check everything is ready for training. Finally, you cna launch training with
|
||||
that will check everything is ready for training. Finally, you can launch training with
|
||||
|
||||
```bash
|
||||
export TASK_NAME=mrpc
|
||||
@@ -152,84 +152,3 @@ This command is the same and will work for:
|
||||
- a training on TPUs
|
||||
|
||||
Note that this library is in alpha release so your feedback is more than welcome if you encounter any problem using it.
|
||||
|
||||
## TensorFlow 2.0 version
|
||||
|
||||
Based on the script [`run_tf_glue.py`](https://github.com/huggingface/transformers/blob/master/examples/text-classification/run_tf_glue.py).
|
||||
|
||||
Fine-tuning the library TensorFlow 2.0 Bert model for sequence classification on the MRPC task of the GLUE benchmark: [General Language Understanding Evaluation](https://gluebenchmark.com/).
|
||||
|
||||
This script has an option for mixed precision (Automatic Mixed Precision / AMP) to run models on Tensor Cores (NVIDIA Volta/Turing GPUs) and future hardware and an option for XLA, which uses the XLA compiler to reduce model runtime.
|
||||
Options are toggled using `USE_XLA` or `USE_AMP` variables in the script.
|
||||
These options and the below benchmark are provided by @tlkh.
|
||||
|
||||
Quick benchmarks from the script (no other modifications):
|
||||
|
||||
| GPU | Mode | Time (2nd epoch) | Val Acc (3 runs) |
|
||||
| --------- | -------- | ----------------------- | ----------------------|
|
||||
| Titan V | FP32 | 41s | 0.8438/0.8281/0.8333 |
|
||||
| Titan V | AMP | 26s | 0.8281/0.8568/0.8411 |
|
||||
| V100 | FP32 | 35s | 0.8646/0.8359/0.8464 |
|
||||
| V100 | AMP | 22s | 0.8646/0.8385/0.8411 |
|
||||
| 1080 Ti | FP32 | 55s | - |
|
||||
|
||||
Mixed precision (AMP) reduces the training time considerably for the same hardware and hyper-parameters (same batch size was used).
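
The size of the speedup can be read directly off the table. The short Python sketch below does the arithmetic for the two GPUs that have both FP32 and AMP timings; the variable names are illustrative, and the numbers are copied from the table above.

```python
# Time for the 2nd epoch in seconds, copied from the benchmark table above.
epoch_seconds = {
    "Titan V": {"FP32": 41, "AMP": 26},
    "V100": {"FP32": 35, "AMP": 22},
}

speedups = {}
for gpu, times in epoch_seconds.items():
    # Speedup = FP32 epoch time divided by AMP epoch time on the same GPU.
    speedups[gpu] = times["FP32"] / times["AMP"]
    print(f"{gpu}: {speedups[gpu]:.2f}x faster with AMP")
```

On both GPUs the ratio comes out at roughly 1.6x for these single-epoch timings.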
|
||||
|
||||
|
||||
## Run generic text classification script in TensorFlow
|
||||
|
||||
The script [run_tf_text_classification.py](https://github.com/huggingface/transformers/blob/master/examples/text-classification/run_tf_text_classification.py) allows users to run a text classification on their own CSV files. For now there are a few restrictions: the CSV files must have a header row with the column names and no more than three columns (one for the id, one for the text and, for tasks such as entailment classification, one for a second piece of text).
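
Before launching a run, it can help to check a file against these restrictions. The following is a minimal sketch using only the Python standard library; it is not part of `run_tf_text_classification.py`, the helper name `check_csv` is hypothetical, and the sample column names (`label`, `sentence1`, `sentence2`) are made up for illustration.

```python
import csv
import io

MAX_COLUMNS = 3  # restriction stated above


def check_csv(handle):
    """Return the header row if the file fits the restrictions, else raise ValueError."""
    reader = csv.reader(handle)
    header = next(reader, None)
    if header is None:
        raise ValueError("empty file: a header row with the column names is required")
    if len(header) > MAX_COLUMNS:
        raise ValueError(f"{len(header)} columns found, at most {MAX_COLUMNS} are allowed")
    # Every data row should have exactly as many fields as the header.
    for line_number, row in enumerate(reader, start=2):
        if len(row) != len(header):
            raise ValueError(f"line {line_number} has {len(row)} fields, expected {len(header)}")
    return header


# Made-up sample data; a real file would be opened with open(path, newline="").
sample = io.StringIO(
    "label,sentence1,sentence2\n"
    "1,A man is eating.,Someone eats.\n"
    "0,A dog runs.,A cat sleeps.\n"
)
print(check_csv(sample))
```

Note that the check cannot tell a real header from a first data row; it only verifies the column count and that every row has the same number of fields.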
|
||||
|
||||
To use the script, one has to run the following command line:
|
||||
```bash
|
||||
# --train_file: training dataset file location (mandatory if running with --do_train option)
# --dev_file: development dataset file location (mandatory if running with --do_eval option)
# --test_file: test dataset file location (mandatory if running with --do_predict option)
# --label_column_id: which column corresponds to the labels
python run_tf_text_classification.py \
--train_file train.csv \
--dev_file dev.csv \
--test_file test.csv \
--label_column_id 0 \
|
||||
--model_name_or_path bert-base-multilingual-uncased \
|
||||
--output_dir model \
|
||||
--num_train_epochs 4 \
|
||||
--per_device_train_batch_size 16 \
|
||||
--per_device_eval_batch_size 32 \
|
||||
--do_train \
|
||||
--do_eval \
|
||||
--do_predict \
|
||||
--logging_steps 10 \
|
||||
--evaluation_strategy steps \
|
||||
--save_steps 10 \
|
||||
--overwrite_output_dir \
|
||||
--max_seq_length 128
|
||||
```
|
||||
|
||||
|
||||
## XNLI
|
||||
|
||||
Based on the script [`run_xnli.py`](https://github.com/huggingface/transformers/blob/master/examples/text-classification/run_xnli.py).
|
||||
|
||||
[XNLI](https://www.nyu.edu/projects/bowman/xnli/) is a crowd-sourced dataset based on [MultiNLI](http://www.nyu.edu/projects/bowman/multinli/). It is an evaluation benchmark for cross-lingual text representations. Pairs of text are labeled with textual entailment annotations for 15 different languages (including both high-resource languages such as English and low-resource languages such as Swahili).
|
||||
|
||||
#### Fine-tuning on XNLI
|
||||
|
||||
This example code fine-tunes mBERT (multi-lingual BERT) on the XNLI dataset. It runs in 106 mins on a single Tesla V100 16GB.
|
||||
|
||||
```bash
|
||||
python run_xnli.py \
|
||||
--model_name_or_path bert-base-multilingual-cased \
|
||||
--language de \
|
||||
--train_language en \
|
||||
--do_train \
|
||||
--do_eval \
|
||||
--per_device_train_batch_size 32 \
|
||||
--learning_rate 5e-5 \
|
||||
--num_train_epochs 2.0 \
|
||||
--max_seq_length 128 \
|
||||
--output_dir /tmp/debug_xnli/ \
|
||||
--save_steps -1
|
||||
```
|
||||
|
||||
Training with the previously defined hyper-parameters yields the following results on the **test** set:
|
||||
|
||||
```bash
|
||||
acc = 0.7093812375249501
|
||||
```
|
||||
@@ -2,3 +2,4 @@ accelerate
|
||||
datasets >= 1.1.3
|
||||
sentencepiece != 0.1.92
|
||||
protobuf
|
||||
torch >= 1.3
|
||||
@@ -1,2 +1,3 @@
|
||||
sentencepiece != 0.1.92
|
||||
protobuf
|
||||
torch >= 1.3
|
||||
@@ -61,7 +61,7 @@ You can find the old version of the PyTorch script [here](https://github.com/hug
|
||||
|
||||
## Pytorch version, no Trainer
|
||||
|
||||
Based on the script [run_ner_no_trainer.py](https://github.com/huggingface/transformers/blob/master/examples/token-classification/run_ner_no_trainer.py).
|
||||
Based on the script [run_ner_no_trainer.py](https://github.com/huggingface/transformers/blob/master/examples/pytorch/token-classification/run_ner_no_trainer.py).
|
||||
|
||||
Like `run_ner.py`, this script allows you to fine-tune any of the models on the [hub](https://huggingface.co/models) on a
|
||||
token classification task, either NER, POS or CHUNKS tasks or your own data in a csv or a JSON file. The main difference is that this
|
||||
@@ -126,66 +126,3 @@ This command is the same and will work for:
|
||||
- a training on TPUs
|
||||
|
||||
Note that this library is in alpha release so your feedback is more than welcome if you encounter any problem using it.
|
||||
|
||||
### TensorFlow version
|
||||
|
||||
The following examples are covered in this section:
|
||||
|
||||
* NER on the GermEval 2014 (German NER) dataset
|
||||
* Emerging and Rare Entities task: WNUT’17 (English NER) dataset
|
||||
|
||||
Details and results for the fine-tuning provided by @stefan-it.
|
||||
|
||||
### GermEval 2014 (German NER) dataset
|
||||
|
||||
#### Data (Download and pre-processing steps)
|
||||
|
||||
Data can be obtained from the [GermEval 2014](https://sites.google.com/site/germeval2014ner/data) shared task page.
|
||||
|
||||
Here are the commands for downloading and pre-processing the train, dev and test datasets. The original data format has four (tab-separated) columns; in a pre-processing step, only the two relevant columns (token and outer span NER annotation) are extracted:
|
||||
|
||||
```bash
|
||||
curl -L 'https://drive.google.com/uc?export=download&id=1Jjhbal535VVz2ap4v4r_rN1UEHTdLK5P' \
|
||||
| grep -v "^#" | cut -f 2,3 | tr '\t' ' ' > train.txt.tmp
|
||||
curl -L 'https://drive.google.com/uc?export=download&id=1ZfRcQThdtAR5PPRjIDtrVP7BtXSCUBbm' \
|
||||
| grep -v "^#" | cut -f 2,3 | tr '\t' ' ' > dev.txt.tmp
|
||||
curl -L 'https://drive.google.com/uc?export=download&id=1u9mb7kNJHWQCWyweMDRMuTFoOHOfeBTH' \
|
||||
| grep -v "^#" | cut -f 2,3 | tr '\t' ' ' > test.txt.tmp
|
||||
```
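
For readers less familiar with the `grep | cut | tr` pipeline, the same extraction can be sketched in Python. This is an illustrative equivalent, not a script shipped with the examples; the function name `extract_columns` is hypothetical and the sample rows are made up to mimic the four-column layout described above.

```python
def extract_columns(lines):
    """Mimic: grep -v "^#" | cut -f 2,3 | tr '\\t' ' '."""
    extracted = []
    for line in lines:
        line = line.rstrip("\n")
        # grep -v "^#": drop comment lines.
        if line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) == 1:
            # cut passes lines without a tab through unchanged,
            # which keeps the blank lines that separate sentences.
            extracted.append(line)
        else:
            # cut -f 2,3 keeps the token and the outer span annotation;
            # tr then turns the tab between them into a space.
            extracted.append(" ".join(fields[1:3]))
    return extracted


# Made-up rows in the four-column, tab-separated layout.
raw = [
    "#\tsample-source\n",
    "1\tSchartau\tB-PER\tO\n",
    "2\tsagte\tO\tO\n",
    "\n",
]
for row in extract_columns(raw):
    print(repr(row))
```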
|
||||
|
||||
The GermEval 2014 dataset contains some strange "control character" tokens like `'\x96', '\u200e', '\x95', '\xad' or '\x80'`.
|
||||
One problem with these tokens is that `BertTokenizer` returns an empty token for them, resulting in misaligned `InputExample`s.
|
||||
The `preprocess.py` script located in the `scripts` folder a) filters these tokens and b) splits longer sentences into smaller ones (once the max. subtoken length is reached).
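
The filtering step (a) can be pictured with the short Python sketch below. It is not the actual `preprocess.py`: it assumes the "token label" line layout produced by the download step, drops only the five characters listed above, and does not perform the sentence splitting of step (b). The names `CONTROL_TOKENS` and `drop_control_tokens` are hypothetical.

```python
# The control-character tokens listed above.
CONTROL_TOKENS = {"\x96", "\u200e", "\x95", "\xad", "\x80"}


def drop_control_tokens(lines):
    """Remove lines whose token (first space-separated field) is a control character."""
    kept = []
    for line in lines:
        token = line.split(" ")[0]
        if token in CONTROL_TOKENS:
            continue
        # Blank lines (sentence boundaries) have an empty token and are kept.
        kept.append(line)
    return kept


# Made-up sample in "token label" format.
sample = ["Schartau B-PER", "\x96 O", "sagte O", ""]
print(drop_control_tokens(sample))
```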
|
||||
|
||||
Let's define some variables that we need for further pre-processing steps and training the model:
|
||||
|
||||
```bash
|
||||
export MAX_LENGTH=128
|
||||
export BERT_MODEL=bert-base-multilingual-cased
|
||||
```
|
||||
|
||||
Run the pre-processing script on training, dev and test datasets:
|
||||
|
||||
```bash
|
||||
python3 scripts/preprocess.py train.txt.tmp $BERT_MODEL $MAX_LENGTH > train.txt
|
||||
python3 scripts/preprocess.py dev.txt.tmp $BERT_MODEL $MAX_LENGTH > dev.txt
|
||||
python3 scripts/preprocess.py test.txt.tmp $BERT_MODEL $MAX_LENGTH > test.txt
|
||||
```
|
||||
|
||||
The GermEval 2014 dataset has many more labels than the CoNLL-2002/2003 datasets, so its own set of labels must be used:
|
||||
|
||||
```bash
|
||||
cat train.txt dev.txt test.txt | cut -d " " -f 2 | grep -v "^$"| sort | uniq > labels.txt
|
||||
```
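
The same label collection can be sketched in Python, which may be easier to adapt if your files use a different layout. This is an illustrative equivalent of the shell pipeline above, assuming "token label" lines separated by a single space; the function name `collect_labels` is hypothetical and the sample lines are made up.

```python
def collect_labels(lines):
    """Mimic: cut -d " " -f 2 | grep -v "^$" | sort | uniq."""
    labels = set()
    for line in lines:
        parts = line.rstrip("\n").split(" ")
        # cut prints lines without a space unchanged, so fall back to the whole line.
        label = parts[1] if len(parts) > 1 else parts[0]
        # grep -v "^$": skip the empty strings produced by blank separator lines.
        if label:
            labels.add(label)
    # sorted() orders by code point, which may differ from a locale-aware `sort`.
    return sorted(labels)


# Made-up sample in "token label" format.
sample = ["Schartau B-PER", "sagte O", "", "Berlin B-LOC", "ist O"]
print(collect_labels(sample))
```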
|
||||
|
||||
#### Prepare the run
|
||||
|
||||
Additional environment variables must be set:
|
||||
|
||||
```bash
|
||||
export OUTPUT_DIR=germeval-model
|
||||
export BATCH_SIZE=32
|
||||
export NUM_EPOCHS=3
|
||||
export SAVE_STEPS=750
|
||||
export SEED=1
|
||||
```
|
||||
@@ -1,2 +1,3 @@
|
||||
seqeval
|
||||
datasets >= 1.1.3
|
||||
torch >= 1.3
|
||||
212
examples/pytorch/translation/README.md
Normal file
@@ -0,0 +1,212 @@
|
||||
<!---
|
||||
Copyright 2020 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License");
|
||||
you may not use this file except in compliance with the License.
|
||||
You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
-->
|
||||
|
||||
## Translation
|
||||
|
||||
This directory contains examples for finetuning and evaluating transformers on translation tasks.
|
||||
Please tag @patil-suraj with any issues/unexpected behaviors, or send a PR!
|
||||
For deprecated `bertabs` instructions, see [`bertabs/README.md`](https://github.com/huggingface/transformers/blob/master/examples/research_projects/bertabs/README.md).
|
||||
For the old `finetune_trainer.py` and related utils, see [`examples/legacy/seq2seq`](https://github.com/huggingface/transformers/blob/master/examples/legacy/seq2seq).
|
||||
|
||||
### Supported Architectures
|
||||
|
||||
- `BartForConditionalGeneration`
|
||||
- `FSMTForConditionalGeneration` (translation only)
|
||||
- `MBartForConditionalGeneration`
|
||||
- `MarianMTModel`
|
||||
- `PegasusForConditionalGeneration`
|
||||
- `T5ForConditionalGeneration`
|
||||
|
||||
`run_translation.py` is a lightweight example of how to download and preprocess a dataset from the [🤗 Datasets](https://github.com/huggingface/datasets) library or use your own files (jsonlines or csv), then fine-tune one of the architectures above on it.
|
||||
|
||||
For custom datasets in `jsonlines` format please see: https://huggingface.co/docs/datasets/loading_datasets.html#json-files
|
||||
and you also will find examples of these below.
|
||||
|
||||
|
||||
## With Trainer
|
||||
|
||||
Here is an example of a translation fine-tuning with a MarianMT model:
|
||||
|
||||
```bash
|
||||
python examples/pytorch/translation/run_translation.py \
|
||||
--model_name_or_path Helsinki-NLP/opus-mt-en-ro \
|
||||
--do_train \
|
||||
--do_eval \
|
||||
--source_lang en \
|
||||
--target_lang ro \
|
||||
--dataset_name wmt16 \
|
||||
--dataset_config_name ro-en \
|
||||
--output_dir /tmp/tst-translation \
|
||||
--per_device_train_batch_size=4 \
|
||||
--per_device_eval_batch_size=4 \
|
||||
--overwrite_output_dir \
|
||||
--predict_with_generate
|
||||
```
|
||||
|
||||
MBart and some T5 models require special handling.
|
||||
|
||||
T5 models `t5-small`, `t5-base`, `t5-large`, `t5-3b` and `t5-11b` must use an additional argument: `--source_prefix "translate {source_lang} to {target_lang}"`. For example:
|
||||
|
||||
```bash
|
||||
python examples/pytorch/translation/run_translation.py \
|
||||
--model_name_or_path t5-small \
|
||||
--do_train \
|
||||
--do_eval \
|
||||
--source_lang en \
|
||||
--target_lang ro \
|
||||
--source_prefix "translate English to Romanian: " \
|
||||
--dataset_name wmt16 \
|
||||
--dataset_config_name ro-en \
|
||||
--output_dir /tmp/tst-translation \
|
||||
--per_device_train_batch_size=4 \
|
||||
--per_device_eval_batch_size=4 \
|
||||
--overwrite_output_dir \
|
||||
--predict_with_generate
|
||||
```
|
||||
|
||||
If you get a terrible BLEU score, make sure that you didn't forget to use the `--source_prefix` argument.
|
||||
|
||||
For the aforementioned group of T5 models, remember that if you switch to a different language pair, you must adjust the source and target values in all 3 language-specific command line arguments: `--source_lang`, `--target_lang` and `--source_prefix`.
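
One way to keep the three arguments in sync is to derive them from a single pair of language codes. The Python sketch below is only an illustration: the helper `t5_translation_args` and the `LANGUAGE_NAMES` lookup are hypothetical and not part of `run_translation.py`. The prefix format, with full English language names and a trailing `": "`, follows the example commands in this README.

```python
# Hypothetical lookup from language codes to the English names used in the prefix.
LANGUAGE_NAMES = {"en": "English", "ro": "Romanian", "de": "German"}


def t5_translation_args(source_lang, target_lang):
    """Build the three language-specific arguments from one pair of codes."""
    prefix = f"translate {LANGUAGE_NAMES[source_lang]} to {LANGUAGE_NAMES[target_lang]}: "
    return [
        "--source_lang", source_lang,
        "--target_lang", target_lang,
        "--source_prefix", prefix,
    ]


print(t5_translation_args("en", "de"))
```

Because all three values come from the same two codes, switching language pairs cannot leave a stale `--source_prefix` behind.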
|
||||
|
||||
MBart models require a different format for `--source_lang` and `--target_lang` values, e.g. instead of `en` it expects `en_XX`, for `ro` it expects `ro_RO`. The full MBart specification for language codes can be found [here](https://huggingface.co/facebook/mbart-large-cc25). For example:
|
||||
|
||||
```bash
|
||||
python examples/pytorch/translation/run_translation.py \
|
||||
--model_name_or_path facebook/mbart-large-en-ro \
|
||||
--do_train \
|
||||
--do_eval \
|
||||
--dataset_name wmt16 \
|
||||
--dataset_config_name ro-en \
|
||||
--source_lang en_XX \
|
||||
--target_lang ro_RO \
|
||||
--output_dir /tmp/tst-translation \
|
||||
--per_device_train_batch_size=4 \
|
||||
--per_device_eval_batch_size=4 \
|
||||
--overwrite_output_dir \
|
||||
--predict_with_generate
|
||||
```
|
||||
|
||||
And here is how you would use the translation finetuning on your own files, after adjusting the
|
||||
values for the arguments `--train_file`, `--validation_file` to match your setup:
|
||||
|
||||
```bash
|
||||
python examples/pytorch/translation/run_translation.py \
|
||||
--model_name_or_path t5-small \
|
||||
--do_train \
|
||||
--do_eval \
|
||||
--source_lang en \
|
||||
--target_lang ro \
|
||||
--source_prefix "translate English to Romanian: " \
|
||||
--dataset_name wmt16 \
|
||||
--dataset_config_name ro-en \
|
||||
--train_file path_to_jsonlines_file \
|
||||
--validation_file path_to_jsonlines_file \
|
||||
--output_dir /tmp/tst-translation \
|
||||
--per_device_train_batch_size=4 \
|
||||
--per_device_eval_batch_size=4 \
|
||||
--overwrite_output_dir \
|
||||
--predict_with_generate
|
||||
```
|
||||
|
||||
The task of translation supports only custom JSONLINES files, with each line being a dictionary with a key `"translation"` whose value is another dictionary keyed by the two languages of the pair. For example:
|
||||
|
||||
```json
|
||||
{ "translation": { "en": "Others have dismissed him as a joke.", "ro": "Alții l-au numit o glumă." } }
|
||||
{ "translation": { "en": "And some are holding out for an implosion.", "ro": "Iar alții așteaptă implozia." } }
|
||||
```
|
||||
Here the languages are Romanian (`ro`) and English (`en`).
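
If your parallel data lives in another format, a file with this layout can be produced with a few lines of Python. The sketch below uses only the standard library; the helper name `write_translation_file` and the output file name `train.json` are assumptions for illustration, and the sentence pairs are the two shown above.

```python
import json
import os
import tempfile

# The two sentence pairs from the example above, as (English, Romanian) tuples.
pairs = [
    ("Others have dismissed him as a joke.", "Alții l-au numit o glumă."),
    ("And some are holding out for an implosion.", "Iar alții așteaptă implozia."),
]


def write_translation_file(path, pairs, source_lang="en", target_lang="ro"):
    """Write one {"translation": {...}} dictionary per line (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as f:
        for source_text, target_text in pairs:
            record = {"translation": {source_lang: source_text, target_lang: target_text}}
            # ensure_ascii=False keeps characters such as "ț" readable in the file.
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


path = os.path.join(tempfile.mkdtemp(), "train.json")
write_translation_file(path, pairs)

# Read the file back to confirm that every line parses on its own.
with open(path, encoding="utf-8") as f:
    records = [json.loads(line) for line in f]
print(records[0]["translation"]["ro"])
```

The resulting path is what you would pass to `--train_file` or `--validation_file`.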
|
||||
|
||||
If you want to use a pre-processed dataset that leads to high BLEU scores, but for the `en-de` language pair, you can use `--dataset_name stas/wmt14-en-de-pre-processed`, as follows:
|
||||
|
||||
```bash
|
||||
python examples/pytorch/translation/run_translation.py \
|
||||
--model_name_or_path t5-small \
|
||||
--do_train \
|
||||
--do_eval \
|
||||
--source_lang en \
|
||||
--target_lang de \
|
||||
--source_prefix "translate English to German: " \
|
||||
--dataset_name stas/wmt14-en-de-pre-processed \
|
||||
--output_dir /tmp/tst-translation \
|
||||
--per_device_train_batch_size=4 \
|
||||
--per_device_eval_batch_size=4 \
|
||||
--overwrite_output_dir \
|
||||
--predict_with_generate
|
||||
```
|
||||
|
||||
## With Accelerate
|
||||
|
||||
Based on the script [`run_translation_no_trainer.py`](https://github.com/huggingface/transformers/blob/master/examples/pytorch/translation/run_translation_no_trainer.py).
|
||||
|
||||
Like `run_translation.py`, this script allows you to fine-tune any of the supported models on a
translation task. The main difference is that this
script exposes the bare training loop, so you can quickly experiment and add any customization you would like.
|
||||
|
||||
It offers fewer options than the script with `Trainer` (though you can easily change the options for the optimizer
or the dataloaders directly in the script), but it still runs in a distributed setup and on TPU, and supports mixed precision, by
means of the [🤗 `Accelerate`](https://github.com/huggingface/accelerate) library. You can use the script normally
after installing it:
|
||||
|
||||
```bash
|
||||
pip install accelerate
|
||||
```
|
||||
|
||||
then
|
||||
|
||||
```bash
|
||||
python run_translation_no_trainer.py \
|
||||
--model_name_or_path Helsinki-NLP/opus-mt-en-ro \
|
||||
--source_lang en \
|
||||
--target_lang ro \
|
||||
--dataset_name wmt16 \
|
||||
--dataset_config_name ro-en \
|
||||
--output_dir ~/tmp/tst-translation
|
||||
```
|
||||
|
||||
You can then use your usual launchers to run it in a distributed environment, but the easiest way is to run
|
||||
|
||||
```bash
|
||||
accelerate config
|
||||
```
|
||||
|
||||
and reply to the questions asked. Then
|
||||
|
||||
```bash
|
||||
accelerate test
|
||||
```
|
||||
|
||||
that will check everything is ready for training. Finally, you can launch training with
|
||||
|
||||
```bash
|
||||
accelerate launch run_translation_no_trainer.py \
|
||||
--model_name_or_path Helsinki-NLP/opus-mt-en-ro \
|
||||
--source_lang en \
|
||||
--target_lang ro \
|
||||
--dataset_name wmt16 \
|
||||
--dataset_config_name ro-en \
|
||||
--output_dir ~/tmp/tst-translation
|
||||
```
|
||||
|
||||
This command is the same and will work for:
|
||||
|
||||
- a CPU-only setup
|
||||
- a setup with one GPU
|
||||
- a distributed training with several GPUs (single or multi node)
|
||||
- a training on TPUs
|
||||
|
||||
Note that this library is in alpha release so your feedback is more than welcome if you encounter any problem using it.
|
||||
@@ -2,6 +2,5 @@ datasets >= 1.1.3
|
||||
sentencepiece != 0.1.92
|
||||
protobuf
|
||||
sacrebleu >= 1.4.12
|
||||
rouge-score
|
||||
nltk
|
||||
py7zr
|
||||
torch >= 1.3
|
||||
42
examples/tensorflow/README.md
Normal file
@@ -0,0 +1,42 @@
|
||||
<!---
|
||||
Copyright 2020 The HuggingFace Team. All rights reserved.
|
||||
Licensed under the Apache License, Version 2.0 (the "License");
|
||||
you may not use this file except in compliance with the License.
|
||||
You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
-->
|
||||
|
||||
# Examples
|
||||
|
||||
This folder contains actively maintained examples of use of 🤗 Transformers using the TensorFlow backend, organized along NLP tasks. It is under construction so we thank you for your patience!
|
||||
|
||||
## The Big Table of Tasks
|
||||
|
||||
Here is the list of all our examples:
- with information on whether they are **built on top of `Keras`** (if not, they still work, they might just lack some features),
- whether or not they leverage the [🤗 Datasets](https://github.com/huggingface/datasets) library,
- links to **Colab notebooks** to walk through the scripts and run them easily.
|
||||
<!--
|
||||
Coming soon!
|
||||
- links to **Cloud deployments** to be able to deploy large-scale trainings in the Cloud with little to no setup.
|
||||
-->
|
||||
|
||||
| Task | Example datasets | Keras support | 🤗 Datasets | Colab |
|---|---|:---:|:---:|:---:|
| **`language-modeling`** | WikiText-2 | - | - | - |
| [**`multiple-choice`**](https://github.com/huggingface/transformers/tree/master/examples/tensorflow/multiple-choice) | SWAG | - | - | - |
| [**`question-answering`**](https://github.com/huggingface/transformers/tree/master/examples/tensorflow/question-answering) | SQuAD | - | - | - |
| **`summarization`** | XSum | - | - | - |
| [**`text-classification`**](https://github.com/huggingface/transformers/tree/master/examples/tensorflow/text-classification) | GLUE | - | - | - |
| **`text-generation`** | n/a | - | n/a | - |
| **`token-classification`** | CoNLL NER | - | - | - |
| **`translation`** | WMT | - | - | - |
|
||||
|
||||
26
examples/tensorflow/benchmarking/README.md
Normal file
26
examples/tensorflow/benchmarking/README.md
Normal file
@@ -0,0 +1,26 @@
|
||||
<!---
|
||||
Copyright 2020 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License");
|
||||
you may not use this file except in compliance with the License.
|
||||
You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
-->
|
||||
|
||||
# 🤗 Benchmark results
|
||||
|
||||
Here, you can find a list of the different benchmark results created by the community.
|
||||
|
||||
If you would like to list benchmark results on your favorite models of the [model hub](https://huggingface.co/models) here, please open a Pull Request and add it below.
|
||||
|
||||
| Benchmark description | Results | Environment info | Author |
|:----------|:-------------|:-------------|------:|
| PyTorch Benchmark on inference for `bert-base-cased` | [memory](https://github.com/patrickvonplaten/files_to_link_to/blob/master/bert_benchmark/inference_memory.csv) | [env](https://github.com/patrickvonplaten/files_to_link_to/blob/master/bert_benchmark/env.csv) | [Patrick von Platen](https://github.com/patrickvonplaten) |
| PyTorch Benchmark on inference for `bert-base-cased` | [time](https://github.com/patrickvonplaten/files_to_link_to/blob/master/bert_benchmark/inference_time.csv) | [env](https://github.com/patrickvonplaten/files_to_link_to/blob/master/bert_benchmark/env.csv) | [Patrick von Platen](https://github.com/patrickvonplaten) |
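
The plotting script in this folder, `plot_csv_file.py`, reads result files with the columns `model`, `batch_size`, `sequence_length` and `result`. As a rough sketch of that layout, the snippet below parses a small in-memory CSV the same way, keeping numeric results and skipping anything else. The rows and the `load_results` helper are made up for illustration; they are not taken from the linked files or from the script.

```python
import csv
import io
from collections import defaultdict

# Hypothetical benchmark rows; only the column names come from plot_csv_file.py.
SAMPLE_CSV = """model,batch_size,sequence_length,result
bert-base-cased,8,128,1330
bert-base-cased,8,512,1770
bert-base-cased,32,128,N/A
"""


def load_results(csv_text):
    """Group numeric results by model, keyed by (batch_size, sequence_length)."""
    results = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (int(row["batch_size"]), int(row["sequence_length"]))
        try:
            value = int(row["result"])
        except ValueError:
            try:
                value = float(row["result"])
            except ValueError:
                # Non-numeric entries (for example a failed run) are skipped.
                continue
        results[row["model"]][key] = value
    return results


results = load_results(SAMPLE_CSV)
print(dict(results))
```

Keying each value by `(batch_size, sequence_length)` makes it straightforward to slice the results along either axis when plotting.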
|
||||
178
examples/tensorflow/benchmarking/plot_csv_file.py
Normal file
178
examples/tensorflow/benchmarking/plot_csv_file.py
Normal file
@@ -0,0 +1,178 @@
|
||||
# Copyright 2020 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import csv
from collections import defaultdict
from dataclasses import dataclass, field
from typing import List, Optional

import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import ScalarFormatter

from transformers import HfArgumentParser


def list_field(default=None, metadata=None):
    # Dataclass fields cannot take a mutable default directly, so wrap it in a factory.
    return field(default_factory=lambda: default, metadata=metadata)


@dataclass
class PlotArguments:
    """
    Arguments pertaining to which csv file to plot and how to plot it.
    """

    csv_file: str = field(
        metadata={"help": "The csv file to plot."},
    )
    plot_along_batch: bool = field(
        default=False,
        metadata={"help": "Whether to plot along batch size or sequence length. Defaults to sequence length."},
    )
    is_time: bool = field(
        default=False,
        metadata={"help": "Whether the csv file has time results or memory results. Defaults to memory results."},
    )
    no_log_scale: bool = field(
        default=False,
        metadata={"help": "Disable logarithmic scale when plotting"},
    )
    is_train: bool = field(
        default=False,
        metadata={
            "help": "Whether the csv file has training results or inference results. Defaults to inference results."
        },
    )
    figure_png_file: Optional[str] = field(
        default=None,
        metadata={"help": "Filename under which the plot will be saved. If unused no plot is saved."},
    )
    short_model_names: Optional[List[str]] = list_field(
        default=None, metadata={"help": "List of model names that are used instead of the ones in the csv file."}
    )


def can_convert_to_int(string):
    try:
        int(string)
        return True
    except ValueError:
        return False


def can_convert_to_float(string):
    try:
        float(string)
        return True
    except ValueError:
        return False


class Plot:
    def __init__(self, args):
        self.args = args
        # One entry per model: the batch sizes and sequence lengths seen, plus
        # the results keyed by (batch_size, sequence_length).
        self.result_dict = defaultdict(lambda: dict(bsz=[], seq_len=[], result={}))

        with open(self.args.csv_file, newline="") as csv_file:
            reader = csv.DictReader(csv_file)
            for row in reader:
                model_name = row["model"]
                self.result_dict[model_name]["bsz"].append(int(row["batch_size"]))
                self.result_dict[model_name]["seq_len"].append(int(row["sequence_length"]))
                if can_convert_to_int(row["result"]):
                    # value is not None
                    self.result_dict[model_name]["result"][
                        (int(row["batch_size"]), int(row["sequence_length"]))
                    ] = int(row["result"])
                elif can_convert_to_float(row["result"]):
                    # value is not None
                    self.result_dict[model_name]["result"][
                        (int(row["batch_size"]), int(row["sequence_length"]))
                    ] = float(row["result"])

    def plot(self):
        fig, ax = plt.subplots()
        title_str = "Time usage" if self.args.is_time else "Memory usage"
        title_str = title_str + " for training" if self.args.is_train else title_str + " for inference"

        if not self.args.no_log_scale:
            # set logarithm scales
            ax.set_xscale("log")
            ax.set_yscale("log")

        for axis in [ax.xaxis, ax.yaxis]:
            axis.set_major_formatter(ScalarFormatter())

        for model_name_idx, model_name in enumerate(self.result_dict.keys()):
            batch_sizes = sorted(list(set(self.result_dict[model_name]["bsz"])))
            sequence_lengths = sorted(list(set(self.result_dict[model_name]["seq_len"])))
            results = self.result_dict[model_name]["result"]

            (x_axis_array, inner_loop_array) = (
                (batch_sizes, sequence_lengths) if self.args.plot_along_batch else (sequence_lengths, batch_sizes)
            )

            label_model_name = (
                model_name if self.args.short_model_names is None else self.args.short_model_names[model_name_idx]
            )

            for inner_loop_value in inner_loop_array:
                if self.args.plot_along_batch:
                    y_axis_array = np.asarray(
                        [results[(x, inner_loop_value)] for x in x_axis_array if (x, inner_loop_value) in results],
                        # builtin int: the np.int alias was removed in NumPy 1.24
                        dtype=int,
                    )
                else:
                    y_axis_array = np.asarray(
                        [results[(inner_loop_value, x)] for x in x_axis_array if (inner_loop_value, x) in results],
                        dtype=np.float32,
                    )

                (x_axis_label, inner_loop_label) = (
                    ("batch_size", "len") if self.args.plot_along_batch else ("in #tokens", "bsz")
                )

                # Trim the x values to the number of results that were actually found.
                x_axis_array = np.asarray(x_axis_array, int)[: len(y_axis_array)]
                plt.scatter(
                    x_axis_array, y_axis_array, label=f"{label_model_name} - {inner_loop_label}: {inner_loop_value}"
                )
                plt.plot(x_axis_array, y_axis_array, "--")

            title_str += f" {label_model_name} vs."

        # Drop the trailing " vs." left by the last model.
        title_str = title_str[:-4]
        y_axis_label = "Time in s" if self.args.is_time else "Memory in MB"

        # plot
        plt.title(title_str)
        plt.xlabel(x_axis_label)
        plt.ylabel(y_axis_label)
        plt.legend()

        if self.args.figure_png_file is not None:
            plt.savefig(self.args.figure_png_file)
        else:
            plt.show()


def main():
    parser = HfArgumentParser(PlotArguments)
    plot_args = parser.parse_args_into_dataclasses()[0]
    plot = Plot(args=plot_args)
    plot.plot()


if __name__ == "__main__":
    main()
|
||||
1
examples/tensorflow/benchmarking/requirements.txt
Normal file
1
examples/tensorflow/benchmarking/requirements.txt
Normal file
@@ -0,0 +1 @@
|
||||
tensorflow >= 2.3
|
||||
38
examples/tensorflow/multiple-choice/README.md
Normal file
38
examples/tensorflow/multiple-choice/README.md
Normal file
@@ -0,0 +1,38 @@
|
||||
<!---
|
||||
Copyright 2020 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License");
|
||||
you may not use this file except in compliance with the License.
|
||||
You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
-->
|
||||
|
||||
# Multiple Choice
|
||||
|
||||
## Fine-tuning on SWAG
|
||||
|
||||
```bash
export SWAG_DIR=/path/to/swag_data_dir
python ./examples/tensorflow/multiple-choice/run_tf_multiple_choice.py \
--task_name swag \
--model_name_or_path bert-base-cased \
--do_train \
--do_eval \
--data_dir $SWAG_DIR \
--learning_rate 5e-5 \
--num_train_epochs 3 \
--max_seq_length 80 \
--output_dir models_bert/swag_base \
--per_device_eval_batch_size=16 \
--per_device_train_batch_size=16 \
--logging_dir logs \
--gradient_accumulation_steps 2 \
--overwrite_output_dir
```
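
The command combines `--per_device_train_batch_size=16` with `--gradient_accumulation_steps 2`. Under the usual convention (assumed here, not stated in this README), the number of examples behind each optimizer step is the product of the per-device batch size, the accumulation steps and the number of devices. The helper below is a hypothetical illustration of that arithmetic, not part of the script.

```python
def effective_batch_size(per_device_batch_size, gradient_accumulation_steps, num_devices=1):
    """Number of examples that contribute to one optimizer step."""
    return per_device_batch_size * gradient_accumulation_steps * num_devices


# Values from the SWAG command; the device counts are assumptions.
single_device = effective_batch_size(16, 2)
four_devices = effective_batch_size(16, 2, num_devices=4)
print(single_device, four_devices)
```

If memory is tight, halving the per-device batch size while doubling the accumulation steps keeps this product unchanged.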
|
||||
3
examples/tensorflow/multiple-choice/requirements.txt
Normal file
3
examples/tensorflow/multiple-choice/requirements.txt
Normal file
@@ -0,0 +1,3 @@
|
||||
sentencepiece != 0.1.92
|
||||
protobuf
|
||||
tensorflow >= 2.3
|
||||
34
examples/tensorflow/question-answering/README.md
Normal file
34
examples/tensorflow/question-answering/README.md
Normal file
@@ -0,0 +1,34 @@
|
||||
<!---
|
||||
Copyright 2020 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License");
|
||||
you may not use this file except in compliance with the License.
|
||||
You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
-->
|
||||
|
||||
## SQuAD with the TensorFlow Trainer
|
||||
|
||||
```bash
python run_tf_squad.py \
--model_name_or_path bert-base-uncased \
--output_dir model \
--max_seq_length 384 \
--num_train_epochs 2 \
--per_device_train_batch_size 8 \
--per_device_eval_batch_size 16 \
--do_train \
--logging_dir logs \
--logging_steps 10 \
--learning_rate 3e-5 \
--doc_stride 128
```
|
||||
|
||||
For the moment, the TensorFlow Trainer supports training only; evaluation is not available.
|
||||
2
examples/tensorflow/question-answering/requirements.txt
Normal file
2
examples/tensorflow/question-answering/requirements.txt
Normal file
@@ -0,0 +1,2 @@
|
||||
datasets >= 1.4.0
|
||||
tensorflow >= 2.3.0
|
||||
67
examples/tensorflow/text-classification/README.md
Normal file
67
examples/tensorflow/text-classification/README.md
Normal file
@@ -0,0 +1,67 @@
|
||||
<!---
|
||||
Copyright 2020 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License");
|
||||
you may not use this file except in compliance with the License.
|
||||
You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
-->
|
||||
|
||||
# Text classification examples
|
||||
|
||||
## GLUE tasks
|
||||
|
||||
Based on the script [`run_tf_glue.py`](https://github.com/huggingface/transformers/blob/master/examples/tensorflow/text-classification/run_tf_glue.py).
|
||||
|
||||
Fine-tuning the library's TensorFlow 2.0 BERT model for sequence classification on the MRPC task of the GLUE benchmark: [General Language Understanding Evaluation](https://gluebenchmark.com/).
|
||||
|
||||
This script has an option for mixed precision (Automatic Mixed Precision / AMP) to run models on Tensor Cores (NVIDIA Volta/Turing GPUs) and future hardware and an option for XLA, which uses the XLA compiler to reduce model runtime.
|
||||
Options are toggled using `USE_XLA` or `USE_AMP` variables in the script.
|
||||
These options and the benchmark below are provided by @tlkh.
|
||||
|
||||
Quick benchmarks from the script (no other modifications):
|
||||
|
||||
| GPU | Mode | Time (2nd epoch) | Val Acc (3 runs) |
| --------- | -------- | ----------------------- | ----------------------|
| Titan V | FP32 | 41s | 0.8438/0.8281/0.8333 |
| Titan V | AMP | 26s | 0.8281/0.8568/0.8411 |
| V100 | FP32 | 35s | 0.8646/0.8359/0.8464 |
| V100 | AMP | 22s | 0.8646/0.8385/0.8411 |
| 1080 Ti | FP32 | 55s | - |
|
||||
|
||||
Mixed precision (AMP) reduces the training time considerably for the same hardware and hyper-parameters (same batch size was used).
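
To put a number on "considerably", the second-epoch times from the table above can be turned into speedup factors. This is a small arithmetic sketch: the timings are copied from the table, while the `amp_speedup` helper is introduced here only for illustration.

```python
# Second-epoch times in seconds, copied from the benchmark table.
epoch_times = {
    "Titan V": {"FP32": 41, "AMP": 26},
    "V100": {"FP32": 35, "AMP": 22},
}


def amp_speedup(times):
    """Ratio of FP32 time to AMP time; values above 1 mean AMP is faster."""
    return times["FP32"] / times["AMP"]


speedups = {gpu: round(amp_speedup(times), 2) for gpu, times in epoch_times.items()}
print(speedups)  # {'Titan V': 1.58, 'V100': 1.59}
```

On both GPUs the epoch takes a little under two thirds of the FP32 time, while the validation accuracies in the table stay in the same range.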
|
||||
|
||||
|
||||
## Run generic text classification script in TensorFlow
|
||||
|
||||
The script [run_tf_text_classification.py](https://github.com/huggingface/transformers/blob/master/examples/tensorflow/text-classification/run_tf_text_classification.py) allows users to run text classification on their own CSV files. For now there are a few restrictions: the CSV files must have a header corresponding to the column names and no more than three columns (one for the id, one for the text and one for a second piece of text, for example in the case of entailment classification).
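
Before launching a run, it can help to check that a file respects these restrictions. The snippet below is a minimal sketch using only the standard library. The `check_csv` helper and the sample column names are hypothetical, and the check assumes the first row is the header, since a header cannot be told apart from data automatically.

```python
import csv
import io

MAX_COLUMNS = 3  # restriction stated in this README


def check_csv(csv_text):
    """Return the header if the file has at most three columns and consistent rows."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        raise ValueError("the file is empty, so it has no header")
    header = rows[0]
    if len(header) > MAX_COLUMNS:
        raise ValueError(f"expected at most {MAX_COLUMNS} columns, found {len(header)}")
    for line_number, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            raise ValueError(f"line {line_number} has {len(row)} fields, expected {len(header)}")
    return header


# Hypothetical entailment-style file: one label column and two text columns.
sample = (
    "label,sentence1,sentence2\n"
    "1,A man eats.,Someone is eating.\n"
    "0,A dog runs.,A cat sleeps.\n"
)
print(check_csv(sample))
```

With a file laid out like this sample, `--label_column_id 0` would point at the first column.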
|
||||
|
||||
To use the script, one has to run the following command line:
|
||||
```bash
# --train_file: training dataset file location (mandatory if running with --do_train option)
# --dev_file: development dataset file location (mandatory if running with --do_eval option)
# --test_file: test dataset file location (mandatory if running with --do_predict option)
# --label_column_id: which column corresponds to the labels
python run_tf_text_classification.py \
--train_file train.csv \
--dev_file dev.csv \
--test_file test.csv \
--label_column_id 0 \
--model_name_or_path bert-base-multilingual-uncased \
--output_dir model \
--num_train_epochs 4 \
--per_device_train_batch_size 16 \
--per_device_eval_batch_size 32 \
--do_train \
--do_eval \
--do_predict \
--logging_steps 10 \
--evaluation_strategy steps \
--save_steps 10 \
--overwrite_output_dir \
--max_seq_length 128
```
|
||||
|
||||
5
examples/tensorflow/text-classification/requirements.txt
Normal file
5
examples/tensorflow/text-classification/requirements.txt
Normal file
@@ -0,0 +1,5 @@
|
||||
accelerate
|
||||
datasets >= 1.1.3
|
||||
sentencepiece != 0.1.92
|
||||
protobuf
|
||||
tensorflow >= 2.3
|
||||
@@ -1,20 +0,0 @@
|
||||
{ "translation": { "en": "UN Chief Says There Is No Military Solution in Syria Secretary-General Ban Ki-moon says his response to Russia's stepped up military support for Syria is that \"there is no military solution\" to the nearly five-year conflict and more weapons will only worsen the violence and misery for millions of people. The U.N. chief again urged all parties, including the divided U.N. Security Council, to unite and support inclusive negotiations to find a political solution. Ban told a news conference Wednesday that he plans to meet with foreign ministers of the five permanent council nations - the U.S., Russia, China, Britain and France - on the sidelines of the General Assembly's ministerial session later this month to discuss Syria.", "ro": "Șeful ONU declară că nu există soluții militare în Siria Secretarul General Ban Ki-moon afirmă că răspunsul său la suportul militar al Rusiei pentru Siria este că „nu există o soluție militară” la conflictul care durează de aproape cinci ani iar mai multe arme nu ar face decât să agraveze violența și suferința a milioane de oameni. Șeful ONU a solicitat din nou tuturor părților, inclusiv Consiliului de securitate ONU divizat să se unifice și să susțină negocierile pentru a găsi o soluție politică. Ban a declarat miercuri în cadrul unei conferințe că intenționează să se întâlnească luna aceasta cu miniștrii de externe din cinci țări permanent prezente în consiliu - SUA, Rusia, China, Anglia și Franța - pe marginea sesiunii ministeriale a Adunării Generale pentru a discuta despre Siria." } }
|
||||
{ "translation": { "en": "He expressed regret that divisions in the council and among the Syrian people and regional powers \"made this situation unsolvable.\" Ban urged the five permanent members to show the solidarity and unity they did in achieving an Iran nuclear deal in addressing the Syria crisis. 8 Poll Numbers That Show Donald Trump Is For Real Some have tried to label him a flip-flopper. Others have dismissed him as a joke. And some are holding out for an implosion. But no matter how some Republicans are trying to drag Donald Trump down from atop the polls, it hasn't worked (yet).", "ro": "Ban și-a exprimat regretul că divizările în consiliu și între poporul sirian și puterile regionale „au făcut această situație de nerezolvat”. Ban le-a cerut celor cinci membri permanenți să dea dovadă de solidaritatea și unitatea arătate atunci când au reușit să încheie un acord referitor la armele nucleare ale Iranului, abordând astfel criza din Siria. 8 cifre din sondaje care arată că Donald Trump are șanse reale Unii au încercat să îl eticheteze ca politician „flip-flop”. Alții l-au numit o glumă. Iar alții așteaptă implozia. Însă indiferent de modul în care unii republicani încearcă să îl dărâme pe Donald Trump din vârful sondajelor, nu a funcționat (încă)." } }
|
||||
{ "translation": { "en": "Ten of the last 11 national polls have shown Donald Trump's lead at double digits, and some are starting to ask seriously what it means for the real estate mogul's nomination chances. Of course, it's still early in the election cycle. None of this is to say that Trump is likely to win the Republican nomination. Pundits point out that at this time in 2011, Rick Perry's lead was giving way to a rising Herman Cain, neither of whom won even one state in the nomination process. And there are many reasons he would struggle in a general election. But outside groups like Jeb Bush's Super PAC and the economic conservative group Club for Growth are recognizing Trump's staying power and beginning to unload their dollars to topple him.", "ro": "Zece din ultimele 11 sondaje naționale au arătat că Donald Trump conduce cu un procent din două cifre iar unele voci încep să se întrebe serios ce înseamnă acest lucru pentru șansele de numire ale mogulului imobiliar. Desigur, este încă prematur. Nimic din toate acestea nu spune că Trump va câștiga cursa pentru nominalizarea republicanilor. Pundits arată că, în aceeași perioadă a anului 2011, avansul lui Rick Perry îi făcea loc lui Herman Cain în sondaje, dar niciunul dintre ei nu a câștigat în vreun stat în cursa de nominalizare. Iar motivele pentru care s-ar lupta din greu la alegerile generale sunt numeroase. Însă grupurile din exterior precum Super PAC al lui Jeb Bush și grupul conservator economic Club for Growth admit puterea lui Trump și încep să îl susțină cu bani." } }
|
||||
{ "translation": { "en": "Here are some recent poll numbers that suggest that the real estate mogul isn't just a passing phase: Trump's favorability ratings have turned 180 degrees. Right before Donald Trump announced his candidacy in mid-June, a Monmouth University poll showed only two in 10 Republicans had a positive view of the real estate mogul. By mid-July, it was 40 percent. In early August, it was 52 percent. Now, six in 10 Republicans have a favorable view of Donald Trump. Roughly three in 10 say they have a negative view. And these numbers hold up in early states. A Quinnipiac poll in Iowa last week found that 60 percent of Republicans there had a favorable view of Trump.", "ro": "În continuare vă prezentăm câteva cifre din sondaje recente care sugerează că mogulul imobiliar nu este doar ceva trecător: Cifrele care indică susținerea față de Trump s-au întors la 180 grade. Chiar înainte ca Donald Trump să își anunțe candidatura, la mijlocul lui iunie, un sondaj realizat de Universitatea din Monmouth arăta că doar doi din 10 republicani aveau o părere pozitivă despre mogulul imobiliar. Până la mijlocul lui iulie, procentul a urcat la 40%. La începutul lui august, era 52%. În prezent, șase din 10 republicani au o părere favorabilă despre Donald Trump. Aproximativ trei din 10 declară că au o părere negativă. Aceste cifre se mențin. Un sondaj realizat săptămâna trecută de Quinnipiac în Iowa a concluzionat că 60% dintre republicanii din regiune au o părere favorabilă despre Trump." } }
|
||||
{ "translation": { "en": "Two-thirds of GOP voters would be happy with Trump as the nominee. In a CNN/ORC poll last week, 67 percent of Republicans said they would be either \"enthusiastic\" or \"satisfied\" if Trump were the nominee. Only two in 10 say they would be \"upset\" if he were the nominee. Only Ben Carson generates roughly the same level of enthusiasm as Trump (43 percent say they would be \"enthusiastic\" vs. 40 percent who say the same of Trump). The next closest in enthusiasm? Marco Rubio with only 21 percent.", "ro": "Două treimi dintre alegătorii GOP ar fi fericiți dacă Trump ar câștiga cursa pentru nominalizare. Într-un sondaj realizat săptămâna trecută de CNN/ORC, 67% dintre republicani au declarat că ar fi „entuziasmați” sau „mulțumiți” dacă Trump ar câștiga cursa pentru nominalizare. Doar doi din 10 declară că ar fi „supărați” dacă Trump ar câștiga cursa pentru nominalizare. Doar Ben Carson generează aproximativ același nivel de entuziasm ca Trump (43% declară că ar fi „entuziasmați” față de 40% care declară același lucru despre Trump). Cel mai aproape în ceea ce privește entuziasmul? Marco Rubio, cu doar 21%." } }
|
||||
{ "translation": { "en": "On the flip side, 47 percent of Republican voters say they would be \"dissatisfied\" or \"upset\" if establishment favorite Jeb Bush becomes the nominee. A majority of Republicans don't see Trump's temperament as a problem. While Donald Trump has been widely criticized for his bombast and insults, 52 percent of leaned Republican voters nationwide think that the real estate mogul has the right temperament to be president, according to Monday's ABC News/Washington Post poll. The same number holds in the first-in-the-nation caucus state of Iowa, where the same 52 percent of Republicans think he has the personality to be commander in chief, according to Quinnipiac last week.", "ro": "De partea cealaltă, 47% dintre alegătorii republicani afirmă că ar fi „nemulțumiți” sau „supărați” dacă favoritul Jeb Bush câștigă cursa pentru nominalizare. Majoritatea republicanilor nu consideră temperamentul lui Trump o problemă. Deși Donald Trump a fost puternic criticat pentru insultele aduse și stilul său bombastic, 52% dintre alegătorii republicani la nivel național consideră că mogulul imobiliar are temperamentul potrivit pentru a fi președinte, conform sondajului realizat luni de ABC News/Washington Post. Regăsim aceleași cifre în statul Iowa, unde tot 52% dintre republicani cred că Trump are personalitatea potrivită pentru a fi conducător, conform sondajului realizat săptămâna trecută de Quinnipiac." } }
|
||||
{ "translation": { "en": "Still, 44 percent think he doesn't have the personality to serve effectively, and almost six in 10 independents say his temperament does not belong in the White House, according to ABC/Post. Republican voters are getting used to the idea. When they put on their pundit hats, Republican voters think Trump is for real. When asked who is most likely to win the GOP nomination, four in 10 said Trump was the best bet, according to a CNN/ORC poll out last week. That's a change from when four in 10 placed their money on Jeb Bush in late July. Full disclosure: GOP voters haven't had the clearest crystal ball in the past.", "ro": "Totuși, 44% sunt de părere că nu are personalitatea necesară pentru a acționa eficient și aproape șase din 10 independenți afirmă că temperamentul său nu are ce căuta la Casa Albă, conform ABC/Post. Alegătorii republicani se obișnuiesc cu ideea. Atunci când iau atitudinea de intelectuali, alegătorii republicani consideră că Trump este autentic. Conform unui sondaj realizat săptămâna trecută de CNN/ORC, la întrebarea cine are cele mai multe șanse să câștige cursa pentru nominalizare GOP, patru din 10 au declarat că Trump. Situația s-a schimbat față de finalul lui iulie, când patru din 10 ar fi pariat pe Jeb Bush. Informare completă: în trecut, alegătorii GOP nu au citit foarte bine viitorul." } }
|
||||
{ "translation": { "en": "At this time last cycle, four in 10 Republicans picked Rick Perry to win the nomination, vs. only 28 percent for eventual nominee Mitt Romney. Still, it shows that a plurality of GOP voters see Trump's campaign as plausible. Even if Republicans rallied around another candidate, Trump still beats almost everyone. Some pundits point out that the splintered field is likely contributing to Trump's lead, while anti-Trump support is be spread diffusely among more than a dozen other candidates. But a Monmouth University poll in early September shows that, in a hypothetical head-to-head matchup between Trump and most other Republican candidates, Trump almost always garners majority support.", "ro": "În aceeași perioadă a ultimelor alegeri, patru din 10 republicani l-au ales pe Rick Perry în cursa pentru nominalizare, față de doar 28% pentru Mitt Romney. Însă, aceste cifre arată că majoritatea alegătorilor GOP consideră plauzibilă campania lui Trump. Chiar dacă republicanii sau repliat spre un alt candidat. Trump încă se află în fruntea tuturor. Unele voci spun că situația divizată va contribui probabil la victoria lui Trump, în timp ce susținerea contra lui Trump se va împărți la mai mult de doisprezece candidați. Însă un sondaj derulat la începutul lui septembrie de Universitatea din Monmouth arată că, în situația ipotetică a unei colaborări între Trump și majoritatea celorlalți candidați republicani, aproape întotdeauna Trump va beneficia de susținerea majoritară." } }
|
||||
{ "translation": { "en": "He leads Carly Fiorina by 13 points, Marco Rubio by 14 points, Walker by 15 points, Jeb Bush by 19 points, and, finally, Rand Paul, John Kasich and Chris Christie by 33 points each. He's in a dead heat with Ted Cruz. The only candidate who beats him? Ben Carson would lead the businessman by a wide 19 points in a hypothetical head-to-head. A bare majority of Donald Trump's supporters say they've made up their minds. A new CBS/NYT poll out on Tuesday shows that just more than half of voters who support Trump say they have locked in their votes. Obviously, a lot can happen to change that, and no one can really say they would never change their mind.", "ro": "Trump se află la distanță de 13 puncte de Carly Fiorina, la 14 puncte de Marco Rubio, la 15 puncte de Walker, la 19 puncte de Jeb Bush și, în cele din urmă, la câte 33 de puncte față de Rand Paul, John Kasich și Chris Christie. Este aproape la egalitate cu Ted Cruz. Singurul candidat care îl învinge? Ben Carson l-ar învinge pe omul de afaceri cu 19 puncte într-o confruntare ipotetică de unu la unu. Majoritatea susținătorilor lui Donald Trump declară că s-au decis. Un nou sondaj realizat marți de CBS/NYT arată că peste jumătate dintre alegătorii care îl susțin pe Trump declară că nu își schimbă opțiunea de vot. Evident, se pot întâmpla multe în acest sens și nimeni nu poate spune că aceștia nu se vor răzgândi niciodată." } }
|
||||
{ "translation": { "en": "46 percent said they are leaving the door open to switching candidates. Still, Trump's strongest competition at the moment is from fellow outsider neurosurgeon Ben Carson, but voters who say they have made up their minds are twice as likely to go for Trump. Six in 10 Republicans say they agree with Trump on immigration. Even since Donald Trump called immigrants from Mexico \"rapists\" in his campaign announcement speech two months ago, immigration has been front and center in the 2016 conversation. Some are worried that Trump's bombast will drive crucial Hispanic voters away from the Republican Party and damage rebranding efforts.", "ro": "46% afirmă că lasă portița deschisă posibilității de a-și schimba opțiunea. Cu toate acestea, cel mai important adversar al lui Trump este în prezent neurochirurgul Ben Carson, însă este de două ori mai probabil ca alegătorii care declară că s-au decis să voteze cu Trump. Șase din 10 republicani afirmă că sunt de acord cu Trump în problema imigrării. De când Donald Trump i-a numit pe imigranții din Mexic „violatori” în discursul de deschidere a campaniei sale, în urmă cu două luni, imigrarea a fost subiectul central în campania pentru 2016. Unii sunt îngrijorați că stilul bombastic al lui Trump va duce la o scindare între alegătorii hispanici importanți și Partidul Republican și va prejudicia eforturile de rebranding." } }
|
||||
{ "translation": { "en": "But according to Monday's new ABC/Post poll, six in 10 Republicans say they agree with Trump on immigration issues. So as long as immigration remains in the spotlight, it seems Donald Trump will remain too. Frustration with government is climbing to new highs. Donald Trump and Ben Carson now account for roughly half of the support from Republican voters, largely due to their outsider status. Six in 10 Republicans in Monday's new ABC/Post poll say they want a political outsider over someone with government experience. And they are angry at Washington, too.", "ro": "Însă, conform sondajului realizat luni de ABC/Post, șase din 10 republicani afirmă că sunt de acord cu Trump în problema imigrării. Așa că, se pare că atâta timp cât problema imigrării rămâne în lumina reflectoarelor, la fel va rămâne și Doland Trump. Frustrarea față de autorități atinge noi culmi. Donald Trump și Ben Carson sunt acum susținuți de aproape jumătate dintre alegătorii republicani, în mare parte datorită statutului lor de outsideri. Conform sondajului realizat luni de ABC/Post, șase din 10 republicani afirmă că preferă un outsider politic în detrimentul cuiva cu experiență în guvernare. Oamenii sunt de asemenea supărați pe autoritățile de la Washington." } }
|
||||
{ "translation": { "en": "A Des Moines Register/Bloomberg poll in Iowa from two weeks ago shows that three in four Iowa Republicans are frustrated with Republicans in Congress, with 54 percent \"unsatisfied\" and 21 percent \"mad as hell.\" Jeremy Corbyn to make debut at Prime Minister's Questions Since his election, Mr Corbyn's debut at PMQs has been keenly awaited New Labour leader Jeremy Corbyn is to make his debut at Prime Minister's Questions later, taking on David Cameron for the first time.", "ro": "Un sondaj derulat în urmă cu două săptămâni în Iowa de către Des Moines Register/Bloomberg arată că trei din patru republicani din Iowa sunt frustrați de prestația republicanilor din COngres, 54% declarându-se „nemulțumiți” iar 21% „nervoși la culme”. Jeremy Corbyn își face debutul la Prime Minister's Questions Încă de la alegerea sa, debutul domnului Corbyn la PMQs a fost îndelung așteptat Noul lider al Partidului Laburist, Jeremy Corbyn, își va face mai târziu debutul la Prime Minister's Questions, confruntându-se pentru prima dată cu David Cameron." } }
|
||||
{ "translation": { "en": "Mr Corbyn will rise to ask the first of his six allotted questions shortly after midday, with his performance likely to be closely scrutinised by the media and Labour MPs. He has called for \"less theatre and more facts\" at the weekly showpiece. He has also said he could skip some sessions, leaving them to colleagues. The encounter will be the first parliamentary test of Mr Corbyn's leadership, coming after his appointment of a shadow cabinet and his speech to the TUC annual congress on Tuesday.", "ro": "Dl Corbyn va adresa primele dintre cele șase întrebări la care are dreptul la scurt timp după prânz; prestația sa va fi probabil analizată îndeaproape de mass-media și parlamentarii laburiști. În cadrul aparițiilor săptămânale, el a cerut „mai puțin teatru și mai multe fapte”. A declarat de asemenea că poate renunța la câteva participări și că le cedează colegilor săi. Confruntarea va fi primul test parlamentar al Dl Corbyn în poziție de lider, venind după ce a numit un „cabinet fantomă” și după discursul pe care l-a ținut marți la congresul anual TUC." } }
|
||||
{ "translation": { "en": "Meanwhile, the Labour leader's decision to stand in silence during the singing of the national anthem at a service on Tuesday to mark the 75th anniversary of the Battle of Britain has attracted criticism from a number of Tory MPs and is the focus of several front page stories in the newspapers. Mr Corbyn's decision not to sing the national anthem has attracted attention A spokesman for Mr Corbyn said he had \"stood in respectful silence\" and did recognise the \"heroism of the Royal Air Force in the Battle of Britain.\"", "ro": "Între timp, decizia liderului Partidului laburist de a păstra tăcerea la rostirea imnului național în cadrul unei slujbe ținute marți cu ocazia aniversării a 75 de ani de la Bătălia Angliei a atras critici din partea unor parlamentari conservatori și a ținut prima pagină a ziarelor. Decizia domnului Corbyn de a nu cânta imnul național a atras atenția Un purtător de cuvânt al Dl Corbyn a declarat că acesta „a păstrat tăcerea în mod respectuos” și a recunoscut „eroismul Forțelor aeriene britanice în Bătălia Angliei.”" } }
|
||||
{ "translation": { "en": "But a member of Mr Corbyn's shadow cabinet, Owen Smith, told BBC Two's Newsnight programme he would have advised the Labour leader to sing the national anthem \"irrespective\" of his belief that the monarchy should be abolished. Nearly a dozen shadow ministers have refused to serve in Mr Corbyn's top team, citing differences over the economy, defence and foreign affairs, while less than a sixth of the parliamentary party originally backed him as leader. BBC political correspondent Robin Brant says policy differences are also \"stacking up\" within Labour following Mr Corbyn's appointment over its position on the European Union and the government's cap on benefits.", "ro": "Însă un membru al cabinetului fantomă al Dl Corbyn, Owen Smith, a declarat pentru emisiunea Two's Newsnight transmisă de BBC că i-ar fi recomandat liderului laburist să cânte imnul național „indiferent” de credința sa că monarhia ar trebui abolită. În jur de doisprezece miniștri din cabinetul fantomă au refuzat să facă parte din echipa de frunte a Dl Corbyn, argumentând prin diferențe de opinie legate de economie, apărare și externe, în timp ce mai puțin de o șesime din partidul parlamentar l-a susținut ca lider. Corespondentul politic al BBC, Robin Brant, declară că diferențele de politică „se cumulează” în Partidul Laburist după numirea domnului Corbyn referitor la poziția sa față de Uniunea Europeană și limita de beneficii." } }
|
||||
{ "translation": { "en": "Mr Corbyn told the TUC conference Labour was putting forward amendments to remove the whole idea of a cap altogether. Hours later Mr Smith, the shadow work and pensions secretary, said the party was \"very clear\" that it was only opposing government plans to reduce the level of cap from £26,000 to £23,000. Mr Corbyn will be the fifth Labour leader that David Cameron has faced across the despatch box over the past decade since he became Tory leader. The Labour leader, who has promised a different approach to politics, says he has \"crowd sourced\" ideas for questions to ask Mr Cameron and has been given more than 30,000 suggestions.", "ro": "Dl Corbyn a declarat la conferința TUC că Partidul Laburist va aduce modificări prin care se va elimina integral ideea limitării. Câteva ore mai târziu, Dl Smith, Ministrul Muncii și Pensiilor, a declarat că partidul „este foarte clar” în opoziția exclusivă față de planurile guvernului de a reduce nivelul „cap” de la 26.000 lire la 23.000 lire. Dl Corbyn va fi al cincilea lider laburist cu care se confruntă David Cameron la tribună în ultimul deceniu, de când a preluat conducerea Partidului Conservator. Liderul laburist, care a promis o abordare diferită a politicii, spune că are idei „din surse externe” pentru întrebări pe care să i le adreseze Domnului Cameron și că a primit peste 30.000 de sugestii." } }
|
||||
{ "translation": { "en": "The Islington North MP has said PMQs is too confrontational and that he will refrain from both \"repartee\" and trading barbs, instead vowing to focus on serious issues such as poverty, inequality and the challenges facing young people. Mr Corbyn has said that Angela Eagle, the shadow business secretary, will deputise for him at PMQs when he does not attend - for instance when Mr Cameron is travelling abroad. He has also floated the idea of allowing other colleagues to take the floor on occasion, saying he had approached the Commons Speaker John Bercow to discuss the issue.", "ro": "Parlamentarul Islington North a afirmat că PMQs implică un nivel de confruntare prea înalt și că se va abține de la replici și atacuri, angajându-se să se concentreze în schimb pe probleme serioase precum sărăcia, inegalitatea și provocările cu care se confruntă tinerii. Dl Corbyn a declarat că Angela Eagle, Ministrul de finanțe, îi va ține locul la PMQs atunci când el nu poate participa - de exemplu atunci când Dl Cameron se deplasează în străinătate. A exprimat de asemenea ideea că va permite altor colegi să ia cuvântul ocazional, spunând că l-a abordat pe Președintele Camerei Deputaților, John Bercow, pentru a discuta acest aspect." } }
|
||||
{ "translation": { "en": "When he became leader in 2005, Mr Cameron said he wanted to move away from the \"Punch and Judy\" style of politics often associated with PMQs but admitted some years later that he had failed. Since it was first televised in 1990, PMQs has been seen as a key barometer of a leader's judgement, their command of the Commons and their standing among their fellow MPs although critics have argued it has become a caricature and is in need of far-reaching reforms. 'Shot in Joburg': Homeless youth trained as photographers Downtown Johannesburg is a tough place to be homeless.", "ro": "În 2005, când a preluat conducerea, Dl Cameron a declarat că dorește să renunțe la stilul politic „Punch and Judy” asociat adesea cu PMQs însă a recunoscut câțiva ani mai târziu că nu a reușit în demersul său. De la prima transmisie, în 1990, PMQs a fost considerată un barometru cheie al raționamentului unui lider, al modului în care acesta conduce Camera Deputaților și a poziției sale în rândul colegilor parlamentari, deși criticii afirmă a ca devenit o caricatură și că are nevoie de o reformare profundă. „Cadru în Joburg”: Tineri fără adăpost beneficiază de cursuri de fotografie Este dificil să fii un om fără adăpost în Johannesburg." } }
|
||||
{ "translation": { "en": "But one group of former street children have found a way to learn a skill and make a living. \"I was shot in Joburg\" is a non-profit studio that teaches homeless youngsters how to take photographs of their neighbourhood and make a profit from it. BBC News went to meet one of the project's first graduates. JD Sports boss says higher wages could hurt expansion JD Sports Executive Chairman Peter Cowgill says a higher minimum wage for UK workers could mean \"more spending power in the pockets of potential consumers.\" But that spending power is unlikely to outweigh the higher labour costs at his firm, he says.", "ro": "Însă un grup de oameni care au trăit pe străzi în copilărie au găsit un mod de a învăța o meserie și de a-și câștiga traiul. „I was shot în Joburg” este un studio non-profit care îi învață pe tinerii fără adăpost să facă fotografii ale zonelor în care trăiesc și să câștige bani din asta. BBC News s-a întâlnit cu unul dintre primii absolvenți ai proiectului. Șeful JD Sports spune că salariile mai mari ar putea dăuna extinderii Președintele JD Sports, Peter Cowgill, declară că o creștere a salariului minim în Marea Britanie ar putea însemna „o putere de cumpărare mai mare în buzunarele potențialilor consumatori.” Este însă puțin probabil ca respectiva putere de cumpărare să depășească costurile mai mari pentru forța de muncă în cadrul firmei, afirmă el." } }
|
||||
{ "translation": { "en": "The costs could hit JD Sports' expansion plans, he added, which could mean fewer extra jobs. Thanasi Kokkinakis backed by Tennis Australia president Steve Healy Thanasi Kokkinakis deserves kudos rather than criticism for his behaviour. Thanasi Kokkinakis has been the collateral damage in the recent storm around his friend Nick Kyrgios and deserves kudos rather than criticism for his own behaviour, according to Tennis Australia president Steve Healy.", "ro": "Costurile ar putea avea impact asupra planurilor de extindere ale JD Sports, a adăugat el, ceea ce ar putea însemna mai puține locuri de muncă noi. Thanasi Kokkinakis susținut de președintele Tennis Australia, Steve Healy Thanasi Kokkinakis ar merita să fie lăudat și nu criticat pentru comportamentul său. Thanasi Kokkinakis a fost victimă colaterală în „furtuna” creată în jurul prietenului său, Nick Kyrgios, iar comportamentul său merită mai degrabă cuvinte de laudă și nu critică, în opinia președintelui Tennis Australia, Steve Healy." } }
|
||||
@@ -1,11 +0,0 @@
|
||||
{ "translation": { "en": "Corrections to votes and voting intentions: see Minutes Assignment conferred on a Member: see Minutes Membership of committees and delegations: see Minutes Decisions concerning certain documents: see Minutes Forwarding of texts adopted during the sitting: see Minutes Dates for next sittings: see Minutes", "ro": "Corectările voturilor şi intenţiile de vot: a se vedea procesul-verbal Misiune încredinţată unui deputat: consultaţi procesul-verbal Componenţa comisiilor şi a delegaţiilor: a se vedea procesul-verbal Decizii privind anumite documente: a se vedea procesul-verbal Transmiterea textelor adoptate în cursul prezentei şedinţe: a se vedea procesul-verbal Calendarul următoarelor şedinţe: a se vedea procesul-verbal" } }
|
||||
{ "translation": { "en": "Membership of Parliament: see Minutes Approval of Minutes of previous sitting: see Minutes Membership of Parliament: see Minutes Verification of credentials: see Minutes Documents received: see Minutes Written statements and oral questions (tabling): see Minutes Petitions: see Minutes Texts of agreements forwarded by the Council: see Minutes Action taken on Parliament's resolutions: see Minutes Agenda for next sitting: see Minutes Closure of sitting (The sitting was closed at 7.45 p.m.)", "ro": "Componenţa Parlamentului: a se vedea procesul-verbal Aprobarea procesului-verbal al şedinţei precedente: a se vedea procesul-verbal Componenţa Parlamentului: a se vedea procesul-verbal Verificarea prerogativelor: a se vedea procesul-verbal Depunere de documente: a se vedea procesul-verbal Declaraţii scrise şi întrebări orale (depunere): consultaţi procesul-verbal Petiţii: a se vedea procesul-verbal Transmiterea de către Consiliu a textelor acordurilor: a se vedea procesul-verbal Cursul dat rezoluţiilor Parlamentului: a se vedea procesul-verbal Ordinea de zi a următoarei şedinţe: a se vedea procesul-verbal Ridicarea şedinţei (Se levanta la sesión a las 19.45 horas)" } }
|
||||
{ "translation": { "en": "Election of Vice-Presidents of the European Parliament (deadline for submitting nominations): see Minutes (The sitting was suspended at 12.40 p.m. and resumed at 3.00 p.m.) Election of Quaestors of the European Parliament (deadline for submitting nominations): see Minutes (The sitting was suspended at 3.25 p.m. and resumed at 6.00 p.m.) Agenda for next sitting: see Minutes Closure of sitting (The sitting was closed at 6.15 p.m.) Opening of the sitting (The sitting was opened at 9.35 a.m.) Documents received: see Minutes Approval of Minutes of previous sitting: see Minutes Membership of Parliament: see Minutes", "ro": "Alegerea vicepreşedinţilor Parlamentului European (termenul de depunere a candidaturilor): consultaţi procesul-verbal (Die Sitzung wird um 12.40 Uhr unterbrochen und um 15.00 Uhr wiederaufgenommen). Alegerea chestorilor Parlamentului European (termenul de depunere a candidaturilor): consultaţi procesul-verbal (Die Sitzung wird um 15.25 Uhr unterbrochen und um 18.00 Uhr wiederaufgenommen). Ordinea de zi a următoarei şedinţe: a se vedea procesul-verbal Ridicarea şedinţei (Die Sitzung wird um 18.15 Uhr geschlossen.) Deschiderea şedinţei (Die Sitzung wird um 9.35 Uhr eröffnet.) Depunerea documentelor: a se vedea procesul-verbal Aprobarea procesului-verbal al şedinţei precedente: a se vedea procesul-verbal Componenţa Parlamentului: a se vedea procesul-verbal" } }
|
||||
{ "translation": { "en": "Membership of committees (deadline for tabling amendments): see Minutes (The sitting was suspended at 7 p.m. and resumed at 9 p.m.) Agenda for next sitting: see Minutes Closure of sitting (The sitting was suspended at 23.25 p.m.) Documents received: see Minutes Communication of Council common positions: see Minutes (The sitting was suspended at 11.35 a.m. and resumed for voting time at noon) Approval of Minutes of previous sitting: see Minutes Committee of Inquiry into the crisis of the Equitable Life Assurance Society (extension of mandate): see Minutes", "ro": "Componenţa comisiilor (termenul de depunere a amendamentelor): consultaţi procesul-verbal (La seduta, sospesa alle 19.00, è ripresa alle 21.00) Ordinea de zi a următoarei şedinţe: a se vedea procesul-verbal Ridicarea şedinţei (Die Sitzung wird um 23.25 Uhr geschlossen.) Depunerea documentelor: a se vedea procesul-verbal Comunicarea poziţiilor comune ale Parlamentului: a se vedea procesul-verbal (La séance, suspendue à 11h35 dans l'attente de l'Heure des votes, est reprise à midi) Aprobarea procesului-verbal al şedinţei precedente: a se vedea procesul-verbal Comisia de anchetă privind criza societăţii de asigurări \"Equitable Life” (prelungirea mandatului): consultaţi procesul-verbal" } }
|
||||
{ "translation": { "en": "Announcement by the President: see Minutes 1. Membership of committees (vote) 2. Amendment of the ACP-EC Partnership Agreement (vote) 4. Certification of train drivers operating locomotives and trains on the railway system in the Community (vote) 6. Law applicable to non-contractual obligations (\"ROME II\") (vote) 8. Seventh and eighth annual reports on arms exports (vote) Corrections to votes and voting intentions: see Minutes Membership of committees and delegations: see Minutes Request for waiver of parliamentary immunity: see Minutes Decisions concerning certain documents: see Minutes", "ro": "Comunicarea Preşedintelui: consultaţi procesul-verbal 1. Componenţa comisiilor (vot) 2. Modificarea Acordului de parteneriat ACP-CE (\"Acordul de la Cotonou”) (vot) 4. Certificarea mecanicilor de locomotivă care conduc locomotive şi trenuri în sistemul feroviar comunitar (vot) 6. Legea aplicabilă obligaţiilor necontractuale (\"Roma II”) (vot) 8. Al şaptelea şi al optulea raport anual privind exportul de armament (vot) Corectările voturilor şi intenţiile de vot: a se vedea procesul-verbal Componenţa comisiilor şi a delegaţiilor: a se vedea procesul-verbal Cerere de ridicare a imunităţii parlamentare: consultaţi procesul-verbal Decizii privind anumite documente: a se vedea procesul-verbal" } }
|
||||
{ "translation": { "en": "Written statements for entry", "ro": "Declaraţii scrise înscrise" } }
|
||||
{ "translation": { "en": "Written statements for entry in the register (Rule 116): see Minutes Forwarding of texts adopted during the sitting: see Minutes Dates for next sittings: see Minutes Adjournment of the session I declare the session of the European Parliament adjourned. (The sitting was closed at 1 p.m.) Approval of Minutes of previous sitting: see Minutes Membership of Parliament: see Minutes Request for the defence of parliamentary immunity: see Minutes Appointments to committees (proposal by the Conference of Presidents): see Minutes Documents received: see Minutes Texts of agreements forwarded by the Council: see Minutes", "ro": "Declaraţii scrise înscrise în registru (articolul 116 din Regulamentul de procedură): a se vedea procesul-verbal Transmiterea textelor adoptate în cursul prezentei şedinţe: a se vedea procesul-verbal Calendarul următoarelor şedinţe: a se vedea procesul-verbal Întreruperea sesiunii Dichiaro interrotta la sessione del Parlamento europeo. (La seduta è tolta alle 13.00) Aprobarea procesului-verbal al şedinţei precedente: a se vedea procesul-verbal Componenţa Parlamentului: a se vedea procesul-verbal Cerere de apărare a imunităţii parlamentare: consultaţi procesul-verbal Numiri în comisii (propunerea Conferinţei preşedinţilor): consultaţi procesul-verbal Depunerea documentelor: a se vedea procesul-verbal Transmiterea de către Consiliu a textelor acordurilor: a se vedea procesul-verbal" } }
|
||||
{ "translation": { "en": "Action taken on Parliament's resolutions: see Minutes Oral questions and written statements (tabling): see Minutes Written statements (Rule 116): see Minutes Agenda: see Minutes 1. Appointments to parliamentary committees (vote): see Minutes Voting time Agenda for next sitting: see Minutes Closure of sitting (The sitting was closed at 12 midnight) Opening of the sitting (The sitting was opened at 09.05) Documents received: see Minutes Approval of Minutes of previous sitting: see Minutes 1. Protection of passengers against displaced luggage (vote) 2.", "ro": "Continuări ale rezoluţiilor Parlamentului: consultaţi procesul-verbal Declaraţii scrise şi întrebări orale (depunere): consultaţi procesul-verbal Declaraţii scrise (articolul 116 din Regulamentul de procedură) Ordinea de zi: a se vedea procesul-verbal 1. Numiri în comisiile parlamentare (vot): consultaţi procesul-verbal Timpul afectat votului Ordinea de zi a următoarei şedinţe: a se vedea procesul-verbal Ridicarea şedinţei (La seduta è tolta alle 24.00) Deschiderea şedinţei (The sitting was opened at 09.05) Depunerea documentelor: a se vedea procesul-verbal Aprobarea procesului-verbal al şedinţei precedente: a se vedea procesul-verbal 1. Protecţia pasagerilor împotriva deplasării bagajelor (vot) 2." } }
|
||||
{ "translation": { "en": "Approval of motor vehicles with regard to the forward field of vision of the driver (vote) 3. EC-Korea Agreement on scientific and technological cooperation (vote) 4. Mainstreaming sustainability in development cooperation policies (vote) 5. Draft Amending Budget No 1/2007 (vote) 7. EC-Gabon Fisheries Partnership (vote) 10. Limitation periods in cross-border disputes involving personal injuries and fatal accidents (vote) 12. Strategy for a strengthened partnership with the Pacific Islands (vote) 13. The European private company statute (vote) That concludes the vote.", "ro": "Omologarea vehiculelor cu motor cu privire la câmpul de vizibilitate înainte al conducătorului auto (vot) 3. Acordul CE-Coreea de cooperare ştiinţifică şi tehnologică (vot) 4. Integrarea durabilităţii în politicile de cooperare pentru dezvoltare (vot) 5. Proiect de buget rectificativ nr.1/2007 (vot) 7. Acordul de parteneriat în domeniul pescuitului între Comunitatea Europeană şi Republica Gaboneză (vot) 10. Termenele de prescripţie aplicabile în cadrul litigiilor transfrontaliere cu privire la vătămările corporale şi accidentele mortale (vot) 12. Relaţiile UE cu insulele din Pacific: Strategie pentru un parteneriat consolidat (vot) 13. Statutul societăţii private europene (vot) Damit ist die Abstimmungsstunde beendet." } }
|
||||
{ "translation": { "en": "Corrections to votes and voting intentions: see Minutes Assignment conferred on a Member: see Minutes Membership of committees and delegations: see Minutes Decisions concerning certain documents: see Minutes Forwarding of texts adopted during the sitting: see Minutes Dates for next sittings: see Minutes", "ro": "Corectările voturilor şi intenţiile de vot: a se vedea procesul-verbal Misiune încredinţată unui deputat: consultaţi procesul-verbal Componenţa comisiilor şi a delegaţiilor: a se vedea procesul-verbal Decizii privind anumite documente: a se vedea procesul-verbal Transmiterea textelor adoptate în cursul prezentei şedinţe: a se vedea procesul-verbal Calendarul următoarelor şedinţe: a se vedea procesul-verbal" } }
|
||||
{ "translation": { "en": "Written statements for entry", "ro": "Declaraţii scrise înscrise" } }
|
||||
@@ -1,16 +0,0 @@
|
||||
{ "translation": { "en": "Brazil's Former Presidential Chief-of-Staff to Stand Trial A federal judge on Tuesday accepted the charges filed against Brazil's former presidential chief of staff for his alleged involvement in a massive corruption scheme at state-owned oil company Petrobras. The federal prosecutor's office said Jose Dirceu will face trial on the corruption, racketeering and money laundering charges filed earlier this month. Fourteen other people will also be tried, including Joao Vaccari Neto, the former treasurer of Brazil's governing Workers' Party and Renato de Souza Duque, Petrobras' former head of corporate services.", "ro": "Fostul șef al cabinetului prezidențial brazilian este adus în fața instanței Marți, un judecător federal a acceptat acuzațiile aduse împotriva fostului șef al cabinetului prezidențial brazilian pentru presupusa implicare a acestuia într-o schemă masivă de corupție privind compania petrolieră de stat Petrobras. Biroul procurorului federal a declarat că Jose Dirceu va fi trimis în judecată pentru acuzațiile de corupție, înșelătorie și spălare de bani aduse în această lună. Alte paisprezece persoane vor fi judecate, printre acestea numărându-se Joao Vaccari Neto, fostul trezorier al Partidului Muncitorilor, aflat la putere în Brazilia, și Renato de Souza Duque, fostul președinte al serviciilor pentru întreprinderi ale Petrobras." } }
|
||||
{ "translation": { "en": "Dirceu is the most senior member of the ruling Workers' Party to be taken into custody in connection with the scheme. Dirceu served as former President Luiz Inacio Lula da Silva's chief of staff between 2003 and 2005. He was arrested early August in his home, where he already was under house arrest serving an 11-year sentence for his involvement in a cash-for-votes scheme in Congress more than 10 years ago. Prosecutors have said that Dirceu masterminded the kickback scheme at Petrobras, accepted bribes while in office and continued to receive payments from contractors after he was jailed in late 2013 for the vote-buying scandal.", "ro": "Dirceu este cel mai vechi membru al Partidului Muncitorilor aflat la guvernare luat în custodie pentru legăturile cu această schemă. Dirceu a servit ca șef de cabinet al fostului președinte Luiz Inacio Lula da Silva între 2003 și 2005. A fost arestat la începutul lui august de acasă, unde deja se afla sub arest la domiciliu, cu o pedeapsă de 11 ani pentru implicarea într-o schemă de cumpărare a voturilor în Congres cu peste 10 ani în urmă. Procurorii au declarat că Dirceu a dezvoltat schema de luare de mită de la Petrobras, a acceptat mită în timp ce se afla în funcție și a continuat să primească plăți de la antreprenori după ce a fost închis la sfârșitul lui 2013 pentru scandalul voturilor cumpărate." } }
|
||||
{ "translation": { "en": "According to prosecutors, the scheme at Petrobras involved roughly $2 billion in bribes and other illegal funds. Some of that money was allegedly funneled back to campaign coffers of the ruling party and its allies. It also allegedly included the payment of bribes to Petrobras executives in return for inflated contracts. 'Miraculous' recovery for Peshawar massacre schoolboy A teenager paralysed after being shot four times in Pakistan's deadliest terror attack has made a \"miraculous\" recovery following treatment in the UK. Muhammad Ibrahim Khan, 13, had been told by doctors in Pakistan that he would never walk again.", "ro": "Conform procurorilor, schema de la Petrobras a implicat aproximativ 2 miliarde de dolari sub formă de mită și alte fonduri ilegale. O parte din acei bani s-ar fi întors în fondul de campanie al partidului aflat la guvernare și al aliaților acestora. De asemenea, ar fi inclus mită către directorii Petrobras în schimbul unor contracte umflate. Recuperarea „miraculoasă” a unui elev supraviețuitor al masacrului de la Peshawar Un adolescent paralizat după ce fusese împușcat de patru ori în cel mai cumplit atac terorist din Pakistan a reușit o recuperare „miraculoasă” după ce a urmat un tratament în Regatul Unit. Lui Mohamed Ibrahim Khan, în vârstă de 13 ani, doctorii din Pakistan îi spuseseră că nu va mai putea să meargă niciodată." } }
|
||||
{ "translation": { "en": "At least 140 people, mostly children, were killed when gunmen stormed Peshawar's Army Public School last December. Muhammad, who arrived in London last month for surgery, is being discharged from hospital later. Exactly nine months ago, on an ordinary Tuesday morning, Muhammad sat in his first aid class listening to his teachers intently. At the same time seven gunmen disguised in security uniforms were entering the Army Public School. They were strapped with explosives and had one simple mission in mind: Kill every man, woman and child they came across. \"I can't forget what happened that day,\" Muhammad says with a severe stare.", "ro": "Cel puțin 140 de persoane, majoritatea copii, au fost ucise când bărbați înarmați au atacat școala publică a armatei din Peshawar în luna decembrie a anului trecut. Mohamed, care a sosit la Londra luna trecută pentru operație, va fi externat mai târziu din spital. Exact cu nouă luni în urmă, într-o dimineață obișnuită de marți, Mohamed stătea la ora de primul ajutor și își asculta atent profesorii. Chiar atunci, șapte bărbați înarmați deghizați în uniformele agenților de pază intrau în școala publică a armatei. Purtau centuri cu explozivi și aveau de îndeplinit o misiune simplă: să îi ucidă pe toți bărbații, femeile și copiii care le ieșeau în cale. „Nu pot uita ce s-a întâmplat în acea zi”, spune Mohamed cu o privire aspră." } }
|
||||
{ "translation": { "en": "We were sitting in the auditorium, we were asking questions... and then we heard heavy gunfire outside. The terrorists moved inside and they started killing - our teacher was burned alive. Muhammad described pulling four other pupils out of the auditorium as the carnage unfolded. He said he then heard his friend, Hamza calling to him. He said, 'oh brother save me'. I held his hand. That's when I was shot in the back, and he was shot in the head. Most of the people killed in the attack were pupils Hamza died in Muhammad's arms. Muhammad recalled blacking out after that, and the next thing he knew he was in a hospital bed, paralysed from the waist down.", "ro": "Stăteam în amfiteatru, puneam întrebări... apoi am auzit focuri de armă afară. Teroriștii au intrat înăuntru și au început să ucidă. Profesorul nostru a fost ars de viu. Mohamed descrie cum a scos patru elevi din amfiteatru în timp ce se desfășura carnagiul. Apoi spune că și-a auzit prietenul, pe Hamza, strigându-l. Spunea „oh, frate, salvează-mă”. L-am ținut de mână. Atunci eu am fost împușcat în spate, iar el în cap. Cei mai mulți dintre cei uciși în atac erau elevi Hamza a murit în brațele lui Mohamed. Mohamed își amintește că imediat după asta a leșinat și că următorul lucru pe care l-a știut a fost că se afla pe un pat de spital, paralizat de la brâu în jos." } }
|
||||
{ "translation": { "en": "Doctors in Peshawar in northern Pakistan, and then Rawalpindi, close to the capital, told his family there was no treatment, and he would never walk again. \"Seeing him I felt like my soul had left my body,\" says Muhammad's father, Sher Khan Those nine months were the hardest in my life. But Mr Khan and his wife, Sherbano, refused to believe that their cricket-mad son would never be able to use his legs again. They campaigned, and appealed for help on Pakistani TV, gaining the support of high profile people such as cricketer turned politician Imran Khan.", "ro": "Doctorii din Peshawar din nordul Pakistanului, apoi cei din Rawalpindi, aproape de capitală, i-au spus familiei sale că nu exista tratament și că nu va mai putea merge niciodată. „Când l-am văzut, am simțit cum îmi iese sufletul”, spune Sher Khan, tatăl lui Mohamed. Acele nouă luni au fost cele mai grele din viața mea. Însă Khan și soția lui, Sherbano, au refuzat să creadă că fiul lor atât de pasionat de crichet nu-și va mai putea folosi vreodată picioarele. Au făcut o campanie și au cerut ajutor de la televiziunea pakistaneză, atrăgând sprijinul unor oameni faimoși precum Imran Khan, jucător de crichet devenit politician." } }
|
||||
{ "translation": { "en": "Finally, they were able to raise the funds to bring Muhammad to the UK and provide him with treatment at London's private Harley Street Clinic. Consultant neurosurgeon Irfan Malik described Muhammad as \"terrified\" when he first arrived at the hospital. \"He'd spent the last [few] months lying on a bed, unable to move side to side,\" says Mr Malik. He was weak, he had a pressure sore on his back. He wasn't in great shape. A vertebra at the base of Muhammad's spine was destroyed Muhammad was shot in his shoulder, his hip, and his back during the attack, damaging his lower spine - leading to paralysis.", "ro": "Într-un final, au reușit să strângă fonduri pentru a-l duce pe Mohamed în Regatul Unit și a-i oferi tratament la clinica privată Harley Street din Londra. Neurochirurgul consultant Irfan Malik l-a descris pe Mohamed drept „înspăimântat” când acesta a ajuns la spital. „Își petrecuse ultimele [câteva] luni zăcând în pat, fără să se poată mișca de pe o parte pe alta, spune Malik. Era slăbit, se pusese multă presiune pe spatele lui. Nu era într-o formă prea bună. O vertebră de la baza coloanei vertebrale a lui Mohamed fusese distrusă Mohamed fusese împușcat în umăr, în șold și în spate în timpul atacului, iar coloana vertebrală inferioară îi fusese distrusă, ducând la paralizie." } }
|
||||
{ "translation": { "en": "But during six hours of surgery, Mr Malik and his team were able to reattach nerve endings and reconstruct the damaged part of the spine. Even Mr Malik was surprised at what happened next. Exactly one week after the surgery Muhammad stood up and started taking steps and walking. We were not expecting to get that sort of excellent result. That was miraculous,\" he says. Less than two weeks after his operation, Muhammad is ready to leave hospital and start the long road to recovery. Muhammad has defied the odds and started to walk again He says he wants to build his strength and continue his education in the UK. But he says he is determined to return to Pakistan, join the army and help fight terrorism.", "ro": "Însă, în timpul unei operații care a durat șase ore, Malik și echipa lui au reușit să lege din nou terminațiile nervoase și să reconstruiască partea distrusă a coloanei. Chiar și Malik a fost surprins de ceea ce s-a întâmplat în continuare. Exact la o săptămână după operație, Mohamed s-a ridicat și a început să facă pași și să meargă. Nu ne așteptam la un rezultat atât de bun. A fost un miracol”, spune acesta. În mai puțin de două săptămâni de la operație, Mohamed este gata să părăsească spitalul și să înceapă procesul lung de recuperare. Mohamed a sfidat soarta și a început să meargă din nou Vrea să devină puternic și să își continue studiile în Regatul Unit. Însă este hotărât să revină în Pakistan, să se înroleze în armată și să lupte împotriva terorismului." } }
|
||||
{ "translation": { "en": "\"I feel like I have a second chance at life,\" he says as he shows off pictures he's drawn of guns scribbled out next to school books and pens Muhammad grows physically stronger every day but the psychological trauma he continues to endure is unimaginable. \"My anger is not diminishing\" he says. In my school little kids were killed. What was their crime? His mother, wiping a tear from her eye, caressed his head and said: \"I can see my son walking again.\" He'll be able to get on with his normal life. 'Super Voice' 4G service from Three offers better signal Three is making use of a lower frequency 4G spectrum that can travel more widely", "ro": "„Simt că am încă o șansă la viață” spune el, arătând imaginile cu arme desenate de el lângă manuale școlare și stilouri Fizic, Mohamed devine tot mai puternic în fiecare zi, însă trauma psihologică prin care trece și acum este de neimaginat. „Furia mea nu a scăzut”, mărturisește el. În școala mea au fost uciși copii mici. Ce crimă au comis ei? Mama lui își șterge o lacrimă, îl mângâie pe creștet și spune: „Îmi văd fiul mergând din nou”. Va putea să-și continue firesc viața. Serviciul 4G „Super Voice” de la Three oferă semnal mai bun Three folosește un spectru 4G cu o frecvență mai joasă, care poate acoperi o zonă mai extinsă" } }
|
||||
{ "translation": { "en": "Mobile phone provider Three has launched a UK service it says will improve reception inside buildings and in rural black spots. Its 4G Super Voice enables customers to make calls and send texts using a lower frequency spectrum. Other networks are looking into introducing the technology, known as Voice Over Long-Term Evolution (VoLTE). It currently works on only the Samsung Galaxy S5, but recent iPhone handsets will be added in the coming months. Three said up to 5.5 million customers would have access to the service by 2017.", "ro": "Furnizorul de telefonie mobilă Three a lansat în Regatul Unit un serviciu despre care spune că va îmbunătăți recepția în interiorul clădirilor și în zonele rurale fără semnal. Serviciul 4G Super Voice le permite clienților să efectueze apeluri și să trimită mesaje text folosind un spectru cu o frecvență mai joasă. Și alte rețele intenționează să introducă aceeași tehnologie, cunoscută ca „Voice Over Long-Term Evolution (VoLTE)”. Aceasta funcționează momentan doar cu Samsung Galaxy S5, însă telefoanele iPhone recente vor beneficia de ea în lunile următoare. Three menționează că până la 5,5 milioane de clienți vor avea acces la serviciu până în 2017." } }
|
||||
{ "translation": { "en": "Chief technology officer Bryn Jones said: \"By the end of the year, one million of our customers will have access to better indoor coverage and be able to use their phones in more places than ever before.\" Stars prepare for panto season Pantomime season is big business for theatres up and down the UK, with many getting ready for this year's season now. Some of the biggest names in showbusiness now take part in the yuletide theatre. Matthew Kelly and Hayley Mills will be appearing in Cinderella - one as an ugly sister, the other as fairy godmother. They reveal their panto secrets to BBC Breakfast. Steven Wilson: 'If I don't do anything, I feel this creeping guilt'", "ro": "Responsabilul șef pentru tehnologie, Bryn Jones a declarat: „Până la sfârșitul anului, un milion dintre clienții noștri vor avea acces la o acoperire mai bună în interior și își vor putea folosi telefoanele în mai multe locuri ca până acum”. Vedetele se pregătesc pentru stagiunea de pantomimă Stagiunea de pantomimă este foarte importantă pentru teatrele din tot Regatul Unit, multe dintre ele pregătindu-se acum pentru stagiunea din acest an. Acum, la teatrul de Crăciun participă unele dintre numele cele mai mari din showbusiness. Matthew Kelly și Hayley Mills vor apărea în Cenușăreasa - primul în rolul uneia dintre surorile rele, iar a doua în rolul zânei. Aceștia dezvăluie secretele pantomimei lor la BBC Breakfast. Steven Wilson: „Dacă nu fac nimic, mă simt vinovat”" } }
|
||||
{ "translation": { "en": "Steven Wilson was recently the big winner at the Progressive Music Awards Steven Wilson is often dubbed the hardest working musician in the world of progressive rock. The multi-talented musician won three prizes at this month's Progressive Music Awards in London, including album of the year for Hand. The Guardian's five-star review called it \"a smart, soulful and immersive work of art.\" Since the 1980s, Wilson has been the driving force in a number of musical projects, the best known of which is the rock band Porcupine Tree. Now, ahead of two sell-out shows at the Royal Albert Hall, Wilson is releasing a vinyl-only double LP, Transience, to showcase the \"more accessible\" side of his solo output.", "ro": "Steven Wilson a fost desemnat recent drept marele câștigător al Progressive Music Awards Steven Wilson a fost numit de multe ori drept cel mai muncitor muzician din lumea rockului progresiv. Talentatul muzician a câștigat trei premii la Progressive Music Awards, care a avut loc luna aceasta la Londra, printre care și premiul pentru cel mai bun album al anului pentru Hand. În recenzia sa de cinci stele, The Guardian a numit albumul „o operă de artă inteligentă, expresivă și captivantă”. Încă din anii 1980, Wilson este motorul mai multor proiecte muzicale, cel mai cunoscut dintre acestea fiind trupa de rock Porcupine Tree. Acum, înainte de două spectacole cu casa închisă la Royal Albert Hall, Wilson lansează un dublu LP doar în format vinil, Transience, pentru a arăta latura „mai accesibilă” a activității sale solo." } }
|
||||
{ "translation": { "en": "He tells the BBC about his love of vinyl, his busy schedule and explains how comic actor Matt Berry came to be his support act. What does vinyl mean to you? I grew up at the very tail end of the vinyl era, and at the time, I remember, we couldn't wait for CD to come along because vinyl was so frustrating. You would buy the record, take it home, and it would have a scratch, and you would have to take it back again. I love CDs, and for some kinds of music - classical for example - it is better than vinyl. But the problem with the CD and digital downloads is that there's nothing you can really cherish or treasure. Owning vinyl is like having a beautiful painting hanging in your living room.", "ro": "A povestit pentru BBC despre dragostea lui pentru viniluri și despre programul său încărcat și a explicat cum a ajuns actorul de comedie Matt Berry să îi deschidă spectacolele. Ce înseamnă vinil pentru tine? Am crescut chiar în perioada de sfârșit a erei vinilurilor și îmi amintesc că atunci abia așteptam apariția CD-ului, căci vinilul era atât de enervant. Cumpărai un disc, mergeai cu el acasă, avea o zgârietură și trebuia să îl aduci înapoi. Iubesc CD-urile, iar pentru anumite tipuri de muzică, de exemplu cea clasică, sunt mai bune decât vinilurile. Însă problema cu CD-urile și cu descărcările digitale este aceea că nu mai există nimic pe care să îl prețuiești cu adevărat. Să ai un vinil e ca și cum ai avea un tablou frumos agățat în sufragerie." } }
|
||||
{ "translation": { "en": "It's something you can hold, pore over the lyrics and immerse yourself in the art work. I thought it was just a nostalgic thing, but it can't be if kids too young to remember vinyl are enjoying that kind of experience. Do you have a piece of vinyl that you treasure? The truth is I got rid of 100% of my vinyl in the 90s. All the vinyl I have is re-bought. I started off from the perspective that I wanted to recreate the collection I had when I was 15, but it's gone beyond that. The first record which I persuaded my parents to buy for me was Electric Light Orchestra's Out of the Blue.", "ro": "E ceva ce poți ține în mână, în timp ce te lași absorbit de versuri și copleșit de actul artistic. Am crezut că e doar o chestie nostalgică, însă nu are cum să fie așa dacă unor puști prea tineri să-și amintească de viniluri le place acest gen de experiență. Ai vreun vinil la care ții în mod special? Recunosc că am scăpat de toate vinilurile în anii '90. Toate vinilurile pe care le am sunt cumpărate din nou. Am pornit de la ideea de a reface colecția pe care o aveam la 15 ani, însă am trecut de limita aceea. Primul disc pe care mi-am convins părinții să mi-l cumpere a fost Out of the Blue de la Electric Light Orchestra." } }
|
||||
{ "translation": { "en": "If I still had my original copy, it would have sentimental value, but, alas, it's in a charity shop somewhere. Steven Wilson hopes the album will be a doorway for potential new fans Why release your new compilation Transience on vinyl? It was originally conceived as an idea for Record Store Day, but we missed the boat on that. My record company had suggested I put together some of my shorter, more accessible songs. I got a bit obsessed by the idea to make something like \"an introduction to Steven Wilson,\" and I was committed to it being a vinyl-only release. Anyone who buys the vinyl does also get a high-resolution download.", "ro": "Dacă aș mai fi avut încă exemplarul inițial, acesta ar fi avut valoare sentimentală, însă, din păcate, se află pe undeva printr-un magazin de caritate. Steven Wilson speră că albumul va fi o poartă către posibili fani noi De ce ți-ai lansat noua compilație Transience pe vinil? Aceasta a fost concepută inițial ca idee pentru Ziua magazinelor de discuri, însă am ratat ocazia. Casa mea de discuri sugerase să adun câteva dintre melodiile mele mai scurte și mai accesibile. Am ajuns să fiu ușor obsedat de ideea de a face ceva gen „introducere în muzica lui Steven Wilson” și am ținut neapărat ca proiectul să fie lansat doar pe vinil. Cine cumpără vinilul primește, de asemenea, și o variantă descărcată la rezoluție înaltă." } }
|
||||
{ "translation": { "en": "Do you have a concern that the album won't show your work in a true light?", "ro": "Ești îngrijorat că albumul nu va arăta muzica ta în adevărata ei lumină?" } }
|
||||