Release: v3.3.1

Fix Trainer tests in a multiGPU env (#7458)
Catch import datasets common errors (#7456)
2020-09-29 14:17:34 -04:00 · 2020-09-29 14:06:41 -04:00 · 2020-09-29 13:42:09 -04:00 · 2020-09-29 13:38:47 -04:00 · 2020-09-29 12:26:26 -04:00 · 2020-09-29 10:41:18 -04:00
22 changed files with 661 additions and 114 deletions
--- a/.circleci/deploy.sh
+++ b/.circleci/deploy.sh
@@ -49,4 +49,5 @@ deploy_doc "10d7239" v2.10.0
 deploy_doc "b42586e" v2.11.0
 deploy_doc "7fb8bdf" v3.0.2
 deploy_doc "4b3ee9c" v3.1.0
-deploy_doc "3ebb1b3" # v3.2.0 Latest stable release
+deploy_doc "3ebb1b3" v3.2.0
+deploy_doc "0613f05" # v3.3.0 Latest stable release
--- a/CODE_OF_CONDUCT.md
+++ b/CODE_OF_CONDUCT.md
@@ -0,0 +1,129 @@
+
+# Contributor Covenant Code of Conduct
+
+## Our Pledge
+
+We as members, contributors, and leaders pledge to make participation in our
+community a harassment-free experience for everyone, regardless of age, body
+size, visible or invisible disability, ethnicity, sex characteristics, gender
+identity and expression, level of experience, education, socio-economic status,
+nationality, personal appearance, race, religion, or sexual identity
+and orientation.
+
+We pledge to act and interact in ways that contribute to an open, welcoming,
+diverse, inclusive, and healthy community.
+
+## Our Standards
+
+Examples of behavior that contributes to a positive environment for our
+community include:
+
+* Demonstrating empathy and kindness toward other people
+* Being respectful of differing opinions, viewpoints, and experiences
+* Giving and gracefully accepting constructive feedback
+* Accepting responsibility and apologizing to those affected by our mistakes,
+  and learning from the experience
+* Focusing on what is best not just for us as individuals, but for the
+  overall community
+
+Examples of unacceptable behavior include:
+
+* The use of sexualized language or imagery, and sexual attention or
+  advances of any kind
+* Trolling, insulting or derogatory comments, and personal or political attacks
+* Public or private harassment
+* Publishing others' private information, such as a physical or email
+  address, without their explicit permission
+* Other conduct which could reasonably be considered inappropriate in a
+  professional setting
+
+## Enforcement Responsibilities
+
+Community leaders are responsible for clarifying and enforcing our standards of
+acceptable behavior and will take appropriate and fair corrective action in
+response to any behavior that they deem inappropriate, threatening, offensive,
+or harmful.
+
+Community leaders have the right and responsibility to remove, edit, or reject
+comments, commits, code, wiki edits, issues, and other contributions that are
+not aligned to this Code of Conduct, and will communicate reasons for moderation
+decisions when appropriate.
+
+## Scope
+
+This Code of Conduct applies within all community spaces, and also applies when
+an individual is officially representing the community in public spaces.
+Examples of representing our community include using an official e-mail address,
+posting via an official social media account, or acting as an appointed
+representative at an online or offline event.
+
+## Enforcement
+
+Instances of abusive, harassing, or otherwise unacceptable behavior may be
+reported to the community leaders responsible for enforcement at
+feedback@huggingface.co.
+All complaints will be reviewed and investigated promptly and fairly.
+
+All community leaders are obligated to respect the privacy and security of the
+reporter of any incident.
+
+## Enforcement Guidelines
+
+Community leaders will follow these Community Impact Guidelines in determining
+the consequences for any action they deem in violation of this Code of Conduct:
+
+### 1. Correction
+
+**Community Impact**: Use of inappropriate language or other behavior deemed
+unprofessional or unwelcome in the community.
+
+**Consequence**: A private, written warning from community leaders, providing
+clarity around the nature of the violation and an explanation of why the
+behavior was inappropriate. A public apology may be requested.
+
+### 2. Warning
+
+**Community Impact**: A violation through a single incident or series
+of actions.
+
+**Consequence**: A warning with consequences for continued behavior. No
+interaction with the people involved, including unsolicited interaction with
+those enforcing the Code of Conduct, for a specified period of time. This
+includes avoiding interactions in community spaces as well as external channels
+like social media. Violating these terms may lead to a temporary or
+permanent ban.
+
+### 3. Temporary Ban
+
+**Community Impact**: A serious violation of community standards, including
+sustained inappropriate behavior.
+
+**Consequence**: A temporary ban from any sort of interaction or public
+communication with the community for a specified period of time. No public or
+private interaction with the people involved, including unsolicited interaction
+with those enforcing the Code of Conduct, is allowed during this period.
+Violating these terms may lead to a permanent ban.
+
+### 4. Permanent Ban
+
+**Community Impact**: Demonstrating a pattern of violation of community
+standards, including sustained inappropriate behavior, harassment of an
+individual, or aggression toward or disparagement of classes of individuals.
+
+**Consequence**: A permanent ban from any sort of public interaction within
+the community.
+
+## Attribution
+
+This Code of Conduct is adapted from the [Contributor Covenant][homepage],
+version 2.0, available at
+https://www.contributor-covenant.org/version/2/0/code_of_conduct.html.
+
+Community Impact Guidelines were inspired by [Mozilla's code of conduct
+enforcement ladder](https://github.com/mozilla/diversity).
+
+[homepage]: https://www.contributor-covenant.org
+
+For answers to common questions about this code of conduct, see the FAQ at
+https://www.contributor-covenant.org/faq. Translations are available at
+https://www.contributor-covenant.org/translations.
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -9,6 +9,9 @@ It also helps us if you spread the word: reference the library from blog posts
 on the awesome projects it made possible, shout out on Twitter every time it has
 helped you, or simply star the repo to say "thank you".

+Whichever way you choose to contribute, please be mindful to respect our
+[code of conduct](https://github.com/huggingface/transformers/blob/master/CODE_OF_CONDUCT.md).
+
 ## You can contribute in so many ways!

 There are 4 ways you can contribute to transformers:
@@ -176,13 +179,14 @@ Follow these steps to start contributing:
   ```bash
   $ make quality
   ```
-
   You can do the automatic style corrections and code verifications that can't be automated in one go:

   ```bash
   $ make fixup
   ```

+   This target is also optimized to only work with files modified by the PR you're working on.
+
   If you're modifying documents under `docs/source`, make sure to validate that
   they can still be built. This check also runs in CI. To run a local check
   make sure you have installed the documentation builder requirements, by
--- a/Makefile
+++ b/Makefile
@@ -1,24 +1,46 @@
-.PHONY: quality_checks quality style fixup test test-examples docs
+.PHONY: modified_only_fixup extra_quality_checks quality style fixup fix-copies test test-examples docs
+
+
+check_dirs := examples templates tests src utils
+
+# get modified files since the branch was made
+fork_point_sha := $(shell git merge-base --fork-point master)
+joined_dirs    := $(shell echo $(check_dirs) | tr " " "|")
+modified_files := $(shell git diff --name-only $(fork_point_sha) | egrep '^($(joined_dirs))')
+#$(info modified files are: $(modified_files))
+
+modified_only_fixup:
+	@if [ -n "$(modified_files)" ]; then \
+		echo "Checking/fixing $(modified_files)"; \
+		black $(modified_files); \
+		isort $(modified_files); \
+		flake8 $(modified_files); \
+	else \
+		echo "No relevant files were modified"; \
+	fi

 # Check that source code meets quality standards

-quality_checks:
-	flake8 examples templates tests src utils
+extra_quality_checks:
 	python utils/check_copies.py
 	python utils/check_repo.py

+# this target runs checks on all files
 quality:
-	black --check examples templates tests src utils
-	isort --check-only examples templates tests src utils
-	${MAKE} quality_checks
+	black --check $(check_dirs)
+	isort --check-only $(check_dirs)
+	flake8 $(check_dirs)
+	${MAKE} extra_quality_checks

 # Format source code automatically and check if there are any problems left that need manual fixing

 style:
-	black examples templates tests src utils
-	isort examples templates tests src utils
+	black $(check_dirs)
+	isort $(check_dirs)

-fixup: style quality_checks
+# Super fast fix and check target that only works on relevant modified files since the branch was made
+
+fixup: modified_only_fixup extra_quality_checks

 # Make marked copies of snippets of codes conform to the original

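The `modified_files` variable in the Makefile above narrows `git diff --name-only` output to paths under `check_dirs` using an anchored `egrep` alternation. Below is a minimal Python sketch of the same filter. It assumes the list of changed paths has already been obtained from git. The function name `filter_modified` is hypothetical and is not part of the repository.

```python
import re
from typing import Iterable, List

# Same directories as `check_dirs` in the Makefile.
CHECK_DIRS = ["examples", "templates", "tests", "src", "utils"]


def filter_modified(paths: Iterable[str], check_dirs: List[str] = CHECK_DIRS) -> List[str]:
    """Keep only paths that start with one of `check_dirs`.

    Mirrors `egrep '^(examples|templates|tests|src|utils)'`: the pattern is
    anchored at the start of the path but is otherwise a plain prefix match.
    """
    pattern = re.compile("^(" + "|".join(re.escape(d) for d in check_dirs) + ")")
    return [path for path in paths if pattern.match(path)]


if __name__ == "__main__":
    changed = ["src/transformers/trainer.py", "docs/source/conf.py", "tests/test_trainer.py", "setup.py"]
    print(filter_modified(changed))
    # ['src/transformers/trainer.py', 'tests/test_trainer.py']
```

This is a plain prefix match, just like the `egrep` pattern it mirrors. As a result, a top-level directory such as `src_old/` would also be selected. Note also that `git diff --name-only` reports deleted paths. A real script may therefore want to drop paths that no longer exist before invoking the tools.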
--- a/README.md
+++ b/README.md
@@ -16,6 +16,9 @@
    <a href="https://github.com/huggingface/transformers/releases">
        <img alt="GitHub release" src="https://img.shields.io/github/release/huggingface/transformers.svg">
    </a>
+    <a href="https://github.com/huggingface/transformers/blob/master/CODE_OF_CONDUCT.md">
+        <img alt="Contributor Covenant" src="https://img.shields.io/badge/Contributor%20Covenant-v2.0%20adopted-ff69b4.svg">
+    </a>
 </p>

 <h3 align="center">
--- a/docs/source/_static/js/custom.js
+++ b/docs/source/_static/js/custom.js
@@ -1,10 +1,11 @@
 // These two things need to be updated at each release for the version selector.
 // Last stable version
-const stableVersion = "v3.2.0"
+const stableVersion = "v3.3.0"
 // Dictionary doc folder to label
 const versionMapping = {
    "master": "master",
-    "": "v3.2.0",
+    "": "v3.3.0",
+    "v3.2.0": "v3.2.0",
    "v3.1.0": "v3.1.0 (stable)",
    "v3.0.2": "v3.0.0/v3.0.1/v3.0.2",
    "v2.11.0": "v2.11.0",
--- a/docs/source/conf.py
+++ b/docs/source/conf.py
@@ -26,7 +26,7 @@ author = u'huggingface'
 # The short X.Y version
 version = u''
 # The full version, including alpha/beta/rc tags
-release = u'3.3.0'
+release = u'3.3.1'


 # -- General configuration ---------------------------------------------------
--- a/examples/seq2seq/rouge_cli.py
+++ b/examples/seq2seq/rouge_cli.py
@@ -9,7 +9,7 @@ def calculate_rouge_path(pred_path, tgt_path, save_path=None, **kwargs):
    tgt_lns = [x.strip() for x in open(tgt_path).readlines()][: len(pred_lns)]
    metrics = calculate_rouge(pred_lns, tgt_lns, **kwargs)
    if save_path is not None:
-        save_json(metrics, save_path)
+        save_json(metrics, save_path, indent=None)
    return metrics  # these print nicely


--- a/examples/seq2seq/run_eval.py
+++ b/examples/seq2seq/run_eval.py
@@ -152,8 +152,7 @@ def run_generate(verbose=True):
        print(scores)

    if args.score_path is not None:
-        path = args.score_path
-        json.dump(scores, open(path, "w"))
+        json.dump(scores, open(args.score_path, "w"))

    return scores

--- a/model_cards/TypicaAI/magbert-ner/README.md
+++ b/model_cards/TypicaAI/magbert-ner/README.md
@@ -0,0 +1,55 @@
+---
+language: fr
+widget:
+- text: "Je m'appelle Hicham et je vis à Fès"
+---
+
+# MagBERT-NER: a state-of-the-art NER model for Moroccan French language (Maghreb)
+
+## Introduction
+
+MagBERT-NER is a state-of-the-art NER model for the Moroccan French language (Maghreb). It was fine-tuned for the NER task on top of CamemBERT, a French language model based on the RoBERTa architecture.
+
+For further information or requests, please visit the [Typica.AI website](https://typicasoft.io/).
+
+## How to use MagBERT-NER with HuggingFace
+
+##### Load MagBERT-NER and its sub-word tokenizer:
+```python
+from transformers import AutoTokenizer, AutoModelForTokenClassification
+
+tokenizer = AutoTokenizer.from_pretrained("TypicaAI/magbert-ner")
+model = AutoModelForTokenClassification.from_pretrained("TypicaAI/magbert-ner")
+
+
+# Process a text sample (from Wikipedia, about the current Prime Minister of Morocco) using the NER pipeline
+
+from transformers import pipeline
+
+nlp = pipeline('ner', model=model, tokenizer=tokenizer, grouped_entities=True)
+nlp("Saad Dine El Otmani, né le 16 janvier 1956 à Inezgane, est un homme d'État marocain, chef du gouvernement du Maroc depuis le 5 avril 2017")
+
+
+#[{'entity_group': 'I-PERSON',
+#  'score': 0.8941445276141167,
+#  'word': 'Saad Dine El Otmani'},
+# {'entity_group': 'B-DATE',
+#  'score': 0.5967703461647034,
+#  'word': '16 janvier 1956'},
+# {'entity_group': 'B-GPE', 'score': 0.7160899192094803, 'word': 'Inezgane'},
+# {'entity_group': 'B-NORP', 'score': 0.7971733212471008, 'word': 'marocain'},
+# {'entity_group': 'B-GPE', 'score': 0.8921478390693665, 'word': 'Maroc'},
+# {'entity_group': 'B-DATE',
+#  'score': 0.5760444005330404,
+#  'word': '5 avril 2017'}]
+
+```
+
+
+
+
+## Authors
+
+MagBERT-NER was trained and evaluated by Hicham Assoudi, Ph.D.
+
+
--- a/model_cards/mrm8488/electricidad-base-discriminator/README.md
+++ b/model_cards/mrm8488/electricidad-base-discriminator/README.md
@@ -59,9 +59,21 @@ predictions = torch.round((torch.sign(discriminator_outputs[0]) + 1) / 2)
 el rapido  zorro  marro    ##n   amar  sobre     el  perro   pere ##zoso    0.0    0.0    0.0    0.0    0.0    0.0    1.0    1.0    0.0    0.0    0.0    0.0    0.0[None, None, None, None, None, None, None, None, None, None, None, None, None
 '''
 ```
-
 As you can see there are **1s** in the places where the model detected a fake token. So, it works! 🎉

+
+### Some models fine-tuned on a downstream task 🛠️
+
+[Question Answering](https://huggingface.co/mrm8488/electricidad-base-finetuned-squadv1-es)
+
+[POS](https://huggingface.co/mrm8488/electricidad-base-finetuned-pos)
+
+[NER](https://huggingface.co/mrm8488/electricidad-base-finetuned-ner)
+
+[Paraphrase Identification](https://huggingface.co/mrm8488/RuPERTa-base-finetuned-pawsx-es)
+
+
+
 ## Acknowledgments

 I thank [🤗/transformers team](https://github.com/huggingface/transformers) for allowing me to train the model (especially [Julien Chaumond](https://twitter.com/julien_c)).
--- a/model_cards/unideeplearning/polibert_sa/README.md
+++ b/model_cards/unideeplearning/polibert_sa/README.md
@@ -12,15 +12,22 @@ widget:
  
 ## Model description  
  
-This model performs sentiment analysis on Italian political twitter sentences. It was trained starting from an instance of "bert-base-italian-uncased-xxl" and fine-tuned on an Italian dataset of tweets.
+This model performs sentiment analysis on Italian political Twitter sentences. It was trained starting from an instance of "bert-base-italian-uncased-xxl" and fine-tuned on an Italian dataset of tweets. You can try it out at https://www.unideeplearning.com/twitter_sa/ (in Italian!)
  
 #### Hands-on  
  
 ```python
 import torch
 from torch import nn 
+from transformers import AutoTokenizer, AutoModelForSequenceClassification

-text = "Giueseppe Rossi è un pessimo politico"
+tokenizer = AutoTokenizer.from_pretrained("unideeplearning/polibert_sa")
+model = AutoModelForSequenceClassification.from_pretrained("unideeplearning/polibert_sa")
+			
+
+
+
+text = "Giuseppe Rossi è un pessimo politico"
 input_ids = tokenizer.encode(text, add_special_tokens=True, return_tensors= 'pt')

 logits, = model(input_ids)
@@ -41,4 +48,6 @@ print(prob.argmax().tolist())
 ## Acknowledgments

 Thanks to the support from: 
-the [Hugging Face](https://huggingface.co/), Unione Professionisti (https://www.unioneprofessionisti.com/)
+[Hugging Face](https://huggingface.co/), [Unione Professionisti](https://www.unioneprofessionisti.com)
+
+https://www.unideeplearning.com/
--- a/setup.py
+++ b/setup.py
@@ -5,7 +5,7 @@ To create the package for pypi.

 1. Change the version in __init__.py, setup.py as well as docs/source/conf.py.

-2. Unpin specific versions from setup.py (like isort).
+2. Unpin specific versions from setup.py that use a git install.

 2. Commit these changes with the message: "Release: VERSION"

@@ -98,7 +98,7 @@ extras["dev"] = extras["testing"] + extras["quality"] + extras["ja"] + ["scikit-

 setup(
    name="transformers",
-    version="3.3.0",
+    version="3.3.1",
    author="Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Sam Shleifer, Patrick von Platen, Sylvain Gugger, Google AI Language Team Authors, Open AI team Authors, Facebook AI Authors, Carnegie Mellon University Authors",
    author_email="thomas@huggingface.co",
    description="State-of-the-art Natural Language Processing for TensorFlow 2.0 and PyTorch",
--- a/src/transformers/__init__.py
+++ b/src/transformers/__init__.py
@@ -2,7 +2,7 @@
 # There's no way to ignore "F401 '...' imported but unused" warnings in this
 # module, but to preserve other warnings. So, don't check this module at all.

-__version__ = "3.3.0"
+__version__ = "3.3.1"

 # Work around to update TensorFlow's absl.logging threshold which alters the
 # default Python logging output behavior when present.
--- a/src/transformers/configuration_gpt2.py
+++ b/src/transformers/configuration_gpt2.py
@@ -103,6 +103,8 @@ class GPT2Config(PretrainedConfig):
            :class:`~transformers.GPT2DoubleHeadsModel` and :class:`~transformers.TFGPT2DoubleHeadsModel`.

            The dropout ratio to be used after the projection and activation.
+        gradient_checkpointing (:obj:`bool`, `optional`, defaults to :obj:`False`):
+            Whether or not to use gradient checkpointing to save memory at the expense of a slower backward pass.

    Example::

@@ -142,6 +144,7 @@ class GPT2Config(PretrainedConfig):
        summary_first_dropout=0.1,
        bos_token_id=50256,
        eos_token_id=50256,
+        gradient_checkpointing=False,
        **kwargs
    ):
        super().__init__(bos_token_id=bos_token_id, eos_token_id=eos_token_id, **kwargs)
@@ -164,6 +167,7 @@ class GPT2Config(PretrainedConfig):
        self.summary_activation = summary_activation
        self.summary_first_dropout = summary_first_dropout
        self.summary_proj_to_labels = summary_proj_to_labels
+        self.gradient_checkpointing = gradient_checkpointing

        self.bos_token_id = bos_token_id
        self.eos_token_id = eos_token_id
--- a/src/transformers/file_utils.py
+++ b/src/transformers/file_utils.py
@@ -68,8 +68,12 @@ except (ImportError, AssertionError):
 try:
    import datasets  # noqa: F401

-    _datasets_available = True
-    logger.debug(f"Succesfully imported datasets version {datasets.__version__}")
+    # Check we're not importing a "datasets" directory somewhere
+    _datasets_available = hasattr(datasets, "__version__") and hasattr(datasets, "load_dataset")
+    if _datasets_available:
+        logger.debug(f"Successfully imported datasets version {datasets.__version__}")
+    else:
+        logger.debug("Imported a datasets object but this doesn't seem to be the 🤗 datasets library.")

 except ImportError:
    _datasets_available = False
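The new check in `file_utils.py` guards against importing a directory that happens to be named `datasets` instead of the 🤗 datasets library. Python 3 will import a directory without an `__init__.py` as a namespace package, and such a package exposes neither `__version__` nor `load_dataset`. The sketch below reproduces that situation with a throwaway directory. The helper names and the module name `stray_datasets_demo` are hypothetical. The name is deliberately chosen so the demo cannot collide with a real `datasets` install.

```python
import importlib
import os
import sys
import tempfile
import types


def looks_like_hf_datasets(module: types.ModuleType) -> bool:
    # Same heuristic as the diff: the real library exposes both attributes.
    return hasattr(module, "__version__") and hasattr(module, "load_dataset")


def import_empty_directory(name: str) -> types.ModuleType:
    """Create an empty directory `name` and import it as a namespace package."""
    root = tempfile.mkdtemp()
    os.makedirs(os.path.join(root, name))
    sys.path.insert(0, root)
    try:
        importlib.invalidate_caches()
        return importlib.import_module(name)
    finally:
        sys.path.remove(root)


if __name__ == "__main__":
    stray = import_empty_directory("stray_datasets_demo")
    print(looks_like_hf_datasets(stray))  # False: an empty folder, not the library

    # A stand-in object with the two attributes the check looks for (values are made up).
    fake_library = types.ModuleType("datasets")
    fake_library.__version__ = "0.0.0"
    fake_library.load_dataset = lambda *args, **kwargs: None
    print(looks_like_hf_datasets(fake_library))  # True
```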
--- a/src/transformers/modeling_gpt2.py
+++ b/src/transformers/modeling_gpt2.py
@@ -15,7 +15,6 @@
 # limitations under the License.
 """PyTorch OpenAI GPT-2 model."""

-
 import os
 import warnings
 from dataclasses import dataclass
@@ -624,16 +623,35 @@ class GPT2Model(GPT2PreTrainedModel):
            if output_hidden_states:
                all_hidden_states = all_hidden_states + (hidden_states.view(*output_shape),)

-            outputs = block(
-                hidden_states,
-                layer_past=layer_past,
-                attention_mask=attention_mask,
-                head_mask=head_mask[i],
-                encoder_hidden_states=encoder_hidden_states,
-                encoder_attention_mask=encoder_attention_mask,
-                use_cache=use_cache,
-                output_attentions=output_attentions,
-            )
+            if getattr(self.config, "gradient_checkpointing", False):
+
+                def create_custom_forward(module):
+                    def custom_forward(*inputs):
+                        # checkpointing only works with tuple returns, not with lists
+                        return tuple(output for output in module(*inputs, use_cache, output_attentions))
+
+                    return custom_forward
+
+                outputs = torch.utils.checkpoint.checkpoint(
+                    create_custom_forward(block),
+                    hidden_states,
+                    layer_past,
+                    attention_mask,
+                    head_mask[i],
+                    encoder_hidden_states,
+                    encoder_attention_mask,
+                )
+            else:
+                outputs = block(
+                    hidden_states,
+                    layer_past=layer_past,
+                    attention_mask=attention_mask,
+                    head_mask=head_mask[i],
+                    encoder_hidden_states=encoder_hidden_states,
+                    encoder_attention_mask=encoder_attention_mask,
+                    use_cache=use_cache,
+                    output_attentions=output_attentions,
+                )

            hidden_states, present = outputs[:2]
            if use_cache is True:
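In the checkpointed branch above, `torch.utils.checkpoint.checkpoint` drops a block's intermediate activations during the forward pass and re-runs the block during backward. The checkpoint API of that era only forwarded positional arguments to the wrapped function. For that reason, `create_custom_forward` binds the non-tensor flags through a closure and converts the block's outputs to a tuple.

The sketch below uses a hypothetical toy block rather than a real GPT-2 layer. It checks that this pattern gives the same loss and parameter gradients as a plain call. A float `scale` stands in for flags like `use_cache`. The sketch assumes PyTorch 1.11 or later and passes `use_reentrant=False` explicitly to avoid the warning newer releases emit. The diff itself relies on the default behaviour of its time.

```python
import torch
import torch.nn as nn
import torch.utils.checkpoint  # explicit import so `torch.utils.checkpoint` is bound


class ToyBlock(nn.Module):
    """Hypothetical stand-in for a transformer block with a non-tensor argument."""

    def __init__(self, dim: int):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, hidden_states, scale=1.0):
        # Returns a list, like a block that builds its outputs incrementally.
        return [torch.tanh(self.linear(hidden_states)) * scale]


def create_custom_forward(module, scale):
    # checkpoint() calls this with positional tensors only; the non-tensor
    # `scale` value is captured by the closure, and outputs become a tuple.
    def custom_forward(*inputs):
        return tuple(module(*inputs, scale))

    return custom_forward


def loss_and_grad(block, x, use_checkpoint):
    block.zero_grad()
    if use_checkpoint:
        outputs = torch.utils.checkpoint.checkpoint(
            create_custom_forward(block, 0.5), x, use_reentrant=False
        )
    else:
        outputs = block(x, scale=0.5)
    loss = outputs[0].pow(2).sum()
    loss.backward()
    return loss.detach(), block.linear.weight.grad.clone()


if __name__ == "__main__":
    torch.manual_seed(0)
    block = ToyBlock(8)
    x = torch.randn(4, 8)
    plain = loss_and_grad(block, x, use_checkpoint=False)
    ckpt = loss_and_grad(block, x, use_checkpoint=True)
    print(torch.allclose(plain[0], ckpt[0]), torch.allclose(plain[1], ckpt[1]))
```

Checkpointing trades compute for memory: the gradients should match, but each checkpointed block runs its forward pass twice.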
--- a/src/transformers/trainer.py
+++ b/src/transformers/trainer.py
@@ -20,7 +20,7 @@ from torch.utils.data.sampler import RandomSampler, Sampler, SequentialSampler
 from tqdm.auto import tqdm, trange

 from .data.data_collator import DataCollator, DataCollatorWithPadding, default_data_collator
-from .file_utils import is_datasets_available, is_torch_tpu_available
+from .file_utils import WEIGHTS_NAME, is_datasets_available, is_torch_tpu_available
 from .integrations import (
    default_hp_search_backend,
    is_comet_available,
@@ -42,6 +42,7 @@ from .trainer_utils import (
    EvaluationStrategy,
    HPSearchBackend,
    PredictionOutput,
+    TrainerState,
    TrainOutput,
    default_compute_objective,
    default_hp_space,
@@ -642,6 +643,7 @@ class Trainer:
            self.args.max_steps = t_total

        self.create_optimizer_and_scheduler(num_training_steps=t_total)
+        self.state = TrainerState()

        # Check if saved optimizer or scheduler states exist
        if (
@@ -657,6 +659,10 @@ class Trainer:
                self.lr_scheduler.load_state_dict(torch.load(os.path.join(model_path, "scheduler.pt")))
            reissue_pt_warnings(caught_warnings)

+        # Check if a saved Trainer state exists
+        if model_path is not None and os.path.isfile(os.path.join(model_path, "trainer_state.json")):
+            self.state = TrainerState.load_from_json(os.path.join(model_path, "trainer_state.json"))
+
        model = self.model
        if self.args.fp16 and _use_apex:
            if not is_apex_available():
@@ -673,8 +679,10 @@ class Trainer:
                model,
                device_ids=[self.args.local_rank],
                output_device=self.args.local_rank,
-                find_unused_parameters=True,
+                # find_unused_parameters breaks checkpointing as per
+                # https://github.com/huggingface/transformers/pull/4659#issuecomment-643356021
+                find_unused_parameters=(
+                    not getattr(model.config, "gradient_checkpointing", False)
+                    if isinstance(model, PreTrainedModel)
+                    else True
+                ),
             )

        if self.tb_writer is not None:
            self.tb_writer.add_text("args", self.args.to_json_string())
@@ -803,44 +811,15 @@ class Trainer:
                    ):
                        metrics = self.evaluate()
                        self._report_to_hp_search(trial, epoch, metrics)
+                        if self.args.load_best_model_at_end:
+                            self._save_training(model, trial, metrics=metrics)

-                    if self.args.save_steps > 0 and self.global_step % self.args.save_steps == 0:
-                        # In all cases (even distributed/parallel), self.model is always a reference
-                        # to the model we want to save.
-                        if hasattr(model, "module"):
-                            assert (
-                                model.module is self.model
-                            ), f"Module {model.module} should be a reference to self.model"
-                        else:
-                            assert model is self.model, f"Model {model} should be a reference to self.model"
-                        # Save model checkpoint
-                        checkpoint_folder = f"{PREFIX_CHECKPOINT_DIR}-{self.global_step}"
-                        if self.hp_search_backend is not None and trial is not None:
-                            run_id = (
-                                trial.number
-                                if self.hp_search_backend == HPSearchBackend.OPTUNA
-                                else tune.get_trial_id()
-                            )
-                            checkpoint_folder += f"-run-{run_id}"
-                        output_dir = os.path.join(self.args.output_dir, checkpoint_folder)
-
-                        self.store_flos()
-                        self.save_model(output_dir)
-
-                        if self.is_world_process_zero():
-                            self._rotate_checkpoints(use_mtime=True)
-
-                        if is_torch_tpu_available():
-                            xm.rendezvous("saving_optimizer_states")
-                            xm.save(self.optimizer.state_dict(), os.path.join(output_dir, "optimizer.pt"))
-                            with warnings.catch_warnings(record=True) as caught_warnings:
-                                xm.save(self.lr_scheduler.state_dict(), os.path.join(output_dir, "scheduler.pt"))
-                            reissue_pt_warnings(caught_warnings)
-                        elif self.is_world_process_zero():
-                            torch.save(self.optimizer.state_dict(), os.path.join(output_dir, "optimizer.pt"))
-                            with warnings.catch_warnings(record=True) as caught_warnings:
-                                torch.save(self.lr_scheduler.state_dict(), os.path.join(output_dir, "scheduler.pt"))
-                            reissue_pt_warnings(caught_warnings)
+                    if (
+                        not self.args.load_best_model_at_end
+                        and self.args.save_steps > 0
+                        and self.global_step % self.args.save_steps == 0
+                    ):
+                        self._save_training(model, trial)

                epoch_pbar.update(1)
                if self.args.max_steps > 0 and self.global_step >= self.args.max_steps:
@@ -851,6 +830,8 @@ class Trainer:
            if self.args.evaluation_strategy == EvaluationStrategy.EPOCH:
                metrics = self.evaluate()
                self._report_to_hp_search(trial, epoch, metrics)
+                if self.args.load_best_model_at_end:
+                    self._save_training(model, trial, metrics=metrics)

            if self.args.tpu_metrics_debug or self.args.debug:
                if is_torch_tpu_available():
@@ -872,8 +853,73 @@ class Trainer:
            delattr(self, "_past")

        logger.info("\n\nTraining completed. Do not forget to share your model on huggingface.co/models =)\n\n")
+        if self.args.load_best_model_at_end and self.state.best_model_checkpoint is not None:
+            logger.info(
+                f"Loading best model from {self.state.best_model_checkpoint} (score: {self.state.best_metric})."
+            )
+            if isinstance(model, PreTrainedModel):
+                self.model = model.from_pretrained(self.state.best_model_checkpoint)
+                self.model = self.model.to(self.args.device)
+            else:
+                state_dict = torch.load(os.path.join(self.state.best_model_checkpoint, WEIGHTS_NAME))
+                self.model.load_state_dict(state_dict)
+
        return TrainOutput(self.global_step, tr_loss.item() / self.global_step)

+    def _save_training(self, model, trial, metrics=None):
+        # In all cases (even distributed/parallel), self.model is always a reference
+        # to the model we want to save.
+        if hasattr(model, "module"):
+            assert model.module is self.model, f"Module {model.module} should be a reference to self.model"
+        else:
+            assert model is self.model, f"Model {model} should be a reference to self.model"
+        # Save model checkpoint
+        checkpoint_folder = f"{PREFIX_CHECKPOINT_DIR}-{self.global_step}"
+        if self.hp_search_backend is not None and trial is not None:
+            run_id = trial.number if self.hp_search_backend == HPSearchBackend.OPTUNA else tune.get_trial_id()
+            checkpoint_folder += f"-run-{run_id}"
+        output_dir = os.path.join(self.args.output_dir, checkpoint_folder)
+
+        self.store_flos()
+        self.save_model(output_dir)
+
+        # Save optimizer and scheduler
+        if is_torch_tpu_available():
+            xm.rendezvous("saving_optimizer_states")
+            xm.save(self.optimizer.state_dict(), os.path.join(output_dir, "optimizer.pt"))
+            with warnings.catch_warnings(record=True) as caught_warnings:
+                xm.save(self.lr_scheduler.state_dict(), os.path.join(output_dir, "scheduler.pt"))
+            reissue_pt_warnings(caught_warnings)
+        elif self.is_world_process_zero():
+            torch.save(self.optimizer.state_dict(), os.path.join(output_dir, "optimizer.pt"))
+            with warnings.catch_warnings(record=True) as caught_warnings:
+                torch.save(self.lr_scheduler.state_dict(), os.path.join(output_dir, "scheduler.pt"))
+            reissue_pt_warnings(caught_warnings)
+
+        # Determine the new best metric / best model checkpoint
+        if metrics is not None:
+            metric_to_check = self.args.metric_for_best_model
+            if not metric_to_check.startswith("eval_"):
+                metric_to_check = f"eval_{metric_to_check}"
+            metric_value = metrics[metric_to_check]
+
+            operator = np.greater if self.args.greater_is_better else np.less
+            if (
+                self.state.best_metric is None
+                or self.state.best_model_checkpoint is None
+                or operator(metric_value, self.state.best_metric)
+            ):
+                self.state.best_metric = metric_value
+                self.state.best_model_checkpoint = output_dir
+
+        # Save the Trainer state
+        if self.is_world_process_zero():
+            self.state.save_to_json(os.path.join(output_dir, "trainer_state.json"))
+
+        # Maybe delete some older checkpoints.
+        if self.is_world_process_zero():
+            self._rotate_checkpoints(use_mtime=True)
+
    def hyperparameter_search(
        self,
        hp_space: Optional[Callable[["optuna.Trial"], Dict[str, float]]] = None,
@@ -1164,11 +1210,13 @@ class Trainer:

        # Save a trained model and configuration using `save_pretrained()`.
        # They can then be reloaded using `from_pretrained()`
-        if not isinstance(self.model, PreTrainedModel):
-            raise ValueError("Trainer.model appears to not be a PreTrainedModel")
-
        xm.rendezvous("saving_checkpoint")
-        self.model.save_pretrained(output_dir)
+        if not isinstance(self.model, PreTrainedModel):
+            logger.info("Trainer.model is not a `PreTrainedModel`, only saving its state dict.")
+            state_dict = self.model.state_dict()
+            xm.save(state_dict, os.path.join(output_dir, WEIGHTS_NAME))
+        else:
+            self.model.save_pretrained(output_dir)
        if self.tokenizer is not None:
            self.tokenizer.save_pretrained(output_dir)

@@ -1179,8 +1227,11 @@ class Trainer:
        # Save a trained model and configuration using `save_pretrained()`.
        # They can then be reloaded using `from_pretrained()`
        if not isinstance(self.model, PreTrainedModel):
-            raise ValueError("Trainer.model appears to not be a PreTrainedModel")
-        self.model.save_pretrained(output_dir)
+            logger.info("Trainer.model is not a `PreTrainedModel`, only saving its state dict.")
+            state_dict = self.model.state_dict()
+            torch.save(state_dict, os.path.join(output_dir, WEIGHTS_NAME))
+        else:
+            self.model.save_pretrained(output_dir)
        if self.tokenizer is not None:
            self.tokenizer.save_pretrained(output_dir)

@@ -1215,6 +1266,13 @@ class Trainer:

        checkpoints_sorted = sorted(ordering_and_checkpoint_path)
        checkpoints_sorted = [checkpoint[1] for checkpoint in checkpoints_sorted]
+        # Make sure we don't delete the best model.
+        if self.state.best_model_checkpoint is not None:
+            best_model_index = checkpoints_sorted.index(self.state.best_model_checkpoint)
+            checkpoints_sorted[best_model_index], checkpoints_sorted[-1] = (
+                checkpoints_sorted[-1],
+                checkpoints_sorted[best_model_index],
+            )
        return checkpoints_sorted

    def _rotate_checkpoints(self, use_mtime=False) -> None:
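The best-checkpoint guard added to `_sorted_checkpoints` is a plain swap that moves the best checkpoint to the end of the oldest-first list. The sketch below reproduces that swap on a list of path strings. `protect_best` and `checkpoints_to_delete` are hypothetical helpers, and the deletion rule (drop the oldest entries beyond a `save_total_limit`-style cap, taken from the front of the list) is an assumption, since the body of `_rotate_checkpoints` is not part of this diff.

```python
from typing import List, Optional


def protect_best(checkpoints_sorted: List[str], best: Optional[str]) -> List[str]:
    """Swap the best checkpoint to the end of an oldest-first list (hypothetical helper)."""
    result = list(checkpoints_sorted)  # work on a copy, leave the input untouched
    if best is not None:
        i = result.index(best)
        # Tuple assignment evaluates the right-hand side first, so this is a true swap.
        result[i], result[-1] = result[-1], result[i]
    return result


def checkpoints_to_delete(checkpoints_sorted: List[str], save_total_limit: int) -> List[str]:
    """Assumed rotation rule: delete the entries at the front beyond the limit."""
    n = max(0, len(checkpoints_sorted) - save_total_limit)
    return checkpoints_sorted[:n]


ckpts = ["out/checkpoint-5", "out/checkpoint-10", "out/checkpoint-15", "out/checkpoint-20"]
ordered = protect_best(ckpts, "out/checkpoint-5")
print(ordered)
# ['out/checkpoint-20', 'out/checkpoint-10', 'out/checkpoint-15', 'out/checkpoint-5']
print(checkpoints_to_delete(ordered, save_total_limit=2))
# ['out/checkpoint-20', 'out/checkpoint-10']
```

Because the swap exchanges the best entry with the newest one, the best checkpoint always survives rotation, but the newest checkpoint takes the best one's old slot and can be deleted first when the limit is tight, as the example output shows.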
--- a/src/transformers/trainer_utils.py
+++ b/src/transformers/trainer_utils.py
@@ -1,4 +1,7 @@
+import dataclasses
+import json
 import random
+from dataclasses import dataclass
 from typing import Any, Dict, List, NamedTuple, Optional, Tuple, Union

 import numpy as np
@@ -213,3 +216,26 @@ def distributed_broadcast_scalars(
            raise AssertionError("Not currently using distributed training")
    else:
        raise ImportError("Torch must be installed to use `distributed_broadcast_scalars`")
+
+
+@dataclass
+class TrainerState:
+    """
+    A class containing the `Trainer` fields that will be saved along with the model and optimizer.
+    """
+
+    best_metric: Optional[float] = None
+    best_model_checkpoint: Optional[str] = None
+
+    def save_to_json(self, json_path: str):
+        """ Save the content of this instance in JSON format inside :obj:`json_path`."""
+        json_string = json.dumps(dataclasses.asdict(self), indent=2, sort_keys=True) + "\n"
+        with open(json_path, "w", encoding="utf-8") as f:
+            f.write(json_string)
+
+    @classmethod
+    def load_from_json(cls, json_path: str):
+        """ Create an instance from the content of :obj:`json_path`."""
+        with open(json_path, "r", encoding="utf-8") as f:
+            text = f.read()
+        return cls(**json.loads(text))
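A standalone sketch of how the best-metric bookkeeping in `_save_checkpoint` and the JSON round trip of `TrainerState` fit together. `TrainerStateSketch` is a hypothetical stand-in that copies the two fields and the save/load methods shown above so the example runs without `transformers`; `update_best` is a hypothetical helper mirroring the comparison block, with `operator.gt`/`operator.lt` substituted for `np.greater`/`np.less` (equivalent for plain Python floats).

```python
import dataclasses
import json
import operator
import os
import tempfile
from dataclasses import dataclass
from typing import Optional


@dataclass
class TrainerStateSketch:
    """Hypothetical stand-in with the same fields as TrainerState in this diff."""

    best_metric: Optional[float] = None
    best_model_checkpoint: Optional[str] = None

    def save_to_json(self, json_path: str) -> None:
        json_string = json.dumps(dataclasses.asdict(self), indent=2, sort_keys=True) + "\n"
        with open(json_path, "w", encoding="utf-8") as f:
            f.write(json_string)

    @classmethod
    def load_from_json(cls, json_path: str) -> "TrainerStateSketch":
        with open(json_path, "r", encoding="utf-8") as f:
            return cls(**json.load(f))


def update_best(state: TrainerStateSketch, metric_value: float, output_dir: str, greater_is_better: bool) -> None:
    # Strict comparison: on a tie, the earlier checkpoint stays the best one.
    better = operator.gt if greater_is_better else operator.lt
    if state.best_metric is None or state.best_model_checkpoint is None or better(metric_value, state.best_metric):
        state.best_metric = metric_value
        state.best_model_checkpoint = output_dir


state = TrainerStateSketch()
for step, loss in [(5, 0.9), (10, 0.4), (15, 0.6)]:
    update_best(state, loss, f"out/checkpoint-{step}", greater_is_better=False)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "trainer_state.json")
    state.save_to_json(path)
    restored = TrainerStateSketch.load_from_json(path)

print(restored)
# TrainerStateSketch(best_metric=0.4, best_model_checkpoint='out/checkpoint-10')
```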
--- a/src/transformers/training_args.py
+++ b/src/transformers/training_args.py
@@ -145,6 +145,28 @@ class TrainingArguments:
            Will eventually default to :obj:`["labels"]` except if the model used is one of the
            :obj:`XxxForQuestionAnswering` in which case it will default to
            :obj:`["start_positions", "end_positions"]`.
+        load_best_model_at_end (:obj:`bool`, `optional`, defaults to :obj:`False`):
+            Whether or not to load the best model found during training at the end of training.
+
+            .. note::
+
+                When set to :obj:`True`, the parameter :obj:`save_steps` will be ignored and the model will be saved
+                after each evaluation.
+        metric_for_best_model (:obj:`str`, `optional`):
+            Use in conjunction with :obj:`load_best_model_at_end` to specify the metric to use to compare two different
+            models. Must be the name of a metric returned by the evaluation with or without the prefix :obj:`"eval_"`.
+            Will default to :obj:`"loss"` if unspecified and :obj:`load_best_model_at_end=True` (to use the evaluation
+            loss).
+
+            If you set this value to anything other than :obj:`"loss"` or :obj:`"eval_loss"`, :obj:`greater_is_better`
+            will default to :obj:`True`. Don't forget to set it to :obj:`False` if your metric is better when lower.
+        greater_is_better (:obj:`bool`, `optional`)
+            Use in conjunction with :obj:`load_best_model_at_end` and :obj:`metric_for_best_model` to specify if better
+            models should have a greater metric or not. Will default to:
+
+            - :obj:`True` if :obj:`metric_for_best_model` is set to a value that isn't :obj:`"loss"` or
+              :obj:`"eval_loss"`.
+            - :obj:`False` if :obj:`metric_for_best_model` is not set, or set to :obj:`"loss"` or :obj:`"eval_loss"`.
    """

    output_dir: str = field(
@@ -287,6 +309,17 @@ class TrainingArguments:
        default=None, metadata={"help": "The list of keys in your dictionary of inputs that correspond to the labels."}
    )

+    load_best_model_at_end: Optional[bool] = field(
+        default=False,
+        metadata={"help": "Whether or not to load the best model found during training at the end of training."},
+    )
+    metric_for_best_model: Optional[str] = field(
+        default=None, metadata={"help": "The metric to use to compare two different models."}
+    )
+    greater_is_better: Optional[bool] = field(
+        default=None, metadata={"help": "Whether the `metric_for_best_model` should be maximized or not."}
+    )
+
    def __post_init__(self):
        if self.disable_tqdm is None:
            self.disable_tqdm = logger.getEffectiveLevel() > logging.WARN
@@ -304,6 +337,11 @@ class TrainingArguments:
        if self.eval_steps is None:
            self.eval_steps = self.logging_steps

+        if self.load_best_model_at_end and self.metric_for_best_model is None:
+            self.metric_for_best_model = "loss"
+        if self.greater_is_better is None and self.metric_for_best_model is not None:
+            self.greater_is_better = self.metric_for_best_model not in ["loss", "eval_loss"]
+
    @property
    def train_batch_size(self) -> int:
        """
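The defaulting rules added to `__post_init__` can be checked in isolation. `resolve_best_model_defaults` is a hypothetical helper that copies the two `if` statements above. `to_eval_key` illustrates the documented convention that the metric name may be given with or without the `"eval_"` prefix; it is an assumption about how the Trainer looks the metric up, since that lookup is not part of this excerpt.

```python
from typing import Optional, Tuple


def resolve_best_model_defaults(
    load_best_model_at_end: bool,
    metric_for_best_model: Optional[str] = None,
    greater_is_better: Optional[bool] = None,
) -> Tuple[Optional[str], Optional[bool]]:
    """Copy of the defaulting logic from TrainingArguments.__post_init__ (hypothetical helper)."""
    if load_best_model_at_end and metric_for_best_model is None:
        metric_for_best_model = "loss"
    if greater_is_better is None and metric_for_best_model is not None:
        greater_is_better = metric_for_best_model not in ["loss", "eval_loss"]
    return metric_for_best_model, greater_is_better


def to_eval_key(metric_name: str) -> str:
    """Assumed normalization: accept "accuracy" or "eval_accuracy" and return the logged key."""
    return metric_name if metric_name.startswith("eval_") else f"eval_{metric_name}"


print(resolve_best_model_defaults(True))                # ('loss', False)
print(resolve_best_model_defaults(True, "accuracy"))    # ('accuracy', True)
print(resolve_best_model_defaults(True, "eval_loss"))   # ('eval_loss', False)
print(resolve_best_model_defaults(True, "wer", False))  # ('wer', False)
print(resolve_best_model_defaults(False))               # (None, None)
print(to_eval_key("accuracy"), to_eval_key("eval_loss"))  # eval_accuracy eval_loss
```

The `"wer"` case (a hypothetical lower-is-better error metric) shows why the docstring warns about such metrics: without an explicit `greater_is_better=False`, any name other than `"loss"` or `"eval_loss"` would resolve to `True`.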
--- a/tests/test_modeling_gpt2.py
+++ b/tests/test_modeling_gpt2.py
@@ -88,7 +88,7 @@ class GPT2ModelTester:
        self.bos_token_id = vocab_size - 1
        self.eos_token_id = vocab_size - 1

-    def prepare_config_and_inputs(self):
+    def prepare_config_and_inputs(self, gradient_checkpointing=False):
        input_ids = ids_tensor([self.batch_size, self.seq_length], self.vocab_size)

        input_mask = None
@@ -127,6 +127,7 @@ class GPT2ModelTester:
            bos_token_id=self.bos_token_id,
            eos_token_id=self.eos_token_id,
            return_dict=True,
+            gradient_checkpointing=gradient_checkpointing,
        )

        head_mask = ids_tensor([self.num_hidden_layers, self.num_attention_heads], 2)
@@ -269,6 +270,15 @@ class GPT2ModelTester:
        self.parent.assertEqual(result.loss.shape, ())
        self.parent.assertEqual(result.logits.shape, (self.batch_size, self.seq_length, self.vocab_size))

+    def create_and_check_forward_and_backwards(self, config, input_ids, input_mask, head_mask, token_type_ids, *args):
+        model = GPT2LMHeadModel(config)
+        model.to(torch_device)
+
+        result = model(input_ids, token_type_ids=token_type_ids, labels=input_ids)
+        self.parent.assertEqual(result.loss.shape, ())
+        self.parent.assertEqual(result.logits.shape, (self.batch_size, self.seq_length, self.vocab_size))
+        result.loss.backward()
+
    def create_and_check_double_lm_head_model(
        self, config, input_ids, input_mask, head_mask, token_type_ids, mc_token_ids, *args
    ):
@@ -355,6 +365,10 @@ class GPT2ModelTest(ModelTesterMixin, unittest.TestCase):
        config_and_inputs = self.model_tester.prepare_config_and_inputs()
        self.model_tester.create_and_check_double_lm_head_model(*config_and_inputs)

+    def test_gpt2_gradient_checkpointing(self):
+        config_and_inputs = self.model_tester.prepare_config_and_inputs(gradient_checkpointing=True)
+        self.model_tester.create_and_check_forward_and_backwards(*config_and_inputs)
+
    @slow
    def test_model_from_pretrained(self):
        for model_name in GPT2_PRETRAINED_MODEL_ARCHIVE_LIST[:1]:
@@ -366,33 +380,34 @@ class GPT2ModelTest(ModelTesterMixin, unittest.TestCase):
 class GPT2ModelLanguageGenerationTest(unittest.TestCase):
    @slow
    def test_lm_generate_gpt2(self):
-        model = GPT2LMHeadModel.from_pretrained("gpt2")
-        model.to(torch_device)
-        input_ids = torch.tensor([[464, 3290]], dtype=torch.long, device=torch_device)  # The dog
-        expected_output_ids = [
-            464,
-            3290,
-            373,
-            1043,
-            287,
-            257,
-            2214,
-            1474,
-            262,
-            16246,
-            286,
-            2688,
-            290,
-            2688,
-            27262,
-            13,
-            198,
-            198,
-            464,
-            3290,
-        ]  # The dog was found in a field near the intersection of West and West Streets.\n\nThe dog
-        output_ids = model.generate(input_ids, do_sample=False)
-        self.assertListEqual(output_ids[0].tolist(), expected_output_ids)
+        for checkpointing in [True, False]:
+            model = GPT2LMHeadModel.from_pretrained("gpt2", gradient_checkpointing=checkpointing)
+            model.to(torch_device)
+            input_ids = torch.tensor([[464, 3290]], dtype=torch.long, device=torch_device)  # The dog
+            expected_output_ids = [
+                464,
+                3290,
+                373,
+                1043,
+                287,
+                257,
+                2214,
+                1474,
+                262,
+                16246,
+                286,
+                2688,
+                290,
+                2688,
+                27262,
+                13,
+                198,
+                198,
+                464,
+                3290,
+            ]  # The dog was found in a field near the intersection of West and West Streets.\n\nThe dog
+            output_ids = model.generate(input_ids, do_sample=False)
+            self.assertListEqual(output_ids[0].tolist(), expected_output_ids)

    @slow
    def test_lm_generate_distilgpt2(self):
--- a/tests/test_trainer.py
+++ b/tests/test_trainer.py
@@ -1,9 +1,13 @@
+import json
+import os
+import tempfile
 import unittest

 import datasets
 import numpy as np

-from transformers import AutoTokenizer, TrainingArguments, is_torch_available
+from transformers import AutoTokenizer, PretrainedConfig, TrainingArguments, is_torch_available
+from transformers.file_utils import WEIGHTS_NAME
 from transformers.testing_utils import get_tests_dir, require_torch, slow


@@ -16,6 +20,7 @@ if is_torch_available():
        GlueDataset,
        GlueDataTrainingArguments,
        LineByLineTextDataset,
+        PreTrainedModel,
        Trainer,
    )

@@ -51,6 +56,14 @@ class AlmostAccuracy:
        return {"accuracy": true.astype(np.float32).mean().item()}


+class RegressionModelConfig(PretrainedConfig):
+    def __init__(self, a=0, b=0, double_output=False, **kwargs):
+        super().__init__(**kwargs)
+        self.a = a
+        self.b = b
+        self.double_output = double_output
+
+
 if is_torch_available():

    class SampleIterableDataset(IterableDataset):
@@ -79,15 +92,37 @@ if is_torch_available():
            loss = torch.nn.functional.mse_loss(y, labels)
            return (loss, y, y) if self.double_output else (loss, y)

-    def get_regression_trainer(a=0, b=0, double_output=False, train_len=64, eval_len=64, **kwargs):
+    class RegressionPreTrainedModel(PreTrainedModel):
+        config_class = RegressionModelConfig
+        base_model_prefix = "regression"
+
+        def __init__(self, config):
+            super().__init__(config)
+            self.a = torch.nn.Parameter(torch.tensor(config.a).float())
+            self.b = torch.nn.Parameter(torch.tensor(config.b).float())
+            self.double_output = config.double_output
+
+        def forward(self, input_x=None, labels=None, **kwargs):
+            y = input_x * self.a + self.b
+            if labels is None:
+                return (y, y) if self.double_output else (y,)
+            loss = torch.nn.functional.mse_loss(y, labels)
+            return (loss, y, y) if self.double_output else (loss, y)
+
+    def get_regression_trainer(a=0, b=0, double_output=False, train_len=64, eval_len=64, pretrained=True, **kwargs):
        label_names = kwargs.get("label_names", None)
        train_dataset = RegressionDataset(length=train_len, label_names=label_names)
        eval_dataset = RegressionDataset(length=eval_len, label_names=label_names)
-        model = RegressionModel(a, b, double_output)
+        if pretrained:
+            config = RegressionModelConfig(a=a, b=b, double_output=double_output)
+            model = RegressionPreTrainedModel(config)
+        else:
+            model = RegressionModel(a=a, b=b, double_output=double_output)
        compute_metrics = kwargs.pop("compute_metrics", None)
        data_collator = kwargs.pop("data_collator", None)
        optimizers = kwargs.pop("optimizers", (None, None))
-        args = TrainingArguments("./regression", **kwargs)
+        output_dir = kwargs.pop("output_dir", "./regression")
+        args = TrainingArguments(output_dir, **kwargs)
        return Trainer(
            model,
            args,
@@ -119,6 +154,40 @@ class TrainerIntegrationTest(unittest.TestCase):
        self.assertTrue(torch.allclose(model.a, a))
        self.assertTrue(torch.allclose(model.b, b))

+    def check_saved_checkpoints(self, output_dir, freq, total, is_pretrained=True):
+        file_list = [WEIGHTS_NAME, "training_args.bin", "log_history.json", "optimizer.pt", "scheduler.pt"]
+        if is_pretrained:
+            file_list.append("config.json")
+        for step in range(freq, total, freq):
+            checkpoint = os.path.join(output_dir, f"checkpoint-{step}")
+            self.assertTrue(os.path.isdir(checkpoint))
+            for filename in file_list:
+                self.assertTrue(os.path.isfile(os.path.join(checkpoint, filename)))
+
+    def check_best_model_has_been_loaded(
+        self, output_dir, freq, total, trainer, metric, greater_is_better=False, is_pretrained=True
+    ):
+        checkpoint = os.path.join(output_dir, f"checkpoint-{(total // freq) * freq}")
+        log_history = json.load(open(os.path.join(checkpoint, "log_history.json")))
+
+        values = [d[metric] for d in log_history]
+        best_value = max(values) if greater_is_better else min(values)
+        best_checkpoint = (values.index(best_value) + 1) * freq
+        checkpoint = os.path.join(output_dir, f"checkpoint-{best_checkpoint}")
+        if is_pretrained:
+            best_model = RegressionPreTrainedModel.from_pretrained(checkpoint)
+            best_model.to(trainer.args.device)
+        else:
+            best_model = RegressionModel()
+            state_dict = torch.load(os.path.join(checkpoint, WEIGHTS_NAME))
+            best_model.load_state_dict(state_dict)
+            best_model.to(trainer.args.device)
+        self.assertTrue(torch.allclose(best_model.a, trainer.model.a))
+        self.assertTrue(torch.allclose(best_model.b, trainer.model.b))
+
+        metrics = trainer.evaluate()
+        self.assertEqual(metrics[metric], best_value)
+
    def test_reproducible_training(self):
        # Checks that training worked, model trained and seed made a reproducible training.
        trainer = get_regression_trainer(learning_rate=0.1)
@@ -287,6 +356,86 @@ class TrainerIntegrationTest(unittest.TestCase):
        trainer.train()
        self.check_trained_model(trainer.model, alternate_seed=True)

+    def test_save_checkpoints(self):
+        with tempfile.TemporaryDirectory() as tmpdir:
+            trainer = get_regression_trainer(output_dir=tmpdir, save_steps=5)
+            trainer.train()
+            self.check_saved_checkpoints(tmpdir, 5, int(self.n_epochs * 64 / self.batch_size))
+
+        # With a regular model that is not a PreTrainedModel
+        with tempfile.TemporaryDirectory() as tmpdir:
+            trainer = get_regression_trainer(output_dir=tmpdir, save_steps=5, pretrained=False)
+            trainer.train()
+            self.check_saved_checkpoints(tmpdir, 5, int(self.n_epochs * 64 / self.batch_size), False)
+
+    def test_load_best_model_at_end(self):
+        total = int(self.n_epochs * 64 / self.batch_size)
+        with tempfile.TemporaryDirectory() as tmpdir:
+            trainer = get_regression_trainer(
+                a=1.5,
+                b=2.5,
+                output_dir=tmpdir,
+                learning_rate=0.1,
+                eval_steps=5,
+                evaluation_strategy="steps",
+                load_best_model_at_end=True,
+            )
+            self.assertFalse(trainer.args.greater_is_better)
+            trainer.train()
+            self.check_saved_checkpoints(tmpdir, 5, total)
+            self.check_best_model_has_been_loaded(tmpdir, 5, total, trainer, "eval_loss")
+
+        with tempfile.TemporaryDirectory() as tmpdir:
+            trainer = get_regression_trainer(
+                a=1.5,
+                b=2.5,
+                output_dir=tmpdir,
+                learning_rate=0.1,
+                eval_steps=5,
+                evaluation_strategy="steps",
+                load_best_model_at_end=True,
+                metric_for_best_model="accuracy",
+                compute_metrics=AlmostAccuracy(),
+            )
+            self.assertTrue(trainer.args.greater_is_better)
+            trainer.train()
+            self.check_saved_checkpoints(tmpdir, 5, total)
+            self.check_best_model_has_been_loaded(tmpdir, 5, total, trainer, "eval_accuracy", greater_is_better=True)
+
+        # Save is done every eval regardless of the strategy
+        with tempfile.TemporaryDirectory() as tmpdir:
+            trainer = get_regression_trainer(
+                a=1.5,
+                b=2.5,
+                output_dir=tmpdir,
+                learning_rate=0.1,
+                evaluation_strategy="epoch",
+                load_best_model_at_end=True,
+                metric_for_best_model="accuracy",
+                compute_metrics=AlmostAccuracy(),
+            )
+            self.assertTrue(trainer.args.greater_is_better)
+            trainer.train()
+            self.check_saved_checkpoints(tmpdir, 64 // self.batch_size, total)
+            self.check_best_model_has_been_loaded(
+                tmpdir, 64 // self.batch_size, total, trainer, "eval_accuracy", greater_is_better=True
+            )
+
+        # Test this works with a non PreTrainedModel
+        with tempfile.TemporaryDirectory() as tmpdir:
+            trainer = get_regression_trainer(
+                output_dir=tmpdir,
+                learning_rate=0.1,
+                eval_steps=5,
+                evaluation_strategy="steps",
+                load_best_model_at_end=True,
+                pretrained=False,
+            )
+            self.assertFalse(trainer.args.greater_is_better)
+            trainer.train()
+            self.check_saved_checkpoints(tmpdir, 5, total, is_pretrained=False)
+            self.check_best_model_has_been_loaded(tmpdir, 5, total, trainer, "eval_loss", is_pretrained=False)
+
    @slow
    def test_trainer_eval_mrpc(self):
        MODEL_ID = "bert-base-cased-finetuned-mrpc"
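The test helper `check_best_model_has_been_loaded` locates the best checkpoint from the logged evaluation values: with one evaluation and one save every `freq` steps, the i-th logged value (0-based) corresponds to `checkpoint-{(i + 1) * freq}`. A minimal sketch of that arithmetic, assuming the same one-eval-per-save schedule; `best_checkpoint_step` is a hypothetical helper, and its filter that skips entries lacking the metric is an addition not present in the test, included in case the history also holds non-evaluation entries.

```python
from typing import Dict, List


def best_checkpoint_step(
    log_history: List[Dict[str, float]], metric: str, freq: int, greater_is_better: bool
) -> int:
    """Step of the best checkpoint, assuming one evaluation and one save every `freq` steps."""
    values = [entry[metric] for entry in log_history if metric in entry]
    best_value = max(values) if greater_is_better else min(values)
    # list.index returns the first match, so ties resolve to the earliest checkpoint,
    # consistent with the strict comparison used when tracking the best metric.
    return (values.index(best_value) + 1) * freq


losses = [{"eval_loss": 0.9}, {"eval_loss": 0.4}, {"eval_loss": 0.6}]
print(best_checkpoint_step(losses, "eval_loss", freq=5, greater_is_better=False))  # 10

accuracies = [{"eval_accuracy": 0.5}, {"eval_accuracy": 0.8}, {"eval_accuracy": 0.8}]
print(best_checkpoint_step(accuracies, "eval_accuracy", freq=4, greater_is_better=True))  # 8
```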
Author	SHA1	Message	Date
Sylvain Gugger	1ba08dc221	Release: v3.3.1	2020-09-29 14:17:34 -04:00
Sylvain Gugger	8546dc55c2	Fix Trainer tests in a multiGPU env (#7458 )	2020-09-29 14:06:41 -04:00
Sylvain Gugger	d0fd7154c5	Catch import datasets common errors (#7456 )	2020-09-29 13:42:09 -04:00
Sylvain Gugger	f1220c5fe2	Add a code of conduct (#7433 )	2020-09-29 13:38:47 -04:00
Teven	9e9a1fb8c7	Adding gradient checkpointing to GPT2 (#7446 ) * GPT2 gradient checkpointing * find_unused_parameters removed if checkpointing * find_unused_parameters removed if checkpointing * Update src/transformers/configuration_gpt2.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Added a test for generation with checkpointing * Update src/transformers/configuration_gpt2.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2020-09-29 12:26:26 -04:00
Sylvain Gugger	52e8392b7e	Add automatic best model loading to Trainer (#7431 ) * Add automatic best model loading to Trainer * Some small fixes * Formatting	2020-09-29 10:41:18 -04:00
Sylvain Gugger	1fc4de69ed	Document new features of make fixup (#7434 )	2020-09-29 03:56:57 -04:00
GmailB	205bf0b7ea	Update README.md (#7444 ) Hi, just corrected the example code, add 2 links and fixed some typos	2020-09-29 03:18:01 -04:00
Sam Shleifer	74d8d69bd4	[s2s] consistent output format across eval scripts (#7435 )	2020-09-28 23:20:03 -04:00
Typicasoft	671b278e25	Create README.md (#7436 ) * Create README.md MagBERT-NER : Added widget (Text) * Rename model_cards/README.md to model_cards/TypicaAI/magbert-ner/README.md	2020-09-28 18:25:25 -04:00
Manuel Romero	a1a8ffa512	Update README.md (#7429 ) Add links to models fine-tuned on a downstream task	2020-09-28 13:40:09 -04:00
Stas Bekman	f62f2ffdcc	[makefile] 10x speed up checking/fixing (#7403 ) * [makefile] check/fix only modified since branching files * fix phonies * parametrize dirs * have only one source for dirs to check * look ma, no autoformatters here	2020-09-28 10:45:42 -04:00
Lysandre	16c213820e	Update docs to version v3.3.0	2020-09-28 16:32:00 +02:00