Doc fixes in preparation for the docstyle PR (#8061)

* Fixes in preparation for doc styling

* More fixes

* Better syntax

* Fixes

* Style

* More fixes

* More fixes
This commit is contained in:
Sylvain Gugger
2020-10-26 15:01:09 -04:00
committed by GitHub
parent 8bbb74f211
commit 04a17f8550
27 changed files with 179 additions and 237 deletions

View File

@@ -19,14 +19,12 @@ DataCollator = NewType("DataCollator", Callable[[List[InputDataClass]], Dict[str
def default_data_collator(features: List[InputDataClass]) -> Dict[str, torch.Tensor]:
"""
Very simple data collator that:
- simply collates batches of dict-like objects
- Performs special handling for potential keys named:
Very simple data collator that simply collates batches of dict-like objects and erforms special handling for potential keys named:
- ``label``: handles a single value (int or float) per object
- ``label_ids``: handles a list of values per object
- does not do any additional preprocessing
i.e., Property names of the input object will be used as corresponding inputs to the model.
Des not do any additional preprocessing: property names of the input object will be used as corresponding inputs to the model.
See glue and ner for example of how it's useful.
"""
@@ -425,6 +423,7 @@ class DataCollatorForPermutationLanguageModeling:
def mask_tokens(self, inputs: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor]:
"""
The masked tokens to be predicted for a particular sequence are determined by the following algorithm:
0. Start from the beginning of the sequence by setting ``cur_len = 0`` (number of tokens processed so far).
1. Sample a ``span_length`` from the interval ``[1, max_span_length]`` (length of span of tokens to be masked)
2. Reserve a context of length ``context_length = span_length / plm_probability`` to surround span to be masked