Doc fixes in preparation for the docstyle PR (#8061)
* Fixes in preparation for doc styling
* More fixes
* Better syntax
* Fixes
* Style
* More fixes
* More fixes
@@ -19,14 +19,12 @@ DataCollator = NewType("DataCollator", Callable[[List[InputDataClass]], Dict[str

 def default_data_collator(features: List[InputDataClass]) -> Dict[str, torch.Tensor]:
     """
-    Very simple data collator that:
-    - simply collates batches of dict-like objects
-    - Performs special handling for potential keys named:
+    Very simple data collator that simply collates batches of dict-like objects and performs special handling for potential keys named:
+
         - ``label``: handles a single value (int or float) per object
         - ``label_ids``: handles a list of values per object
-    - does not do any additional preprocessing

-    i.e., Property names of the input object will be used as corresponding inputs to the model.
+    Does not do any additional preprocessing: property names of the input object will be used as corresponding inputs to the model.
     See glue and ner for example of how it's useful.
     """

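To make the docstring above concrete, here is a minimal sketch of the collation behavior it describes. It is not the library's implementation: `collate_sketch` is a hypothetical name, it returns plain Python lists where the real function returns `torch.Tensor` values, and gathering both `label` and `label_ids` under a single `labels` key is an assumption about the target name.

```python
from typing import Any, Dict, List


def collate_sketch(features: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Hypothetical stand-in: batches dict-like features into lists, not tensors."""
    first = features[0]
    batch: Dict[str, List[Any]] = {}

    # Special handling: a scalar `label` or a per-object list `label_ids`
    # is gathered under one `labels` key (assumed target name).
    if first.get("label") is not None:
        batch["labels"] = [f["label"] for f in features]
    elif first.get("label_ids") is not None:
        batch["labels"] = [list(f["label_ids"]) for f in features]

    # No other preprocessing: every remaining key is batched under its own name.
    for key in first:
        if key not in ("label", "label_ids"):
            batch[key] = [f[key] for f in features]
    return batch


features = [
    {"input_ids": [101, 7592, 102], "label": 1},
    {"input_ids": [101, 2088, 102], "label": 0},
]
batch = collate_sketch(features)
```

Because keys are passed through unchanged, the property names of the input objects need to match the argument names the model expects.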
@@ -425,6 +423,7 @@ class DataCollatorForPermutationLanguageModeling:
     def mask_tokens(self, inputs: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor]:
         """
         The masked tokens to be predicted for a particular sequence are determined by the following algorithm:
+
             0. Start from the beginning of the sequence by setting ``cur_len = 0`` (number of tokens processed so far).
             1. Sample a ``span_length`` from the interval ``[1, max_span_length]`` (length of span of tokens to be masked)
             2. Reserve a context of length ``context_length = span_length / plm_probability`` to surround span to be masked
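Steps 0 to 2 of the algorithm above can be sketched in plain Python as follows. The hunk ends after step 2, so the rest of the loop (placing the span at a random offset inside its context window, then advancing `cur_len` by `context_length`) is an assumption and is marked as such in the comments. `sample_span_mask`, its `seed` parameter, and the default values are hypothetical and chosen only for illustration.

```python
import random
from typing import List


def sample_span_mask(
    seq_len: int,
    max_span_length: int = 5,
    plm_probability: float = 0.25,  # illustrative value, not taken from the diff
    seed: int = 0,
) -> List[bool]:
    """Return a per-token mask flag list built from sampled spans."""
    rng = random.Random(seed)
    masked = [False] * seq_len

    cur_len = 0  # step 0: number of tokens processed so far
    while cur_len < seq_len:
        # step 1: sample the span length from [1, max_span_length]
        span_length = rng.randint(1, max_span_length)
        # step 2: reserve a context window that surrounds the span
        context_length = int(span_length / plm_probability)

        # Assumed continuation (not shown in the excerpt): choose where the
        # span starts inside its context window, mask it, then skip the window.
        start = cur_len + rng.randint(0, context_length - span_length)
        for i in range(start, min(start + span_length, seq_len)):
            masked[i] = True
        cur_len += context_length
    return masked


mask = sample_span_mask(64)
```

With this layout each span sits inside its own non-overlapping context window, so roughly a `plm_probability` fraction of the tokens ends up masked.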