No more Tuple, List, Dict (#38797)
* No more Tuple, List, Dict
* make fixup
* More style fixes
* Docstring fixes with regex replacement
* Trigger tests
* Redo fixes after rebase
* Fix copies
* [test all]
* update
* [test all]
* update
* [test all]
* make style after rebase
* Patch the hf_argparser test
* Patch the hf_argparser test
* style fixes
* style fixes
* style fixes
* Fix docstrings in Cohere test
* [test all]

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
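For readers unfamiliar with the change, the following is a minimal sketch of the annotation style this commit moves to. The function names `summarize_old` and `summarize_new` are hypothetical and do not appear in the repository; the sketch assumes Python 3.9 or later, where built-in containers can be subscripted directly (PEP 585).

```python
from typing import Dict, List, Optional, Tuple, get_args, get_origin


# Before: capitalized aliases imported from `typing`.
def summarize_old(ids: List[int], meta: Dict[str, int]) -> Tuple[int, Optional[str]]:
    return len(ids), None


# After: built-in generics; `Optional` and `Union` are left untouched,
# as in the hunks of this commit.
def summarize_new(ids: list[int], meta: dict[str, int]) -> tuple[int, Optional[str]]:
    return len(ids), None


# Both spellings resolve to the same runtime container type and arguments.
same_origin = get_origin(List[int]) is get_origin(list[int]) is list
same_args = get_args(Dict[str, int]) == get_args(dict[str, int]) == (str, int)
```

Only the annotations differ; the two functions behave identically at runtime.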
@@ -571,7 +571,7 @@ The processor should call the appropriate modality-specific processors within it
     def __call__(
         self,
         images: ImageInput = None,
-        text: Union[TextInput, PreTokenizedInput, List[TextInput], List[PreTokenizedInput]] = None,
+        text: Union[TextInput, PreTokenizedInput, list[TextInput], list[PreTokenizedInput]] = None,
         audio=None,
         videos=None,
         **kwargs: Unpack[YourModelProcessorKwargs],
@@ -92,7 +92,7 @@ def custom_attention(
     a_new_kwargs = None,  # You can now add as many kwargs as you need
     another_new_kwargs = None,  # You can now add as many kwargs as you need
     **kwargs,  # You need to accept **kwargs as models will pass other args
-) -> Tuple[torch.Tensor, Optional[torch.Tensor]]
+) -> tuple[torch.Tensor, Optional[torch.Tensor]]
     ... # do your magic!
     return attn_output, attn_weights  # attn_weights are optional here
 
@@ -47,7 +47,7 @@ class ResnetConfig(PretrainedConfig):
     def __init__(
         self,
         block_type="bottleneck",
-        layers: List[int] = [3, 4, 6, 3],
+        layers: list[int] = [3, 4, 6, 3],
         num_classes: int = 1000,
         input_channels: int = 3,
         cardinality: int = 1,
@@ -152,7 +152,7 @@ print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
 | `temperature` | `float` | How unpredictable the next selected token will be. High values (`>0.8`) are good for creative tasks, low values (e.g. `<0.4`) for tasks that require "thinking". Requires `do_sample=True`. |
 | `num_beams` | `int` | When set to `>1`, activates the beam search algorithm. Beam search is good on input-grounded tasks. Check [this guide](./generation_strategies.md) for more information. |
 | `repetition_penalty` | `float` | Set it to `>1.0` if you're seeing the model repeat itself often. Larger values apply a larger penalty. |
-| `eos_token_id` | `List[int]` | The token(s) that will cause generation to stop. The default value is usually good, but you can specify a different token. |
+| `eos_token_id` | `list[int]` | The token(s) that will cause generation to stop. The default value is usually good, but you can specify a different token. |
 
 
 ## Pitfalls
@@ -62,11 +62,11 @@ def make_box_first_token_mask(bboxes, words, tokenizer, max_seq_length=512):
 
     box_first_token_mask = np.zeros(max_seq_length, dtype=np.bool_)
 
-    # encode(tokenize) each word from words (List[str])
-    input_ids_list: List[List[int]] = [tokenizer.encode(e, add_special_tokens=False) for e in words]
+    # encode(tokenize) each word from words (list[str])
+    input_ids_list: list[list[int]] = [tokenizer.encode(e, add_special_tokens=False) for e in words]
 
     # get the length of each box
-    tokens_length_list: List[int] = [len(l) for l in input_ids_list]
+    tokens_length_list: list[int] = [len(l) for l in input_ids_list]
 
     box_end_token_indices = np.array(list(itertools.accumulate(tokens_length_list)))
     box_start_token_indices = box_end_token_indices - np.array(tokens_length_list)
@@ -149,7 +149,7 @@ As a summary, consider the following table:
 | **Description** | Predicting bounding boxes and class labels around objects in an image | Predicting masks around objects (i.e. instances) in an image | Predicting masks around both objects (i.e. instances) as well as "stuff" (i.e. background things like trees and roads) in an image |
 | **Model** | [`~transformers.DetrForObjectDetection`] | [`~transformers.DetrForSegmentation`] | [`~transformers.DetrForSegmentation`] |
 | **Example dataset** | COCO detection | COCO detection, COCO panoptic | COCO panoptic | |
-| **Format of annotations to provide to** [`~transformers.DetrImageProcessor`] | {'image_id': `int`, 'annotations': `List[Dict]`} each Dict being a COCO object annotation | {'image_id': `int`, 'annotations': `List[Dict]`} (in case of COCO detection) or {'file_name': `str`, 'image_id': `int`, 'segments_info': `List[Dict]`} (in case of COCO panoptic) | {'file_name': `str`, 'image_id': `int`, 'segments_info': `List[Dict]`} and masks_path (path to directory containing PNG files of the masks) |
+| **Format of annotations to provide to** [`~transformers.DetrImageProcessor`] | {'image_id': `int`, 'annotations': `list[Dict]`} each Dict being a COCO object annotation | {'image_id': `int`, 'annotations': `list[Dict]`} (in case of COCO detection) or {'file_name': `str`, 'image_id': `int`, 'segments_info': `list[Dict]`} (in case of COCO panoptic) | {'file_name': `str`, 'image_id': `int`, 'segments_info': `list[Dict]`} and masks_path (path to directory containing PNG files of the masks) |
 | **Postprocessing** (i.e. converting the output of the model to Pascal VOC format) | [`~transformers.DetrImageProcessor.post_process`] | [`~transformers.DetrImageProcessor.post_process_segmentation`] | [`~transformers.DetrImageProcessor.post_process_segmentation`], [`~transformers.DetrImageProcessor.post_process_panoptic`] |
 | **evaluators** | `CocoEvaluator` with `iou_types="bbox"` | `CocoEvaluator` with `iou_types="bbox"` or `"segm"` | `CocoEvaluator` with `iou_tupes="bbox"` or `"segm"`, `PanopticEvaluator` |
 
@@ -83,7 +83,7 @@ def read_video_pyav(container, indices):
     Decode the video with PyAV decoder.
     Args:
         container (`av.container.input.InputContainer`): PyAV container.
-        indices (`List[int]`): List of frame indices to decode.
+        indices (`list[int]`): List of frame indices to decode.
     Returns:
         result (np.ndarray): np array of decoded frames of shape (num_frames, height, width, 3).
     '''
@@ -216,12 +216,12 @@ class Olmo2Attention(OlmoAttention):
     def forward(
         self,
         hidden_states: torch.Tensor,
-        position_embeddings: Tuple[torch.Tensor, torch.Tensor],
+        position_embeddings: tuple[torch.Tensor, torch.Tensor],
         attention_mask: Optional[torch.Tensor],
         past_key_value: Optional[Cache] = None,
         cache_position: Optional[torch.LongTensor] = None,
         **kwargs,
-    ) -> Tuple[torch.Tensor, Optional[torch.Tensor], Optional[Tuple[torch.Tensor]]]:
+    ) -> tuple[torch.Tensor, Optional[torch.Tensor], Optional[tuple[torch.Tensor]]]:
         input_shape = hidden_states.shape[:-1]
         hidden_shape = (*input_shape, -1, self.head_dim)
 
@@ -294,9 +294,9 @@ class Olmo2DecoderLayer(OlmoDecoderLayer):
         output_attentions: Optional[bool] = False,
         use_cache: Optional[bool] = False,
         cache_position: Optional[torch.LongTensor] = None,
-        position_embeddings: Optional[Tuple[torch.Tensor, torch.Tensor]] = None,  # necessary, but kept here for BC
+        position_embeddings: Optional[tuple[torch.Tensor, torch.Tensor]] = None,  # necessary, but kept here for BC
         **kwargs,
-    ) -> Tuple[torch.FloatTensor, Optional[Tuple[torch.FloatTensor, torch.FloatTensor]]]:
+    ) -> tuple[torch.FloatTensor, Optional[tuple[torch.FloatTensor, torch.FloatTensor]]]:
         residual = hidden_states
 
         # Self Attention
@@ -494,7 +494,7 @@ class LlamaForCausalLM(nn.Module):
         input_ids: torch.LongTensor = None,
         attention_mask: Optional[torch.Tensor] = None,
         position_ids: Optional[torch.LongTensor] = None,
-        past_key_values: Optional[Union[Cache, List[torch.FloatTensor]]] = None,
+        past_key_values: Optional[Union[Cache, list[torch.FloatTensor]]] = None,
         inputs_embeds: Optional[torch.FloatTensor] = None,
         labels: Optional[torch.LongTensor] = None,
         use_cache: Optional[bool] = None,
@@ -520,7 +520,7 @@ class NewModelForCausalLM(LlamaForCausalLM): | class LlamaForCausalLM(nn.M
 | input_ids: torch.LongTensor = None,
 | attention_mask: Optional[torch.Tensor] = None,
 | position_ids: Optional[torch.LongTensor] = None,
-| past_key_values: Optional[Union[Cache, List[torch.FloatTensor]]] = |None,
+| past_key_values: Optional[Union[Cache, list[torch.FloatTensor]]] = |None,
 | inputs_embeds: Optional[torch.FloatTensor] = None,
 | labels: Optional[torch.LongTensor] = None,
 | use_cache: Optional[bool] = None,
@@ -170,7 +170,7 @@ Unlike other data collators, this specific data collator needs to apply a differ
 ...     processor: AutoProcessor
 ...     padding: Union[bool, str] = "longest"
 
-...     def __call__(self, features: List[Dict[str, Union[List[int], torch.Tensor]]]) -> Dict[str, torch.Tensor]:
+...     def __call__(self, features: list[dict[str, Union[list[int], torch.Tensor]]]) -> dict[str, torch.Tensor]:
 ...         # split inputs and labels since they have to be of different lengths and need
 ...         # different padding methods
 ...         input_features = [{"input_values": feature["input_values"][0]} for feature in features]
@@ -243,7 +243,7 @@ and it uses the exact same dataset as an example. Apply some geometric and color
 ... )
 ```
 
-The `image_processor` expects the annotations to be in the following format: `{'image_id': int, 'annotations': List[Dict]}`,
+The `image_processor` expects the annotations to be in the following format: `{'image_id': int, 'annotations': list[Dict]}`,
 where each dictionary is a COCO object annotation. Let's add a function to reformat annotations for a single example:
 
 ```py
@@ -252,9 +252,9 @@ The `image_processor` expects the annotations to be in the following format: `{'
 
 ...     Args:
 ...         image_id (str): image id. e.g. "0001"
-...         categories (List[int]): list of categories/class labels corresponding to provided bounding boxes
-...         areas (List[float]): list of corresponding areas to provided bounding boxes
-...         bboxes (List[Tuple[float]]): list of bounding boxes provided in COCO format
+...         categories (list[int]): list of categories/class labels corresponding to provided bounding boxes
+...         areas (list[float]): list of corresponding areas to provided bounding boxes
+...         bboxes (list[tuple[float]]): list of bounding boxes provided in COCO format
 ...             ([center_x, center_y, width, height] in absolute coordinates)
 
 ...     Returns:
@@ -397,7 +397,7 @@ Intermediate format of boxes used for training is `YOLO` (normalized) but we wil
 
 ...     Args:
 ...         boxes (torch.Tensor): Bounding boxes in YOLO format
-...         image_size (Tuple[int, int]): Image size in format (height, width)
+...         image_size (tuple[int, int]): Image size in format (height, width)
 
 ...     Returns:
 ...         torch.Tensor: Bounding boxes in Pascal VOC format (x_min, y_min, x_max, y_max)
@@ -408,7 +408,7 @@ instructs the model to ignore that part of the spectrogram when calculating the
 ... class TTSDataCollatorWithPadding:
 ...     processor: Any
 
-...     def __call__(self, features: List[Dict[str, Union[List[int], torch.Tensor]]]) -> Dict[str, torch.Tensor]:
+...     def __call__(self, features: list[dict[str, Union[list[int], torch.Tensor]]]) -> dict[str, torch.Tensor]:
 ...         input_ids = [{"input_ids": feature["input_ids"]} for feature in features]
 ...         label_features = [{"input_values": feature["labels"]} for feature in features]
 ...         speaker_features = [feature["speaker_embeddings"] for feature in features]