From 3a46e30dd1cd7ba96d25f531f97ac96fba3afb60 Mon Sep 17 00:00:00 2001
From: D
Date: Fri, 26 Jan 2024 19:58:57 +0800
Subject: [PATCH] [`docs`] Update preprocessing.md (#28719)

* Update preprocessing.md

adjust ImageProcessor link to working target (same as in lower section of file)

* Update preprocessing.md
---
 docs/source/en/preprocessing.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/source/en/preprocessing.md b/docs/source/en/preprocessing.md
index 4aa9030fe4..04e9688c90 100644
--- a/docs/source/en/preprocessing.md
+++ b/docs/source/en/preprocessing.md
@@ -22,7 +22,7 @@ Before you can train a model on a dataset, it needs to be preprocessed into the
 * Text, use a [Tokenizer](./main_classes/tokenizer) to convert text into a sequence of tokens, create a numerical representation of the tokens, and assemble them into tensors.
 * Speech and audio, use a [Feature extractor](./main_classes/feature_extractor) to extract sequential features from audio waveforms and convert them into tensors.
-* Image inputs use a [ImageProcessor](./main_classes/image) to convert images into tensors.
+* Image inputs use a [ImageProcessor](./main_classes/image_processor) to convert images into tensors.
 * Multimodal inputs, use a [Processor](./main_classes/processors) to combine a tokenizer and a feature extractor or image processor.
@@ -397,7 +397,7 @@ width are expected, for others only the `shortest_edge` is defined.
 >>> _transforms = Compose([RandomResizedCrop(size), ColorJitter(brightness=0.5, hue=0.5)])
 ```
-2. The model accepts [`pixel_values`](model_doc/visionencoderdecoder#transformers.VisionEncoderDecoderModel.forward.pixel_values)
+2. The model accepts [`pixel_values`](model_doc/vision-encoder-decoder#transformers.VisionEncoderDecoderModel.forward.pixel_values)
 as its input. `ImageProcessor` can take care of normalizing the images, and generating appropriate tensors.
Create a function that combines image augmentation and image preprocessing for a batch of images and generates `pixel_values`: