* fix typos

* fix typos

* fix typos

* fix typos
This commit is contained in:
omahs
2025-03-24 15:08:29 +01:00
committed by GitHub
parent 340500b1a9
commit cbf924b76c
67 changed files with 111 additions and 111 deletions

View File

@@ -9,7 +9,7 @@ Unless required by applicable law or agreed to in writing, software distributed
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
@@ -18,7 +18,7 @@ rendered properly in your Markdown viewer.
Multimodal model chat templates expect a similar [template](./chat_templating) as text-only models. It needs `messages` that includes a dictionary of the `role` and `content`.
Multimodal templates are included in the [Processor](./processors) class and requires an additional `type` key for specifying whether the included content is an image, video, or text.
Multimodal templates are included in the [Processor](./processors) class and require an additional `type` key for specifying whether the included content is an image, video, or text.
This guide will show you how to format chat templates for multimodal models as well as some best practices for configuring the template
@@ -109,7 +109,7 @@ These inputs are now ready to be used in [`~GenerationMixin.generate`].
Some vision models also support video inputs. The message format is very similar to the format for [image inputs](#image-inputs).
- The content `"type"` should be `"video"` to indicate the the content is a video.
- The content `"type"` should be `"video"` to indicate the content is a video.
- For videos, it can be a link to the video (`"url"`) or it could be a file path (`"path"`). Videos loaded from a URL can only be decoded with [PyAV](https://pyav.basswood-io.com/docs/stable/) or [Decord](https://github.com/dmlc/decord).
> [!WARNING]
@@ -141,7 +141,7 @@ Pass `messages` to [`~ProcessorMixin.apply_chat_template`] to tokenize the input
The `video_load_backend` parameter refers to a specific framework to load a video. It supports [PyAV](https://pyav.basswood-io.com/docs/stable/), [Decord](https://github.com/dmlc/decord), [OpenCV](https://github.com/opencv/opencv), and [torchvision](https://pytorch.org/vision/stable/index.html).
The examples below uses Decord as the backend because it is a bit faster than PyAV.
The examples below use Decord as the backend because it is a bit faster than PyAV.
<hfoptions id="sampling">
<hfoption id="fixed number of frames">