[Large PR] Entire rework of pipelines. (#13308)
* Enabling dataset iteration on pipelines.

  Unifying parameters under `set_parameters` function.
  Small fix.
  Last fixes after rebase.
  Remove print.
  Fixing text2text `generate_kwargs`.
  No more `self.max_length`.
  Fixing TF-only conversational.
  Consistency in start/stop index over TF/PT.
  Speeding up drastically on TF (nasty bug where max_length would increase a ton).
  Adding test for support for non-fast tokenizers.
  Fixing GPU usage on zero-shot.
  Fix working on TF.
  Update src/transformers/pipelines/base.py
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
  Small cleanup.
  Remove all asserts + simple format.
* Fixing audio-classification for large PR.
* Overly explicit null checking.
* Encapsulating GPU/CPU PyTorch manipulation directly within `base.py`.
* Removed internal state for parameters of the pipeline.
  Instead of implicitly overriding internal state, we moved to real named arguments on every `preprocess`, `_forward`, `postprocess` function.
  Instead, `_sanitize_parameters` will be used to split all kwargs of both `__init__` and `__call__` into the 3 kinds of named parameters.
* Move import warnings.
* Small fixes.
* Quality.
* Another small fix, using the CI to debug faster.
* Last fixes.
* Last fix.
* Small cleanup of tensor moving.
* is not None.
* Adding a bunch of docs + an iteration test.
* Fixing doc style.
* KeyDataset = None guard.
* Removing the CUDA test for pipelines (was testing).
* Even more simple iteration test.
* Correct import.
* Long day.
* Fixes in docs.
* [WIP] migrating object detection.
* Fixed the target_size bug.
* Fixup.
* Bad variable name.
* Fixing `ensure_on_device` so it respects the original ModelOutput.
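
The parameter split described in the message above can be pictured with a toy example. This is only a sketch under stated assumptions: `ToyPipeline` and its `lowercase`, `repeat` and `top_k` parameters are hypothetical names made up for illustration, and the merging logic is a guess at the general idea, not the code in `base.py`. Only the method names `_sanitize_parameters`, `preprocess`, `_forward` and `postprocess` come from the message.

```python
class ToyPipeline:
    """Hypothetical toy class for illustration; not the transformers code."""

    def __init__(self, **kwargs):
        # Parameters passed at construction time become the defaults.
        self._defaults = self._sanitize_parameters(**kwargs)

    def _sanitize_parameters(self, lowercase=None, repeat=None, top_k=None):
        # Route each known kwarg to the step that consumes it.
        preprocess_params, forward_params, postprocess_params = {}, {}, {}
        if lowercase is not None:
            preprocess_params["lowercase"] = lowercase
        if repeat is not None:
            forward_params["repeat"] = repeat
        if top_k is not None:
            postprocess_params["top_k"] = top_k
        return preprocess_params, forward_params, postprocess_params

    def preprocess(self, text, lowercase=False):
        return (text.lower() if lowercase else text).split()

    def _forward(self, tokens, repeat=1):
        # Stand-in for the model call.
        return tokens * repeat

    def postprocess(self, tokens, top_k=None):
        return tokens[:top_k]

    def __call__(self, text, **kwargs):
        overrides = self._sanitize_parameters(**kwargs)
        # Merge call-time values over the defaults without mutating them.
        pre, fwd, post = ({**d, **o} for d, o in zip(self._defaults, overrides))
        return self.postprocess(self._forward(self.preprocess(text, **pre), **fwd), **post)


pipe = ToyPipeline(lowercase=True)
default_out = pipe("Hello World")
override_out = pipe("Hello World", repeat=2, top_k=3)
after_out = pipe("Hello World")
print(default_out, override_out, after_out)
```

Because every step receives its parameters as named arguments, a call-time override such as `repeat=2` affects only that call and leaves the object's defaults untouched.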
@@ -46,12 +46,53 @@ The pipeline abstraction
The `pipeline` abstraction is a wrapper around all the other available pipelines. It is instantiated like any other
pipeline but requires an additional argument, the `task`.

Simple call on one item:

.. code-block::

    >>> from transformers import pipeline
    >>> pipe = pipeline("text-classification")
    >>> pipe("This restaurant is awesome")
    [{'label': 'POSITIVE', 'score': 0.9998743534088135}]

To call a pipeline on many items, you can call it with a `list`:

.. code-block::

    >>> pipe = pipeline("text-classification")
    >>> pipe(["This restaurant is awesome", "This restaurant is aweful"])
    [{'label': 'POSITIVE', 'score': 0.9998743534088135},
     {'label': 'NEGATIVE', 'score': 0.9996669292449951}]


To iterate over full datasets, it is recommended to use a :obj:`dataset` directly. This means you don't need to load
the whole dataset into memory at once, nor do you need to do batching yourself. This should work just as fast as
custom loops on GPU. If it doesn't, don't hesitate to open an issue.

.. code-block::

    import datasets
    import tqdm
    from transformers import pipeline
    from transformers.pipelines.base import KeyDataset

    pipe = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h", device=0)
    dataset = datasets.load_dataset("superb", name="asr", split="test")

    # KeyDataset (PyTorch only) returns just the value stored under the given key of each dataset item,
    # as we're not interested in the `target` part of the dataset.
    for out in tqdm.tqdm(pipe(KeyDataset(dataset, "file"))):
        print(out)
        # {"text": "NUMBER TEN FRESH NELLY IS WAITING ON YOU GOOD NIGHT HUSBAND"}
        # {"text": ....}
        # ....


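The key-selecting wrapper used above can be pictured with a few lines of plain Python. This is a minimal sketch, assuming only what the comment in the example says (each item is a dict and a single key is kept); `KeyView` and the sample rows are hypothetical names and data made up for illustration, not the library's `KeyDataset`.

```python
class KeyView:
    """Hypothetical stand-in for a KeyDataset-like wrapper (illustration only)."""

    def __init__(self, dataset, key):
        self.dataset = dataset
        self.key = key

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, index):
        # Return only the value stored under `key`, dropping the other fields.
        return self.dataset[index][self.key]


# Made-up rows that mimic a dataset with a "file" and a "target" field.
rows = [
    {"file": "a.flac", "target": "HELLO"},
    {"file": "b.flac", "target": "WORLD"},
]
files = KeyView(rows, "file")
selected = [files[i] for i in range(len(files))]
print(selected)
```

Items are looked up one at a time, so the wrapper never builds a second copy of the dataset.
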

.. autofunction:: transformers.pipeline


Implementing a pipeline
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

:doc:`Implementing a new pipeline <../add_new_pipeline>`

The task specific pipelines
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


AudioClassificationPipeline
=======================================================================================================================
