ChunkPipeline (batch_size enabled on zero-cls and qa pipelines. (#14225)

* Pipeline chunks.
* Batching for Chunking pipelines ?
* Batching for `question-answering` and `zero-shot-cls`.
* Fixing for FNet.
* Making ASR a chunk pipeline.
* Chunking ASR API.
* doc style.
* Fixing ASR test.
* Fixing QA error (p_mask, padding is 1, not 0).
* Enable both vad and simple chunking.
* Max length for vad.
* remove inference mode, crashing on s2t.
* Revert ChunkPipeline for ASRpipeline. Too many knobs for simple integration within the pipeline, better stick to external convenience functions instead, more control to be had, simpler pipeline and also easier to replace with other things later.
* Drop necessity for PT for these.
* Enabling generators.
* Add mic + cleanup.
* Typo.
* Typo2.
* Remove ASR work, it does not belong in this PR anymore.
* Update src/transformers/pipelines/pt_utils.py Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Update src/transformers/pipelines/zero_shot_classification.py Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Adding many comments.
* Doc quality.
* `hidden_states` handling.
* Adding doc.
* Bad rebase.
* Autofixing docs.
* Fixing CRITICAL bug in the new Zerocls pipeline.

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-12-27 11:26:20 +01:00
parent 705ca7f21b
commit b058490ceb
7 changed files with 631 additions and 274 deletions
--- a/docs/source/main_classes/pipelines.mdx
+++ b/docs/source/main_classes/pipelines.mdx
@@ -93,12 +93,36 @@ for out in tqdm.tqdm(pipe(KeyDataset(dataset, "file"))):
    # ....
 ```

+For ease of use, you can also pass a generator:
+
+
+```python
+from transformers import pipeline
+
+pipe = pipeline("text-classification")
+
+def data():
+    while True:
+        # This could come from a dataset, a database, a queue or HTTP request
+        # in a server
+        # Caveat: because this is iterative, you cannot use `num_workers > 1`
+        # to preprocess data with multiple threads. You can still have 1 thread that
+        # does the preprocessing while the main thread runs the big inference
+        yield "This is a test"
+
+for out in pipe(data()):
+    print(out)
+    # {"label": ..., "score": ...}
+    # {"label": ..., "score": ...}
+    # ....
+```
+
 [[autodoc]] pipeline

 ## Pipeline batching

-All pipelines (except *zero-shot-classification* and *question-answering* currently) can use batching. This will work
-whenever the pipeline uses its streaming ability (so when passing lists or `Dataset`).
+All pipelines can use batching. This will work
+whenever the pipeline uses its streaming ability (so when passing a list, a `Dataset`, or a generator).

 ```python
 from transformers import pipeline                                                   
@@ -120,7 +144,7 @@ for out in pipe(KeyDataset(dataset, "text"), batch_size=8, truncation="only_firs
 However, this is not automatically a win for performance. It can be either a 10x speedup or 5x slowdown depending
 on hardware, data and the actual model being used.

-Example where it's most a speedup:
+Example where it's mostly a speedup:

 </Tip>

@@ -227,6 +251,39 @@ For users, a rule of thumb is:
  - The larger the GPU the more likely batching is going to be more interesting
 - As soon as you enable batching, make sure you can handle OOMs nicely.
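
One way to act on the last rule of thumb is to retry with a smaller batch when memory runs out. The following is a minimal sketch and not part of the `transformers` API: `run_with_backoff`, `run_batch` and `fake_run_batch` are hypothetical names, and the exception type is a parameter because the error that signals an OOM depends on the framework in use.

```python
def run_with_backoff(run_batch, items, batch_size, oom_error=MemoryError):
    """Process items in batches, halving the batch size on an out-of-memory error."""
    results = []
    start = 0
    while start < len(items):
        batch = items[start : start + batch_size]
        try:
            results.extend(run_batch(batch))
        except oom_error:
            if batch_size == 1:
                raise  # a single item does not fit; nothing left to shrink
            batch_size = max(1, batch_size // 2)
            continue  # retry the same position with a smaller batch
        start += len(batch)
    return results


def fake_run_batch(batch):
    # Hypothetical model call that "runs out of memory" above 2 items.
    if len(batch) > 2:
        raise MemoryError("simulated OOM")
    return [len(text) for text in batch]


outputs = run_with_backoff(fake_run_batch, ["a", "bb", "ccc", "dddd", "eeeee"], batch_size=8)
print(outputs)  # [1, 2, 3, 4, 5]
```

In this run the batch size shrinks from 8 to 4 to 2 before the first batch succeeds, and the remaining items are then processed at that smaller size.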

+## Pipeline chunk batching
+
+`zero-shot-classification` and `question-answering` are slightly specific in the sense that a single input might
+trigger multiple forward passes of the model. Under normal circumstances, this would cause issues with the `batch_size` argument.
+
+To circumvent this issue, both of these pipelines are implemented as `ChunkPipeline` instead of
+regular `Pipeline`. In short:
+
+
+```python
+preprocessed = pipe.preprocess(inputs)
+model_outputs = pipe.forward(preprocessed)
+outputs = pipe.postprocess(model_outputs)
+```
+
+Now becomes:
+
+
+```python
+all_model_outputs = []
+for preprocessed in pipe.preprocess(inputs):
+    model_outputs = pipe.forward(preprocessed)
+    all_model_outputs.append(model_outputs)
+outputs = pipe.postprocess(all_model_outputs)
+```
+
+This should be very transparent to your code because the pipelines are used in
+the same way.
+
+This is a simplified view, since the pipeline can also handle batching automatically. This means you don't have to care
+about how many forward passes your inputs are actually going to trigger; you can optimize the `batch_size`
+independently of the inputs. The caveats from the previous section still apply.
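
To make the interaction between chunking and `batch_size` concrete, here is a toy sketch in plain Python. It is not the `transformers` implementation: `ToyChunkPipeline`, the per-word chunking, the length-based "score" and the `is_last` flag are all assumptions made for illustration. It shows chunks from different inputs sharing a batch while each input still gets exactly one postprocessed result.

```python
from itertools import islice


class ToyChunkPipeline:
    """Hypothetical stand-in for a chunk pipeline; not the transformers class."""

    def __init__(self):
        self.forward_calls = 0

    def preprocess(self, text):
        # One input yields several chunks (here: one per word).
        words = text.split()
        for i, word in enumerate(words):
            yield {"chunk": word, "is_last": i == len(words) - 1}

    def forward(self, batch):
        # Stand-in for a single model forward pass over a batch of chunks.
        self.forward_calls += 1
        return [{"score": len(item["chunk"]), "is_last": item["is_last"]} for item in batch]

    def postprocess(self, all_model_outputs):
        # Combine every chunk output that belongs to one input.
        return sum(out["score"] for out in all_model_outputs)

    def __call__(self, inputs, batch_size=1):
        # Flatten the chunks of all inputs into one lazy stream.
        chunks = (chunk for text in inputs for chunk in self.preprocess(text))
        accumulator = []
        while True:
            batch = list(islice(chunks, batch_size))
            if not batch:
                break
            for output in self.forward(batch):
                accumulator.append(output)
                if output["is_last"]:
                    # All chunks of the current input have been seen.
                    yield self.postprocess(accumulator)
                    accumulator = []


pipe = ToyChunkPipeline()
results = list(pipe(["a bb ccc", "dddd"], batch_size=2))
print(results)  # [6, 4]
print(pipe.forward_calls)  # 2
```

The two inputs produce four chunks in total, so `batch_size=2` needs only two forward passes even though the first input alone spans three chunks.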
+
 ## Pipeline custom code

 If you want to override a specific pipeline.