Allow user-managed Pool in Wav2Vec2ProcessorWithLM.batch_decode (#18351)
* [Wav2Vec2] Allow user-managed Pool in Wav2Vec2ProcessorWithLM.batch_decode
* [Wav2Vec2] Add user-managed LM's pool tests and usage examples
* Improve styling

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* [Wav2Vec2] Fix hyperlink references

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Committed by GitHub
Parent: bf0e094142
Commit: af150e4a1c
@@ -73,6 +73,61 @@ This model was contributed by [patrickvonplaten](https://huggingface.co/patrickv
- batch_decode
- decode

### Decoding multiple audios

If you plan to decode multiple batches of audios, consider using [`~Wav2Vec2ProcessorWithLM.batch_decode`] and passing it an instantiated `multiprocessing.Pool`.
Otherwise, [`~Wav2Vec2ProcessorWithLM.batch_decode`] will be slower than calling [`~Wav2Vec2ProcessorWithLM.decode`] for each audio individually, because it internally instantiates a new `Pool` on every call. See the example below:

```python
>>> # Let's see how to use a user-managed pool for batch decoding multiple audios
>>> from multiprocessing import get_context
>>> from transformers import AutoTokenizer, AutoProcessor, AutoModelForCTC
>>> from datasets import load_dataset
>>> import datasets
>>> import torch

>>> # import model, feature extractor, tokenizer
>>> model = AutoModelForCTC.from_pretrained("patrickvonplaten/wav2vec2-base-100h-with-lm").to("cuda")
>>> processor = AutoProcessor.from_pretrained("patrickvonplaten/wav2vec2-base-100h-with-lm")

>>> # load example dataset
>>> dataset = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
>>> dataset = dataset.cast_column("audio", datasets.Audio(sampling_rate=16_000))


>>> def map_to_array(batch):
...     batch["speech"] = batch["audio"]["array"]
...     return batch


>>> # prepare speech data for batch inference
>>> dataset = dataset.map(map_to_array, remove_columns=["audio"])


>>> def map_to_pred(batch, pool):
...     inputs = processor(batch["speech"], sampling_rate=16_000, padding=True, return_tensors="pt")
...     inputs = {k: v.to("cuda") for k, v in inputs.items()}

...     with torch.no_grad():
...         logits = model(**inputs).logits

...     transcription = processor.batch_decode(logits.cpu().numpy(), pool).text
...     batch["transcription"] = transcription
...     return batch


>>> # note: pool should be instantiated *after* `Wav2Vec2ProcessorWithLM`.
>>> #       otherwise, the LM won't be available to the pool's sub-processes
>>> # select number of processes and batch_size based on number of CPU cores available and on dataset size
>>> with get_context("fork").Pool(processes=2) as pool:
...     result = dataset.map(
...         map_to_pred, batched=True, batch_size=2, fn_kwargs={"pool": pool}, remove_columns=["speech"]
...     )

>>> result["transcription"][:2]
['MISTER QUILTER IS THE APOSTLE OF THE MIDDLE CLASSES AND WE ARE GLAD TO WELCOME HIS GOSPEL', "NOR IS MISTER COULTER'S MANNER LESS INTERESTING THAN HIS MATTER"]
```
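
The trade-off described above (one pool shared across calls versus a new pool per call) can be illustrated without any model. The following is a minimal sketch using only the standard library. `decode_batch` is a hypothetical stand-in, not the actual `batch_decode` implementation, and `len` stands in for the per-item decoding work. The `"fork"` start method is assumed to be available, as on Linux.

```python
from multiprocessing import get_context


def decode_batch(items, pool=None):
    """Hypothetical stand-in for a batch decoder that accepts an optional pool."""
    if pool is not None:
        # Reuse the caller's workers: no process start-up cost on this call.
        return pool.map(len, items)
    # No pool given: create and tear down a temporary pool for this call only.
    with get_context("fork").Pool(processes=2) as tmp_pool:
        return tmp_pool.map(len, items)


batches = [["a", "bb"], ["ccc", "dddd"], ["eeeee"]]

# One user-managed pool shared across every batch.
with get_context("fork").Pool(processes=2) as pool:
    shared = [decode_batch(batch, pool) for batch in batches]

# A fresh pool per batch returns the same results but pays the
# worker start-up and shutdown cost once per call.
fresh = [decode_batch(batch) for batch in batches]
```

Both paths return identical results; the difference is only how often worker processes are started, which is why the shared pool tends to pay off as the number of batches grows.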
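The ordering note in the example (create the pool only after the processor is loaded) follows from how the `"fork"` start method works: workers receive a copy of the parent's state as it was when the pool was created. The sketch below shows that general behaviour with environment variables as an analogy. It does not load a language model, and the variable names `DEMO_BEFORE_POOL` and `DEMO_AFTER_POOL` are made up for illustration.

```python
import os
from multiprocessing import get_context

# State created before the pool exists is copied into the forked workers.
os.environ["DEMO_BEFORE_POOL"] = "loaded"

with get_context("fork").Pool(processes=2) as pool:
    # State created after the workers were forked exists only in the parent.
    os.environ["DEMO_AFTER_POOL"] = "loaded"

    # Each lookup runs inside a worker process, not in the parent.
    seen_before = pool.map(os.getenv, ["DEMO_BEFORE_POOL"])
    seen_after = pool.map(os.getenv, ["DEMO_AFTER_POOL"])
```

Here the workers can read the variable set before the pool was created, while the lookup for the one set afterwards returns `None`. By analogy, anything the workers need should already be in memory before `Pool(...)` is called.
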
## Wav2Vec2 specific outputs

[[autodoc]] models.wav2vec2_with_lm.processing_wav2vec2_with_lm.Wav2Vec2DecoderWithLMOutput