Time stamps for CTC models (#15687)

* [Wav2Vec2 Time Stamps]

* Add first version

* add word time stamps

* Fix

* save intermediate space

* improve

* [Finish CTC Tokenizer]

* remove @

* remove @

* push

* continue with phonemes

* up

* finish PR

* up

* add example

* rename

* finish

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* correct split

* finalize

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
This commit is contained in:
Patrick von Platen
2022-02-22 19:26:44 +01:00
committed by GitHub
parent 32295b15a1
commit c44d3675c2
12 changed files with 858 additions and 70 deletions

View File

@@ -45,6 +45,8 @@ This model was contributed by [patrickvonplaten](https://huggingface.co/patrickv
[[autodoc]] Wav2Vec2CTCTokenizer
- __call__
- save_vocabulary
- decode
- batch_decode
## Wav2Vec2FeatureExtractor