Commit Graph

726 Commits

Author SHA1 Message Date
Younes Belkada
c34c50cdc0 [docs] Add MaskGenerationPipeline in docs (#27063)
* add `MaskGenerationPipeline` in docs

* Update __init__.py

* fix repo consistency and clarify docstring

* add on check docstirngs

* actually we do have a tf sam

* oops
2023-10-25 19:31:36 +02:00
Yoach Lacombe
cb45f71c4d Add Seamless M4T model (#25693)
* first raw commit

* still POC

* tentative convert script

* almost working speech encoder conversion scripts

* intermediate code for encoder/decoders

* add modeling code

* first version of speech encoder

* make style

* add new adapter layer architecture

* add adapter block

* add first tentative config

* add working speech encoder conversion

* base model convert works now

* make style

* remove unnecessary classes

* remove unecessary functions

* add modeling code speech encoder

* rework logics

* forward pass of sub components work

* add modeling codes

* some config modifs and modeling code modifs

* save WIP

* new edits

* same output speech encoder

* correct attention mask

* correct attention mask

* fix generation

* new generation logics

* erase comments

* make style

* fix typo

* add some descriptions

* new state

* clean imports

* add tests

* make style

* make beam search and num_return_sequences>1 works

* correct edge case issue

* correct SeamlessM4TConformerSamePadLayer copied from

* replace ACT2FN relu by nn.relu

* remove unecessary return variable

* move back a class

* change name conformer_attention_mask ->conv_attention_mask

* better nit code

* add some Copied from statements

* small nits

* small nit in dict.get

* rename t2u model -> conditionalgeneration

* ongoing refactoring of structure

* update models architecture

* remove SeamlessM4TMultiModal classes

* add tests

* adapt tests

* some non-working code for vocoder

* add seamlessM4T vocoder

* remove buggy line

* fix some hifigan related bugs

* remove hifigan specifc config

* change

* add WIP tokenization

* add seamlessM4T working tokenzier

* update tokenization

* add tentative feature extractor

* Update converting script

* update working FE

* refactor input_values -> input_features

* update FE

* changes in generation, tokenizer and modeling

* make style and add t2u_decoder_input_ids

* add intermediate outputs for ToSpeech models

* add vocoder to speech models

* update valueerror

* update FE with languages

* add vocoder convert

* update config docstrings and names

* update generation code and configuration

* remove todos and update config.pad_token_id to generation_config.pad_token_id

* move block vocoder

* remove unecessary code and uniformize tospeech code

* add feature extractor import

* make style and fix some copies from

* correct consistency + make fix-copies

* add processor code

* remove comments

* add fast tokenizer support

* correct pad_token_id in M4TModel

* correct config

* update tests and codes  + make style

* make some suggested correstion - correct comments and change naming

* rename some attributes

* rename some attributes

* remove unecessary sequential

* remove option to use dur predictor

* nit

* refactor hifigan

* replace normalize_mean and normalize_var with do_normalize + save lang ids to generation config

* add tests

* change tgt_lang logic

* update generation ToSpeech

* add support import SeamlessM4TProcessor

* fix generate

* make tests

* update integration tests, add option to only return text and update tokenizer fast

* fix wrong function call

* update import and convert script

* update integration tests + update repo id

* correct paths and add first test

* update how new attention masks are computed

* update tests

* take first care of batching in vocoder code

* add batching with the vocoder

* add waveform lengths to model outputs

* make style

* add generate kwargs + forward kwargs of M4TModel

* add docstrings forward methods

* reformate docstrings

* add docstrings t2u model

* add another round of modeling docstrings + reformate speaker_id -> spkr_id

* make style

* fix check_repo

* make style

* add seamlessm4t to toctree

* correct check_config_attributes

* write config docstrings + some modifs

* make style

* add docstrings tokenizer

* add docstrings to processor, fe and tokenizers

* make style

* write first version of model docs

* fix FE + correct FE test

* fix tokenizer + add correct integration tests

* fix most tokenization tests

* make style

* correct most processor test

* add generation tests and fix num_return_sequences > 1

* correct integration tests -still one left

* make style

* correct position embedding

* change numbeams to 1

* refactor some modeling code and correct one test

* make style

* correct typo

* refactor intermediate fnn

* refactor feedforward conformer

* make style

* remove comments

* make style

* fix tokenizer tests

* make style

* correct processor tests

* make style

* correct S2TT integration

* Apply suggestions from Sanchit code review

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* correct typo

* replace torch.nn->nn + make style

* change Output naming (waveforms -> waveform) and ordering

* nit renaming and formating

* remove return None when not necessary

* refactor SeamlessM4TConformerFeedForward

* nit typo

* remove almost copied from comments

* add a copied from comment and remove an unecessary dropout

* remove inputs_embeds from speechencoder

* remove backward compatibiliy function

* reformate class docstrings for a few components

* remove unecessary methods

* split over 2 lines smthg hard to read

* make style

* replace two steps offset by one step as suggested

* nice typo

* move warnings

* remove useless lines from processor

* make generation non-standard test more robusts

* remove torch.inference_mode from tests

* split integration tests

* enrich md

* rename control_symbol_vocoder_offset->vocoder_offset

* clean convert file

* remove tgt_lang and src_lang from FE

* change generate docstring of ToText models

* update generate docstring of tospeech models

* unify how to deal withtext_decoder_input_ids

* add default spkr_id

* unify tgt_lang for t2u_model

* simplify tgt_lang verification

* remove a todo

* change config docstring

* make style

* simplify t2u_tgt_lang_id

* make style

* enrich/correct comments

* enrich .md

* correct typo in docstrings

* add torchaudio dependency

* update tokenizer

* make style and fix copies

* modify SeamlessM4TConverter with new tokenizer behaviour

* make style

* correct small typo docs

* fix import

* update docs and add requirement to tests

* add convert_fairseq2_to_hf in utils/not_doctested.txt

* update FE

* fix imports and make style

* remove torchaudio in FE test

* add seamless_m4t.md to utils/not_doctested.txt

* nits and change the way docstring dataset is loaded

* move checkpoints from ylacombe/ to facebook/ orga

* refactor warning/error to be in the 119 line width limit

* round overly precised floats

* add stereo audio behaviour

* refactor .md and make style

* enrich docs with more precised architecture description

* readd undocumented models

* make fix-copies

* apply some suggestions

* Apply suggestions from code review

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* correct bug from previous commit

* refactor a parameter allowing to clean the code + some small nits

* clean tokenizer

* make style and fix

* make style

* clean tokenizers arguments

* add precisions for some tests

* move docs from not_tested to slow

* modify tokenizer according to last comments

* add copied from statements in tests

* correct convert script

* correct parameter docstring style

* correct tokenization

* correct multi gpus

* make style

* clean modeling code

* make style

* add copied from statements

* add copied statements

* add support with ASR pipeline

* remove file added inadvertently

* fix docstrings seamlessM4TModel

* add seamlessM4TConfig to OBJECTS_TO_IGNORE due of unconventional markdown

* add seamlessm4t to assisted generation ignored models

---------

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2023-10-23 14:49:48 +02:00
Adam Ross
929134bf65 [docstring] Fix docstring for speech-to-text config (#26883)
* Fix docstring for speech-to-text config

* Refactor doc line len <= 119 char

* Remove Speech2TextConfig from OBJECTS_TO_IGNORE

* Fix Speech2TextConfig doc str

* Fix Speech2TextConfig doc using doc-builder

* Refactor Speech2TextConfig doc
2023-10-20 09:49:55 +02:00
Daniil
ad08137e47 [docstring] Fix docstrings for CodeGen (#26821)
* remove docstrings CodeGen from objects_to_ignore

* autofix codegen docstrings

* fill in the missing types and docstrings

* fixup

* change descriptions to be in a separate line

* apply docstring suggestions from code review

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>

* update n_ctx description in CodeGenConfig

---------

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2023-10-19 14:21:40 +02:00
Sparty
816c2237c1 [docstring] Fix docstring for ChineseCLIP (#26880)
* Remove ChineseCLIPImageProcessor, ChineseCLIPTextConfig, ChineseCLIPVisionConfig from check_docstrings

* Run fix_and_overwrite for ChineseCLIPImageProcessor, ChineseCLIPTextConfig, ChineseCLIPVisionConfig

* Replace <fill_type> and <fill_docstring> in configuration_chinese_clip.py, image_processing_chinese_clip.py with type and docstring values

---------

Co-authored-by: vignesh-raghunathan <vignesh_raghunathan@intuit.com>
2023-10-19 10:52:14 +02:00
Pablo Montalvo
caa0ff0bf1 Add fuyu model (#26911)
* initial commit

* add processor, add fuyu naming

* add draft processor

* fix processor

* remove dropout to fix loading of weights

* add image processing fixes from Pedro

* fix

* fix processor

* add basic processing fuyu test

* add documentation and TODO

* address comments, add tests, add doc

* replace assert with torch asserts

* add Mixins and fix tests

* clean imports

* add model tester, clean imports

* fix embedding test

* add updated tests from pre-release model

* Processor: return input_ids used for inference

* separate processing and model tests

* relax test tolerance for embeddings

* add test for logit comparison

* make sure fuyu image processor is imported in the init

* fix formattingh

* more formatting issues

* and more

* fixups

* remove some stuff

* nits

* update init

* remove the fuyu file

* Update integration test with release model

* Update conversion script.

The projection is not used, as confirmed by the authors.

* improve geenration

* Remove duplicate function

* Trickle down patches to model call

* processing fuyu updates

* remove things

* fix prepare_inputs_for_generation to fix generate()

* remove model_input

* update

* add generation tests

* nits

* draft leverage automodel and autoconfig

* nits

* fix dtype patch

* address comments, update READMEs and doc, include tests

* add working processing test, remove refs to subsequences

* add tests, remove Sequence classification

* processing

* update

* update the conversion script

* more processing cleanup

* safe import

* take out ModelTesterMixin for early release

* more cl;eanup

* more cleanup

* more cleanup

* and more

* register a buffer

* nits

* add postprocessing of generate output

* nits

* updates

* add one working test

* fix test

* make fixup works

* fixup

* Arthur's updates

* nits

* update

* update

* fix processor

* update tests

* passe more fixups

* fix

* nits

* don't import torch

* skip fuyu config for now

* fixup done

* fixup

* update

* oups

* nits

* Use input embeddings

* no buffer

* update

* styling processing fuyu

* fix test

* update licence

* protect torch import

* fixup and update not doctested

* kwargs should be passed

* udpates

* update the impofixuprts in the test

* protect import

* protecting imports

* protect imports in type checking

* add testing decorators

* protect top level import structure

* fix typo

* fix check init

* move requires_backend to functions

* Imports

* Protect types

---------

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: ArthurZucker <arthur.zucker@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Lysandre <lysandre@huggingface.co>
2023-10-18 15:24:11 -07:00
louietouie
51042ae8e5 [docstring] Fix docstring for LukeConfig (#26858)
* Deleted LukeConfig and ran check_docstrings.py

* Filled docstring information

---------

Co-authored-by: louie <louisparizeau@Chicken.local>
2023-10-17 19:30:46 +02:00
przemL
5c6b83cb69 [docstring] Fix bert generation tokenizer (#26820)
* Remove BertGenerationTokenizer from objects to ignore

The file BertGenerationTokenizer is removed from
objects to ignore as a first step to fix docstring.

* Docstrings fix for BertGenerationTokenizer

Docstring fix is generated for BertGenerationTokenizer
by using check_docstrings.py.

* Fix docstring for BertGenerationTokenizer

Added sep_token type and docstring in BertGenerationTokenizer.
2023-10-16 18:26:55 +02:00
Yih-Dar
12cc123359 Better way to run AMD CI with different flavors (#26634)
* Enable testing against mi250

* Change BERT to trigger tests

* Revert BERT's change

* AMD CI

* AMD CI

---------

Co-authored-by: Morgan Funtowicz <funtowiczmo@gmail.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-10-16 16:24:30 +02:00
Bojun-Feng
5c081e2993 [docstring] Fix docstring for CodeLlamaTokenizerFast (#26666)
* remove from OBJECTS_TO_IGNORE

* run check_docstrings.py

* fill in information

* ignore CodeLlamaTokenizer
2023-10-16 10:11:45 +02:00
Sparty
0e52af4d7b [docstring] Fix docstring for CanineConfig (#26771)
* Remove CanineConfig from check_docstrings

* Run fix_and_overwrite for CanineConfig

* Replace <fill_type> and <fill_docstring> in configuration_canine.py with type and docstring values

---------

Co-authored-by: vignesh-raghunathan <vignesh_raghunathan@intuit.com>
2023-10-16 10:08:44 +02:00
NielsRogge
762af3e3c7 Add OWLv2, bis (#26668)
* First draft

* Update conversion script

* Update copied from statements

* Fix style

* Add copied from to config

* Add copied from to processor

* Run make fixup

* Add docstring

* Update docstrings

* Add method

* Improve docstrings

* Fix docstrings

* Improve docstrings

* Remove onnx

* Add flag

* Address comments

* Add copied from to model tests

* Add flag to conversion script

* Add code snippet

* Address more comments

* Address comment

* Improve conversion script

* More improvements

* Add expected objectness logits

* Skip test

* Improve conversion script

* Extend conversion script

* Convert large checkpoint

* Fix doc tests

* Convert all checkpoints, update integration tests

* Add checkpoint_path arg

* Fix repo_id
2023-10-13 16:41:24 +02:00
dekomori_sanae09
5bfda28dd3 [docstring] fix docstring DPRConfig (#26674)
* fix docstring dpr config

* fix style

* Update descp

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>

---------

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2023-10-13 12:13:43 +02:00
Bojun-Feng
d085662c59 [docstring] Fix docstring for RwkvConfig (#26782)
* update check_docstrings

* update docstring
2023-10-13 10:20:30 +02:00
Adwait
33df09e71a [docstring] Fix docstring for 'BertGenerationConfig' (#26661)
* [docstring] Remove 'BertGenerationConfig' from OBJECTS_TO_IGNORE

* [docstring] Fix docstring for 'BertGenerationConfig' (#26638)
2023-10-12 17:01:13 +02:00
Joseph McDonnell
b4199c2dad [docstring] Update GPT2 and Whisper (#26642)
* [DOCS] Update docstrings for  and  tokenizer

* [DOCS] add pad_token argument to whisper tokenizer docstring

* [FIX] Reword pad_token description

* [CHORE] Apply style formatting

---------

Co-authored-by: jmcdonnell <jmcdonnell@fieldbox.ai>
2023-10-12 17:00:59 +02:00
Gizem
eb734e5147 [docstring] Fix UniSpeech, UniSpeechSat, Wav2Vec2ForCTC (#26664)
* Remove UniSpeechConfig

* Remove , at the end otherwise check_docstring changes order

* Auto add new docstring

* Update docstring for UniSpeechConfig

* Remove from check_docstrings

* Remove UniSpeechSatConfig and UniSpeechSatForCTC from check_docstrings

* Remove , at the end

* Fix docstring

* Update docstring for Wav2Vec2ForCTC

* Update Wav2Vec2ForCTC docstring

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>

* fix style

---------

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2023-10-12 16:51:34 +02:00
Maria Khalusova
0ebee8b933 [docs] LLM prompting guide (#26274)
* llm prompting guide

* updated code examples

* an attempt to fix the code example tests

* set seed in examples

* added a doctest comment

* added einops to the doc_test_job

* string formatting

* string formatting, again

* added the toc to slow_documentation_tests.txt

* minor list fix

* string formatting + pipe renamed

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* replaced max_length with max_new_tokens and updated the outputs to match

* minor formatting fix

* removed einops from circleci config

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <hi@lysand.re>

* removed einops and trust_remote_code parameter

---------

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Lysandre Debut <hi@lysand.re>
2023-10-12 08:48:01 -04:00
Tom Aarsen
40ea9ab2a1 Add many missing spaces in adjacent strings (#26751)
Add missing spaces in adjacent strings
2023-10-12 10:28:40 +02:00
Yih-Dar
3bc65505fc Fix doctest for Blip2ForConditionalGeneration (#26737)
* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-10-12 10:01:07 +02:00
Bojun-Feng
797a1babf2 [docstring] Fix docstring for CodeLlamaTokenizer (#26709)
* update check_docstrings

* update docstring
2023-10-11 18:01:22 +02:00
Minho Ryang
aaccf1844e [docstring] Fix docstring for LlamaTokenizer and LlamaTokenizerFast (#26669)
* [docstring] Fix docstring for `LlamaTokenizer` and `LlamaTokenizerFast`

* [docstring] Fix docstring typo at `LlamaTokenizer` and `LlamaTokenizerFast`
2023-10-11 17:03:31 +02:00
Shivanand
cc44ca8017 [docstring] SwinModel docstring fix (#26679)
* remove from utils

* updated doc string

* only in the model

* Update src/transformers/models/swin/modeling_swin.py

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>

* Update src/transformers/models/swin/modeling_swin.py

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>

---------

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2023-10-11 15:53:32 +02:00
Yih-Dar
5334796d20 Copied from for test files (#26713)
* copied statement for test files

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-10-11 14:12:09 +02:00
Pavarissy
e8fdd7875d [docstring] Fix docstring for LlamaConfig (#26685)
* Your commit message here

* fix LlamaConfig docstring

* run make fixup

* fix formatting after review

reformat of the file to prevent script issues

* rerun make fixup after reformat
2023-10-10 17:05:48 +02:00
Isaac Chung
a5e6df82c0 [docstring] Fix docstrings for CLIP (#26691)
fix docstrings for vanilla clip
2023-10-09 17:39:05 +02:00
Alex Bzdel
3257946fb7 [docstring] Fix docstring for DonutImageProcessor (#26641)
* removed donutimageprocessor from objects_to_ignore

* added docstring for donutimageprocessor

* readding donut file

* moved docstring to correct location
2023-10-09 16:32:13 +02:00
Isaac Chung
d2f06dfffc [docstring] Fix docstring for CLIPImageProcessor (#26676)
fix docstring for CLIPImageProcessor
2023-10-09 14:22:44 +02:00
Isaac Chung
3763101f85 [docstring] Fix docstring CLIP configs (#26677)
* fix docstrings for CLIP configs

* black formatted
2023-10-09 12:34:01 +02:00
Yih-Dar
360ea8fc72 [docstring] Fix docstring for AlbertConfig (#26636)
example fix docstring

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-10-06 17:36:22 +02:00
Maria Khalusova
18fbeec824 [docs] Update to scripts building index.md (#26546)
* build the table in index.md with links to the model_doc

* removed list generation on index.md

* fixed missing models

* make style
2023-10-05 10:20:41 -04:00
Sylvain Gugger
03af4c42a6 Docstring check (#26052)
* Fix number of minimal calls to the Hub with peft integration

* Alternate design

* And this way?

* Revert

* Nits to fix

* Add util

* Print when changes are made

* Add list to ignore

* Add more rules

* Manual fixes

* deal with kwargs

* deal with enum defaults

* avoid many digits for floats

* Manual fixes

* Fix regex

* Fix regex

* Auto fix

* Style

* Apply script

* Add ignored list

* Add check that templates are filled

* Adding to CI checks

* Add back semi-fix

* Ignore more objects

* More auto-fixes

* Ignore missing objects

* Remove temp semi-fix

* Fixes

* Update src/transformers/models/pvt/configuration_pvt.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update utils/check_docstrings.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/utils/quantization_config.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Deal with float defaults

* Fix small defaults

* Address review comment

* Treat

* Post-rebase cleanup

* Address review comment

* Update src/transformers/models/deprecated/mctct/configuration_mctct.py

Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>

* Address review comment

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
2023-10-04 15:13:37 +02:00
Srijan Sahay Srivastava
245da7ed38 [Doctest] Add configuration_encoder_decoder.py (#26519)
* [Doctest] Add configuration_encoder_decoder.py

Added configuration_encoder_decoder.py to utils/documentation_tests.txt for doctest

* Revert "[Doctest] Add configuration_encoder_decoder.py"

This reverts commit bd653535a4356dc3c9f43e65883819079a2053b0.

* [Doctest] Add configuration_encoder_decoder.py

add configuration_encoder_decoder.py to utils/documentation_tests.txt

* [Doctest] Add configuration_encoder_decoder.py

add configuration_encoder_decoder.py to utils/documentation_tests.txt

* [Doctest] Add configuration_encoder_decoder.py

add configuration_encoder_decoder.py to utils/documentation_tests.txt

* changed as per request

* fixed line 46
2023-10-03 11:21:24 +02:00
Adithya Hegde Kota
4b4c6aabfb [Doctest] Add configuration_roformer.py (#26530)
* [Doctest] Add configuration_roformer.py

* [Doctest] Add configuration_roformer.py

* [Doctest] Add configuration_roformer.py

* [Doctest] Add configuration_roformer.py

* Removed documentation_test.txt

* Removed configuration_roformer.py

* Update not_doctested.txt
2023-10-02 17:19:13 +02:00
Lysandre Debut
78dd120282 Fix failing doctest (#26450)
* Fix doctest

* Adding modeling also for now
2023-09-27 18:47:26 +02:00
Chris Bamford
72958fcd3c [Mistral] Mistral-7B-v0.1 support (#26447)
* [Mistral] Mistral-7B-v0.1 support

* fixing names

* slightly longer test

* fixups

* not_doctested

* wrongly formatted references

* make fixuped

---------

Co-authored-by: Timothee Lacroix <t@eugen.ai>
Co-authored-by: timlacroix <t@mistral.ai>
2023-09-27 18:30:46 +02:00
NielsRogge
ace74d16bd Add Nougat (#25942)
* Add conversion script

* Add NougatImageProcessor

* Add crop margin

* More improvements

* Add docs, READMEs

* Remove print statements

* Include model_max_length

* Add NougatTokenizerFast

* Fix imports

* Improve postprocessing

* Improve image processor

* Fix image processor

* Improve normalize method

* More improvements

* More improvements

* Add processor, improve docs

* Simplify fast tokenizer

* Remove test file

* Fix docstrings

* Use NougatProcessor in conversion script

* Add is_levensthein_available

* Add tokenizer tests

* More improvements

* Use numpy instead of opencv

* Add is_cv2_available

* Fix cv2_available

* Add is_nltk_available

* Add image processor tests, improve crop_margin

* Add integration tests

* Improve integration test

* Use do_rescale instead of hacks, thanks Amy

* Remove random_padding

* Address comments

* Address more comments

* Add import

* Address more comments

* Address more comments

* Address comment

* Address comment

* Set max_model_input_sizes

* Add tests

* Add requires_backends

* Add Nougat to exotic tests

* Use to_pil_image

* Address comment regarding nltk

* Add NLTK

* Improve variable names, integration test

* Add test

* refactor, document, and test regexes

* remove named capture groups, add comments

* format

* add non-markdown fixed tokenization

* format

* correct flakyness of args parse

* add regex comments

* test functionalities for crop_image, align long axis and expected output

* add regex tests

* remove cv2 dependency

* test crop_margin equality between cv2 and python

* refactor table regexes to markdown

add newline

* change print to log, improve doc

* fix high count tables correction

* address PR comments: naming, linting, asserts

* Address comments

* Add copied from

* Update conversion script

* Update conversion script to convert both small and base versions

* Add inference example

* Add more info

* Fix style

* Add require annotators to test

* Define all keyword arguments explicitly

* Move cv2 annotator

* Add tokenizer init method

* Transfer checkpoints

* Add reference to Donut

* Address comments

* Skip test

* Remove cv2 method

* Add copied from statements

* Use cached_property

* Fix docstring

* Add file to not doctested

---------

Co-authored-by: Pablo Montalvo <pablo.montalvo.leroux@gmail.com>
2023-09-26 07:06:04 +02:00
Maria Khalusova
546e7679e7 [docs] removed MaskFormerSwin and TimmBackbone from the table on index.md (#26347)
removed MaskFormerSwin and TimmBackbone from the table
2023-09-25 09:41:59 -04:00
LeviVasconcelos
576cd45a57 Add image to image pipeline (#25393)
* Add image to image pipeline

Add image to image pipeline

* remove swin2sr from tf auto

* make ImageToImage importable

* make style

make style

make style

make style

* remove tf support

* remove nonused imports

* fix postprocessing

* add important comments; add unit tests

* add documentation

* remove support for TF

* make fixup

* fix typehint Image.Image

* fix documentation code

* address review request; fix unittest type checking

* address review request; fix unittest type checking

* make fixup

* address reviews

* Update src/transformers/pipelines/image_to_image.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* enhance docs

* make style

* make style

* improve docetest time

* improve docetest time

* Update tests/pipelines/test_pipelines_image_to_image.py

Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>

* Update tests/pipelines/test_pipelines_image_to_image.py

Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>

* make fixup

* undo faulty merge

* undo faulty merge

* add image-to-image to test pipeline mixin

* Update src/transformers/pipelines/image_to_image.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update tests/pipelines/test_pipelines_image_to_image.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* improve docs

---------

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2023-09-22 19:53:55 +03:00
Yih-Dar
c3ecf2d95d Fix doctest CI (#26324)
fix doc CI

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-09-22 08:58:30 +02:00
NielsRogge
7d6354e047 Add ViTMatte (#25843)
* First draft

* Simplify image processor

* Fix rebase

* Address comments

* Address more comments

* Address more comments

* Address more comments

* Address more comments

* Improve pad_image

* Add tests

* Update integration test

* Fix image processor tests

* Fix model tests

* Convert checkpoints

* Fix doc tests

* Remove file

* Apply suggestions

* Address comments

* Fix typing hint

* Add batch_norm_eps

* Address comments

* Fix style
2023-09-19 10:56:10 -03:00
Sanchit Gandhi
835b0a0533 [Check] Fix config docstring (#26222) 2023-09-18 19:58:01 +02:00
Julien Chaumond
bc7ce1808f moved ctrl to Salesforce/ctrl (#26183)
* moved `ctrl` to `Salesforce/ctrl`

redirects should theoretically work, but still updating those repo references for clarity

* Fixup

* Slow doc tests

* Add modeling file

---------

Co-authored-by: Lysandre <lysandre@huggingface.co>
2023-09-18 13:52:43 +02:00
Yih-Dar
f02b915ba2 Remove utils/documentation_tests.txt (#26213)
* update

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-09-18 13:33:01 +02:00
Yih-Dar
d020a2b81b No doctest for convert_bros_to_pytorch.py (#26212)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-09-18 13:31:59 +02:00
NielsRogge
5469c18762 [BLIP-2] Improve conversion script (#24854)
* Improve conversion script

* Add int8 code example

* Update tip

* Fix code

* Fix code snippet

* Add nucleus sampling

* More improvements

* Address comments

* Address comments
2023-09-14 19:42:20 +01:00
Jinho Park
17fdd35481 Add BROS (#23190)
* add Bros boilerplate

* copy and pasted modeling_bros.py from official Bros repo

* update copyright of bros files

* copy tokenization_bros.py from official repo and update import path

* copy tokenization_bros_fast.py from official repo and update import path

* copy configuration_bros.py from official repo and update import path

* remove trailing period in copyright line

* copy and paste bros/__init__.py from official repo

* save formatting

* remove unused unnecessary pe_type argument - using only crel type

* resolve import issue

* remove unused model classes

* remove unnecessary tests

* remove unused classes

* fix original code's bug - layer_module's argument order

* clean up modeling auto

* add bbox to prepare_config_and_inputs

* set temporary value to hidden_size (32 is too low because of the of the
Bros' positional embedding)

* remove decoder test, update create_and_check* input arguemnts

* add missing variable to model tests

* do make fixup

* update bros.mdx

* add boilerate plate for no_head inference test

* update BROS_PRETRAINED_MODEL_ARCHIVE_LIST (add naver-clova-ocr prefix)

* add prepare_bros_batch_inputs function

* update modeling_common to add bbox inputs in Bros Model Test

* remove unnecessary model inference

* add test case

* add model_doc

* add test case for token_classification

* apply fixup

* update modeling code

* update BrosForTokenClassification loss calculation logic

* revert logits preprocessing logic to make sure logits have original shape

* - update class name

* - add BrosSpadeOutput
- update BrosConfig arguments

* add boilerate plate for no_head inference test

* add prepare_bros_batch_inputs function

* add test case

* add test case for token_classification

* update modeling code

* update BrosForTokenClassification loss calculation logic

* revert logits preprocessing logic to make sure logits have original shape

* apply masking on the fly

* add BrosSpadeForTokenLinking

* update class name
put docstring to the beginning of the file

* separate the logits calculation logic and loss calculation logic

* update logic for loss calculation so that logits shape doesn't change
when return

* update typo

* update prepare_config_and_inputs

* update dummy node initialization

* update last_hidden_states getting logic to consider when return_dict is False

* update box first token mask param

* bugfix: remove random attention mask generation

* update keys to ignore on load missing

* run make style and quality

* apply make style and quality of other codes

* update box_first_token_mask to bool type

* update index.md

* apply make style and quality

* apply make fix-copies

* pass check_repo

* update bros model doc

* docstring bugfix fix

* add checkpoint for doc, tokenizer for doc

* Update README.md

* Update docs/source/en/model_doc/bros.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update bros.md

* Update src/transformers/__init__.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update docs/source/en/model_doc/bros.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* apply suggestions from code review

* apply suggestions from code review

* revert test_processor_markuplm.py

* Update test_processor_markuplm.py

* apply suggestions from code review

* apply suggestions from code review

* apply suggestions from code review

* update BrosSpadeELForTokenClassification head name to entity linker

* add doc string for config params

* update class, var names to more explicit and apply suggestions from code review

* remove unnecessary keys to ignore

* update relation extractor to be initialized with config

* add bros processor

* apply make style and quality

* update bros.md

* remove bros tokenizer, add bros processor that wraps bert tokenizer

* revert change

* apply make fix-copies

* update processor code, update itc -> initial token, stc -> subsequent token

* add type hint

* remove unnecessary condition branches in embedding forward

* fix auto tokenizer fail

* update docstring for each classes

* update bbox input dimension as standard 2 points and convert them to 4
points in forward pass

* update bros docs

* apply suggestions from code review : update Bros -> BROS in bros.md

* 1. box prefix var -> bbox
2. update variable names to be more explicit

* replace einsum with torch matmul

* apply style and quality

* remove unused argument

* remove unused arguments

* update docstrings

* apply suggestions from code review: add BrosBboxEmbeddings, replace
einsum with classical matrix operations

* revert einsum update

* update bros processor

* apply suggestions from code review

* add conversion script for bros

* Apply suggestions from code review

* fix readme

* apply fix-copies

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-09-14 18:02:37 +01:00
Vaibhav Srivastav
32ec7345f2 Update spectrogram and waveform model mapping for TTS/A pipeline (#26114)
update names mapping for spectrogram and waveform models
2023-09-13 09:05:11 -04:00
Arthur
9cccb3a838 [Persimmon] Add support for persimmon (#26042)
* intiial commit

* updates

* nits

* update conversion script

* update conversion script

* use path to load

* add tips etc

* some modeling logic

* modeling update

* more nits

* nits

* normal layer norm

* update config and doc

* nits

* update doc remove unused

* update

* fix inits and stuff

* fixup

* revert wrong changes

* updates

* more nits

* add default config values to the configuration file

* fixup happy

* update

* 2 tests left

* update readmes

* more nits

* slow test and more documentation

* update readme

* fix licences

* styling

* use fast if possible when saving tokenizer

* remove todo

* remove tokenization tests

* small last nits

* Apply suggestions from code review

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

* nits to skip the timout doctest

* fix integration test

* fix test

* update eos token

* update to allow fast tokenization

* styling

* fix codeLlama as well for the update post processor

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* add more copied from statements

* update

* doc passes doctest

* remove `# final layer norm?`

* change docstring prompot

* update

* Update README.md

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* don't doctest the conversion script as it requires more packages

* don't init a model in the config

* oups

* fix doctest

---------

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2023-09-12 11:33:27 +02:00
Matt
300d6a4a62 Remove Falcon from undocumented list (#26008)
Remove falcon from undocumented list
2023-09-06 15:49:04 +01:00