a57274466f7f72efaa2662d1738cdaf28ae8071f
269 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
65e940208c |
Samhq model addition (#35147)
* added the configuartion for sam_hq * added the modeelling for sam_hq * added the sam hq mask decoder with hq features * added the code for the samhq * added the code for the samhq * added the code for the samhq * Delete src/transformers/models/sam_hq/modelling_sam_hq.py * added the code for the samhq * added the code for the samhq * added the chnages for the modeelling * added the code for sam hq for image processing * added code for the sam hq model * added the required changes * added the changes * added the key mappings for the sam hq * adding the working code of samhq * added the required files * adding the pt object * added the push to hub account * added the args for the sam maks decoder * added the args for the sam hq vision config * aded the some more documentation * removed the unecessary spaces * all required chnages * removed the image processor * added the required file * added the changes for the checkcopies * added the code for modular file * added the changes for the __init file * added the code for the interm embeds * added the code for sam hq * added the changes for modular file * added the test file * added the changes required * added the changes required * added the code for the * added the cl errors * added the changes * added the required changes * added the some code * added the code for the removing image processor * added the test dimensins * added the code for the removing extra used variables * added the code for modeluar file hf_mlp for a better name * removed abbrevaation in core functionality * removed abbrevaation in core functionality * .contiguous() method is often used to ensure that the tensor is stored in a contiguous block of memory * added the code which is after make fixup * added some test for the intermediate embeddings test * added the code for the torch support in sam hq * added the code for the updated modular file * added the changes for documentations as mentioned * removed the heading * add the changes for the code * first mentioned issue resolved * added the changes code to processor * added the easy loading to init file * added the changes to code * added the code to changes * added the code to work * added the code for sam hq * added the code for sam hq * added the code for the point pad value * added the small test for the image embeddings and intermediate embedding * added the code * added the code * added the code for the tests * added the code * added ythe code for the processor file * added the code * added the code * added the code * added the code * added the code * added the code for tests and some checks * added some code * added the code * added the code * added some code * added some code * added the changes for required * added the code * added the code * added the code * added the code * added the code * added the code * added the code * added the code * added the code * added the code * added some changes * added some changes * removed spaces and quality checks * added some code * added some code * added some code * added code quality checks * added the checks for quality checks * addded some code which fixes test_inference_mask_generation_no_point * added code for the test_inference_mask_generation_one_point_one_bb * added code for the test_inference_mask_generation_one_point_one_bb_zero * added code for the test_inference_mask_generation_one_box * added some code in modelling for testing * added some code which sort maks with high score * added some code * added some code * added some code for the move KEYS_TO_MODIFY_MAPPING * added some code for the unsqueeze removal * added some code for the unsqueeze removal * added some code * added some code * add some code * added some code * added some code * added some testign values changed * added changes to code in sam hq for readbility purpose * added pre commit checks * added the fix samvisionmodel for compatibilty * added the changes made on sam by cyyever * fixed the tests for samhq * added some the code * added some code related to init file issue during merge conflicts * remobved the merge conflicts * added changes mentioned by aruther and mobap * added changes mentioned by aruther and mobap * solving quality checks * added the changes for input clearly * added the changes * added changes in mask generation file rgearding model inputs and sam hq quargs in processor file * added changes in processor file * added the Setup -> setupclass conversion * added the code mentioned for processor * added changes for the code * added some code * added some code * added some code --------- Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com> |
||
|
|
a245011252 |
Add InternVL (2.5 MPO) (#35968)
* initial commit * add convert internvl * add first end-to-end working internvl * nit prompt and image proc * add working chat template * add conversion llama-based models * add tests * pass all tests * fix isort * fix modular after main merge * add video processing for internvl * add support for interlaced images and videos * Remove processing and config from modular, add more tests * add llama model tests * Modify processor for compatibility with refactored got ocr image processor * add comments in processor * Add docs and nits * change video processing to use custom sample_indices_fn * rebase and fix tests * add processor tests * Add changes Raushan review * Use the new attention interface for the vision model * nits * add support for custom video_load_backend * remove mention to InternVLTokenizer * refactor vision model to simplify logic * refactor processor for better readibility * fix copies * fix require av processor test * refactor internVL vision * Update processor and fix processing tests * fix docstring * update convert_weights for internvl3 * change image processor to fast by default * remove do_center_crop=True in convert_weights * force use_cache to True * push_to_hub before reloading * fix internVLVision for larger models * update convert weight for qk norm * fix convert_weights * fix eos_token_id in convert * update docs and integration tests * make modifs after review * fix wrong k_norm and reduce modular * change image_token_index to image_token_id * change checkpoint to OpenGVLab org * last nits * explicitely del self.num_key_value_groups * add extra special tokens |
||
|
|
a2ef3cf537 |
Add Janus model (#36053)
* Iterative generation using input embeds * Add Janus model * discard changes * Janus imports * Refactor config and processor * Added Vision tower of Janus * Import Janus Image processor * Vision tower fixes * Refactor code * Added VQ Model * Complete model integration * temp conversion script * processor refactor * Adding files to facilitate pulling * Fixes after debugging * Skip test for these models * Add Janus Model * discard changes * Janus imports * Refactor config and processor * Added Vision tower of Janus * Import Janus Image processor * Vision tower fixes * Refactor code * Added VQ Model * Complete model integration * temp conversion script * processor refactor * Adding files to facilitate pulling * Fixes after debugging * Refactor to Text config * ✨ Added generate function * Saving intermediate convert file. Still need to read configs from the hub and convert them to our format. * Adding version that reads from the JSON files. Still have to tweak some parameters manually. * relative imports * Initial tests * Refactor image processor * Seemingly working version of the conversion script, will need to test further. * Adding command message * Fixing conflicting JanusTextConfig class * Incorporating some of the discussed changes. * Small fix to create dir. * Removing system from JINJA template * Adding draft processor tests * style fixes * Minor fixes and enhancement * added generation config * Initial tests * Small modifications, tests are now passing. * Small changes I noticed while reading code. * more fixes * Added JanusModel class * Small merge adaptations * Small merge adaptations * Image processing tests passing * More tests and fixes * Convert script updated and refactored * Tests and cleanup * make style * Postprocessing for image generation * generate refactor * fixes * - Passing tests that write a part of the model to cpu (e.g. test_cpu_offload) - Passing tests of dispatching SDPA - Only gradient checkpointing tests are left. * Removing temporary code * Changes * Writing change to modular * Added JanusVisionModel. SDPA dispatch tests pass more robustly. Gradient checkpoint tests are next * Gradient checkpoint tests passing * Removing debug code * Major generate refactor 😮💨 * Temp changes for testing * Green quality CI * 2 out of 4 integration tests passing * breadcrumbs * Usage Examples * Regenerate modeling after merge * dirty code * JanusIntegrationTest are passing * breadcrumbs * happy CI * fixes * Changing template * nits * Text generation logits matching original codebase at 100% precision * Remove ./tmp from git tracking * Remove ./tmp from git tracking * Checkpointing changes after reviewing * Fixing code in docstrings * CHanging comments and small bug in convert file * Fixing bug in image_token_id for 7B version * Removing line that was added by both of us * Pushing changes after discussion. Only one left is to change the key mapping for convert file. * Updating module file * New convert file using dict. Tested that it is equivalent to the old one by: - comparing keys in a script - comparing checksums of the output files between version generated with the current convert script and those generated with the old script. This is a more reliable test. * revert changes * mistake * consistency change for CI * make style * doc fixes * more fixes * experimenting with masking out pad token * checkpoint * Batched generation with multi-images working for 1B models. Will test 7B next. * Device fix. * Writing changes to modular, previous ones were written to modeling just for quick testing. * Using passed processor attention mask (only in modeling for now) * Matching performance done in the non-standard way * Working version of batched generation. Will change how some args are passed to make it more similar to language case * More compliant version of the code * Removed duplicated `_prepare_4d_causal_attention_mask_with_cache_position` * Updating modular file, making masked filling with paddings more efficient * Slightly more efficient version * Modifying JanusVisionModel to be a wrapper * Fixing test to comply with new names * Modular overhaul * More refactoring * - Changing JanusVisionModel back - Changing forward pass - Adding boi token to the comparison * - Removing whole context model_ids - Using inherited implementation of prepare_inputs_for_generation * Moving the way boi token is passed to the model * Fixing sdpa test * Minor changes * testing changes * Minor fix * - Adding postprocessing test - checking values of generated image on integration test * changes * Removing pooled attention vision module, fixing convert script as a consequence * More changes * Fixes * Draft after merge * Bug fixes * More bug fix * Fixing docs * Nits * Refactor return dict * Moving image post processing test to main processor post process * Passing guidance_scale as kwarg * make style * 🔥 refactor * make style * Update and green CI * Nits and tests update * up * Added MID block * fix * Dead code * update testcase * update * model_id change * init_weight changes --------- Co-authored-by: hsilva664 <metallic-silver@hotmail.com> |
||
|
|
a91020aed0 |
Add TimesFM Time Series Forecasting Model (#34082)
* initial documentation * rename mask to attention_mask * smaller tests * fixup * fix copies * move to time series section * sort docs * isort fix * batch_size is not a configuration * rename to TimesFMModelForPrediction * initial script * add check_outputs * remove dropout_rate * works with torch.Tensor inputs * rename script * fix docstrings * fix freq when window_size is given * add loss * fix _quantile_loss * formatting * fix isort * add weight init * add support for sdpa and flash_attention_2 * fixes for flash_attention * formatting * remove flash_attention * fix tests * fix file name * fix quantile loss * added initial TimesFMModelIntegrationTests * fix formatting * fix import order * fix _quantile_loss * add doc for SDPA * use timesfm 2.0 * bug fix in timesfm decode function. * compare mean forecasts * refactor type hints, use CamelCase * consolidate decode func * more readable code for weight conversion * fix-copies * simpler init * renaem TimesFmMLP * use T5LayerNorm * fix tests * use initializer_range * TimesFmModel instead of TimesFmDecoder * TimesFmPositionalEmbedding takes config for its init * 2.0-500m-pytorch default configs * use TimesFmModel * fix formatting * ignore TimesFmModel for testing * fix docstring * override generate as its not needed * add doc strings * fix logging * add docstrings to output data classes * initial copy from t5 * added config and attention layers * add TimesFMPositionalEmbedding * calcuate scale_factor once * add more configs and TimesFMResidualBlock * fix input_dims * standardize code format with black * remove unneeded modules * TimesFM Model * order of imports * copy from Google official implementation * remove covariate forecasting * Adapting TimesFM to HF format * restructing in progress * adapted to HF convention * timesfm test * the model runs * fixing unit tests * fixing unit tests in progress * add post_init * do not change TimesFMOutput * fixing unit tests * all unit tests passed * remove timesfm_layers * add intermediate_size and initialize with config * initial documentation * rename mask to attention_mask * smaller tests * fixup * fix copies * move to time series section * sort docs * isort fix * batch_size is not a configuration * rename to TimesFMModelForPrediction * initial script * add check_outputs * remove dropout_rate * works with torch.Tensor inputs * rename script * fix docstrings * fix freq when window_size is given * add loss * fix _quantile_loss * formatting * fix isort * add weight init * add support for sdpa and flash_attention_2 * fixes for flash_attention * formatting * remove flash_attention * fix tests * fix file name * fix quantile loss * added initial TimesFMModelIntegrationTests * fix formatting * fix import order * fix _quantile_loss * add doc for SDPA * use timesfm 2.0 * bug fix in timesfm decode function. * compare mean forecasts * refactor type hints, use CamelCase * consolidate decode func * more readable code for weight conversion * fix-copies * simpler init * renaem TimesFmMLP * use T5LayerNorm * fix tests * use initializer_range * TimesFmModel instead of TimesFmDecoder * TimesFmPositionalEmbedding takes config for its init * 2.0-500m-pytorch default configs * use TimesFmModel * fix formatting * ignore TimesFmModel for testing * fix docstring * override generate as its not needed * add doc strings * fix logging * add docstrings to output data classes * add _CHECKPOINT_FOR_DOC * fix comments * Revert "fix comments" This reverts commit 8deeb3e191b3671bc1d74dbfe77b736a066c3d34. * add _prepare_4d_attention_mask * we do not have generative model classes * use Cache * return past_key_values * modules initialized with config only * update year * Update docs/source/en/model_doc/timesfm.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * add layer_idx to cache * modular timesfm * fix test * unwrap sequential class * fix toctree * remove TimesFmOnnxConfig * fix modular * remove TimesFmStackedDecoder * split qkv layer into individual layers * rename projection layers * use ALL_ATTENTION_FUNCTIONS * is_causal is True * rename config * does not support flash_attn_2 * formatting * fix typo in docsstring * rename inputs * add time series mapping * Update src/transformers/models/olmo2/modeling_olmo2.py * Update src/transformers/models/moonshine/modeling_moonshine.py * use updated arguments * fix class name * add MODEL_FOR_TIME_SERIES_PREDICTION_MAPPING * isort * consolidate _preprocess into forward * fix a typo * fix a typo * fix toc * fix modular * remove aaserts * use self.config._attn_implementation * move to _postprocess_output * remove timesfm_get_large_negative_number * use view unstead of multiple unsqueeze * make helpers static methods of the Model * use to_tuple * use to_tuple if not return_dict * remove unused intitialization block as its incorporated in nn.Linear * remove unused num_key_value_groups * use the same convention as the masking method * update modular * do not use unsqueeze * use view instead of unsqueeze * use buffer for inv_timescales * formatting * modular conversion * remove unneeded intialization * add missing docstrings * remove cache * use simple_eager_attention_forward * support tp_plan * support for flex and flash attention masks * Revert "support for flex and flash attention masks" This reverts commit def36c4fcf31599b3f4937c9334b7da1a20132c3. * fix device * fix tests on gpu * remove unsued large model test * removed unneeded comments * add example usage * fix style * add import * Update docs/source/en/model_doc/timesfm.md Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com> * inherit from LlamaRMSNorm * use can_return_tuple decorator * remvoe return_dict * fix year * Update docs/source/en/model_doc/timesfm.md Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com> * pretrained does not inherit from GenerationMixin * use model for integration test --------- Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com> Co-authored-by: Rajat Sen <rsen91@gmail.com> Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com> Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co> |
||
|
|
4b8c6d4cf8 |
Add Qwen2.5-Omni (#36752)
* Add qwen2.5-omni * Remove einops dependency * Add torchdiffeq dependency * Sort init * Add torchdiffeq to extras['diffeq'] * Fix repo consistency * use cached_file * del odeint * renew pytest * format * Remove torchdiffeq * format * fixed batch infer bug * Change positional_embedding to parameter * Change default speaker * Config revision * Use modular & code clean * code clean * decouple padding with model & code cleaning * sort init * fix * fix * Second code review * fix * fix * rename vars to full name + some comments * update pytest * Code clean & fix * fix * style * more clean up * fixup * smaller vision model in tests * fix processor test * deflake a bit the tests (still flaky though) * de-flake tests finally + add generation mixin * final nits i hope * make sure processor tests are complete * replace with Qwen2_5OmniForConditionalGeneration * fix tests after updating ckpt * fix typos when cleaning, also we can't change ckpt * fixup * images and videos kwargs for processor * thinker and talker loadable from hub ckpt * address comments and update tests after rebase * fixup * skip for now * fixup * fixup * remove torch dependency in processors --------- Co-authored-by: lvyuanjun.lyj <lvyuanjun.lyj@alibaba-inc.con> Co-authored-by: feizi.wx <feizi.wx@alibaba-inc.com> Co-authored-by: raushan <raushan@huggingface.co> |
||
|
|
54a123f068 |
Simplify soft dependencies and update the dummy-creation process (#36827)
* Reverse dependency map shouldn't be created when test_all is set * [test_all] Remove dummies * Modular fixes * Update utils/check_repo.py Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com> * [test_all] Better docs * [test_all] Update src/transformers/commands/chat.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * [test_all] Remove deprecated AdaptiveEmbeddings from the tests * [test_all] Doc builder * [test_all] is_dummy * [test_all] Import utils * [test_all] Doc building should not require all deps --------- Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com> Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> |
||
|
|
2515a5a290 |
Expose blip2qformer (#37254)
* Expose blip2qformer * Add missing args to blip2 config |
||
|
|
25b7f27234 |
Add llama4 (#37307)
* remove one of the last deps * update fast image processor after refactor * styling * more quality of life improvements * nit * update * cleanups * some cleanups * vllm updates * update fake image token * [convert] Fix typo * [convert] Strip extraneous bytes from shards * [convert] Minor fixes * [convert] Use num_experts * multi-image fixes in modeling + processor * fixup size * 128 experts * Use default rope * Unfuse mlp * simplify a lot inputs embeds merging * remove .item() 👀 * fix from review * Address feedback * Use None "default" for rope_scaling. Add eot. * set seed * return aspect ratios and bug fixes * Moe 128 rebased (#8) * 128 experts * Use default rope * Unfuse mlp * Address feedback * Use None "default" for rope_scaling. Add eot. * Meta/llama quant compat (#7) * add quant compatible model & conversion code for llama4 * fix a few issues * fix a few issues * minor type mapping fix --------- Co-authored-by: Lu Fang <fanglu@fb.com> * use a new config parameter to determine which model definition to use for MoE --------- Co-authored-by: Pedro Cuenca <pedro@huggingface.co> Co-authored-by: Lu Fang <fanglu@fb.com> * un-comment write_tokenizer from converting script * remove un-used imports * [llama4] Pop aspect_ratios from image processor output in Llama4Processor Signed-off-by: Jon Swenson <jmswen@gmail.com> * Fix parameter_count name * Update src/transformers/models/llama4/configuration_llama4.py * nit * Add changes for no_rope, moe_layers, chunked attention. Just need to test all * Update src/transformers/models/llama4/image_processing_llama4_fast.py * nit * fix post merge with main * support flex attention * fixes * fix * add layer * small updates * rebase and delete llm_compressor * nit * [llama4/mm] Add back <|image|> token that delimits global tile * [llama4/mm] Fix Llama 4 image processing unit tests * add explicit dtype Signed-off-by: Jon Swenson <jmswen@gmail.com> * sdpa works * comment todo small * fix model loading Signed-off-by: Zijing Liu <liuzijing2014@gmail.com> * revert * nits * small fix for TP on 1 node * Read new params from config * Add <|eom|> * lol don't know how this got here * adding fp8 * Save processor, fix chat template * style * Add boi/eoi tokens We don't use them. * fixes for now flex seems to work :) * updates * nits * updates * missking keys * add context parallel * update * update * fix * nits * add worldsize and make eager attn work for vision * Ignore new key present in base models * add tp_plan * fix nope Signed-off-by: Zijing Liu <liuzijing2014@gmail.com> * minor fix Signed-off-by: Zijing Liu <liuzijing2014@gmail.com> * Clean up Llama4 vision model * current updates * add support for `attn_temperature_tuning` * add floor scale * add missing attn scales * push what works, dirty trick for the device synch * oups * Fix pad_token_id See https://huggingface.co/ll-re/Llama-4-Scout-17B-16E/discussions/2/files Confirmed in the original codebase. * fix causallml loading * rm * fix tied-weights * fix sdpa * push current version * should work with both short and long * add compressed_tensos & fix fbgemm tp * Fix flex impl * style * chunking * try to revert the potentially breaking change * fix auto factory * fix shapes in general * rm processing * commit cache utils cleanup * Fix context length * fix * allocate * update tp_plan * fix SDPA! * Add support for sparse `Llama4TextMoe` layer from the kernel hub * cleanup * better merge * update * still broken fixing now * nits * revert print * Write max_position_embeddings and max_model_length * Update modeling_llama4.py * Save attention_chunk_size * Sync eos terminators * Read initializer_range * style * remove `dict` * fix * eager should use `chunked_attention_mask` * revert * fixup * fix config * Revert "Merge pull request #36 from huggingface/sparse-llama4-moe" This reverts commit ccda19f050867dd42ea143c5de60f3dec81375f0, reversing changes made to a515579aed8c0fe9bf529b6c40446a289406d5d6. * Fix typo and remove warning with compiled flex and chunked prefill * Fix MoE vs FF (#41) * fix * Use correct no_rope_layers if provided one is empty list * update tests * fix * skipping some tests * fix fp8 loading Signed-off-by: Zijing Liu <liuzijing2014@gmail.com> * fix text geneartion pipeline Signed-off-by: Zijing Liu <liuzijing2014@gmail.com> * eager needs 4D mask * fix * Some cleanup * fix * update * fix * replace correctly module * patch * modulelist * update * update * clean up * Don't move to `cuda:0` in distributed mode * restrict to compressed tensors for now * rm print * Docs! * Fixes * Update docs/source/en/model_doc/llama4.md Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Fixes * cuda graph fix * revert some stuff * fixup * styling * Update src/transformers/models/llama4/modeling_llama4.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * fixup * commit licence, cleanup here and there and style * more styling changes * fix dummies * fix and clean docstrings * remove comment * remove warning * Only fast image processor is supported * nit * trigger CI * fix issue with flex encoder * fix dynamic cache * Code quality * Code quality * fix more tests for now * Code quality * Code quality * Nuke bunch of failing stuff * Code quality * Code quality * cleanup removal of slow image processor * ruff fix fast image processor * fix * fix styling * Docs * Repo consistency * Repo consistency * fix sliding window issue * separate llama cache * styling * Repo consistency * Repo consistency * push waht works * L4 Repo consistency * Docs * fix last last alst alst alst alstsaltlsltlaslt --------- Signed-off-by: Jon Swenson <jmswen@gmail.com> Signed-off-by: Zijing Liu <liuzijing2014@gmail.com> Co-authored-by: yonigozlan <yoni.gozlan10@gmail.com> Co-authored-by: Pedro Cuenca <pedro@huggingface.co> Co-authored-by: Pablo Montalvo <pablo.montalvo.leroux@gmail.com> Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com> Co-authored-by: Keyun Tong <tongkeyun@gmail.com> Co-authored-by: Zijing Liu <liuzijing2014@users.noreply.github.com> Co-authored-by: Lu Fang <fanglu@fb.com> Co-authored-by: Zijing Liu <liuzijing2014@gmail.com> Co-authored-by: Jon Swenson <jmswen@gmail.com> Co-authored-by: jmswen <jmswen@users.noreply.github.com> Co-authored-by: MekkCyber <mekk.cyber@gmail.com> Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com> Co-authored-by: Mohit Sharma <mohit21sharma.ms@gmail.com> Co-authored-by: Yong Hoon Shin <yhshin@meta.com> Co-authored-by: Marc Sun <marc@huggingface.co> Co-authored-by: drisspg <drisspguessous@gmail.com> Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com> Co-authored-by: Daniël de Kok <me@danieldk.eu> Co-authored-by: Lysandre <hi@lysand.re> Co-authored-by: Ye (Charlotte) Qi <ye.charlotte.qi@gmail.com> Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> |
||
|
|
ad3d157188 |
[RoPE] abstract dynamic RoPE update under a decorator ✨ (#37249)
* dynamic rope decorator * longrope; shorter fwd pass * propper docstring * make fixup |
||
|
|
4303d88c09 |
Add Phi4 multimodal (#36939)
* raw start * update * update * add to imports * update * up * simplify configs * clean configs * style * typos * Update convert_phi4_multimodal_weights_to_hf.py * Update convert_phi4_multimodal_weights_to_hf.py * fix * up * up * up * Update convert_phi4_multimodal_weights_to_hf.py * Update convert_phi4_multimodal_weights_to_hf.py * up * up * up * Update feature_extraction_phi4_multimodal.py * up * up * up * up * up * simplify configs * typo * cut code * typo * typo * typo * re * typo * up * up * up * add tests * fix * fix * Update test_modeling_phi4_multimodal.py * up * Update test_modeling_phi4_multimodal.py * doc * fix * up * up * up * up * up * up * simplify * up * simplify * config docstrings * cleanup * clean * typo * typo * fix * Update phi4_multimodal.md * fix * fix * Update test_modeling_phi4_multimodal.py * update * simplify reshapes and permutes * up * simplify special tokens * simplify processor a lot * Update processing_phi4_multimodal.py * Update processing_phi4_multimodal.py * switch to fast processor * image processor * Update image_processing_phi4_multimodal_fast.py * add lora extraction to converter * Update convert_phi4_multimodal_weights_to_hf.py * Update __init__.py * add AudioInput type in audio_utils * rewrite feature_extraction: support torch batched FFT * input_audio_embeds -> audio_input_features, input_image_embeds -> image_pixel_values * test update * not mono channel warning update * remove auto maps from processor * kargs dispatch in processor * simplify kwargs dispatch * simplify merging * remove default sampling rate * style * Update test_modeling_phi4_multimodal.py * update doc * doc * torch only feature extractor * make fake tokens adjustable * Update feature_extraction_phi4_multimodal.py * fix * Update processing_phi4_multimodal.py * simplify mask * last touch * fix copies * style * Update audio_utils.py * style * Update feature_extraction_phi4_multimodal.py * Update __init__.py * docstrings * copies * fix all checks * back to fix-copies * trigger CIs * Update feature_extraction_phi4_multimodal.py * improve tests with multimodal inputs * trigger CIs --------- Co-authored-by: Eustache Le Bihan <eulebihan@gmail.com> |
||
|
|
487dab1b2b |
Shieldgemma2 (#36678)
* single commit * correct config * fixup * dummy pt * Use ShieldGemma2Config in conversion script * Update src/transformers/models/shieldgemma2/configuration_shieldgemma2.py * Adding shieldgemma2 to models.__init__.py * Adding ShieldGemma2 to main __init__.py * Update shieldgemma2.md * Update shieldgemma2.md * Adding tests. Addressing review feedback. * Minor docs update * Fixing code quality feedback from CI * Fixing empty messages bug reported by ghunkins --------- Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com> Co-authored-by: Ren Pang <ain-soph@live.com> |
||
|
|
a957b7911a |
Add SigLIP 2 (#36323)
* Docs * Inits * Auto classes * Add siglip base * Add base tests * Fix Siglip V1 for fix res version * Add image processor * Update conversion * Experimenting with vectorized embeddings * Fixup * Add modular Siglip2Processor * Add modular configuration * Rename num patches * Correct image and text features merging * Working conversion script * Refactoring conversion script * Remove unused code in conversion script * Shorten dict a bit * Refactoring conversion * Done conversion refactoring * Fixup * Modular siglip2 * Make model exportable and compilable without graph breaks * Remove position_ids from image_processor * REmove position ids from modeling file * Update modular * Type hint * Fixup * Set defaults to processor * Add integration test * Revert spatial shapes back to tensor * Change order * Fix most of the tests * Fix docstring * Remove interpolate_pos_encoding arg (not needed) * Update docs * Standardize processing * Fix attention_mask in vision head * Siglip v1: remove double transpose in FA2 * Update modular file * Update FA2 test * Update expected logits * Fix interpolation for siglip2 image processor * Skip init test * Skip dispatch on flash test * Fix modeling tests * Fixup * Add dummy objects * Fix some docstrings * Add siglip2 in index.md * Fix consistency * Add docs * Remove size and data format * Add image processor tests * Fix * Add fast image processor * Fix style * Fix * Docs * Set lowercase for tokenizer * Adjust head size for Siglip v1 * Update siglip2 for consistency with siglip1 * Update siglip2 conversion * Update pipeline * Update checkpoints in tests * Update checkpoint name * Fix pooling for image classification model * Fix FA2 test * Update processor * Fix check repo * Update docs * Fix typos * Fix docstring for fast image processor * Add siglip2 to FA2 docs * Fix fast ip tests * Fix constitency * Fix tokenizer class for siglip v1 * Fix missing header * Refactor scaling for clip, siglip, siglip2 * Remove unused imports * Make fast IP default for siglip2 * Update docs * Update checkpoints * Update modular * Update paper link * Fixup * Fix name in toctree * Fix test |
||
|
|
4397dfcb71 |
SmolVLM2 (#36126)
* smolvlm init * updates * fixing bugs * minimal run, no checks * minimal run, no checks * passing first check + adding url support * updating video dataloading logic * fixing image logic * trying modular, but fails * modular is working, changing processor to match PR comments and general transformers logic * fixing kwargs * offloading video loading logic to image_util * fixing circleci code formatting errors * fixing circleci code formatting errors * fixing circleci code formatting errors * fixing circleci code formatting errors * fixing circleci code formatting errors * fixing circleci code formatting errors * fixing circleci code formatting errors * fixing circleci code formatting errors * fixing circleci code formatting errors * fixing circleci code formatting errors * fixing circleci code formatting errors * fixing circleci code formatting errors * fixing circleci code formatting errors * fixing circleci code formatting errors * update * add idefics3-based tests * add keyword to all * add PreTrainedModel * updateing video loading logic * working inference * updates for PR comments * updates for PR comments * moving SmolVLMPretrainedModel higher to fix import error * CI test pass * CI test pass * removing lambda * CI test pass * CI test pass * CI test pass * CI test pass * CI test pass * CI test pass * processor tests * add example in docs * typo * fix copies * skip compile tests - sdpa for VisionTransformer * fix init * raise import error for num2words * update doc for FA2 * more doc fix * CI * updates for PR comments * Update docs/source/en/model_doc/smolvlm.md Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update docs/source/en/model_doc/smolvlm.md Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update docs/source/en/model_doc/smolvlm.md Co-authored-by: Joshua Lochner <admin@xenova.com> * Update docs/source/en/model_doc/smolvlm.md Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update docs/source/en/model_doc/smolvlm.md Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * fixing processor -- tokenizer not defined properly, (gpt2 tokenizer), and does not have the attributes of fake image token, etc * adding smolvlm to VQA models * removing vqa auto class * Update src/transformers/models/smolvlm/processing_smolvlm.py Co-authored-by: Joshua Lochner <admin@xenova.com> * removing smolvlmvisiontransformer from index.md * my bad, video processing had typos * fixing docs * renaming params in SmolVLMModel.inputs_merger * removing un-needed dtype/device in model forward * ruff for CI * update docs * Update docs/source/en/model_doc/smolvlm.md Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * return cache position * return cache position * return cache also in modular * needed to run modular again * fix training tests * push vectorized inputs merger * format * format * reduce number of mappings * addressing PR comments * happy CI, happy me :) * skip non-nested images * adjust integration test for smaller GPUs * format * fix kwargs in chat template apply * skip this for now --------- Co-authored-by: raushan <raushan@huggingface.co> Co-authored-by: Pablo <pablo.montalvo.leroux@gmail.com> Co-authored-by: Pedro Cuenca <pedro@huggingface.co> Co-authored-by: Joshua Lochner <admin@xenova.com> |
||
|
|
f3f6c86582 |
add qwen2.5vl (#35569)
* add qwen2.5vl * fix * pass check table * add modular file * fix style * Update src/transformers/models/qwen2_5_vl/modeling_qwen2_5_vl.py Co-authored-by: Minho Shim <6764739+minostauros@users.noreply.github.com> * Update src/transformers/models/qwen2_5_vl/modeling_qwen2_5_vl.py Co-authored-by: Minho Shim <6764739+minostauros@users.noreply.github.com> * Update src/transformers/models/qwen2_5_vl/modeling_qwen2_5_vl.py Co-authored-by: Minho Shim <6764739+minostauros@users.noreply.github.com> * padd copy check * use modular * fix * fix * fix * update flashatt2&sdpa support_list * Update docs/source/en/_toctree.yml Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/qwen2_5_vl.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/qwen2_5_vl.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/qwen2_5_vl.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/qwen2_5_vl.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update src/transformers/models/qwen2_5_vl/modular_qwen2_5_vl.py Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * update config * update * fix hf path * rename Qwen2_5_VLVideosKwargs * fix * fix * update * excuted modular * rollback init * fix * formated * simpler init * fix * fix * fix * fix * fix * update docs * fix * fix * update Qwen2VLRotaryEmbedding for yarn * fix --------- Co-authored-by: Minho Shim <6764739+minostauros@users.noreply.github.com> Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> Co-authored-by: gewenbin0992 <gewenbin292@163.com> Co-authored-by: gewenbin0992 <67409248+gewenbin0992@users.noreply.github.com> |
||
|
|
90b46e983f |
Remove old benchmark code (#35730)
* remove traces of the old deprecated benchmarks * also remove old tf benchmark example, which uses deleted code * run doc builder |
||
|
|
94ae9a8da1 |
OwlViT/Owlv2 post processing standardization (#34929)
* Refactor owlvit post_process_object_detection + add text_labels * Fix copies in grounding dino * Sync with Owlv2 postprocessing * Add post_process_grounded_object_detection method to processor, deprecate post_process_object_detection * Add test cases * Move text_labels to processors only * [run-slow] owlvit owlv2 * [run-slow] owlvit, owlv2 * Update snippets * Update docs structure * Update deprecated objects for check_repo * Update docstring for post processing of image guided object detection |
||
|
|
80dbbd103c |
🧹 remove generate-related objects and methods scheduled for removal in v4.48 (#35677)
* remove things scheduled for removal * make fixup |
||
|
|
52e1f87c7d |
[WIP] Emu3: add model (#33770)
* model can convert to HF and be loaded back * nit * works in single batch generation but hallucinates * use the image tokens * add image generation * now it works * add tests * update * add modulare but it doesn't work for porting docstring :( * skip some tests * add slow tests * modular removed the import? * guess this works * update * update * fix copies * fix test * fix copies * update * docs * fix tests * last fix tests? * pls * repo consistency * more style * style * remove file * address comments * tiny bits * update after the new modular * fix tests * add one more cond in check attributes * decompose down/up/mid blocks * allow static cache generation in VLMs * nit * fix copies * Update docs/source/en/model_doc/emu3.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/emu3.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/emu3.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/emu3.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/emu3.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/emu3.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/emu3.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/emu3.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * fix VAE upsampling * Update src/transformers/models/emu3/modular_emu3.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * address comments * state overwritten stuff explicitly * fix copies * add the flag for flex attn --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> |
||
|
|
8490d3159c |
Add ViTPose (#30530)
* First draft * Make fixup * Make forward pass worké * Improve code * More improvements * More improvements * Make predictions match * More improvements * Improve image processor * Fix model tests * Add classic decoder * Convert classic decoder * Verify image processor * Fix classic decoder logits * Clean up * Add post_process_pose_estimation * Improve post_process_pose_estimation * Use AutoBackbone * Add support for MoE models * Fix tests, improve num_experts% * Improve variable names * Make fixup * More improvements * Improve post_process_pose_estimation * Compute centers and scales * Improve postprocessing * More improvements * Fix ViTPoseBackbone tests * Add docstrings, fix image processor tests * Update index * Use is_cv2_available * Add model to toctree * Add cv2 to doc tests * Remove script * Improve conversion script * Add coco_to_pascal_voc * Add box_to_center_and_scale to image_transforms * Update tests * Add integration test * Fix merge * Address comments * Replace numpy by pytorch, improve docstrings * Remove get_input_embeddings * Address comments * Move coco_to_pascal_voc * Address comment * Fix style * Address comments * Fix test * Address comment * Remove udp * Remove comment * [WIP] need to check if the numpy function is same as cv * add scipy affine_transform * Update src/transformers/models/vitpose/image_processing_vitpose.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * refactor convert * add output_shape * add atol 5e-2 * Use hf_hub_download in conversion script * make box_to_center more applicable * skipt test_get_set_embedding * fix to accept array and fix CI * add co-contributor * make it to tensor type output * add torch * change to torch tensor * add more test * minor change * CI test change * import torch should be above ImageProcessor * make style * try not use torch in def * Update src/transformers/models/vitpose/image_processing_vitpose.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Update src/transformers/models/vitpose_backbone/configuration_vitpose_backbone.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Update src/transformers/models/vitpose_backbone/modeling_vitpose_backbone.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Update src/transformers/models/vitpose/modeling_vitpose.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * fix * fix * add caution * make more detail about dataset_index * Update src/transformers/models/vitpose/modeling_vitpose.py Co-authored-by: Sangbum Daniel Choi <34004152+SangbumChoi@users.noreply.github.com> * Update src/transformers/models/vitpose/image_processing_vitpose.py Co-authored-by: Sangbum Daniel Choi <34004152+SangbumChoi@users.noreply.github.com> * add docs * Update docs/source/en/model_doc/vitpose.md * Update src/transformers/models/vitpose/configuration_vitpose.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Update src/transformers/__init__.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Revert "Update src/transformers/__init__.py" This reverts commit 7ffa504450bb9dbccf9c7ea668441b98a1939d5c. * change name * Update src/transformers/models/vitpose/image_processing_vitpose.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/vitpose/test_modeling_vitpose.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update docs/source/en/model_doc/vitpose.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/vitpose/modeling_vitpose.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/vitpose_backbone/modeling_vitpose_backbone.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/vitpose/image_processing_vitpose.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * move vitpose only function to image_processor * raise valueerror when using timm backbone * use out_indices * Update src/transformers/models/vitpose/image_processing_vitpose.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * remove camel-case of def flip_back * rename vitposeEstimatorOutput * Update src/transformers/models/vitpose_backbone/modeling_vitpose_backbone.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * fix confused camelcase of MLP * remove in-place logic * clear scale description * make consistent batch format * docs update * formatting docstring * add batch tests * test docs change * Update src/transformers/models/vitpose/image_processing_vitpose.py * Update src/transformers/models/vitpose/configuration_vitpose.py * chagne ViT to Vit * change to enable MoE * make fix-copies * Update docs/source/en/model_doc/vitpose.md Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * extract udp * add more described docs * simple fix * change to accept target_size * make style * Update src/transformers/models/vitpose/image_processing_vitpose.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/vitpose/configuration_vitpose.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * change to `verify_backbone_config_arguments` * Update docs/source/en/model_doc/vitpose.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * remove unnecessary copy * make config immutable * enable gradient checkpointing * update inappropriate docstring * linting docs * split function for visibility * make style * check isinstances * change to acceptable use_pretrained_backbone * make style * remove copy in docs * Update src/transformers/models/vitpose_backbone/modeling_vitpose_backbone.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Update docs/source/en/model_doc/vitpose.md Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Update src/transformers/models/vitpose/modeling_vitpose.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * simple fix + make style * change input config of activation function to string * Update docs/source/en/model_doc/vitpose.md Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * tmp docs * delete index.md * make fix-copies * simple fix * change conversion to sam2/mllama style * Update src/transformers/models/vitpose/image_processing_vitpose.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Update src/transformers/models/vitpose/image_processing_vitpose.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * refactor convert * add supervision * Update src/transformers/models/vitpose_backbone/modeling_vitpose_backbone.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * remove reduntant def * seperate code block for visualization * add validation for num_moe * final commit * add labels * [run-slow] vitpose, vitpose_backbone * Update src/transformers/models/vitpose/convert_vitpose_to_hf.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * enable all conversion * final commit * [run-slow] vitpose, vitpose_backbone * ruff check --fix * [run-slow] vitpose, vitpose_backbone * rename split module * [run-slow] vitpose, vitpose_backbone * fix pos_embed * Simplify init * Revert "fix pos_embed" This reverts commit 2c56a4806e30bc9b5753b142fa04b913306c54ff. * refactor single loop * allow flag to enable custom model * efficiency of MoE to not use unused experts * make style * Fix range -> arange to avoid warning * Revert MOE router, a new one does not work * Fix postprocessing a bit (labels) * Fix type hint * Fix docs snippets * Fix links to checkpoints * Fix checkpoints in tests * Fix test * Add image to docs --------- Co-authored-by: Niels Rogge <nielsrogge@nielss-mbp.home> Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local> Co-authored-by: sangbumchoi <danielsejong55@gmail.com> Co-authored-by: Sangbum Daniel Choi <34004152+SangbumChoi@users.noreply.github.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> |
||
|
|
7176e06b52 |
Add TextNet (#34979)
* WIP * Add config and modeling for Fast model * Refactor modeling and add tests * More changes * WIP * Add tests * Add conversion script * Add conversion scripts, integration tests, image processor * Fix style and copies * Add fast model to init * Add fast model in docs and other places * Fix import of cv2 * Rename image processing method * Fix build * Fix Build * fix style and fix copies * Fix build * Fix build * Fix Build * Clean up docstrings * Fix Build * Fix Build * Fix Build * Fix build * Add test for image_processing_fast and add documentation tests * some refactorings * Fix failing tests * Incorporate PR feedbacks * Incorporate PR feedbacks * Incorporate PR feedbacks * Incorporate PR feedbacks * Incorporate PR feedbacks * Introduce TextNet * Fix failures * Refactor textnet model * Fix failures * Add cv2 to setup * Fix failures * Fix failures * Add CV2 dependency * Fix bugs * Fix build issue * Fix failures * Remove textnet from modeling fast * Fix build and other things * Fix build * some cleanups * some cleanups * Some more cleanups * Fix build * Incorporate PR feedbacks * More cleanup * More cleanup * More cleanup * Fix build * Remove all the references of fast model * More cleanup * Fix build * Incorporate PR feedbacks * Incorporate PR feedbacks * Incorporate PR feedbacks * Incorporate PR feedbacks * Incorporate PR feedbacks * Incorporate PR feedbacks * Incorporate PR feedbacks * Incorporate PR feedbacks * Incorporate PR feedbacks * Incorporate PR feedbacks * Fix Build * Fix build * Fix build * Fix build * Fix build * Fix build * Incorporate PR feedbacks * Fix style * Fix build * Incorporate PR feedbacks * Fix image processing mean and std * Incorporate PR feedbacks * fix build failure * Add assertion to image processor * Incorporate PR feedbacks * Incorporate PR feedbacks * fix style failures * fix build * Fix Imageclassification's linear layer, also introduce TextNetImageProcessor * Fix build * Fix build * Fix build * Fix build * Incorporate PR feedbacks * Incorporate PR feedbacks * Fix build * Incorporate PR feedbacks * Remove some script * Incorporate PR feedbacks * Incorporate PR feedbacks * Incorporate PR feedbacks * Incorporate PR feedbacks * Fix image processing in textnet * Incorporate PR Feedbacks * Fix CI failures * Fix failing test * Fix failing test * Fix failing test * Fix failing test * Fix failing test * Fix failing test * Add textnet to readme * Improve readability * Incorporate PR feedbacks * fix code style * fix key error and convert working * tvlt shouldn't be here * fix test modeling test * Fix tests, make fixup * Make fixup * Make fixup * Remove TEXTNET_PRETRAINED_MODEL_ARCHIVE_LIST * improve type annotation Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Update tests/models/textnet/test_image_processing_textnet.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * improve type annotation Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * space typo Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * improve type annotation Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Update src/transformers/models/textnet/configuration_textnet.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * make conv layer kernel sizes and strides default to None * Update src/transformers/models/textnet/modeling_textnet.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Update src/transformers/models/textnet/modeling_textnet.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * fix keyword bug * add batch init and make fixup * Make fixup * Update integration test * Add figure * Update textnet.md * add testing and fix errors (classification, imgprocess) * fix error check * make fixup * make fixup * revert to original docstring * add make style * remove conflict for now * Update modeling_auto.py got a confusion in `timm_wrapper` - was giving some conflicts * Update tests/models/textnet/test_modeling_textnet.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Update src/transformers/models/textnet/modeling_textnet.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Update tests/models/textnet/test_modeling_textnet.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Update src/transformers/models/textnet/modeling_textnet.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * add changes * Update textnet.md * add doc * add authors hf ckpt + rename * add feedback: classifier/docs --------- Co-authored-by: raghavanone <opensourcemaniacfreak@gmail.com> Co-authored-by: jadechoghari <jadechoghari@users.noreply.huggingface.co> Co-authored-by: Niels <niels.rogge1@gmail.com> Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> |
||
|
|
b2f2977533 |
Applies the rest of the init refactor except to modular files (#35238)
* [test_all] Applies the rest of the init refactor except to modular files * Revert modular that doesn't work * [test_all] TFGPT2Tokenizer |
||
|
|
6e0515e99c |
Add DINOv2 with registers (#35348)
* added changes from 32905 * fixed mistakes caused by select all paste * rename diff_dinov2... * ran tests * Fix modular * Fix tests * Use new init * Simplify drop path * Convert all checkpoints * Add figure and summary * Update paths * Update docs * Update docs * Update toctree * Update docs --------- Co-authored-by: BernardZach <bernardzach00@gmail.com> Co-authored-by: Zach Bernard <132859071+BernardZach@users.noreply.github.com> |
||
|
|
9ad4c93536 |
Add Aria (#34157)
* Add Aria --------- Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> |
||
|
|
21d5025826 |
Attn implementation for composite models (#32238)
* first try * codestyle * idefics2 is happy * [run-slow] llava, llava_next, video_llava, vipllava, llava_next_video, idefics, idefics2, kosmos2, fuyu, blip, blip_2, instructblip, instructblipvideo, paligemma * fix-copies * [run-slow] llava, llava_next, video_llava, vipllava, llava_next_video, idefics, idefics2, kosmos2, fuyu, blip, blip_2, instructblip, instructblipvideo * blip-2 needs to init vision from config * when was this removed O_o * minor fix * tests * this way? * tests * model-agnostic code * codestyle * add tests for idefics * modify general test for VLMs * no generation test for vlm yet! * no generation test here also * wanr in VIT-SDPA if output attn * add more tests * user can pass dict as attn impl * repo consistency * update * muicgen * no prints * forgot speech enc-dec and clip * how many composite models we have? * musicgen meelody is same as mudicgen * +siglip * fix tests + add some more * remove idefics custom overriden code * make idefics2 automappable * nits * skip tests * doctests * Update src/transformers/models/idefics2/configuration_idefics2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/clip/test_modeling_clip.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/idefics2/test_modeling_idefics2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/idefics2/test_modeling_idefics2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/configuration_utils.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * major update, no need for automap * clean up * add FA2 test * more tests * style * skip tests * why did these started failing now? * no attributes for FA2 needed * one tiny test * address comment about FA2 false warning * style * add new models and resolve conflicts * fix copies * let it be this way for now, come back tomorrow to review * some more fixes * update * more updates * update * fix copies * style and tests * another big update * fix tests * fix tests * update * another update * fix tests * fix copies * fix tests --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> |
||
|
|
9ba021ea75 |
Moshi integration (#33624)
* clean mimi commit * some nits suggestions from Arthur * make fixup * first moshi WIP * converting weights working + configuration + generation configuration * finalize converting script - still missing tokenizer and FE and processor * fix saving model w/o default config * working generation * use GenerationMixin instead of inheriting * add delay pattern mask * fix right order: moshi codes then user codes * unconditional inputs + generation config * get rid of MoshiGenerationConfig * blank user inputs * update convert script:fix conversion, add tokenizer, feature extractor and bf16 * add and correct Auto classes * update modeling code, configuration and tests * make fixup * fix some copies * WIP: add integration tests * add dummy objects * propose better readiblity and code organisation * update tokenization tests * update docstrigns, eval and modeling * add .md * make fixup * add MoshiForConditionalGeneration to ignore Auto * revert mimi changes * re * further fix * Update moshi.md * correct md formating * move prepare causal mask to class * fix copies * fix depth decoder causal * fix and correct some tests * make style and update .md * correct config checkpoitn * Update tests/models/moshi/test_tokenization_moshi.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update tests/models/moshi/test_tokenization_moshi.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * make style * Update src/transformers/models/moshi/__init__.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * fixup * change firm in copyrights * udpate config with nested dict * replace einsum * make style * change split to True * add back splt=False * remove tests in convert * Update tests/models/moshi/test_modeling_moshi.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * add default config repo + add model to FA2 docstrings * remove logits float * fix some tokenization tests and ignore some others * make style tokenization tests * update modeling with sliding window + update modeling tests * [run-slow] moshi * remove prepare for generation frol CausalLM * isort * remove copied from * ignore offload tests * update causal mask and prepare 4D mask aligned with recent changes * further test refine + add back prepare_inputs_for_generation for depth decoder * correct conditional use of prepare mask * update slow integration tests * fix multi-device forward * remove previous solution to device_map * save_load is flaky * fix generate multi-devices * fix device * move tensor to int --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by: Marc Sun <marc@huggingface.co> |
||
|
|
f052e94bcc |
Fix flax failures (#33912)
* Few fixes here and there * Remove typos * Remove typos |
||
|
|
e2001c3413 |
Add auto model for image-text-to-text (#32472)
* Add Auto model for image-text-to-text * Remove donut from processing auto, add chameleon ti image text to text models * add qwen2_vl and llava_onevision * add pixtral to auto model for image-text-to-text * add mllama and idefics3 * remove models in IGNORE_NON_AUTO_CONFIGURED * add AutoModelForImageTextToText to tests and doc |
||
|
|
f2c388e3f9 |
Add Idefics 3! (#32473)
* Add Idefics 3! * fixes to make both pipelines identical * fix for quantized models * First pass at the review * remove vocab size from the main config (it's still in the text_config) * hot fix for merve * Apply suggestions from code review Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * re-add model_type for text_config * remove support for old_cache * remove hidden_size from main config * rename idefics3 HF repo * few changes suggested in the PR * fix to input_data_format computation * remove overwrite of _autoset_attn_implementation following @zucchini-nlp suggestion * improve example * few improvements from amy's review * big change to enable processing input images as numpy arrays * Changes to the code to uniformize processor kwargs * image processing tests * image processing tests fixes and some bugs they discovered * addressed review comments from Yoni * fix modeling tests * remove special tokens that are not special * fixes tests * skip failing tests - they also fail for idefics2 * added paper and readded the tests with multi gpu, who knows * Update docs/source/en/model_doc/idefics3.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * review amy until image_processing_idefics3 * last comments from Amy * review amy * Update src/transformers/models/idefics3/image_processing_idefics3.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/idefics3/modeling_idefics3.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update docs/source/en/model_doc/idefics3.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * doc improvement - amy review * fix runtime error during fine-tuning * amy's review * Update src/transformers/models/idefics3/image_processing_idefics3.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/idefics3/image_processing_idefics3.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/idefics3/modeling_idefics3.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * ruff * amy's comment on the order * ruff ruff * fix copies * square images when they are not splitted * ruff :( * Update src/transformers/models/idefics3/image_processing_idefics3.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/idefics3/test_processing_idefics3.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * fix small bug introduced in refactor * amy's image processing changes * fixes peft tests and ruff * modify to_pil_image from transformers. and review from emanuele. * add modified to_pil_image --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> |
||
|
|
19d58d31f1 |
Add MLLama (#33703)
* current changes * nit * Add cross_attenttion_mask to processor * multi-image fixed * Add cross_attenttion_mask to processor * cross attn works in all cases * WIP refactoring function for image processor * WIP refactoring image processor functions * Refactor preprocess to use global loops instead of list nested list comps * Docstrings * Add channels unification * fix dtype issues * Update docsrings and format * Consistent max_image_tiles * current script * updates * Add convert to rgb * Add image processor tests * updates! * update * god damn it I am dumb sometimes * Precompute aspect ratios * now this works, full match * fix 😉 * nits * style * fix model and conversion * nit * nit * kinda works * hack for sdpa non-contiguous bias * nits here and there * latest c hanges * merge? * run forward * Add aspect_ratio_mask * vision attention mask * update script and config variable names * nit * nits * be able to load * style * nits * there * nits * make forward run * small update * enable generation multi-turn * nit * nit * Clean up a bit for errors and typos * A bit more constant fixes * 90B keys and shapes match * Fix for 11B model * Fixup, remove debug part * Docs * Make max_aspect_ratio_id to be minimal * Update image processing code to match new implementation * Adjust conversion for final checkpoint state * Change dim in repeat_interleave (accordig to meta code) * tmp fix for num_tiles * Fix for conversion (gate<->up, q/k_proj rope permute) * nits * codestyle * Vision encoder fixes * pass cross attn mask further * Refactor aspect ratio mask * Disable text-only generation * Fix cross attention layers order, remove q/k norm rotation for cross atention layers * Refactor gated position embeddings * fix bugs but needs test with new weights * rope scaling should be llama3 * Fix rope scaling name * Remove debug for linear layer * fix copies * Make mask prepare private func * Remove linear patch embed * Make precomputed embeddings as nn.Embedding module * MllamaPrecomputedAspectRatioEmbedding with config init * Remove unused self.output_dim * nit, intermediate layers * Rename ln and pos_embed * vision_chunk_size -> image_size * return_intermediate -> intermediate_layers_indices * vision_input_dim -> hidden_size * Fix copied from statements * fix most tests * Fix more copied from * layer_id->layer_idx * Comment * Fix tests for processor * Copied from for _prepare_4d_causal_attention_mask_with_cache_position * Style fix * Add MllamaForCausalLM * WIP fixing tests * Remove duplicated layers * Remove dummy file * Fix style * Fix consistency * Fix some TODOs * fix language_model instantiation, add docstring * Move docstring, remove todos for precomputed embeds (we cannot init them properly) * Add initial docstrings * Fix * fix some tests * lets skip these * nits, remove print, style * Add one more copied from * Improve test message * Make validate func private * Fix dummy objects * Refactor `data_format` a bit + add comment * typos/nits Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com> * fix dummy objects and imports * Add chat template config json * remove num_kv_heads from vision attention * fix * move some commits and add more tests * fix test * Remove `update_key_name` from modeling utils * remove num-kv-heads again * some prelimiary docs * Update chat template + tests * nit, conversion script max_num_tiles from params * Fix warning for text-only generation * Update conversion script for instruct models * Update chat template in converstion + test * add tests for CausalLM model * model_max_length, avoid null chat_template * Refactor conversion script * Fix forward * Fix integration tests * Refactor vision config + docs * Fix default * Refactor text config * Doc fixes * Remove unused args, fix docs example * Squashed commit of the following: commit b51ce5a2efffbecdefbf6fc92ee87372ec9d8830 Author: qubvel <qubvel@gmail.com> Date: Wed Sep 18 13:39:15 2024 +0000 Move model + add output hidden states and output attentions * Fix num_channels * Add mllama text and mllama vision models * Fixing repo consistency * Style fix * Fixing repo consistency * Fixing unused config params * Fix failed tests after refactoring * hidden_activation -> hidden_act for text mlp * Remove from_pretrained from sub-configs * Apply suggestions from code review Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/mllama/convert_mllama_weights_to_hf.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Reuse lambda in conversion script * Remove run.py * Update docs/source/en/model_doc/mllama.md Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/mllama/processing_mllama.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Remove unused LlamaTokenizerFast * Fix logging * Refactor gating * Remove cycle for collecting intermediate states * Refactor text-only check, add integration test for text-only * Revert from pretrained to configs * Fix example * Add auto `bos_token` adding in processor * Fix tips * Update src/transformers/models/auto/tokenization_auto.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Enable supports_gradient_checkpointing model flag * add eager/sdpa options * don't skip attn tests and bring back GC skips (did i really remove those?) * Fix signature, but get error with None gradient * Fix output attention tests * Disable GC back * Change no split modules * Fix dropout * Style * Add Mllama to sdpa list * Add post init for vision model * Refine config for MllamaForCausalLMModelTest and skipped tests for CausalLM model * if skipped, say it, don't pass * Clean vision tester config * Doc for args * Update tests/models/mllama/test_modeling_mllama.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Add cross_attention_mask to test * typehint * Remove todo * Enable gradient checkpointing * Docstring * Style * Fixing and skipping some tests for new cache * Mark flaky test * Skip `test_sdpa_can_compile_dynamic` test * Fixing some offload tests * Add direct GenerationMixin inheritance * Remove unused code * Add initializer_range to vision config * update the test to make sure we show if split * fix gc? * Fix repo consistency * Undo modeling utils debug changes * Fix link * mllama -> Mllama * [mllama] -> [Mllama] * Enable compile test for CausalLM model (text-only) * Fix TextModel prefix * Update doc * Docs for forward, type hints, and vision model prefix * make sure to reset * fix init * small script refactor and styling * nit * updates! * some nits * Interpolate embeddings for 560 size and update integration tests * nit * does not suppor static cache! * update * fix * nit2 * this? * Fix conversion * Style * 4x memory improvement with image cache AFAIK * Token decorator for tests * Skip failing tests * update processor errors * fix split issues * style * weird * style * fix failing tests * update * nit fixing the whisper tests * fix path * update --------- Co-authored-by: raushan <raushan@huggingface.co> Co-authored-by: pavel <ubuntu@ip-10-90-0-11.ec2.internal> Co-authored-by: qubvel <qubvel@gmail.com> Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com> Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> Co-authored-by: Pedro Cuenca <pedro@huggingface.co> |
||
|
|
d6534f996b | Repo checks: check documented methods exist (#32320) | ||
|
|
7591ca5bc5 |
🚨 Add Blip2ForImageTextRetrieval (#29261)
* add Blip2ForImageTextRetrieval * use one line and remove unnecessary space in tests Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * use value from the config, rather than hardcoded * change order of params in Blip2QFormerModel.forward * update docstring * fix style * update test_inference_opt * move embeddings out of Blip2QFormerModel * remove from_vision_qformer_configs * remove autocast float16 in Blip2QFormerModel * rename fiels into vision_projection,text_projection,use_image_text_matching_head * use CLIPOutput for Blip2ImageTextMatchingModelOutput * remove past_key_values_length from Blip2TextEmbeddings * fix small typo in the CLIPOutput docstring * add Blip2ForImageTextRetrieval to Zero Shot Image Classification mapping * update docstring and add require_torch_fp16 * rollback test_inference_opt * use use_image_text_matching_head=True in convert * skip test_model_get_set_embeddings * fix create_rename_keys error on new itm fields * revert to do scale after dot product between "query" and "key" * fix ValueError on convert script for blip2-opt-2.7b * update org of paths to Salesforce * add is_pipeline_test_to_skip for VisualQuestionAnsweringPipelineTests * [run_slow] blip_2 * removed Blip2ForImageTextRetrieval from IGNORE_NON_AUTO_CONFIGURED * fix docstring of Blip2ImageTextMatchingModelOutput * [run_slow] blip_2 * fix multi-gpu tests * [run_slow] blip_2 * [run_slow] blip_2 --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> |
||
|
|
19e6e80e10 |
support qwen2-vl (#32318)
* support-qwen2-vl * tidy * tidy * tidy * tidy * tidy * tidy * tidy * hyphen->underscore * make style * add-flash2-tipd * delete-tokenize=False * remove-image_processor-in-init-file * add-qwen2_vl-in-MODEL_FOR_VISION_2_SEQ_MAPPING_NAMES * format-doct * support-Qwen2VLVisionConfig * remove-standardize_cache_format * fix-letter-varaibles * remove-torch-in-image-processor * remove-useless-docstring * fix-one-letter-varaible-name * change-block-name * default-quick-gelu-in-vision * remove-useless-doc * use-preimplemented-flash-forward * fix-doc * fix-image-processing-doc * fix-apply-rotary-embed * fix-flash-attn-sliding-window * refactor * remove-default_template * remove-reorder_cache * simple-get-rope_deltas * update-prepare_inputs_for_generation * update-attention-mask * update-rotary_seq_len * remove-state * kv_seq_length * remove-warning * _supports_static_cache * remove-legacy-cache * refactor * fix-replace * mrope-section-doc * code-quality * code-quality * polish-doc * fix-image-processing-test * update readme * Update qwen2_vl.md * fix-test * Update qwen2_vl.md * nit * processor-kwargs * hard-code-norm_layer * code-quality * discard-pixel-values-in-gen * fix-inconsistent-error-msg * unify-image-video * hidden_act * add-docstring * vision-encode-as-PreTrainedModel * pixel-to-target-dtype * update doc and low memoryvit * format * format * channel-foramt * fix vit_flashatt * format * inherit-Qwen2VLPreTrainedModel * simplify * format-test * remove-one-line-func-in-image-processing * avoid-one-line-reshape * simplify-rotary_seq_len * avoid-single-letter-variable * no-for-loop-sdpa * avoid-single-letter-variable * remove-one-line-reshape * remove-one-line-reshape * remove-no-rope-in-vit-logic * default-mrope * add-copied-from * more-docs-for-mrope * polish-doc * comment-and-link * polish-doc * single-letter-variables * simplify-image-processing * video->images * kv_seq_len-update * vision-rope-on-the-fly * vision-eager-attention * change-processor-order --------- Co-authored-by: baishuai <baishuai.bs@alibaba-inc.com> Co-authored-by: ShuaiBai623 <43326198+ShuaiBai623@users.noreply.github.com> |
||
|
|
70d5df6107 |
Generate: unify LogitsWarper and LogitsProcessor (#32626)
|
||
|
|
16ed0640be |
Add Qwen2-Audio (#32137)
* add qwen2audio * Update check_repo.py * fix style * fix test * fix style * add model size * Qwen2AudioEncoderModel->Qwen2AudioEncoder; add copy info * Update src/transformers/models/qwen2_audio/modeling_qwen2_audio.py Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com> * Update src/transformers/models/qwen2_audio/modeling_qwen2_audio.py Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com> * Update src/transformers/models/qwen2_audio/modeling_qwen2_audio.py Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com> * switch the attention_mask and the feature_attention_mask * add to PRIVATE_MODELS in check_repo.py; add to MODEL_NAMES_TO_IGNORE in check_table.py * fix initialization * update chat_template * fix consistency issue after copy * add docstrings to _merge_input_ids_with_audio_features * add copied from to prepare_inputs_for_generation * add more details to docs * rm comment * add init_std * Update src/transformers/models/qwen2_audio/modeling_qwen2_audio.py Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com> * Update src/transformers/models/qwen2_audio/modeling_qwen2_audio.py Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com> * Update src/transformers/models/qwen2_audio/modeling_qwen2_audio.py Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com> * Update src/transformers/models/qwen2_audio/modeling_qwen2_audio.py Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com> * update * Update docs/source/en/model_doc/qwen2_audio.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * update tests * rm ignore_index * update processor * rm ffmpeg_read * Update tests/models/qwen2_audio/test_modeling_qwen2_audio.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update docs/source/en/model_doc/qwen2_audio.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update docs/source/en/model_doc/qwen2_audio.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update docs/source/en/model_doc/qwen2_audio.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * update * typo * [run_slow] qwen2_audio * [run_slow] qwen2_audio * [run_slow] qwen2_audio * fix quality * [run_slow] qwen2_audio * [run_slow] qwen2_audio * [run_slow] qwen2_audio * add official model --------- Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> |
||
|
|
24cfcc2114 |
Chameleon: add model (#31534)
* Chameleon model integration Co-authored-by: Jacob Kahn <jacobkahn1@gmail.com> Co-authored-by: Leonid Shamis <leonid.shamis@gmail.com> * fix 7B, again. mask away image tokens * Apply suggestions from code review Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * remove pretrained_config_map * make fixup passing up to utils/check_config_docstrings.py; vqgan moved to the modeling file * remove tokenizer (use llama's); remove codechameleon tests * a few copied from statements and minor changes * copied from in ChameleonModel * some copies in ChameleonForCausalLM * a few more copies * VQModel moved to ChameleonModel (as opposed to being in the processor) * ChameleonProcessor ready * Fix chameleon weights convert * update conversion script * clean-up processing * update modeling a bit * update * update (throws error...) * correct conversion ready * fix tests * fix docs * docs * ve swin norm * fix device for vocab map * add normalization * update * update script with rope rotations * final fix on model conversion * add slow tests * more info in docs * fix repo consistency tests * fix repo tests * fix-copies * hope this will make CI happy * fix for 30b model * Update docs/source/en/index.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update docs/source/en/model_doc/chameleon.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/chameleon/modeling_chameleon.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update docs/source/en/model_doc/chameleon.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update docs/source/en/model_doc/chameleon.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update docs/source/en/model_doc/chameleon.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update docs/source/en/model_doc/chameleon.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/auto/configuration_auto.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/chameleon/image_processing_chameleon.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/chameleon/image_processing_chameleon.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/chameleon/image_processing_chameleon.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/chameleon/image_processing_chameleon.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/chameleon/modeling_chameleon.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/chameleon/processing_chameleon.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/chameleon/processing_chameleon.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/chameleon/test_modeling_chameleon.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/chameleon/test_modeling_chameleon.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/chameleon/test_modeling_chameleon.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * address comments * remove assertion in conversion script * add image processor test * not copied * port changes for qk layernorm * fix-copies * read token decorator for tests * [run-slow] chameleon * one more read-token * address some comments * qk norm changes * tests and repo check * moved rope permutations to conversion, YAY! * fix past kv check * docs * layernorm done! * let's be consistent in naming * fix slow tests * weird thing with slow CI, but let's see * once more try * remove past-kv as tuple following llama * ignore * style --------- Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com> Co-authored-by: ArthurZucker <arthur.zucker@gmail.com> Co-authored-by: jacobkahn <jacobkahn1@gmail.com> Co-authored-by: Leonid Shamis <leonid.shamis@gmail.com> Co-authored-by: Leonid Shamis <lshamis@meta.com> Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by: Joao Gante <joao@huggingface.co> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> |
||
|
|
c1e139c2b0 |
Adding hiera (#30356)
* initialized Structure * Updated variable names * Added Config class, basic HF setup, convert_to_hf * Fixed Convert function, added hiera to HF files, Initilized test files * better naming for x in forward pass * Moved utils to hiera * Change hiera -> hiera_model * Fixed integration into tranformers * Fix: Convert Checkpoint * added documentation for hiera * added documentation for hiera * added Docstings to models, Transformers based changes * make style and quality * make style and quality * Integration & Block tests running * Fixed bugs * initialized Structure * Updated variable names * Added Config class, basic HF setup, convert_to_hf * Fixed Convert function, added hiera to HF files, Initilized test files * better naming for x in forward pass * Moved utils to hiera * Change hiera -> hiera_model * Fixed integration into tranformers * Fix: Convert Checkpoint * added documentation for hiera * added documentation for hiera * added Docstings to models, Transformers based changes * make style and quality * make style and quality * Integration & Block tests running * Fixed bugs * Removed tim dependency * added HieraBlock * fixed: Model name * added tests for HieraModel, HieraBlock * fixed imports * fixed quality & copies * Fixes * Update docs/source/en/model_doc/hiera.md Fix name Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/hiera.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/hiera.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update src/transformers/models/hiera/configuration_hiera.py Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update src/transformers/models/hiera/configuration_hiera.py Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update src/transformers/models/hiera/modeling_hiera.py Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update src/transformers/models/hiera/modeling_hiera.py Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Fixed formatting * Code quality & Import differences * quality and repo-consistency fix * fixed no torch error * Docstring fix * Docstring fix * doc string fix * fixed example usage * Resolved issues in modeling_hiera * Removed Hiera MAE * Added test and resolved bug * fixed doc string * First commit * Finished conversion script and model forward working * Resolved all issues * nits * Improving tests * Nits * More nits * Improving HieraForMaskedImageModeling * More improvements and nits * Fixed docstrings of outputs * More fixes * More imrpovments * Updated conversion script * Fixed docstrings * Improved tests * Fixed attentou outputs test * All tests green * Removed unnecessary file * contribution attribution * Resolved a few issues * Resolved Comments * Updated model repo id and fixed bugs * Removed loss print * Make tests green * Updated docstrings * Fix style * Fixed num_heads in config * Removed unnecessary video checkpoint related code in the conversion script * Fix style * Changed atol in conversion script * HieraConfig * Fix copies * Fixed typo * Resolved few issues * make * converted conv_nd -> nn.Module * Removed video complexities * Removed video complexities * fix style * Addressing comments * Update src/transformers/models/hiera/modeling_hiera.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/hiera/modeling_hiera.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/hiera/modeling_hiera.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Fix style * Fixed tests * Fixed typo * Fixed interpolate test * Made torch fx compatible * Made sure imageprocesor is correct * Addressed comments * Noise directly as torch * Remove unnecesary attr * Added return_dit * Update src/transformers/models/hiera/__init__.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Updated checkpoints * [run_slow] hiera * Fixed device mismatch * [run_slow] hiera * Fixed GPU tests * [run_slow] hiera --------- Co-authored-by: Ubuntu <ubuntu@ip-172-31-29-50.us-east-2.compute.internal> Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> Co-authored-by: Eduardo Pacheco <eduardo.pach@hotmail.com> Co-authored-by: Eduardo Pacheco <69953243+EduardoPach@users.noreply.github.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> |
||
|
|
fc689d75a0 |
Add video modality for InstrucBLIP (#30182)
* squash in single commit * add docs * dummy obj * more changes in diff converter * tiny fix * make docs happy * skip test * repo consistency tests * update docstring * style * fix tests * change diff imports * [run-slow] instructblipvideo * [run-slow] instructblipvideo * fix tests and remove logit check * [run-slow] instructblipvideo |
||
|
|
673440d073 |
update ruff version (#30932)
* update ruff version * fix research projects * Empty * Fix errors --------- Co-authored-by: Lysandre <lysandre@huggingface.co> |
||
|
|
0fe44059ae |
Add recurrent gemma (#30143)
* Fork. * RecurrentGemma initial commit. * Updating __init__.py. * Minor modification to how we initialize the cache. Changing how the config specifies the architecture. * Reformat code to 4 spaces. Fixed a few typos. * Fixed the forward pass. Still unclear on the cache? * Fixed the RecurrentGemmaForCausalLM * Minor comment that we might not need attention_mask and output_attention arguments. * Now cache should work as well. * Adding a temporary example to check whether the model generation works. * Adding the tests and updating imports. * Adding the example file missing in the previous commit. * First working example. * Removing .gitignore and reverting parts of __init__. * Re-add .gitignore. * Addressing comments for configuration. * Move mask creation to `_prepare_inputs_for_generation`. * First try at integration tests: 1. AttributeError: 'GriffinCausalLMOutput' object has no attribute 'attentions'. 2. `cache_position` not passed * Transfoering between machines. * Running normal tests. * Minor fix. * More fixes. * Addressing more comments. * Minor fixes. * first stab at cleanup * more refactoring * fix copies and else * renaming and get init to work * fix causal mask creation * update * nit * fix a hell lot of things * updates * update conversion script * make all keys importable * nits * add auto mappings * properly convert ffw_up and down * add scaling * fix generations * for recurrent dtype * update * fix going beyong window * fixup * add missing files * current updates to remove last einops * finish modeling refactor * TADA * fix compile * fix most failing testt ? ? * update tests * refactor and update * update * nits, fixup and update tests * more fixup * nits * fix imports * test format * fixups * nits * tuple typing * fix code quality * add model card * fix doc * skip most generation tests * nits * style * doc fixes * fix pr and check_copies? * last nit * oupsy * Apply suggestions from code review Co-authored-by: Lysandre Debut <hi@lysand.re> * update * Update src/transformers/models/recurrent_gemma/convert_recurrent_gemma_to_hf.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/recurrent_gemma/test_modeling_recurrent_gemma.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/recurrent_gemma/test_modeling_recurrent_gemma.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/recurrent_gemma/test_modeling_recurrent_gemma.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/recurrent_gemma/test_modeling_recurrent_gemma.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * update based on review * doc nit * fix quality * quality * fix slow test model path * update default dype * ignore attributes that can be safely ignored in check config attributes * 0lallalala come on * save nit * style * remove to dict update * make sure we can also run in float16 * style --------- Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com> Co-authored-by: Aleksandar Botev <botev@google.com> Co-authored-by: Leonard Berrada <lberrada@users.noreply.github.com> Co-authored-by: anushanf <anushanf@google.com> Co-authored-by: botev <botevmg@gmail.com> Co-authored-by: Lysandre Debut <hi@lysand.re> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> |
||
|
|
c43b380e70 |
Add MusicGen Melody (#28819)
* first modeling code * make repository * still WIP * update model * add tests * add latest change * clean docstrings and copied from * update docstrings md and readme * correct chroma function * correct copied from and remove unreleated test * add doc to toctree * correct imports * add convert script to notdoctested * Add suggestion from Sanchit Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * correct get_uncoditional_inputs docstrings * modify README according to SANCHIT feedback * add chroma to audio utils * clean librosa and torchaudio hard dependencies * fix FE * refactor audio decoder -> audio encoder for consistency with previous musicgen * refactor conditional -> encoder * modify sampling rate logics * modify license at the beginning * refactor all_self_attns->all_attentions * remove ignore copy from causallm generate * add copied from for from_sub_models * fix make copies * add warning if audio is truncated * add copied from where relevant * remove artefact * fix convert script * fix torchaudio and FE * modify chroma method according to feedback-> better naming * refactor input_values->input_features * refactor input_values->input_features and fix import fe * add input_features to docstrigs * correct inputs_embeds logics * remove dtype conversion * refactor _prepare_conditional_hidden_states_kwargs_for_generation ->_prepare_encoder_hidden_states_kwargs_for_generation * change warning for chroma length * Update src/transformers/models/musicgen_melody/convert_musicgen_melody_transformers.py Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * change way to save wav, using soundfile * correct docs and change to soundfile * fix import * fix init proj layers * remove line breaks from md * fix issue with docstrings * add FE suggestions * improve is in logics and remove useless imports * remove custom from_pretrained * simplify docstring code * add suggestions for modeling tests * make style * update converting script with sanity check * remove encoder attention mask from conditional generation * replace musicgen melody checkpoints with official orga * rename ylacombe->facebook in checkpoints * fix copies * remove unecessary warning * add shape in code docstrings * add files to slow doc tests * fix md bug and add md to not_tested * make fix-copies * fix hidden states test and batching --------- Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> |
||
|
|
1fc505b816 |
Add PvT-v2 Model (#26812)
* Added pytests for pvt-v2, all passed * Added pvt_v2 to docs/source/end/model_doc * Ran fix-copies and fixup. All checks passed * Added additional ReLU for linear attention mode * pvt_v2_b2_linear converted and working * copied models/pvt to adapt to pvt_v2 * First commit of pvt_v2 * PvT-v2 now works in AutoModel * Reverted batch eval changes for PR * Expanded type support for Pvt-v2 config * Fixed config docstring. Added channels property * Fixed model names in tests * Fixed config backbone compat. Added additional type support for image size in config * Fixed config backbone compat * Allowed for batching of eval metrics * copied models/pvt to adapt to pvt_v2 * First commit of pvt_v2 * Set key and value layers to use separate linear modules. Fixed pruning function * Set AvgPool to 7 * Fixed issue in init * PvT-v2 now works in AutoModel * Successful conversion of pretrained weights for PVT-v2 * Successful conversion of pretrained weights for PVT-v2 models * Added pytests for pvt-v2, all passed * Ran fix-copies and fixup. All checks passed * Added additional ReLU for linear attention mode * pvt_v2_b2_linear converted and working * Allowed for batching of eval metrics * copied models/pvt to adapt to pvt_v2 * First commit of pvt_v2 * Set key and value layers to use separate linear modules. Fixed pruning function * Set AvgPool to 7 * Fixed issue in init * PvT-v2 now works in AutoModel * Successful conversion of pretrained weights for PVT-v2 * Successful conversion of pretrained weights for PVT-v2 models * Added pytests for pvt-v2, all passed * Ran fix-copies and fixup. All checks passed * Added additional ReLU for linear attention mode * pvt_v2_b2_linear converted and working * Reverted batch eval changes for PR * Updated index.md * Expanded type support for Pvt-v2 config * Fixed config docstring. Added channels property * Fixed model names in tests * Fixed config backbone compat * Ran fix-copies * Fixed PvtV2Backbone tests * Added TFRegNet to OBJECTS_TO_IGNORE in check_docstrings.py * Fixed backbone stuff and fixed tests: all passing * Ran make fixup * Made modifications for code checks * Remove ONNX config from configuration_pvt_v2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Use explicit image size dict in test_modeling_pvt_v2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Make image_size optional in test_modeling_pvt_v2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Remove _ntuple use in modeling_pvt_v2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Remove reference to fp16_enabled * Model modules now take config as first argument even when not used * Replaced abbreviations for "SR" and "AP" with explicit "spatialreduction" and "averagepooling" * All LayerNorm now instantiates with config.layer_norm_eps * Added docstring for depth-wise conv layer * PvtV2Config now only takes Union[int, Tuple[int, int]] for image size * Refactored PVTv2 in prep for gradient checkpointing * Gradient checkpointing ready to test * Removed override of _set_gradient_checkpointing * Cleaned out old code * Applied code fixup * Applied code fixup * Began debug of pvt_v2 tests * Leave handling of num_labels to base pretrained config class * Deactivated gradient checkpointing tests until it is fixed * Removed PvtV2ImageProcessor which duped PvtImageProcessor * Allowed for batching of eval metrics * copied models/pvt to adapt to pvt_v2 * First commit of pvt_v2 * Set key and value layers to use separate linear modules. Fixed pruning function * Set AvgPool to 7 * Fixed issue in init * PvT-v2 now works in AutoModel * Successful conversion of pretrained weights for PVT-v2 * Successful conversion of pretrained weights for PVT-v2 models * Added pytests for pvt-v2, all passed * Added pvt_v2 to docs/source/end/model_doc * Ran fix-copies and fixup. All checks passed * Added additional ReLU for linear attention mode * pvt_v2_b2_linear converted and working * copied models/pvt to adapt to pvt_v2 * First commit of pvt_v2 * PvT-v2 now works in AutoModel * Reverted batch eval changes for PR * Expanded type support for Pvt-v2 config * Fixed config docstring. Added channels property * Fixed model names in tests * Fixed config backbone compat. Added additional type support for image size in config * Fixed config backbone compat * Allowed for batching of eval metrics * copied models/pvt to adapt to pvt_v2 * First commit of pvt_v2 * Set key and value layers to use separate linear modules. Fixed pruning function * Set AvgPool to 7 * Fixed issue in init * PvT-v2 now works in AutoModel * Successful conversion of pretrained weights for PVT-v2 * Successful conversion of pretrained weights for PVT-v2 models * Added pytests for pvt-v2, all passed * Ran fix-copies and fixup. All checks passed * Added additional ReLU for linear attention mode * pvt_v2_b2_linear converted and working * Allowed for batching of eval metrics * copied models/pvt to adapt to pvt_v2 * First commit of pvt_v2 * Set key and value layers to use separate linear modules. Fixed pruning function * Set AvgPool to 7 * Fixed issue in init * PvT-v2 now works in AutoModel * Successful conversion of pretrained weights for PVT-v2 * Successful conversion of pretrained weights for PVT-v2 models * Added pytests for pvt-v2, all passed * Ran fix-copies and fixup. All checks passed * Added additional ReLU for linear attention mode * pvt_v2_b2_linear converted and working * Reverted batch eval changes for PR * Expanded type support for Pvt-v2 config * Fixed config docstring. Added channels property * Fixed model names in tests * Fixed config backbone compat * Ran fix-copies * Fixed PvtV2Backbone tests * Added TFRegNet to OBJECTS_TO_IGNORE in check_docstrings.py * Fixed backbone stuff and fixed tests: all passing * Ran make fixup * Made modifications for code checks * Remove ONNX config from configuration_pvt_v2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Use explicit image size dict in test_modeling_pvt_v2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Make image_size optional in test_modeling_pvt_v2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Remove _ntuple use in modeling_pvt_v2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Remove reference to fp16_enabled * Model modules now take config as first argument even when not used * Replaced abbreviations for "SR" and "AP" with explicit "spatialreduction" and "averagepooling" * All LayerNorm now instantiates with config.layer_norm_eps * Added docstring for depth-wise conv layer * PvtV2Config now only takes Union[int, Tuple[int, int]] for image size * Refactored PVTv2 in prep for gradient checkpointing * Gradient checkpointing ready to test * Removed override of _set_gradient_checkpointing * Cleaned out old code * Applied code fixup * Applied code fixup * Allowed for batching of eval metrics * copied models/pvt to adapt to pvt_v2 * First commit of pvt_v2 * PvT-v2 now works in AutoModel * Ran fix-copies and fixup. All checks passed * copied models/pvt to adapt to pvt_v2 * First commit of pvt_v2 * PvT-v2 now works in AutoModel * Reverted batch eval changes for PR * Fixed config docstring. Added channels property * Fixed config backbone compat * Allowed for batching of eval metrics * copied models/pvt to adapt to pvt_v2 * First commit of pvt_v2 * PvT-v2 now works in AutoModel * Ran fix-copies and fixup. All checks passed * Allowed for batching of eval metrics * copied models/pvt to adapt to pvt_v2 * First commit of pvt_v2 * PvT-v2 now works in AutoModel * Fixed config backbone compat * Ran fix-copies * Began debug of pvt_v2 tests * Leave handling of num_labels to base pretrained config class * Deactivated gradient checkpointing tests until it is fixed * Removed PvtV2ImageProcessor which duped PvtImageProcessor * Fixed issue from rebase * Fixed issue from rebase * Set tests for gradient checkpointing to skip those using reentrant since it isn't supported * Fixed issue from rebase * Fixed issue from rebase * Changed model name in docs * Removed duplicate PvtV2Backbone * Work around type switching issue in tests * Fix model name in config comments * Update docs/source/en/model_doc/pvt_v2.md Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Changed name of variable from 'attn_reduce' to 'sr_type' * Changed name of variable from 'attn_reduce' to 'sr_type' * Changed from using 'sr_type' to 'linear_attention' for clarity * Update src/transformers/models/pvt_v2/modeling_pvt_v2.py Removed old code * Changed from using 'sr_type' to 'linear_attention' for clarity * Fixed Class names to be more descriptive * Update src/transformers/models/pvt_v2/modeling_pvt_v2.py Removed outdated code * Moved paper abstract to single line in pvt_v2.md * Added usage tips to pvt_v2.md * Simplified module inits by passing layer_idx * Fixed typing for hidden_act in PvtV2Config * Removed unusued import * Add pvt_v2 to docs/source/en/_toctree.yml * Updated documentation in docs/source/en/model_doc/pvt_v2.md to be more comprehensive. * Updated documentation in docs/source/en/model_doc/pvt_v2.md to be more comprehensive. * Update src/transformers/models/pvt_v2/modeling_pvt_v2.py Move function parameters to single line Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/pvt_v2/modeling_pvt_v2.py Update year of copyright to 2024 Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/pvt_v2/modeling_pvt_v2.py Make code more explicit Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Updated sr_ratio to be more explicit spatial_reduction_ratio * Removed excess type hints in modeling_pvt_v2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Move params to single line in modeling_pvt_v2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Removed needless comment in modeling_pvt_v2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update copyright date in pvt_v2.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Moved params to single line in modeling_pvt_v2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Updated copyright date in configuration_pvt_v2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Cleaned comments in modeling_pvt_v2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Renamed spatial_reduction Conv2D operation * Revert "Update src/transformers/models/pvt_v2/modeling_pvt_v2.py " This reverts commit c4a04416dde8f3475ab405d1feb368600e0f8538. * Updated conversion script to reflect module name change * Deprecated reshape_last_stage option in config * Removed unused imports * Code formatting * Fixed outdated decorators on test_inference_fp16 * Added "Copied from" comments in test_modeling_pvt_v2.py * Fixed import listing * Updated model name * Force empty commit for PR refresh * Fixed linting issue * Removed # Copied from comments * Added PVTv2 to README_fr.md * Ran make fix-copies * Replace all FoamoftheSea hub references with OpenGVLab * Fixed out_indices and out_features logic in configuration_pvt_v2.py * Made ImageNet weight conversion verification optional in convert_pvt_v2_to_pytorch.py * Ran code fixup * Fixed order of parent classes in PvtV2Config to fix the to_dict method override --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> |
||
|
|
836921fdeb |
Add UDOP (#22940)
* First draft * More improvements * More improvements * More fixes * Fix copies * More improvements * More fixes * More improvements * Convert checkpoint * More improvements, set up tests * Fix more tests * Add UdopModel * More improvements * Fix equivalence test * More fixes * Redesign model * Extend conversion script * Use real inputs for conversion script * Add image processor * Improve conversion script * Add UdopTokenizer * Add fast tokenizer * Add converter * Update README's * Add processor * Add fully fledged tokenizer * Add fast tokenizer * Use processor in conversion script * Add tokenizer tests * Fix one more test * Fix more tests * Fix tokenizer tests * Enable fast tokenizer tests * Fix more tests * Fix additional_special_tokens of fast tokenizer * Fix tokenizer tests * Fix more tests * Fix equivalence test * Rename image to pixel_values * Rename seg_data to bbox * More renamings * Remove vis_special_token * More improvements * Add docs * Fix copied from * Update slow tokenizer * Update fast tokenizer design * Make text input optional * Add first draft of processor tests * Fix more processor tests * Fix decoder_start_token_id * Fix test_initialization * Add integration test * More improvements * Improve processor, add test * Add more copied from * Add more copied from * Add more copied from * Add more copied from * Remove print statement * Update README and auto mapping * Delete files * Delete another file * Remove code * Fix test * Fix docs * Remove asserts * Add doc tests * Include UDOP in exotic model tests * Add expected tesseract decodings * Add sentencepiece * Use same design as T5 * Add UdopEncoderModel * Add UdopEncoderModel to tests * More fixes * Fix fast tokenizer * Fix one more test * Remove parallelisable attribute * Fix copies * Remove legacy file * Copy from T5Tokenizer * Fix rebase * More fixes, copy from T5 * More fixes * Fix init * Use ArthurZ/udop for tests * Make all model tests pass * Remove UdopForConditionalGeneration from auto mapping * Fix more tests * fixups * more fixups * fix the tokenizers * remove un-necessary changes * nits * nits * replace truncate_sequences_boxes with truncate_sequences for fix-copies * nit current path * add a test for input ids * ids that we should get taken from c9f7a32f57440d90ff79890270d376a1cc0acb68 * nits converting * nits * apply ruff * nits * nits * style * fix slow order of addition * fix udop fast range as well * fixup * nits * Add docstrings * Fix gradient checkpointing * Update code examples * Skip tests * Update integration test * Address comment * Make fixup * Remove extra ids from tokenizer * Skip test * Apply suggestions from code review Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update year * Address comment * Address more comments * Address comments * Add copied from * Update CI * Rename script * Update model id * Add AddedToken, skip tests * Update CI * Fix doc tests * Do not use Tesseract for the doc tests * Remove kwargs * Add original inputs * Update casting * Fix doc test * Update question * Update question * Use LayoutLMv3ImageProcessor * Update organization * Improve docs * Update forward signature * Make images optional * Remove deprecated device argument * Add comment, add add_prefix_space * More improvements * Remove kwargs --------- Co-authored-by: ArthurZucker <arthur.zucker@gmail.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> |
||
|
|
3fcfbe7549 |
Adding SegGPT (#27735)
* First commit * Improvements * More improvements * Converted original checkpoint to HF checkpoint * Fix style * Fixed forward * More improvements * More improvements * Update src/transformers/models/seggpt/modeling_seggpt.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Remove asserts * Remove unnecessary attributes * Changed model name to camel case * Improve forward doc * Improve tests * More improvements * Fix copies * Fix doc * Make SegGptImageProcessor more flexible * Added few-shot test * Fix style * Update READMEs and docs * Update READMEs * Make inputs required * Add SegGptForImageSegmentation * Make tests pass * Rename to out_indicies * Update src/transformers/models/seggpt/image_processing_seggpt.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Update src/transformers/models/seggpt/image_processing_seggpt.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Fixed naming convention * Copying SegGptMlp from modeling_sam.py * Some minor improvements * Remove mlp_ratio * Fix docstrings * Fixed docstring match * Objects defined before use * Storing only patch_size and beta for SegGptLoss * removed _prepare_inputs method * Removed modified from headers * Renamed to output_indicies * Removed unnecessary einsums * Update tests/models/seggpt/test_modeling_seggpt.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/seggpt/test_modeling_seggpt.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/seggpt/test_modeling_seggpt.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/seggpt/image_processing_seggpt.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/seggpt/image_processing_seggpt.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/seggpt/image_processing_seggpt.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/seggpt/modeling_seggpt.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/seggpt/modeling_seggpt.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Fixing issues * Raise error as soon as possible * More fixes * Fix merge * Added palette to SegGptImageProcessor * Fixed typo * Fixed shape typo * Added permute before doing palette to class mapping * Fixed style * Fixed and added tests * Fixed docstrings * Matching SegFormer API for post_processing_semantic_segmentation * Fixed copies * Fixed SegGptImageProcessor to handle both binary and RGB masks * Updated docstrings of SegGptImageProcessor * Update src/transformers/models/seggpt/image_processing_seggpt.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update docs/source/en/model_doc/seggpt.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/seggpt/configuration_seggpt.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/seggpt/convert_seggpt_to_hf.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/seggpt/image_processing_seggpt.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/seggpt/modeling_seggpt.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/seggpt/image_processing_seggpt.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/seggpt/image_processing_seggpt.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/seggpt/image_processing_seggpt.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/seggpt/modeling_seggpt.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/seggpt/test_image_processing_seggpt.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/seggpt/test_modeling_seggpt.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/seggpt/modeling_seggpt.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/seggpt/modeling_seggpt.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/seggpt/modeling_seggpt.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Object definitions above & fix style * Renamed output_indices to intermediate_feature_indices * Removed unnecessary check on bool_masked_pos * Loss first in the outputs * Added validation for do_normalize * Improved SegGptImageProcessor and added new tests * Added comment * Added docstrings to SegGptLoss * Reimplemented ensemble condition logic in SegGptEncoder * Update src/transformers/models/seggpt/__init__.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Update src/transformers/models/seggpt/modeling_seggpt.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Update src/transformers/models/seggpt/convert_seggpt_to_hf.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Update src/transformers/models/seggpt/configuration_seggpt.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Updated docstrings to use post_process_semantic_segmentation * Fixed typo on docstrings * moved pixel values test to test_image_processing_seggpt * Addressed comments * Update src/transformers/models/seggpt/configuration_seggpt.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/seggpt/image_processing_seggpt.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/seggpt/configuration_seggpt.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/seggpt/modeling_seggpt.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Updated docstrings for SegGptLoss * Address comments * Added SegGpt example to model docs * Update src/transformers/models/seggpt/modeling_seggpt.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * moved patchify and unpatchify * Rename checkpoint * Renamed intermediate_features to intermediate_hidden_states for consistency * Update src/transformers/models/seggpt/configuration_seggpt.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Replaced post_process_masks for post_process_semantic_segmentation in the docs --------- Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> Co-authored-by: Niels <niels.rogge1@gmail.com> Co-authored-by: Eduardo Pacheco <eduardo.pacheco@limehome.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> |
||
|
|
7c4995f93d |
Add feature extraction mapping for automatic metadata update (#28944)
* add feature extraction mapping * added prefix * ruff check * minor fix * Update modeling_auto.py * fix typo * remove prefix to make variable public/importable * Update src/transformers/models/auto/modeling_auto.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * fixes * addressed comments * nit * fix-copies * remove from tests * this should fix * Update tests/models/convnextv2/test_modeling_convnextv2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * nits --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> |
||
|
|
ff86bc364d |
improve dev setup comments and hints (#28495)
* improve dev setup comments and hints * fix tests for new dev setup hints |
||
|
|
4fb3d3a0f6 |
TF: purge TFTrainer (#28483)
|
||
|
|
3b742ea84c |
Add SigLIP (#26522)
* Add first draft * Use appropriate gelu function * More improvements * More improvements * More improvements * Convert checkpoint * More improvements * Improve docs, remove print statements * More improvements * Add link * remove unused masking function * begin tokenizer * do_lower_case * debug * set split_special_tokens=True * Remove script * Fix style * Fix rebase * Use same design as CLIP * Add fast tokenizer * Add SiglipTokenizer to init, remove extra_ids * Improve conversion script * Use smaller inputs in conversion script * Update conversion script * More improvements * Add processor to conversion script * Add tests * Remove print statements * Add tokenizer tests * Fix more tests * More improvements related to weight initialization * More improvements * Make more tests pass * More improvements * More improvements * Add copied from * Add canonicalize_text * Enable fast tokenizer tests * More improvements * Fix most slow tokenizer tests * Address comments * Fix style * Remove script * Address some comments * Add copied from to tests * Add more copied from * Add more copied from * Add more copied from * Remove is_flax_available * More updates * Address comment * Remove SiglipTokenizerFast for now * Add caching * Remove umt5 test * Add canonicalize_text inside _tokenize, thanks Arthur * Fix image processor tests * Skip tests which are not applicable * Skip test_initialization * More improvements * Compare pixel values * Fix doc tests, add integration test * Add do_normalize * Remove causal mask and leverage ignore copy * Fix attention_mask * Fix remaining tests * Fix dummies * Rename temperature and bias * Address comments * Add copied from to tokenizer tests * Add SiglipVisionModel to auto mapping * Add copied from to image processor tests * Improve doc * Remove SiglipVisionModel from index * Address comments * Improve docs * Simplify config * Add first draft * Make it like mistral * More improvements * Fix attention_mask * Fix output_attentions * Add note in docs * Convert multilingual model * Convert large checkpoint * Convert more checkpoints * Add pipeline support, correct image_mean and image_std * Use padding=max_length by default * Make processor like llava * Add code snippet * Convert more checkpoints * Set keep_punctuation_string=None as in OpenCLIP * Set normalized=False for special tokens * Fix doc test * Update integration test * Add figure * Update organization * Happy new year * Use AutoModel everywhere --------- Co-authored-by: patil-suraj <surajp815@gmail.com> |
||
|
|
d83ff5eeff |
Add FastSpeech2Conformer (#23439)
* start - docs, SpeechT5 copy and rename * add relevant code from FastSpeech2 draft, have tests pass * make it an actual conformer, demo ex. * matching inference with original repo, includes debug code * refactor nn.Sequentials, start more desc. var names * more renaming * more renaming * vocoder scratchwork * matching vocoder outputs * hifigan vocoder conversion script * convert model script, rename some config vars * replace postnet with speecht5's implementation * passing common tests, file cleanup * expand testing, add output hidden states and attention * tokenizer + passing tokenizer tests * variety of updates and tests * g2p_en pckg setup * import structure edits * docstrings and cleanup * repo consistency * deps * small cleanup * forward signature param order * address comments except for masks and labels * address comments on attention_mask and labels * address second round of comments * remove old unneeded line * address comments part 1 * address comments pt 2 * rename auto mapping * fixes for failing tests * address comments part 3 (bart-like, train loss) * make style * pass config where possible * add forward method + tests to WithHifiGan model * make style * address arg passing and generate_speech comments * address Arthur comments * address Arthur comments pt2 * lint changes * Sanchit comment * add g2p-en to doctest deps * move up self.encoder * onnx compatible tensor method * fix is symbolic * fix paper url * move models to espnet org * make style * make fix-copies * update docstring * Arthur comments * update docstring w/ new updates * add model architecture images * header size * md wording update * make style |
||
|
|
c9fb250a25 |
Add Swinv2 backbone (#27742)
* First draft * More improvements * More improvements * Make all tests pass * Remove script * Update image processor * Address comments * Use new gradient checkpointing method * Convert checkpoints, add integration test * Do not keep aspect ratio for now * Set keep_aspect_ratio=False for beit, add integration test * Remove print statement |
||
|
|
a52e180a0f |
[docs] General doc fixes (#28087)
* doc fix friday * deprecated objects * update not_doctested * update toctree |