Add MarkupLM (#19198)
* First draft
* Make basic test work
* Fix most tokenizer tests
* More improvements
* Make more tests pass
* Fix more tests
* Fix some code quality
* Improve truncation
* Implement feature extractor
* Improve feature extractor and add tests
* Improve feature extractor tests
* Fix pair_input test partly
* Add fast tokenizer
* Improve implementation
* Fix rebase
* Fix rebase
* Fix most of the tokenizer tests.
* propose solution for fast
* add: integration test for fasttokenizer, warning for decode, fix template in slow tokenizer
* add: modify markuplmconverter
* add: some modify on converter and tokenizerfast
* Fix style, copies
* Make fixup
* Update tokenization_markuplm.py
* Update test_tokenization_markuplm.py
* Update markuplm related
* Improve processor, add integration test
* Add processor test file
* Improve processor
* Improve processor tests
* Fix more processor tests
* Fix processor tests
* Update docstrings
* Add Copied from statements
* Add more Copied from statements
* Add code examples
* Improve code examples
* Add model to doc tests
* Adding dependency check
* Add dummy file
* Add requires_backends
* Add model to toctree
* Fix more things, disable dependency check for now
* Apply more suggestions
* Add soft dependency
* Add annotators to tests
* Fix style
* Remove from_slow=True
* Remove print statements
* Add sanity check
* Fix processor test
* Fix processor tests, add more docs
* Add doc tests for mdx file
* Add more tips
* Apply suggestions

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
Co-authored-by: lockon-n <45759388+lockon-n@users.noreply.github.com>
Co-authored-by: SaulLu <lucilesaul.com@gmail.com>
Co-authored-by: lockon-n <dd098309@126.com>
@@ -46,6 +46,7 @@ from .utils import (
    is_accelerate_available,
    is_apex_available,
    is_bitsandbytes_available,
    is_bs4_available,
    is_detectron2_available,
    is_faiss_available,
    is_flax_available,
@@ -239,6 +240,13 @@ def custom_tokenizers(test_case):
    return unittest.skipUnless(_run_custom_tokenizers, "test of custom tokenizers")(test_case)


def require_bs4(test_case):
    """
    Decorator marking a test that requires BeautifulSoup4. These tests are skipped when BeautifulSoup4 isn't installed.
    """
    return unittest.skipUnless(is_bs4_available(), "test requires BeautifulSoup4")(test_case)


def require_git_lfs(test_case):
    """
    Decorator marking a test that requires git-lfs.
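
A minimal, self-contained sketch of how a decorator like `require_bs4` can be applied to a test class. It does not import from the library: `is_bs4_available` below is a stand-in built on `importlib.util.find_spec` (an assumption, the actual helper in `.utils` may be implemented differently), and `HtmlParsingTest` and `run_example` are hypothetical names used only for illustration.

```python
import importlib.util
import unittest


def is_bs4_available():
    # Stand-in for the helper imported from .utils in the diff above.
    # Assumption: checking for an importable "bs4" module is sufficient.
    return importlib.util.find_spec("bs4") is not None


def require_bs4(test_case):
    """
    Decorator marking a test that requires BeautifulSoup4. These tests are skipped when BeautifulSoup4 isn't installed.
    """
    return unittest.skipUnless(is_bs4_available(), "test requires BeautifulSoup4")(test_case)


@require_bs4
class HtmlParsingTest(unittest.TestCase):
    # Hypothetical test case; it only runs when bs4 can be imported.
    def test_extracts_text(self):
        from bs4 import BeautifulSoup  # imported lazily, inside the guarded test

        soup = BeautifulSoup("<p>Hello</p>", "html.parser")
        self.assertEqual(soup.p.get_text(), "Hello")


def run_example():
    # Load and run the decorated test case, returning the unittest result object.
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(HtmlParsingTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    result = run_example()
    print("skipped:", len(result.skipped), "failures:", len(result.failures))
```

Because the decorator wraps `unittest.skipUnless`, it works on both test classes and individual test methods, and a missing dependency is reported as a skip rather than an import error.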