[testing utils] get_auto_remove_tmp_dir more intuitive behavior (#8401)
* [testing utils] get_auto_remove_tmp_dir default change Now that I have been using `get_auto_remove_tmp_dir default change` for a while, I realized that the defaults aren't most optimal. 99% of the time we want the tmp dir to be empty at the beginning of the test - so changing the default to `before=True` - this shouldn't impact any tests since this feature is used only during debug. * simplify things * update docs * fix doc layout * style * Update src/transformers/testing_utils.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * better 3-state doc * style * Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * s/tmp/temporary/ + style * correct the statement Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
This commit is contained in:
@@ -716,11 +716,11 @@ Temporary files and directories
|
|||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
Using unique temporary files and directories are essential for parallel test running, so that the tests won't overwrite
|
Using unique temporary files and directories are essential for parallel test running, so that the tests won't overwrite
|
||||||
each other's data. Also we want to get the temp files and directories removed at the end of each test that created
|
each other's data. Also we want to get the temporary files and directories removed at the end of each test that created
|
||||||
them. Therefore, using packages like ``tempfile``, which address these needs is essential.
|
them. Therefore, using packages like ``tempfile``, which address these needs is essential.
|
||||||
|
|
||||||
However, when debugging tests, you need to be able to see what goes into the temp file or directory and you want to
|
However, when debugging tests, you need to be able to see what goes into the temporary file or directory and you want
|
||||||
know it's exact path and not having it randomized on every test re-run.
|
to know it's exact path and not having it randomized on every test re-run.
|
||||||
|
|
||||||
A helper class :obj:`transformers.test_utils.TestCasePlus` is best used for such purposes. It's a sub-class of
|
A helper class :obj:`transformers.test_utils.TestCasePlus` is best used for such purposes. It's a sub-class of
|
||||||
:obj:`unittest.TestCase`, so we can easily inherit from it in the test modules.
|
:obj:`unittest.TestCase`, so we can easily inherit from it in the test modules.
|
||||||
@@ -736,32 +736,33 @@ Here is an example of its usage:
|
|||||||
|
|
||||||
This code creates a unique temporary directory, and sets :obj:`tmp_dir` to its location.
|
This code creates a unique temporary directory, and sets :obj:`tmp_dir` to its location.
|
||||||
|
|
||||||
In this and all the following scenarios the temporary directory will be auto-removed at the end of test, unless
|
* Create a unique temporary dir:
|
||||||
``after=False`` is passed to the helper function.
|
|
||||||
|
|
||||||
* Create a temporary directory of my choice and delete it at the end - useful for debugging when you want to monitor a
|
|
||||||
specific directory:
|
|
||||||
|
|
||||||
.. code-block:: python
|
.. code-block:: python
|
||||||
|
|
||||||
def test_whatever(self):
|
def test_whatever(self):
|
||||||
tmp_dir = self.get_auto_remove_tmp_dir(tmp_dir="./tmp/run/test")
|
tmp_dir = self.get_auto_remove_tmp_dir()
|
||||||
|
|
||||||
* Create a temporary directory of my choice and do not delete it at the end---useful for when you want to look at the
|
``tmp_dir`` will contain the path to the created temporary dir. It will be automatically removed at the end of the
|
||||||
temp results:
|
test.
|
||||||
|
|
||||||
|
* Create a temporary dir of my choice, ensure it's empty before the test starts and don't empty it after the test.
|
||||||
|
|
||||||
.. code-block:: python
|
.. code-block:: python
|
||||||
|
|
||||||
def test_whatever(self):
|
def test_whatever(self):
|
||||||
tmp_dir = self.get_auto_remove_tmp_dir(tmp_dir="./tmp/run/test", after=False)
|
tmp_dir = self.get_auto_remove_tmp_dir("./xxx")
|
||||||
|
|
||||||
* Create a temporary directory of my choice and ensure to delete it right away---useful for when you disabled deletion
|
This is useful for debug when you want to monitor a specific directory and want to make sure the previous tests didn't
|
||||||
in the previous test run and want to make sure the that temporary directory is empty before the new test is run:
|
leave any data in there.
|
||||||
|
|
||||||
.. code-block:: python
|
* You can override the default behavior by directly overriding the ``before`` and ``after`` args, leading to one of the
|
||||||
|
following behaviors:
|
||||||
|
|
||||||
def test_whatever(self):
|
- ``before=True``: the temporary dir will always be cleared at the beginning of the test.
|
||||||
tmp_dir = self.get_auto_remove_tmp_dir(tmp_dir="./tmp/run/test", before=True)
|
- ``before=False``: if the temporary dir already existed, any existing files will remain there.
|
||||||
|
- ``after=True``: the temporary dir will always be deleted at the end of the test.
|
||||||
|
- ``after=False``: the temporary dir will always be left intact at the end of the test.
|
||||||
|
|
||||||
.. note::
|
.. note::
|
||||||
In order to run the equivalent of ``rm -r`` safely, only subdirs of the project repository checkout are allowed if
|
In order to run the equivalent of ``rm -r`` safely, only subdirs of the project repository checkout are allowed if
|
||||||
@@ -815,7 +816,7 @@ or the ``xfail`` way:
|
|||||||
@pytest.mark.xfail
|
@pytest.mark.xfail
|
||||||
def test_feature_x():
|
def test_feature_x():
|
||||||
|
|
||||||
Here is how to skip a test based on some internal check inside the test:
|
- Here is how to skip a test based on some internal check inside the test:
|
||||||
|
|
||||||
.. code-block:: python
|
.. code-block:: python
|
||||||
|
|
||||||
@@ -838,7 +839,7 @@ or the ``xfail`` way:
|
|||||||
def test_feature_x():
|
def test_feature_x():
|
||||||
pytest.xfail("expected to fail until bug XYZ is fixed")
|
pytest.xfail("expected to fail until bug XYZ is fixed")
|
||||||
|
|
||||||
Here is how to skip all tests in a module if some import is missing:
|
- Here is how to skip all tests in a module if some import is missing:
|
||||||
|
|
||||||
.. code-block:: python
|
.. code-block:: python
|
||||||
|
|
||||||
|
|||||||
@@ -522,45 +522,47 @@ class TestCasePlus(unittest.TestCase):
|
|||||||
- ``repo_root_dir_str``
|
- ``repo_root_dir_str``
|
||||||
- ``src_dir_str``
|
- ``src_dir_str``
|
||||||
|
|
||||||
Feature 2: Flexible auto-removable temp dirs which are guaranteed to get removed at the end of test.
|
Feature 2: Flexible auto-removable temporary dirs which are guaranteed to get removed at the end of test.
|
||||||
|
|
||||||
In all the following scenarios the temp dir will be auto-removed at the end of test, unless `after=False`.
|
1. Create a unique temporary dir:
|
||||||
|
|
||||||
# 1. create a unique temp dir, `tmp_dir` will contain the path to the created temp dir
|
|
||||||
|
|
||||||
::
|
::
|
||||||
|
|
||||||
def test_whatever(self):
|
def test_whatever(self):
|
||||||
tmp_dir = self.get_auto_remove_tmp_dir()
|
tmp_dir = self.get_auto_remove_tmp_dir()
|
||||||
|
|
||||||
# 2. create a temp dir of my choice and delete it at the end - useful for debug when you want to # monitor a
|
``tmp_dir`` will contain the path to the created temporary dir. It will be automatically removed at the end of the
|
||||||
specific directory
|
test.
|
||||||
|
|
||||||
|
|
||||||
|
2. Create a temporary dir of my choice, ensure it's empty before the test starts and don't
|
||||||
|
empty it after the test.
|
||||||
|
|
||||||
::
|
::
|
||||||
|
|
||||||
def test_whatever(self):
|
def test_whatever(self):
|
||||||
tmp_dir = self.get_auto_remove_tmp_dir(tmp_dir="./tmp/run/test")
|
tmp_dir = self.get_auto_remove_tmp_dir("./xxx")
|
||||||
|
|
||||||
# 3. create a temp dir of my choice and do not delete it at the end - useful for when you want # to look at the
|
This is useful for debug when you want to monitor a specific directory and want to make sure the previous tests
|
||||||
temp results
|
didn't leave any data in there.
|
||||||
|
|
||||||
::
|
3. You can override the first two options by directly overriding the ``before`` and ``after`` args, leading to the
|
||||||
def test_whatever(self):
|
following behavior:
|
||||||
tmp_dir = self.get_auto_remove_tmp_dir(tmp_dir="./tmp/run/test", after=False)
|
|
||||||
|
|
||||||
# 4. create a temp dir of my choice and ensure to delete it right away - useful for when you # disabled deletion in
|
``before=True``: the temporary dir will always be cleared at the beginning of the test.
|
||||||
the previous test run and want to make sure the that tmp dir is empty # before the new test is run
|
|
||||||
|
|
||||||
::
|
``before=False``: if the temporary dir already existed, any existing files will remain there.
|
||||||
|
|
||||||
def test_whatever(self):
|
``after=True``: the temporary dir will always be deleted at the end of the test.
|
||||||
tmp_dir = self.get_auto_remove_tmp_dir(tmp_dir="./tmp/run/test", before=True)
|
|
||||||
|
|
||||||
Note 1: In order to run the equivalent of `rm -r` safely, only subdirs of the project repository checkout are
|
``after=False``: the temporary dir will always be left intact at the end of the test.
|
||||||
allowed if an explicit `tmp_dir` is used, so that by mistake no `/tmp` or similar important part of the filesystem
|
|
||||||
will get nuked. i.e. please always pass paths that start with `./`
|
|
||||||
|
|
||||||
Note 2: Each test can register multiple temp dirs and they all will get auto-removed, unless requested otherwise.
|
Note 1: In order to run the equivalent of ``rm -r`` safely, only subdirs of the project repository checkout are
|
||||||
|
allowed if an explicit ``tmp_dir`` is used, so that by mistake no ``/tmp`` or similar important part of the
|
||||||
|
filesystem will get nuked. i.e. please always pass paths that start with ``./``
|
||||||
|
|
||||||
|
Note 2: Each test can register multiple temporary dirs and they all will get auto-removed, unless requested
|
||||||
|
otherwise.
|
||||||
|
|
||||||
Feature 3: Get a copy of the ``os.environ`` object that sets up ``PYTHONPATH`` specific to the current test suite.
|
Feature 3: Get a copy of the ``os.environ`` object that sets up ``PYTHONPATH`` specific to the current test suite.
|
||||||
This is useful for invoking external programs from the test suite - e.g. distributed training.
|
This is useful for invoking external programs from the test suite - e.g. distributed training.
|
||||||
@@ -573,6 +575,7 @@ class TestCasePlus(unittest.TestCase):
|
|||||||
"""
|
"""
|
||||||
|
|
||||||
def setUp(self):
|
def setUp(self):
|
||||||
|
# get_auto_remove_tmp_dir feature:
|
||||||
self.teardown_tmp_dirs = []
|
self.teardown_tmp_dirs = []
|
||||||
|
|
||||||
# figure out the resolved paths for repo_root, tests, examples, etc.
|
# figure out the resolved paths for repo_root, tests, examples, etc.
|
||||||
@@ -660,21 +663,42 @@ class TestCasePlus(unittest.TestCase):
|
|||||||
env["PYTHONPATH"] = ":".join(paths)
|
env["PYTHONPATH"] = ":".join(paths)
|
||||||
return env
|
return env
|
||||||
|
|
||||||
def get_auto_remove_tmp_dir(self, tmp_dir=None, after=True, before=False):
|
def get_auto_remove_tmp_dir(self, tmp_dir=None, before=None, after=None):
|
||||||
"""
|
"""
|
||||||
Args:
|
Args:
|
||||||
tmp_dir (:obj:`string`, `optional`):
|
tmp_dir (:obj:`string`, `optional`):
|
||||||
use this path, if None a unique path will be assigned
|
if :obj:`None`:
|
||||||
before (:obj:`bool`, `optional`, defaults to :obj:`False`):
|
|
||||||
if `True` and tmp dir already exists make sure to empty it right away
|
- a unique temporary path will be created
|
||||||
after (:obj:`bool`, `optional`, defaults to :obj:`True`):
|
- sets ``before=True`` if ``before`` is :obj:`None`
|
||||||
delete the tmp dir at the end of the test
|
- sets ``after=True`` if ``after`` is :obj:`None`
|
||||||
|
else:
|
||||||
|
|
||||||
|
- :obj:`tmp_dir` will be created
|
||||||
|
- sets ``before=True`` if ``before`` is :obj:`None`
|
||||||
|
- sets ``after=False`` if ``after`` is :obj:`None`
|
||||||
|
before (:obj:`bool`, `optional`):
|
||||||
|
If :obj:`True` and the :obj:`tmp_dir` already exists, make sure to empty it right away if :obj:`False`
|
||||||
|
and the :obj:`tmp_dir` already exists, any existing files will remain there.
|
||||||
|
after (:obj:`bool`, `optional`):
|
||||||
|
If :obj:`True`, delete the :obj:`tmp_dir` at the end of the test if :obj:`False`, leave the
|
||||||
|
:obj:`tmp_dir` and its contents intact at the end of the test.
|
||||||
|
|
||||||
Returns:
|
Returns:
|
||||||
tmp_dir(:obj:`string`): either the same value as passed via `tmp_dir` or the path to the auto-created tmp
|
tmp_dir(:obj:`string`): either the same value as passed via `tmp_dir` or the path to the auto-selected tmp
|
||||||
dir
|
dir
|
||||||
"""
|
"""
|
||||||
if tmp_dir is not None:
|
if tmp_dir is not None:
|
||||||
|
|
||||||
|
# defining the most likely desired behavior for when a custom path is provided.
|
||||||
|
# this most likely indicates the debug mode where we want an easily locatable dir that:
|
||||||
|
# 1. gets cleared out before the test (if it already exists)
|
||||||
|
# 2. is left intact after the test
|
||||||
|
if before is None:
|
||||||
|
before = True
|
||||||
|
if after is None:
|
||||||
|
after = False
|
||||||
|
|
||||||
# using provided path
|
# using provided path
|
||||||
path = Path(tmp_dir).resolve()
|
path = Path(tmp_dir).resolve()
|
||||||
|
|
||||||
@@ -691,6 +715,15 @@ class TestCasePlus(unittest.TestCase):
|
|||||||
path.mkdir(parents=True, exist_ok=True)
|
path.mkdir(parents=True, exist_ok=True)
|
||||||
|
|
||||||
else:
|
else:
|
||||||
|
# defining the most likely desired behavior for when a unique tmp path is auto generated
|
||||||
|
# (not a debug mode), here we require a unique tmp dir that:
|
||||||
|
# 1. is empty before the test (it will be empty in this situation anyway)
|
||||||
|
# 2. gets fully removed after the test
|
||||||
|
if before is None:
|
||||||
|
before = True
|
||||||
|
if after is None:
|
||||||
|
after = True
|
||||||
|
|
||||||
# using unique tmp dir (always empty, regardless of `before`)
|
# using unique tmp dir (always empty, regardless of `before`)
|
||||||
tmp_dir = tempfile.mkdtemp()
|
tmp_dir = tempfile.mkdtemp()
|
||||||
|
|
||||||
@@ -701,7 +734,8 @@ class TestCasePlus(unittest.TestCase):
|
|||||||
return tmp_dir
|
return tmp_dir
|
||||||
|
|
||||||
def tearDown(self):
|
def tearDown(self):
|
||||||
# remove registered temp dirs
|
|
||||||
|
# get_auto_remove_tmp_dir feature: remove registered temp dirs
|
||||||
for path in self.teardown_tmp_dirs:
|
for path in self.teardown_tmp_dirs:
|
||||||
shutil.rmtree(path, ignore_errors=True)
|
shutil.rmtree(path, ignore_errors=True)
|
||||||
self.teardown_tmp_dirs = []
|
self.teardown_tmp_dirs = []
|
||||||
|
|||||||
Reference in New Issue
Block a user