Better messaging and fix for incorrect shape when collating data. (#18119)

* More informative error message

* raise dynamic error

* remove_excess_nesting application

* incorrect shape assertion for collator & function to remove excess nesting from DatasetDict

* formatting

* eliminating datasets import

* removed and relocated remove_excess_nesting to the datasets library and updated docs accordingly

* independent assert instructions

* inform user of excess nesting

Author: Sebastian Sosa
Date: 2022-07-21 02:35:41 -06:00
Committed by: GitHub
Parent: d23cf5b1f1
Commit: 5e2f2d7dd2

@@ -733,8 +733,10 @@ class BatchEncoding(UserDict):
"Please see if a fast version of this tokenizer is available to have this feature available."
)
raise ValueError(
-"Unable to create tensor, you should probably activate truncation and/or padding "
-"with 'padding=True' 'truncation=True' to have batched tensors with the same length."
+"Unable to create tensor, you should probably activate truncation and/or padding with"
+" 'padding=True' 'truncation=True' to have batched tensors with the same length. Perhaps your"
+f" features (`{key}` in this case) have excessive nesting (inputs type `list` where type `int` is"
+" expected)."
)
return self
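
The new message points at features that are wrapped in one list too many (a `list` where an `int` is expected). The sketch below illustrates that condition in plain Python. Both helper names (`nesting_depth`, `strip_excess_nesting`) are hypothetical and written only for this illustration; they are not the `remove_excess_nesting` function mentioned in the commit message, whose signature is not shown here.

```python
from typing import Any


def nesting_depth(value: Any) -> int:
    """Return how many list levels wrap the innermost scalar (0 for a bare int)."""
    depth = 0
    while isinstance(value, (list, tuple)):
        if not value:
            # An empty list still counts as one level of nesting.
            return depth + 1
        value = value[0]
        depth += 1
    return depth


def strip_excess_nesting(value: Any, expected_depth: int = 1) -> Any:
    """Unwrap singleton outer lists until the feature reaches the expected depth.

    Hypothetical helper: it only removes wrappers of length 1, so real
    batch dimensions (length > 1) are left untouched.
    """
    while nesting_depth(value) > expected_depth and len(value) == 1:
        value = value[0]
    return value


# For a single example, `input_ids` is assumed to be a flat list of ints (depth 1).
good = [101, 2023, 102]
# The same feature with one extra wrapping list, the case the message describes.
bad = [[101, 2023, 102]]

print(nesting_depth(good))
print(nesting_depth(bad))
print(strip_excess_nesting(bad) == good)
```

Checking the depth of each feature before collation is one way to tell this case apart from a genuine padding or truncation problem, since both surface through the same `ValueError`.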