[Tokenizer Utils Base] Make pad function more flexible (#9928)
* change tokenizer requirement
* split line
* Correct typo from list to str
* improve style
* make other function pretty as well
* add comment
* correct typo
* add new test
* pass tests for tok without padding token
* Apply suggestions from code review
Committed by: GitHub
Parent: d1b14c9b54
Commit: 538b3b4607
@@ -64,7 +64,7 @@ def get_tfds(
     label_name = features_name.pop(label_column_id)
     label_list = list(set(ds[list(files.keys())[0]][label_name]))
     label2id = {label: i for i, label in enumerate(label_list)}
-    input_names = ["input_ids"] + tokenizer.model_input_names
+    input_names = tokenizer.model_input_names
     transformed_ds = {}
 
     if len(features_name) == 1:
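The hunk above stops prepending "input_ids" by hand, presumably because `tokenizer.model_input_names` now lists it already. The following is a minimal sketch of that idea, not the library's implementation: `StubTokenizer`, `select_model_inputs`, and the sample values are hypothetical stand-ins, and the contents of `model_input_names` are an assumption.

```python
from typing import Dict, List


class StubTokenizer:
    """Hypothetical stand-in for a tokenizer; not the transformers API."""

    # Assumption: after this change the list already starts with "input_ids".
    model_input_names: List[str] = ["input_ids", "token_type_ids", "attention_mask"]


def select_model_inputs(
    encoded: Dict[str, List[int]], tokenizer: StubTokenizer
) -> Dict[str, List[int]]:
    """Keep only the keys that the tokenizer declares as model inputs."""
    input_names = tokenizer.model_input_names  # no manual "input_ids" prefix
    return {name: encoded[name] for name in input_names if name in encoded}


encoded = {
    "input_ids": [101, 7592, 102],
    "attention_mask": [1, 1, 1],
    "label": [0],  # not a model input, so it is dropped
}
model_inputs = select_model_inputs(encoded, StubTokenizer())

# Under the same assumption, the old expression would list "input_ids" twice.
old_style_names = ["input_ids"] + StubTokenizer.model_input_names
```

Under that assumption, taking the names straight from the tokenizer keeps the list free of duplicates and lets each tokenizer decide which inputs its model expects.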