Small README changes
@@ -51,9 +51,10 @@ by `pregenerate_training_data.py`. Note that you should use the same bert_model
 Also note that max_seq_len does not need to be specified for the `finetune_on_pregenerated.py` script,
 as it is inferred from the training examples.
 
-There are various options that can be tweaked, but the most important ones are probably `max_seq_len`, which controls
-the length of training examples (in wordpiece tokens) seen by the model, and `--fp16`, which enables fast half-precision
-training on recent GPUs. `max_seq_len` defaults to 128 but can be set as high as 512.
+There are various options that can be tweaked, but they are mostly set to the values from the BERT paper/repo and should
+be left alone. The most relevant ones for the end-user are probably `--max_seq_len`, which controls the length of
+training examples (in wordpiece tokens) seen by the model, and `--fp16`, which enables fast half-precision training on
+recent GPUs. `--max_seq_len` defaults to 128 but can be set as high as 512.
 Higher values may yield stronger language models at the cost of slower and more memory-intensive training
 
 In addition, if memory usage is an issue, especially when training on a single GPU, reducing `--train_batch_size` from
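The README text in this hunk says `finetune_on_pregenerated.py` infers the sequence length from the training examples instead of taking `max_seq_len` as an argument. The sketch below shows one way such inference could work; it is not the script's actual code. It assumes, hypothetically, that each pregenerated example is stored as one JSON object per line with a `tokens` list of wordpieces, and the function name `infer_max_seq_len` is made up for illustration.

```python
import json
from typing import Iterable


def infer_max_seq_len(lines: Iterable[str]) -> int:
    """Return the length of the longest `tokens` list in JSON-lines examples.

    Hypothetical format (an assumption, not the documented one):
    one JSON object per line, each containing a "tokens" list.
    """
    longest = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines between examples
        example = json.loads(line)
        longest = max(longest, len(example["tokens"]))
    if longest == 0:
        raise ValueError("no training examples found")
    return longest


if __name__ == "__main__":
    sample = [
        '{"tokens": ["[CLS]", "hello", "world", "[SEP]"]}',
        '{"tokens": ["[CLS]", "hi", "[SEP]"]}',
    ]
    print(infer_max_seq_len(sample))  # prints 4
```

Scanning every example is shown only because it makes the idea concrete. A real pipeline would more plausibly record the value once at generation time, since the longest example found this way can be shorter than the limit that was used when the data was generated.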
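The updated README singles out `--max_seq_len` (default 128, at most 512) and `--fp16` as the options most end-users will touch. As a hedged illustration of how such flags are commonly declared, here is a hypothetical `argparse` parser; it is not copied from `pregenerate_training_data.py` or `finetune_on_pregenerated.py`, and the helper names `build_parser` and `parse_args` are assumptions.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser illustrating the two flags the README highlights.
    parser = argparse.ArgumentParser(description="Illustrative LM fine-tuning options")
    parser.add_argument(
        "--max_seq_len",
        type=int,
        default=128,
        help="Length of training examples in wordpiece tokens (at most 512)",
    )
    parser.add_argument(
        "--fp16",
        action="store_true",
        help="Enable half-precision training on supported GPUs",
    )
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # 512 is the upper bound stated in the README; original BERT models
    # have position embeddings for 512 positions.
    if not 0 < args.max_seq_len <= 512:
        parser.error("--max_seq_len must be a positive integer no greater than 512")
    return args
```

For example, `parse_args(["--max_seq_len", "256", "--fp16"])` yields `max_seq_len=256` and `fp16=True`, while `--max_seq_len 1024` makes the parser print an error and exit.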
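The README warns that higher `--max_seq_len` values make training slower and more memory-intensive, and suggests reducing `--train_batch_size` when memory is tight. One reason is that standard self-attention builds an L×L score matrix per head, per layer, per example, so that term grows with the square of the sequence length and linearly with batch size. The back-of-envelope sketch below estimates only that one term. The defaults (12 layers, 12 heads, as in BERT-base) and the batch size of 32 are illustrative assumptions, and `attention_score_bytes` is a hypothetical helper, not a measurement of the actual scripts.

```python
def attention_score_bytes(
    batch_size: int,
    seq_len: int,
    num_layers: int = 12,
    num_heads: int = 12,
    bytes_per_elem: int = 4,
) -> int:
    """Bytes for one L x L attention-score matrix per head, layer and example.

    Defaults assume a BERT-base-sized model and fp32 (4 bytes per element);
    pass bytes_per_elem=2 to roughly approximate fp16 storage.
    """
    for name, value in (("batch_size", batch_size), ("seq_len", seq_len)):
        if value <= 0:
            raise ValueError(f"{name} must be positive")
    return batch_size * num_layers * num_heads * seq_len * seq_len * bytes_per_elem


def mib(num_bytes: int) -> float:
    """Convert bytes to mebibytes."""
    return num_bytes / 2**20


if __name__ == "__main__":
    # The batch size of 32 is an arbitrary illustrative value.
    for batch, length, width, label in [
        (32, 128, 4, "fp32"),
        (32, 512, 4, "fp32"),
        (8, 512, 4, "fp32"),
        (32, 128, 2, "fp16"),
    ]:
        size = mib(attention_score_bytes(batch, length, bytes_per_elem=width))
        print(f"batch={batch} len={length} {label}: {size:,.0f} MiB")
```

Under these assumptions the loop reports 288 MiB at batch 32 and length 128, 4,608 MiB at length 512 (16 times more), 1,152 MiB after cutting the batch to 8, and 144 MiB for the fp16 case. Training keeps several other activation tensors as well, and mixed-precision setups often hold some of them in fp32, so these figures illustrate how one term scales rather than predict total GPU memory.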