Model parallel documentation (#8741)
* Add parallelize methods to the .rst files
* Correct format
@@ -71,14 +71,14 @@ GPT2Model
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 .. autoclass:: transformers.GPT2Model
-    :members: forward
+    :members: forward, parallelize, deparallelize
 
 
 GPT2LMHeadModel
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 .. autoclass:: transformers.GPT2LMHeadModel
-    :members: forward
+    :members: forward, parallelize, deparallelize
 
 
 GPT2DoubleHeadsModel
@@ -99,14 +99,14 @@ T5Model
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 .. autoclass:: transformers.T5Model
-    :members: forward
+    :members: forward, parallelize, deparallelize
 
 
 T5ForConditionalGeneration
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 .. autoclass:: transformers.T5ForConditionalGeneration
-    :members: forward
+    :members: forward, parallelize, deparallelize
 
 
 TFT5Model
@@ -492,7 +492,8 @@ PARALLELIZE_DOCSTRING = r"""
                 - gpt2-xl: 48
 
     Example::
-            Here is an example of a device map on a machine with 4 GPUs using gpt2-xl, which has a total of 48 attention modules:
+
+            # Here is an example of a device map on a machine with 4 GPUs using gpt2-xl, which has a total of 48 attention modules:
             model = GPT2LMHeadModel.from_pretrained('gpt2-xl')
             device_map = {0: [0, 1, 2, 3, 4, 5, 6, 7, 8],
 
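Typing a device map like the gpt2-xl one by hand is tedious. You can generate a map of the same shape instead. The sketch below uses a hypothetical helper, `make_device_map`, which is not a transformers API. It gives device 0 a smaller share of blocks chosen by the caller. It then splits the remaining blocks into contiguous, roughly equal chunks for the other devices.

```python
import math
from typing import Dict, List


def make_device_map(n_blocks: int, n_devices: int, first_device_blocks: int) -> Dict[int, List[int]]:
    """Hypothetical helper: lighter share on device 0, the rest split evenly."""
    if n_devices < 2:
        raise ValueError("need at least two devices")
    if not 0 < first_device_blocks < n_blocks:
        raise ValueError("first_device_blocks must be between 1 and n_blocks - 1")

    # Device 0 gets the first `first_device_blocks` attention blocks.
    device_map = {0: list(range(first_device_blocks))}

    # The remaining blocks are cut into contiguous chunks of equal (ceil) size.
    remaining = list(range(first_device_blocks, n_blocks))
    per_device = math.ceil(len(remaining) / (n_devices - 1))
    for device in range(1, n_devices):
        start = (device - 1) * per_device
        device_map[device] = remaining[start:start + per_device]
    return device_map


# gpt2-xl has 48 attention modules; put 9 of them on device 0 as in the example above.
xl_map = make_device_map(48, 4, 9)
print({device: len(blocks) for device, blocks in xl_map.items()})
```

With 48 blocks, 4 devices and 9 blocks on device 0, device 0 receives blocks 0 to 8, as in the docstring example. The other three devices get 13 blocks each. The resulting dict would then be passed to `model.parallelize(device_map)`. If the remaining blocks do not divide evenly, the last device gets fewer blocks. With very few blocks it can even end up empty, so a production helper might want to guard against that.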
@@ -505,7 +506,8 @@ DEPARALLELIZE_DOCSTRING = r"""
     Moves the model to cpu from a model parallel state.
 
     Example::
-            On a 4 GPU machine with gpt2-large:
+
+            # On a 4 GPU machine with gpt2-large:
             model = GPT2LMHeadModel.from_pretrained('gpt2-large')
             device_map = {0: [0, 1, 2, 3, 4, 5, 6, 7],
 
@@ -196,7 +196,8 @@ PARALLELIZE_DOCSTRING = r"""
                 - t5-11b: 24
 
     Example::
-            Here is an example of a device map on a machine with 4 GPUs using t5-3b, which has a total of 24 attention modules:
+
+            # Here is an example of a device map on a machine with 4 GPUs using t5-3b, which has a total of 24 attention modules:
             model = T5ForConditionalGeneration.from_pretrained('t5-3b')
             device_map = {0: [0, 1, 2],
 
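It is easy to get a hand-written map like the t5-3b one wrong by skipping or repeating a block index. A quick sanity check before calling `parallelize` can catch this. The function below is a hypothetical sketch and not part of transformers. It assumes only that each value in the map is a list of block indices, and that together the lists should cover `range(n_blocks)` exactly once.

```python
from collections import Counter
from typing import Dict, List


def device_map_problems(device_map: Dict[int, List[int]], n_blocks: int) -> Dict[str, List[int]]:
    """Hypothetical check: every block in range(n_blocks) should be assigned exactly once."""
    # Count how often each block index appears across all devices.
    counts = Counter(block for blocks in device_map.values() for block in blocks)
    return {
        "duplicated": sorted(b for b, c in counts.items() if c > 1),
        "missing": sorted(set(range(n_blocks)) - set(counts)),
        "out_of_range": sorted(b for b in counts if not 0 <= b < n_blocks),
    }


# An illustrative 24-block, 4-device map that starts like the t5-3b example.
t5_3b_map = {0: [0, 1, 2], 1: list(range(3, 10)), 2: list(range(10, 17)), 3: list(range(17, 24))}
print(device_map_problems(t5_3b_map, 24))
# → {'duplicated': [], 'missing': [], 'out_of_range': []}
```

The function returns a report rather than raising an error, so you can log all problems at once. A caller that prefers to fail fast can raise when any list in the report is non-empty.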
@@ -209,7 +210,8 @@ DEPARALLELIZE_DOCSTRING = r"""
     Moves the model to cpu from a model parallel state.
 
     Example::
-            On a 4 GPU machine with t5-3b:
+
+            # On a 4 GPU machine with t5-3b:
             model = T5ForConditionalGeneration.from_pretrained('t5-3b')
             device_map = {0: [0, 1, 2],
 