From c40e3581c7a5ffcee92729fd9921a84c4b9134ed Mon Sep 17 00:00:00 2001 From: Ishan Jindal Date: Mon, 20 Feb 2023 21:59:51 -0800 Subject: [PATCH] Fix axial positional encoding calculations for reformer.mdx (#21649) * Update reformer.mdx Fix axial positional encoding calculations * Update docs/source/en/model_doc/reformer.mdx Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> --- docs/source/en/model_doc/reformer.mdx | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/source/en/model_doc/reformer.mdx b/docs/source/en/model_doc/reformer.mdx index 2313da5571..2d4b5511a5 100644 --- a/docs/source/en/model_doc/reformer.mdx +++ b/docs/source/en/model_doc/reformer.mdx @@ -83,8 +83,8 @@ factorized embedding vectors: \\(x^1_{k, l} + x^2_{l, k}\\), where as the `confi \\(j\\) is factorized into \\(k \text{ and } l\\). This design ensures that each position embedding vector \\(x_j\\) is unique. -Using the above example again, axial position encoding with \\(d^1 = 2^5, d^2 = 2^5, n_s^1 = 2^9, n_s^2 = 2^{10}\\) -can drastically reduced the number of parameters to \\(2^{14} + 2^{15} \approx 49000\\) parameters. +Using the above example again, axial position encoding with \\(d^1 = 2^9, d^2 = 2^9, n_s^1 = 2^9, n_s^2 = 2^{10}\\) +can drastically reduced the number of parameters from 500 000 000 to \\(2^{18} + 2^{19} \approx 780 000\\) parameters, this means 85% less memory usage. In practice, the parameter `config.axial_pos_embds_dim` is set to a tuple \\((d^1, d^2)\\) which sum has to be equal to `config.hidden_size` and `config.axial_pos_shape` is set to a tuple \\((n_s^1, n_s^2)\\) which