From a39dfe4fb122c11be98a563fb8ca43b322e01036 Mon Sep 17 00:00:00 2001
From: Faiaz Rahman <42232624+faiazrahman@users.noreply.github.com>
Date: Sat, 1 Aug 2020 03:20:48 -0700
Subject: [PATCH] Fixed typo in Longformer (#6180)

---
 docs/source/model_doc/longformer.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/model_doc/longformer.rst b/docs/source/model_doc/longformer.rst
index badfb4c091..c2d44a60b0 100644
--- a/docs/source/model_doc/longformer.rst
+++ b/docs/source/model_doc/longformer.rst
@@ -16,7 +16,7 @@ Longformer Self Attention
 ~~~~~~~~~~~~~~~~~~~~~~~~~~
 Longformer self attention employs self attention on both a "local" context and a "global" context.
 Most tokens only attend "locally" to each other meaning that each token attends to its :math:`\frac{1}{2} w` previous tokens and :math:`\frac{1}{2} w` succeding tokens with :math:`w` being the window length as defined in `config.attention_window`. Note that `config.attention_window` can be of type ``list`` to define a different :math:`w` for each layer.
-A selecetd few tokens attend "globally" to all other tokens, as it is conventionally done for all tokens in *e.g.* `BertSelfAttention`.
+A selected few tokens attend "globally" to all other tokens, as it is conventionally done for all tokens in *e.g.* `BertSelfAttention`.
 
 Note that "locally" and "globally" attending tokens are projected by different query, key and value matrices.
 Also note that every "locally" attending token not only attends to tokens within its window :math:`w`, but also to all "globally" attending tokens so that global attention is *symmetric*.
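
Reviewer note: the local/global pattern described in the patched paragraph can be illustrated with a small NumPy sketch. This is not the library's implementation. The function name ``longformer_attention_pattern`` and its parameters are hypothetical, the window is assumed to be even, and the separate query, key and value projections for local and global tokens are ignored. The sketch only computes which token pairs may attend to each other.

```python
import numpy as np


def longformer_attention_pattern(seq_len, window, global_positions):
    """Return a boolean matrix; entry [i, j] is True if token i may attend to token j.

    Hypothetical helper for illustration only, not part of any library.
    """
    half = window // 2  # w/2 previous and w/2 succeeding tokens
    positions = np.arange(seq_len)

    # Local attention: token i sees tokens within `half` positions of itself.
    allowed = np.abs(positions[:, None] - positions[None, :]) <= half

    # Mark the selected few tokens that attend globally.
    is_global = np.zeros(seq_len, dtype=bool)
    is_global[np.asarray(list(global_positions), dtype=int)] = True

    # Global tokens attend to every token ...
    allowed[is_global, :] = True
    # ... and every token attends to the global tokens, which makes
    # the global part of the pattern symmetric.
    allowed[:, is_global] = True
    return allowed


# Example: 8 tokens, window w = 4, token 0 attends globally.
pattern = longformer_attention_pattern(seq_len=8, window=4, global_positions=[0])
```

With these inputs, token 4 attends to tokens 2 through 6 (its local window) and to token 0 (the global token), while token 0 attends to all eight tokens.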