From 01068abdb9aae6e5a74f7f34a49581b89b861f91 Mon Sep 17 00:00:00 2001
From: Patrick von Platen
Date: Wed, 31 Mar 2021 18:36:00 +0300
Subject: [PATCH] add blog to docs (#10997)

---
 docs/source/model_doc/bigbird.rst | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/docs/source/model_doc/bigbird.rst b/docs/source/model_doc/bigbird.rst
index 8d3936a795..b3c2c5d2a4 100644
--- a/docs/source/model_doc/bigbird.rst
+++ b/docs/source/model_doc/bigbird.rst
@@ -41,6 +41,8 @@ propose novel applications to genomics data.*
 
 Tips:
 
+- For an in-detail explanation on how BigBird's attention works, see `this blog post
+  `__.
 - BigBird comes with 2 implementations: **original_full** & **block_sparse**. For the sequence length < 1024, using
   **original_full** is advised as there is no benefit in using **block_sparse** attention.
 - The code currently uses window size of 3 blocks and 2 global blocks.
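
The tip about the two implementations can be read as a simple length-based rule. Below is a minimal sketch of that rule, not code from the patch: the helper `choose_attention_type` and the constant `FULL_ATTENTION_CUTOFF` are hypothetical names introduced for illustration, and the 1024 threshold is taken directly from the tip. The returned string is meant to match the `attention_type` setting that `BigBirdConfig` accepts in `transformers`.

```python
# Hypothetical helper, not part of the patch or of the transformers library.
# It encodes the tip: below 1024 tokens, block-sparse attention brings no benefit.

FULL_ATTENTION_CUTOFF = 1024  # threshold quoted in the tips


def choose_attention_type(seq_len: int) -> str:
    """Return the attention implementation the tips suggest for `seq_len` tokens."""
    if seq_len < 0:
        raise ValueError("seq_len must be non-negative")
    # Short inputs: full attention is advised.
    if seq_len < FULL_ATTENTION_CUTOFF:
        return "original_full"
    # Longer inputs: sparse attention is where the savings show up.
    return "block_sparse"


if __name__ == "__main__":
    for n in (512, 1024, 4096):
        print(n, choose_attention_type(n))
```

The tip only gives advice for lengths below 1024, so treating exactly 1024 tokens as `block_sparse` is an assumption of this sketch.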