From fa661ce749b0d14ae1999d1b097866248624a842 Mon Sep 17 00:00:00 2001
From: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Date: Fri, 5 Jun 2020 12:22:50 -0400
Subject: [PATCH] Add model summary (#4789)

* Add model summary

* Add link to pretrained models
---
 docs/source/imgs/local_attention_mask.png | Bin 0 -> 27493 bytes
 docs/source/index.rst                     |   1 +
 docs/source/summary.rst                   | 492 ++++++++++++++++++++++
 3 files changed, 493 insertions(+)
 create mode 100644 docs/source/imgs/local_attention_mask.png
 create mode 100644 docs/source/summary.rst
diff --git a/docs/source/imgs/local_attention_mask.png b/docs/source/imgs/local_attention_mask.png
new file mode 100644
index 0000000000000000000000000000000000000000..284e728820c8fb57e4d295b5c1e33653acae405c
GIT binary patch
literal 27493
zcmdSBbwHGT*DfleB1%dsB_JgrAR^t}DJ=*HD4o)vf^;b%AV?@7jdV-b0D`2Hv@kT%
zb=LSk&-1<SbIv~dd}r@3_8-6ubI;5j>$ldmuIpNNkcyHt_LUn~&Ye4lEh{6TcJ3Tn
zCj1m!z6Ad{tGTfQf6yG&q{YtV_mD5alMCjeilXPv6-8V<Ho6GUG3{h@9M7Gj2tfUy
zMa*xSo;zo8C@UeV;cBp2i{bQeKBi7MN-S%wEMd@N{SEv)jD4&VTRrZvUh6{U654R-
zTLU5bB@;fs1SPR+*TkNmdlC7VUVZ3Dcq$&ze(1(O<x+6hrz^^(U?(>w_pQfTCEw!8
zB4I)C872#HNQ?1KQCgxlJQ`dKb}%l|X3<P$?hI;yM_&1i48J$1{IA#`bN<(68`bQu
zrEfg08!7Fn;sJL18%AQ6sn4WUy$qJmg8A1f(5bwB^%;guyYjMTE++d>dV72CXZjp6
zy!JU=^l9a;+r<92?XPfibL&^wO&7G<)g;D#YY8ElMk1N>390=eeXR5!K4fBmZN08`
z2H}axB>Oth_~G?0rX*gb_WQ+~pO9$aePFtObQHpzzQ6mN%&ykMM}IvvGkKVO{I2WG
z!)mNodMU}X7xvSpA{%~kI|U|MZc3gRiRU_#-g<0?Wk=!v4g(vHli*TS^-SyONi)K)
zp=rXdiIqDf9#gzV!0x1*I~X2$3D{3ya|gy_-tnH?-`$xiIxSx}jmKQx9-goh)IafR
z)(;5Tsazj-MfX2&52?i2shz4^k~&(yInOt+SWrJtZ^@5E;z*G{#9=31ua|rB<srg3
zt2gJwHZmY5&az0z=i8T4(w_@7i@Kh9v*tH1@r<Ff9AaM|2%-G3x3`3b?(q#>Ki?a-
zaMy^b`daql$u@T#HQ$%9v9VLFiJsVR4u_i&KB1A3c=qXp;vr(w!os*{SFe`LS&;`s
z`(Ghcx)G+6iOYuw<;T5QoQK9s(QTY>vhYdoHFp`&)9yjklO;V>0cVzLIUz0Mew~#M
z1MeV+ISH2Ec1t~Jz0ARxYI<^QpP?C!pMrnb*j2kQE}ngJi1nFoqW$voIAolsE5Cr)
zd5-!0I~>D(&Q2ILX<{4=y0}cksdd9_LN`pF<q<hjsN0VDrbJ&TvDT$u;LDh(BQerp
zdZNo^h;UXAuI!yWvmjQ4Q^dJ89<jeA7Tue`xWH$o+qMx{<v^&ohUZAJOp36|6Yn8i
zbbn4$P9i8`wD&yQvK*r%oAS|WliD-7H<og%QF0w<K|w*T+uht}7FcavPkC$4PLD2M
zy-J`ZH{EwFqP-S7u&Pv6ezmqM4|^<Vp){oM#G{uNG34@Ia=wNs>)}0dgtL{$;`!yb
zy`oE@o5nU`IfEKot)&nINghMA){n#D;=ejuG0ZSuX+ijnn&Z&YaEl`pw153%x^Sx{
zC>}HCj=<&hc6LiW0p<n1p}LJguSeFd^Y$FWMQiEO_|XZ0F}*@J7x)JD{LM3sI!rQK
z^xIm8d#E#bNF3Ms;l$jp-z0pKEziC>X8`Y+@7{i9qRD`%mT!U_>m4!vIZ`AAk+YiL
ze6PQQvhZZ$YE~TQ_0o_eBwFyHsU~jRnkVj)v5FEc!AH4+==~m}=-1rG&>i|)R^$j3
ztjgDUWN>7ZmD{S<>yETccnqSWF3wBvswzbjz@LYg{^_yq|Mr*8EFG>aA;c$Fzp2_;
zWW=@}4d3+AN0)o6(D;%>nICZ+;*$1w$h9-bAuW-SftUNa!y(3NbzNx+z6k5aim`Hw
zPv1F7*4(ByJ}F8ZCrGvFOa#XHi_JYNd(<N+O)g~}eT?}ba#D?Ufj%pV?^A{ZLLp0+
z9h*av*Vs&~snhLpH#xN#eTlV!O$VJOLgD9lE#>=ZYGbp;Ioh$+VUq5;qry1kPrkaV
zP2%XQ!}~Io;hz|uHSyoZio&rafY|ehA#Yn7g$tK4RYcXm6!c{3vl?88VWC%xAJg1c
z#W;pJwttCwoaN}<l}9)us*rfsZilRBZOByOluisXBZ4!Xf-A#E&BJdxY6v>C@eQLB
z9?a%ZvVVr1VNAN;vy7GwJG%4oQ|&TZ81o_x3#TMPL8YKat$;o1Gjpl|uXFk^$&L<h
zCB^}@0CUS-`vl^3{MQvd^ThhXp63oFEBRYQY`Kpl{rb0@{me5P1xKjvcpDh!(~cXv
zKEd;M^$4uQG{8^YmFG@JTu6?)eAH5@X7=(slkN4pG6;ps3gK;v=!cN=sGkN^q)v73
zmPI;HP%p{u%5#cyzG=}9I2>0*$GFMEB?|}blJM!R|D~`%Sgl4wL6Y@G;y;o}X(6UP
zb_PPEy=G}(dp=E!%4(gLIIxXHHTLGOLS>bY@F~YCw#r8*fol2U6u!&zwLQD%Ug<>&
z`yaT$S1-?HX$ju^yeBA9%cIq)u%R6ItO6s;dN<70Tir<{IVQ>-b2^qgnV9Y3nIXME
z5y>4Q{0;J?TS8I&4k?(l^h29vl(xv<zd9<>YV!7Q9rPj`9I*a~sLoH-j0{jieuo$Q
zsru^$cwF+|<fYT$IgPu{EW|>d{X*K#q5P#3r}G_wPkN>MUM%6LSJ>X5dAAkGo=u*|
zFwCdWwQ8xtla4-0p*V9GufpYXTa7FS>X$!}qjjZBJSL}$jf9-4P<Y+ONA+BuGOD8M
zpCH`*t8K+)^h4v~3umV}H?XfZybe(gtPG!sm<&kyDYN3SMGaYJF9-wgJz>Nu?gT3;
zQ?-7~d^9`3(5&C~B9ltg*oIhh@>xzCl4l?BL57*NISrj3UP?8m5|=X#V#RoR)?f5<
zB}wK}!I%9(`Q~{fkHaSUZT>O#XB`3@!&}EuPvY*Pu@+~vSE7Ab=n+}<Q{vz>I^KH4
zNHNG_Mz1hdaiy{}2KKaOLeG5|J$$Wv*Qm{1q~)ik=c2!*f>M)bEnXbCd2e~&bTXkY
z8s#0*8GiD*AXS{%D7+xOag151ax_v>p>ZbveXO=e%DDC?3U=5V4L4o1I5<VSqtv$J
zJ|g^7ayW1u?M2;{%G8^0HJf~mY=Fhr82I=X6}t#RK}Jr#h4ti+T96->a;S`Qab3Nz
z+TFWwkb$92Sfo5(4WP2UHum!V`lBJR{1itmpO%gjUECjA+)C_a98#vcvzc8>OQNME
z7=XY*L;jsz0XjG<sS9q{CA(+O3svk7`1*5K*Oq2|zeM733>$k*woOrazj)zb^V~<}
zrsuIIl$IY~Rj~H9u5l8G(bE1{Si56JFVWii=n0$q^Y44)t7z{hC-)ah@At|o4vhag
zj2fChE!G`i4NA-Bc%oY?f&v%P&V&uMOvJB`$SCML(>$jiM=MY<+%fFsd@@?t{JxQC
z(Y<v=*K$hp^<4s`B(KY~dU~5}^~dkjYc`g1va)gsISD2YrXwO_l32~?kB_zLrHs|C
zoo=w5-Cp4H-WiK{ZAO7rf3o7pF}(RsSy<A>TWIp^hn8`^xR>oyH{QCv4^$zQ;mST|
zbjvqxY;+>0001QW>|m6Xmd`F-?G367XRAA<z5QK}-TTbU9zy0IEK`*|<D!SH@SQt`
zZCXC3BrLjBgcuRz@J7Qkq+y}Wtd!MQkvc+gq|${=Gv9#LjGnx9^}+%onazwov2ssH
zA@QCLLcw*VBh!5l-N1AHL9~`e;Ao6?p(}??)qz<b+8M)pUTZJpbwcG;iQ=fecQNE?
zt%Oaryh`5m>C}fqW*RIB4vNB#Rmh^^Cr>S#>W|xe_*%L&)pK*R;X2fvMeRi<MZwp4
zZoG@I;yqcKIt3JWyzp&`dZNlgP0^s!+4J~-q@9`Z3!Ct9SR7JiO3#d9MA&sb(>Q;2
zTi7$~uvMyNcR>%HbqSwb{~#67%1M$88Bz`>=7XNc4&sxY74z<ty_kL-6pgnmhT%<Z
z;cym(&yIW^HDz><)LE9ty({!M^UgJ>RzfK7cpu3k^a@oF3Ke@VFgQNt$ix3!PCW}e
zp7V*ud}c1P>rm@lvK9Taz^f`fQ&kd7h4T744OXQlvi$K7{&d{w{KW7Z2LK#U=l+-S
zKPt()<M(IcT=sYPobu;qFfvp#X1-VX2Y^sjnov<9{_`^k<{y*H4-ovBaX*oDcD(%a
zGtv@Q4(sVM>E)FkmL-<edNeTUdhm9o?>V?PPjU>3E@QF%fk{qi+hws&b?r}ua_jaZ
zV`F;9ZR`80&HAt)OD&ILL<*Hr$|gN{Qg0sB6k6qhp2k}FYQQ{grujTWTtxJzo%Sa=
zB*VRaL)h|>8tU|B^eh4uSL5efDvpxgCK)5+qZn^9H53S+HZ#5VIUxF&kq-FH)@%JJ
zFet1EgX%;R;aqd{fr{3x`SS5k^L2prbTNormYRdLJ;3{ilP?>^YBoA^;BwX`pEX)w
zJ#gJR=7eYFRL57a6}w%3OiM$~II2J83Qj=w)j6kE_$+B3temvld}>=b@o1Sxs*HIJ
zjaQcGLuqtpu6;9>utU~7eu1w1RR}Q@#Ol4(rs$<_kPJ*1Y}$TuBtce;KU$N1=v5mR
zzp)>soLZPAgUE0GrBiG_tH>RM^Rw9D#@zI!^(=N*=>TNWX*F7Y(~6s%aemX`u?RN_
zmXpg1bB(N73I895MkjU?Devr}&Qvlh`kc=l;*!x6B%?KX3fg!55H&V?nN!bZL&<Z_
zn*(2oLxATTItES&Sq-N|TXULWy*DIFeNO?KvYs!HOZWXQj5~C(%E%}X0#0%~Ioc~9
z7Vm@vgiFef$1yY|3XgN7xNvXZy)B1uzC}l8(*8B|GA=H}y=+GZDUZ`-RQW%8ON=Dd
zII6c5d0yd3AmEQ1P+wmk7yo14I<GpS03OUDF_e{+IXD}p4p-OK)?gP@?AZ17^`YXa
z<WS*WzYB;jFf`Pu0;9si^#|G1zW>kXLd*=#L?x&5ZG}tK%1=i(mI)XW@7)MvUI?nh
z;hL%P9J=LkBQF}(>H1i#MMQAKxcWLfK5;6+_A19<|CS<Ytyig_EI^SsS3Bkzxox-*
zuUsgkp1NK1K5G-^Z6O;X3aHwOyh|_;<7=_8yc-K2wViT$)Ts!EJQmFB=C<E)Bn-#7
z>>)4LwZbd{{g93P(7@pIMdTF?)HZLHV|xgcQ7ItVbJHWbl*?pg5_{$}Rh&ZnfTDZE
z`8cq`6Yx{csK^m6<u!m(l!MF8$U?WV`yAXxWBCuY>l|X@;_ZI^<TzSZ?o{A$-n$-}
z%&rPqaBj1$@1Da)h?DUUwzQI51w_7IRhZki(r5F(&XAm$fAl-0evI>cWi=;Q$5NS1
zrF?Rp^rm92T73_^76zQHG6feNb}DGQpIK-9;A9YToZ)2P;C#`W=q|>q^VgD(j(Qla
zuKZ^n$+Y<wB=tGOVHxw!u3`Xt{_%fiF8ZazJ9y{4QT->^Z8Bxc(T(rCgi4fQtufoM
zC|ie<kCV&a$U}kF-2M@6<vP?%eV`JcE}?WsRM&$-v*uTddWrSJ@aWdt03Zsd<P2h=
z%CJ-%u?Kf{ZcJ;{y98@ZdUts<S6VpqY<x5He!lM~airL$mOo1pt1+P0A$&Zi&M}<2
zfcB~UO-}g-`O5Gq?I}IulatBQQ(V{qTj3*eF6SNB&B0-KmT7oQ$@)QRbUWpve%pH8
z@s)WWFXqY;pPwpiD&B1}D|M?`5D3e8Gs}5%7&sKfP%?Tfhw|EDt;e8WU#)f-v!#1`
zmjH)K(0|l%5wcij@3`+K0Rwcc=2SlC6>0TgnS`*_TGhwscjh-AzZ(9b*D>fxzZWUu
z>UU`gGKIG|#u#Tjq_v%c(<cA4`YsjTVI!|of9jKsq!-*Dk|RP0u$scek7ip%PuN+$
zjxk+wub9brfj+mx`6%!$hRx3@WrTv==1waLII-OBWO^wkw&JE>@%}ZZpR~k}Rx(=&
zCxy^BWME)u3GOv6Tv=3x_~fQyA&6=VJnK(SR^#~qNZsbUae{#(VP_Wz^)s33-V?m@
z=)TSR(i%=!hqyNa8-~1Zty1acn*5AyK%ZWkWxZn<@O|x0e*VVR7OaouZHw4937B;A
z^jPIW$b|(+`K}&%ZJ$FzAcG*0Ar0dGz+s@Fz1S6?XZj5XIY&RP$YmX<5049EE?@js
zxcf>vHz%{LG}mL3Ws4zH?Bp{q&?6J_5ouPf7c_d7P>XO)W2bS6ad{cfk+nb|{7p*~
z`z>Koe12pijfoMN+A{OpJQ<)G`sUa<xDwx#RwT7GvjK3Qjf>0lwwLsNNtzB=KN+AT
z&04r3TZ!Y`67?n_*zovJ9O8w(!)$S321D3Q86ey`>m8;Bq-`w<gx?xj{;sR<IxPu!
zYnmZBr81o;LdaeTwi-)I&r|$4UaB-+$2Y(cH#X@<A&nE<HN4reXgE^dyt^ihq8qFS
zV=RXrH7&3j+cd_mjuknLIshJixYHdcs0XOQZTVhnI2DArz<#^Vz585HeHN3e1K;$k
zaa3O`UhJL~$gG#NP3E=!Cu$^r9@qzMUhJk|@s>qZ`p0bR71_OsX34yvf#X;loYN!Q
z%3JXS$-JAWUnjVKM-&gw|CJcf{}mdfCEj`QCsJ5O`+L%OEAl6v*?InVBqIM8FmV_8
zTaNlh_gBgiO!*t|S-SRD{-Oi~<0Yz8nqy9zLVuBHkHsN4(bNR*VN}TVUP{N{&44of
z2fVA^3;4IpKh7Q*dd?8`p3EHu21hhiC;tWv`==53$$aMvPM5vlQf;yM1P>%y>F;m;
z12QDOp%FM;umq5>Gj)8eg=?_l?7$o-pm+5rMA>=?_FF;4@LHa-kVIcrhCv5Yeep%h
z2ui%S1>)9w`4=6FY^H4BQ)}6TS`!DE3thL}p$?3eb;ng&6+wh8{qPQb$Mq1QFygql
zO3ItB4p;Vc<0R{e1u&dzl^cW(av{J3DJ1ffaT{3L(oq3eC<s(>mGp-cz|R%(bi>Yt
zcleq`@5>uM*KR+i_tQlpEqYVlI12juSC4Kd`d{+kmlTdjt?>TRO-equL@4ehX#ef`
z=}cApj~hQi$POoH%%uD-%t#@zZr{7#%5<K#$-YUEo95PiU4-)mNlEP~AA*-3Xj}B9
zRWxqdRLz@t{9gT&?~e9-eO$BRke9>5#p=<}e~?{cS>O{hPW$S!A+mqrA}}YBaZi4c
z30#KmZ*Db*j_|}E%fGsy{C9Q4wRe0~U7i{DjvdiNai>YJ8F4|Ycbas04OV}}k*i5z
z7dv#A70GOuaIVupLb!PNVJhVXU>s$;$XZ<a(HPPhckCX=+wXYs<$EW>h!)qseO~F{
zY+rlwnPF6}GTc;o!naYDf-#;CmK%%Dp+R(aLg2W49^{2e$<>*78QV`pjs$AvTt<(A
z*~GXtc_h25Mgu#0uf4Tyynqbl<zbaYD8xNG654N;fb8SACTND6t7l^^&os=j{(@km
z?8aPUu?sZe7jcvQ48`s(j5}(;7YTSUyi~u&u^N5?U$prl6YqVJzidD`bWU{2<FDe<
zten;i7#8?q-}C!6XB(H;pjg#dLAMCO4jskfSO<!(_f;f1t6^CNKlE|6Pk(ta#*Ku(
zdD^pcPI451*-nPk$5qMV=UyJFiH`qT^Xx?}Hek8FF-;x8n-DgI9wHWLaSww+%iQ%{
zns(CAUG;irvZscMRWO|2({VF#Qkx%;V)ft<CQHYEt#WC3P@7lYZjp~BYY!MT`zo%k
z17>kq3&gTGcG<6ph%yXeIe=U&Z0&sPT0%!a8LLd?%zd6K`UYEs@kTK9w2U8x!=ZKC
z3cUKbd`*GBQVV!kw{3y`gd7iNJvMU%PQGWk_$V@oL~-ql41>q>lW~`x$H-fKm6Rze
zO-k~a{8Q-P@84N{fJ<6!F4*sJFwUD=E_jpSApjpf>iwQD?m(QsulxVhh9?b$oBn8P
zoW=i^o&JWPN=;l?k^)ANTGK5NK1=K*!iT&GT8R2=hwO=axYR^TKV3bm)#nP5FHa@%
zl)RsqBP8-z&~|XJYO9IPVa;@oO*EFeHT^01L}a{`ucvlT@RphR%(ROk;7sM(GxU!n
zL_ZKYB#?Fv4396nJs{%QM9f0k88xg8=wG-*?m8#LF>F6QEG#LiXHqa1Jv{m{JzzHw
zN=l70*0Y|ZAi8Q)7?QAL<Z6fNDD9qWsp&?t+i!ZpGHa<zBIT-uMs&pW2!Fu23<`<I
zTK(0Qol*#lqRbwe4;3Pae2KQ)r@L8X9cW&fN)|#YGVa&JIHjIo$||?2tWVor-6C;h
z9a^kKN5><)@e;IzfDTrc=N>7*6S{VGaetw`b<S6Cq6Kjw;;YCn7mDC71Fo4rsebq)
zFHHu%P<^k|Cp<#IjZ4Xx{|@sMlcPel`z8Keg7ebNv;KNEWv4D%Al!&DVUn`j-Tb`0
z)oh(h%gjdNdkI&{{aG0*F!>+<RQYK37#8Hpu|P#d6%8-f1BCM&;apR96FO#xERd?1
zDd)rm$EUC3)0}!aXPLOR+gYAP1QWh3pe--Nt2;ft91@7<NO1n8Z!RU?Z8Q1{dM5~B
zcIeYDM}+#77eG6MM??}bZSWq|NEx5kpBM*t_A#0ei{d^Cr$;1H)g5%(2BZ2pwYMJ2
zckS0s+DZ|1i-BJ9sjkb~9K#Ols-X#YJ1ActAC9f_Aee1yg0}LN9+wPubPd~2ad;3b
zI9SIglBpEi@fini!-q%OyH(KjV~chLC(=ApD|4PFV7@cvXMc2GCyZ4C=y1NWt2QsZ
zcgvz3C^&7VLgFfT1omd+XYLPeu57d29OdIlc0H^?ot$^^F|RPFctP7hwakQsMAHND
z9)Ms&3Tnf*I^}?oxeo=ZA2}tdY5K{_QvUF0InIQliHXZpyUHjD@^wMo0dcExVqi?p
zaA0L@`DAOTPI2NI+7&Jb&(4x12$BS6WMAJXlNr6$R3!nnq9Xh&FCZeNq(&cjj@e%i
z+dnJy0A>rN5Jux3cMW^UdvfNbXFIe!kD^C-77f~leGa%obxSw0Cx?*Z#gmj8^rrO9
zEcp%3AKPxaMPt5Z&dy&Nb0;$p2)T#Mp8q!G!(?MZLFIFLaVs|`z|ec23~*Rod2ZVI
zzcDvvTu+&XH|LX2LsdUW&&G%J0yR@`dd!wZKJh(g%>9*#UeQ@N!OWKPF)%|E^091f
z(<W`O$db=agbR(nnPqxBH{Lrv(XM@#1U=jFZd4h0I>OI`$`voFstT@=<6`dzfO62h
zjq-_pDpazAqhw5-a@y`rL;vs7RsFv+JTvRmnc;&WD!h&~yQdo$+YTD8lGeH%Jcv<m
z?7xJiv?<;=t*V4yosDC}`)Oo<C`zg5$Lq0z?8KpZJ7T7xhK*_S<ms;Z;}#>K{a@<8
zsiRZ;xjCZi!4>_U&2LGbsGcPy%+FC!+*#n8>AZq>jUou>hkNBEd<x&%iTzronuG-i
zE4hf2@DuQ^4xgif-c4mY(*MvE8X=n^(Tb-dBCe*A{IB#9cMho0@gm>;BwyOBFx|Lm
zKcNtKyS))K5WTT}tK<WpASOiX7iwQsl?Op@4sQf>6x?nDPFG`iDEKH4hmIIGVtT0I
zAsqngtP5<mwuEnE0i(1onC%}PP}c(^hoV3BwnZN~B<}AG8c{W!3IVB%|F)oo?-DHy
zEeB0YM{jw_P5S#oOOz=xGIC-XPC&^=zDe*e+_2-|q^4!Q`a%?F7%`w#Jv}`kIA&(9
zxnfb>G1H<EpgBKvI8-iy0sXCMm`(G?rZJT8${H21BBG|qw3W-r20TM60k>0l>%h}*
zn**G^e2B{J-bbw~qd&>H4G_)(PdS`V^_z7tK9zXWXXypMcDoK=1=`bPIa#26vtyOo
z6w$<|xf$A+-k1$H(cV(gFT(^4_f<%8c`a+Ye%pv3*$<nfAECm9I&h?J8u&#V6-Sdt
zkV<D$rKyS&)Nd<|+h@ihQ7pvkm?&Rg>2Nqb(!GK~jcGJ1<Hgdj54&^fGORu2(n<0@
zR1{GWYa+%^(ahK$-U9NsGP8F~%X_&K#Uh!gbo-(>`*FXeVzX4o6&|QD(=&9MO*Z(P
z1eyg8Lpql&;9p(g6z%2tr?8$*FmOkY3Zowe4XJ*xYr-b~<}K;&p{G2dHsbi>`M0;f
zmSY=xJ_O`1TMG^j%>EW}J6zVJ>f@$owiyZJY?9m_*dp`1rt$aNq)nQbBVdg?vk3+d
zj9X0(%~}fr|53j4f9F5A|0F$8b^M=1CKUhS|5q!6`S(@^=+Zyl2(0YixLC5kh*O5Y
zcTRvFRF!;ic?Y?EzyH^i<UgZ{Z>E2JYV@~sc<EShu4HcnUqhKPq<YF{j45Mn%HhHM
zy`2FAEO{)EzNJwmRd2gHH8YJI>j(TvX~^l5jjB!uIYj4|uJx14!bi)F0d9-jR4E^8
z65&t)U3Y#mKPl*7y2(Vmq*>hkB=6gsNwup$I$lV>6<A4(+%BSh6Zty+71Os2UWMU-
zPrDW8x#XofOBIs*$6ucJYm@rmJj0VI)818Pt{YU}`@FqI0c(2)7<S`e$FIF!jA_WN
z#71E@v|-%3i*@OkO4BhUCcVCDV`{p>6ihorKZUI>tC2DP)+8TsYZ2ZXOf^<`*A!jC
z)zH<3VZFM#C6s2=__5>i<;x=K+cJ>M6Le*sFj^pysJGPEK4GVE=x?cyz;VX%cy*Ds
zI!guUc4sVldOB)0VK5^&>9jqv&`tOzDAJch`AAhoC9q=u0$3nQa<N{ks=`8XR>Xek
zD-LV2H3QcBVen`SaG%ZP^(WSSC_J<Lh^p2rMle%ax-EP#khP<jEGpV^wDM96E!nCL
zi}_mhl+jw<o`DB3qCHj<Rm#2M?we!?W+@@CpS%;gNWIjtwa%FWWqzNb721WE?Tit0
z8VKx?r-~JmAB1Zd()n99c5h+rP};0swFG?)^qdQKPA8v?)hHmGOKkM7?EN@*uRsUv
z6H?X&;aL^hE2HIxKn^Iq=QHi?AN_1y9fz#g&aaNN<{IWg4nc&gyCjPmE74zJa1N|j
z4&d`Q$tCz0#AyzQ&wf~N@B5jqYSK?&D!2mVlSsjtLOARG4V?vi1q*EYfoU_>VEr9r
z5AJ>uW8yTLjm;~wg2FX0$o=CGSwKSZWz3z}Yr1UGdil-d_exmDI}A96gJbT+z2s67
zHvZtIaPp+H!J(uVinvQ{<v94DK9A%tm&E;=!wXw-A#QpR<vS2~P|$V$Wik8*TNeJ8
zzv~w+O=cm2kWhj2RbvN*?iKnn+M67$@9Ajgb?L)e16LFXiAhX0KbztdKUq>Ed>cc}
z&!B}+%*xF#TI>J`w{nnZBlPlXpJX)57|C4$MigZwt_CC_!tdvI&vEt=s$-;DVq+!?
z8aFOQV05^}$Iwsqam}#~Say!0Ya_(=Cr&7X#8*4+$6+?W&bw)v?Eczo$_kq0aBdI?
zh`;ASFP1y|bleXUIA@cFw3fi&a4C;AQV>@jZ5m&Yl%}c66*=|-T4{xZET})h0)$JT
zm|n*}<(6jR7VUoNP=>)6EHCddh@R~}ihkq9JG|t`n_#mDw~&^24CM1yUK@dzFJCsV
z$bB?@{G$q)PeAe2d_G$HfnUbpHiyFm4Xa`f$<4cL=pCh|#`&h*VuXVlMvpDuc`eK{
z282^ezH=VZQP*Ny;KQL1G~<d1Vl2AAFi$@aLFM~pj^|+SdP>4)Wd*;uc$I;{py^ap
z!5qz%jUBJ)D=~YauEClZjPxJJD(LIjxDhqO{+`u>GMWL5J;cMx(kJ~oAPbA`G(I)X
z|Ilz~5D_V&(Y+*wtZKpeR^vcvOh8a*Jb2sw>}%2459*J)-6FD5ozCPuPNW=5Ozo5t
zw@!s9x`*zE7Anb%vGU5pLfL3n)zElggcvD;mX^ZTf^wjJNRxv*wX}9(8O<3$Y{e#l
ziZP}Vs0n=5d=3tdpe9sqlRT1;Lve#vEmxYGn<14j=45?wsoSQ2RI*n#**`jR+^RmZ
zyLORPZ&Dy)hRI|_)K)Eje$|}aamT@Gtg@@F#wA+#bbhLEvEc15uvJXF0ab8ym##MY
zUEyh&xQt~!Dtw^67TO|Z<1vfZh5Q1?onXcGjq>A15;B<4=eqBrjsN5)1)&(vIi<sh
zrhKr0P3RiZ&|5*j4vOD<NfnFpnnI5CDELqI-a3g(*d2r%EB96M?A+G<Wis2~qk!-!
zpA+e$3&>+fXe$G%ShYE-hzM*ZX!4W8S#6V~Ma`LalWG*<^9s&(*mmP9-h_fy()Qy}
z@m=9W2W84gsV$Xeo(?ley8QP13wN|3h{hey(BpXAK7)kO{B5z%I4MOw9+Rzhm#>rg
zHjr2>`k(!J6Hkdf7o2vId09Uve0~!Z9bdTI=zB(&i5@n?$wcvb49UtX`LNbpX=v{@
zFRR;1NgQ$n#PY1{f>}-mq?JAjUqtpM?jum*=l`tW{;t#@gZ@qY1i|AUl^P|3{!RSE
z{)^!x_&dW1;_Dx8{GXzUC0+&Iva6Uxk|PHvAxa#Jh}S~YnLp7L1BH0{jnbao`@#)Z
z%fOpZZ^~=PXsB+#eekXNLka`8S;gay@5$uz+x2z(F;KZ7Yxt1zN|KnR8uvWb=uS>c
zcd#K5>yN_Id&97xyPzbhDNCJgVFBvWvw+bOXELamtHTp0iNtLu`!ume0HQfJ9kZ_J
zHL5`6dfj@NM#Jtle=RfX9s4C4H1C<uitSXwSHx@lq3Lf?=d+*viWa!aBDpehHz3G@
z`3tL!@9VER;d4QQI8k3!!y6_kcgHr<cfP6KcRa6;?zwv1OfCO|@sPv?zkt+IOByq6
z2Ho(x&IiU4tu4k|15am8(@kmzN16y`U-zp}=O;tW#$agsP**AmVHKuVMpIvir`QeT
zVgTQz!F~Vb1YOwQ)@8I7Iuz$+>mkw>r{qBy9UaeZ??5s$GYjkQda)TTi-11x5eA}Y
z`PD;GX3d22+We~;11t&o>Fy00erBN}`Z_P-EOpqiM@?IHSz@jEUHA|~6#OeX?CJCK
zU=Oq$WAqq_kB(QF9I~2<y#|Zz1A}5AMvQFNeWT@E9lNW)2;N$YH8l5;&Im#W%a|mD
zgFd0Z;w9Rx;Sl|1$;5u*>$_=9670PUo&futuvQUet!M=@X4LV~WXyfMWC+CB;P{|G
zR!tEr*b|$KvKJI3?Gy+z6q2cJ3@P4~TZ>CLoL767-3({an!>k%_S)BxNqW#l;54Ls
zdCdxp|H}l7c`IC%flrhwLm`yx*)D=9j^jKZb+AF+ixY6RC~*I>Qn`(Xk56r;mG*@%
zwPImTqx^BKSB^254lPt2lY^^TN5nDG^}F6JaySm&wZxcu+H2+zEqvTnf7;7J0{w<r
zA9?kvpWupor=qsY@lBxZ=BLcRd@9BrHMi;itmwM@ieyE$86Iixw^Lk}2_@nr7$B!B
zXjjP^H2`yCOYz?H!k5&`QwF~6Ct)}Qb$OQzFsHTcPYqBdhcjm*SHmMe269IEMUyLz
zD`PqEo>|MqNG-!KhUD6ta3TynPoy<U${!uS=!O>D%6s-HVd&F_$+3tK;H?6L&q4&T
zwXi9G-&iKqFSn;H;~hmO<lkFs=^V5KDE9%Zlbk+v(3daN9kJW@KL-6`lOcKe0j}$2
zjFzg+Lod0}7?md01H_W;7jvcF>HM~y-DU4>PPSUV_MBrx2mf8fWGw8|d<({@yb`}<
ziw}D$t*3MzU1ng;E<(10E`#w8gXaLIfq^*od+)Wh3NPo4-vSAh#O1$X_J4Z%cR2rF
z^3vH&2Q}aI8(Ij-?d6NQH?(@hB@Fo9SagCi_bvN?F)LqUyPfzbuQl8OIZeBZTpMH|
zg`f};em3s<yEn#kp5?=fBxg7kBB1Y1crh1EXg^Ai(EChkU?fUfY7?Zc(Uah3R5uAq
zf{LKe?Mt{W#pUVINU(4c4Oj2_D(hW2ZvDs~s%&}X(m}-srykdV{J14{qN~+EC|4pt
zWop^^ma4QMKr5y)&IJb9do0Nzk$&2UL}A=&+{tBj{@qn^jpfUaOSwT9OAVji2*oGm
z^3DZ629Mp(!V@Vgu5U7^Vo;c$iq_4V2P?*6Wv~?T#;Y4{sSBb|I3KEK((p4#gC~=g
z4bX2ai(dH8I8qSD(&vfxdrI%$rKMr$ro5TC@arcC1;j6CXYgZ^=@xV|!o)4S!v#Z3
z2fvr%6gza^K!&DXIk<QcWt6JRuS2^Y9}h{y$cwAyL*v|(xybg<XC7^DVyzn`kfE%M
zw=RTe<v9gs$j(|-${@|eCMz_t?5+o2v(y7!0t~P0U#v+#w?#;JYgIQrj<9T1%txO#
zAja;kat*j{Kort2xrV|jP=wR+aXj8@y36N55f{HEDDE3Msu6YrDbb*souT|ZqV$YT
zGX~&3oSNtRPc-fifFC2AC`W57tTLkvB{KeQ!$B(@_j5pIi(>ai-6hb#m06d>ee%>=
zQq2L1^w&P#GA%)oho!Hy1w}gZB{ZMi&#zwN``r?ihL{^S<tUGo?>bp|o{(Q*a7pRo
zvjmOeUHuVZMuGNA)-jinat8AGJ)dKOXKSOrL^+xvm9gNFk`AY8TXEZ-8-LCvjXUBy
z`Ekm;kpfrvQ4&{KVAuWW8F{ub^L#b4MnN{At7pyHK;wYaQM(a*xrP4sIeO45bgsTa
zCf98mA3*J4WdQrFehg|PGhStRgVBHh<Jo{Vi|77Fl#WEH;Us$BY{FRvrKtu3;Er?f
zd<%eK|2u(<b#8No7a5e6go)TGj%yw^{I}Pr018*wbzB>_cql3hcD6hdyY9msrM+`4
z;S<$K>uAKwv5#%tzdYRgt;B;8Y@W<BoI|-$Gl<^FQVFtCT)iE>P1O>8ADZXF;|6o3
zTAziFYus9z4`5r|n%zb)VU!HLc~kGake^Tjsn3D*+N$r@?guDngI~Rj7fI~$Aeoo$
z!P`gQJ8Dr%i<?H%odZNrZzA+x8&ncc{t`ofyR`n}fy!Ur+W+sg$5OF_+@}JOqgbM!
zBXzp+M?*1zOg)nBj^Lgnt+giVd`I@RZ#3wec@GaF$z>|co`8Pjw?~30<kPpz57`+~
z2*oGyeAl6Ao>}KK8hRtSi#$v>eOEZiS4eP;07`ti-u*^S#&gAcwvNHFWLid9!*(8e
za`nn>GRWrEmR_<vG8A(c>m0)Ibr^e^;0iux4uGnRXyYf5+*@YODGuFO0Ze*A7bp9s
zvl96w^@S-u+%G6p!ZBXmBm%hN)g|ZmkuLR-G4#|bZFWuu^sUOt^uJ>xYk@;?ErA9o
zLXIr6&%Dq!o3ur^RADe?C?Ox>#_{{0T?q+>(v!}twVRE>SlPyK3*3rXpK_XZh7H;H
zQhGaOxxSD05YI(eetTn{WPhMzUTYN31Q)vdyc|M7+FLaB<cf>5syV*}v4WXqS=_r6
zvxZO^S>VM!cgb@IuxKB9(Q9yX0bwWx{N0B(CmME?20_YiL?cjJ!)|%%nHk4!dgTzC
z!pc^DX>Y0h#$X^A#v&*BZY}VA9(%|BwTXx}f%!F8Mu$=VcMJOuND-_q%p!9!UIqJG
z{)g@~m1PQWK9;o|uWl6wcmht@IhZyQ1<C_um?zyWozvCOq5gd?;u2zQdLKtY@y(Zf
zW|J=%rP!&+SlApHA5UjS4>ry3O{aiPzgM}#HG$~2(3e_w|3Z3!X9QhM{ab#mn-bU6
zOp}wYh*3uarz|~=i&5YE3XStoi*uPEuFR9;zV(DZ_(?xBd7chZ6?*O2uoza#Axe*@
zEy?-Z35yqcVjR&msU{<$Tje(Q;_Z%n_fntE$v_E0IVwD2L!#q1TKU&$n)nKVXi2U)
zb99yT6X_;;<DDE1a<H$h)h$ioy%e~_+bIKIy3&=$f%(<cnc}}UL?X%@j9-*FX<s%-
z!@^C(sXTETpjQ=ZyozdI{s%BdgA-o;#%=azz3#w7g-_S?fvsz$5K6zN8HMjQ32Wo2
z9xg;+FQkv1DrZ#sC`+n_eozkV8H)Z%T;|~bN|y7_s(fip=>?8QW^32b(oIc`pV2{^
z^u<ld=c<rn)nRWU%AUF+n*q5HqyFG$wy@(BImKo<c{RW!wL7)qv&VQf848~C>9h_m
z=rmXE?xXSqP%1)>hnBy6oBzLw$p5Uqh>zE_WY7fQU7V9^bu8u(+H0T@5;;(#OfwC6
zS!5Gt)+0==E8|f(j=Mn&?9Q!NPN7RcdPKbEH(!qq>+Xg+yLdc$xmT9JCmB<Is6f)j
zpsvD-;Mw|EMD&0L0KWFkSs*}Q?rA`Ps^FZk6`vQm-WuODJ2TFDskBs9xdn=<PO2Un
zrIjDhr(qiId+JpnpT#DH$d>L&PzvbKAHo@KKlicwiC9M~6WH~iU8{+rQ7L_tUE$3~
zuia+h`s_xhnG#Gg5#ZQ}TG`+EZl>7s-b0BOk<aOpmRm&0xriXz!-b`@LM}`hmB|A$
zBI+5+<<f0H1v%zXe*GFknUX=1)yEGP*`HM}mogIC_bkek)>9cUF^ql>M%q@0{+^V$
z;Tdc55c6@#&{J94Pp@1wNMEX5Yu1;Sr!bbjHCn8lq3~{v9T3fE@gu2^AA>6~Er>&J
z4A7a<e<WVe>mD>b^|}<T@qmWsmINe@ux{FE<RW}iNoj<C+r~6d8F59k?1+|&Q5o9Z
zgWoB`@4Aok{35%5;gjNnnJMXj8(L`#qIAPCzsKNw1;+6VbplIbZv<TL(|$pwe`uoe
zhRuKe6_1h<`yn<}a8{!g&{Q8Ia$i-`?V+d%SS3bWxN_Z(yKF4&;&O-F_obz8w(&K*
z*~KML`_>;BKM&4@$2wZ4Fe&p&G7u?=%gD+mEuj$&G|)r6u`bH5ScZm%fXpz1WcPQ=
z>se8$^wW1fU-fd`0c&CkBnW>1omA?r!sC$IXZ<E|pPi`hTTQ;4IysdOIG78WGC_t@
zXjgf@ibL)X%nXSKaG2~`fzM{XL6B6{kQyMYP4zk@$|MB{1PG5+ojY(rEN$yv{R&lp
zFhZHTVviV-eW<}HzhAt!>ZZh;TxocG^izBDSM6n)@TQqIC2*kb)TUmiWnZr|fe4;|
zg~84^iGjAgjaO9MW7Jr#*~H1&Ul($d75X?$X>H@R@lvoOgDXw-$VPUoH62AVHJK9d
z7n)27*&i(fm(s~)2|8*p>sYx-$LD#tN@#=VX1r7AK>LOj&9*{Mjw|Fq{990cxr|1a
zo(?I*toL2pKT^nH{U(BmgQ;VZ7%Ysw5yJjHje{NwAM3Oxi1oHco&tE9E@fcn>BAM3
z4{0`6<D=QT|An97hPv$(Ehsiqv++??tW!lMDA#+N-Qd6`kKM1VKl_^#<^NMM%sjBA
z-&Z|2W(V?gGm^#wq7SbSDv+}-=&tcRe@)P4ys*MsvESc?vps&5&A8{~#W?n-pW`gu
zHtAUOimrk5;9w&%yw$Q>fxfwSz3^GveL|4=AxPS-%c*HOQEn`+UctbS2og>LO1>jT
zvAh#m>#LwCe>I6PfhnyFswrPL7m8<bPurd~@ZTkz-Lt)RR>xNrzUpVFA3c$nt}i{j
z`H&QE4Q<MP>GDIh8FuQc!9No40uxL*zd9e;1jIXeV$f`Js@B>iK+4HT9*?NlIv?Ge
z2vPuQY%L!9L0qvb{(JA~i*0VFZVCx^TR?t9^}U1W^iXzdSJCA1OPx_<Wq$1u3{iwn
zK0Cj3{W=7SRK#^(viwpvrMjUP3Ho|A5ih=`n*GGW#VzacG=a?W@FESt3_A@M5ennP
z_1Mownw?Oi5SV_z4TlV~1_|l5z)H-Z(RjPfJrp77M%QxrG8Q2)BM%g@Yier&srk3k
zS5#I*m@bFf{`{Kydjc)*!*`BmQ_^hR1XbaxZll}M_l;7J>>vaO`xsIL4bs8L?8}ii
z`W(ZrH68|^u8B`4Uys<>!V17-Rg?ywnw=unOBd1H*}0ZSkr5bTci*7eM93}?3f%xk
zScf=Pt`qwO;s-HWJhcmM-{<jolu_anm$D#=LTfssyn(^?br^Fb#NQd7z_8p6Hv&97
zASN}22O7H?DDeo$LTn#eZYSPed}cKG3InX8#kv3!@L*ar?OBHcESZIwrns6XltVTn
zff3$&@lBrl9hoD*tP$(H&4OjdoqUROgrlwXoYNd~aM$j739j7|z>_(|(*f@L!4Np(
z2j(jJYTzKL`Sl%H_5;9)i}$|xQK$Uxu<=c<F<0L7Tqmh7IXzn^UuYBdpnA<DtG&M=
zySbiYG8(ma^OoRs5L&=5Z1G;>vCXB7_ai1JXD}|=9d!kv_;RYm%FnjIKCX~;@bVoc
z(YD>ep{+URW4$aBZESKYopNnd=f_j-cxyQpam`UM;?b}1CGEw>xGGlTpHZPR;_>5L
zMt2+xR*pa0y4C6w4y?|a*M{f-H_sIM`x5u(Ra<X9GjllhkoPzq>iy<AJ`r4SmKVcL
z0xSrY3tr>9Y6agpGs6m!#1&NUEKIt9gazCU(*=@rnnwmN_7&v1>ZJT0Jk}pHSCWk4
zXa8g9HLlPJ?YFUX-ihfC;PTsq{+}bg{(BM6TP3AeL*W@XB^Pm%zTRCwo!uo1lh2F*
zH`X>)1-9bD**-2?oSxd#;3k31W%;<eC)BM@io{?pt_^)<shL`7Qj$o<?$&*`Id&nW
zzSl3t<JrxE5>#nEQhG09p_zellbm|sC*gsAcYLSW<T561+w2f(f^$#E3+FWlhwDZo
zYA)r}{JoTl{=JOxGQDH4<v0(EUbxk2+?b~atJM9Sn~d&Vm<R)B&%(IJh12VsfpKat
zqghvRIE_unJd2H0OCAZdP8JOswT{<aSDok0ikqSXX2ETUBiElIw%hJAhJfq&K>|K@
z(|kuk5e~0u&qa^tEiVU|0lC2RTJCt+MI!7S$StwxM|qSgFzc5^8Tn>O8~4feF}Kxj
z$F1-y;+}43*KP-wWL@W5qej~Z3IeRON%U$X=(41U)2+`I%;faN8RuUVscvB7!dTey
zHnpb77S<I)M*^`ovmA(FdFXf~G9tdP_f7*D9oW^;3<5FODLVL&RP0WH;IL5B4b;1Y
zQ+OIX4=%3K8d7izi0MiOLDfn|g07r)wYa6eAbg}>0Oc!duYoZmeDkC@EC-j;{5m!8
zt?KKbUBY(pwRIqD4DP3(FJZ?ncg1S>A!A!%?$4_vMQXE2er3JF4iHV2$q4HRRb3H^
zL~?Nr97LySUbSrW*?Eo!J9LX027b{%u5y*THhzNgcRWFMPNkCw-W1Z>IY$EE#eiCL
zf1$jN?l*Wb-o1tLo#s|;q{Ko$Qfyv1z>)|Sm;sh|Gx<u$T+5eGgm%$15b>eMo7PZ-
zCO+%H=o0}af$k8Sl$zov7}qyjyir{w7mo?^%*BiNvM?XICh&+WmJ#IwKgpIYV@|FG
z@jHuGVcG9V$1G|reMWlS&&N2_i--+wVz759lg<ck836z9XIPVxP=d0w2I#sgxvGuQ
ziN|K@5(>CIn5yA%4U-ogiM<_wIiZ31c3dTzC4mt|M*BPRC8!j)gIm+AjYyHw+R&||
z@_q1VHby8Y|I4wuf+?ltf+;Y>vtRw|oF!H4XXe*~lYvN^3InHe$-FvNUW+;!J?~zA
zm*fPuS((e76z>D_XkGrY=yzYMp2iEZt9HUrcX6FoZ{k3lo%1e}DSg(M9DXG@TN=}(
zm8ows!N8mff0fJQxRVvsNRpA`>1N+<F%{FFgZZv7`X`rtr=cXu0}j#*sg2F3=)eAp
z<sd%VA2?|{=5UFfDjk%)OYlAKCj!Lx7L=}VayJJ<!KDtP#9IG*4ki20GTKt@3L2`E
z4q?6l7s|_^Rq&n;k5~(Kc5wV96|JTHv)yCq?w>C(bT0M(7rm6}gU9<xs(yd-oA&O2
zpl+;yJ;`S&?lgxrLMUXv#wUiTp5FV%5PQ1v`=TM2^A3E{!6>)KU`WQ81HL1Af@ZI@
z@dL1-f2}jU2q*+f`sNWgHELvBA<n@#LU2o>Q$Yp8Us~ol%0v`kv25@<ReR&>DnQv<
zq6s#kE5&O<VEcwp3zPXWK!CamjnZo(+M2iN$;V;plvs}eV>xVKxqAS5+RfR_A5jG8
zy&N5vbquF@EW6lQ_(@-Fx1spmGLMC<OBy;1@oM?tx7ub1{jBWJGJcMZo{K^hc0rwJ
zK;bq>={wK;I%&Yz;f0iwnBU3AK-pJ-<hM1dUzn6zyG|MUGmZR+>l7T6jz5ThT5_Xg
z4xx3T8;^TU<e1nT;_yu+H;Q|{3yPnI8Jy=MjII~<0Ysr~a*)o<qq+pKyjGsa71hvU
zrrlLz|7scsw$FD_FT8y*U^C2%ycKRDJ|Clrp!siZ)E~e5<mQ{;ZAdZ3s+5Pg*F*PT
zTEU8{b4lq^L9RzKE%3dO%k8bk5w~-H1Grj(#MC4*;2mAND{9#ddasGI-yA@{!z3Ec
zNrT5-iThGd;+)vFIt2sUMe!2>6dP$lKfY0TlY?7yho$_cFwD{M^I@%>*3Im4QS(p}
z=L;D?Hqw@(@pt`IhJi&?%*-M1m1^pEfm;&gGy7C|+oClrf%AjOC+GQS!K#irE{1u&
z*d%^mdq*?lgU$!8079JNl8=uUwQIcv7ZC5*7TmpGfLLZ^to1fBzUWaK{ZXgvp%?v1
z^$Ab5rDcohyNw^C(yA)jj4*TJJk9$L;%Xr9C$Q?r`zOiZN*SJXAEVqp9oqAO0hY-^
zoV1cqEEEocc=56|@D@TzDBv+W3J{jic*aMstEJSxJ6N<|q#jswA)S5+7;?IZDbwWA
zaw~1)B>ZH1Qs+I|qWlX+vs?IOsr^P`!&;3W7;nrwG6W&b#a=ki!{mK#Tztwx19Smk
zyx9pcGhX>93nyrLs41XtXcCH8qJS3(NTiUjF?!!idW*gQi+ujmZ+5d`+`%|7BX`xg
z-u<7PsL7|RQ$>qk(0-5WIv*cQouQXccwgHlz5^v|D4AzG$6U!p@EI|ywKdc$6^E?<
zr$OV&ENkj{KJV?N^+=nJ-*(jnKB3d&t8+RS3xCZSYpTAWs@a2yv&~U0ACZ^NJ5HrA
z>+H5&<0Wq~lcxtMbErI7)ZK$3%{(Tg_hSk96F(Q{^EWhm3Emc@;|e(5sC~+7*7J_R
z#t;D94^f5BN_GG?!I8RGg`uIL3C0BNipNa<7^D6dR}72>|F)z3dodA=2LJC-$NvYU
zvutvVI^57@_grNu+xx-K!2GXJKLi~DRvTBZaA-%!g3+pf*#1wKWL`u-vS+ZOUb*q>
zodD1zvN_iH?U(`9GA+=vjFl6BH3hik>Gt?<S+-Wtu{2h04`GVQ27xc3-?D(Z`}_5O
zz;asPE=R}2I-QSy)b5=w=KTbOk3!@+qob}z7{aI<*mztG3Et>O7C>+(dvBl_Zx^$l
zG#gmCZmI_8Y_IqRGIzK6>2Ze~1>iX6(ho<cnH*W5RErsL8Jld8<~d`DwJgWDz*XlB
zmJrE#u^d}{&#z3u8BQrM$okvqIybN(z{HvA)*_&gc%P{LbdDgC$9?9Rlhd;q+MHw=
zlykMb%Ks9S{Ej^Eg~8}Bg4faR+aIIDTbD5<Bn&R4<)SM|KQ^G(KdwQM$bbo`J<yC&
zQPv=5#k{7hVMJSVvvxg$K4NF!rVus$i-ra<GbvXw@=G37D0up@M8nnH9l94VGm+%m
z*g#(KHlp^&qJTCgIy%<)$Cnqsd%JYgc~Sva_RhQ<NUW-w1}{>Z{8LB%anm5Jlanh~
zLO@)sbZ+Gv8N7)9v_s~0k7Jx)BRSSHmsdVwp%tjv$mNvjwT-Ya)Z8&C<;whv2GRGV
zv4lT(g3u~PP1yBcdf`RK{3L8TuAQbNznPzmnQdNqX&;uh0Y3Ex=7zMYX{CBIJzZnl
z@bGu4ME*+y?Qf?IP<il~&7E=b6|@Fw>iwZVFqmxj10CqAJgiE?B_qaPAXIl4lLg^s
zC@kK;l$?4J{pq@5$I2g&&mRe&Vru3ZgvA#xg1tStCZJod-01kDtb@&$aaQCi2eKb@
zis7OrSn9+OOSQN{PkL$J+$Es&S$=vef)rXQea|J+qd75DhnKp*PN&Uq*&;tG8tFP;
zZF`f8m7=$H{|#illf3fCsCNMj?}fk}HR-zeP%ji$;24G`U6CGB(gv@b=ZmdlexD=@
z0M%0Y4}k+9PO3$xUG7pXcZwC%>r{oO1hQeKqy|(0kzA%q__}YOM!3Hn>b)>n{8<`x
z|0|1NTMHAIFaBC+d`FIKdBvN?+JvG!spf-=;Fqec&h9P(C=;J7uPpMOu>R(g^LNaZ
zlFz0s2e|IdlyXnOtfb-Tl5!ULXyA2*y&!N2)wgkG0%iNhU5re49IXgETR(R8e#8n~
zW2}SeL73nvrBVjI(_}AKdUgb-I$35+8w17%OBqq4#s9%hoBu&B`FHqslM)B4GO!DT
zjIPR|ZtIk4=YRoXIdppergWCg=71Za>G(9lr!id618im%9NfS?kFk493V<=IyANH)
zIvb26m@Wdx=MwaEn@!)POkT-P$cc_mnG8;fj{(i(c;zkCO(Bjx6kR7SOUK2EU4MWY
zINr>e@(u;7j>jDc85IW>Cg!Fw;r&!#GFQ!$KV;<Nu+<$VK)2(NYhm%FpC5O6AJ@=6
zj&{A~d1!`3=vcMtc>QmQ_Vaf#xM2Y>pd-H~7+#N#1{Q8!Y_H1eVoH-H`S3^zgIHf+
zo>0=UHLWm5Twj<#N|#|+?kjL4*F5I4h>w45D<{o_2T)1Ow!fY^H3GU3<&^=&f=%BE
zIz0#6EpctlD~-kKJKARx>{M(X?}Ouz?Q*XS?eH3DX;v;F*@rF#g$9es$OI-L84C(A
z<_|C~!2M!Ui~30Crf;7Kq0dO)YB1Ti0|n=!xmjaN3*0QJy`#HykqH_bu--jKLtjBv
z&4HA9QQu3W=mbEW8QB*JM%=DX0dqc~&Q#8+w}q8|G?MWS)2nyY{CDpbL$;B9dVO3#
z0Cnpd{EL-1QHgKEs3~nW!mkkQu>o1cGx81nJ${1&>TU{nY`9n(&b}UMg|oLVeW=g8
zdg-vDiJz{Y>C%DQqpXO`8t+S3xY#8=zR~eNi14qxou+JOxosDm_3=W5-qCX)nIet$
z&S^oPhxX#d@B6rXvg?0>sNG{+5Ox&{xuKXesb*=YCSMtQ6!F39`(qaE5>nf`Bf)@!
zHU9NS9b5Bp_r3gKtv7u?hrDkku~=abW4(x-91wt@Ry{)I9jzuw>Xu<<t_hi3mFiS*
z{n;nII_3a0wR*3Ch{WStbw|7WsEMydLlL->18S59aO46{n2c<TdN1nEWdnQJvQ@ak
zhj9=rS_&6qAFYFD8MTe-1C0mM`>PCq&U0YwC~?{0UVa?&njznUPMaOfA7qm#fc&?e
zYr@lA<=|L_Az(*FgLLG`y@e&pESK0=DsXKxjBaZSTSeV93fT!mT+Zc8`S)OWTt&;~
zAe|22mgvqoXG;SZQ?Cdj;O$+q1!>%Kq1jx?r!}kZ!zQ9-5$K94Np;?;oope;+X>@F
z+}hB@txjB(p)|Y*HY(JR&-*;t_Qa7}cc%PMDd(5Lujfo{&r;1EM^$(!{tynt`!{Jk
zQ$FOMEq$3?{|#S4`Y0+<riun4dL=YO%ibTeR2&|C;S`YQDSb@ME<o7V$Fck6e!!=^
z;zf>E0n)HgK>E0R1rwBDnrpMGZaiD--#BRmu?-9iP#(SRuHr>ZSSPO}TS)nwvOqS`
z)wJm~QA1%DMh>j&cbh56VC{6~`8tdg8V$Ke|JEvOY;1P*8KF|^Jh1Ma?d<A>r1oHt
zYh!n}tnC@xE-gFZMumq*1KS<B@%j?&w!(y0a|9H37Ii=Wm|2{F&bWP;g4-3R`8kg=
z+;GGa&wn4MJU0C>cGgwO?@?UA+nsuF0?rZ;^yN_gNXA2VQC7A;N=Wgy&i2v<GfCm#
zcAaxxw`1r3Rot0JQ@!_lyiU$3L*^5iQpQ4TL*`9Lna5Bvr6QSUZNmu_p~w)KLZOgM
z+mI;3F-E3MhAr8MO=jD?pU+QE&$`cD>#qCU`<%P(fA(7b*!wqpzn{<h^M1d+oI}Ak
z1xQx`S6HT9YkN#x()gKmw>`ku#ybwM%GwCu=t#5e>~mw5+V74Bc^<KkCcp8YI0dsS
znCn%U2&)_0VAr>_&B{IGA3V^4rPXx>%e3|eby8w%MUk}IslaEa?-BhEW&T^G00Spe
zh{seqWKSO{tx7WH3zgsRKtns)OUiy>_vJNa9v*jxw0OC*6>|hmc&CygSmrdUfATs^
z)RXvdcY7Llw3=4r*LZ5G=_g9~A3*A#MxZ|-!`1LU&=cKeOkh-ZA}5PdqJTU{^BEUE
zx)5wnAuQ}lRo-gfaFIC-8kEN278AN=&`NqoUVT~te_Dpu&Q&#jfv>xE>5L?X@Ge_9
zw|YQS(thhHUp>Q)#|<sv{FwEo)t1==M5jl;*A(P97s<B1C|`X(Od&<k8Nq>v=XQ|!
zJi`_?t<;gpa+Bz!2v-3Lm~by#8eopLmmUw@hV3QFjy(V44-M1u)xgCu7icHQ!?>*$
z4E)@Z!rechrmSB3GZ`AQsPZ5GJKk1{$0dIuIK2~bQ4E=-opK>Ml|3B8<8gV35|mJf
zhgDLRY27&S?4(Zhgk5mwfaJ9(w_U9DRGJr9mXTZaa851QE13H_e>)iR45^kEdJ54j
z@H_8nB`hTMj0PjNT^M2S76Fu?d_uC)$NlmkM)taooM;Xn?NK~o+(j*$DjR96rK?mz
z;GN@P+&;*__OpWcfjjy)9KX|$bK6w^5m5W<xpIJkBe1iFKYwb-+aG_^fII@7`+DPf
zutW`tt2H!TRnw&UGm?`i#+jOWK}So~AE;x=(MhRMx!>$TTto>SI30)vk1lt4?xEfN
z`;pdiIkdVOhAb4T>P}L~hG6e}vuNt<OJt9XOSHIRfwZ@ia+l9X1;4$2{~Z_H^?<^h
zCoD7UY_QleMii!}E1CkbqtwTfn_G<jI_&1JZC4Ue3sf}97eJ5*n$~Z6gNasBF=CAA
z(x0G+U!bDC7Zkj#_+Chy8QsMIV6&$=lJ7?kpT2Y);<>$?r|+P{@=keDkqN9c%gc=Y
zA44G~L9@67oPra};$pq^oMl;;yX3O#!w%(5S$Hg`h+ih6fE1>!P^!}wdLTJQ|I)Qu
zcdOR2;i})uvWCQ<3hh9<b=YwEZBuF|p*geU|AZ&ZgGE?b)u;q2SXq}NbspVduzUa5
z>(;D|4H!|ubDqveLgxMpMdR2MXyW9*S5cq+@BxSQz0&TLQZvPckW11IazTILGwM^T
z=(<FhMgr>rE=yqyGVa@ubafk5r9;wsm;xbR805X@dBi<hwKP;50Dd6fh1skNC!5cG
zj$8!qvTD>|m}p0R>95Pceezu|YBCraj@<5u`MOlbG5mmGbz*JgAWE7$tE?j-KQ1Y{
z=X2y)jhcnzm^o&IbEq3$928ON&l!)4xBV363va*kv@phuz)K8DX0$T-!zE}+KJ&n4
zeo8Oh$jV60Y?~KgN{qVVQ1+JZkTl{@u!N~dJvd8dhg2Ret=8`T`j#}`^h)X$Y~#ns
zOD;0rU%k!s4V~+%#_vbtuJC|-rK_|lrjErTW2nY{%aZewT1k}iyWg7Dm`z}rV5!~x
z1WO9`ZX9uawJCj|ez$i{7Op_oJh~+3(=PDhVCXc1>AToQ3fyLb!t||ZZE+#mYUevU
zEE*Aowb(*@hm|klE%ej>J<EKS=^TQ{a~@;y#9EKem4_p)-JlKA_RAIfoQOx@Bls{c
zwx@+YJ=X(r+vbZB1G#`|xU<#H$FFFeVDNg@brNy6REtxa-@TqJw(*S?^Cb(+-o)3c
zhdOnkQ!x;;VSfE~k&WX_vz_kb-Z097*qK7rguP1U6o4GndaPHlG^R@>@^Y=&m70Xw
zdWoFP>mH7>uvPN(iw2v&<P#W!(gGbT;VAp^LpXpo5heH>;dunaHUO?Ludi$RkFXyQ
z&N5T}2PXf2zxDW66AaHgTTcq|?FTf(8Y>Owr7ZR_cJ@2bu%q;gS5YqBJPCDlW4^2h
z#|;>Ed)peilz}E9A)=|N6_c1EJjs78j=2sJp<od)?(I9#JIDpElKoMs*0h7*AIVKp
zPD_JSdtF0I?AXlY4Hsx)Q8!zeeSLkck#zU#tXnh2n6$o7jaYC-ZCWY?1rg`P;33dX
zMb~p3&BP0zo{-D_m=X>#SV^fz@Zy2>FE~g|Fq68tVMptHf)~xE+P#eTa<tDUh>0l7
z8OHo^0R>wM_ohhC;2SW_;&xR_sR)F?y<!_wA`rN5x$vh>peia~toP(*6K#_<dCXg>
zea1BfwB3sJtuP{tb<HIQ+EgKS@8ii@;|E2C=FMDXFe0v9)8?ETGC6_eWtrxI_oQvD
zFW25#`TUgGuvh0hT927_MX=IkD%$J9G~trUtIFK7Zidb(#L^aQN+T>Zvo9iIVv<fm
z%>Hn4IrniCP`L&w?Ftq3Gyv4)OE~n4uxy%~2bh&P(19GsGcshHEk1x|nI)jhEiT7a
z&V@#-gQP$Cc&+>KVDB7n;Pg4cC-r+W))7Wu0(z<HCf*Mf*Uq(i4ZCfg=DlV&Xjj(_
z5s4M-#juW#4WUPS5cDPh!61oeebtoZdj%K-Wc~Ur1fBEC?oVn!NujCA_k(KIj=LQ?
zm7U>4q{?Y#>!Eluuy0+b!;^~CD(KsI?*Uw+s-=|&hZstP7%UQ_Tmq(kxO%;`Zd?Tq
z{;avT+?fE92D6vfvzvXtfM7Kdm4mXKFmxO#>Pb6D5S&T4=Wt5Mv=E_p9QoKaq-))=
zy#f8opl$hyvX_@(W;eXiteSAe91(@Lg$gsPGYugGKXp*4G-p4u=#Xbju^fHKJoA|U
ziaD`WdL*rlcR~)mO<*ESPL12uZUgXg8@Gw?UUIMAsy_y73cZ?jMTdgP?WJ3Y*wzU|
zEwB<S_Li`~V^>vRhQ|lPY$AN?39Z!I0&8vRwKN$dF;OK_Y2RYFMNme(;I1>pjk%UG
zI?&$2_*<o=6woFrZB>oU(;VR<II{P>hpD%|n8y9l9B^27Oj?3{?A%dsuZu0O6s#!-
z0Hu3ld`RX$bT9qu5(2m0ie9Q2PRkXB&J-mR(Ab@~+5XaL+>j7;3nu4sr2G&ojm&k0
zRtoax$}qtW5F39XpAFsjGerc5=D!}x{7-7`|0Yif8ApLt?;#+Mg7Mu%t1C{Ig1_mw
zxTq2I*?LDKdaF4=^|NW5>7y<2?>n^>HSD*s0a?F3hhAL2`A3vd$6>+TykOI7iBOi?
zJG!VEdzQk;gkSq9!ZQb{@*~sI;woI8$&PeL-Pg|{H46v{6fQ0<!k3HHK!P)?9!P7F
zw8Q&De~QoL<lu;$-op+GTUNDm$%-RfOfTUwHjJmEuv}9Vl<Qii`N|g@C4tZkbvcYC
z4}8rsFt7qRYSU7DUi9_UVX$~0NoH2<j~~_J`g~s5X?s5SHcW%bj$PT(4m;g^FMU-S
zxPM<;mrydzOqOL3Qhv-3nX%upIdyoFIee9mb%uD;Nvou$U_mePgY?g6u!Dfh6{*o0
zw(hRd+jsb~X?Xh+pr(_eHZU`~8&lvUb1l_Rw)I2S`K}wOUVsS<uT02l5_7Jjh>ME{
zr19!_lUeY?CrF&nuCp#I&KT4WVT&S0l!*tF@;Ik~Fl#MmXz{VB@?hArL)P9RTM`$=
zZ@=1f?jC@Gtsft-_69BG4Qs~i=h|akD3n>PTP$cim@g*0W~L7{J7BOz{-AoH?CG7y
z`Ef+ZMu=VQF`E653JVL^&Bl^W+NX<ILfi@h{h67?-CQ`q31G1ysR<ns5gBWdwxdsn
zF2@+BAUTg-g<O523l#9s+wgG99yy#hC4X)$k$;8GGlZ1Loa34oS*NVz6L&gE_;q?{
z)u9cXSO0)Q4h(qTjKw!0JlIl4-&INkUjKvHm))P<urd)*1o^bBwR>6qi>t@xCbK*6
z&jFc7UKiN%Zh#O|!*FQdd0OtWB28FVQvAOKN=7*{fY9n8T61kQ0fbh(KMB*^-U^Kr
zeydRwym=1h(7mRUH&ucEIo5wH2w8R&azR7eCQQ|WYJcqVXr8VveiKiEQuwLrMI+Nk
zES7#igz;b6U|Dk-@z~=xt~s?Bd}J;62|;S%lA$Ge^{b>s9+(#A4n5V(KPcP+_GmAf
z*O$uih;Qa<Dz{YoETZx86x>La$Odk$zN(QCq}tm<DG!#v=aV`VgCVA#unt5IGlihG
zk^n2h*Xz_ybp9fdBUJZyLBxNu^zT@arkZ#5Y4SSolULS~dIl7UG582Bz|<eS(xcJ<
zY&MG1r$aIBf|KuDHLuiAz?$;gPr5+3`@e41|L++EJ66#~76<m$E3(C3v9jbOe!XHW
zn&?1B?lyR5&88x8p1Y<RR(tS2JWMIR+KcNOlF!M4AS^W)8wG_5={*$Re!}6MrKM&2
zjk?Y-1}+*P!GLyxMwp&9GQ5E>2|dG6pv&a?se$$erOGK{!~3?Op}`~aPzw(pW(R5^
z`x&xnh<ltDyYy!sB_x-+r-%7zYbEUTYdp>Z_fsF&C2^l!WU~VnCojLi(DHM53<0di
z5EJ)D>7)Y?jlqS-reX+TJTzQ$;Q3NVM4*cM9KNf(wqrS!A1h>s`6CmXvER+_5`4qs
z8O4Xr;zpHQwSk-c?IK;X{mXMoN-P!9jk1lO>gZQU-e>XcUGNa7bU$7wAigg0(=Foj
z!%D)=)NNQ||65(LA-$CWX#5T=oodd0lruBzZ*Fy3-v3;Qp<_!<p1TtQuXfJ4w_doX
zR%U{e{KnYwfsj^6)2K|vwE(^VVU-R4N=<+CcMz9zZM^Q>fiQL5;41s9%^e5N@HnDb
z?M<d_5m~3fDtPi8hpQVCdY-QC5fv=?NY{l-V6G-IgyrEs-NE$~kfOaDuXwd|C2n*g
zT>yoMxNH~N<H?#R%xP$KG}IvEgv9#1MNgeP3;9V{wjX`~jt2m8R8pVah-bFx$d=MI
z=r(qdzBN5>tD<Skx6PO`zdy-omG#Ki5C)@}loQ1$x&!{h(K3~6l(Bh{t`T%~(lt9G
zdubynTvBfpOXnbD1@eP7P+LoHY&O6mm@%oYV}Wf(Z*6bqDao`0WwRA6bI~@&oiqIs
zhk`o=xUU&Jm`peaZ_x?G2BuRAa8>jl$p>X54ZM<Z3GpxxDXiT&BQB0O8;<SF1~drp
ze(|X#!!ZEcE1<?hZC1Pe@jO&&Q?;8?#U37u_366+as$e1<BJ#T--qUbC}l-nvjvFE
z*RK(xP^COV@1zFwEe^!$!H3bnHCavb<WH5D)V+S%)e^(x&7}xH6sSw&g2q&4<z0cg
zzTPf(&$PgT=m{>ok)ryVvh|iHR&tGLAqQiWZaiNnp)vx7H1iG`!<dVS))9M}%Dw1&
z{Q1<Yt>G36+EnLv!>7aWJ=a3D+*oTCQ-P=MKS^NSlUE@~?UgP&hp!!l6-vhaE5(C9
zA2E{A{42XjJ-WQc%z)4VE8#sKO<pR!&Lim>8Z<&LQg-=&+V6hgp8rp0x-EbavbWEz
zFnj=;#H_S5ViXi%w+i^>ZmfEi+k^tVjnk_FsDMUBM?7`}eSvc#qL3WMiCMD-hXULe
zNSDP1=zmO;*K5}?3n1)6c=?bgYq`&*4D=w_O%djmw?{l|65DuE^xmMQniOGm;`on2
z3*6XU`j<Cvt}l%1fdmGG=p%cK-%iBo-AhV{iyteP4*(aH|H|7=eIwKRKlh177F<h<
zU<w97XuX+`dv$bt&6%DAZ^PkVH#m8Uw$=^cb*Onms0Qi+hm0?hK!Y$@6%r=9%a~lE
zsO<KonSt38`yl8T6JYSLx0PQY2e5oGX^5YM#}bZAp(`XDB8;LrVb_Raz07g@oa~EB
zP2(41dgSM;L!Le3hI3)M?tVtj3c0wP<M1+N=q_}804Rm>!_O%CqG1@srMEb*KP>hm
zo`Je73x>GxB|0BDI+Y|R$0h6Ix+z90E&y@i6WT@r2~Q(4sg`WZE2pmWBg2XPh?_0g
zfhx_?a^Tty!UzI8`WL7|YT5Omn3kA)_QhJb1cwlP<_}ZTm9xz<!rgMNR|jLTH7=zR
zRF!w;`tR1uZw*SK`fX|}&Y<vtcN0nxVae6iU17%D9NL+I((8FU&aK&#G4TIzfCQoS
zGuN|kI&}JyWu@G{YQgaIAFssHfa#93Z?zjgX7d3`_O);PG%IeZ3%SA-QVpzb9=X)J
z8|I_*ob)!UvHg~qx&P|aA-A!*yR~ffi<!ttL)#-6KMouc+&hbi_Mgros&H_>JrQpj
zotSuth2@D*rpZHkm1eTe^BVG~n!5Ve7f&oV^H3b3aBNaKc_V#&Ca-Xzk;N*^e`mFS
zeL0=1edsuWEcmKZYsU(L%G>+Ni^s@|$muM)ZF`43<&z8kKTdi2PbW<k{j#eRLV2B|
Vyggq8j>g)jaZXq5wTgA{-vB9vLva8A

literal 0
HcmV?d00001

diff --git a/docs/source/index.rst b/docs/source/index.rst
index 8288f49069..8529712f32 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -60,6 +60,7 @@ The library currently contains PyTorch and Tensorflow implementations, pre-train
     installation
     quickstart
     glossary
+    summary
     pretrained_models
     usage
     model_sharing
diff --git a/docs/source/summary.rst b/docs/source/summary.rst
new file mode 100644
index 0000000000..94c7752fbb
--- /dev/null
+++ b/docs/source/summary.rst
@@ -0,0 +1,492 @@
+Summary of the models
+================================================
+
+This is a summary of the models available in the transformers library. It assumes you’re familiar with the original 
+`transformer model <https://arxiv.org/abs/1706.03762>`_. For a gentle introduction, check the `annotated transformer 
+<http://nlp.seas.harvard.edu/2018/04/03/attention.html>`_. Here we focus on the high-level differences between the
+models. You can read about each one in more detail in its respective documentation. Also check out the 
+:doc:`pretrained model page </pretrained_models>` to see the checkpoints available for each type of model.
+
+Each one of the models in the library falls into one of the following categories:
+
+  * :ref:`autoregressive-models`
+  * :ref:`autoencoding-models`
+  * :ref:`seq-to-seq-models`
+  * :ref:`multimodal-models`
+
+Autoregressive models are pretrained on the classic language modeling task: guess the next token having read all the 
+previous ones. They correspond to the decoder of the original transformer model, and a mask is used on top of the full 
+sentence so that the attention heads can only see what was before in the text, and not what’s after. Although those
+models can be fine-tuned and achieve great results on many tasks, the most natural application is text generation. 
+A typical example of such models is GPT.
+
+Autoencoding models are pretrained by corrupting the input tokens in some way and trying to reconstruct the original 
+sentence. They correspond to the encoder of the original transformer model in the sense that they get access to the 
+full inputs without any mask. Those models usually build a bidirectional representation of the whole sentence. They can 
+be fine-tuned and achieve great results on many tasks such as text generation, but their most natural application is 
+sentence classification or token classification. A typical example of such models is BERT.
+
+Note that the only difference between autoregressive models and autoencoding models is in the way the model is 
+pretrained. Therefore, the same architecture can be used for both autoregressive and autoencoding models. When a given
+model has been used for both types of pretraining, we have put it in the category corresponding to the article in which
+it was first introduced.
+
+Sequence-to-sequence models use both the encoder and the decoder of the original transformer, either for translation 
+tasks or by transforming other tasks to sequence-to-sequence problems. They can be fine-tuned to many tasks but their 
+most natural applications are translation, summarization and question answering. The original transformer model is an 
+example of such a model (only for translation); T5 is an example that can be fine-tuned on other tasks.
+
+Multimodal models mix text inputs with other kinds (like image) and are more specific to a given task.
+
+.. _autoregressive-models:
+
+Autoregressive models
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+As mentioned before, these models rely on the decoder part of the original transformer and use an attention mask so 
+that at each position, the model can only look at the tokens before it in the attention heads.
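+
+As a sketch in our own notation (the symbols :math:`Q`, :math:`K`, :math:`V`, :math:`M` and :math:`d_{k}` are
+introduced here for illustration), such a mask can be written as a matrix :math:`M` added to the attention scores
+before the softmax, following the scaled dot-product attention of the original transformer paper:
+
+.. math::
+
+    \text{Attention}(Q, K, V) = \text{softmax}\left(\frac{QK^{T}}{\sqrt{d_{k}}} + M\right) V,
+    \qquad
+    M_{ij} = \begin{cases} 0 & \text{if } j \leq i \\ -\infty & \text{if } j > i \end{cases}
+
+Position :math:`i` then gets a zero attention weight on every position :math:`j > i`.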
+
+Original GPT
+----------------------------------------------
+
+`Improving Language Understanding by Generative Pre-Training <https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf>`_, 
+Alec Radford et al.
+
+The first autoregressive model based on the transformer architecture, pretrained on the Book Corpus dataset.
+
+The library provides versions of the model for language modeling and multitask language modeling/multiple choice 
+classification.
+
+More information in this :doc:`model documentation </model_doc/gpt>`.
+
+GPT-2
+----------------------------------------------
+
+`Language Models are Unsupervised Multitask Learners <https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf>`_, 
+Alec Radford et al.
+
+A bigger and better version of GPT, pretrained on WebText (web pages from outgoing links on Reddit with at least 3
+karma).
+
+The library provides versions of the model for language modeling and multitask language modeling/multiple choice 
+classification.
+
+More information in this :doc:`model documentation </model_doc/gpt2>`.
+
+CTRL
+----------------------------------------------
+
+`CTRL: A Conditional Transformer Language Model for Controllable Generation <https://arxiv.org/abs/1909.05858>`_, 
+Nitish Shirish Keskar et al.
+
+Same as the GPT model but adds the idea of control codes. Text is generated from a prompt (which can be empty) and one
+(or several) of those control codes, which are then used to influence the text generation: generate in the style of a
+Wikipedia article, a book or a movie review.
+
+The library provides a version of the model for language modeling only.
+
+More information in this :doc:`model documentation </model_doc/ctrl>`.
+
+Transformer-XL
+----------------------------------------------
+
+`Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context <https://arxiv.org/abs/1901.02860>`_, 
+Zihang Dai et al.
+
+Same as a regular GPT model, but introduces a recurrence mechanism for two consecutive segments (similar to a regular
+RNN with two consecutive inputs). In this context, a segment is a number of consecutive tokens (for instance 512) that
+may span across multiple documents, and segments are fed in order to the model.
+
+Basically, the hidden states of the previous segment are concatenated to the current input to compute the attention 
+scores. This allows the model to pay attention to information that was in the previous segment as well as the current 
+one. By stacking multiple attention layers, the receptive field can be increased to multiple previous segments.
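+
+As a sketch in the spirit of the Transformer-XL paper (the symbols below are shorthand introduced for this
+illustration), write :math:`h_{\tau}^{n}` for the hidden states of segment :math:`\tau` at layer :math:`n`,
+:math:`\text{SG}` for a stop-gradient and :math:`[\cdot \, ; \cdot]` for concatenation along the sequence dimension.
+The queries come from the current segment only, while the keys and values also see the cached previous segment:
+
+.. math::
+
+    \tilde{h}_{\tau+1}^{n-1} = \left[\text{SG}\left(h_{\tau}^{n-1}\right) ; h_{\tau+1}^{n-1}\right]
+
+    q_{\tau+1}^{n} = h_{\tau+1}^{n-1} W_{q}^{T}, \qquad
+    k_{\tau+1}^{n} = \tilde{h}_{\tau+1}^{n-1} W_{k}^{T}, \qquad
+    v_{\tau+1}^{n} = \tilde{h}_{\tau+1}^{n-1} W_{v}^{T}
+
+Because layer :math:`n` of segment :math:`\tau + 1` reads layer :math:`n - 1` of segment :math:`\tau`, each extra
+layer can reach one segment further back.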
+
+This changes the positional embeddings to relative positional embeddings (as the regular positional embeddings would
+give the same results in the current input and the current hidden state at a given position) and requires some
+adjustments in the way attention scores are computed.
+
+The library provides a version of the model for language modeling only.
+
+More information in this :doc:`model documentation </model_doc/transformerxl>`.
+
+.. _reformer:
+
+Reformer
+----------------------------------------------
+
+`Reformer: The Efficient Transformer <https://arxiv.org/abs/2001.04451>`_,
+Nikita Kitaev et al.
+
+An autoregressive transformer model with lots of tricks to reduce memory footprint and compute time. Those tricks 
+include:
+
+  * Use :ref:`Axial position encoding <axial-pos-encoding>` (see below for more details). It’s a mechanism to avoid 
+    having a huge positional encoding matrix (when the sequence length is very big) by factorizing it in smaller 
+    matrices.
+  * Replace traditional attention by :ref:`LSH (locality-sensitive hashing) attention <lsh-attention>` (see below for
+    more details). It's a technique to avoid computing the full query-key product in the attention layers.
+  * Avoid storing the intermediate results of each layer by using reversible transformer layers to obtain them during 
+    the backward pass (subtracting the residuals from the input of the next layer gives them back) or recomputing them 
+    for results inside a given layer (less efficient than storing them but saves memory).
+  * Compute the feedforward operations by chunks and not on the whole batch.
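+
+The reversible-layer trick in the list above can be sketched as follows, using the formulation of reversible residual
+networks that the Reformer paper builds on. The names :math:`x_{1}`, :math:`x_{2}`, :math:`y_{1}`, :math:`y_{2}`,
+:math:`F` (attention) and :math:`G` (feed-forward) are notation introduced for this illustration. The layer input is
+split in two halves and the forward pass computes:
+
+.. math::
+
+    y_{1} = x_{1} + F(x_{2}), \qquad y_{2} = x_{2} + G(y_{1})
+
+During the backward pass, the inputs are recovered from the outputs by subtracting the residuals:
+
+.. math::
+
+    x_{2} = y_{2} - G(y_{1}), \qquad x_{1} = y_{1} - F(x_{2})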
+
+With those tricks, the model can be fed much longer sequences than traditional transformer autoregressive models.
+
+**Note:** This model could very well be used in an autoencoding setting; there is no checkpoint for such a
+pretraining yet, though.
+
+The library provides a version of the model for language modeling only.
+
+More information in this :doc:`model documentation </model_doc/reformer>`.
+
+XLNet
+----------------------------------------------
+
+`XLNet: Generalized Autoregressive Pretraining for Language Understanding <https://arxiv.org/abs/1906.08237>`_,
+Zhilin Yang et al.
+
+XLNet is not a traditional autoregressive model but uses a training strategy that builds on that. It permutes the 
+tokens in the sentence, then allows the model to use the last n tokens to predict the token n+1. Since this is all done 
+with a mask, the sentence is actually fed in the model in the right order, but instead of masking the first n tokens 
+for n+1, XLNet uses a mask that hides the previous tokens in some given permutation of 1,...,sequence length.
+
+XLNet also uses the same recurrence mechanism as Transformer-XL to build long-term dependencies.
+
+The library provides a version of the model for language modeling, token classification, sentence classification, 
+multiple choice classification and question answering.
+
+More information in this :doc:`model documentation </model_doc/xlnet>`.
+
+.. _autoencoding-models:
+
+Autoencoding models
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+As mentioned before, these models rely on the encoder part of the original transformer and use no mask so the model can
+look at all the tokens in the attention heads. For pretraining, inputs are a corrupted version of the sentence, usually 
+obtained by masking tokens, and targets are the original sentences.
+
+BERT
+----------------------------------------------
+
+`BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding <https://arxiv.org/abs/1810.04805>`_,
+Jacob Devlin et al.
+
+Corrupts the inputs by using random masking. More precisely, during pretraining, a given percentage of tokens (usually
+15%) are selected and replaced by:
+ 
+  * a special mask token with probability 0.8
+  * a random token different from the one masked with probability 0.1
+  * the same token with probability 0.1
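+
+As a quick worked example of the proportions above (the 512-token sequence length is only an assumption made for
+the illustration), the expected share of *all* tokens falling in each of the three cases is:
+
+.. math::
+
+    0.15 \times 0.8 = 0.12, \qquad 0.15 \times 0.1 = 0.015, \qquad 0.15 \times 0.1 = 0.015
+
+So in a sequence of 512 tokens, about 77 tokens are selected on average: roughly 61 become the mask token, 8 become a
+random token and 8 are left unchanged.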
+
+The model must predict the original sentence, but has a second objective: inputs are two sentences A and B (with a 
+separation token in between). With probability 50%, the sentences are consecutive in the corpus, in the remaining 50% 
+they are not related. The model has to predict if the sentences are consecutive or not.
+
+The library provides a version of the model for language modeling (traditional or masked), next sentence prediction, 
+token classification, sentence classification, multiple choice classification and question answering.
+
+More information in this :doc:`model documentation </model_doc/bert>`.
+
+ALBERT
+----------------------------------------------
+
+`ALBERT: A Lite BERT for Self-supervised Learning of Language Representations <https://arxiv.org/abs/1909.11942>`_,
+Zhenzhong Lan et al.
+
+Same as BERT but with a few tweaks:
+
+  * Embedding size E is different from hidden size H. This is justified because the embeddings are context independent
+    (one embedding vector represents one token) whereas hidden states are context dependent (one hidden state
+    represents a sequence of tokens), so it's more logical to have H >> E. Also, the embedding matrix is large since
+    it's V x E (V being the vocab size). If E < H, it has fewer parameters.
+  * Layers are split in groups that share parameters (to save memory).
+  * Next sentence prediction is replaced by a sentence ordering prediction: in the inputs, we have two sentences A and
+    B (that are consecutive) and we either feed A followed by B or B followed by A. The model must predict if they have
+    been swapped or not.
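+
+To give an order of magnitude for the first point, here is a small worked example. The values :math:`V = 30000`,
+:math:`H = 768` and :math:`E = 128` are assumptions chosen for the illustration, and the :math:`E \times H` term
+accounts for the projection from the embedding size to the hidden size described in the ALBERT paper:
+
+.. math::
+
+    V \times H = 30000 \times 768 = 23{,}040{,}000
+
+    V \times E + E \times H = 30000 \times 128 + 128 \times 768 = 3{,}938{,}304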
+
+The library provides a version of the model for masked language modeling, token classification, sentence 
+classification, multiple choice classification and question answering.
+
+More information in this :doc:`model documentation </model_doc/albert>`.
+
+RoBERTa
+----------------------------------------------
+
+`RoBERTa: A Robustly Optimized BERT Pretraining Approach <https://arxiv.org/abs/1907.11692>`_,
+Yinhan Liu et al.
+
+Same as BERT with better pretraining tricks:
+
+  * dynamic masking: tokens are masked differently at each epoch whereas BERT does it once and for all
+  * no NSP (next sentence prediction) loss and instead of putting just two sentences together, put a chunk of 
+    contiguous texts together to reach 512 tokens (so the sentences are in an order that may span several documents)
+  * train with larger batches
+  * use BPE with bytes as a subunit and not characters (because of unicode characters)
+
+The library provides a version of the model for masked language modeling, token classification, sentence 
+classification, multiple choice classification and question answering.
+
+More information in this :doc:`model documentation </model_doc/roberta>`.
+
+DistilBERT
+----------------------------------------------
+
+`DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter <https://arxiv.org/abs/1910.01108>`_,
+Victor Sanh et al.
+
+Same as BERT but smaller. Trained by distillation of the pretrained BERT model, meaning it's been trained to predict 
+the same probabilities as the larger model. The actual objective is a combination of:
+
+  * finding the same probabilities as the teacher model
+  * predicting the masked tokens correctly (but no next-sentence objective)
+  * a cosine similarity between the hidden states of the student and the teacher model
+
+The library provides a version of the model for masked language modeling, token classification, sentence classification 
+and question answering.
+
+More information in this :doc:`model documentation </model_doc/distilbert>`.
+
+XLM
+----------------------------------------------
+
+`Cross-lingual Language Model Pretraining <https://arxiv.org/abs/1901.07291>`_, Guillaume Lample and Alexis Conneau
+
+A transformer model trained on several languages. There are three different types of training for this model and the
+library provides checkpoints for all of them:
+
+  * Causal language modeling (CLM) which is the traditional autoregressive training (so this model could be in the 
+    previous section as well). One of the languages is selected for each training sample, and the model input is a 
+    sentence of 256 tokens that may span several documents in one of those languages.
+  * Masked language modeling (MLM) which is like RoBERTa. One of the languages is selected for each training sample,
+    and the model input is a sentence of 256 tokens that may span several documents in one of those languages, with
+    dynamic masking of the tokens.
+  * A combination of MLM and translation language modeling (TLM). This consists of concatenating a sentence in two 
+    different languages, with random masking. To predict one of the masked tokens, the model can use both the
+    surrounding context in language 1 as well as the context given by language 2.
+
+Checkpoints refer to which method was used for pretraining by having ``clm``, ``mlm`` or ``mlm-tlm`` in their names. On top
+of positional embeddings, the model has language embeddings. When training using MLM/CLM, this gives the model an
+indication of the language used, and when training using MLM+TLM, an indication of which part of the input is in which
+language.
+
+The library provides a version of the model for language modeling, token classification, sentence classification and 
+question answering.
+
+More information in this :doc:`model documentation </model_doc/xlm>`.
+
+XLM-RoBERTa
+----------------------------------------------
+
+`Unsupervised Cross-lingual Representation Learning at Scale <https://arxiv.org/abs/1911.02116>`_, Alexis Conneau et 
+al.
+
+Uses RoBERTa tricks on the XLM approach, but does not use the translation language modeling objective, only using 
+masked language modeling on sentences coming from one language. However, the model is trained on many more languages 
+(100) and doesn't use the language embeddings, so it's capable of detecting the input language by itself.
+
+The library provides a version of the model for masked language modeling, token classification, sentence 
+classification, multiple choice classification and question answering.
+
+More information in this :doc:`model documentation </model_doc/xlmroberta>`.
+
+FlauBERT
+----------------------------------------------
+
+`FlauBERT: Unsupervised Language Model Pre-training for French <https://arxiv.org/abs/1912.05372>`_, Hang Le et al.
+
+Like RoBERTa, it has no sentence-level prediction objective (so it is just trained on the MLM objective).
+
+The library provides a version of the model for language modeling and sentence classification.
+
+More information in this :doc:`model documentation </model_doc/flaubert>`.
+
+ELECTRA
+----------------------------------------------
+
+`ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators <https://arxiv.org/abs/2003.10555>`_, 
+Kevin Clark et al.
+
+ELECTRA is a transformer model pretrained with the use of another (small) masked language model. The inputs are 
+corrupted by that language model, which takes an input text that is randomly masked and outputs a text in which ELECTRA 
+has to predict which token is an original and which one has been replaced. As in GAN training, the small language
+model is trained for a few steps (but with the original texts as objective, not to fool the ELECTRA model like in a 
+traditional GAN setting) then the ELECTRA model is trained for a few steps.
+
+The library provides a version of the model for masked language modeling, token classification and sentence 
+classification.
+
+More information in this :doc:`model documentation </model_doc/electra>`.
+
+.. _longformer:
+
+Longformer
+----------------------------------------------
+
+`Longformer: The Long-Document Transformer <https://arxiv.org/abs/2004.05150>`_, Iz Beltagy et al.
+
+A transformer model replacing the attention matrices by sparse matrices to go faster. Often, the local context (e.g., 
+what are the two tokens left and right?) is enough to take action for a given token. Some preselected input tokens are 
+still given global attention, but the attention matrix has far fewer entries to compute, resulting in a speed-up. See the
+:ref:`local attention section <local-attention>` for more information.
+
+It is otherwise pretrained the same way as RoBERTa.
+
+**Note:** This model could very well be used in an autoregressive setting; there is no checkpoint for such a
+pretraining yet, though.
+
+The library provides a version of the model for masked language modeling, token classification, sentence 
+classification, multiple choice classification and question answering.
+
+More information in this :doc:`model documentation </model_doc/longformer>`.
+
+
+.. _seq-to-seq-models:
+
+Sequence-to-sequence models
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+As mentioned before, these models keep both the encoder and the decoder of the original transformer.
+
+BART
+----------------------------------------------
+
+`BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension <https://arxiv.org/abs/1910.13461>`_, 
+Mike Lewis et al.
+
+Sequence-to-sequence model with an encoder and a decoder. Encoder is fed a corrupted version of the tokens, decoder is 
+fed the tokens (but has a mask to hide the future words like a regular transformer decoder). For the encoder, on the
+pretraining tasks, a composition of the following transformations is applied:
+
+  * mask random tokens (like in BERT)
+  * delete random tokens
+  * mask a span of k tokens with a single mask token (a span of 0 tokens is an insertion of a mask token)
+  * permute sentences
+  * rotate the document to make it start at a specific token
+
+The library provides a version of this model for conditional generation and sequence classification.
+
+More information in this :doc:`model documentation </model_doc/bart>`.
+
+MarianMT
+----------------------------------------------
+
+`Marian: Fast Neural Machine Translation in C++ <https://arxiv.org/abs/1804.00344>`_, Marcin Junczys-Dowmunt et al.
+
+A framework for translation models, using the same models as BART.
+
+The library provides a version of this model for conditional generation.
+
+More information in this :doc:`model documentation </model_doc/marian>`.
+
+T5
+----------------------------------------------
+
+`Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer <https://arxiv.org/abs/1910.10683>`_, 
+Colin Raffel et al.
+
+Uses the traditional transformer model (except a slight change with the positional embeddings, which are learned at 
+each layer). To be able to operate on all NLP tasks, it transforms them into text-to-text problems by using certain
+prefixes: “summarize: …”, “question: …”, “translate English to German: …” and so forth.
+
+The pretraining includes both supervised and self-supervised training. Supervised training is conducted on downstream 
+tasks provided by the GLUE and SuperGLUE benchmarks (changing them to text-to-text tasks as explained above).
+
+Self-supervised training uses corrupted tokens, by randomly removing 15% of the tokens and
+replacing them with individual sentinel tokens (if several consecutive tokens are marked for removal, they are replaced
+by one single sentinel token). The input of the encoder is the corrupted sentence, the input of the decoder the 
+original sentence and the target is then the dropped out tokens delimited by their sentinel tokens.
+
+For instance, if we have the sentence “My dog is very cute .”, and we decide to remove the tokens dog, is and cute, the
+input becomes “My <x> very <y> .” and the target is “<x> dog is <y> cute <z>”.
+
+The library provides a version of this model for conditional generation.
+
+More information in this :doc:`model documentation </model_doc/t5>`.
+
+.. _multimodal-models:
+
+Multimodal models
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+There is one multimodal model in the library which has not been pretrained in the self-supervised fashion like the 
+others.
+
+MMBT
+----------------------------------------------
+
+`Supervised Multimodal Bitransformers for Classifying Images and Text <https://arxiv.org/abs/1909.02950>`_, Douwe Kiela 
+et al.
+
+A transformer model used in multimodal settings, combining a text and an image to make predictions. The transformer
+model takes as inputs the embeddings of the tokenized text and the final activations of a pretrained resnet on the
+images (after the pooling layer) that go through a linear layer (to go from number of features at the end of the
+resnet to the hidden state dimension of the transformer).
+
+The different inputs are concatenated, and on top of the positional embeddings, a segment embedding is added to let the 
+model know which part of the input vector corresponds to the text or the image.
+
+The pretrained model only works for classification.
+
+..
+    More information in this :doc:`model documentation </model_doc/mmbt>`.
+    TODO: write this page
+
+More technical aspects
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Full vs sparse attention
+----------------------------------------------
+
+Most transformer models use full attention in the sense that the attention matrix is square. It can be a big 
+computational bottleneck when you have long texts. Longformer and Reformer are models that try to be more efficient and
+use a sparse version of the attention matrix to speed up training.
+
+.. _lsh-attention:
+
+**LSH attention**
+
+:ref:`Reformer <reformer>` uses LSH attention. In :math:`\text{softmax}(QK^{T})`, only the biggest elements (in the
+softmax dimension) of the matrix :math:`QK^{T}` are going to give useful contributions. So for each query q in Q, we
+only need to consider the keys k in K that are close to q. A hash function is used to determine if q and k are close.
+The attention mask is modified to mask the current token (except at the first position), because its query and key
+are equal (so very similar to each other). Since the hash can be a bit random, several hash functions are used in
+practice (determined by an n_rounds parameter) and their results are averaged together.
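+
+As an illustration, the Reformer paper uses an angular hashing scheme. The notation here is ours: :math:`x` is a query
+(or key) vector of dimension :math:`d_{k}`, :math:`b` is the number of hash buckets, and :math:`R` is a random matrix
+of size :math:`d_{k} \times b/2`:
+
+.. math::
+
+    h(x) = \arg\max\left([xR \, ; -xR]\right)
+
+where :math:`[u \, ; v]` denotes the concatenation of two vectors. To a first approximation, attention is then only
+computed between queries and keys that fall in the same bucket, i.e. for which :math:`h(q) = h(k)`.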
+
+.. _local-attention:
+
+**Local attention**
+
+:ref:`Longformer <longformer>` uses local attention: often, the local context (e.g., what are the two tokens left and 
+right?) is enough to take action for a given token. Also, by stacking attention layers that have a small window, the 
+last layer will have a receptive field of more than just the tokens in the window, allowing it to build a
+representation of the whole sentence.
+
+Some preselected input tokens are also given global attention: for those few tokens, the attention matrix can access 
+all tokens and this process is symmetric: all other tokens have access to those specific tokens (on top of the ones in 
+their local window). This is shown in Figure 2d of the paper, see below for a sample attention mask:
+
+.. image:: imgs/local_attention_mask.png
+   :scale: 50 %
+   :align: center
+
+Using those attention matrices with far fewer entries to compute then allows the model to handle inputs with a longer
+sequence length.
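+
+A rough count shows the size of the saving. The numbers are assumptions made for the illustration: a sequence of
+:math:`l = 4096` tokens and a window of :math:`w = 256` tokens on each side of every position, ignoring the few
+global tokens and the shorter windows at the sequence boundaries:
+
+.. math::
+
+    l \times l = 4096^{2} = 16{,}777{,}216
+    \qquad \text{versus} \qquad
+    l \times (2w + 1) = 4096 \times 513 = 2{,}101{,}248
+
+so the local pattern computes about 8 times fewer attention scores, and the cost grows linearly rather than
+quadratically with :math:`l`.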
+
+Other tricks
+----------------------------------------------
+
+.. _axial-pos-encoding:
+
+**Axial positional encodings**
+
+:ref:`Reformer <reformer>` uses axial positional encodings: in traditional transformer models, the positional encoding 
+E is a matrix of size :math:`l` by :math:`d`, :math:`l` being the sequence length and :math:`d` the dimension of the 
+hidden state. If you have very long texts, this matrix can be huge and take way too much space on the GPU.
+
+To alleviate that, axial positional encodings consist of factorizing that big matrix E into two smaller matrices E1 and
+E2, with dimensions :math:`l_{1} \times d_{1}` and :math:`l_{2} \times d_{2}`, such that :math:`l_{1} \times l_{2} = l`
+and :math:`d_{1} + d_{2} = d` (with the product for the lengths, this ends up being way smaller). The embedding for
+time step :math:`j` in E is obtained by concatenating the embeddings for time step :math:`j \% l_{1}` in E1 and
+:math:`j // l_{1}` in E2.
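+
+As a worked example with illustrative values (assumed here, not taken from a specific checkpoint), take
+:math:`l_{1} = 1024`, :math:`l_{2} = 512` and :math:`d_{1} = d_{2} = 512`, so that :math:`l = 524288` and
+:math:`d = 1024`. The number of entries to store is then:
+
+.. math::
+
+    l \times d = 524288 \times 1024 = 536{,}870{,}912
+    \qquad \text{versus} \qquad
+    l_{1} \times d_{1} + l_{2} \times d_{2} = 524{,}288 + 262{,}144 = 786{,}432
+
+For time step :math:`j = 3000`, the embedding is the concatenation of row :math:`3000 \% 1024 = 952` of E1 and row
+:math:`3000 // 1024 = 2` of E2.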
+