From e8e0c76162263840661fc0ca0da3952861754759 Mon Sep 17 00:00:00 2001
From: Chong You
Date: Tue, 1 Jul 2025 22:11:03 -0400
Subject: [PATCH] Add activation sparsity reference in gemma3n doc (#39160)

Add activation sparsity reference in the description of gemma3n
---
 docs/source/en/model_doc/gemma3n.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/docs/source/en/model_doc/gemma3n.md b/docs/source/en/model_doc/gemma3n.md
index d38368e829..423261da04 100644
--- a/docs/source/en/model_doc/gemma3n.md
+++ b/docs/source/en/model_doc/gemma3n.md
@@ -29,7 +29,7 @@ rendered properly in your Markdown viewer.
 Gemma3n is a multimodal model with pretrained and instruction-tuned variants, available in E4B and E2B sizes. While
 large portions of the language model architecture are shared with prior Gemma releases, there are many new additions in
 this model, including [Alternating Updates][altup] (AltUp), [Learned Augmented Residual Layer][laurel] (LAuReL),
-[MatFormer][matformer], Per-Layer Embeddings (PLE), activation sparsity, and KV cache sharing. The language model uses
+[MatFormer][matformer], Per-Layer Embeddings (PLE), [Activation Sparsity with Statistical Top-k][spark-transformer], and KV cache sharing. The language model uses
 a similar attention pattern to [Gemma 3](./gemma3.md) with alternating 4 local sliding window self-attention layers for
 every global self-attention layer with a maximum context length of 32k tokens. Gemma 3n introduces
 [MobileNet v5][mobilenetv5] as the vision encoder, using a default resolution of 768x768 pixels, and adds a newly
@@ -201,4 +201,5 @@ echo -e "Plants create energy through a process known as" | transformers run --t
 [gemma3n-collection]: https://huggingface.co/collections/google/gemma-3n
 [laurel]: https://arxiv.org/abs/2411.07501
 [matformer]: https://arxiv.org/abs/2310.07707
+[spark-transformer]: https://arxiv.org/abs/2506.06644
 [usm]: https://arxiv.org/abs/2303.01037