[docs] Format fix (#38414)

fix table
Steven Liu
2025-06-03 09:53:23 -07:00
committed by GitHub
parent 0f41c41a46
commit 78d771c3c2

@@ -56,10 +56,10 @@ Attention is calculated independently in each layer of the model, and caching is
 Refer to the table below to compare how caching improves efficiency.
-| without caching | with caching | | | |
-|---|---|---|---|---|
-| for each step, recompute all previous `K` and `V` | for each step, only compute current `K` and `V` | | | |
-| attention cost per step is **quadratic** with sequence length | attention cost per step is **linear** with sequence length (memory grows linearly, but compute/token remains low) | | | |
+| without caching | with caching |
+|---|---|
+| for each step, recompute all previous `K` and `V` | for each step, only compute current `K` and `V`
+| attention cost per step is **quadratic** with sequence length | attention cost per step is **linear** with sequence length (memory grows linearly, but compute/token remains low) |
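
The first table row (recompute all previous `K` and `V` versus compute only the current ones) can be illustrated with a small pure-Python sketch. This is not the Transformers implementation. Every name in it (`project`, `attend`, `decode_without_cache`, `decode_with_cache`, and the toy weights) is hypothetical and chosen only for illustration, and it assumes a single attention head with tiny hand-written matrices. It counts how many `K`/`V` projections each strategy performs and checks that both produce the same attention outputs.

```python
import math

def project(x, w):
    """Toy linear projection: row vector x times square matrix w (list of rows)."""
    return [sum(xi * wij for xi, wij in zip(x, col)) for col in zip(*w)]

def attend(q, keys, values):
    """Scaled dot-product attention of one query over all keys/values so far."""
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(len(q)) for k in keys]
    peak = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    return [sum(w / total * v[j] for w, v in zip(weights, values))
            for j in range(len(q))]

def decode_without_cache(tokens, wq, wk, wv):
    """At every step, re-project K and V for all tokens seen so far."""
    outputs, kv_projections = [], 0
    for t in range(len(tokens)):
        keys = [project(x, wk) for x in tokens[: t + 1]]
        values = [project(x, wv) for x in tokens[: t + 1]]
        kv_projections += t + 1  # one K/V pair per token re-projected
        outputs.append(attend(project(tokens[t], wq), keys, values))
    return outputs, kv_projections

def decode_with_cache(tokens, wq, wk, wv):
    """At every step, project K and V for the current token only and reuse the rest."""
    outputs, kv_projections = [], 0
    key_cache, value_cache = [], []
    for x in tokens:
        key_cache.append(project(x, wk))    # cache grows by one entry per step
        value_cache.append(project(x, wv))
        kv_projections += 1
        outputs.append(attend(project(x, wq), key_cache, value_cache))
    return outputs, kv_projections

# Hypothetical toy inputs: four 2-dimensional token embeddings and 2x2 weights.
TOKENS = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
WQ = [[1.0, 0.0], [0.0, 1.0]]
WK = [[0.5, 0.1], [0.2, 0.3]]
WV = [[1.0, 2.0], [3.0, 4.0]]

if __name__ == "__main__":
    _, recomputed = decode_without_cache(TOKENS, WQ, WK, WV)
    _, cached = decode_with_cache(TOKENS, WQ, WK, WV)
    print(f"K/V projections without cache: {recomputed}, with cache: {cached}")
```

In this sketch the cached variant performs one `K`/`V` projection per step while its cache grows by one entry per token, whereas the uncached variant re-projects every earlier token at each step.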