From 78d771c3c21922642fc9546ccb973cc7a182ab34 Mon Sep 17 00:00:00 2001
From: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Date: Tue, 3 Jun 2025 09:53:23 -0700
Subject: [PATCH] [docs] Format fix (#38414)

fix table
---
 docs/source/en/cache_explanation.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/docs/source/en/cache_explanation.md b/docs/source/en/cache_explanation.md
index 0ccf612d21..4190cefdb8 100644
--- a/docs/source/en/cache_explanation.md
+++ b/docs/source/en/cache_explanation.md
@@ -56,10 +56,10 @@ Attention is calculated independently in each layer of the model, and caching is
 
 Refer to the table below to compare how caching improves efficiency.
 
-| without caching | with caching |  |  |  |
-|---|---|---|---|---|
-| for each step, recompute all previous `K` and `V`  | for each step, only compute current `K` and `V` |  |  |  |
-| attention cost per step is **quadratic** with sequence length | attention cost per step is **linear** with sequence length (memory grows linearly, but compute/token remains low) |  |  |  |
+| without caching | with caching |
+|---|---|
+| for each step, recompute all previous `K` and `V`  | for each step, only compute current `K` and `V` |
+| attention cost per step is **quadratic** with sequence length | attention cost per step is **linear** with sequence length (memory grows linearly, but compute/token remains low) |