Suggestions on Pipeline_webserver (#25570)

* Suggestions on Pipeline_webserver

docs: reorder the warning tip for pseudo-code

Co-Authored-By: Wonhyeong Seo <wonhseo@kakao.com>

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/ko/pipeline_webserver.md

Co-authored-by: Wonhyeong Seo <wonhseo@kakao.com>

---------

Co-authored-by: Wonhyeong Seo <wonhseo@kakao.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
This commit is contained in:
Kihoon Son
2023-08-18 17:17:44 +09:00
committed by GitHub
parent 659ab0423e
commit 08e32519f8
2 changed files with 14 additions and 10 deletions

View File

@@ -87,6 +87,13 @@ of the model on the webserver. This way, no unnecessary RAM is being used.
Then the queuing mechanism allows you to do fancy stuff like maybe accumulating a few
items before inferring to use dynamic batching:
<Tip warning={true}>
The code sample below is intentionally written like pseudo-code for readability.
Do not run this without checking if it makes sense for your system resources!
</Tip>
```py
(string, rq) = await q.get()
strings = []
@@ -104,11 +111,7 @@ for rq, out in zip(queues, outs):
await rq.put(out)
```
<Tip warning={true}>
Do not activate this without checking it makes sense for your load!
</Tip>
The proposed code is optimized for readability, not for being the best code.
Again, the proposed code is optimized for readability, not for being the best code.
First of all, there's no batch size limit which is usually not a
great idea. Next, the timeout is reset on every queue fetch, meaning you could
wait much more than 1ms before running the inference (delaying the first request