استریم کردن خروجی تولید متن

استریم کردن خروجی تولید متن

streaming, chat-completions
preview

به طور پیش فرض، هنگامی که شما از Gilas API درخواست تولید متن می کنید، کل متن تولید شده در یک رسپانس به کلاینت برگردانده می شود. اگر قصد تولید خروجی‌های طولانی را دارید٬ تولید متن می تواند چندین ثانیه طول بکشد. برای دریافت سریعتر پاسخ ها٬ می توانید متن را در حالی که تولید می شود stream کنید. در این صورت پاسخ های ناتمام را به صورت استریم از سمت سرور دریافت خواهید کرد.

توجه: ورودی‌های داده شده و خروجی‌های تولید شده توسط مدل در این مثال به زبان انگلیسی هستند. برای تولید خروجی به زبان فارسی٬ کافی‌ست از مدل بخواهید که خروجی را به زبان فارسی تولید کند.

برای استریم کردن متن ها، هنگام فراخوانی v1/chat/completions مقدار پارامتر stream=true را تنظیم کنید. این درخواست یک شیء برمی گرداند که پاسخ را به عنوان data-only server-sent events برگشت می دهد.

توجه داشته باشید که استفاده از stream=true باعث می شود که مدیریت محتوای متن دشوارتر شود، زیرا کار ارزیابی تکه های جزئی متن ممکن است دشوارتر باشند. مخصوصا زمانی که قصد تولید خروجی JSON را دارید توصیه می‌شود از حالت استریم استفاده نکنید.

یکی دیگر از معایب پاسخ های استریم این است که پاسخ دیگر شامل فیلد usage نیست که به شما بگوید چند توکن مصرف شده است. برای حل این مشکل٬ پس از دریافت و ترکیب همه پاسخ ها، می توانید تعداد توکن‌های مصرف شده را با استفاده از tiktoken محاسبه کنید.

آنچه که در این نوت‌بوک یاد خواهید گرفت:

  1. خروجی معمولی اندپوینت تولید متن چگونه است

  2. خروجی استریم اندپوینت تولید متن چگونه است

  3. چقدر زمان با streaming تولید متن صرفه جویی می شود

برای اجرای کدهای زیر ابتدا باید یک کلید API را از طریق پنل کاربری گیلاس تولید کنید. برای این کار ابتدا یک حساب کاربری جدید بسازید یا اگر صاحب حساب کاربری هستید وارد پنل کاربری خود شوید. سپس، به صفحه کلید API بروید و با کلیک روی دکمه “ساخت کلید API” یک کلید جدید برای دسترسی به Gilas API بسازید.
1import time
2from openai import OpenAI
3import os
4
5client = OpenAI(
6   api_key=os.environ.get(("GILAS_API_KEY", "<کلید API خود را اینجا بسازید https://dashboard.gilas.io/apiKey>")), 
7   base_url="https://api.gilas.io/v1/" # Gilas APIs
8)

خروجی معمولی اندپوینت تولید متن چگونه است #

با فراخوانی API ChatCompletions ابتدا تمامی متن تولید می شود و سپس همه آن یکباره برگشت داده می شود.

پاسخ می تواند از مسیر response.choices[0].message استخراج شود.

و محتوای پاسخ می تواند از مسیر response.choices[0].message.content استخراج شود.

 1# Example of an OpenAI ChatCompletion request
 2# https://platform.openai.com/docs/guides/text-generation/chat-completions-api
 3
 4# record the time before the request is sent
 5start_time = time.time()
 6
 7# send a ChatCompletion request to count to 100
 8response = client.chat.completions.create(
 9    model='gpt-3.5-turbo',
10    messages=[
11        {'role': 'user', 'content': 'Count to 100, with a comma between each number and no newlines. E.g., 1, 2, 3, ...'}
12    ],
13    temperature=0,
14)
15# calculate the time it took to receive the response
16response_time = time.time() - start_time
17
18# print the time delay and text received
19print(f"Full response received {response_time:.2f} seconds after request")
20print(f"Full response received:\n{response}")

خروجی:

Full response received 5.27 seconds after request
Full response received:
ChatCompletion(id='chatcmpl-8ZB8ywkV5DuuJO7xktqUcNYfG8j6I', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100.', role='assistant', function_call=None, tool_calls=None))], created=1703395008, model='gpt-3.5-turbo', object='chat.completion', system_fingerprint=None, usage=CompletionUsage(completion_tokens=299, prompt_tokens=36, total_tokens=335))

همچنین اطلاعات مربوط به تعداد تون مصرف شده برای تولید این متن از مسیر response.choices[0].message.content قابل دسترس است.

خروجی استریم اندپوینت تولید متن چگونه است #

با فراخوانی API ChatCompletions به صورت استریم، پاسخ به طور تدریجی در تکه هایی به نام chunk از طریق یک event stream برگشت داده می شود. در پایتون، شما می توانید با استفاده از یک حلقه for روی این تکه ها ایتریت کنید تا در نهایت بتوانید متن کامل رو بسازید.

 1# For more info https://platform.openai.com/docs/api-reference/streaming#chat/create-stream
 2
 3# a ChatCompletion request
 4response = client.chat.completions.create(
 5   model='gpt-3.5-turbo',
 6   messages=[
 7       {'role': 'user', 'content': "What's 1+1? Answer in one word."}
 8   ],
 9   temperature=0,
10   stream=True  # this time, we set stream=True
11)
12
13for chunk in response:
14   print(chunk)
15   print(chunk.choices[0].delta.content)
16   print("****************")

خروجی:

``` ChatCompletionChunk(id='chatcmpl-8ZB9m2Ubv8FJs3CIb84WvYwqZCHST', choices=[Choice(delta=ChoiceDelta(content='', function_call=None, role='assistant', tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1703395058, model='gpt-3.5-turbo-0613', object='chat.completion.chunk', system_fingerprint=None) **************** ChatCompletionChunk(id='chatcmpl-8ZB9m2Ubv8FJs3CIb84WvYwqZCHST', choices=[Choice(delta=ChoiceDelta(content='2', function_call=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1703395058, model='gpt-3.5-turbo-0613', object='chat.completion.chunk', system_fingerprint=None) 2 **************** ChatCompletionChunk(id='chatcmpl-8ZB9m2Ubv8FJs3CIb84WvYwqZCHST', choices=[Choice(delta=ChoiceDelta(content=None, function_call=None, role=None, tool_calls=None), finish_reason='stop', index=0, logprobs=None)], created=1703395058, model='gpt-3.5-turbo-0613', object='chat.completion.chunk', system_fingerprint=None) None **************** ```

همانطور که می بینید، پاسخ های streaming یک فیلد delta به جای فیلد message دارند. delta می تواند چیزهایی مانند:

  • یک توکن نقش (مثلا، {“role”: “assistant”})
  • یک توکن محتوا (مثلا، {“content”: “\n\n”})
  • هیچ چیز (مثلا، {}) وقتی که استریم تمام شده است

را شامل باشد.

چقدر زمان با streaming تولید متن صرفه جویی می شود #

حالا بیایید از gpt-3.5-turbo بخواهیم دوباره تا 100 بشمارد و جواب ها رو به صورت استریم ارسال کند تا ببینیم این کار چقدر طول می کشد.

 1# record the time before the request is sent
 2start_time = time.time()
 3
 4# send a ChatCompletion request to count to 100
 5response = client.chat.completions.create(
 6    model='gpt-3.5-turbo',
 7    messages=[
 8        {'role': 'user', 'content': 'Count to 100, with a comma between each number and no newlines. E.g., 1, 2, 3, ...'}
 9    ],
10    temperature=0,
11    stream=True  # again, we set stream=True
12)
13# create variables to collect the stream of chunks
14collected_chunks = []
15collected_messages = []
16# iterate through the stream of events
17for chunk in response:
18    chunk_time = time.time() - start_time  # calculate the time delay of the chunk
19    collected_chunks.append(chunk)  # save the event response
20    chunk_message = chunk.choices[0].delta.content  # extract the message
21    collected_messages.append(chunk_message)  # save the message
22    print(f"Message received {chunk_time:.2f} seconds after request: {chunk_message}")  # print the delay and text
23
24# print the time delay and text received
25print(f"Full response received {chunk_time:.2f} seconds after request")
26# clean None in collected_messages
27collected_messages = [m for m in collected_messages if m is not None]
28full_reply_content = ''.join([m for m in collected_messages])
29print(f"Full conversation received: {full_reply_content}")
``` Message received 0.31 seconds after request: Message received 0.31 seconds after request: 1 Message received 0.34 seconds after request: , Message received 0.34 seconds after request: Message received 0.34 seconds after request: 2 Message received 0.39 seconds after request: , Message received 0.39 seconds after request: Message received 0.39 seconds after request: 3 Message received 0.42 seconds after request: , Message received 0.42 seconds after request: Message received 0.42 seconds after request: 4 Message received 0.47 seconds after request: , Message received 0.47 seconds after request: Message received 0.47 seconds after request: 5 Message received 0.51 seconds after request: , Message received 0.51 seconds after request: Message received 0.51 seconds after request: 6 Message received 0.55 seconds after request: , Message received 0.55 seconds after request: Message received 0.55 seconds after request: 7 Message received 0.59 seconds after request: , Message received 0.59 seconds after request: Message received 0.59 seconds after request: 8 Message received 0.63 seconds after request: , Message received 0.63 seconds after request: Message received 0.63 seconds after request: 9 Message received 0.67 seconds after request: , Message received 0.67 seconds after request: Message received 0.67 seconds after request: 10 Message received 0.71 seconds after request: , Message received 0.71 seconds after request: Message received 0.71 seconds after request: 11 Message received 0.75 seconds after request: , Message received 0.75 seconds after request: Message received 0.75 seconds after request: 12 Message received 0.98 seconds after request: , Message received 0.98 seconds after request: Message received 0.98 seconds after request: 13 Message received 1.02 seconds after request: , Message received 1.02 seconds after request: Message received 1.02 seconds after request: 14 Message received 1.04 seconds after request: , Message received 1.04 seconds after request: Message received 1.04 seconds after request: 15 Message received 1.08 seconds after request: , Message received 1.08 seconds after request: Message received 1.08 seconds after request: 16 Message received 1.12 seconds after request: , Message received 1.12 seconds after request: Message received 1.12 seconds after request: 17 Message received 1.16 seconds after request: , Message received 1.16 seconds after request: Message received 1.16 seconds after request: 18 Message received 1.19 seconds after request: , Message received 1.19 seconds after request: Message received 1.19 seconds after request: 19 Message received 1.23 seconds after request: , Message received 1.23 seconds after request: Message received 1.23 seconds after request: 20 Message received 1.27 seconds after request: , Message received 1.27 seconds after request: Message received 1.27 seconds after request: 21 Message received 1.31 seconds after request: , Message received 1.31 seconds after request: Message received 1.31 seconds after request: 22 Message received 1.35 seconds after request: , Message received 1.35 seconds after request: Message received 1.35 seconds after request: 23 Message received 1.39 seconds after request: , Message received 1.39 seconds after request: Message received 1.39 seconds after request: 24 Message received 1.43 seconds after request: , Message received 1.43 seconds after request: Message received 1.43 seconds after request: 25 Message received 1.47 seconds after request: , Message received 1.47 seconds after request: Message received 1.47 seconds after request: 26 Message received 1.51 seconds after request: , Message received 1.51 seconds after request: Message received 1.51 seconds after request: 27 Message received 1.55 seconds after request: , Message received 1.55 seconds after request: Message received 1.55 seconds after request: 28 Message received 1.59 seconds after request: , Message received 1.59 seconds after request: Message received 1.59 seconds after request: 29 Message received 1.59 seconds after request: , Message received 1.59 seconds after request: Message received 1.59 seconds after request: 30 Message received 1.59 seconds after request: , Message received 1.59 seconds after request: Message received 1.59 seconds after request: 31 Message received 1.59 seconds after request: , Message received 1.59 seconds after request: Message received 1.60 seconds after request: 32 Message received 1.60 seconds after request: , Message received 1.60 seconds after request: Message received 1.60 seconds after request: 33 Message received 1.60 seconds after request: , Message received 1.60 seconds after request: Message received 1.67 seconds after request: 34 Message received 1.67 seconds after request: , Message received 1.67 seconds after request: Message received 1.68 seconds after request: 35 Message received 1.68 seconds after request: , Message received 1.68 seconds after request: Message received 1.86 seconds after request: 36 Message received 1.86 seconds after request: , Message received 1.86 seconds after request: Message received 1.90 seconds after request: 37 Message received 1.90 seconds after request: , Message received 1.90 seconds after request: Message received 1.94 seconds after request: 38 Message received 1.94 seconds after request: , Message received 1.94 seconds after request: Message received 1.98 seconds after request: 39 Message received 1.98 seconds after request: , Message received 1.98 seconds after request: Message received 2.05 seconds after request: 40 Message received 2.05 seconds after request: , Message received 2.05 seconds after request: Message received 2.09 seconds after request: 41 Message received 2.09 seconds after request: , Message received 2.09 seconds after request: Message received 2.14 seconds after request: 42 Message received 2.14 seconds after request: , Message received 2.14 seconds after request: Message received 2.14 seconds after request: 43 Message received 2.14 seconds after request: , Message received 2.14 seconds after request: Message received 2.14 seconds after request: 44 Message received 2.14 seconds after request: , Message received 2.14 seconds after request: Message received 2.14 seconds after request: 45 Message received 2.14 seconds after request: , Message received 2.14 seconds after request: Message received 2.15 seconds after request: 46 Message received 2.15 seconds after request: , Message received 2.15 seconds after request: Message received 2.30 seconds after request: 47 Message received 2.30 seconds after request: , Message received 2.30 seconds after request: Message received 2.30 seconds after request: 48 Message received 2.30 seconds after request: , Message received 2.30 seconds after request: Message received 2.30 seconds after request: 49 Message received 2.30 seconds after request: , Message received 2.30 seconds after request: Message received 2.31 seconds after request: 50 Message received 2.31 seconds after request: , Message received 2.31 seconds after request: Message received 2.39 seconds after request: 51 Message received 2.39 seconds after request: , Message received 2.39 seconds after request: Message received 2.40 seconds after request: 52 Message received 2.40 seconds after request: , Message received 2.40 seconds after request: Message received 2.48 seconds after request: 53 Message received 2.48 seconds after request: , Message received 2.48 seconds after request: Message received 2.49 seconds after request: 54 Message received 2.49 seconds after request: , Message received 2.49 seconds after request: Message received 2.68 seconds after request: 55 Message received 2.68 seconds after request: , Message received 2.68 seconds after request: Message received 2.72 seconds after request: 56 Message received 2.72 seconds after request: , Message received 2.72 seconds after request: Message received 2.77 seconds after request: 57 Message received 2.77 seconds after request: , Message received 2.77 seconds after request: Message received 2.80 seconds after request: 58 Message received 2.80 seconds after request: , Message received 2.80 seconds after request: Message received 2.85 seconds after request: 59 Message received 2.85 seconds after request: , Message received 2.85 seconds after request: Message received 2.88 seconds after request: 60 Message received 2.88 seconds after request: , Message received 2.88 seconds after request: Message received 2.88 seconds after request: 61 Message received 2.88 seconds after request: , Message received 2.88 seconds after request: Message received 2.89 seconds after request: 62 Message received 2.89 seconds after request: , Message received 2.89 seconds after request: Message received 2.89 seconds after request: 63 Message received 2.89 seconds after request: , Message received 2.89 seconds after request: Message received 2.92 seconds after request: 64 Message received 2.92 seconds after request: , Message received 2.92 seconds after request: Message received 3.37 seconds after request: 65 Message received 3.37 seconds after request: , Message received 3.37 seconds after request: Message received 3.38 seconds after request: 66 Message received 3.38 seconds after request: , Message received 3.38 seconds after request: Message received 3.38 seconds after request: 67 Message received 3.38 seconds after request: , Message received 3.38 seconds after request: Message received 3.38 seconds after request: 68 Message received 3.38 seconds after request: , Message received 3.38 seconds after request: Message received 3.42 seconds after request: 69 Message received 3.42 seconds after request: , Message received 3.42 seconds after request: Message received 3.43 seconds after request: 70 Message received 3.43 seconds after request: , Message received 3.43 seconds after request: Message received 3.46 seconds after request: 71 Message received 3.46 seconds after request: , Message received 3.46 seconds after request: Message received 3.47 seconds after request: 72 Message received 3.47 seconds after request: , Message received 3.47 seconds after request: Message received 3.50 seconds after request: 73 Message received 3.50 seconds after request: , Message received 3.50 seconds after request: Message received 3.51 seconds after request: 74 Message received 3.51 seconds after request: , Message received 3.51 seconds after request: Message received 3.52 seconds after request: 75 Message received 3.52 seconds after request: , Message received 3.52 seconds after request: Message received 3.54 seconds after request: 76 Message received 3.54 seconds after request: , Message received 3.54 seconds after request: Message received 3.56 seconds after request: 77 Message received 3.56 seconds after request: , Message received 3.56 seconds after request: Message received 3.59 seconds after request: 78 Message received 3.59 seconds after request: , Message received 3.59 seconds after request: Message received 3.59 seconds after request: 79 Message received 3.59 seconds after request: , Message received 3.59 seconds after request: Message received 3.59 seconds after request: 80 Message received 3.59 seconds after request: , Message received 3.59 seconds after request: Message received 3.61 seconds after request: 81 Message received 3.61 seconds after request: , Message received 3.61 seconds after request: Message received 3.65 seconds after request: 82 Message received 3.65 seconds after request: , Message received 3.65 seconds after request: Message received 3.85 seconds after request: 83 Message received 3.85 seconds after request: , Message received 3.85 seconds after request: Message received 3.90 seconds after request: 84 Message received 3.90 seconds after request: , Message received 3.90 seconds after request: Message received 3.95 seconds after request: 85 Message received 3.95 seconds after request: , Message received 3.95 seconds after request: Message received 4.00 seconds after request: 86 Message received 4.00 seconds after request: , Message received 4.00 seconds after request: Message received 4.04 seconds after request: 87 Message received 4.04 seconds after request: , Message received 4.04 seconds after request: Message received 4.08 seconds after request: 88 Message received 4.08 seconds after request: , Message received 4.08 seconds after request: Message received 4.12 seconds after request: 89 Message received 4.12 seconds after request: , Message received 4.12 seconds after request: Message received 4.18 seconds after request: 90 Message received 4.18 seconds after request: , Message received 4.18 seconds after request: Message received 4.18 seconds after request: 91 Message received 4.18 seconds after request: , Message received 4.18 seconds after request: Message received 4.18 seconds after request: 92 Message received 4.18 seconds after request: , Message received 4.18 seconds after request: Message received 4.19 seconds after request: 93 Message received 4.19 seconds after request: , Message received 4.19 seconds after request: Message received 4.20 seconds after request: 94 Message received 4.20 seconds after request: , Message received 4.20 seconds after request: Message received 4.23 seconds after request: 95 Message received 4.23 seconds after request: , Message received 4.23 seconds after request: Message received 4.27 seconds after request: 96 Message received 4.27 seconds after request: , Message received 4.27 seconds after request: Message received 4.39 seconds after request: 97 Message received 4.39 seconds after request: , Message received 4.39 seconds after request: Message received 4.39 seconds after request: 98 Message received 4.39 seconds after request: , Message received 4.39 seconds after request: Message received 4.41 seconds after request: 99 Message received 4.41 seconds after request: , Message received 4.41 seconds after request: Message received 4.41 seconds after request: 100 Message received 4.41 seconds after request: . Message received 4.41 seconds after request: None Full response received 4.41 seconds after request Full conversation received: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100. ```

مقایسه زمان در مثال‌های بالا نشان می‌دهند که هر دو درخواست حدود 4 تا 5 ثانیه طول کشید تا کاملا تکمیل شوند. زمان درخواست ها بسته به بار و سایر عوامل تصادفی متفاوت خواهد بود. با این حال، با درخواست streaming، ما اولین توکن را پس از 0.1 ثانیه دریافت کردیم.