Make ChatGPT API responses feel faster and users happier by using streaming mode


Streaming doesn't actually make results come back from the ChatGPT API much faster; it makes users feel that things are moving much faster, because text starts appearing as soon as the model produces it instead of all at once at the end.
So the user experience is a lot better.

The key is to enable streaming mode when calling the ChatGPT API.

Here is the example code:

import openai  # this example uses the legacy (pre-1.0) openai Python SDK

# Example of an OpenAI ChatCompletion request with stream=True
# https://platform.openai.com/docs/guides/chat

# a ChatCompletion request
response = openai.ChatCompletion.create(
    model='gpt-3.5-turbo',
    messages=[
        {'role': 'user', 'content': "What's 1+1? Answer in one word."}
    ],
    temperature=0,
    stream=True  # this time, we set stream=True
)

# with stream=True, the response is an iterator of chunks
for chunk in response:
    print(chunk)

As you can see, the key is simply to add one more parameter: stream=True.
With streaming enabled, instead of a single final response you get back a series of chunks. Here is the output from the code above:

{
  "choices": [
    {
      "delta": {
        "role": "assistant"
      },
      "finish_reason": null,
      "index": 0
    }
  ],
  "created": 1677825464,
  "id": "chatcmpl-6ptKyqKOGXZT6iQnqiXAH8adNLUzD",
  "model": "gpt-3.5-turbo-0301",
  "object": "chat.completion.chunk"
}
{
  "choices": [
    {
      "delta": {
        "content": "\n\n"
      },
      "finish_reason": null,
      "index": 0
    }
  ],
  "created": 1677825464,
  "id": "chatcmpl-6ptKyqKOGXZT6iQnqiXAH8adNLUzD",
  "model": "gpt-3.5-turbo-0301",
  "object": "chat.completion.chunk"
}
{
  "choices": [
    {
      "delta": {
        "content": "2"
      },
      "finish_reason": null,
      "index": 0
    }
  ],
  "created": 1677825464,
  "id": "chatcmpl-6ptKyqKOGXZT6iQnqiXAH8adNLUzD",
  "model": "gpt-3.5-turbo-0301",
  "object": "chat.completion.chunk"
}
{
  "choices": [
    {
      "delta": {},
      "finish_reason": "stop",
      "index": 0
    }
  ],
  "created": 1677825464,
  "id": "chatcmpl-6ptKyqKOGXZT6iQnqiXAH8adNLUzD",
  "model": "gpt-3.5-turbo-0301",
  "object": "chat.completion.chunk"
}
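
In a real application you usually don't print the raw chunks. Instead, you extract each chunk's delta content, show it to the user as it arrives, and join the pieces into the full reply at the end. Here is a minimal sketch of that loop, written against the same legacy (pre-1.0) openai Python SDK as the example above; the chunk layout matches the output shown:

import openai

response = openai.ChatCompletion.create(
    model='gpt-3.5-turbo',
    messages=[
        {'role': 'user', 'content': "What's 1+1? Answer in one word."}
    ],
    temperature=0,
    stream=True
)

collected = []
for chunk in response:
    # each chunk carries a small "delta": the role on the first chunk,
    # a piece of content on the middle chunks, and nothing on the final
    # chunk, where finish_reason is "stop"
    delta = chunk['choices'][0]['delta']
    if 'content' in delta:
        print(delta['content'], end='', flush=True)  # show text as it arrives
        collected.append(delta['content'])

full_reply = ''.join(collected)

Note that in openai SDK versions 1.0 and later, the call becomes client.chat.completions.create(..., stream=True) and each piece is read from chunk.choices[0].delta.content instead, but the streaming idea is the same.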

Author: robot learner