In this blog post, we will be discussing how to build your own chat bot using the ChatGPT API. It’s worth mentioning that we will be using the OpenAI APIs directly and not the Azure OpenAI APIs, and the code will be written in Python. A crucial aspect of creating a chat bot is maintaining context in the conversation, which we will achieve by storing and sending previous messages to the API at each request. If you are just starting with AI and chat bots, this post will guide you through the step-by-step process of building your own simple chat bot using the ChatGPT API.
Python setup
Ensure Python is installed; I am using version 3.10.8. As an editor, I use Visual Studio Code. For the text-based chat bot, you will need the following Python packages:
- openai: make sure the version is 0.27.0 or higher; earlier versions do not support the ChatCompletion APIs
- tiktoken: a library to count the number of tokens of your chat bot messages
Install the above packages with your package manager, for example: pip install openai.
All code can be found on GitHub.
Getting an account at OpenAI
We will write a text-based chat bot that asks for user input indefinitely. The first thing you need to do is sign up for API access at https://platform.openai.com/signup. Access is not free, but for personal use while writing and testing the chat bot, the cost will be very low. Here is a screenshot from my account:

When you have your account, generate an API key from https://platform.openai.com/account/api-keys. Click the Create new secret key button and store the key somewhere.
Writing the bot
Now create a new Python file called app.py and add the following lines:
import os
import openai
import tiktoken
openai.api_key = os.getenv("OPENAI_KEY")
We already discussed the openai and tiktoken libraries. We will also use the built-in os library to read environment variables. In the last line, we read the environment variable OPENAI_KEY. If you use Linux, use the following command in your shell to store the OpenAI key in an environment variable: export OPENAI_KEY=your-OpenAI-key. We use this approach to avoid storing the API key in your code and accidentally uploading it to GitHub.
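Before calling the API, it can help to check that the variable is actually set. The check and the warning message below are my addition, not part of the original code:

```python
import os

# Read the key from the environment so it never ends up in source control.
# Warn early if it is missing instead of failing on the first API call.
api_key = os.getenv("OPENAI_KEY")
if api_key is None:
    print("OPENAI_KEY is not set; run: export OPENAI_KEY=your-OpenAI-key")
```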
To implement the core chat functionality, we will use a Python class. I was following a Udemy course about ChatGPT and it used a similar approach, which I liked. By the way, I can highly recommend that course. Check it out here.
Let’s start with the class constructor:
class ChatBot:
    def __init__(self, message):
        self.messages = [
            { "role": "system", "content": message }
        ]
In the constructor, we define a messages list and set the first item in that list to a configurable dictionary: { "role": "system", "content": message }. In the ChatGPT API calls, the messages list provides context to the API because it contains all the previous messages. With this initial system message, we can instruct the API to behave in a certain way. For example, later in the code, you will find this code to create an instance of the ChatBot class:
bot = ChatBot("You are an assistant that always answers correctly. If not sure, say 'I don't know'.")
But you could also do:
bot = ChatBot("You are an assistant that always answers wrongly. Always contradict the user.")
In practice, ChatGPT does not follow the system instruction too strictly; user messages carry more weight. So it could be that, after some back and forth, the answers no longer follow the system instruction.
Let’s continue with another method in the class, the chat method:
    def chat(self):
        prompt = input("You: ")
        self.messages.append(
            { "role": "user", "content": prompt }
        )
        response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=self.messages,
            temperature=0.8
        )
        answer = response.choices[0]['message']['content']
        print(answer)
        self.messages.append(
            { "role": "assistant", "content": answer }
        )
        tokens = self.num_tokens_from_messages(self.messages)
        print(f"Total tokens: {tokens}")
        if tokens > 4000:
            print("WARNING: Number of tokens exceeds 4000. Truncating messages.")
            self.messages = self.messages[2:]
The chat method is where the action happens. It does the following:
- It prompts the user to enter some input.
- The user’s input is stored in a dictionary as a message with a “user” role and appended to a list of messages called self.messages. If this is the first input, we now have two messages in the list: a system message and a user message.
- It then creates a response using OpenAI’s gpt-3.5-turbo model, passing in the self.messages list and a temperature of 0.8 as parameters. We use the ChatCompletion API versus the Completion API that you use with other models such as text-davinci-003.
- The generated response is stored in a variable named answer. The full response contains a lot of information. We are only interested in the first response (there is only one) and grab the content.
- The answer is printed to the console.
- The answer is also added to the self.messages list as a message with an “assistant” role. If this is the first input, we now have three messages in the list: a system message, the first user message (the input), and the assistant’s response.
- The total number of tokens in the self.messages list is computed using a separate function called num_tokens_from_messages() and printed to the console.
- If the number of tokens exceeds 4000, a warning message is printed and the self.messages list is truncated to remove the first two messages. We will talk about these tokens later.
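Note that self.messages[2:] also drops the initial system message at index 0, so the bot loses its instruction after the first truncation. A variant that keeps the system message could look like this (a sketch; truncate_messages is a hypothetical helper, not part of the post's code):

```python
def truncate_messages(messages, drop=2):
    """Drop the oldest exchange but keep the system message at index 0.

    Hypothetical helper, not part of the post's code: the post simply
    uses messages[2:], which also discards the system message.
    """
    if len(messages) <= 1 + drop:
        return messages
    return [messages[0]] + messages[1 + drop:]

history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "First question"},
    {"role": "assistant", "content": "First answer"},
    {"role": "user", "content": "Second question"},
]
trimmed = truncate_messages(history)
print([m["role"] for m in trimmed])  # ['system', 'user']
```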
It’s important to realize we are using Chat Completions here. You can find more information about Chat Completions in the OpenAI documentation.
If you did not quite get how the text response gets extracted, here is an example of a full response from the Chat completion API:
{
    'id': 'chatcmpl-6p9XYPYSTTRi0xEviKjjilqrWU2Ve',
    'object': 'chat.completion',
    'created': 1677649420,
    'model': 'gpt-3.5-turbo',
    'usage': {'prompt_tokens': 56, 'completion_tokens': 31, 'total_tokens': 87},
    'choices': [
        {
            'message': {
                'role': 'assistant',
                'content': 'The 2020 World Series was played in Arlington, Texas at the Globe Life Field, which was the new home stadium for the Texas Rangers.'
            },
            'finish_reason': 'stop',
            'index': 0
        }
    ]
}
The response is indeed in choices[0]['message']['content'].
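Using a trimmed version of the sample response as a plain Python dict, the lookup is ordinary dictionary and list indexing (the real response object from the openai library also allows this dictionary-style access):

```python
# A trimmed version of the sample response as a plain Python dict.
response = {
    "choices": [
        {
            "message": {
                "role": "assistant",
                "content": "The 2020 World Series was played in Arlington, Texas.",
            },
            "finish_reason": "stop",
            "index": 0,
        }
    ]
}

# Grab the first (and only) choice, then its message content.
answer = response["choices"][0]["message"]["content"]
print(answer)
```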
To make this rudimentary chat bot work, we will repeatedly call the chat method like so:
bot = ChatBot("You are an assistant that always answers correctly. If not sure, say 'I don't know'.")
while True:
    bot.chat()
Every time you input a question, the API answers, and both the question and the answer are added to the messages list. Of course, that makes the messages list grow larger and larger, up to a point where it gets too large. The question is: “What is too large?” Let’s answer that in the next section.
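To see how quickly the list grows: every turn appends one user and one assistant message, so after n turns the list holds 2n + 1 entries (including the system message). A small simulation without any API calls:

```python
# Simulate five turns: each turn appends one user and one assistant
# message to the history, on top of the initial system message.
messages = [{"role": "system", "content": "You are an assistant."}]
for turn in range(5):
    messages.append({"role": "user", "content": f"question {turn}"})
    messages.append({"role": "assistant", "content": f"answer {turn}"})

print(len(messages))  # 2 * 5 + 1 = 11
```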
Counting tokens
A language model does not work with text the way humans do. Instead, it uses tokens. It’s not important how this exactly works, but it is important to know that you get billed based on these tokens: you pay per token.
In addition, the model we use here (gpt-3.5-turbo) has a maximum limit of 4096 tokens. This might change in the future. With our code, we cannot keep adding messages to the messages list because, eventually, we will pass the limit and the API call will fail.
To have an idea about the tokens in our messages list, we have this function:
    def num_tokens_from_messages(self, messages, model="gpt-3.5-turbo"):
        try:
            encoding = tiktoken.encoding_for_model(model)
        except KeyError:
            encoding = tiktoken.get_encoding("cl100k_base")
        if model == "gpt-3.5-turbo":  # note: future models may deviate from this
            num_tokens = 0
            for message in messages:
                num_tokens += 4  # every message follows <|im_start|>{role/name}\n{content}<|im_end|>\n
                for key, value in message.items():
                    num_tokens += len(encoding.encode(value))
                    if key == "name":  # if there's a name, the role is omitted
                        num_tokens += -1  # role is always required and always 1 token
            num_tokens += 2  # every reply is primed with <|im_start|>assistant
            return num_tokens
        else:
            raise NotImplementedError(f"num_tokens_from_messages() is not presently implemented for model {model}.")
The above function comes from the OpenAI cookbook on GitHub. In my code, the function is used to count tokens in the messages list and, if the number of tokens is above a certain limit, we remove the first two messages from the list. The code also prints the tokens so you know how many you will be sending to the API.
The function contains references to <|im_start|> and <|im_end|>. This is ChatML, the Chat Markup Language. Because you use the ChatCompletion API, you do not have to worry about this: you just use the messages list and the API transforms it all to ChatML. But when you count tokens, ChatML needs to be taken into account for the total token count.
Note that Microsoft’s examples for Azure OpenAI do use ChatML in the prompt, in combination with the default Completion APIs. See Microsoft Learn for more information. You will quickly see that using the ChatCompletion API with the messages list is much simpler.
To see, and download, the full code, see GitHub.
Running the code
To run the code, just run app.py. On my system, I need to use python3 app.py. I set the system message to “You are an assistant that always answers wrongly. Contradict the user”. 😀
Here’s an example conversation:

Although the responses follow the system message at the start, the assistant then starts to correct itself and answer correctly. As stated, user messages eventually carry more weight.
Summary
In this post, we discussed how to build a chat bot using the ChatGPT API and Python. We went through the setup process, created an OpenAI account, and wrote the chat bot code using the OpenAI API. The bot uses the ChatCompletion API and maintains context in the conversation by storing previous messages and sending them to the API at each request. We also discussed counting tokens and truncating the message list to avoid exceeding the model’s maximum token limit. The full code is available on GitHub, and we provided an example conversation between the bot and the user. The post aimed to guide beginners in AI and chat bot development through the step-by-step process of building a chat bot with the ChatGPT API, keeping it as simple as possible.
Hope you liked it!