The Secret Sauce To Let ChatGPT Rock Your Customer Support

ChatGPT cannot answer questions that require your company's knowledge out of the box. We explore how to use that knowledge to answer customer questions.

Lukas Gamper
Better Programming

ChatGPT is an awesome tool to answer annoying customers, write the outline the shareholders ordered, or handle the corporate junk that pops up on your desk. But answering customer questions requires specific knowledge about your company. Most likely, the answer is already in your documentation, but ChatGPT never read it.

The easiest way would be to just pass the whole documentation to ChatGPT, but GPT-4 can only handle about 30 pages of input. Passing a lot of information to GPT-4 also makes it slow and gets expensive fast. It would be useful to have a tool that finds the important parts in the ocean of information. That's where embeddings enter the game.

In this article, we will explain what embeddings are and how we can use them to empower ChatGPT to answer customer questions with your documentation’s knowledge.

What are Embeddings?

A text embedding, or simply embedding, is a mathematical representation of the information contained in a text. It allows us to tell which texts are about the same topic and which don't have much in common.

As an example, let's look at three sentences:

  1. “To bold a paragraph, use the CTRL-B shortcut.”
  2. “By using the eraser symbol in the ribbon, you can remove italics from the selection.”
  3. “To make it up the hill in one day, you need to take the shortcut. Otherwise, at least two days are necessary.”

Even though the first and third sentences share many words, the embeddings of the first and second sentences are much closer to each other than those of the first and third. That is because the first two sentences are about the same subject, which differs from the third.

Embeddings are represented as large vectors of numbers. These numbers encode the information contained in the text. By measuring the distance between two embedding vectors, we can determine which texts contain similar information and which are about different topics.
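To make this concrete, here is a minimal sketch that embeds the three example sentences and compares their cosine similarities. It uses OpenAI's ADA-002 embedding model and the classic openai-python interface, the same ones used throughout this article; treat it as an illustration rather than part of the pipeline:

import openai
from scipy import spatial

openai.api_key = OPENAI_API_KEY

sentences = [
    'To bold a paragraph, use the CTRL-B shortcut.',
    'By using the eraser symbol in the ribbon, you can remove italics from the selection.',
    'To make it up the hill in one day, you need to take the shortcut. Otherwise, at least two days are necessary.',
]
embeddings = [
    openai.Embedding.create(model='text-embedding-ada-002', input=sentence)["data"][0]["embedding"]
    for sentence in sentences
]

# Cosine similarity: close to 1 means very similar, lower values mean less related.
print(1 - spatial.distance.cosine(embeddings[0], embeddings[1]))  # both about text formatting: high
print(1 - spatial.distance.cosine(embeddings[0], embeddings[2]))  # formatting vs. hiking: noticeably lower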

This is the tool we were looking for! This enables us to find the relevant parts from a huge amount of text.

How Can We Use Embeddings To Answer a Question From Our Documentation?

First, we prepare the documentation so that we can find the parts relevant to a question:

  1. Split the documentation into chunks of medium size.
  2. Calculate the embeddings for each chunk.
  3. Store the embeddings to be able to use them to answer a question.

Now, we are ready to answer a question with the following steps:

  1. Take the question and create its embedding.
  2. Compute the distance between the embedding of the question and the embedding of each chunk of the documentation.
  3. Take the nearest chunks of the documentation and ask ChatGPT to answer the question using only the information from these chunks.

Now, Let’s Get to the Nitty Gritty Implementation Details

Split the documentation into chunks of medium size

There is a lot of research about how to chunk a large text. A common finding seems to be that a chunk should not be longer than 2,000 words.

For splitting the documentation into chunks, we found that splitting along logical sections gives much better results than splitting at a fixed character count. Another finding was that sections that are too short do not contain enough information to create a good, reliable embedding. Since adjacent sections usually contain similar information, we can group several sections together until we reach at least 500 characters.

For example, the Python code below shows how to parse HTML documentation into sections. First, we use the BeautifulSoup package to parse the HTML and traverse its elements. While traversing, we convert the HTML into text. We assume that H1 and H2 tags represent section titles. When we reach one of these elements, we check whether the current chunk is long enough; if so, we start a new chunk for the next section. Here’s what that looks like:

from bs4 import BeautifulSoup

# documentationInHTML is assumed to hold the documentation as one HTML string
chunks = []
elements = BeautifulSoup(documentationInHTML, features='html.parser')
nextId = 0
paragraph = ''
for element in elements.descendants:
    if isinstance(element, str):
        paragraph += element.strip()
    elif element.name in ['h1', 'h2']:
        # a new section starts: close the current chunk if it is long enough
        if len(paragraph) > 500:
            chunks.append({ 'content': paragraph, 'id': nextId })
            nextId += 1
            paragraph = ''
        else:
            paragraph += '\n'
    elif element.name in ['br', 'p', 'tr', 'th', 'h3', 'h4']:
        paragraph += '\n'
    elif element.name == 'li':
        paragraph += '\n- '

# do not forget the last chunk
if len(paragraph) > 0:
    chunks.append({ 'content': paragraph, 'id': nextId })
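Note that the loop above only merges sections that are too short; it never splits a section that exceeds the roughly 2,000-word guideline mentioned earlier. If your documentation contains very long sections, a helper along these lines can cut them down. The split_long_chunk name and the paragraph-based splitting are our own illustration, not part of the pipeline above:

# Illustrative helper: split a chunk that exceeds the ~2,000-word guideline
# at paragraph boundaries (assuming paragraphs are separated by '\n').
def split_long_chunk(chunk, max_words=2000):
    if len(chunk['content'].split()) <= max_words:
        return [chunk]
    parts, current = [], ''
    for paragraph in chunk['content'].split('\n'):
        if current and len((current + paragraph).split()) > max_words:
            parts.append(current)
            current = ''
        current += paragraph + '\n'
    if current.strip():
        parts.append(current)
    # keep the original id and add a suffix so the pieces stay traceable
    return [{'content': part, 'id': f"{chunk['id']}-{index}"} for index, part in enumerate(parts)]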

Calculate the embeddings for each chunk

There are countless transformer models that can create embeddings. The ADA-002 model from OpenAI outperformed all locally installable models we tested. That's why we use OpenAI's model to produce the embeddings.

In the Python code below, we loop over each chunk and send it to ADA-002. The embedding is saved directly in the chunk's record. Here’s the code:

import openai
openai.api_key = OPENAI_API_KEY

for chunk in chunks:
    response = openai.Embedding.create(model='text-embedding-ada-002', input=chunk['content'])
    chunk['embedding'] = response["data"][0]["embedding"]

Now, we have a library of embeddings for the documentation. Let’s save it to a JSON file. We could also use a vector database, but unless your documentation exceeds thousands of pages, a vector database is overkill.

The following Python code saves all our chunks to a file. It makes sense to add a version to the filename so you can easily determine if you have to regenerate the embeddings.

import json
import os

with open('documentation-1.13.json', 'w') as outfile:
    json.dump(chunks, outfile)
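Because the documentation version is part of the filename, a simple existence check tells you whether the stored embeddings match the current version or have to be regenerated. A minimal sketch, where the version string is just an example:

import os

version = '1.13'  # example: take this from your documentation build
embeddingsFile = f'documentation-{version}.json'

# If no file exists for this version, rerun the chunking and embedding steps above.
if not os.path.exists(embeddingsFile):
    print(f'No embeddings found for version {version}; please regenerate them.')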

Take the question and create an embedding out of it

To compare the question with the documentation, we need to calculate the embedding of the question.

The following Python code embeds the question using the ADA-002 transformer:

import openai

response = openai.Embedding.create(model='text-embedding-ada-002', input=question)
questionEmbedding = response["data"][0]["embedding"]

Using this embedding to find the related chunks of the documentation

First, we need to load the documentation with the corresponding embeddings from the JSON file created above. Afterward, we can pick the three closest embeddings.

The following Python code loads the documentation chunks from the disk and picks the three closest embeddings:

import json

from scipy import spatial

with open('documentation-1.13.json', 'r') as infile:
    chunks = json.load(infile)

# relatedness = cosine similarity: the higher, the closer the chunk is to the question
chunkRelatedness = [(chunk, 1 - spatial.distance.cosine(questionEmbedding, chunk['embedding'])) for chunk in chunks]
chunkRelatedness.sort(key=lambda x: x[1], reverse=True)
topChunks = [item[0] for item in chunkRelatedness[:3]]

Take the nearest chunks of the documentation and ask ChatGPT to answer the question using only the information from these chunks

Now we have all the information to answer the question. We only need someone to formulate an answer. For this task, GPT-4 is a perfect fit.

When ChatGPT is called through the API, we can pass a system message that describes how we want ChatGPT to behave. In our case, we can instruct ChatGPT as follows:

  • You answer customer questions appreciatively, in detail, and casually.
  • To answer the questions, you may only use information from the prompt.
  • If you cannot answer the question using only information from the prompt, answer with “I don’t know.”

For the content part (called the user message in the API), we pass the question and the sections from the documentation with the instruction Answer the following question only using information from the prompt. ”””<question>””” Documentation: <sections>

The following Python code calls GPT-4 with the prompt described above:

userPrompt = f'Answer the following question only using information from the prompt.\n"""{question}"""\n\n' + \
    'Documentation:\n'
for chunk in topChunks:
    userPrompt += f'{chunk["content"]}\n'

messages = [
    {'role': 'system', 'content': 'You answer customer questions appreciatively, in detail, and casually.\n' + \
        'To answer the questions, you may only use information from the prompt.\n' + \
        'If you cannot answer the question using only information from the prompt, answer with "I don\'t know".' },
    {'role': 'user', 'content': userPrompt }
]

response = openai.ChatCompletion.create(model='gpt-4', messages=messages)
answer = response["choices"][0]["message"]["content"]

print(answer)

Congratulations! We managed to answer a customer question using information extracted from the documentation. But the system suffers from two problems:

  1. Many times, the answer is partly made up and does not stick to the information from the prompt.
  2. Many questions contain less than a dozen words. This leads to unspecific embeddings because the transformer cannot extract much information from them.

How to make ChatGPT stick to the prompt

If you ask ChatGPT any question in the normal web interface, it will give you an answer, unless that answer would be offensive. It doesn't matter whether the question has an answer at all. The response will sound reasonable, even if no answer exists or ChatGPT decided, for whatever reason, to make up a story. This is called a hallucination.

In some cases, a hallucination is a good thing. If you want ChatGPT to create a new novel, you want it to hallucinate. But in our case, we want it to stick with the information in the prompt.

In the API, there is a temperature parameter that controls the level of hallucination. The default is 1, but to answer our questions, we want to call GPT with a temperature of 0. This does not fully prevent GPT from hallucinating, but it helps a lot.

response = openai.ChatCompletion.create(model='gpt-4', messages=messages, temperature=0)

How to find better embeddings for a question

To compare the question with the documentation, we need a good embedding of the question. But normally, the questions are quite short, which leads to random results in the ranking of the documentation.

This can be improved by letting GPT make up a hallucinated answer and calculating the embedding of the question together with that hallucinated answer. Even though the answer is made up, it expands on the vocabulary of the question, which improves the embedding a lot.

In our specific case, we use GPT-3.5 to generate the hallucinated answer since it’s made up anyway, and GPT-3.5 is much faster and cheaper than GPT-4.

To make sure GPT makes up a random answer, we pass a temperature of 1.2. This is the sweet spot: the model still answers in complete sentences, but with maximum randomness. If we increase the temperature even more, GPT stops forming sentences and starts outputting random characters.

The following Python code generates the embedding from the question and a hallucinated answer:

import openai

from scipy import spatial

# let GPT-3.5 make up an answer; it is faster and cheaper than GPT-4,
# and the answer is only used to enrich the embedding anyway
response = openai.ChatCompletion.create(model='gpt-3.5-turbo', messages=[
    {'role': 'system', 'content': 'You answer customer questions appreciatively, in detail, and casually.' },
    {'role': 'user', 'content': question }
], temperature=1.2)
hallucinatedAnswer = response["choices"][0]["message"]["content"]

# embed the question together with the hallucinated answer
response = openai.Embedding.create(model='text-embedding-ada-002', input=f'{question}\n{hallucinatedAnswer}')
questionEmbedding = response["data"][0]["embedding"]

chunkRelatedness = [(chunk, 1 - spatial.distance.cosine(questionEmbedding, chunk['embedding'])) for chunk in chunks]
chunkRelatedness.sort(key=lambda x: x[1], reverse=True)
topChunks = [item[0] for item in chunkRelatedness[:3]]

Conclusion

We are amazed to see how far AI technology has progressed. Using it for your own case takes a few steps and some logic, but we have had a great experience with embeddings so far.

Or, as ChatGPT would say: We are thrilled to see how our deepest dreams are coming true and how AI helps us get rid of the boring and repetitive work. This frees up our time and energy for more creative and demanding tasks. Like creating more AIs and removing more boring tasks from our table 🤣.
