Because LLMs have limited context windows, if you keep chatting with an LLM-powered bot indefinitely, the conversation will eventually become too long for the LLM to process. It doesn't matter how large the context window is: an infinitely long conversation is larger.
On page 292 of A Common-Sense Guide to AI Engineering, I briefly discuss the idea of compacting conversation history. In this post, I'll explore the topic further and provide code samples that implement it.
The general idea of conversation compaction is that every so often, we reduce the length of the existing conversation history. There are a number of ways to go about this, but here's one approach that I'll implement shortly: Every ten conversation turns, we'll prompt a "summarizer LLM" (an LLM separate from our main chatbot) to summarize the existing conversation. We then rewrite the conversation history so that it consists of three things:
- The system prompt
- The conversation summary produced by the summarizer LLM
- The final exchange between the chatbot and user
We keep the final exchange between the chatbot and the user intact so that the conversation can continue seamlessly. If that part were summarized as well, the chatbot might miss the mark when responding to the user's most recent request.
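The history rewrite itself needs no API calls, so it can be sketched as a pure function. This is a minimal sketch, not the code used later in this post: `compact_history` and its `summarize` parameter are hypothetical names, and `summarize` stands in for whatever function actually calls the summarizer LLM.

```python
from typing import Callable

Message = dict[str, str]


def compact_history(
    history: list[Message],
    system_prompt: str,
    summarize: Callable[[list[Message]], str],
) -> list[Message]:
    """Rebuild the history as: system prompt, summary, final exchange.

    `summarize` is a hypothetical stand-in for a call to the summarizer LLM.
    """
    # Everything except the last assistant message and the latest user message
    summary = summarize(history[:-2])
    summary_message = {
        "role": "user",
        "content": "The following is a summary of the conversation so far "
        f"between the user and the AI assistant: {summary}",
    }
    # Keep the final exchange verbatim so the chatbot can answer the latest request
    return [{"role": "developer", "content": system_prompt}, summary_message] + history[-2:]


# Usage with a fake summarizer that only counts the messages it receives
history = [
    {"role": "developer", "content": "You are a friendly AI assistant."},
    {"role": "assistant", "content": "How can I help?"},
    {"role": "user", "content": "Who was the first person on the Moon?"},
    {"role": "assistant", "content": "Neil Armstrong."},
    {"role": "user", "content": "Who was second?"},
]
compacted = compact_history(
    history, "You are a friendly AI assistant.", lambda msgs: f"{len(msgs)} earlier messages"
)
for message in compacted:
    print(message["role"], "|", message["content"])
```

Injecting the summarizer as a parameter makes it easy to check the message layout without spending tokens.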
Here's a Python implementation of this solution, using the OpenAI SDK:
```python
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()  # load environment variables (e.g. OPENAI_API_KEY) from a .env file
llm = OpenAI()

user_turn = 0
COMPACT_INTERVAL = 10  # compact the history every ten user turns


def compact(history):
    # Ask a separate "summarizer LLM" to condense the conversation so far
    response = llm.responses.create(
        model="gpt-5.4-mini",
        input=f"""Below is a conversation between a human user and an LLM-powered chatbot that is getting too long for the chatbot's context window. Summarize the conversation so that the conversation can continue from there. Here is the conversation: {history}""",
    )
    # Wrap the summary in a single message that replaces the old history
    return {"role": "user", "content": f"The following is a summary of the conversation so far between the user and the AI assistant: {response.output_text}"}


def llm_response(history):
    response = llm.responses.create(
        model="gpt-5.4-nano",
        input=history,
    )
    return response


def system_prompt():
    return "You are a friendly AI assistant."


assistant_message = "How can I help?"
user_input = input(f"\nAssistant: {assistant_message}\n\nUser: ")

history = [
    {"role": "developer", "content": system_prompt()},
    {"role": "assistant", "content": assistant_message},
    {"role": "user", "content": user_input}
]

while user_input != "exit":
    user_turn += 1

    if user_turn % COMPACT_INTERVAL == 0:
        # Summarize everything except the final assistant and user messages
        compacted_history = compact(history[:-2])
        # New history: system prompt, summary, then the final exchange left intact
        history = [{"role": "developer", "content": system_prompt()}, compacted_history] + history[-2:]

    response = llm_response(history)
    history += [{"role": "assistant", "content": response.output_text}]
    print(f"\nAssistant: {response.output_text}")

    user_input = input("\nUser: ")
    history += [{"role": "user", "content": user_input}]
```
In theory, this approach works. In practice, though, it only works if the summarizer LLM does a good job. There's a tough balance to strike here: We need to ensure that the summary contains all the key information, and at the same time we want our summary to be substantially shorter than the original conversation. This is easier said than done.
In fact, I was pretty surprised to find that even the newest LLMs can often fail to summarize a conversation properly.
Below is a ten-turn conversation I had with the chatbot:
Assistant: How can I help?

User: who was the first person to land on the Moon?

A: The first person to land on the Moon was **Neil Armstrong** (Apollo 11), on **July 20, 1969**. He was followed by **Buzz Aldrin** shortly after.

User: so, Buzz was the second?

A: Yes—**Buzz Aldrin** was the **second** person to land on the Moon. Neil Armstrong stepped onto the lunar surface first, and Buzz Aldrin followed moments later during Apollo 11.

User: Who was third?

A: The **third person to land on the Moon** was **Pete Conrad** (Apollo 12), who landed in **November 1969**—after the two Apollo 11 astronauts (Armstrong and Aldrin).

User: How many moonwalkers have there been in history?

A: As of today, **12** people have walked on the Moon (lunar surface). That includes astronauts from **Apollo 11 through Apollo 17** (except Apollo 13, which didn’t land).

User: who were they?

A: The 12 people who have walked on the Moon are:
- **Neil Armstrong** (Apollo 11)
- **Buzz Aldrin** (Apollo 11)
- **Pete Conrad** (Apollo 12)
- **Alan Bean** (Apollo 12)
- **Alan Shepard** (Apollo 14)
- **Edgar Mitchell** (Apollo 14)
- **David Scott** (Apollo 15)
- **James Irwin** (Apollo 15)
- **John Young** (Apollo 16)
- **Charles Duke** (Apollo 16)
- **Eugene Cernan** (Apollo 17)
- **Harrison Schmitt** (Apollo 17)

If you want, I can also list which ones walked on which mission date.

User: These were all sent by NASA?

A: Yes—**all 12** Moonwalkers were **NASA astronauts**. They all landed during the **Apollo program** (NASA/US).

(There have also been **robotic** lunar landers and rovers, but the people who walked on the Moon were all NASA astronauts.)

User: No other countries have sent moonwalkers?

A: Correct. **So far, only the United States (NASA)** has sent people who have **walked on the Moon**.

No other country has had human Moon landings.

(Robotic missions to the Moon have been done by many countries, but not human “moonwalkers.”)

User: I once heard something about a dog on the Moon

A: You may be thinking of **Laika**—the famous dog sent by the Soviet Union—but she wasn’t on the Moon.

For a real “dog on the Moon” idea: there *was* a hoax/made-up story (sometimes repeated online) claiming a dog was brought to the lunar surface, but there’s no credible evidence or mission record of any dog (or any animal) actually landing on the Moon.

If you tell me where you heard it (movie, TikTok, book, a specific name), I can help trace what the claim was based on.

User: I was thinking of Laika. If she wasn't on the Moon, where was she?

A: **Laika** was sent into space by the **Soviet Union** in **1957** on **Sputnik 2**.

- **Location/Orbit:** She was launched into **low Earth orbit** (not to the Moon).
- **What happened:** Her flight was intended as a **one-way mission**; she **died** during the mission (the exact timing and cause vary by account, but it was generally due to conditions like overheating and stress).

If you want, I can also explain why Laika is sometimes mentioned in connection with “space dogs” and how Sputnik 2 differed from later missions.

User: How many of the moonwalkers are still living?
This last user request begins the tenth turn of the conversation, and so the compact function kicked in to summarize the conversation up until (and including) the user's ninth request (User: I was thinking of Laika. If she wasn't on the Moon, where was she?). Again, we don't include the final assistant and user messages in the summary, since we're going to leave those intact at the end of the conversation history.
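To double-check which messages land on each side of the split, the slicing can be replayed on placeholder messages. This sketch only mimics the shape of `history` at the moment the tenth user turn arrives; the message contents are made up.

```python
# Rebuild the shape of the history when the tenth user request arrives
history = [
    {"role": "developer", "content": "You are a friendly AI assistant."},
    {"role": "assistant", "content": "How can I help?"},
]
for turn in range(1, 10):
    history.append({"role": "user", "content": f"user request {turn}"})
    history.append({"role": "assistant", "content": f"assistant reply {turn}"})
history.append({"role": "user", "content": "user request 10"})

to_summarize = history[:-2]  # what gets passed to compact()
kept = history[-2:]          # left intact after the summary

print(to_summarize[-1]["content"])  # user request 9
print([m["content"] for m in kept])  # ['assistant reply 9', 'user request 10']
```

The last message the summarizer sees is the ninth user request, while the assistant's reply to it and the tenth request survive verbatim.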
To my dismay, here's the summary that was generated:
The following is a summary of the conversation so far between the user and the AI assistant: **Laika** was not on the Moon—she was sent by the **Soviet Union** into **Earth orbit** aboard **Sputnik 2** in **1957**.

A few key points:
- She was the **first animal to orbit Earth**.
- Sadly, **she did not survive** the mission.
- Her flight was a major early milestone in space exploration, but it was **not a lunar mission**.

If you want, I can also explain **why Laika was chosen** or give a quick timeline of the **first animals in space**.
This summary only captures the final part of the conversation! It completely ignores everything else, including the human moonwalkers, who were the main topic of discussion.
Although I used a "mini" model for summarization, I was still surprised that it performed so poorly. When I upgraded the summarizer to gpt-5.4, I received a much more comprehensive summary:
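A cheap way to catch a summary like this is to check whether it still mentions the topics you know were discussed. This is a crude, hypothetical heuristic: `missing_topics` is an illustrative name, the topic list is hand-picked, and plain substring matching will miss paraphrases, so it complements reading the summary rather than replacing it.

```python
def missing_topics(summary: str, topics: list[str]) -> list[str]:
    """Return the topics that never appear in the summary (case-insensitive)."""
    lowered = summary.lower()
    return [topic for topic in topics if topic.lower() not in lowered]


# A shortened paraphrase of the first summary above
summary = (
    "Laika was not on the Moon. She was sent by the Soviet Union into Earth "
    "orbit aboard Sputnik 2 in 1957."
)
print(missing_topics(summary, ["Armstrong", "NASA", "Laika"]))
# → ['Armstrong', 'NASA']
```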
The following is a summary of the conversation so far between the user and the AI assistant: Summary of the conversation so far:

- The assistant is a friendly AI assistant.
- User asked about Moon landings and moonwalkers.
- Assistant stated:
  - **Neil Armstrong** was the first person to land/walk on the Moon on **July 20, 1969**.
  - **Buzz Aldrin** was the second.
  - **Pete Conrad** was the third.
- User asked how many moonwalkers there have been; assistant answered **12** total.
- Assistant listed the 12 Moon walkers:
  - Neil Armstrong
  - Buzz Aldrin
  - Pete Conrad
  - Alan Bean
  - Alan Shepard
  - Edgar Mitchell
  - David Scott
  - James Irwin
  - John Young
  - Charles Duke
  - Eugene Cernan
  - Harrison Schmitt
- User asked whether they were all sent by NASA; assistant said **yes**, all 12 were **NASA astronauts** from the **Apollo program**.
- User asked whether any other countries sent moonwalkers; assistant said **no**, only the **United States/NASA** has sent humans who walked on the Moon.
- User then mentioned hearing something about a dog on the Moon.
- Assistant clarified that this was likely confusion with **Laika**, the Soviet space dog, and explained:
  - Laika was **not** on the Moon.
  - There is **no credible evidence** that any dog or other animal ever landed on the Moon.
  - Claims about a dog on the Moon are likely hoaxes or misunderstandings.
- The user’s latest question is: **“I was thinking of Laika. If she wasn't on the Moon, where was she?”**

Useful answer context for continuing:
- Laika was a **Soviet space dog** launched aboard **Sputnik 2** in **1957**.
- She **orbited Earth**, not the Moon.
- She became the **first animal to orbit Earth**.
- She did **not survive the mission**; later reports established she died from **overheating/stress** a few hours after launch.
This summary does cover the full breadth of my conversation with the chatbot. However, it isn't much shorter than the original: it's roughly 70% of the length of the conversation it summarizes, which isn't very efficient.
So I experimented to see whether a better summarizer prompt could get good results even from gpt-5.4-mini, since the "mini" model is both cheaper and faster.
I found that the following prompt, although not very different from the original, seems to work:
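The post doesn't specify how the 70% figure was measured. One simple option, assumed here, is to compare character counts of the summary against the message contents that were sent to the summarizer. `length_ratio` is a hypothetical helper, and a token count would track the context-window limit more closely than characters do.

```python
def length_ratio(summary: str, history: list[dict[str, str]]) -> float:
    """Summary length as a fraction of the summarized conversation's length.

    Assumes length is measured in characters of the message contents.
    """
    original = sum(len(message["content"]) for message in history)
    if original == 0:
        raise ValueError("cannot compute a ratio for an empty conversation")
    return len(summary) / original


history = [
    {"role": "user", "content": "who was the first person to land on the Moon?"},
    {"role": "assistant", "content": "Neil Armstrong, on July 20, 1969."},
]
ratio = length_ratio("User asked who first landed on the Moon: Armstrong.", history)
print(f"{ratio:.0%}")
```

Logging this ratio at every compaction makes it easy to notice when a prompt change quietly bloats the summaries.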
```python
def compact(history):
    response = llm.responses.create(
        model="gpt-5.4-mini",
        input=f"""Below is a conversation between a human user and an LLM-powered chatbot that is getting too long for the chatbot's context window. Summarize very briefly the flow of the entire conversation so that the conversation can continue from there. Include all major topics in chronological order. Here is the conversation to summarize: {history}""",
    )
    return {"role": "user", "content": f"The following is a summary of the conversation so far between the user and the AI assistant: {response.output_text}"}
```
The two key phrases I added here are "very briefly" and "the flow of the entire conversation." This time around, the summarizer LLM produced the following:
The following is a summary of the conversation so far between the user and the AI assistant: The conversation started with Moon-landing history: the user asked who the first person on the Moon was, then confirmed Buzz Aldrin was second, and asked who was third. The assistant explained that the first 12 Moonwalkers, all NASA Apollo astronauts, were the only people who have walked on the Moon so far, and listed them. The user then asked whether all were sent by NASA and whether any other countries have sent moonwalkers; the assistant said yes, only the U.S. has sent humans to walk on the Moon. The user then mentioned hearing about a dog on the Moon, and the assistant clarified that this was likely confusion with Laika, who was not on the Moon. The user’s last question was asking where Laika actually was.
This summary succinctly captures all the major points of the conversation, and is only around 30% of the original conversation's length.
The big takeaway here is that it can be tricky to find a prompt that gets an LLM to summarize conversation history well, which means keeping the summary short while still retaining all the major points.
There was nothing magical about the prompt I finally landed on, and I expect it may work well with some models but not others. You'll have to optimize your summarizer prompt for whichever model you decide to use.
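One lightweight way to follow that advice is to keep each summarizer prompt next to the model it was tuned for and fall back to a default otherwise. This is a hypothetical sketch: `SUMMARIZER_PROMPTS` and `summarizer_prompt` are illustrative names, and only the gpt-5.4-mini entry reflects a prompt tested in this post.

```python
# Fallback prompt: the original, untuned summarizer prompt from this post
DEFAULT_PROMPT = (
    "Below is a conversation between a human user and an LLM-powered chatbot "
    "that is getting too long for the chatbot's context window. Summarize the "
    "conversation so that the conversation can continue from there. "
    "Here is the conversation: {history}"
)

# Prompts tuned per summarizer model (hypothetical registry)
SUMMARIZER_PROMPTS = {
    "gpt-5.4-mini": (
        "Below is a conversation between a human user and an LLM-powered chatbot "
        "that is getting too long for the chatbot's context window. Summarize very "
        "briefly the flow of the entire conversation so that the conversation can "
        "continue from there. Include all major topics in chronological order. "
        "Here is the conversation to summarize: {history}"
    ),
}


def summarizer_prompt(model: str, history: list[dict[str, str]]) -> str:
    """Fill in the prompt tuned for `model`, or the default if none exists."""
    template = SUMMARIZER_PROMPTS.get(model, DEFAULT_PROMPT)
    return template.format(history=history)


prompt = summarizer_prompt("gpt-5.4-mini", [{"role": "user", "content": "hi"}])
```

The returned string would then be passed as `input=` to `llm.responses.create`, so switching summarizer models only means adding a new entry once you've tuned a prompt for it.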