I don’t recall exactly how I became curious about this, but I started to wonder how much (if anything at all) LLMs recall about their own reasoning, ESPECIALLY coding agents. So I dug in a bit to figure it out.
I started by opening a new session in OpenCode and sending a simple “Hi!” and observed the reasoning traces generated prior to it’s response, then followed up with something like “Do you recall your thinking before you responded? What was it, verbatim. Hint: it began with ‘The user said…’”
Trying this multiple times, I either got a response saying it doesn’t have access to it’s internal reasoning, or it would fabricate something reasonable, but not the same. That was a bit scary, because if it can’t recall any of it’s reasoning, even for it’s last message, what happens to all the reasoning it does prior to making successive tool calls? It’s just wasted/lost?
Next I tried something similar with custom curl requests to the API (for the LLM service), making sure to include `reasoning_content` in the assistant message, but injecting my name into it. Same thing, it had no recollection of my name.
I started looking at the chat templates for models like GLM-5 and the Qwen series models. It turns out they strip reasoning content for all assistant messages prior to the last user message. That makes sense! It’s a great way to keep context from exploding, and because it’s only prior to the last user message, a sequence of tool calls from the model won’t destroy the reasoning.
Next experiment: Back in OpenCode I send this prompt in a new session:
“Make a single tool call to list files, then select two random files, then read only one of them. THEN your final message should tell me which two files you chose at random.”
The reason it should read only one of them is to ensure a) a follow-up tool call fires, and b) it’s regular context has no record of the file it chose but did not read.
Observing it’s reasoning traces I could see the two files it chose at random, and it then correctly told me which two it chose but never made explicit in the regular content.
BUT it’s not over: if I use a similar strategy but asking it to tell me it’s thoughts instead, like this:
“Make a single tool call to list files and then tell me your exact thoughts, verbatim.”
The result is fabricated reasoning that is roughly approximate, but it’s no where near verbatim. So it seems like LLMs have a very difficult time reproducing their own reasoning.
Main Takeaway: By default, it seems most (if not all) LLM services will strip out all reasoning content EXCEPT for the latest assistant message that does NOT have a user message after it in the session thread.
NOTE: Some chat templates, like GLM-5’s, accept a clear_history argument that is true by default. Submitting false will preserve ALL previous reasoning.
