Most support agents are fine in a single conversation and then fall apart the second a user comes back later.

That was the thing I wanted to fix.

I did not want a bot that acts helpful for one chat and then starts from zero the next day. In support, that gets old fast. A customer explains the same billing issue twice, repeats the same background, answers the same question again, and the whole interaction feels cheap even if the model response is technically correct.

So I built a support agent that keeps customer context between sessions instead of treating every conversation like a blank slate.

When a new message comes in, the agent calls recall with the user ID and the current query. That pulls back a few relevant memories for that specific user. Those memories get added to the prompt, the model generates a response with that context, and then the new interaction gets written back with retain so it is available later.

- recall relevant context for the current user
- generate the response with that context in the prompt
- retain the new interaction for future sessions
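
The three steps above can be sketched roughly as follows. This is not the project's actual code and it does not use the real Hindsight client: `InMemoryStore`, `handle_message`, and the `generate` callable are hypothetical stand-ins, and the word-overlap matching is a crude placeholder for whatever relevance ranking the memory layer really does.

```python
import re
from collections import defaultdict
from typing import Callable


def _tokens(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))


class InMemoryStore:
    """Hypothetical stand-in for the memory layer (not the Hindsight client)."""

    def __init__(self) -> None:
        self._by_user: dict[str, list[str]] = defaultdict(list)

    def recall(self, user_id: str, query: str) -> list[str]:
        # Naive relevance: keep memories sharing at least one word with the query.
        words = _tokens(query)
        return [m for m in self._by_user[user_id] if words & _tokens(m)]

    def retain(self, user_id: str, content: str) -> None:
        self._by_user[user_id].append(content)


def handle_message(
    memory: InMemoryStore,
    generate: Callable[[str], str],  # assumed wrapper around the LLM call
    user_id: str,
    message: str,
) -> str:
    # 1. Recall memories relevant to this user and this query.
    memories = memory.recall(user_id, message)

    # 2. Put the recalled context in the prompt and generate a reply.
    context = "\n".join(f"- {m}" for m in memories) or "- (no prior context)"
    prompt = (
        "You are a support agent.\n"
        f"Known context about this customer:\n{context}\n\n"
        f"Customer: {message}\nAgent:"
    )
    reply = generate(prompt)

    # 3. Retain the new exchange so later sessions can recall it.
    memory.retain(user_id, f"Customer said: {message}\nAgent replied: {reply}")
    return reply
```

Passing `generate` in as a plain function keeps the loop independent of the inference provider, so the same flow can be exercised with a fake model in tests.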

The important part was keeping the memory scoped per user. That sounds obvious, but it matters a lot: in a support setting, the last thing I want is one customer’s details bleeding into another customer’s conversation. Hindsight made this easy with its tags feature. I simply attach the user ID as a tag on both the retain call and the recall call.
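
To illustrate the idea of tag-based scoping, here is a small sketch. It is an assumption-heavy toy, not Hindsight's API: `TaggedStore`, `Memory`, and `user_tag` are hypothetical names, and the real tags feature may take different parameters. The point is only that a memory is eligible for recall when it carries the same user tag it was stored with.

```python
from dataclasses import dataclass, field


@dataclass
class Memory:
    content: str
    tags: frozenset[str]


@dataclass
class TaggedStore:
    """Hypothetical tag-filtered store; the real Hindsight API may differ."""

    _items: list[Memory] = field(default_factory=list)

    def retain(self, content: str, tags: list[str]) -> None:
        self._items.append(Memory(content, frozenset(tags)))

    def recall(self, query: str, tags: list[str]) -> list[str]:
        # Only memories carrying every requested tag are eligible at all.
        # Relevance ranking against `query` is omitted in this sketch.
        required = frozenset(tags)
        return [m.content for m in self._items if required <= m.tags]


def user_tag(user_id: str) -> str:
    # One tag per customer, applied on both write and read.
    return f"user:{user_id}"
```

With this shape, a recall for one customer cannot return another customer's memories, because the filter runs before any relevance logic:

```python
store = TaggedStore()
store.retain("Alice was double charged in March", tags=[user_tag("alice")])
store.retain("Bob asked about SSO setup", tags=[user_tag("bob")])
alice_context = store.recall("billing", tags=[user_tag("alice")])
```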

The other thing that mattered was keeping recall small.

I did not need a huge memory dump on every message. Using the reflect endpoint, I only got back the context that actually mattered. Once I found the right setup, the model had enough context to sound consistent and informed.
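
Whatever the memory layer returns, a hard cap can also be enforced on the application side before anything reaches the prompt. The sketch below is one way to do that, under assumptions: `select_context` is a hypothetical helper, the defaults for `k` and `max_chars` are arbitrary, and the word-overlap score is a simple placeholder for real semantic relevance.

```python
import re


def _tokens(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))


def select_context(
    query: str,
    memories: list[str],
    k: int = 3,
    max_chars: int = 600,
) -> list[str]:
    """Keep at most k memories that overlap with the query, within a size budget."""
    q = _tokens(query)
    scored = [(len(q & _tokens(m)), m) for m in memories]

    # Drop memories with no overlap, then rank by overlap (highest first).
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)

    selected: list[str] = []
    used = 0
    for _, memory in ranked[:k]:
        # Stop once the next memory would exceed the character budget.
        if used + len(memory) > max_chars:
            break
        selected.append(memory)
        used += len(memory)
    return selected
```

Capping both the count and the total size keeps the prompt length predictable, even for a customer with a long history.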

I also found that this worked better than trying to stuff full conversation history into the prompt every time. Full transcripts get noisy. They make the prompt longer, they introduce irrelevant details, and after a while the model starts sounding less focused rather than more informed.

Instead of replaying everything, the agent only sees the parts that are relevant to the current message. That makes the system feel more stable across sessions, and it also makes the memory layer easier to reason about.

The stack itself is pretty simple: Python for orchestration, Streamlit for the frontend, Groq for inference, and Hindsight for memory.

I like this setup because it does one thing clearly. It turns a support bot from a session-based responder into something that can carry context forward without pretending it has a human memory. It still needs the right retrieval logic, the right scoping, and some restraint about how much context gets injected, but once those pieces are in place, the experience is noticeably better.

That was really the point.

Not to build a bot that sounds more dramatic or more intelligent. Just one that does not forget the customer every time the tab closes.
