I built an autonomous data investigation agent on top of LangGraph + Claude: here's how the loop works

I've been building a project for a client that monitors Shopify stores overnight and autonomously investigates revenue anomalies. Not just alerting, but actually digging in. Sharing the details here for feedback and suggestions:

What it does

- Every night it fetches the last 65 days of data and runs a 3-level anomaly check (daily value vs. 14-day rolling average → week-over-week → month-over-month). If it finds a deviation above 20%, it kicks off an investigation. You wake up to a WhatsApp message or email like: "Revenue dropped 34% yesterday. Most likely: SKU-447 stockout - it appeared in 6 of 8 spike-day orders last week and now has 0 inventory. Restock it."
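A minimal sketch of the 3-level check. It assumes a plain list of daily revenue values (oldest first, yesterday last) and assumes an investigation starts if any level deviates by more than 20%. The post doesn't say exactly how the levels combine or which windows week-over-week and month-over-month compare, so the 7-day and 30-day windows here are assumptions. With 65 days of history, both 30-day windows fit.

```python
from dataclasses import dataclass
from statistics import mean

THRESHOLD = 0.20  # >20% deviation triggers an investigation (figure from the post)


@dataclass
class Check:
    level: str
    current: float
    baseline: float

    @property
    def deviation(self) -> float:
        # Relative change vs. baseline; 0.0 when the baseline is zero to avoid dividing by zero
        return (self.current - self.baseline) / self.baseline if self.baseline else 0.0


def anomaly_checks(daily: list) -> list:
    """daily: one revenue value per day, oldest first, yesterday last.

    Assumes at least 60 days so the month-over-month windows are complete.
    """
    if len(daily) < 60:
        raise ValueError("need at least 60 days of data")
    yesterday = daily[-1]
    return [
        # Level 1: yesterday vs. the mean of the 14 days before it
        Check("daily_vs_14d_avg", yesterday, mean(daily[-15:-1])),
        # Level 2: last 7 days vs. the 7 days before that (assumed window)
        Check("week_over_week", sum(daily[-7:]), sum(daily[-14:-7])),
        # Level 3: last 30 days vs. the 30 days before that (assumed window)
        Check("month_over_month", sum(daily[-30:]), sum(daily[-60:-30])),
    ]


def should_investigate(daily: list) -> list:
    """Return only the checks whose absolute deviation exceeds the threshold."""
    return [c for c in anomaly_checks(daily) if abs(c.deviation) > THRESHOLD]
```

For example, 64 flat days at 100 followed by a day at 66 flags only the daily check (-34%). The weekly and monthly sums barely move (about -5% and -1%), which is why a single bad day is best caught at the daily level.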

The agent loop

Built on LangGraph. Each investigation step runs three nodes:

- form_hypothesis: the LLM proposes one specific, testable hypothesis given prior steps and memory
- select_tool: the LLM picks the best tool to test that hypothesis and calls it
- evaluate: the LLM judges whether the tool output confirms or rejects the hypothesis, or is inconclusive

A router then decides whether to loop again or conclude. The conclude node produces ranked candidates with evidence, plus one concrete recommended action.
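Here is the loop as plain Python, a sketch rather than the actual implementation. In LangGraph each of these functions would be a node registered with StateGraph.add_node, with the router attached after evaluate via add_conditional_edges; it's written without the library here so it runs standalone. The node callables are hypothetical stand-ins for the LLM calls, and the stopping rule (stop on the first confirmed hypothesis or after max_steps) is an assumption, since the post doesn't say what the router checks.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Step:
    hypothesis: str
    tool: str
    tool_output: str
    verdict: str  # "confirmed" | "rejected" | "inconclusive"


@dataclass
class InvestigationState:
    anomaly: str
    steps: List[Step] = field(default_factory=list)
    conclusion: Optional[str] = None


def route(state: InvestigationState, max_steps: int) -> str:
    """Router (assumed rule): conclude once a hypothesis is confirmed or the step budget is spent."""
    if state.steps[-1].verdict == "confirmed" or len(state.steps) >= max_steps:
        return "conclude"
    return "loop"


def run_investigation(
    anomaly: str,
    form_hypothesis: Callable[[InvestigationState], str],
    select_tool: Callable[[InvestigationState, str], Tuple[str, str]],
    evaluate: Callable[[InvestigationState, str, str], str],
    conclude: Callable[[InvestigationState], str],
    max_steps: int = 6,
) -> InvestigationState:
    state = InvestigationState(anomaly)
    while True:
        hypothesis = form_hypothesis(state)            # LLM call 1
        tool, output = select_tool(state, hypothesis)  # LLM call 2, then run the chosen tool
        verdict = evaluate(state, hypothesis, output)  # LLM call 3
        state.steps.append(Step(hypothesis, tool, output, verdict))
        if route(state, max_steps) == "conclude":
            break
    state.conclusion = conclude(state)
    return state


# Toy stand-ins for the LLM-backed nodes (hypothetical, illustration only)
CANDIDATES = ["Paid traffic dropped", "SKU-447 stockout"]


def toy_hypothesis(state):
    return CANDIDATES[len(state.steps)]


def toy_tool(state, hypothesis):
    if "stockout" in hypothesis:
        return "inventory_levels", "SKU-447: 0 units"
    return "traffic_report", "sessions flat"


def toy_evaluate(state, hypothesis, output):
    return "confirmed" if "0 units" in output else "rejected"


def toy_conclude(state):
    confirmed = [s.hypothesis for s in state.steps if s.verdict == "confirmed"]
    return confirmed[0] if confirmed else "inconclusive"


result = run_investigation("revenue -34%", toy_hypothesis, toy_tool, toy_evaluate, toy_conclude)
```

With the toy nodes, the first hypothesis is rejected, the second is confirmed, and the router stops after two steps. A real conclude node would return ranked candidates and an action rather than a single string.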

The memory system - this was the interesting part

Three layers of persistent memory in Postgres, all tenant-scoped:

Schema memory: tracks which Shopify/GA4/GSC fields work and which custom queries succeeded or failed. It's injected into every prompt so the agent stops retrying queries that will never work.
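One way schema memory could look, as a sketch: a tenant-scoped table of query outcomes, a check that can run before a query is issued, and a text block rendered for the prompt. SQLite's in-memory database stands in for Postgres so the example runs on its own; the table and column names are hypothetical. The INSERT ... ON CONFLICT DO UPDATE upsert is syntax both databases accept, though a Postgres driver such as psycopg uses %s placeholders instead of ?.

```python
import sqlite3


def init_schema_memory(conn: sqlite3.Connection) -> None:
    conn.execute(
        """CREATE TABLE IF NOT EXISTS schema_memory (
               tenant_id TEXT NOT NULL,
               source    TEXT NOT NULL,     -- e.g. 'shopify', 'ga4', 'gsc'
               query     TEXT NOT NULL,     -- field name or normalized query text
               ok        INTEGER NOT NULL,  -- 1 = worked, 0 = failed
               error     TEXT,
               PRIMARY KEY (tenant_id, source, query)
           )"""
    )


def record_query(conn, tenant_id, source, query, ok, error=None) -> None:
    # Upsert: the latest outcome for a (tenant, source, query) wins
    conn.execute(
        """INSERT INTO schema_memory (tenant_id, source, query, ok, error)
           VALUES (?, ?, ?, ?, ?)
           ON CONFLICT (tenant_id, source, query)
           DO UPDATE SET ok = excluded.ok, error = excluded.error""",
        (tenant_id, source, query, int(ok), error),
    )


def known_failure(conn, tenant_id, source, query) -> bool:
    """True if this exact query is recorded as failing for this tenant."""
    row = conn.execute(
        "SELECT ok FROM schema_memory WHERE tenant_id = ? AND source = ? AND query = ?",
        (tenant_id, source, query),
    ).fetchone()
    return row is not None and row[0] == 0


def schema_prompt_block(conn, tenant_id) -> str:
    """Render one tenant's schema memory as text to inject into every prompt."""
    rows = conn.execute(
        "SELECT source, query, ok, error FROM schema_memory "
        "WHERE tenant_id = ? ORDER BY source, query",
        (tenant_id,),
    ).fetchall()
    lines = ["Known schema facts (do not retry FAILED queries):"]
    for source, query, ok, error in rows:
        status = "OK" if ok else f"FAILED ({error})"
        lines.append(f"- [{source}] {query}: {status}")
    return "\n".join(lines)
```

Calling known_failure in code before executing a tool query, rather than relying only on the prompt, is one way to deal with the agent sometimes ignoring schema constraints.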

Business context: patterns extracted after each investigation, e.g. "branded search queries held steady while non-branded dropped in Apr 2026" or "typical weekly order count 45–60". A pattern gets invalidated when new evidence contradicts it.
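A sketch of the invalidation rule, assuming each business-context fact can be expressed as an expected range for one metric (like the weekly order count example). The names and the max_misses=3 threshold are hypothetical. Requiring several consecutive contradicting observations is a deliberate choice here: the nightly run is, by design, looking at anomalous days, and a single anomaly shouldn't wipe out the store's baseline.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ContextFact:
    text: str            # human-readable fact injected into prompts
    metric: str          # metric the fact constrains, e.g. "weekly_orders"
    low: float
    high: float
    active: bool = True
    misses: int = 0      # consecutive observations outside [low, high]


def observe(facts: List[ContextFact], metric: str, value: float, max_misses: int = 3) -> None:
    """Update facts with a new observation; invalidate after max_misses consecutive contradictions."""
    for fact in facts:
        if not fact.active or fact.metric != metric:
            continue
        if fact.low <= value <= fact.high:
            fact.misses = 0  # evidence agrees, reset the streak
        else:
            fact.misses += 1
            if fact.misses >= max_misses:
                fact.active = False  # kept for auditing, but no longer sent to the model


def context_prompt_block(facts: List[ContextFact]) -> str:
    return "\n".join(f"- {f.text}" for f in facts if f.active)
```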

Investigation history: the last N investigations on this metric. The agent is explicitly told not to re-test hypotheses that were already confirmed or rejected.
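A sketch of building the do-not-retest list from the last N investigations on a metric. The data shapes and last_n=5 are assumptions, and matching hypotheses by normalized text is crude: semantically equivalent hypotheses phrased differently will slip through. The is_repeat guard enforces the rule in code as well as in the prompt.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PastInvestigation:
    metric: str
    outcomes: Dict[str, str]  # hypothesis -> "confirmed" | "rejected" | "inconclusive"


def normalize(text: str) -> str:
    # Crude matching: lowercase and collapse whitespace
    return " ".join(text.lower().split())


def settled_hypotheses(history: List[PastInvestigation], metric: str, last_n: int = 5) -> Dict[str, str]:
    """Hypotheses confirmed or rejected in the last N investigations of `metric` (history is oldest first)."""
    recent = [inv for inv in history if inv.metric == metric][-last_n:]
    settled: Dict[str, str] = {}
    for inv in recent:
        for hypothesis, verdict in inv.outcomes.items():
            if verdict in ("confirmed", "rejected"):
                # A newer confirmed/rejected verdict overwrites an older one
                settled[normalize(hypothesis)] = verdict
    return settled


def history_prompt_block(settled: Dict[str, str]) -> str:
    lines = ["Already tested, do NOT re-test:"]
    lines += [f"- {h} ({v})" for h, v in settled.items()]
    return "\n".join(lines)


def is_repeat(hypothesis: str, settled: Dict[str, str]) -> bool:
    """Reject a proposed hypothesis in code, in case the model ignores the prompt."""
    return normalize(hypothesis) in settled
```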

Without schema memory, the agent would repeatedly hit errors on the same queries and waste steps. Without business context, it had no baseline for what "normal" looked like for this specific store.

Things that still need to be fixed:

- Anthropic's 30k input tokens/min rate limit: three LLM calls per step × large tool outputs means the limit gets hit around step 3–4.
- Keeping memory fresh and retrieving only the relevant items from it
- The agent sometimes ignores schema constraints
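For the rate-limit item, one option is to budget input tokens client-side before each call and to cap tool output size. The sketch below keeps a 60-second sliding window of recorded input tokens and reports how long to wait before the next call fits. It's an approximation for pacing, not a model of how Anthropic actually meters usage. The 30,000 figure comes from the post; the 8,000-character cap on tool output and the chars/4 token estimate are assumptions (an exact count needs the provider's tokenizer).

```python
import time
from collections import deque
from typing import Callable


def estimate_tokens(text: str) -> int:
    return len(text) // 4 + 1  # rough heuristic, not a real tokenizer


class TokenBudget:
    """Sliding 60 s window over input tokens sent (client-side pacing sketch)."""

    def __init__(self, tokens_per_minute: int = 30_000, clock: Callable[[], float] = time.monotonic):
        self.limit = tokens_per_minute
        self.clock = clock
        self.window = deque()  # (timestamp, tokens) pairs, oldest first

    def _used(self, now: float) -> int:
        # Drop calls older than 60 s, then sum what remains
        while self.window and now - self.window[0][0] >= 60:
            self.window.popleft()
        return sum(tokens for _, tokens in self.window)

    def wait_time(self, tokens: int) -> float:
        """Seconds to wait before a call using `tokens` input tokens fits in the window."""
        if tokens > self.limit:
            raise ValueError("single request exceeds the per-minute budget; shrink the prompt")
        now = self.clock()
        used = self._used(now)
        if used + tokens <= self.limit:
            return 0.0
        # Walk through old calls in order until enough budget has expired
        freed = 0
        for ts, t in self.window:
            freed += t
            if used - freed + tokens <= self.limit:
                return ts + 60 - now
        raise RuntimeError("unreachable: tokens <= limit")

    def record(self, tokens: int) -> None:
        self.window.append((self.clock(), tokens))


def truncate_tool_output(text: str, max_chars: int = 8_000) -> str:
    """Cap tool output before it goes into a prompt, keeping the head and tail."""
    if len(text) <= max_chars:
        return text
    half = max_chars // 2
    return text[:half] + f"\n...[{len(text) - max_chars} chars truncated]...\n" + text[-half:]
```

In use, each of the three LLM calls in a step would call wait_time with its estimate, sleep for the returned number of seconds if it's non-zero, send the request, then call record. Truncating tool output first keeps each estimate small, which is usually what pushes steps 3–4 over the limit.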

Still rough, but the core loop works.

Would love feedback from this group on how I can improve it further.


Source: Reddit, category: projects, retention score: 0.56